Monday, February 03, 2014

DataStage Scenario - Design 2 - job1

 DataStage Scenario Problem -->  DataStage Scenario - Problem2

Solution Design :

a) Job Design :

This design produces the required output. The job reads a sequential file as input, then passes the data through an Aggregator stage and a Filter stage.

b) Aggregator Stage Properties

The input data contains only one column, "No". In the Aggregator stage, we group the data on the "No" column and count the rows for each key (No).

With the "Count Rows" aggregation type, the stage generates a new column that contains the row count for each key (No). Here we named that column "count" and mapped it to the output.
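For readers who want to see the grouping logic outside DataStage, here is a minimal Python sketch of what the Aggregator stage computes. It is only an analogy, not DataStage code: the function name count_rows_per_key and the sample rows are made up for illustration.

```python
from collections import Counter


def count_rows_per_key(rows):
    """Group rows on the "No" column and count the rows for each key."""
    counts = Counter(row["No"] for row in rows)
    # One output row per key, with the row count in a new "count" column.
    return [{"No": no, "count": n} for no, n in counts.items()]


# Hypothetical input: a single "No" column, as in the scenario.
sample_rows = [{"No": 1}, {"No": 2}, {"No": 2}, {"No": 3}, {"No": 3}, {"No": 3}]
aggregated = count_rows_per_key(sample_rows)
```

Each key appears exactly once in the result, which is what lets the next stage decide between "unique" and "duplicate" by looking at the count alone.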

c) Filter Stage Properties

In the Filter stage, we define two where conditions, count=1 and count>1, and assign a different output file to each condition.

On the Output tab, we map the data (column "No") to the output.
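The two where conditions can be sketched in Python as follows. Again, this is an illustrative analogy rather than DataStage code: split_by_count and the sample aggregated rows are assumptions introduced for the example.

```python
def split_by_count(aggregated_rows):
    """Route rows to two outputs, mirroring the conditions count=1 and count>1."""
    # Only the "No" column is passed on to each output.
    unique_out = [{"No": r["No"]} for r in aggregated_rows if r["count"] == 1]
    duplicate_out = [{"No": r["No"]} for r in aggregated_rows if r["count"] > 1]
    return unique_out, duplicate_out


# Hypothetical rows as they might leave the Aggregator stage.
aggregated_rows = [
    {"No": 1, "count": 1},
    {"No": 2, "count": 2},
    {"No": 3, "count": 3},
]
unique_rows, duplicate_rows = split_by_count(aggregated_rows)
```

Because the Aggregator emits one row per key, each duplicated value shows up once in the second output, not once per occurrence in the input.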

d) Output File

We get two output files from the job:

i) Rows where count=1 (values that appear only once in the input)
ii) Rows where count>1 (values that are duplicated in the input)

