Monday, February 03, 2014
DataStage Scenario - Design 2 - job1
DataStage Scenario Problem --> DataStage Scenario - Problem2
Solution Design :
a) Job Design :
The design below produces the required output. The job reads a sequential file as input, then passes the data through an Aggregator stage and a Filter stage.
b) Aggregator Stage Properties
The input data contains only one column, "No". In the Aggregator stage, we group the data on the "No" column and count the rows for each key (No).
With the "Count Rows" aggregation type, the stage generates a new column that contains the row count for each key (No). Here we named that column "count" and mapped it to the output as below.
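The grouping and counting step can be sketched outside DataStage to make the Aggregator output easier to picture. This is only an illustrative Python sketch, not DataStage code: the function name `count_rows_per_key` and the sample values are assumptions, while the column names "No" and "count" come from the job above.

```python
from collections import Counter

def count_rows_per_key(values):
    """Mimic the Aggregator stage: group on "No" and count the rows per key.

    Hypothetical helper for illustration only; returns one row per key.
    """
    counts = Counter(values)  # key -> number of input rows with that key
    return [{"No": no, "count": n} for no, n in counts.items()]

# Sample input (assumed data): the single "No" column of the sequential file.
input_rows = [1, 2, 2, 3, 3, 3, 4]
aggregated = count_rows_per_key(input_rows)

for row in aggregated:
    print(row)
```

Note that the result holds one row per distinct "No" value, which is what the Filter stage receives next.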
c) Filter Stage Properties
In the Filter stage, we defined two where clauses, count=1 and count>1, and assigned a different output file to each condition.
On the Output tab, we mapped the data (column No) to the output.
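The routing done by the two where clauses can be pictured with a small Python sketch. It is not how DataStage implements the Filter stage; the function name `filter_by_count` and the sample rows are assumptions made for illustration, and only the "No" column is passed to each output, as in the job.

```python
def filter_by_count(rows):
    """Mimic the Filter stage: send each row to the output whose where clause matches.

    Hypothetical helper; `rows` are dicts with "No" and "count" keys.
    """
    unique_output = [r["No"] for r in rows if r["count"] == 1]    # where count=1
    duplicate_output = [r["No"] for r in rows if r["count"] > 1]  # where count>1
    return unique_output, duplicate_output

# Assumed Aggregator output: one row per key with its row count.
aggregated_rows = [
    {"No": 1, "count": 1},
    {"No": 2, "count": 2},
    {"No": 3, "count": 3},
    {"No": 4, "count": 1},
]
unique_values, duplicate_values = filter_by_count(aggregated_rows)
print("count=1:", unique_values)
print("count>1:", duplicate_values)
```

Because the Aggregator emits one row per key, each duplicated value appears only once in the count>1 output in this sketch.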
d) Output File
The job produces two outputs:
i) Rows where count=1 (unique values in the input)
ii) Rows where count>1 (duplicate values in the input)