
Friday, November 02, 2012

uniq : a cool nix filter



Hi guys,
Yesterday I was sitting at my desk doing the usual boring tasks ;-). Then I got a seq file containing more than 3 million records and 5 columns, and what I had to do was fetch the duplicate records based on the first column. It's quite easy in a *nix environment without taking the help of any tool.

So I thought I should share this with you as well. Here it comes.....


Be warned that it is a bad idea to use uniq or any other tool to remove duplicate lines from files containing financial or other important data. In such cases, a duplicate line almost always means another transaction for the same amount, and removing it would cause a lot of trouble for the accounting department. Do not do it! Whatever you want to do, please keep a copy of the original.


Usually 'uniq' is used along with 'sort', because uniq only collapses duplicate lines that are adjacent to each other. Here we will work only with the 'uniq' command.


Let's have a look at the example file.
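Something like this, say a file named names.txt (the file name and its contents here are just a made-up sample for illustration):

$ cat names.txt     # sample data, for illustration only
atul
alex
mark
Alex
atul
jim
mark
atul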




After sorting.. this will sort the data so that the duplicate records sit next to each other.
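For example, on the sample names.txt the identical records end up next to each other (the exact ordering of 'Alex' vs 'alex' may vary with your locale settings):

$ sort names.txt
Alex
alex
atul
atul
atul
jim
mark
mark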


 
Then uniq.. this will give the unique records from the file.
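Piping the sorted sample through uniq collapses the adjacent duplicates, something like:

$ sort names.txt | uniq
Alex
alex
atul
jim
mark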



What if we need the number of occurrences?
Use the -c option --> puts the number of occurrences before each output row
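On the sample file the output would look something like this (the exact spacing of the counts may differ between uniq versions):

$ sort names.txt | uniq -c
      1 Alex
      1 alex
      3 atul
      1 jim
      2 mark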



Here, we can see that uniq is case sensitive [ like.. alex and Alex are treated as different records ]
What if we need the case-insensitive unique records?

Use the -i option ---> case-insensitive uniq
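On the sample data, 'Alex' and 'alex' now collapse into a single record (uniq keeps the first spelling it sees):

$ sort names.txt | uniq -i
Alex
atul
jim
mark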
 

With the number of occurrences as well:
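Combining -c with -i on the sample file, the counts now add up across both spellings:

$ sort names.txt | uniq -ci
      2 Alex
      3 atul
      1 jim
      2 mark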



Now, the nice part of 'uniq'. We have the two most important options:

-u option  --->  Displays only the unrepeated lines
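On the sample file, only the records that appear exactly once come out:

$ sort names.txt | uniq -u
Alex
alex
jim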




-d option  --->  Displays only the repeated lines
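On the sample file, only the records that appear more than once come out (each printed just once):

$ sort names.txt | uniq -d
atul
mark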
 

Displays only the unrepeated lines with counts
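Combining -u with -c on the sample file (every count will naturally be 1):

$ sort names.txt | uniq -uc
      1 Alex
      1 alex
      1 jim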



Displays only the repeated lines with counts
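And combining -d with -c on the sample file:

$ sort names.txt | uniq -dc
      3 atul
      2 mark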


Hoping this will help you in your daily work.
For more options and examples, keep looking for the updates...
till then...
njoy the simplicity.......