Nuts & Bolts of DataStage: Development/Debug Stage in DataStage

Head stage

The Head stage is a Development/Debug stage. It can have a single input link and a single output link.

It is one of a number of stages that InfoSphere DataStage provides to help you sample data.

The Head stage selects the first N rows from each partition of an input data set and copies the selected

rows to an output data set. You determine which rows are copied by setting properties which allow you

to specify:

· The number of rows to copy

· The partition from which the rows are copied

· The location of the rows to copy

· The number of rows to skip before the copying operation begins

This stage is helpful in testing and debugging applications with large data sets. For example, the Partition

property lets you see data from a single partition to determine if the data is being partitioned as you

want it to be. The Skip property lets you access a certain portion of a data set.
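
The row selection described above can be mimicked outside DataStage. The following is a minimal Python sketch, not DataStage code: the function name head, its parameters n, skip and partition, and the sample data are all hypothetical, chosen only to mirror the "number of rows", "skip" and "partition" properties listed above.

```python
from itertools import islice

def head(partitions, n, skip=0, partition=None):
    """Return the first n rows of each partition after skipping `skip` rows.

    `partitions` is a list of row lists, one per partition (an assumption
    made for this sketch). If `partition` is given, only that partition
    number is read.
    """
    selected = {}
    for number, rows in enumerate(partitions):
        if partition is not None and number != partition:
            continue
        selected[number] = list(islice(rows, skip, skip + n))
    return selected

# Hypothetical data set that has already been split into three partitions.
partitions = [
    [1, 4, 7, 10],
    [2, 5, 8, 11],
    [3, 6, 9, 12],
]

print(head(partitions, n=2))               # {0: [1, 4], 1: [2, 5], 2: [3, 6]}
print(head(partitions, n=2, skip=1))       # {0: [4, 7], 1: [5, 8], 2: [6, 9]}
print(head(partitions, n=3, partition=1))  # {1: [2, 5, 8]}
```

Restricting the call to one partition, as in the last line, is the quick check mentioned above: it shows which rows actually landed in that partition.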

Tail stage

The Tail stage is a Development/Debug stage. It can have a single input link and a single output link. It

is one of a number of stages that InfoSphere DataStage provides to help you sample data.

The Tail stage selects the last N records from each partition of an input data set and copies the selected

records to an output data set. You determine which records are copied by setting properties which allow

you to specify:

· The number of records to copy

· The partition from which the records are copied

This stage is helpful in testing and debugging applications with large data sets. For example, the Partition

property lets you see data from a single partition to determine if the data is being partitioned as you want it to be.
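
As a rough illustration of "last N records from each partition", here is a minimal Python sketch. It is not DataStage code: the function name tail, its parameters and the sample data are hypothetical.

```python
def tail(partitions, n, partition=None):
    """Return the last n records of each partition.

    `partitions` is a list of record lists, one per partition (an
    assumption made for this sketch). If `partition` is given, only that
    partition number is read.
    """
    selected = {}
    for number, records in enumerate(partitions):
        if partition is not None and number != partition:
            continue
        # A slice of [-0:] would return everything, so n == 0 is handled apart.
        selected[number] = list(records)[-n:] if n > 0 else []
    return selected

# Hypothetical data set that has already been split into three partitions.
partitions = [
    [1, 4, 7, 10],
    [2, 5, 8, 11],
    [3, 6, 9, 12],
]

print(tail(partitions, n=2))               # {0: [7, 10], 1: [8, 11], 2: [9, 12]}
print(tail(partitions, n=1, partition=2))  # {2: [12]}
```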

Sample stage

The Sample stage is a Development/Debug stage. It can have a single input link and any number of

output links when operating in percent mode, or a single input and single output link when operating

in period mode. It is one of a number of stages that InfoSphere DataStage provides to help you sample

data.

The Sample stage samples an input data set. It operates in two modes. In Percent mode, it extracts rows,

selecting them by means of a random number generator, and writes a given percentage of these to each

output data set. You specify the number of output data sets, the percentage written to each, and a seed

value to start the random number generator. You can reproduce a given distribution by repeating the

seed value and the same number of outputs.

In Period mode, it extracts every Nth row from each partition, where N is the period, which you supply.

In this case all rows will be output to a single data set, so the stage used in this mode can only have a

single output link.
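
The two modes can be sketched in Python as follows. This is an illustration only and makes several assumptions that the text above does not state: each row goes to at most one output in Percent mode, Period mode keeps rows N, 2N, 3N and so on of each partition, and the names sample_percent and sample_period are hypothetical. The actual stage may differ in these details.

```python
import random

def sample_percent(rows, percents, seed):
    """Percent mode sketch: send each row to at most one output at random.

    `percents` holds one percentage per output and should sum to 100 or less.
    """
    rng = random.Random(seed)          # seeded generator, so runs are repeatable
    outputs = [[] for _ in percents]
    for row in rows:
        draw = rng.uniform(0, 100)
        upper = 0.0
        for index, percent in enumerate(percents):
            upper += percent
            if draw < upper:
                outputs[index].append(row)
                break                  # rows above the total are dropped
    return outputs

def sample_period(partitions, period):
    """Period mode sketch: keep every period-th row of each partition."""
    output = []
    for rows in partitions:
        output.extend(rows[period - 1::period])
    return output

rows = list(range(1, 1001))
first = sample_percent(rows, percents=[10, 20], seed=42)
second = sample_percent(rows, percents=[10, 20], seed=42)
print(first == second)                 # True: same seed, same outputs
print([len(output) for output in first])

print(sample_period([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]], period=3))  # [3, 6, 9]
```

Because the generator is seeded, repeating the call with the same seed and the same list of percentages reproduces the same split, which is the reproducibility the paragraph above refers to.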

Peek stage

The Peek stage is a Development/Debug stage. It can have a single input link and any number of output

links.

The Peek stage lets you print record column values either to the job log or to a separate output link as

the stage copies records from its input data set to one or more output data sets.
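
The idea of "print while copying" is easy to picture with a small Python sketch. It is not DataStage code: the function name peek and the columns and limit parameters are hypothetical, and Python's logging module stands in for the job log.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("peek")

def peek(records, columns=None, limit=10):
    """Yield every record unchanged, logging the first `limit` of them.

    `columns` restricts which column values are written to the log;
    None means all columns.
    """
    for count, record in enumerate(records):
        if count < limit:
            shown = record if columns is None else {c: record[c] for c in columns}
            log.info("peek record %d: %s", count, shown)
        yield record

# Hypothetical input records.
records = [{"id": i, "name": f"n{i}"} for i in range(5)]

# Only the "id" column of the first two records is logged; all five are copied.
copied = list(peek(records, columns=["id"], limit=2))
print(len(copied))
```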

Row Generator stage

The Row Generator stage is a Development/Debug stage. It has no input links, and a single output link.

The Row Generator stage produces a set of mock data fitting the specified meta data. This is useful

where you want to test your job but have no real data available to process.

The meta data you specify on the output link determines the columns you are generating.

For decimal values the Row Generator stage uses dfloat. As a result, the generated values are subject to

the approximate nature of floating point numbers. Not all of the values in the valid range of a floating

point number are representable. The further a value is from zero and the greater its number of significant

digits, the wider the gaps between representable values.
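
The widening gaps are easy to observe. The sketch below assumes that dfloat behaves like an IEEE 754 double, which is what a Python float is; math.ulp requires Python 3.9 or later. It shows a general floating point property, not Row Generator output.

```python
import math
from decimal import Decimal

# Distance from each value to the next representable double.
values = (1.0, 1_000.0, 1_000_000.0, 1e15)
gaps = {value: math.ulp(value) for value in values}

for value, gap in gaps.items():
    print(f"{value:>24,.1f}  gap to next double: {gap:.3e}")

# A decimal literal such as 0.1 is stored as the nearest double, not exactly.
print(Decimal(0.1))
```

Near 1.0 the gap is about 2.2e-16, while near 1e15 it is 0.125, so a decimal column with many digits on both sides of the decimal point cannot always be represented exactly.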

Column Generator stage

The Column Generator stage is a Development/Debug stage. It can have a single input link and a single

output link.

The Column Generator stage adds columns to incoming data and generates mock data for these columns

for each data row processed. The new data set is then output. (See also the Row Generator stage which

allows you to generate complete sets of mock data.)
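
In plain Python the same idea, adding generated columns to each incoming row, might look like the sketch below. The function name add_generated_columns, the column names and the generator choices (a cycle and a counter) are hypothetical and do not correspond to actual stage properties.

```python
import itertools

def add_generated_columns(rows, generators):
    """Yield a copy of each row extended with one value per generated column.

    `generators` maps a new column name to an iterator of mock values.
    """
    for row in rows:
        new_row = dict(row)            # leave the incoming row untouched
        for column, generator in generators.items():
            new_row[column] = next(generator)
        yield new_row

# Hypothetical incoming data.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

generators = {
    "batch": itertools.cycle([10, 20]),         # repeats 10, 20, 10, ...
    "seq": itertools.count(start=100, step=5),  # 100, 105, 110, ...
}

result = list(add_generated_columns(rows, generators))
for row in result:
    print(row)
```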

Write Range Map stage

The Write Range Map stage is a Development/Debug stage. It allows you to write data to a range map.

The stage can have a single input link. It can only run in sequential mode.

The Write Range Map stage takes an input data set produced by sampling and sorting a data set and

writes it to a file in a form usable by the range partitioning method. The range partitioning method uses

the sampled and sorted data set to determine partition boundaries.

A typical use for the Write Range Map stage would be in a job which used the Sample stage to sample a

data set, the Sort stage to sort it and the Write Range Map stage to write the range map which can then

be used with the range partitioning method to write the original data set to a file set.
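
The sample, sort, write range map, range partition sequence can be sketched in Python. This only illustrates the idea: the functions build_range_map and range_partition are hypothetical, the range map is kept in memory rather than written to a file, and picking evenly spaced keys from the sorted sample is an assumption about how boundaries could be chosen, not a description of the DataStage file format or algorithm.

```python
import bisect
import random

def build_range_map(sorted_sample, partition_count):
    """Pick partition_count - 1 boundary keys, evenly spaced in a sorted sample."""
    step = len(sorted_sample) / partition_count
    return [sorted_sample[int(step * i)] for i in range(1, partition_count)]

def range_partition(keys, boundaries):
    """Assign each key to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    return partitions

rng = random.Random(42)
data = [rng.randint(0, 999) for _ in range(1000)]   # hypothetical key column

sample = sorted(rng.sample(data, 100))              # sample, then sort
boundaries = build_range_map(sample, 4)             # the "range map"
partitions = range_partition(data, boundaries)      # range partitioning

print(boundaries)
print([len(partition) for partition in partitions])
```

Because the boundaries come from a sample of the real data, the partitions tend to be of similar size even when the keys are unevenly distributed.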

Enjoy the simplicity...
Atul Singh

Monday, July 02, 2012
