Head stage
The Head stage is a Development/Debug stage. It can have a single input link and a single output link. It is one of a number of stages that InfoSphere DataStage provides to help you sample data.
The Head stage selects the first N rows from each partition of an input data set and copies the selected rows to an output data set. You determine which rows are copied by setting properties that let you specify:
· The number of rows to copy
· The partition from which the rows are copied
· The location of the rows to copy
· The number of rows to skip before the copying operation begins
This stage is helpful in testing and debugging applications with large data sets. For example, the Partition property lets you see data from a single partition to determine whether the data is being partitioned as you want it to be. The Skip property lets you access a certain portion of a data set.
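As a rough illustration of the behaviour described above, the following Python sketch models a partitioned data set as a list of lists. The function name `head` and the parameters `nrows`, `skip` and `partition` are hypothetical stand-ins for the stage properties, not DataStage names.

```python
from itertools import islice

def head(partitions, nrows=10, skip=0, partition=None):
    """Copy the first `nrows` rows of each partition, after skipping `skip` rows.

    `partitions` is a list of lists standing in for a partitioned data set.
    If `partition` is given, only that partition index is read.
    """
    selected = range(len(partitions)) if partition is None else [partition]
    out = []
    for p in selected:
        # islice(skip, skip + nrows) drops `skip` rows, then takes `nrows` rows
        out.extend(islice(partitions[p], skip, skip + nrows))
    return out

data = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
print(head(data, nrows=2))                       # [1, 2, 10, 20]
print(head(data, nrows=2, skip=1, partition=1))  # [20, 30]
```

The second call shows how reading a single partition with a skip lets you look at one slice of the data without processing the rest.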
Tail stage
The Tail stage is a Development/Debug stage. It can have a single input link and a single output link. It is one of a number of stages that InfoSphere DataStage provides to help you sample data.
The Tail stage selects the last N records from each partition of an input data set and copies the selected records to an output data set. You determine which records are copied by setting properties that let you specify:
· The number of records to copy
· The partition from which the records are copied
This stage is helpful in testing and debugging applications with large data sets. For example, the Partition property lets you see data from a single partition to determine whether the data is being partitioned as you want it to be.
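The Tail behaviour can be sketched in the same way. This is a minimal Python model, again treating a partitioned data set as a list of lists; `tail`, `nrows` and `partition` are hypothetical names chosen for the example.

```python
def tail(partitions, nrows=10, partition=None):
    """Copy the last `nrows` records of each partition (or of one partition)."""
    selected = range(len(partitions)) if partition is None else [partition]
    out = []
    for p in selected:
        rows = partitions[p]
        # Clamp the start index at 0 so short partitions are copied whole
        start = max(len(rows) - nrows, 0)
        out.extend(rows[start:])
    return out

data = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
print(tail(data, nrows=2))               # [4, 5, 40, 50]
print(tail(data, nrows=2, partition=0))  # [4, 5]
```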
Sample stage
The Sample stage is a Development/Debug stage. It can have a single input link and any number of output links when operating in Percent mode, or a single input link and a single output link when operating in Period mode. It is one of a number of stages that InfoSphere DataStage provides to help you sample data.
The Sample stage samples an input data set. It operates in two modes. In Percent mode, it extracts rows, selecting them by means of a random number generator, and writes a given percentage of these to each output data set. You specify the number of output data sets, the percentage written to each, and a seed value to start the random number generator. You can reproduce a given distribution by repeating the same seed value with the same number of outputs.
In Period mode, it extracts every Nth row from each partition, where N is the period, which you supply. In this case all rows are output to a single data set, so a stage used in this mode can have only a single output link.
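The two modes can be illustrated with a short Python sketch. It rests on assumptions that the text above does not confirm: that each row goes to at most one output in Percent mode, and that Period mode counts so that the Nth row of a partition is the first one selected. The function names are hypothetical.

```python
import random

def sample_percent(rows, percents, seed):
    """Percent mode sketch: route each row to at most one output at random."""
    if sum(percents) > 100:
        raise ValueError("percentages must not add up to more than 100")
    rng = random.Random(seed)          # the seed makes the run repeatable
    outputs = [[] for _ in percents]
    for row in rows:
        u = rng.random() * 100         # uniform draw in [0, 100)
        upper = 0.0
        for i, pct in enumerate(percents):
            upper += pct
            if u < upper:              # row falls in the band of output i
                outputs[i].append(row)
                break                  # rows past the last band are dropped
    return outputs

def sample_period(partitions, period):
    """Period mode sketch: every Nth row of each partition, single output."""
    out = []
    for rows in partitions:
        out.extend(rows[period - 1::period])
    return out

first = sample_percent(list(range(1000)), [10, 20], seed=42)
second = sample_percent(list(range(1000)), [10, 20], seed=42)
print(first == second)                                      # True
print(sample_period([[1, 2, 3, 4, 5, 6], [10, 20, 30]], 3))  # [3, 6, 30]
```

Running Percent mode twice with the same seed and the same list of percentages gives identical outputs, which is the reproducibility the seed value provides.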
Peek stage
The Peek stage is a Development/Debug stage. It can have a single input link and any number of output links.
The Peek stage lets you print record column values either to the job log or to a separate output link as the stage copies records from its input data set to one or more output data sets.
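The idea of copying records through unchanged while printing some of them can be sketched in Python as a generator. The names `peek`, `nrows` and `log` are hypothetical; the real stage is configured through its properties rather than called like this.

```python
def peek(rows, nrows=10, log=print):
    """Pass every record through unchanged, logging the first `nrows` records."""
    for i, row in enumerate(rows):
        if i < nrows:
            # Format the record as "column=value" pairs for the log
            log(", ".join(f"{name}={value}" for name, value in row.items()))
        yield row

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
copied = list(peek(records, nrows=1))   # logs: id=1, name=a
print(copied == records)                # True
```

Passing a different `log` callable, such as a list's `append` method, stands in for sending the peeked values to a separate output instead of the job log.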
Row Generator stage
The Row Generator stage is a Development/Debug stage. It has no input links and a single output link.
The Row Generator stage produces a set of mock data fitting the specified meta data. This is useful where you want to test your job but have no real data available to process.
The meta data you specify on the output link determines the columns you are generating.
For decimal values the Row Generator stage uses dfloat. As a result, the generated values are subject to the approximate nature of floating point numbers. Not all of the values in the valid range of a floating point number are representable. The further a value is from zero, and the greater its number of significant digits, the wider the gaps between representable values.
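The widening gaps can be seen directly in Python, on the assumption that dfloat corresponds to an IEEE 754 double, which is what Python's `float` uses. `math.ulp` (Python 3.9 or later) returns the gap between a value and the next representable value.

```python
import math

# The gap between neighbouring doubles grows with the magnitude of the value
for value in (1.0, 1_000.0, 1e15, 1e17):
    print(value, math.ulp(value))

# A decimal with many significant digits cannot keep its fractional part:
# both literals below are stored as the same double
long_decimal = float("12345678901234567.89")
rounded = float("12345678901234568")
print(long_decimal == rounded)   # True
```

Near 1e17 the gap between neighbouring values is 16, so a generated decimal of that size cannot carry any fractional digits.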
Column Generator stage
The Column Generator stage is a Development/Debug stage. It can have a single input link and a single output link.
The Column Generator stage adds columns to incoming data and generates mock data for these columns for each data row processed. The new data set is then output. (See also the Row Generator stage, which allows you to generate complete sets of mock data.)
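A minimal Python sketch of the same idea follows: each incoming row is copied and extended with generated values. The function `add_generated_columns` and the column names `seq` and `flag` are hypothetical, and the generators used here (a counter and a repeating cycle) are only examples of mock data.

```python
import itertools

def add_generated_columns(rows, generators):
    """Yield each row with extra columns filled from the given generators.

    `generators` maps a new column name to an iterator of mock values.
    """
    for row in rows:
        new_row = dict(row)                # leave the incoming row untouched
        for name, values in generators.items():
            new_row[name] = next(values)   # one mock value per row
        yield new_row

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
generators = {
    "seq": itertools.count(100),           # 100, 101, 102, ...
    "flag": itertools.cycle(["Y", "N"]),   # Y, N, Y, ...
}
result = list(add_generated_columns(rows, generators))
for row in result:
    print(row)
```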
Write Range Map stage
The Write Range Map stage is a Development/Debug stage. It allows you to write data to a range map. The stage can have a single input link. It can only run in sequential mode.
The Write Range Map stage takes an input data set produced by sampling and sorting a data set and writes it to a file in a form usable by the range partitioning method. The range partitioning method uses the sampled and sorted data set to determine partition boundaries.
A typical use for the Write Range Map stage would be in a job that uses the Sample stage to sample a data set, the Sort stage to sort it, and the Write Range Map stage to write the range map, which can then be used with the range partitioning method to write the original data set to a file set.
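The sample, sort and partition flow can be sketched in Python. This is not the range map file format or the DataStage algorithm, only an illustration of how a sorted sample can yield partition boundaries; `build_range_map` and `range_partition` are hypothetical names, and picking evenly spaced keys from the sample is an assumption.

```python
import bisect
import random

def build_range_map(sample_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sorted sample."""
    if not sample_keys:
        raise ValueError("the sample must contain at least one key")
    keys = sorted(sample_keys)
    # Take evenly spaced keys so each range holds a similar share of the sample
    return [keys[len(keys) * i // num_partitions]
            for i in range(1, num_partitions)]

def range_partition(key, boundaries):
    """Return the partition index for `key` given the boundary keys."""
    return bisect.bisect_right(boundaries, key)

sample = list(range(1, 101))
random.Random(0).shuffle(sample)   # the sample arrives unsorted
boundaries = build_range_map(sample, 4)
print(boundaries)                  # [26, 51, 76]
print([range_partition(k, boundaries) for k in (10, 26, 60, 99)])  # [0, 1, 2, 3]
```

Keys below the first boundary go to partition 0, and each boundary key starts the next partition, so rows with nearby key values end up in the same partition.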
Enjoy the simplicity...
Atul Singh