Nuts & Bolts of DataStage: Some more design tips for DataStage Job Development

Wednesday, April 09, 2014

Some more design tips for DataStage Job Development

• Common information like home directory, system date, username, password should be initialized in a global variable and then variable should be referred everywhere.

• Stage Variables allow you to hold data from a previous record when the next record, allowing you to compare between previous and current records. Stage variables also allow you return multiple errors for a record of information. By being able to evaluate all data in a record and not just error on the first exception that is found, the cleanup of data is more efficient and requires less iteration.

• Nulls are a curse when it comes to using functions/routines or normal equality type expressions.
E.g. NULL = NULL doesn’t work; neither does concatenation when one of the fields is null. Changing the nulls to 0 or “” before performing operations is recommended to avoid erroneous outcomes.

• Ensure that job does not look complex. If there are more stages (more than 10) in a job divide into two or more jobs on functional basis.

• Use containers where stages in the jobs can be grouped together.

• Use Annotations for describing steps done at stages. Use Description Annotation as job title; as Description Annotation also appears in Job properties>Short Job Description and also in the Job Report when generated.

• When using String functions on decimal always use Trim function to avoid as String functions interpret an extra Space used for sign in decimal.

• When you need to get a substring (e.g. first 2 characters from the left) of a character field:
Use <Field Name>[1,2]
Similarly for a decimal field then:
Use Trim(<Field Name>)[1,2]

• Always use Hash Partition in Join and Aggregator stages. The hash key should be the same as the key used to join/aggregate.
If Join/Aggregator stages do not produce desirable results, try running in
sequential mode (verify results; if still incorrect problem is with data/logic) and then run in parallel using Hash partition.

• Use Column Generator stage to create sequence numbers or adding columns having hard coded values.

• In Job sequences; always use “Reset if required, then run” option in Job Activity stages. (Note: This is not a default option)

• When mapping a decimal field to a char field or vice versa , it is always better to convert the value in the field using the ‘Type Conversion’ functions “DecimalToString” or “StringToDecimal” as applicable while mapping.

• “Clean-up on failure” property in sequential files must be enabled (enabled by default)

for more -->

Nuts & Bolts of DataStage

Wednesday, April 09, 2014

Some more design tips for DataStage Job Development

No comments :

Post a Comment