We have moved to www.dataGenX.net, Keep Learning with us.

Wednesday, April 30, 2014

How to Create Custom Stages in Datastage Parallel Jobs ?


In addition to the wide range of parallel stage types available, the Designer allows you to define your own stage types, which you can then use in parallel jobs.
For creating custom stage in datastage which will functions as per our requirement and we will be able to modify that stage as we may want. Here is some information on custom stage functionality in datastage in parallel jobs and how can we create our own wrapped custom stage.


Performance Tunings in DataStage


JOB LEVEL

  • Parametrize all the inputs needed for the jobs; avoid hard coding of parameters like username, password, directory paths etc.
  • By using the environment variable ‘APT_CONFIG_FILE’ as a job parameter, user can dynamically change the number of nodes being used for processing a particular job.
  • For reading/writing data from large tables/files, make use of the environment variable ‘APT_BUFFER_MAXIMUM_MEMORY’. It can be used to change the memory buffer size being used for each stage.
  • It is recommended to set the environment variable $APT_DUMP_SCORE to a value of 1. When this environment variable is set, an entry is placed in the WebSphere DataStage job log showing the actual runtime structure (processes, their associated internal operators, datasets, nodes, etc) used to execute the job flow.

Friday, April 25, 2014

SQL Best Practices - Part1


Never put SQL scalar functions on columns in the WHERE predicates.
For example: WHERE YEAR(HIREDATE) = 2009 should be recoded as WHERE HIREDATE BETWEEN '2009-01-01'and '2009-12-31'.
When placing SQL scalar functions on columns in the SELECT portion of the SQL statement does incur some minimal overhead, but applying it to a column in the WHERE clause causes the predicate to become stage to non indexable.

Same applies for mathematics.
For example: WHERE HIREDATE – 7 DAYS > :HV-DATE should be recoded as WHERE HIREDATE > :HV_DATE + 7 days. Having the mathematics on the host variable is not a problem.

Thursday, April 24, 2014

DataStage Scenario - Design5 - job1



DataStage Scenario Problem -->  DataStage Scenario - Problem5
  
Solution Design :

a) Job Design :

Below is the design which can achieve the output as we needed. Here, we are reading seq file as a input, then data is passing through a Sort and Transformer stage to achieve the output.


Tuesday, April 15, 2014

DataStage Scenario - Design4 - job1


DataStage Scenario Problem -->  DataStage Scenario - Problem4

Solution Design :

a) Job Design :

Below is the design which can achieve the output as we needed. Here, we are reading seq file as a input, then data is passing through a Sort and Transformer stage to achieve the output.


Monday, April 14, 2014

DataStage Scenario - Design3 - job1


DataStage Scenario Problem -->  DataStage Scenario - Problem3

Solution Design :

a) Job Design :


Below is the design which can achieve the output as we needed. Here, we are reading seq file as a input, then data is passing through a Sort and Transformer stage to achieve the output.

Wednesday, April 09, 2014

Some more design tips for DataStage Job Development


• Common information like home directory, system date, username, password should be initialized in a global variable and then variable should be referred everywhere.

• Stage Variables allow you to hold data from a previous record when the next record, allowing you to compare between previous and current records. Stage variables also allow you return multiple errors for a record of information. By being able to evaluate all data in a record and not just error on the first exception that is found, the cleanup of data is more efficient and requires less iteration.


Tuesday, April 08, 2014

Setting up "CRON" Jobs in Nix



If you want to use the emacs editor for editing cron jobs, then, set the following in your "/home/user/.bash_profile"

EDITOR=emacs

Then, to edit cron jobs

$ crontab -e

You may want to put in the following header

Friday, April 04, 2014

DataStage Naming Standard


For maintaining the design and easy understanding of what happening inside the job, we are giving meaningful name to stages, links, jobs and sequencer. So that other developer/designer can have a idea about purpose of job/seq by seeing the design.

Here I have shared some DataStage Naming Standard we usually follow.



Thursday, April 03, 2014

DataStage error upon login: DSR.ADMIN: Failed to add user to UV_USERS file


When user first logs in, they get following error message:
DSR.ADMIN: Failed to add user to UV_USERS file.
ACCESS DENIED: INTERNAL SQL ERROR FAILED TO REGISTER THIS USER FOR SQL.