Nuts & Bolts of DataStage: merge

Wednesday, April 30, 2014

Performance Tuning in DataStage

JOB LEVEL

Parameterize all inputs that the jobs need; avoid hard-coding values such as usernames, passwords, and directory paths.
By exposing the environment variable ‘APT_CONFIG_FILE’ as a job parameter, the user can change the number of nodes used to process a particular job at run time, without editing the job.
When reading or writing data from large tables or files, consider tuning the environment variable ‘APT_BUFFER_MAXIMUM_MEMORY’. It sets the maximum amount of memory, in bytes, that each buffer between stages can use.
It is recommended to set the environment variable $APT_DUMP_SCORE to 1. When it is set, an entry is written to the WebSphere DataStage job log showing the actual runtime structure (processes, their associated internal operators, datasets, nodes, etc.) used to execute the job flow.
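
As an illustration of the APT_CONFIG_FILE tip above, here is a minimal sketch of a two-node parallel configuration file. The host name (etlhost) and the directory paths are hypothetical placeholders, not values from any real installation; substitute the ones used in your environment. Pointing the job's APT_CONFIG_FILE parameter at a file like this would run the job on two logical nodes, and a second file with four node entries would run the same job four ways.

```
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
```

Keeping several such files side by side (for example, one per degree of parallelism) lets the same job be scaled up or down simply by passing a different file path at run time.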

Wednesday, November 13, 2013

Interview Questions : DataStage - self-2

48    Why can’t we use a sequential file as a lookup?
49    What is a data warehouse?
50    What is ‘Star-Schema’?
51    What is ‘Snowflake-Schema’?
52    What is the difference between Star-Schema and Snowflake-Schema?
53    What is meant by a surrogate key?
54    What is ‘Conformed Dimension’?