Nuts & Bolts of DataStage: lookup

Wednesday, October 01, 2014

Difference Between Normal Lookup and Sparse Lookup

Normal Lookup :-

The reference data for a normal lookup must be held in memory.
A normal lookup may perform poorly when the reference data is very large, because all of it has to be loaded into memory.
A normal lookup can have more than one reference link.
A normal lookup can be used with any database.
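
As a rough illustration of the points above (plain Python, not DataStage code), a normal lookup behaves much like a set of in-memory dictionaries, one per reference link, that every input row is probed against. The link names and columns here are invented for the example.

```python
# Conceptual sketch only: models a normal lookup as in-memory dictionaries.
# Link names and columns are hypothetical.

def build_reference(rows, key):
    """Load an entire reference link into memory, keyed on the lookup column."""
    return {row[key]: row for row in rows}

def normal_lookup(stream_rows, references):
    """Probe each stream row against every in-memory reference link.

    references: list of (key_column, table) pairs, one per reference link.
    Rows that miss on any link are dropped here, which mirrors only one of
    the possible lookup-failure behaviours.
    """
    for row in stream_rows:
        out = dict(row)
        matched = True
        for key_column, table in references:
            ref = table.get(row[key_column])
            if ref is None:
                matched = False
                break
            out.update(ref)
        if matched:
            yield out

customers = build_reference(
    [{"cust_id": 1, "cust_name": "Acme"}, {"cust_id": 2, "cust_name": "Globex"}],
    key="cust_id",
)
regions = build_reference(
    [{"region_id": "E", "region_name": "East"}],
    key="region_id",
)
orders = [
    {"order_id": 10, "cust_id": 1, "region_id": "E"},
    {"order_id": 11, "cust_id": 3, "region_id": "E"},  # no such customer
]
result = list(normal_lookup(orders, [("cust_id", customers), ("region_id", regions)]))
```

Both dictionaries are fully built before the first order is processed, which is why memory use grows with the size of the reference data.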

Performance Tunings in DataStage

JOB LEVEL

Parameterize all the inputs needed by the jobs; avoid hard-coding values such as usernames, passwords and directory paths.
By exposing the environment variable $APT_CONFIG_FILE as a job parameter, the user can dynamically change the number of nodes used to process a particular job.
When reading or writing data from large tables or files, make use of the environment variable $APT_BUFFER_MAXIMUM_MEMORY. It can be used to change the size of the memory buffer used for each stage.
It is recommended to set the environment variable $APT_DUMP_SCORE to a value of 1. When this environment variable is set, an entry is placed in the WebSphere DataStage job log showing the actual runtime structure (processes, their associated internal operators, datasets, nodes, etc.) used to execute the job flow.
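
One way to apply the first two tips from a script is to pass the environment variables as job parameters on the dsjob command line. The sketch below only builds the argument list; it assumes the job has $APT_CONFIG_FILE and $APT_DUMP_SCORE defined as job parameters, and the project name, job name and configuration file path are hypothetical.

```python
# Sketch: build a dsjob command line that overrides $APT_CONFIG_FILE per run.
# Assumes the job exposes these environment variables as job parameters.
# Project, job and config file path below are hypothetical.

def build_dsjob_command(project, job, config_file, dump_score=True):
    params = {"$APT_CONFIG_FILE": config_file}
    if dump_score:
        params["$APT_DUMP_SCORE"] = "1"
    cmd = ["dsjob", "-run"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd

cmd = build_dsjob_command("MyProject", "LoadCustomers", "/opt/configs/4node.apt")
```

Keeping the command as a list (for example, to hand to subprocess.run) avoids a shell trying to expand the leading $ in the parameter names.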

Datastage Coding Checklist

Ensure that null handling is defined for all nullable fields. Do not replace nulls with a value that may also occur in the source data.
Ensure that all character fields are trimmed before any processing. Extra spaces in the data can lead to errors, such as lookup mismatches, that are hard to detect.
Always save the metadata (for source, target or lookup definitions) in the repository to ensure reusability and consistency.
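
The first two checklist items can be illustrated with a small plain-Python sketch (not DataStage code). It shows a lookup key missing because of trailing spaces, and a null replacement value colliding with real data. All values are made up.

```python
# Conceptual sketch: why untrimmed keys and poorly chosen null replacements
# cause silent problems. Column values are hypothetical.

reference = {"NY": "New York", "CA": "California"}

raw_key = "NY  "                        # trailing spaces from a fixed-width source
miss = reference.get(raw_key)           # None: "NY  " != "NY"
hit = reference.get(raw_key.strip())    # "New York" once the key is trimmed

def replace_null(value, sentinel):
    """Substitute a sentinel for NULL (None)."""
    return sentinel if value is None else value

source_codes = ["0", None, "7"]
# Bad choice: "0" is also a legitimate source value, so NULL and "0" merge.
ambiguous = [replace_null(v, "0") for v in source_codes]
# Safer choice: a value that cannot occur in the source.
distinct = [replace_null(v, "<NULL>") for v in source_codes]
```

After the bad replacement there is no way to tell which "0" was originally a null, which is the situation the checklist warns against.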

Datastage Common Errors and Solutions

1. Running the ./NodeAgents.sh start command fails with the following error: “LoggingAgent.sh process stopped unexpectedly”

SOL: Kill the LoggingAgentSocketImpl process. To find its process ID, run

ps -ef | grep LoggingAgentSocketImpl (OR)

ps -ef | grep Agent
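
If this has to be done often, the process IDs can be picked out of the ps -ef output programmatically. The following is a minimal sketch that assumes the usual UID PID PPID ... CMD column layout; the sample output, including the Java package name, is invented.

```python
# Sketch: pick out the PIDs of LoggingAgentSocketImpl from `ps -ef` output.
# Assumes PID is the second whitespace-separated field. Sample data is invented.

def find_pids(ps_output, pattern):
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header or malformed line
        if pattern in line and "grep" not in line:
            pids.append(int(fields[1]))
    return pids

sample = """UID        PID  PPID  C STIME TTY          TIME CMD
root      4321     1  0 09:00 ?        00:00:05 java com.example.LoggingAgentSocketImpl
dsadm     4400  4390  0 09:05 pts/0    00:00:00 grep LoggingAgentSocketImpl
"""
pids = find_pids(sample, "LoggingAgentSocketImpl")
```

The grep process itself is filtered out, so only the agent's PID remains to be passed to kill before the start command is run again.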

2. Warning: A sequential operator cannot preserve the partitioning of input data set on input port 0

SOL: Clear the preserve partition flag before Sequential file stages.

Interview Questions : DataStage - self-2

48    Why can’t we use a sequential file as a lookup?
49    What is a data warehouse?
50    What is a ‘Star-Schema’?
51    What is a ‘Snowflake-Schema’?
52    What is the difference between a Star-Schema and a Snowflake-Schema?
53    What is meant by a surrogate key?
54    What is a ‘Conformed Dimension’?

Why Entire partition is used in LOOKUP stage ?

Entire partitioning places a complete copy of the reference data on every node. Each node looks up only the input rows it receives, so the full reference data must be available on every node for each row to find its match.
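
The effect can be seen in a small plain-Python simulation (a conceptual sketch, not DataStage code). With the reference keys spread round-robin across two nodes, input rows can land on a node that does not hold their match; with a full copy on every node, every row matches. The keys are made up.

```python
# Conceptual sketch: reference rows spread across nodes versus copied to all.
NODES = 2

def round_robin(rows, nodes):
    """Deal rows out to the nodes one at a time."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts

def entire(rows, nodes):
    """Give every node a complete copy of the rows."""
    return [list(rows) for _ in range(nodes)]

def lookup_hits(stream_parts, ref_parts):
    """Count matches when each node only sees its own reference partition."""
    hits = 0
    for stream, ref in zip(stream_parts, ref_parts):
        keys = set(ref)
        hits += sum(1 for k in stream if k in keys)
    return hits

reference_keys = ["A", "B", "C", "D"]
stream_keys = ["B", "A", "D", "C"]

stream_parts = round_robin(stream_keys, NODES)   # node 0: B, D ; node 1: A, C
split_hits = lookup_hits(stream_parts, round_robin(reference_keys, NODES))
entire_hits = lookup_hits(stream_parts, entire(reference_keys, NODES))
```

In this run the round-robin reference data puts A and C on node 0 and B and D on node 1, the opposite of where the input rows land, so no lookups succeed until the reference data is copied to both nodes.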

What can you delete to free up disk space in IBM InfoSphere Information Server

What can you delete to free up disk space in IBM InfoSphere Information Server when disks are becoming full?

Here are some things you can do to clean up space:

Clear the &PH& directory in the project directory. Each DataStage project directory contains a &PH& directory, which holds information about active stages that is used for diagnostic purposes. Files are added to the &PH& directory every time a job is run, so it needs to be cleaned out periodically.
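
A periodic clean-up of this kind could be scripted along the following lines. This is a minimal sketch, assuming it is acceptable to remove files older than a chosen number of days; the function name and the age threshold are illustrative, and it is probably safest to run it when no jobs are active.

```python
import os
import time

def purge_old_files(directory, max_age_days, now=None):
    """Delete regular files in `directory` older than `max_age_days`.

    Returns the sorted names of the files removed. Subdirectories and
    symbolic links are left alone.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialise the listing first so deletions do not disturb the scan.
    for entry in list(os.scandir(directory)):
        if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Calling purge_old_files with the path of a project's &PH& directory and, say, 7 would remove only files that have not been modified in the last week.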

Deleting temporary lookuptable files in IBM InfoSphere DataStage

When a DataStage job with a lookup stage aborts, lookuptable files may be left behind in the resource directories, where they consume space. The filenames are similar to "lookuptable.20091210.513biba".

When a job aborts, it leaves its temporary files in place for postmortem review. Temporary files are usually written to the scratch directories, but lookup files are created in the resource directories. Like regular datasets, lookup filesets are not removed automatically.
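
Before deleting anything, it can help to see which leftover files exist and how much space they take. The sketch below only reports files whose names start with "lookuptable."; the function name is illustrative, and the list of resource directories is assumed to be supplied by the caller (for example, the resource disk paths from the configuration file).

```python
import glob
import os

def find_lookuptable_files(resource_dirs):
    """Return sorted (path, size_in_bytes) pairs for every lookuptable.* file."""
    found = []
    for directory in resource_dirs:
        # glob.escape protects directory names containing wildcard characters.
        pattern = os.path.join(glob.escape(directory), "lookuptable.*")
        for path in glob.glob(pattern):
            if os.path.isfile(path):
                found.append((path, os.path.getsize(path)))
    return sorted(found)
```

The sketch deliberately stops at reporting, since a file with this name may still belong to a job that is running; removal is best left as a manual step once that has been ruled out.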

Nuts & Bolts of DataStage

Wednesday, October 01, 2014

Difference Between Normal Lookup and Sparse Lookup

Wednesday, April 30, 2014

Performance Tunings in DataStage

Monday, February 17, 2014

Datastage Coding Checklist

Friday, January 10, 2014

Datastage Common Errors and Solutions

Wednesday, November 13, 2013

Interview Questions : DataStage - self-2

Thursday, November 07, 2013

Why Entire partition is used in LOOKUP stage ?

Wednesday, September 11, 2013

What can you delete to free up disk space in IBM InfoSphere Information Server

Monday, September 09, 2013

Deleting temporary lookuptable files in IBM InfoSphere DataStage