Nuts & Bolts of DataStage: parameter

Showing posts with label parameter. Show all posts

Monday, June 23, 2014

Purpose of Parameters in ETL

In an ETL process, a parameter represents an attribute that is not hard coded or sourced from the transactional system. Parameters provide flexibility so that you can adapt ETL processes to fit your business requirements.
or in Simple Words, A job parameter is a way to change a property within a job without having to alter and recompile it. These are input values to job/process which can be changed time to time as per need.

Wednesday, April 30, 2014

Performance Tunings in DataStage

JOB LEVEL

Parametrize all the inputs needed for the jobs; avoid hard coding of parameters like username, password, directory paths etc.
By using the environment variable ‘APT_CONFIG_FILE’ as a job parameter, user can dynamically change the number of nodes being used for processing a particular job.
For reading/writing data from large tables/files, make use of the environment variable ‘APT_BUFFER_MAXIMUM_MEMORY’. It can be used to change the memory buffer size being used for each stage.
It is recommended to set the environment variable $APT_DUMP_SCORE to a value of 1. When this environment variable is set, an entry is placed in the WebSphere DataStage job log showing the actual runtime structure (processes, their associated internal operators, datasets, nodes, etc) used to execute the job flow.

Oracle Interview Questions - Part-4

76. What is the OPTIMAL parameter?
It is used to set the optimal length of a rollback segment.

77. What is the functionality of SYSTEM table space?
To manage the database level transactions such as modifications of the data dictionary table that record information about the free space usage.

78. How will you create multiple rollback segments in a database?
- Create a database, which implicitly creates a SYSTEM rollback segment in a SYSTEM tablespace.
- Create a second rollback segment name R0 in the SYSTEM tablespace.
- Make new rollback segment available (after shutdown, modify init.ora file and start database)
- Create other tablespaces (RBS) for rollback segments.
- Deactivate rollback segment R0 and activate the newly created rollback segments.

Datastage Coding Checklist

Ensure that the null handling properties are taken care for all the nullable fields. Do not set the null field value to some value which may be present in the source.
Ensure that all the character fields are trimmed before any processing. Normally extra spaces in the data may lead to some errors like lookup mismatch which are hard to detect.
Always save the metadata (for source, target or lookup definitions) in the repository to ensure re usability and consistency.

How to setup environment variables in DataStage

In DataStage you can set environment variables using three different methods. The right method depends on how often you need to change the value of an environment variable and how specific this value is for a particular project or job. Here is a description of these methods:

Error : Datastage Job Aborts with "The record is too big to fit in a block"

To fix this error you need to increase the block size to accommodate the record size:

1. Log into Designer and open the job.

2. Open the job properties--> parameters-->add environment variable and select:
APT_DEFAULT_TRANSPORT_BLOCK_SIZE

ETL Job Design Standards - 2

Part 1 --> ETL Job Design Standards - 1

Parameter Management Standards

This section defines standards to manage job parameters across environments. Jobs should use parameters liberally to avoid hard coding as much as possible. Some categories of parameters include:

Environmental parameters, such as directory names, file names, etc.
Database connection parameters
Notification email addresses
Processing options, such as degree of parallelism

ETL Job Design Standards - 1

When using an off-the-shelf ETL tool, principles for software development do not change: we want our code to be reusable, robust, flexible, and manageable. To assist in the development, a set of best practices should be created for the implementation to follow. Failure to implement these practices usually result in problems further down the track, such as a higher cost of future development, increased time spent on administration tasks, and problems with reliability.

Although these standards are listed as taking place in ETL Physical Design, it is ideal that they be done before the prototype if possible. Once they are established once, they should be able to be re-used for future increments and only need to be reviewed.

Listed below are some standard best practice categories that should be identified on a typical project.

Create a unique counter in datastage

This entry describes various ways of creating a unique counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique IDs, however it is limited in that it does not support conditional code and it may be more efficient to add a counter to an existing transformer rather than add a new stage.

In a server job there are a set of key increment routines installed in the routine SDK samples that offer a more complex counter that remembers values between job executions.
The following section outlines a transformer only technique.

Interview Questions : DataStage - Part 2

For more : Visit HERE

What is the importance of Surrogate Key in Data warehousing?

Ans : Surrogate Key is a Primary Key for a Dimension table. Most importance of using it is independent of underlying database. i.e Surrogate Key is not affected by the changes going on with a database

What does a Config File in parallel extender consist of?

Ans: Config file consists of the following.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location.

Subscribe to: Posts ( Atom )

Nuts & Bolts of DataStage

Monday, June 23, 2014

Purpose of Parameters in ETL

Wednesday, April 30, 2014

Performance Tunings in DataStage

Thursday, March 06, 2014

Oracle Interview Questions - Part-4

Monday, February 17, 2014

Datastage Coding Checklist

Wednesday, December 04, 2013

How to setup environment variables in DataStage

Tuesday, December 03, 2013

Error : Datastage Job Aborts with "The record is too big to fit in a block"

Friday, November 22, 2013

ETL Job Design Standards - 2

Tuesday, November 12, 2013

ETL Job Design Standards - 1

Monday, October 07, 2013

Create a unique counter in datastage

Thursday, September 26, 2013

Interview Questions : DataStage - Part 2