Something about DataStage, DataStage Administration, Job Designing,Developing, DataStage troubleshooting, DataStage Installation & Configuration, ETL, DataWareHousing, DB2, Teradata, Oracle and Scripting.
Showing posts with label parameter. Show all posts
Showing posts with label parameter. Show all posts
Monday, June 23, 2014
Purpose of Parameters in ETL
In an ETL process, a parameter represents an attribute that is not hard coded or sourced from the transactional system. Parameters provide flexibility so that you can adapt ETL processes to fit your business requirements.
or in Simple Words, A job parameter is a way to change a property within a job without having to alter and recompile it. These are input values to job/process which can be changed time to time as per need.
Wednesday, April 30, 2014
Performance Tunings in DataStage
JOB LEVEL
- Parametrize all the inputs needed for the jobs; avoid hard coding of parameters like username, password, directory paths etc.
- By using the environment variable ‘APT_CONFIG_FILE’ as a job parameter, user can dynamically change the number of nodes being used for processing a particular job.
- For reading/writing data from large tables/files, make use of the environment variable ‘APT_BUFFER_MAXIMUM_MEMORY’. It can be used to change the memory buffer size being used for each stage.
- It is recommended to set the environment variable $APT_DUMP_SCORE to a value of 1. When this environment variable is set, an entry is placed in the WebSphere DataStage job log showing the actual runtime structure (processes, their associated internal operators, datasets, nodes, etc) used to execute the job flow.
Thursday, March 06, 2014
Oracle Interview Questions - Part-4
76. What is the OPTIMAL parameter?
It is used to set the optimal length of a rollback segment.
77. What is the functionality of SYSTEM table space?
To manage the database level transactions such as modifications of the data dictionary table that record information about the free space usage.
78. How will you create multiple rollback segments in a database?
- Create a database, which implicitly creates a SYSTEM rollback segment in a SYSTEM tablespace.
- Create a second rollback segment name R0 in the SYSTEM tablespace.
- Make new rollback segment available (after shutdown, modify init.ora file and start database)
- Create other tablespaces (RBS) for rollback segments.
- Deactivate rollback segment R0 and activate the newly created rollback segments.
Monday, February 17, 2014
Datastage Coding Checklist
- Ensure that the null handling properties are taken care for all the nullable fields. Do not set the null field value to some value which may be present in the source.
- Ensure that all the character fields are trimmed before any processing. Normally extra spaces in the data may lead to some errors like lookup mismatch which are hard to detect.
- Always save the metadata (for source, target or lookup definitions) in the repository to ensure re usability and consistency.
Wednesday, December 04, 2013
How to setup environment variables in DataStage
In DataStage you can set environment variables
using three different methods. The right method depends on how often you
need to change the value of an environment variable and how specific
this value is for a particular project or job. Here is a description of
these methods:
Tuesday, December 03, 2013
Error : Datastage Job Aborts with "The record is too big to fit in a block"
To fix this error you need to increase the block size to accommodate the record size:
1. Log into Designer and open the job.
2. Open the job properties--> parameters-->add environment variable and select:
APT_DEFAULT_TRANSPORT_BLOCK_SIZE
Friday, November 22, 2013
ETL Job Design Standards - 2
This section defines standards to manage job parameters
across environments. Jobs should use parameters liberally to avoid hard coding
as much as possible. Some categories of parameters include:
- Environmental parameters, such as directory names, file names, etc.
- Database connection parameters
- Notification email addresses
- Processing options, such as degree of parallelism
Labels:
Administration
,
Data
,
DataStage
,
design
,
ETL
,
exception
,
handling
,
Job
,
load
,
notification
,
Optimizing
,
parameter
,
performance
,
report
,
source
,
Standards
,
statistics
,
tips
Tuesday, November 12, 2013
ETL Job Design Standards - 1
When using an off-the-shelf ETL tool, principles for
software development do not change: we want our code to be reusable, robust,
flexible, and manageable. To assist in the development, a set of best practices
should be created for the implementation to follow. Failure to implement these
practices usually result in problems further down the track, such as a higher
cost of future development, increased time spent on administration tasks, and
problems with reliability.
Although these standards are listed as taking place in ETL
Physical Design, it is ideal that they be done before the prototype if
possible. Once they are established once, they should be able to be re-used for
future increments and only need to be reviewed.
Listed below are some standard best practice categories that
should be identified on a typical project.
Labels:
database
,
DataStage
,
design
,
environment
,
Errors
,
ETL
,
handling
,
Job
,
Link
,
managers
,
Metadata
,
names
,
notification
,
Optimizing
,
parameter
,
process
,
reusability
,
stages
Monday, October 07, 2013
Create a unique counter in datastage
This entry describes various ways of creating a unique counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique IDs, however it is limited in that it does not support conditional code and it may be more efficient to add a counter to an existing transformer rather than add a new stage.
In a server job there are a set of key increment routines installed in the routine SDK samples that offer a more complex counter that remembers values between job executions.
The following section outlines a transformer only technique.
Thursday, September 26, 2013
Interview Questions : DataStage - Part 2
For more : Visit HERE
What is the importance of Surrogate
Key in Data warehousing?
Ans : Surrogate Key is a Primary Key for a Dimension table. Most
importance of using it is independent of underlying database. i.e Surrogate Key
is not affected by the changes going on with a database
What does a Config File in parallel
extender consist of?
Ans: Config file consists of the following.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location.
Subscribe to:
Posts
(
Atom
)