Something about DataStage, DataStage Administration, Job Designing,Developing, DataStage troubleshooting, DataStage Installation & Configuration, ETL, DataWareHousing, DB2, Teradata, Oracle and Scripting.
Showing posts with label stages. Show all posts
Showing posts with label stages. Show all posts
Wednesday, May 27, 2015
Data sources in DataStage
IIS Datastage connectivity options give us a wide scope to connect with different source or targets. It's support RDBMS, ERP, z/OS DB, OLAP system and many more.
Below listed Data Sources are available in IIS v11.3
Monday, May 25, 2015
Execution Steps in Transformer Stage - Explanation
You can access Part1 Here -
Execution Steps in Transformer Stage
Certain constructs are inefficient if they are included in output column
derivations, because they are evaluated once for every output column
that uses them. The following examples describe these constructs:
- The same part of an expression is used in multiple column derivations.
- For example, if you want to use
the same substring of an input column in multiple columns in output
links, you might use the following test in a number of output columns
derivations:
In this case, the evaluation of the substring of DSLINK1.col1[1,3] is repeated for each column that uses it. The evaluation can be made more efficient by moving the substring calculation into a stage variable. The substring is then evaluated once for every input row. This example has thus stage variable definition for StageVar1:IF (DSLINK1.col1[1,3] = "001") THEN ...
Labels:
Concept
,
condition
,
constraint
,
DataStage
,
derivation
,
execution
,
input
,
Link
,
loop
,
output
,
processing
,
rows
,
stages
,
step
,
transformer
,
Tutorial
,
variables
Thursday, May 21, 2015
Execution Steps in Transformer Stage
I've been asked this questions so many times in interviews and by different practitioner also that What are the data processing steps when datastage is processing transformer, So here I tried to compiled. Have a look -
To write efficient Transformer stage derivations, it helps to understand what items get evaluated and when.
Labels:
Concept
,
condition
,
constraint
,
DataStage
,
derivation
,
execution
,
input
,
Link
,
loop
,
output
,
processing
,
rows
,
stages
,
step
,
transformer
,
Tutorial
,
variables
Friday, February 06, 2015
Order of temporary disk space uses by Sort stage
The Sort stage uses temporary disk space when performing a sort. It looks in the following locations, in the
following order, for this temporary space.
1. Scratch disks in the disk pool sort (you can create these pools in the configuration file).
2. Scratch disks in the default disk pool (scratch disks are included here by default).
Wednesday, February 04, 2015
Modify Stage - TRIM function
For trimming spaces or a char from a string can be done in modify stage. Below, I have shared the function syntax and some example, go Play ;-)
string_trim[character,direction,justify](string)
or
string_trim[character,direction](string)
or
string_trim[character](string)
or
string_trim(string)
Wednesday, September 17, 2014
Read Fixed Width Flat File with Sequential File Stage
If we have a fixed width Flat file, we can read it via Sequential File stage only, no need to use Import Stage. For reading a Fixed Width Flat file, you should know the format of Fixed width data. How are the data splitting into two or more output columns.
Here, I demonstrate How to read a Fixed Width Flat file from Sequential File Stage
A. Input File :
Below is the Input file which we are going to use in our job. This file contains a column RollNo which is concatenation of 4 fixed width columns which we have to split and read.
Monday, September 08, 2014
Peek Stage in DataStage - 3
In today's post, we will discuss a important property of PEEK stage. What if you want to peek some data as well as want to take action or wanna do something with data after peeking. For enabling this, datastage gives you a option to do this.
1. Job Design :
for this demo, we will use below job design to explain, as you can see, There are 2 link coming out from PEEK stage.
First Link --> This link contain the PEEK data as per property defined in peek stage. (here you can see 40 rows ( 10 rows from each partition x 4 nodes )
Second Link --> this link contain the Whole stream data. ( All 100 rows )
Tuesday, September 02, 2014
Peek Stage in DataStage - 1
The Peek stage is a Development/Debug stage. It can have a single input link and any number of output links.
The Peek stage lets you print record column values either to the job log or to a separate output link as the stage copies records from its input data set to one or more output data sets.
1. Design :
We are using below design to demonstrate the functionality of PEEK stage in datastage. Job design is having 3 stages, Row Generator , data source, for generating the dummy data for the job. Here, we can use any data source whether it is DB, dataset or flat files. Transformer, just as a stage, creating negative values of source data. Peek for Peeking :-)
Saturday, August 09, 2014
DataStage Scenario - Design10 - job1
DataStage Scenario Problem --> DataStage Scenario - Problem10
Solution Design :
a) Job Design :
Below is the design which can achieve the output as we needed. In this design, we are reading the data from flat file, generating a dummy column in column generator stage, doing aggregate on that and sending the output to seq file.
Wednesday, July 09, 2014
Column Import Stage - Read a Fixed width File
The Column Export stage is a restructure stage. It can have a single input link, a single output link and a single rejects link. Typically used to divide data arriving in a single column into multiple columns.
Here, I tried to explain How to use it for importing data...
a) Design :
As you can see, Here we have followed this design to elaborate the stage. We are reading a file, Importing Columns with Import Stage and then loading them into Sequential File.
Wednesday, June 04, 2014
Surrogate Key Generator - Generate Surrogate Key for Data
In this post, We will see how to generate surrogate key for data, where we have to use surrogate key stage.
A) Design :
Below design is a demo design of job. Here our data source is a row generator which is generating rows. In real time scenario, Source can be a flat file, DB stages, Passive Stage or can be a Active stage also.
In Row Generator Stage, we are generating a col "Name".
Labels:
DataSet
,
design
,
Designer
,
develop
,
Job
,
keys
,
output
,
properties
,
row generator
,
source
,
stages
,
surrogate
,
surrogate key generator
,
target
Tuesday, June 03, 2014
Surrogate Key Generator - Delete State File
Adding this post as well with Surrogate Key walk through, In this post we will see how to delete a Surrogate Key State file. Its quite simple as creation.
a) Design :
Again, for the deletion of state file, we need only 1 stage in design and that is Surrogate Key Stage.
Wednesday, April 30, 2014
How to Create Custom Stages in Datastage Parallel Jobs ?
In addition to the wide range of parallel stage types available, the Designer allows you to define your own stage types, which you can then use in parallel jobs.
For creating custom stage in datastage which will functions as per our requirement and we will be able to modify that stage as we may want. Here is some information on custom stage functionality in datastage in parallel jobs and how can we create our own wrapped custom stage.
Performance Tunings in DataStage
JOB LEVEL
- Parametrize all the inputs needed for the jobs; avoid hard coding of parameters like username, password, directory paths etc.
- By using the environment variable ‘APT_CONFIG_FILE’ as a job parameter, user can dynamically change the number of nodes being used for processing a particular job.
- For reading/writing data from large tables/files, make use of the environment variable ‘APT_BUFFER_MAXIMUM_MEMORY’. It can be used to change the memory buffer size being used for each stage.
- It is recommended to set the environment variable $APT_DUMP_SCORE to a value of 1. When this environment variable is set, an entry is placed in the WebSphere DataStage job log showing the actual runtime structure (processes, their associated internal operators, datasets, nodes, etc) used to execute the job flow.
Thursday, April 24, 2014
DataStage Scenario - Design5 - job1
DataStage Scenario Problem --> DataStage Scenario - Problem5
Solution Design :
a) Job Design :
Below is the design which can achieve the output as we needed. Here, we are reading seq file as a input, then data is passing through a Sort and Transformer stage to achieve the output.
Tuesday, April 15, 2014
DataStage Scenario - Design4 - job1
DataStage Scenario Problem --> DataStage Scenario - Problem4
Solution Design :
a) Job Design :
Below is the design which can achieve the output as we needed. Here, we are reading seq file as a input, then data is passing through a Sort and Transformer stage to achieve the output.
Wednesday, April 09, 2014
Some more design tips for DataStage Job Development
• Common information like home directory, system date, username, password should be initialized in a global variable and then variable should be referred everywhere.
• Stage Variables allow you to hold data from a previous record when the next record, allowing you to compare between previous and current records. Stage variables also allow you return multiple errors for a record of information. By being able to evaluate all data in a record and not just error on the first exception that is found, the cleanup of data is more efficient and requires less iteration.
Friday, April 04, 2014
DataStage Naming Standard
For maintaining the design and easy understanding of what happening inside the job, we are giving meaningful name to stages, links, jobs and sequencer. So that other developer/designer can have a idea about purpose of job/seq by seeing the design.
Here I have shared some DataStage Naming Standard we usually follow.
Monday, February 17, 2014
Datastage Coding Checklist
- Ensure that the null handling properties are taken care for all the nullable fields. Do not set the null field value to some value which may be present in the source.
- Ensure that all the character fields are trimmed before any processing. Normally extra spaces in the data may lead to some errors like lookup mismatch which are hard to detect.
- Always save the metadata (for source, target or lookup definitions) in the repository to ensure re usability and consistency.
Thursday, February 13, 2014
DataStage Parallel job: Retrieve sql codes on a failed upsert
When an enterprise database stage such as DB2 or Oracle is set to upsert it is possible to create a reject link to trap rows that fail any update or insert statements. By default this reject link holds just the columns written to the stage, they do not show any columns indicating why the row was rejected and often no warnings or error messages appear in the job log.
Subscribe to:
Posts
(
Atom
)


.jpg)
.jpg)