We have moved to www.dataGenX.net, Keep Learning with us.
Showing posts with label develop. Show all posts

Monday, July 13, 2015

Types of Jobs in DataStage


Server jobs


  • Executed by the DataStage server engine
  • Compiled into BASIC
  • Runtime monitoring in DataStage Director
  • No parallel capability

Wednesday, June 24, 2015

What Happens When You Import a Sequential File with an Extra Column


A job reads a sequential file and loads it into an RDBMS table.

What happens if the sequential file has an extra column at the end of each record?

It depends on the data type of the last column defined in DataStage and in the RDBMS table.
  • If the data type of the last column is varchar, the extra data will be included in the last column.


Monday, June 01, 2015

Trigger types in Sequencer Job



While designing Sequencer jobs, each activity stage we use (Job Activity, Command Activity, or others) has a 'Triggers' tab, which determines the actions that occur when the activity runs. Each activity can output different trigger types.

Three types of triggers are available, with some types having subtypes.
Conditional
A conditional trigger runs the target activity if the source activity fulfills the specified condition. The condition is defined by an expression, and can be one of the following types:

Tuesday, February 17, 2015

DataStage UNIX Commands


I've been asked many times for the Unix/Linux commands we usually use with DataStage. Here, I have tried to compile them together; please let me know if I missed anything. Njoy ....... :)

• Check the permissions
1. Log on to the server as dsadm or root
2. cd $DSHOME/bin
3. ls -l | grep rws

The output should show six files owned by root with the setuid bit set, especially the uv file.
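As a self-contained sketch of the same check (using a throwaway directory instead of $DSHOME/bin, so it can run anywhere):

```shell
#!/bin/sh
# Recreate the check in a temporary directory; in practice you would
# run 'ls -l | grep rws' inside $DSHOME/bin instead.
dir=$(mktemp -d)
touch "$dir/uv"
chmod 4755 "$dir/uv"            # setuid bit shows as 'rws' in ls -l
ls -l "$dir" | grep -c 'rws'    # count of setuid files (here: 1)
rm -rf "$dir"
```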

Wednesday, February 04, 2015

Modify Stage - TRIM function


Trimming spaces or a character from a string can be done in the Modify stage. Below, I have shared the function syntax and some examples, go play ;-)

string_trim[character,direction,justify](string) 
or
string_trim[character,direction](string)
or
string_trim[character](string) 
or
string_trim(string) 
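The function itself runs only inside the Modify stage, but for quick experiments the same trims can be mimicked in shell with sed (the begin/end direction keywords are assumed from the syntax above):

```shell
# string_trim['0'](col)        -- trim trailing '0' characters (default: end)
echo '1230000' | sed 's/0*$//'       # prints 123
# string_trim['0', begin](col) -- trim leading '0' characters
echo '0001230' | sed 's/^0*//'       # prints 1230
# string_trim(col)             -- trim trailing spaces (default character)
echo 'abc   ' | sed 's/ *$//'        # prints abc
```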

Tuesday, February 03, 2015

Modify Stage - Handling NULLs

To handle NULL values in columns, we can use Modify stage functions in DataStage instead of a Transformer. Personally, I always prefer a Transformer, to avoid the unnecessary effort we put into setting up a Modify stage. But sometimes, to tune job performance, we have to do it ;-)

Below is the function specification we have to use to handle NULLs --
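The general idea, substituting a default value when a NULL comes through, can be sketched outside DataStage with awk (an illustration of the concept only, not the Modify syntax):

```shell
# Treat an empty second field as NULL and substitute the default 'NA'
printf 'a,1\nb,\nc,3\n' | awk -F, -v OFS=, '{ if ($2 == "") $2 = "NA"; print }'
```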

Monday, January 19, 2015

Some Musts To Avoid When Developing Your Data Warehouse Project


Here’s a list of things to avoid when developing a data warehouse. These are not necessarily in priority order. The priorities depend on your project.

1. Avoid technology for technology’s sake. Focus on business requirements and goals.

2. Avoid going without a high-ranking business sponsor. If you lose your current sponsor, immediately find a replacement in the business sector of your organization.

3. Avoid trying to implement the entire data warehouse all at once. Approach the data warehouse project as a series of integrated sub-projects (data marts), and deliver each sub-project as it’s completed.

4. Avoid expending excess energy and budget on structuring the data in the warehouse. Do not over-normalize (starflake schema). Focus on delivering the best query performance you can, along with a quality, easy-to-use set of user interfaces.

Wednesday, November 05, 2014

Read Sequential File with SCHEMA file


In DataStage, schemas are an alternative way to specify column definitions for the data used by parallel jobs. You do not need to define any columns or data format to read a file. We use this method when we need a generic job, for example one that reads multiple files with different metadata.

Here, I am going to show you how to read a sequential file with the help of a schema file.

A. Design

I am using the design below to demonstrate this functionality: I am reading the data and dumping it to another file with a different format.

Monday, October 20, 2014

Create Fixed Width Data in DataStage


In the day-to-day life of DataStage, you face a lot of business and technical scenarios. Today, we are going to discuss how to generate fixed-width data from variable-length data in DataStage.


Input :
23
3345
546576
535
80
2342


Output :
0000023
0003345
0546576
0000535
0000080
0002342
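Before building the job, the expected output can be sanity-checked in shell; printf's zero-padding does the same left-fill to a width of 7:

```shell
# Left-pad each input value with zeros to a fixed width of 7
for n in 23 3345 546576 535 80 2342; do
    printf '%07d\n' "$n"
done
```

(In the job itself this kind of padding is typically done in a Transformer by prefixing zeros, or via the column's field-width and pad-char format properties.)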

Friday, September 26, 2014

DataStage Warning - Agg,1: Hash table has grown to 16384 entries


Sometimes when we run a DataStage job that uses an Aggregator stage, we get the warning below -

Agg,1: Hash table has grown to 16384 entries.
Agg,0: Hash table has grown to 16384 entries. 


Solution :


Wednesday, September 17, 2014

Read Fixed Width Flat File with Sequential File Stage


If we have a fixed-width flat file, we can read it with the Sequential File stage alone; there is no need to use the Import stage. To read a fixed-width flat file, you should know the format of the data: how it splits into two or more output columns.

Here, I demonstrate how to read a fixed-width flat file with the Sequential File stage.

A. Input File :




Below is the input file which we are going to use in our job. This file contains a column RollNo, which is a concatenation of 4 fixed-width columns that we have to split and read.
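The split itself happens in the stage's column definitions, but the idea can be previewed in shell with awk's substr; the value and the widths 4, 2, 2, 2 below are hypothetical, the real layout depends on your file:

```shell
# Split a 10-character fixed-width RollNo into 4 columns
# (hypothetical widths 4+2+2+2; adjust the substr offsets to your layout)
echo '2014CS0042' | awk '{ print substr($0,1,4), substr($0,5,2), substr($0,7,2), substr($0,9,2) }'
# prints: 2014 CS 00 42
```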

Monday, September 08, 2014

Peek Stage in DataStage - 3


In today's post, we will discuss an important property of the Peek stage. What if you want to peek at some data and also take action on, or do something with, the data after peeking? DataStage gives you an option to do this.

1. Job Design :

For this demo, we will use the job design below. As you can see, there are 2 links coming out of the Peek stage.
First link  --> contains the peeked data, as per the properties defined in the Peek stage (here you can see 40 rows: 10 rows from each partition x 4 nodes).
Second link --> contains the whole stream data (all 100 rows).

Friday, September 05, 2014

Peek Stage in DataStage - 2


Let's continue our journey with the Peek stage. In today's post, I am going to share one more option: sending the peeked data to an output file.

1. Job design :  

For this demo job, we will use the job design below. You can see we have attached a Sequential File stage after the Peek stage, which will capture the peeked data.



Tuesday, September 02, 2014

Peek Stage in DataStage - 1


The Peek stage is a Development/Debug stage. It can have a single input link and any number of output links.
The Peek stage lets you print record column values either to the job log or to a separate output link as the stage copies records from its input data set to one or more output data sets.


1. Design :

We are using the design below to demonstrate the functionality of the Peek stage in DataStage. The job has 3 stages: a Row Generator as the data source, generating dummy data for the job (we could use any data source here, whether a database, a dataset, or flat files); a Transformer, which simply creates negative values from the source data; and a Peek stage for peeking :-)

Saturday, August 09, 2014

DataStage Scenario - Design10 - job1


DataStage Scenario Problem -->  DataStage Scenario - Problem10 

Solution Design :

a) Job Design :  


   Below is the design that achieves the output we need. In this design, we read the data from a flat file, generate a dummy column in a Column Generator stage, aggregate on that column, and send the output to a sequential file.




Friday, July 18, 2014

Some DataStage Job design

Updating the Master Reports List (MRL) in Datastage

Create a job that has a Row Generator stage linked to an ODBC or DB2 stage. Change the Row Generator stage's properties to output zero records. Write the SQL that updates the MRL table as the 'After SQL' of the output ODBC or DB2 stage.

Thursday, July 17, 2014

DataStage Scenario - Problem20


Goal : Reverse the Column

Input is like this :

i/p
col1,col2
101,a
102,b
103,c
104,d

Tuesday, July 15, 2014

Handling Filename with Spaces in Linux


Spaces in Unix/Linux filenames are not a good habit, as the space is usually a delimiter between field values rather than part of a filename. In Linux, the space is part of the IFS (Internal Field Separator), which is used for word splitting after expansion and to split lines into words with the read builtin command. The default value of IFS is space, tab, newline.

     This is why, whenever we do something with a filename that has spaces in Linux, we get an error like:
: No such file or directory

   What if you get some files from a "Unix dumb" server source? How will you handle the filenames with spaces? Here are some simple ways to deal with them.
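The simplest of those ways is quoting or escaping the space, shown here with a made-up file name:

```shell
# 'my report.txt' is a hypothetical filename containing a space
touch 'my report.txt'
ls 'my report.txt'       # quoting keeps the name as one word
ls my\ report.txt        # escaping the space also works
# ls my report.txt       # WRONG: the shell sees two separate names
rm 'my report.txt'
```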


Wednesday, July 09, 2014

Column Import Stage - Read a Fixed width File


The Column Import stage is a restructure stage. It can have a single input link, a single output link, and a single rejects link. It is typically used to divide data arriving in a single column into multiple columns.
    Here, I try to explain how to use it for importing data...

a) Design :

 As you can see, we have followed this design to demonstrate the stage: we read a file, import the columns with the Column Import stage, and then load them into a Sequential File stage.