We have moved to www.dataGenX.net, Keep Learning with us.

Thursday, October 01, 2015

My Blog Report - till now

First of all, I would like to say thank you all for loving this blog so much. As you are aware we have moved to "www.DataGenX.net" now. Keep Learning, Keep Sharing n Keep loving us :-)

Today I am sharing Blog Report for "Nuts & Bolts of DataStage".

Like the Facebook Page & join Group

For WHATSAPP group , drop a msg to 91-88-00-906098

Tuesday, August 04, 2015

Regular Expressions in Linux

What Are Regular Expressions?

A regular expression is a pattern template you define that a Linux utility Uses to filter text. A Linux utility (such as the sed editor or the awk program) matches the regular expression pattern against data as that data flows Into the utility. If the data matches the pattern, it's accepted for processing.

              If the data doesn't match the pattern, it's rejected. The regular expression pattern makes use of wildcard characters to represent one or more characters in the data stream.

Types of regular expressions:
There are two popular regular expression engines:

The POSIX Basic Regular Expression (BRE) engine
The POSIX Extended Regular Expression (ERE) engine

A.     Defining BRE Patterns:
The most basic BRE pattern is matching text characters in a data stream.

Wednesday, July 15, 2015

Error while creating new jobs in DataStage

If you face the error error similar to below can occur when trying to save a newly created job:

Error On CREATE.FILE command
Creating file "RT_CONFIG4817 as type 30, mkdbfile: connot create file
RT_CONFIG4817. Unable to create operating system file "RT_CONFIG4817"

Monday, July 13, 2015

Types of Jobs in DataStage

Server jobs

  • Executed by the DataStage server engine
  • Compiled into Basic
  • Runtime monitoring in DataStage Director
  • No parallel capability

Wednesday, July 08, 2015

What is EPOCH ?

EPOCH is time in seconds since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970 ( 1970/01/01 00:00:00 UTC). Or the epoch is Unix time 0 (midnight 1/1/1970), but 'epoch' is often used as a synonym for 'Unix time'. EPOCH time is widely used in every field of IT sector (specially in unix like OS ) where we have to keep difference is generation of something based on time.

Friday, July 03, 2015

Get list of activated tools in Datastage Installation

If we want to check what features have been installed with datastage, we can check it by using LicensingServiceAdmin  tool. This will list down all the tool packs or features including Qualitystage as it comes as a part of datastage.

Tool Location :  {IS_INSTALL_DIR}/ASBServer/bin

Tuesday, June 30, 2015

Maximum number of DataStage Jobs in a project

Filesystems can have a hard limit on number of subdirectories that can be contained in a directory, for example 32766 (32K -2) for Linux

For every job, DataStage will create hash files as subdirectories in the project directory. Here is a list of the subdirectories created for every job. In this list "nn" represents the number (or id) of the job:

RT_SCnn (only for parallel jobs)

Monday, June 29, 2015

Get nth Highest or Lowest data from Table

Here, we are considering a table student and we need to find out Nth highest and lowest marks of a student.

Nth Highest

select * from student s1 where N-1 = ( select count(distinct(marks)) from student s2 where s1.marks > s2.marks)

Saturday, June 27, 2015

SQL Scenario - 1

You have a table like below :

Name Age Gender
Atul         11 F
Ram 10 F
Disha  6 M
Pavan  8 F
Mani         10 M
Lovely 10 M
Rani    9 M

Unlocking XMETA locks from jobs

How can I unlock an XMETA locked job?

The job cannot be exported and when attempting to open the job, the following error appears:

Unable to open the Job
The Job 'TestJob' is locked by user 'dsadm'
(SessionID '8F1B0326-4F7E-45A8-B8EC-EAA8C4780528')

Do you want to continue and open a read-only view of the Job?
This issue arises when a lock still remains in XMETA for the corresponding job. Information Server will lock the job to ensure that there is no indeterminancy regarding the editing and saving of a job.
The most common causes of this is when a user has closed a client window unexpectedly (e.g. Force Terminate).

Wednesday, June 24, 2015

What happen when Import Sequential File with Extra Column

Job reads sequential file and load into RDBMS table.

What happen if sequential file has extra column in the back of each record?

It depends on data type of last column defined by DS and RDBMS table.
  • If data type of last column = varchar, extra column will be included as last column.

Sunday, June 21, 2015

DataStage Installation Checklist

  • Review the system requirements.
  • learn about tiers and components.
  • Design the topology of your installation.
  • Determine whether to pre-install your database system.
  • Determine whether to pre-install IBM WebSphere Application Server or pre-configure an existing installation of WebSphere Application Server.

Tuesday, June 16, 2015

Thursday, June 04, 2015

Error - mongod shell doesn't start, /data/db doesn't exsist

Error - mongod shell doesn't start, /data/db doesn't exsist

If you face this error while starting MongoDB (mongod), most probably there are 2 reasons for this -

a.  /data/db is not exist at all
b. mongod process don't have access on /data/db

Solution :-

The default db path for mongodb is /data/db/.
So when you run:

Wednesday, June 03, 2015

MongoDB Configuration in Linux

DOWNLOAD the stable version of MongoDB. It will a tar file.

1. Create a folder named 'learn' ( or whatever you want to give )
     $ mkdir -p /learn/mongodb  /app/dbMongo
      # /learn/mongodb  = Holds the MongoDB source
      # /app/dbMongo = Holds the MongoDB database

2. Extract the Mongo DB zip file in /learn folder
  $ tar -xvf MongoDB.tar  /learn/mongodb

3.  change the permission of folder to user who run the db here-  In my case User - atul and Group - atul
  $ chown -R atul:atul  /learn/mongodb  /app/dbMongo

Monday, June 01, 2015

Trigger types in Sequencer Job

While designing Sequencer jobs when we are using any Sequencer Stage ( Job activity, Command activity or others ), it has a tab named 'trigger' which actually determine the actions that occur when the activity runs. Each activity can output different trigger types. 

Three types of triggers are available, with some types having subtypes.
A conditional trigger runs the target activity if the source activity fulfills the specified condition. The condition is defined by an expression, and can be one of the following types:

Saturday, May 30, 2015

Data Warehouse Load Methods

A data warehouse is a warehouse that contains data. Sounds funny doesn't it? Imagine a warehouse as a storage structure that may be physical or logical. In the DW it's both...logical as in the data model representation and physical as to the hard disks and other computer equipment that support the data warehouse.

The first question should be "How does the data get there?" Here are three key areas to keep in mind....

Availability - will someone create a report with transactions completed 20 minutes ago or will it be with yesterday's data?

Wednesday, May 27, 2015

Data sources in DataStage

IIS Datastage connectivity options give us a wide scope to connect with different source or targets. It's support RDBMS, ERP, z/OS DB, OLAP system and many more.

Below listed Data Sources are available in IIS v11.3

Monday, May 25, 2015

Execution Steps in Transformer Stage - Explanation

 You can access Part1 Here -  Execution Steps in Transformer Stage
Certain constructs are inefficient if they are included in output column derivations, because they are evaluated once for every output column that uses them. The following examples describe these constructs:
The same part of an expression is used in multiple column derivations.
For example, if you want to use the same substring of an input column in multiple columns in output links, you might use the following test in a number of output columns derivations:
IF (DSLINK1.col1[1,3] = "001") THEN ...
In this case, the evaluation of the substring of DSLINK1.col1[1,3] is repeated for each column that uses it. The evaluation can be made more efficient by moving the substring calculation into a stage variable. The substring is then evaluated once for every input row. This example has thus stage variable definition for StageVar1:

Thursday, May 21, 2015

Execution Steps in Transformer Stage

I've been asked this questions so many times in interviews and by different practitioner also that What are the data processing steps when datastage is processing transformer, So here I tried to compiled. Have a look -

To write efficient Transformer stage derivations, it helps to understand what items get evaluated and when.

Wednesday, May 20, 2015

Disable auto insertion of Partition and Sort

Partitioner insertion and sort insertion each make writing a flow easier by alleviating the need for a user to think about either partitioning or sorting data. By examining the requirements of operators in the flow, the parallel engine can insert partitioners, collectors and sorts as necessary within a data flow.

However, there are some situations where these features can be a avoided or not needed.
If data is pre-partitioned and pre-sorted, and the InfoSphere DataStage job is unaware of this, you could disable automatic partitioning and sorting for the whole job by setting the following environment variables while the job runs:

Tuesday, May 19, 2015

SQL Assignment08 - Delete Statement

1. Delete from customers.
2. Delete depositor of branches having number of customers between 1 and 3.
3. Delete branches havinng average deposit less than 5000.
4. Delete branches having maximum loan more than 5000.
5. Delete branches having deposit from Nagpur.

Thursday, May 14, 2015

SQL Assignment07 - Update Statement

1. Give 10% interest to all depositors.
2. give 10% interest to all depositors having branch VRCE
3. Give 10% interest to all depositors living in city Nagpur.
4. Give 10% interest to all depositors having branch in city BOMBAY and living in Nagpur.
5. Add 100 Rs to deposit of Anil and assign it to Sunil.
6. Change the deposit of VRCE branch to 1000 and change the branch as VRCE_UPT
7. Assign to the deposit of Anil the maximum deposit from VRCE branch

Monday, May 11, 2015

SQL Assignment06 - Group By or Having Clause

1. List the branches having sum of deposit more than 5000.
2. List the branches having a sum of deposit more than 5000 and located in city BOMBAY.
3. List the name of Customers having deposit in the branches where the average deposit is more than 5000.
4. List the names of customers having maximum deposit.
5. List the names of customers having maximum deposit in the table CUSTOMERS living in Nagpur.
6. List the name of branch having highest number of depositors.
7. Count the number of depositors living in Nagpur.

Thursday, May 07, 2015

SQL Assignment05 - Aggregate Functions

1. List total loan.
2. List total deposit.
3. List total loan taken from KAROLBAGH branch.
4. List total deposit of customers having account date later than 1-Jan-96.
5. List total deposit of customers living in city Nagpur.
6. List maximum deposit of customers living in BOMBAY.
7. List total deposit of customers having branch city DELHI

Tuesday, May 05, 2015

SQL Assignment04 - Operations

1. List all the customers who are depositors but not borrowers.
2. List all the customers who are both depositors and borrowers.
3. List all the customers, their amount, who are either depositors or borrowers and living in city Nagpur.
4. List all the depositors having deposit in all the branches where Sunil is having account.
5. List all the customers living in city Nagpur and having branch city BOMBAY or DELHI.

Monday, May 04, 2015

SQL Assignment03 - Join or Cartesian Product

Some more queries on SQL to work on --

1. Give names of customers having living city BOMBAY and branch city Nagpur.
2. Give Names of Customers having the same living city as their branch city.
3. Give names of customers who are borrowers as well as depositors and having living city as Nagpur.
4. Give names of customers who are depositors and having the same branch city as that of Sunil.
5. Give names of depositors having the same living city as that if Anil and having deposit amount greater than 2000.

Thursday, April 30, 2015

SQL Assignment02 - Select Statement

Now, As we created some tables with data, lets do some practice with basic SQLs, Try to write queries for below statements -

-- List all data from table DEPOSIT
-- List all data from table BORROW
-- List all data from table CUSTOMERS
-- List all data from table BRANCH

Sunday, April 26, 2015

SQL Assignment01 - Creating tables

Hello Guys, As we all know that SQL is "must" known for getting into DWH area, So considering this what I am trying to do here is sharing some basic SQL assignments to complete. Start with me if wanna to test your SQL knowledge :-)

First of all, we will create some tables ( Script may need to change as per DB you are using ) - 

Thursday, April 23, 2015

Script to UnLock DataStage Jobs

This script will help you to unlock the all jobs, if you are a DS admin, or your datastage jobs , if a DS developer, without doing human error while executing command on Datastage server UV shell or Datastage Administrator Client.

Script is quite simple, Execute it , it will ask for Project Name and Job Name and Done if you have appropriate access on Datastage server.

May be need a modification as per your Datastage platform or if you need some fancy stuff to add on.  Happy Scripting !!

Tuesday, April 07, 2015

Get Datastage Job Lock List by Script

** This script will fetch all locks currently on Datastage Server irrespective of Projects.
** We can modify this script to correct the data display format as per our need
** Can modify sort command as it changes its behavior in different Linux flavour

Monday, April 06, 2015

DataStageBackup Dos Script

Guys, sorry for being away for so many days but I am back :-)

Today, I am going to share a batch script , written by KIM DUKE, hats off to him :-). Originally this script is designed for Datastage 7.5 but we can easily re-write for current versions. Please play with it and share your script if possible :-)

Monday, March 23, 2015

MongoDB is unable to start due to lock file

While starting MongoDB server, You are facing that MongoDB is not getting up with a exception says -

Sun Jan 18 12:27:03.884 [initandlisten] exception in initAndListen: 10310 Unable to lock file: /data/db/mongod.lock. Is a mongod instance already running?, terminating
Sun Jan 18 12:27:03.884 dbexit:

Normally, we are facing this issue due to 1. improper MongoDB server shutdown or 2. someone else is started the mongo server. In the second case, we dont have any issue, we can connect to out db without issue.

Tuesday, February 17, 2015

DataStage UNIX Commands

I've been asked so many times for Unix/Linux command which we usually used in DataStage. Here, I have tried to compile them together, please let me know if I missed something here. Njoy ....... :)

•Check the permissions
1.Log on to server as dsadm or root
2.cd $DSHOME/bin
3.ls -l | grep rws

The output should be (6) files owned by root with the setuid bit set -- especially the uv file.

Wednesday, February 11, 2015

"Current Informix Installation Detected" message during installation

I am sharing this IBM tip on Cognos here again as it helped my from a big trouble :-) , Thought for a worth sharing ......


1. A separate, standalone version of Informix is installed on the host (or leftovers are found in the Windows registry)
2. IBM Cognos Express has been installed and reinstalled and the Informix registry entries were not removed properly.

3. The Windows user "informix" is present on the host because Informix had been installed on the system in the past.

Friday, February 06, 2015

Order of temporary disk space uses by Sort stage

The Sort stage uses temporary disk space when performing a sort. It looks in the following locations, in the
following order, for this temporary space.

1. Scratch disks in the disk pool sort (you can create these pools in the configuration file).
2. Scratch disks in the default disk pool (scratch disks are included here by default).

Wednesday, February 04, 2015

Modify Stage - TRIM function

For trimming spaces or a char from a string can be done in modify stage. Below, I have shared the function syntax and some example, go Play ;-)


Tuesday, February 03, 2015

Modify Stage - Handling NULLs

For handling the NULL value in columns we can use modify functions in DataStage instead of Transformer. Rather, I'll always prefer a transformer to avoid unnecessary effort which we put to setting up modify stage. But sometime to tune the job over performance we have to do that ;-)

The below Function Specification we have to use to Handle NULL --

Monday, February 02, 2015

Encrypt Password for DataStage Command Line

As datastage user,  we need lots of scripts help our daily  jobs,  most of these scripts need authenticated by datastage engine,  that means we need to pass our Datastage user password in script or command line. 
                     But there is a problem if we are using "Password" here, It can be view by checking the content of script or command history. To secure password, there is a way provided by datastage and that is ENCRYPT the password before using. Let's see how we can do that ---

Friday, January 30, 2015

Get nth character in a String in Linux

This is a  very popular Unix question, How to get 3rd char from a 3rd column of a tab delimited file ? We are usually know how to get the nth column by using "CUT" or "AWK" command. But not aware with this "CUT" options which can give the particular column. Lets see how to use this -

Monday, January 19, 2015

Some Musts To Avoid When Developing Your Data Warehouse Project

Here’s a list of things to avoid when developing a data warehouse. These are not necessarily in priority order. The priorities depend on your project.

1. Avoid technology for technology’s sake. Focus on business requirements and goals.

2. Avoid not securing a high-ranking business sponsor. If you lose your current sponsor, immediately find a replacement in the business sector of your organization.

3. Avoid trying to implement the entire data warehouse all at once. Approach the data warehouse project as a series of integrated sub-projects (data marts), and deliver each sub-project as it’s completed.

4. Avoid expending excess energy and budget on structuring the data in the warehouse. Do not over-normalize (starflake schema). Your focus should be the best query performance that you can deliver and delivering a quality, easy-to-use set of user interfaces.

Sunday, January 18, 2015

Monday, January 12, 2015

Glossary of data warehousing terms

A field or column of a dimension (or other) table.

A suite of software tools used primarily by business administrative staff to navigate through the data of the data warehouse. BI tools provide functionality including managed reporting, queyring, data analysis, data visualization, etc.

data cube
A multi-dimensional representation of business data in which the cells of the cube contain data measures (i.e. facts) and the edges of the cube represent the data dimensions.

Although a cube implies only 3 dimensions in geometry, a data cube may represent any number of dimensions.

Tuesday, January 06, 2015

Rescuer 'xxd' command in Linux

"xxd" - a Rescuer , you guys are thinking why I am saying it a Rescuer so this is why so -

In our day to day work in Datastage/Linux, sometimes we have to deal with "Non Printable Character" ( bad characters ;). To confirm these we have many commands but for me this one is more reliable as this is displaying Hexadecimal value for each char from input file.

Every 2 no is representing 1 char from a file, here I have represent a char "a" which hexadecimal value is "61".

Saturday, January 03, 2015

Windows Command Prompt setup for DataStage Command Line

As we know that for using Datastage Command Line in Linux system, we need to setup linux shell session likewise we need to do same for Windows if server is installed on Windows server.

Follow below steps to setup your Datastage command Line ---

Friday, January 02, 2015

DataStage Warning - dsjob Command Returns Status code = 80011

Running dsjob utility with or without -domain option results in an error message: Status Code = 80011

Resolution -

The dsjob utility is unable to authenticate and validate against the DataStage Engine.
Use -domain NONE option.