We have moved to www.dataGenX.net, Keep Learning with us.

Monday, December 23, 2013

How to Schedule Jobs and Sequencers via Datastage Scheduler

Datastage Native Job Scheduler

DataStage does not have its own scheduler; its scheduling option leverages the underlying operating system. On UNIX that means cron: DataStage uses cron for recurring schedules and at for one-off schedules, and a check of the crontab entries for the scheduling user will show what you need. On Windows, it uses Windows Scheduled Tasks.

From the operating system command line, logged in as the scheduling user, a "crontab -l" will list the scheduled jobs.
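For example, a recurring schedule ends up as an ordinary crontab entry that invokes the dsjob command line (the path, project and job names below are hypothetical):

```
# List the scheduling user's crontab:
#   crontab -l
# A typical entry for a daily 02:00 run (illustrative; the dsjob path varies by install):
0 2 * * * /opt/IBM/InformationServer/Server/DSEngine/bin/dsjob -run -jobstatus MyProject MyJob
```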

Thursday, December 19, 2013

Hash Files in DataStage

Guys, sharing a nice and detailed write-up on HASH files. It is old, but still worth knowing.

What Exactly Are Hash Files?

DataStage® Server Edition utilizes several different types of hash files. The default and most versatile type is the dynamic (more precisely, type 30) hash file. A hash file can simply be described as a file that distributes data throughout a pre-sized, space-allocated file. Every row must have a key that determines where the row will reside within the file. This key is run through an algorithm (technically a hashing algorithm, hence the name) to determine the row's location in the file. One and only one row can exist at a single location.

Thursday, December 12, 2013

Interview Questions : DataStage - self-3

100    If 1st and 8th record is duplicate then which will be skipped? Can you configure it?
101    How do you import and export datastage jobs? What is the file extension? (See each component while importing and exporting).
102    How do you rate yourself in DataStage?
103    Explain DataStage Architecture?
104    What is repository? What are the repository items?
105    What is difference between routine and transform?
106    When you write the routines?

Wednesday, December 11, 2013

A SQL Client Tool - TeraData Studio Express

Teradata Studio Express is a graphical Java program, developed on the Eclipse Rich Client Platform (RCP), that allows you to view the structure of a JDBC-compliant database, browse the data in tables, issue SQL commands, etc.
This is a nice tool which can connect to many databases, such as Aster Database, DB2 for LUW, DB2 for i5/OS, DB2 for z/OS, Oracle, SQL Server, a generic JDBC connection and, of course, the Teradata DB.

DataStage Scenario - Problem6

Goal : Get the count of Vowels in Columns

Input :

Akash Aggrawal
Priya Awasthi  
Anil chahal    
Diya Singh    
Kashish Patel 
Sunil Verma    
Rashid Patel    
Rashmi Arya   
Gopal Joshi     
Neha Tomar    

Tuesday, December 10, 2013

GRE Word List Flash Card

A flash card is any of a set of cards bearing information, as words, on both sides, used in classroom drills or in private study. Flashcards are widely used as a learning drill to aid memorization. We have created online version of flash cards for students to memorize GRE vocabulary for free. You can find a word in the front and an answer overleaf.
Click on the card below to see the meaning and contextual use of the word.

Monday, December 09, 2013

Friday, December 06, 2013

List of Environment Variables in DataStage

General Job Administration


Wednesday, December 04, 2013

How to setup environment variables in DataStage

In DataStage you can set environment variables using three different methods. The right method depends on how often you need to change the value of an environment variable and how specific this value is for a particular project or job. Here is a description of these methods:

Tuesday, December 03, 2013

Error : Datastage Job Aborts with "The record is too big to fit in a block"

To fix this error you need to increase the block size to accommodate the record size:

1. Log into Designer and open the job.

2. Open the job properties--> parameters-->add environment variable and select: 

Thursday, November 28, 2013

Tail Stage in DataStage

The Tail Stage is another stage from the Development/Debug category. It can have a single input link and a single output link.
The Tail Stage selects the last N records from each partition of an input data set and copies the selected records to an output data set.

a) Job Design :

Wednesday, November 27, 2013

Head Stage in DataStage

Welcome to the Basic Intro with Stages series. We are going to look into the HEAD stage (Development/Debug category). It can have a single input link and a single output link.
The Head Stage selects the first N rows from each partition of an input data set and copies the selected rows to an output data set. You determine which rows are copied by setting properties which allow you to specify:
  • The number of rows to copy
  • The partition from which the rows are copied
  • The location of the rows to copy
  • The number of rows to skip before the copying operation begins.

Friday, November 22, 2013

ETL Job Design Standards - 2

Part 1 --> ETL Job Design Standards - 1

Parameter Management Standards
This section defines standards to manage job parameters across environments. Jobs should use parameters liberally to avoid hard coding as much as possible. Some categories of parameters include: 

  • Environmental parameters, such as directory names, file names, etc.
  • Database connection parameters
  • Notification email addresses
  • Processing options, such as degree of parallelism

Thursday, November 14, 2013

DataStage Server Hang Issues & Resolution

A server hang issue can occur when:

1) The metadata repository database detects a deadlock condition and chooses the failing job as the victim of the deadlock.
2) Log maintenance is ignored.
3) Temp folders are not maintained periodically.

I will try to explain the above three points in detail below:

Wednesday, November 13, 2013

Interview Questions : DataStage - self-2

48    Why can’t we use sequential file as a lookup?
49    What is data warehouse?
50    What is ‘Star-Schema’?
51    What is ‘Snowflake-Schema’?
52    What is difference between Star-Schema and Snowflake-Schema?
53    What is mean by surrogate key?
54    What is ‘Conformed Dimension’?

Tuesday, November 12, 2013

ETL Job Design Standards - 1

When using an off-the-shelf ETL tool, principles for software development do not change: we want our code to be reusable, robust, flexible, and manageable. To assist in development, a set of best practices should be created for the implementation to follow. Failure to implement these practices usually results in problems further down the track, such as a higher cost of future development, increased time spent on administration tasks, and problems with reliability.
Although these standards are listed as taking place in ETL Physical Design, it is ideal that they be established before the prototype if possible. Once established, they should be reusable for future increments and only need to be reviewed.

Listed below are some standard best practice categories that should be identified on a typical project. 

Monday, November 11, 2013

DataStage Scenario - Problem4



Saturday, November 09, 2013

DB2 commands Cheat Sheet

DB2 System Commands

    DB2LEVEL -- checks version of DB2 installed.
    DB2ILIST -- lists all instances installed
    DB2CMD -- opens a command line processor
    DB2CC -- opens db2 control center
    DB2LICM -l -- gets db2 type.

Command Line Processor Commands

Friday, November 08, 2013

Conductor Node in Datastage

Below is the sample APT CONFIG FILE ,see in bold to mention conductor node.

node "node0"
fastname "server1"
pools "conductor"
resource disk "/datastage/Ascential/DataStage/Datasets/node0" {pools "conductor"}
resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node0" {pools ""}

Thursday, November 07, 2013

Why Entire partition is used in LOOKUP stage ?

Entire partitioning places a complete copy of the data set on every node, so when the lookup matches records, all reference data is present on each node.

Wednesday, November 06, 2013

DataStage Scenario - Problem3

Scenario :  Get the next column value in current row

Input file :

Sq, No

Tuesday, November 05, 2013

Dummy Data Generation using Row Generator in DataStage - 2

By default the Row Generator stage runs sequentially, generating data in a single partition. You can, however, configure it to run in parallel and to generate meaningful data.

We are using the same job design as in Dummy Data Generation using Row Generator in DataStage - 1

a) Job Design :

Sunday, November 03, 2013

Wishing You & your Loved ones a Happy Diwali !!!!!

Dummy Data Generation using Row Generator in DataStage - 1

How do you generate input data for your dummy jobs and practice?

In DataStage, there is a stage called "Row Generator" under the "Development/Debug Stages" category. We will use this stage to generate the dummy data.

So, We are going to create a job which generates dummy data.

Saturday, November 02, 2013

Interview Questions : DataStage - self-1

Sharing some collection of InterView Questions. Try these to rank your knowledge :-)

1    Types of Stages in DS? Explain with Examples
2    What are active stages and passive stages?
3    Can you filter data in hashed file? (No)
4    Difference between sequential and hashed file?
5    How do you populate time dimension?
6    Can we use target hashed file as lookup? (Yes)
7    What is Merge Stage?
8    What is Job Sequencer?
9    What are stages in sequences?
10    How do you pass parameters?
11    What parameters you used in your project?

Tuesday, October 29, 2013

CentOS - Hostname Change

Hostname Change

There are 4 steps in a hostname change, luckily all the steps are easy.


Open the /etc/sysconfig/network file with your favorite text editor. Modify the HOSTNAME= value to match your FQDN host name.

Hosts File

Change the host entry that is associated with your server's main IP address; this is for internal networking (found at /etc/hosts):
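A minimal sketch of the edit, working on a copy of the file first (the IP and hostnames below are examples):

```shell
# Work on a copy, not the live file:
cp /etc/hosts ./hosts.new
# Map the server's main IP to the new FQDN and short name:
echo "192.168.1.10  server1.example.com  server1" >> ./hosts.new
# Review the change:
grep "server1.example.com" ./hosts.new
# When satisfied:  sudo cp ./hosts.new /etc/hosts
```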

Monday, October 28, 2013

DataStage Scenario - Problem2

Goal : To get the Unique and Duplicates values from Input Data


Input :

There is an input file which contains duplicate data, for example:

Friday, October 25, 2013

DataStage Scenario - Problem1

Goal :

Process a text file which contains records arranged in blocks consisting of a header record, details (items, body) and a trailer. The aim is to normalize records and load them into a relational database structure.

Details :

Thursday, October 24, 2013

Interview Questions : DataWareHouse - Part 3

 What is data warehousing?
"In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
A data warehouse maintains its functions in three layers: staging, integration, and access.
Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users.
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:

Tuesday, October 22, 2013

Adding background colors or background images to emails and other Notes documents - Part 2

Adding background colors or background images to Lotus Notes mail - Part 1


To add a reusable graphic from an application

  1. While creating or editing a document, click Text > Text Properties.
  2. In the properties box, click the arrow next to Text and select Document.
  3. Click the Background tab.
  4. Enter the name of the graphic in the text field in the Graphic or Resource section, or use the folder button to select a graphic.

Friday, October 18, 2013

WebSphere Application Server[WAS] fails to start in Linux

The application server fails to start after system is restarted. No messages are generated in the application server logs.
The Metadata server startup script fails to finish. You must issue the nohup command for the Metadata server startup script.

Check to ensure that WebSphere Application Server is running.
To connect to the application server using an Internet browser:
  1. Start your Internet browser.
  2. Type the application server address in the form: http://isserver:portnumber, where isserver is the name of the computer where the services tier is installed or its TCP/IP address, and portnumber is the port for connecting to the services tier (by default, 9080).
If the application server has started, the login screen is displayed; otherwise an error message is displayed.

Thursday, October 17, 2013

How to install Linux / UNIX *.tar.gz tarball files

A tar.gz file, also known as a tarball, is an archive format for electronic data and software. Most Linux tarballs contain source code for software. If you are new to Linux, I recommend using the apt-get, rpm and yum commands to install binary packages instead.

Tarballs are a group of files in one file. Tarball files have the extension .tar.gz, .tgz or .tar.bz2. Most open source software uses tarballs to distribute programs and source code.
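The basic create/list/extract cycle looks like this (the archive name is an example):

```shell
# Build a small demo tree and pack it into a tarball:
mkdir -p demo/src && echo 'hello' > demo/src/README
tar -czf software-1.0.tar.gz demo     # -c create, -z gzip, -f file name
tar -tzf software-1.0.tar.gz          # -t list contents without extracting
tar -xzf software-1.0.tar.gz -C /tmp  # -x extract into /tmp
# For .tar.bz2 archives use -j instead of -z:  tar -xjf file.tar.bz2
```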

Tuesday, October 15, 2013

How many places u can call Routines?

Routines can be called at the following places:

a) In the job properties: there is an option to call the before- and after-job subroutines.
b) In a job sequence: there is an activity called "Routine Activity", from which routines can also be called.
c) In the derivation part of the Transformer of a parallel job, "parallel routines" can be called.
d) In the derivation part of the Transformer of a server job, "server routines" can be called.
e) In the server job stages, before- and after-stage subroutines can also be called.
f) Routines can be called from the User Variables activity of a sequence as well. (Here, basically, transforms are called which return a value to the variable of the User Variables activity after their execution.)

Friday, October 11, 2013

Difference between OLTP and OLAP

Online Transactional Processing (OLTP) databases are function-oriented; they are designed to provide real-time responses to concurrent users and applications. To be more specific, OLTP databases must provide real-time, concurrent (multi-threaded) processing of all SQL transactions (writes/updates and reads). Another characteristic of an OLTP database is that its state (the underlying data) is constantly changing. Examples of OLTP are databases that support e-commerce applications.
OLTP databases are highly normalized relational databases, which means there is very little or no data redundancy. This ensures data consistency (part of the ACID standard). Normalization is the process of arranging data into logical, organized groups of tables, reducing data repetition or even eliminating it completely. As a result the data is logically grouped into tables, and these tables form relationships with one another through primary and foreign keys. There are different levels of normalization, and OLTP data models usually meet the 3rd Normal Form, also known as the Entity Attribute Relationship model.

Monday, October 07, 2013

Create a unique counter in datastage

This entry describes various ways of creating a unique counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique IDs, however it is limited in that it does not support conditional code and it may be more efficient to add a counter to an existing transformer rather than add a new stage.

In a server job there are a set of key increment routines installed in the routine SDK samples that offer a more complex counter that remembers values between job executions.
The following section outlines a transformer only technique.
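The transformer-only technique typically combines the parallel system variables @PARTITIONNUM, @INROWNUM and @NUMPARTITIONS so that no two partitions ever produce the same value. A sketch of the arithmetic, simulated in shell (2 partitions of 3 rows each; verify the variable names against your DataStage version):

```shell
# Transformer derivation sketch:
#   counter = @PARTITIONNUM + (@INROWNUM - 1) * @NUMPARTITIONS
# Every id below comes out unique across both partitions.
NUMPARTITIONS=2
for PARTITIONNUM in 0 1; do
  for INROWNUM in 1 2 3; do
    echo "partition=$PARTITIONNUM row=$INROWNUM id=$(( PARTITIONNUM + (INROWNUM - 1) * NUMPARTITIONS ))"
  done
done
```

Partition 0 yields ids 0, 2, 4 and partition 1 yields 1, 3, 5 — interleaved but never colliding.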

Tuesday, October 01, 2013

Some Windows Shortcuts

1. CTRL+C (Copy)
2. CTRL+X (Cut)
3. CTRL+V (Paste)
4. CTRL+Z (Undo)
5. DELETE (Delete)
6. SHIFT+DELETE (Delete the selected item permanently without placing the item in the Recycle Bin)
7. CTRL while dragging an item (Copy the selected item)
8. CTRL+SHIFT while dragging an item (Create a shortcut to the selected item)
9. F2 key (Rename the selected item)

Monday, September 30, 2013

Displaying/Change the attributes of a Queue Manager

Displaying the attributes of a Queue Manager
To display the attributes of a Queue Manager (say QMA), use the DIS QMGR MQSC command:

$ runmqsc QMA
And you will see:
: dis qmgr
            1 : dis qmgr
AMQ8408: Display Queue Manager details.
  QMNAME(QMA)                                                 ACCTCONO(DISABLED)
  ACCTINT(1800)                                    ACCTMQI(OFF)
  ACCTQ(OFF)                                        ACTIVREC(MSG)
  ALTDATE(2007-06-19)                          ALTTIME(20.34.04)

Friday, September 27, 2013

DB2 Fundamentals

DB2 for Linux, UNIX, and Windows (US)

Industry-leading performance for mixed workloads on distributed systems, offering unparalleled efficiencies for staffing and storage.

DB2 for z/OS

The database software gold standard for reliability, availability, and scalability. Optimized for SOA, CRM and data warehousing.

Thursday, September 26, 2013

Interview Questions : DataStage - Part 2

For more : Visit HERE

What is the importance of Surrogate Key in Data warehousing?
Ans: A surrogate key is a primary key for a dimension table. Its main advantage is that it is independent of the underlying database, i.e. a surrogate key is not affected by changes going on in the database.

What does a Config File in parallel extender consist of?
Ans: Config file consists of the following.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location.

Tuesday, September 24, 2013

Interview Questions : DataWareHouse - Part 2

For more : Visit HERE

What is real time data-warehousing?

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.

What are conformed dimensions?
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common to the cubes.

What is conformed fact?
Conformed facts are measures that carry the same name, definition and units wherever they appear, so they can be combined and compared across multiple data marts and fact tables.

Friday, September 20, 2013

Failure to connect to DataStage services tier: invalid port

When you attempt to start one of the DataStage clients, the following message is displayed:
Failed to authenticate
the current user against the selected Domain:  Could not connect to
server [servername] on port [portnumber].

Thursday, September 19, 2013

DataStage compile error 65280; Couldn't change directory to /tmp

Error when trying to compile parallel (PX) jobs in a project.

The compilation errors include the following:

Subprocess command failed with exit status 65280.
Output from subprocess: Couldn't change directory to /tmp : No such file or directory

Tuesday, September 17, 2013

How can I run the osh command of the InfoSphere Parallel Engine?

The osh command is the main program of the InfoSphere Parallel Engine. This command is used by DataStage to perform several different tasks including parallel job execution and dataset management. Normally, there is no need to run this command directly but sometimes it is useful to use it for troubleshooting purposes.
To run this command there are 3 environment variables that must be set. These are:
  1. APT_ORCHHOME should point to Parallel Engine location
  2. APT_CONFIG_FILE should point to a configuration file
  3. LD_LIBRARY_PATH should include the path to the parallel engine libraries. Please note that the name of this environment variable may take a different name (such as LIBPATH in AIX or SLIB_PATH in HP-UX) depending on your Operating System. Note: This variable does not need to be set in Windows environments.
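Putting the three variables together, a minimal setup might look like the following (the install paths are examples; adjust to your engine location):

```shell
# Environment for running osh directly (Linux; use LIBPATH on AIX, SLIB_PATH on HP-UX):
export APT_ORCHHOME=/opt/IBM/InformationServer/Server/PXEngine
export APT_CONFIG_FILE=$APT_ORCHHOME/etc/config.apt
export LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH
# Then osh can be invoked from $APT_ORCHHOME/bin, e.g.:
#   $APT_ORCHHOME/bin/osh -f my_score.osh
```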

Saturday, September 14, 2013

How can I improve performance during an import into Datastage?

One way to improve performance of an import is to exclude the executables.
Here are three different ways to exclude executables from the import. 

  • Exclude the executables from the export.
    In DataStage Designer export using the option export Jobs without executables.

  • Create an xml file instead of a dsx file.
    Executables are not included in xml files.
    (The xml option is available with dscmdexport but only with /JOB option)

Wednesday, September 11, 2013

What can you delete to free up disk space in IBM InfoSphere Information Server

What can you delete to free up disk space in IBM InfoSphere Information Server when disks are becoming full?

What to do ????

Here are some things you can do to clean up space:
  • Clear the &PH& file in the Project directory. There is a &PH& directory in each DataStage project directory, which contains information about active stages that is used for diagnostic purposes. The &PH& directory is added to every time a job is run, and needs periodic cleaning out.
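A cron-able sketch of that clean-out (the project path is a stand-in; only clear &PH& when no jobs are running, and note the quoting — & is special to the shell):

```shell
# Stand-in for e.g. /opt/IBM/InformationServer/Server/Projects/MyProject
PROJ=./demo_project
mkdir -p "$PROJ/&PH&"
# Simulate one stale and one recent runtime file (GNU touch -d):
touch -d "10 days ago" "$PROJ/&PH&/DSD.RUN_stale"
touch "$PROJ/&PH&/DSD.RUN_recent"
# Delete &PH& files older than 7 days:
find "$PROJ/&PH&" -type f -mtime +7 -exec rm -f {} \;
```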

Monday, September 09, 2013

Deleting temporary lookuptable files in IBM InfoSphere DataStage

When a DataStage job with a lookup stage aborts, there may be lookuptable files left in the resource directories and they will consume space. The filenames are similar to "lookuptable.20091210.513biba"

When a job aborts it leaves the temporary files for postmortem review in the resource directories. Usually that is done in scratch, however, for lookup files, they are created in resource. Lookup filesets will not go away, just like regular datasets.

Friday, September 06, 2013

Managing and Deleting Persistent Data Sets within IBM InfoSphere Datastage

Data Sets sometimes take up too much disk space. This technote describes how to obtain information about datasets and how to delete them.


Data sets can be managed using the Data Set Management tool, invoked from the Tools > Data Set Management menu option within DataStage Designer (DataStage Manager in the 7.5 releases.) Alternatively, the 'orchadmin' command line program can be used to perform the same tasks.
The files which store the actual data persist in the locations identified as resource disks in the configuration files. These files are named according to the pattern below:


Wednesday, September 04, 2013

Buffering in DataStage

Buffering is a technique used in DataStage jobs to ensure a constant and uninterrupted flow of data to and from stages, in such a way that there is no potential deadlock or fork-join problem. It has been implemented in DataStage keeping in mind that the data has to keep moving through the process with optimized use of the server's memory. As mentioned by IBM, the ideal scenario is when the data flows through the stages without being written to disk. As with buffering in any system, the upstream operators may have to wait for the downstream operators to consume their input before creating more records; this is the intention in DataStage too.

Tuesday, September 03, 2013

Commands to delete files in &PH& directories in IBM InfoSphere DataStage

How should we manage cleaning up the &PH& directories for my DataStage projects? Can the process be automated?

Files in the &PH& directories under DataStage project directories store runtime information when jobs are run and need to be cleared out periodically.


Steps :

To clear the &PH& directory from within DataStage:

Friday, August 30, 2013

Datastage Job Scheduler

DataStage does not have its own scheduler; its scheduling option leverages the underlying operating system. On UNIX that means cron: DataStage uses cron for recurring schedules and at for one-off schedules, and a check of the crontab entries for the scheduling user will show what you need. On Windows, it uses Windows Scheduled Tasks.

From the operating system command line, logged in as the scheduling user, a "crontab -l" will list the scheduled jobs.

Wednesday, August 28, 2013

14 design tips for better performance in Datastage

1) Avoid unnecessary type conversions: set the OSH_PRINT_SCHEMAS environment variable to verify that run time schemas match the job design column definitions. If you are using stage variables on a Transformer stage, ensure that their data types match the expected result types.

2) Use Transformer stages sparingly and wisely. Transformer stages can slow down your job. Do not have multiple stages where the functionality could be incorporated into a single stage, and use other stage types to perform simple transformation operations

Tuesday, August 27, 2013

Stage to OPerator in DataStage - the Real Driver

The parallel job stages are built on operators: individual parallel-engine stages where the data actually flows. In a typical job flow, operators are end-points, and data sets are the links between the operators.

Each operator listed in the dump score runs as a number of processes, which depends on:
  • the job's configuration file (APT_CONFIG_FILE)
  • the node pool constraints
  • the operator configuration in the parallel engine code
  • several environment variables, such as APT_DISABLE_COMBINATION, being set or unset.

Friday, August 23, 2013

DataStage BASIC functions

These functions can be used in a job control routine, which is defined as part of a jobs properties and allows other jobs to be run and controlled from the first job. Some of the functions can also be used for getting status information on the current job; these are useful in active stage expressions and before- and after-stage subroutines. 

Specify the job you want to control

Thursday, August 22, 2013

How to split source column into multiple target columns ( full name to first and Last)


CREATE TABLE test12 (fullname VARCHAR(30));

INSERT INTO test12 VALUES ('nitin raj');
INSERT INTO test12 VALUES ('nitin agarwal');
INSERT INTO test12 VALUES ('abhishek gupta');
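Outside the database, the same split can be sketched with awk on whitespace (a sketch of the logic, not the DataStage job itself):

```shell
# Split "fullname" into first and last on the space:
printf 'nitin raj\nnitin agarwal\nabhishek gupta\n' |
  awk '{ print "first=" $1, "last=" $2 }'
```

In a Transformer, the equivalent derivations would typically use the Field() function, e.g. Field(fullname, " ", 1) for the first name and Field(fullname, " ", 2) for the last.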

Wednesday, August 21, 2013

Converting .rpm Packages To Debian/Ubuntu .deb Format With Alien

This article shows how you can convert .rpm packages to .deb packages with a tool called alien so that you can easily install them on Debian and Ubuntu systems. Sometimes this is quite convenient as not all software projects release their software as Debian/Ubuntu packages.

However, you should keep in mind what the alien man page says:

Tuesday, August 20, 2013

Oracle Interview Questions - Part-3

51. What is a database instance? Explain.
A database instance (server) is a set of memory structures and background processes that access a set of database files. The processes can be shared by all of the users. The memory structures are used to store the most frequently queried data from the database; this helps improve database performance by decreasing the amount of I/O performed against the data files.

52. What is Parallel Server?
Multiple instances accessing the same database (only in multi-CPU environments)

Monday, August 19, 2013

Extracting font information from PDF file

In Adobe Acrobat Professional
  1. Open your PDF.
  2. Go to TOOLS -> Advanced Editing and select the "TouchUp Text Tool".
  3. Click on the text that you wish to extract the typeface from and a bounding box should appear.

Saturday, August 17, 2013

Search the string in the file : Linux/Unix

Here, I am going to share a small shell utility which asks for the source file name and the pattern to be searched in that file, and gives formatted output with line numbers. You can select the number of lines displayed before and after the string.
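The core of such a utility is a GNU grep one-liner like this (a sketch; -n adds line numbers, -B/-A control the context lines before and after the match):

```shell
# Build a small sample file and search it with 1 line of context each side:
printf 'alpha\nbeta\ngamma\ndelta\n' > sample.txt
grep -n -B1 -A1 "beta" sample.txt
```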

Friday, August 16, 2013

Big Data - The Hadoop Data Warehouse - Part 1

Big data is going to change the way you do things in the future, how you gain insight, and how you make decisions. These videos help you get up to speed quickly on this technology and show you the unique things IBM is doing to turn freely available open source big data technology into a big data platform. There is a major difference: the platform leverages the open source technologies (never forking them) and marries them to enterprise capabilities provided by a technology leader that understands the benefits a platform can provide.

Wednesday, August 14, 2013

Schema File in Datastage

Schema files and partial Schemas:

You can also specify the meta data for a stage in a plain text file known as a schema file. This is not stored in the Repository but you could, for example, keep it in a document management or source code control system, or publish it on an intranet site.

Tuesday, August 13, 2013

Error 39202 when logging into IBM DataStage

DataStage Client programs (Designer, Manager, Director) experience a connection failure, and the message displayed to the user is:

Failed to connect to host: xxxxx, project: UV
(Internal Error (39202))

Monday, August 12, 2013

WinSCP - Save all configurations in an INI file

You can configure WinSCP to save configurations to an INI file (instead of Windows Registry):
  • Open WinSCP and go to the Preferences section.
  • Next to "Other general options" click on the "Preferences" button.

Friday, August 09, 2013

DataStage Project Name With Space

This is a common problem which can cause rework, so remember: during installation we should not give a project a name with spaces in it. If the name contains spaces, the project will not be usable.

Thursday, August 08, 2013

Interview Questions : DataWareHouse - Part-1

What is Data Warehousing?
A data warehouse is the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. Data warehousing is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. A data warehouse is a repository of integrated information, available for queries and analysis.

Wednesday, August 07, 2013

Logical Data Model

A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model include:
  • Includes all entities and relationships among them.
  • All attributes for each entity are specified.
  • The primary key for each entity is specified.
  • Foreign keys (keys identifying the relationship between different entities) are specified.
  • Normalization occurs at this level.

Monday, August 05, 2013

Orchadmin Command : DataStage

Orchadmin is a command line utility provided by DataStage to examine and manage data sets.

The general callable format is : $orchadmin <command> [options] [descriptor file]

1. Before using orchadmin, you should make sure that either the working directory or $APT_ORCHHOME/etc contains the file "config.apt", OR
the environment variable $APT_CONFIG_FILE is defined for your session.
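Typical invocations look like the following (the dataset name is an example; subcommand details vary slightly by version, so check orchadmin's built-in help on your install):

```
# orchadmin describe -v mydata.ds   # show schema, partition and segment details
# orchadmin dump mydata.ds          # print the records in the dataset
# orchadmin delete mydata.ds        # remove the dataset AND its segment files
```

Deleting the .ds descriptor file by hand leaves the segment files behind on the resource disks, which is why the delete subcommand should be used instead of rm.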

Monday, July 29, 2013

Setting Up and Using PuTTY

  • Run the installer package and extract the PuTTY components. PuTTY may be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty/
  • Start the PuTTYGen application. This application is used to create your authentication keys. Select either "SSH2 RSA" or "SSH2 DSA" for the "Type of key to generate", with 1024 bit key size or greater. Press the "Generate" button and follow the instructions on the screen.

Sunday, July 28, 2013

Working with EditPlus Text Editor-Regular Expression How To

EditPlus is a lot better than the regular text editor, Notepad.
Of all its features I like the regexp support the most, and then comes the block-select feature.
Here are quick lines to carry out regular tasks using regular expressions in EditPlus. It's a kind of downloadable cheat-sheet list.

Remove all empty lines:

Find: “^\n” (Ignore double-quotes in all find/replace)
Replace: “”
^ – Beginning of the line
\n – New Line

Remove Multiple Spaces convert into single space:

Find: " +"
Replace: " "
+ – matches one or more occurrences of the space character.

Comment multiple line of code:

Find: “^”
Replace: “#” or “//”
You may optionally use: Edit Menu > Format > Line Comment.

Generate Comma Separated List from new line delimited list:

Find: “\n”
Replace: “, “
This can also be used to join lines of code: instead of replacing \n with a comma, replace it with "" (empty string).

Manipulate columns display order / punctuation:

Find: "([0-9]+)\t([a-zA-Z]+)"
Replace: "\2\t\1"
[0-9]+ – finds one or more digits
[a-zA-Z]+ – finds one or more letters
() – marks the block or captures the group
\1, \2 – the 1st and 2nd marked expressions
Before:
123 abc
345 cde
567 efg
After:
abc 123
cde 345
efg 567
The Other Way:
- Press Alt+C
- Drag your mouse to select the respective column
- Copy / Cut as required

Append / Add semicolon (any character) at the end of the line:

Find: “\n”
Replace: “;\n”

Enclose lines by quotes:

Find: “\n”
Replace: "'\n'"

Delete all lines containing a given STRING:

Find: “^.*STRING.*$”
Replace: “”

Remove lines not containing a given STRING:

I don’t know how to do this!! :)

Convert tab separated file into insert statements:

TSV: abcd de4 iirn 34399
SQL: INSERT INTO TABLENAME VALUES ("abcd", "de4", "iirn", "34399");
Find: "(.*)\t(.*)\t(.*)\t(.*)"
Replace: "INSERT INTO TABLENAME VALUES ("\1", "\2", "\3", "\4");"

Format the telephone number:

Find: “([0-9][0-9][0-9])([0-9][0-9][0-9])([0-9].*)”
Replace: "\1-\2-\3"

Original: 1231231231
Formatted-1: 123-123-1231

Remove Brackets:

Find: “\(|\)”
Replace: “”
\( – Matches a literal (. The \ is required to stop ( from being treated as a group marker.
| – or

Replace 1st occurrence of character:

Find: ” (.*)”
Replace: "-\1"
(.*) – matches everything and marks the block
** Make sure you ignore double-quotes(“) while writing in find / replace boxes.
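For the same tasks on a UNIX command line, most of the recipes above have one-line sed equivalents (GNU sed assumed; the sample file is made up). This also covers the "remove lines not containing a given STRING" case, via sed's ! negation:

```shell
# Build a small sample file: a line, an empty line, a double-spaced line.
printf 'one\n\ntwo  three\n' > sample.txt

sed -i '/^$/d' sample.txt       # remove all empty lines
sed -i 's/  */ /g' sample.txt   # collapse multiple spaces into one
sed -i 's/^/# /' sample.txt     # comment every line
sed -i 's/$/;/' sample.txt      # append a semicolon to each line

# Delete lines containing STRING:        sed -i '/STRING/d'  file
# Delete lines NOT containing STRING:    sed -i '/STRING/!d' file

cat sample.txt
```

After the four edits, sample.txt contains the two commented, semicolon-terminated lines.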

EditPlus supports following regular expressions in Find, Replace and Find in Files command.

Expression – Description
  • \t – Tab character.
  • \n – New line.
  • . – Matches any character.
  • | – Either expression on its left and right side matches the target string.
  • [] – Any of the enclosed characters may match the target character.
  • [^] – None of the enclosed characters may match the target character.
  • * – Character to the left of asterisk in the expression should match 0 or more times.
  • + – Character to the left of plus sign in the expression should match 1 or more times.
  • ? – Character to the left of question mark in the expression should match 0 or 1 time.
  • ^ – Expression to the right of ^ matches only when it is at the beginning of line.
  • $ – Expression to the left of $ matches only when it is at the end of line.
  • () – Affects evaluation order of expression and also used for tagged expression.
  • \ – Escape character. If you want to use character “\” itself, you should use “\\”.

Notable Features of Editplus are :

  • Spell checking
  • Regex-based find & replace
  • Encoding conversion
  • Newline conversion
  • Syntax highlighting
  • Multiple undo/redo
  • Rectangular block selection
  • Auto indentation
  • Code folding (Text folding)
Source: http://kedar.nitty-witty.com/blog/working-with-editplus-text-editor-regular-expression-how-to

Monday, July 22, 2013

File Extensions Used in Linux

  • .bz2 – A file compressed using bzip2.
  • .c – A file written in the C programming language.
  • .conf – A configuration file. In some cases, "conf" is used in a file name, but not as an extension.
  • .deb – A Debian package for installing software.
  • .lock – A lock file that prevents the use of another file.

Friday, July 19, 2013

DataStage Jobs hang or have slow performance when temp directories have many files

DataStage jobs write multiple files to temporary directories that are not automatically cleaned up. When the number of files grows large, DataStage jobs will have slower performance or can appear to hang.

For sites that have been running DataStage for a year or more and have not previously performed cleanup of these directories, they could contain 100,000 or more files.
Perform periodic cleanup of old DataStage files in these directories to prevent performance problems relating to the number of files in the directories.
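A cleanup along those lines can be sketched with find. The directory here is a throwaway created just for the demonstration, since the real locations (a project's &PH& directory, the scratch and /tmp areas) are site-specific; GNU coreutils is assumed for touch -d.

```shell
# Demonstration directory standing in for a DataStage temp location.
TMPDIR_TO_CLEAN=$(mktemp -d)

touch -d '10 days ago' "$TMPDIR_TO_CLEAN/old.file"   # simulate a stale file
touch "$TMPDIR_TO_CLEAN/new.file"                    # and a fresh one

# Delete regular files not modified in the last 7 days.
find "$TMPDIR_TO_CLEAN" -type f -mtime +7 -exec rm -f {} +

ls "$TMPDIR_TO_CLEAN"
```

Run from cron, a command like this keeps the file count bounded; test it with -print before adding -exec rm on a production system.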

Thursday, July 18, 2013

Generating an SSH key pair using PuTTY

    1. Start PuTTYgen by clicking Start > Programs > PuTTY > PuTTYgen. The PuTTY Key Generator panel is displayed.
    2. Click SSH-2 RSA as the type of key to generate.
    Note: Leave the number of bits in a generated key value at 1024.
    3. Click Generate and then move the cursor around the blank area of the Key section to generate the random characters that create a unique key. When the key has been completely generated, the information about the new key is displayed in the Key section.

Wednesday, July 17, 2013

DWH One Liners - Part-2

Part 1 : You can find HERE

dimension tables
Tables used to establish the context of the facts.

dimensional database
A single large table of facts described using measures and dimensions.

In a dimensional database, a dimension provides the context for a fact and is used in queries to group related facts together.

Tuesday, July 16, 2013

DWH One Liners - Part-1

Part 2 : You can find HERE

Ad hoc reports
Reports that users define on demand for a specific question, as opposed to predefined reports that cater to the requirements of novice users.

Refers to the analysis and interpretation of data using suitable statistical tools and techniques.

Business intelligence
Refers to the technologies, methods, and processes employed to understand data stored in data warehouses that will in turn help make important business decisions.
BI is getting the right information into the right people's hands in a format that allows them to understand the data quickly.
And a more formal definition from wiki:
"BI refers to skills, processes, technologies, applications, and practices to support decision making"

(Thanks for this share, Avik Sadhu)

Friday, July 12, 2013

Basics of Teradata Database

What exactly is a Teradata Database?

• Teradata Database is an RDBMS (Relational Database Management System) which helps in driving the company's data warehouse.
• It provides a foundation for the company to grow, compete, and evolve the business by getting answers to a new generation of questions (typically "what if" queries).
• It supports business growth from gigabytes to a whopping hundreds of terabytes.
• Teradata Database is an open system, compliant with the major ANSI standards.
• It works on UNIX MP-RAS, Windows, and Linux operating systems, and connects with clients through a network attachment (TCP/IP) or a channel connection for mainframes.

Thursday, July 11, 2013

Convert a single row into multiple rows (horizontal pivoting) with the Pivot stage

In this example, the Pivot Enterprise stage is set up to horizontally pivot some data.
You can generate a pivot index that will assign an index number to each row within sets of horizontally pivoted data. The following tables provide examples of data before and after a horizontal pivot operation.

Input Data

Wednesday, July 10, 2013

Unable to open jobs with error "Unable to open the Job Unable to load RID"

Unable to open jobs with error "Unable to open the Job Unable to load RID - c2e76d84.43058877.7e01af4f.cd92bbcd-2ebc-4cb3.833e.37a962373ef7"

Also, the SystemOut.log file contains the following:
[8/25/10 21:21:46:475 UTC] 000000b1 SystemOut O 2010-08-25 21:21:46,475 ERROR ojb.OjbPersistentEObjectPersistence - java.lang.OutOfMemoryError

Tuesday, July 09, 2013

When the DataStage file system becomes full, RT_LOGxxx files can become corrupted. Here is a manual procedure to recreate them

When many DataStage jobs run for a long time, the RT_LOGxxx files can become very big. If the log files are not regularly purged, a file-system-full condition can occur on the DataStage file system. If the corruption of DataStage system files is restricted to the RT_LOGxxx files, a procedure to delete and recreate the RT_LOGxxx files can be attempted in order to recover from the corruption.

Sunday, July 07, 2013

How To Hide Text In Notepad

Here is a small trick to hide text inside your Windows default text editor, i.e. Notepad, using the command prompt. Note that this trick works only on the NTFS file system.

Steps to hide text in Notepad:
1. Open your command prompt: Start --> Run and type cmd
2. Type cd.. to move to the C:\> drive, or type cd desktop to move to your desktop.
3. Type the below code in your

Friday, July 05, 2013

DATABASE Command in Teradata

When users negotiate a successful logon to Teradata, they are automatically positioned in a default database as defined by the database administrator. When an SQL request is executed, by default it looks in the current database for all referenced objects.

There may be times when the object is not in the current database. When this happens, the user has one of two choices to resolve the situation. One solution is to qualify the name of the object with the name of the database in which it resides. To do this, the user simply associates the database name with the object name by connecting them with a period (.) or dot, as shown below:
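As a sketch of the dot notation the paragraph describes (both names here are hypothetical):

```shell
# Hypothetical names: a table "employee" that lives in database "payroll".
# Qualifying it as databasename.objectname makes the request independent
# of the user's current default database.
Q="SELECT * FROM payroll.employee;"
echo "$Q"
```

The other choice mentioned above is to switch the default with a DATABASE command so unqualified names resolve there.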

Thursday, July 04, 2013

Lock/Unlock Oracle Users

Instead of deleting an Oracle user, we can LOCK it. That way the user won't be accessible. In the future, if we want, we can UNLOCK it.

Create a user named JACK which is LOCKED:
SQL> create user jack identified by jack account lock;
User created.

Wednesday, July 03, 2013

Teradata Users

In Teradata, a user is the same as a database with one exception: a user is able to log on to the system and a database cannot. Therefore, to authenticate the user, a password must be established. The password is normally established at the same time that the CREATE USER statement is executed.
The password can also be changed using a MODIFY USER command.

Tuesday, July 02, 2013

List the tables in Teradata

To list all the tables, views, and macros, execute whichever of the SQL statements below fits your need:

select * from dbc.tables;
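For a more selective listing, the dictionary's TableKind column can be filtered. The column names below come from the DBC views and should be verified against your Teradata release; the query is built as a string here since no Teradata session is available.

```shell
# Filtered variant of the query above; 'T' = table, 'V' = view, 'M' = macro.
SQL=$(cat <<'EOF'
SELECT DatabaseName, TableName
FROM   dbc.tables
WHERE  TableKind = 'T'
ORDER  BY DatabaseName, TableName;
EOF
)
echo "$SQL"
```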

Monday, July 01, 2013

List the tables in DB2 LUW & z/OS

List the tables in DB2 LUW:

list tables for all
list tables for schema <schema_name>

select * from syscat.tables;

Friday, June 28, 2013

Oracle Interview Questions - Part-2

26. What is a hash cluster?
A row is stored in a hash cluster based on the result of applying a hash function to the row's cluster key value. All rows with the same hash key value are stored together on disk.

27. When can a hash cluster be used?
Hash clusters are a better choice when a table is often queried with equality queries. For such queries the specified cluster key value is hashed. The resulting hash key value points directly to the area on disk that stores the specified rows.

Thursday, June 27, 2013

List of tables in an Oracle database

To list the tables in the schema of the current user:

SELECT table_name FROM user_tables;
select * from tab;

Thursday, June 20, 2013

First time IIS User Setup - DataStage 8.7

Typical users list:
• wasadmin
• isadmin
• db2inst1, db2iadm1
• db2fenc1, db2fadm1
• dasusr1, dasadm1
• xmeta, xmeta (db owner)
• xmetasr (staging area)
• iauser, iadb (db user)
• dsadm, dstage

Tuesday, June 18, 2013

Selecting DataStage job log results in error: Error Selecting from log file RT_LOG

When trying to view a job log in DataStage Director, an error similar to the following is received:

Error selecting from log file RT_LOGnn
Error was: Internal data error. File '<path_to>/RT_LOGnn/<filename>':
Computed blink of 0xnnnn does not match expected blink of 0xnnnn! Detected within group starting at address 0xnnnnnn!

The error message indicates that the log file for the job is corrupted.

Monday, June 17, 2013

Oracle Interview Questions - Part-1

1. What are the components of the physical database structure of an Oracle database?
An Oracle database is composed of three types of files: one or more datafiles, two or more redo log files, and one or more control files.

2. What are the components of the logical database structure of an Oracle database?
There are tablespaces and the database's schema objects.

3. What is a tablespace?
A database is divided into logical storage units called tablespaces. A tablespace is used to group related logical structures together.

Saturday, June 15, 2013

Delete All PuTTY Sessions Together

When you are swapping an old computer with a new computer, you may end up transferring all PuTTY sessions to the new computer. Once you've transferred all PuTTY sessions, execute "putty -cleanup" from the command line as shown below.

Friday, June 14, 2013

Execute an SQL script file in SQL*Plus - Oracle

To execute a script file in SQL*Plus, type @ and then the file name.
SQL > @{file}

For example, if your file was called script.sql, you'd type the following command at the SQL prompt:
SQL > @script.sql

The above command assumes that the file is in the current directory. (The current directory is usually the directory that you were located in before you launched SQL*Plus.)
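A minimal end-to-end sketch from the operating system shell: create the script file, then invoke it. The connection string is a placeholder, so the sqlplus command is only echoed here rather than executed.

```shell
# Write a small script file (the query inside is just an example).
cat > script.sql <<'EOF'
SELECT table_name FROM user_tables;
EXIT;
EOF

# From the OS shell the script can be passed directly on the command line;
# user, password and service name below are placeholders.
echo 'sqlplus -S scott/tiger@orcl @script.sql'
```

The EXIT at the end of the script makes the batch invocation return to the shell instead of staying at the SQL prompt.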

Thursday, June 13, 2013

DataStage Director: Unable to open DICT 'RT_LOG###'

When attempting to open a job log in Director, the user gets the following error message: Error selecting from log file RT_LOG### Error was: Unable to open DICT 'RT_LOG###'. This happens because the D_RT_LOG file is missing from the project's directory.

Tuesday, June 11, 2013

MQ queue naming standards

The naming of the WebSphere MQ queues is an important part of Q replication setup. It may be that your site already has a naming standard for MQ queues, but if it does not, then here are some thoughts on the subject.

Queues are related to the Q Capture and Q Apply programs, so it would be useful to have that fact reflected in the names of the queues.

Monday, June 10, 2013

PuTTY Sessions Time Out due to Inactivity

The SSH client PuTTY can be configured to maintain a connection and not time out due to inactivity.

To set up a new connection with keepalives to maintain your connection, follow the steps below:

1. Open the PuTTY application; the configuration window opens with the Category tree on the left.
2. Select Connection.
3. Under "Sending of null packets to keep session active", change "Seconds between keepalives" from the default 0 to 1800 (30 minutes).

Sunday, June 09, 2013

DataStage Job aborts with error: "The record is too big to fit in a block"

To fix this error you need to increase the block size to accommodate the record size:
1. Log in to Designer and open the job.
2. Open Job Properties --> Parameters --> Add Environment Variable and select: APT_DEFAULT_TRANSPORT_BLOCK_SIZE
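The same variable can also be exported in the job's environment before it runs (for example in dsenv or a wrapper script). The value below is only an example; check the valid range for your Information Server release before changing it.

```shell
# Example: raise the transport block size for jobs started from this shell.
# 262144 bytes (256 KB) is an illustrative value, not a recommendation.
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=262144
echo "APT_DEFAULT_TRANSPORT_BLOCK_SIZE=$APT_DEFAULT_TRANSPORT_BLOCK_SIZE"
```

Setting it as a job parameter, as in the steps above, limits the change to one job, which is usually safer than a system-wide export.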

Thursday, June 06, 2013

How to save a User Name in PuTTY Sessions

1. Click on any saved session and load it.
2. Click Connection > Data in the Category navigation tree.