Nuts & Bolts of DataStage: column

Showing posts with label column. Show all posts

Wednesday, June 24, 2015

What happen when Import Sequential File with Extra Column

Job reads sequential file and load into RDBMS table.

What happen if sequential file has extra column in the back of each record?

It depends on data type of last column defined by DS and RDBMS table.

If data type of last column = varchar, extra column will be included as last column.

DataStage Scenario - Problem20

Goal : Reverse the Column

Input is like this :

i/p
col1,col2
101,a
102,b
103,c
104,d

Column Import Stage - Read a Fixed width File

The Column Export stage is a restructure stage. It can have a single input link, a single output link and a single rejects link. Typically used to divide data arriving in a single column into multiple columns.
Here, I tried to explain How to use it for importing data...

a) Design :

As you can see, Here we have followed this design to elaborate the stage. We are reading a file, Importing Columns with Import Stage and then loading them into Sequential File.

Some more design tips for DataStage Job Development

• Common information like home directory, system date, username, password should be initialized in a global variable and then variable should be referred everywhere.

• Stage Variables allow you to hold data from a previous record when the next record, allowing you to compare between previous and current records. Stage variables also allow you return multiple errors for a record of information. By being able to evaluate all data in a record and not just error on the first exception that is found, the cleanup of data is more efficient and requires less iteration.

DataStage Scenario - Problem18

Goal : Get the y'day date as a column and change the column format

Input :

Country|Currency|ValidFrom|ExRate
CNH|USD|1-Jan-12|0.11
CNH|USD|5-Jan-12|0.8
CNH|USD|10-Jan-12|0.15
CNH|USD|15-Jan-12|0.14
CNH|USD|20-Jan-12|0.16

DataStage Scenario - Problem15

Goal : Get the Previous column value in current value

Input file :

Sq, No
1,1000
2,2200
3,3030
4,5600

DataStage Scenario - Problem14

Goal : get below outputs

Input : 
dept, emp
----------------------------
20,            R
10,            A
10,            D
20,            P
10,            B
10,            C
20,            Q
20,            S

DataStage Scenario - Problem11

Which stages are needed to achieve below output ??

Input :

col1,col2
1,1
2,rajesh
3,15000
4,2
5,suresh
6,16000
7,3
8,veeru
9,17000

DataStage Scenario - Problem7

Goal : Divide the input into 3 columns

source  

c1      
c2      
c3      
c4
c5
c6
c7
c8

DataStage Scenario - Problem6

Goal : Get the count of Vowels in Columns

Input :

Akash Aggrawal
Priya Awasthi
Anil chahal
Diya Singh
Kashish Patel
Sunil Verma
Rashid Patel
Rashmi Arya
Gopal Joshi
Neha Tomar

14 design tips for better performance in Datastage

1) Avoid unnecessary type conversions: set the OSH_PRINT_SCHEMAS environment variable to verify that run time schemas match the job design column definitions. If you are using stage variables on a Transformer stage, ensure that their data types match the expected result types.

2) Use Transformer stages sparingly and wisely. Transformer stages can slow down your job. Do not have multiple stages where the functionality could be incorporated into a single stage, and use other stage types to perform simple transformation operations

How to split source column into multiple target columns ( full name to first and Last)

Approach:

CREATE SET TABLE test

(

fullname varchar(30)

);

INSERT INTO test12 ('nitin raj');

INSERT INTO test12 ('nitin agarwal');

INSERT INTO test12 ('abhishek gupta');

Logical Data Model

A logical data model describes the data in as much detail as possible, without regard to how they will be physical implemented in the database. Features of a logical data model include:

Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.

Conceptual Data Model

A conceptual data model identifies the highest-level relationships between the different entities. Features of conceptual data model include:

Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.

The figure below is an example of a conceptual data model.

Difference Between The Continuous Funnel And Sort Funnel

# Continuous Funnel combines the records of the input data in no guaranteed order. It takes one record from each input link in turn. If data is not available on an input link, the stage skips to the next link rather than waiting.

# Sort Funnel combines the input records in the order defined by the value(s) of one or more key columns and the order of the output records is determined by these sorting keys.

Local Containers and Shared Container

A container, as its name indicates, is used to group stages and links. Containers help simplify and modularize server job designs and allow you to replacing complex areas of the diagram with a single container stage. For example, if you have a lookup that is used by multiple jobs, you can put the jobs and links that generate the lookup into a share container and use it to different jobs. In a way, you can look at it like a procedure or function in the programming term.

Containers are linked to other stages or containers in the job by input and output stages.

Two types of container:

Sort stage to remove duplicate

1)what is the advantage of using sort stage over remove duplicate stage in removing duplicates.
2) Is there any way in which we can specify which record to retain(like retaining the last record or retaining the first) when we remove duplicate using transform stage, similarly in sort stage also.

Ans :

1)The advantage of using sort stage over remove duplicate stage is that sort stage allows us to capture the duplicate records whereas remove duplicate stage does not.

2) Using a sort stage we can only retain the first record.
Normally we go for retaining last when we sort a particular field in ascending order and try to get the last rec. The same can be done using sort stage by sorting in descending order to retain the first record.

Physical Data Model

Physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Features of a physical data model include:

Pivot stage made easy

Many people have the following misconceptions about Pivot stage.
1) It converts rows into columns
2) By using a pivot stage, we can convert 10 rows into 100 columns and 100 columns into 10 rows
3) You can add more points here!!

Subscribe to: Posts ( Atom )

Nuts & Bolts of DataStage

Wednesday, June 24, 2015

What happen when Import Sequential File with Extra Column

Thursday, July 17, 2014

DataStage Scenario - Problem20

Goal : Reverse the Column

Wednesday, July 09, 2014

Column Import Stage - Read a Fixed width File

Wednesday, April 09, 2014

Some more design tips for DataStage Job Development

Wednesday, March 19, 2014

DataStage Scenario - Problem18

Goal : Get the y'day date as a column and change the column format

Thursday, January 30, 2014

DataStage Scenario - Problem15

DataStage Scenario - Problem14

Sunday, January 26, 2014

DataStage Scenario - Problem11

Monday, January 20, 2014

DataStage Scenario - Problem7

Wednesday, December 11, 2013

DataStage Scenario - Problem6

Wednesday, August 28, 2013

14 design tips for better performance in Datastage

Thursday, August 22, 2013

How to split source column into multiple target columns ( full name to first and Last)

Wednesday, August 07, 2013

Logical Data Model

Thursday, May 30, 2013

Conceptual Data Model

Monday, March 25, 2013

Difference Between The Continuous Funnel And Sort Funnel

Friday, March 22, 2013

Local Containers and Shared Container

Thursday, March 21, 2013

Sort stage to remove duplicate

Monday, February 18, 2013

Physical Data Model

Tuesday, January 29, 2013

Pivot stage made easy