We have moved to www.dataGenX.net, Keep Learning with us.

Wednesday, August 14, 2013

Schema File in DataStage

Schema files and partial schemas

You can also specify the meta data for a stage in a plain text file known as a schema file. A schema file is not stored in the Repository, but you could, for example, keep it in a document management or source code control system, or publish it on an intranet site.
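As an illustration, the following Python sketch produces the kind of plain text a schema file contains. It assumes the Orchestrate record-schema layout (record ( name: type; ... )) used by parallel jobs. The column names and types in the example are hypothetical.

```python
# Sketch: build the text of a parallel-job schema file from (name, type) pairs.
# The "record ( name: type; ... )" layout follows the Orchestrate record-schema
# syntax; the columns used in the example are hypothetical.


def render_schema(columns):
    """Return schema-file text for a flat list of (column name, type) pairs."""
    lines = ["record", "("]
    for name, col_type in columns:
        lines.append(f"  {name}: {col_type};")
    lines.append(")")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical customer columns.
    example = [
        ("cust_id", "int32"),
        ("cust_name", "string[max=30]"),
        ("balance", "nullable decimal[10,2]"),
        ("opened", "date"),
    ]
    print(render_schema(example), end="")
```

The result is ordinary text with one line per column. It can be saved to a file and versioned like any other project file.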

Note: If you are using a schema file on an NLS system, the schema file must be in UTF-8 format. It is, however, easy to convert a text file from one character set map to another with a WebSphere DataStage job. Such a job reads the data from the text file using a Sequential File stage, with the appropriate character set specified on the NLS Map page. It then writes the data to another file using a second Sequential File stage, with the UTF-8 map specified on the NLS Map page.
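If you only need to convert the schema file itself, the same re-encoding can also be done outside DataStage. The Python sketch below is a swapped-in alternative to the Sequential File job described above, not part of DataStage. It assumes you already know the file's original encoding. The file names and the cp1252 encoding in the usage line are hypothetical.

```python
# Sketch: re-encode a text file (for example, a schema file) to UTF-8 outside
# DataStage. Working on raw bytes keeps line endings exactly as they were.
from pathlib import Path


def reencode_to_utf8(data: bytes, src_encoding: str) -> bytes:
    """Decode bytes from src_encoding and return them encoded as UTF-8."""
    return data.decode(src_encoding).encode("utf-8")


def convert_file_to_utf8(src_path: str, dst_path: str, src_encoding: str) -> None:
    """Read src_path in src_encoding and write a UTF-8 copy to dst_path."""
    raw = Path(src_path).read_bytes()
    Path(dst_path).write_bytes(reencode_to_utf8(raw, src_encoding))
```

Usage: convert_file_to_utf8("customer.schema", "customer_utf8.schema", "cp1252"). If the source encoding is wrong, decode raises UnicodeDecodeError or silently produces the wrong characters, so check the result.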

Some parallel job stages allow you to use a partial schema. This means that you need to supply column definitions only for the columns that you are actually going to operate on.
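The Python sketch below illustrates the idea behind a partial schema. It is not how DataStage implements it. Only the columns a stage needs are described, here by hypothetical fixed-width positions, and the rest of the record is never parsed. Instead, the full record is carried along unchanged.

```python
# Illustration of the partial-schema idea (not DataStage's implementation):
# only the needed columns are described, by hypothetical fixed-width
# positions, and the full record is carried along untouched.


def read_partial(record: str, fields: dict) -> dict:
    """Extract only the described fields; keep the whole record intact."""
    row = {
        name: record[start:start + length].strip()
        for name, (start, length) in fields.items()
    }
    # The undescribed columns are never parsed; the raw record travels with
    # the extracted values so it can be written out unchanged.
    row["_intact"] = record
    return row


# Hypothetical layout: name in positions 0-9, income in positions 20-27.
# Whatever else the record contains is left undescribed.
FIELDS = {"name": (0, 10), "income": (20, 8)}

if __name__ == "__main__":
    line = "Smith".ljust(10) + "X" * 10 + "00052000" + "other data"
    print(read_partial(line, FIELDS))
```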

Remember to turn on runtime column propagation if you intend to use schema files to define column meta data.

Complex data types

Parallel jobs support three complex data types: subrecords, tagged subrecords, and vectors. When referring to complex data in WebSphere DataStage column definitions, you can specify fully qualified column names. These give the full path to a nested column, with the enclosing column names separated by periods.

Subrecords

A subrecord is a nested data structure. A column of type subrecord does not itself define any storage, but the columns it contains do. These columns can have any data type, and you can nest subrecords one within another. The LEVEL property is used to specify the structure of subrecords. The following diagram gives an example of a subrecord structure.
Parent (subrecord)
    Child1 (string)
    Child2 (string)          LEVEL01
    Child3 (string)
    Child4 (string)
    Child5 (subrecord)
        Grandchild1 (string)
        Grandchild2 (time)
        Grandchild3 (sfloat) LEVEL02
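To show why only the innermost columns hold data, the Python sketch below models the subrecord above as a nested dict. It then lists the leaf columns with their fully qualified names. This rests on several assumptions. The dict model is purely illustrative. Qualified names are taken to be period-separated. "Child5" is an assumed name for the subrecord column that holds the grandchildren.

```python
# Sketch: a subrecord modelled as a nested dict mapping column name -> type
# string, or -> another dict for a nested subrecord. "Child5" is an assumed
# name for the subrecord column that holds the grandchildren.


def leaf_columns(columns: dict, prefix: str = "") -> list:
    """Return (qualified name, type) for every column that actually stores data."""
    leaves = []
    for name, col_type in columns.items():
        qualified = prefix + name
        if isinstance(col_type, dict):
            # A subrecord column has no storage of its own: descend into it.
            leaves.extend(leaf_columns(col_type, qualified + "."))
        else:
            leaves.append((qualified, col_type))
    return leaves


SCHEMA = {
    "Parent": {
        "Child1": "string",
        "Child2": "string",
        "Child3": "string",
        "Child4": "string",
        "Child5": {
            "Grandchild1": "string",
            "Grandchild2": "time",
            "Grandchild3": "sfloat",
        },
    }
}

if __name__ == "__main__":
    for qualified, col_type in leaf_columns(SCHEMA):
        print(f"{qualified}: {col_type}")
```

Neither Parent nor Child5 appears in the output. Like subrecord columns in the text above, they only group the columns they contain.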

Tagged subrecord

This is a special type of subrecord structure. It comprises a number of columns of different types, but the actual column holds only one of these, as indicated by the value of a tag at run time. The columns can be of any type except subrecord or tagged. The following diagram illustrates a tagged subrecord.

Parent (tagged)
       Child1 (string)
       Child2 (int8)
       Child3 (raw)
       Tag = Child1, so column has data type of string
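A tagged subrecord behaves much like a tagged union. The Python sketch below mirrors the diagram: the tag names which child is present, and the value must match that child's type. Python types stand in for the DataStage ones as an assumption: str for string, int (range-checked) for int8, and bytes for raw. The class name TaggedValue is hypothetical.

```python
# Sketch of a tagged subrecord as a tagged union: the tag names which child
# column is present, and the value must match that child's type. Python types
# stand in for the DataStage ones (str for string, int for int8, bytes for raw).
from dataclasses import dataclass
from typing import Union

CHILD_TYPES = {"Child1": str, "Child2": int, "Child3": bytes}


@dataclass
class TaggedValue:
    tag: str
    value: Union[str, int, bytes]

    def __post_init__(self):
        expected = CHILD_TYPES.get(self.tag)
        if expected is None:
            raise ValueError(f"unknown tag: {self.tag!r}")
        if not isinstance(self.value, expected):
            raise TypeError(
                f"{self.tag} expects {expected.__name__}, "
                f"got {type(self.value).__name__}"
            )
        # int8 holds values from -128 to 127.
        if self.tag == "Child2" and not -128 <= self.value <= 127:
            raise ValueError("Child2 (int8) value out of range")


if __name__ == "__main__":
    row = TaggedValue(tag="Child1", value="hello")
    print(row)  # with Tag = Child1, the column holds a string
```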

Vectors

A vector is a one-dimensional array of any type except tagged. All the elements of a vector are of the same type and are numbered from 0. A vector can be of fixed or variable length. For a fixed-length vector the length is explicitly stated, while for a variable-length vector a property defines a link field that gives the length at run time. The following diagram illustrates a vector of fixed length and one of variable length.
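The difference between the two kinds of vector can be sketched in Python. The data layout here is an assumption for illustration, not DataStage's on-disk format. Elements are little-endian int32 values. For the variable-length case, the link field is an int32 count stored immediately before the elements.

```python
# Sketch of reading vectors from binary data, assuming little-endian int32
# elements. For the variable-length case it assumes the link field is an
# int32 count stored immediately before the elements (an illustrative layout).
import struct


def read_fixed_vector(data: bytes, length: int) -> list:
    """Fixed-length vector: the element count comes from the definition."""
    return list(struct.unpack_from(f"<{length}i", data, 0))


def read_variable_vector(data: bytes) -> list:
    """Variable-length vector: a leading int32 link field gives the count."""
    (count,) = struct.unpack_from("<i", data, 0)
    return list(struct.unpack_from(f"<{count}i", data, 4))


if __name__ == "__main__":
    fixed = read_fixed_vector(struct.pack("<3i", 10, 20, 30), 3)
    print(fixed[0])  # elements are numbered from 0
    variable = read_variable_vector(struct.pack("<3i", 2, 7, 8))
    print(variable)  # the link field (2) says two elements follow
```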