Nuts & Bolts of DataStage: DataStage Naming Conventions

Friday, December 14, 2012

DataStage Naming Conventions

The DataStage naming conventions follow the guidelines of the ETL Naming Conventions.

Contents

1 Job Name Prefixes
2 Stage Names
3 Link Names

Job Name Prefixes

Job prefixes are optional, but they help to quickly identify the type of job and can make job navigation and job reporting easier.

Parallel jobs - par
Server jobs - ser
Sequence jobs - seq
Batch jobs - bat
Mainframe jobs
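As a rough illustration of how the prefixes above could drive job reporting, here is a minimal Python sketch. The function name job_type and the sample job names are made up for this example, and it assumes the prefix is simply the first three letters of the job name. Since prefixes are optional, an unprefixed job whose name happens to start with the same letters would be misclassified.

```python
# Prefixes taken from the list above. The mainframe prefix is not given
# in the text, so it is deliberately left out.
JOB_PREFIXES = {
    "par": "Parallel job",
    "ser": "Server job",
    "seq": "Sequence job",
    "bat": "Batch job",
}


def job_type(job_name: str) -> str:
    """Return the job type implied by the first three letters of the name.

    Assumption: the prefix is exactly the first three characters and is
    matched case-insensitively.
    """
    return JOB_PREFIXES.get(job_name[:3].lower(), "Unknown")


# Hypothetical job names, for illustration only.
print(job_type("seqDailyLoad"))
print(job_type("LoadCustomers"))
```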

Stage Names

The stage type prefix is used on all stage names so it appears on metadata reports that do not include a diagram of the stage or a description of the stage type. The name alone can be used to indicate the stage type.

Source and target stage names identify the name of the entity, such as a table name or a sequential file name. The stage name strips out any dynamic part of the name, such as a timestamp, and any file extension.

Database stage - db_tablename
Dataset - ds_datasetname
Hash file - hf_hashfilename
Sequential file stage - sf_filename

The prefix identifies the source type; the rest of the name indicates how to find that source outside of DataStage or how to refer to that source in another DataStage job.
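The stripping rule for sequential file names can be sketched in Python as follows. This is only one possible reading of the rule: the helper name sequential_file_stage_name, the sample file names, and the definition of a "dynamic part" (one or more trailing groups of 6 to 14 digits, each preceded by "_" or "-") are assumptions for the example.

```python
import re
from pathlib import PurePath

# Assumption: the dynamic part is a trailing date/time made of one or more
# digit groups (6 to 14 digits each), each preceded by "_" or "-".
_TIMESTAMP = re.compile(r"(?:[_-]\d{6,14})+$")


def sequential_file_stage_name(file_name: str) -> str:
    """Build an sf_ stage name from a file name or path."""
    # Keep only the file name, then drop everything after the first dot
    # so that all file extensions are removed.
    stem = PurePath(file_name).name.split(".")[0]
    # Remove the trailing timestamp, if there is one.
    stem = _TIMESTAMP.sub("", stem)
    return f"sf_{stem}"


# Hypothetical file names, for illustration only.
print(sequential_file_stage_name("stockitem_20121214.csv"))
print(sequential_file_stage_name("/data/in/orgcodes_20121214_093000.txt.gz"))
```

Both sample calls reduce to a stable name (sf_stockitem and sf_orgcodes), so the stage name stays the same from one run to the next.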

Transformation stages

Aggregation - AG_CalculatedContent (Prices, SalesAmounts, YTDPrices)
Changed Data Capture - CD
Funnel - FO_FunnelType (Continuous, round robin)
Lookup - LU
Pivot - PI
Remove Duplicates - RD
Sort - SO_SortFields
Transformer - TR_PrimaryFunction (HandleNulls, QA, Map)
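Because the name alone indicates the stage type, a report script can recover the type from the prefix without a diagram. The sketch below uses the prefixes listed in this post; the function name stage_type and the case-insensitive matching are assumptions made for the example.

```python
# Prefixes from the source/target and transformation stage lists in this post.
STAGE_PREFIXES = {
    "db": "Database stage",
    "ds": "Dataset",
    "hf": "Hash file",
    "sf": "Sequential file stage",
    "ag": "Aggregation",
    "cd": "Changed Data Capture",
    "fo": "Funnel",
    "lu": "Lookup",
    "pi": "Pivot",
    "rd": "Remove Duplicates",
    "so": "Sort",
    "tr": "Transformer",
}


def stage_type(stage_name: str):
    """Return the stage type for a stage name, or None if the prefix is unknown.

    Assumption: the prefix is the text before the first underscore and is
    matched case-insensitively (the post uses both db_ and TR_ styles).
    """
    prefix = stage_name.partition("_")[0].lower()
    return STAGE_PREFIXES.get(prefix)


print(stage_type("TR_HandleNulls"))
print(stage_type("sf_stockitem"))
```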

Link Names

The link name describes what data is travelling down the link. Link names turn up in process metadata via the link count statistics so it is very important to use names that make process reporting user friendly.

Only some links in a job are important to project administrators. The link naming convention has two types of link names:

- Links of importance have a five-letter prefix followed by a double underscore followed by link details.
- Intermediate links have a link name without a double underscore.

Links of Importance:

- The first primary link in a job consists of SourceType(char2)pri(primary).
- Any link from a reference source consists of SourceType(char2)ref(reference).
- Any link loading to a target consists of TargetType(char2)UpdateAction(char3).
- Any reject link consists of SourceType(char2)rej(reject).

Any project can add new links of importance, such as the output count of a remove duplicates or aggregation stage.

Example:

- dbpri__stockitem is the first link in a job.
- dbups__stockitem is the link loading to a target database table with an upsert option.
- dbref__orgcodes is a reference lookup of orgcodes against a database table.
- dbrej__stockitems is a reject of upserts to the stockitem table.
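To make the structure of these names concrete, here is a small Python sketch that splits a link of importance into its parts. The names ImportantLink and parse_link are invented for this example, and the pattern assumes lowercase prefixes like the ones in the examples above. It does not validate the three-letter code, since projects can add their own.

```python
import re
from typing import NamedTuple, Optional

# Two letters for the source/target type, three letters for the role or
# update action, a double underscore, then the link details.
_LINK = re.compile(r"^([a-z]{2})([a-z]{3})__(.+)$")


class ImportantLink(NamedTuple):
    stage_type: str  # e.g. "db"
    role: str        # e.g. "pri", "ref", "rej", or an update action such as "ups"
    details: str     # e.g. "stockitem"


def parse_link(name: str) -> Optional[ImportantLink]:
    """Return the parts of a link of importance, or None for intermediate links."""
    match = _LINK.match(name)
    return ImportantLink(*match.groups()) if match else None


print(parse_link("dbups__stockitem"))
print(parse_link("sorted_items"))  # hypothetical intermediate link name
```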

You can then produce a pivot report against the link row count statistics to show the row counts for a particular job, using the five-letter prefix as the column for each type of row count.
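A minimal sketch of such a pivot in Python is shown below. The input format (job name, link name, row count), the function name pivot_link_counts, and the sample figures are all hypothetical; real link statistics would come from wherever your project collects process metadata.

```python
from collections import defaultdict


def pivot_link_counts(rows):
    """Pivot (job, link_name, row_count) tuples into {job: {prefix: total}}.

    Only links of importance (five-letter prefix + double underscore) are
    reported; intermediate links are skipped.
    """
    report = defaultdict(lambda: defaultdict(int))
    for job, link, count in rows:
        prefix, separator, _details = link.partition("__")
        if separator and len(prefix) == 5:
            report[job][prefix] += count
    return {job: dict(columns) for job, columns in report.items()}


# Hypothetical statistics for one job, for illustration only.
sample = [
    ("parLoadStockItem", "dbpri__stockitem", 1000),
    ("parLoadStockItem", "sorted_items", 1000),
    ("parLoadStockItem", "dbups__stockitem", 990),
    ("parLoadStockItem", "dbrej__stockitems", 10),
]
print(pivot_link_counts(sample))
```

The intermediate link sorted_items drops out of the report, which is the point of reserving the double underscore for links of importance.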

So till then....

Enjoy the simplicity.......