Introduction
This document contains DataStage best practices and recommendations that can be used to improve the quality of DataStage jobs. It will be further enhanced to include specific DataStage problems and their troubleshooting.
Recommendations
• Data Stage Version
• DataStage Job Naming Conventions
• DataStage Job Stage and Link Naming Conventions
• DataStage Job Descriptions
• DataStage Job Complexity
• DataStage Job Design
• Error and/or Reject Handling
• Process Exception Handling
• Standards
• Development Guidelines
• Component Usage
• DataStage Data Types
• Partitioning Data
• Collecting Data
• Sorting
• Stage Specific Guidelines
Data Stage Version
Use DataStage Enterprise Edition v8.1 rather than Server Edition for any future production development. Enterprise Edition has a more robust set of job stages, provides better performance through parallel processing, and offers more flexibility for the future through its scalability.
DataStage Job Naming Conventions
For DataStage job development, a standard naming convention should be used for all job names. This could include components such as the type of job, the source of the data, the type of data, the category, etc.
E.g.:
jbCd_Aerodrome_Type_Code_RDS
jbDtl_Prty_Address_CAAMS
The job naming conventions for the conversion jobs do not really need to change at this point. These jobs are likely to be executed only during the initial conversion and not used again after that. If any of them are to become part of the production process, renaming them to the standard format would be preferred.
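To make a convention like this enforceable, it helps to express it as a pattern that names can be checked against. The sketch below is a hypothetical validator: the regular expression is an assumption inferred from the two example names above (a "jb" prefix, a job-type code such as Cd or Dtl, underscore-separated subject words, and an upper-case source-system suffix such as RDS or CAAMS), not a documented DataStage standard.

```python
import re

# Assumed pattern, inferred from the examples jbCd_Aerodrome_Type_Code_RDS
# and jbDtl_Prty_Address_CAAMS:
#   "jb" + job-type code + underscore-separated subject + source-system suffix
JOB_NAME_PATTERN = re.compile(r"^jb[A-Z][a-z]+(_[A-Za-z0-9]+)+_[A-Z]+$")

def is_standard_job_name(name: str) -> bool:
    """Return True if a job name matches the assumed convention."""
    return bool(JOB_NAME_PATTERN.match(name))

for name in ["jbCd_Aerodrome_Type_Code_RDS",
             "jbDtl_Prty_Address_CAAMS",
             "my_old_conversion_job"]:
    print(name, is_standard_job_name(name))
```

A check like this could be run over an exported job list so that non-conforming names are caught before they reach production.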
DataStage Job Stage and Link Naming Conventions
It is recommended that a standard naming convention be adopted and adhered to for all jobs and sequences. Although the items noted above are only minor variations and oversights, stage and link naming should be consistent and complete.
Again, the conversion jobs are only to be executed once, and addressing any inconsistencies in them does not make sense at this point. Future development should adhere to the defined standard.
DataStage Job Descriptions
Descriptions should be included in every DataStage job and job stage. For the job, this is done in the Job Properties window, which allows both a Short and a Long Job Description; for each stage, it is done in the Stage Properties. Descriptions allow other developers, and those reviewing the code, to better understand the purpose of the job, how it accomplishes it, and any special processing. The more complex the job or stage, the more detail should be included. Even simple, self-explanatory jobs or stages require some sort of description.
DataStage Job Complexity
Production jobs, on the other hand, should not be this complex. They should complete a specific task with a minimal number of stages. Typically, data processing is broken up into Extraction, Staging, Validation, Transformation, Load Ready, and Load jobs. Each job in each category typically deals with one source or target table at a time, with DataSets used to pass data between jobs. The end result is that many more DataStage jobs are required to complete the same process, but each individual job remains simple and focused.
DataStage Job Design
Continue this design approach for any new development where there are similarities between jobs. It is always quicker to develop a new job if a similar job can be leveraged as a starting point. In addition, there is an opportunity to create Shared Containers for common code that can be reused across a number of jobs. This simplifies the development of each similar job and requires changes and maintenance to only one version of the common code (the Shared Container). Any new development should consider job designs that allow Shared Containers to be utilized for common coding elements.
Error Handling
Implement error handling to manage records that cannot be processed for various reasons. This includes records with bad data, missing data (not-null attributes), orphaned child records, missing code table entries, and other business rules that require excluding specific records or complete UOWs (units of work). A reject process should also be considered if records are to be reprocessed at a later date, such as when code tables are updated or when the missing parent records are finally processed. The staging area can be used to maintain record status so that successful and failed records can be tracked.
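The validate-and-reject pattern described above can be sketched outside of DataStage as follows. This is an illustrative Python sketch, not DataStage job logic: the field names, the code table, and the reject reasons are all assumptions chosen to mirror the cases listed (missing not-null attributes, missing code table entries), and the `status` field stands in for the record status kept in the staging area.

```python
# Hypothetical code table of valid values (an assumption for illustration).
CODE_TABLE = {"AD1", "AD2"}

def validate(record):
    """Return a reject reason, or None if the record passes all rules."""
    if record.get("prty_id") is None:
        return "MISSING_KEY"               # not-null attribute is missing
    if record.get("aerodrome_type") not in CODE_TABLE:
        return "MISSING_CODE_TABLE_ENTRY"  # reprocess after table update
    return None

def process(records):
    """Split a batch into accepted records and rejects tagged with a reason."""
    accepted, rejected = [], []
    for rec in records:
        reason = validate(rec)
        if reason is None:
            accepted.append({**rec, "status": "OK"})
        else:
            rejected.append({**rec, "status": reason})
    return accepted, rejected

batch = [{"prty_id": 1, "aerodrome_type": "AD1"},
         {"prty_id": None, "aerodrome_type": "AD1"},
         {"prty_id": 2, "aerodrome_type": "ZZZ"}]
good, bad = process(batch)
print(len(good), len(bad))
```

Because each reject carries its reason, the rejected set can simply be fed back through `process` once the code table or parent data has been corrected.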
Process Exception Handling
Exception handling should be implemented in any production sequence job called by another sequence. Since the conversion process is likely a manual one, any failure there can be dealt with manually as required.
In a production environment, on the other hand, dependent job sequences should not be executed if their predecessor job sequences do not complete successfully. The called sequence should include Exception Handler and Terminator stages to prevent further processing when a job fails. This allows the problem to be addressed and the sequence restarted with fewer issues.
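The stop-on-failure behaviour of an Exception Handler plus Terminator can be illustrated with a small Python sketch. This is not DataStage code: the sequence names and the `runner` callback (returning True on success) are assumptions, standing in for sequence jobs whose exit status would really be checked by the calling sequence.

```python
def run_sequences(sequences, runner):
    """Run sequences in dependency order; stop at the first failure.

    runner(name) -> True on success. Returns the list of sequences
    that were actually started, so dependents of a failed sequence
    never run.
    """
    executed = []
    for name in sequences:
        executed.append(name)
        if not runner(name):
            print(f"{name} failed; terminating remaining sequences")
            break
    return executed

order = ["seqExtract", "seqTransform", "seqLoad"]
# Simulated run in which the transform sequence aborts:
status = {"seqExtract": True, "seqTransform": False, "seqLoad": True}
print(run_sequences(order, lambda n: status[n]))
```

Here `seqLoad` is never started, which mirrors the requirement that dependent sequences do not execute when a predecessor fails.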
Standards
It is important to establish and follow consistent standards in:
• Directory structures for installation and application support directories
• Naming conventions, especially for DataStage Project categories, stage names, and links
All DataStage jobs should be documented with the Short Description field, as well as Annotation fields.
It is the DataStage developer's responsibility to make personal backups of their work on their local workstation, using DataStage's DSX export capability. This can also be used for integration with source code control systems.
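One small convention that makes personal DSX backups useful is a timestamped, per-project file name, so that repeated exports never overwrite each other. The sketch below only builds such a name; the export itself would still be performed from the DataStage client. The directory layout and naming scheme are assumptions, not a DataStage requirement.

```python
from datetime import datetime
from pathlib import Path

def backup_path(backup_dir, project, job, when=None):
    """Build a per-project, timestamped .dsx file path for a job export."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return Path(backup_dir) / project / f"{job}_{stamp}.dsx"

# Example with a fixed timestamp so the result is reproducible:
p = backup_path("C:/ds_backups", "CAAMS", "jbDtl_Prty_Address_CAAMS",
                datetime(2010, 5, 1, 9, 30, 0))
print(p)
```

Dated file names like this also make it easy to hand a specific export to a source code control system later.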
njoy the simplicity.......
Atul Singh