1. DataStage v8 Configuration (5%)
- Describe how to properly configure DataStage V.8.0.
- This is kind of vague, but focus on how DataStage 8 gets attached to a Metadata Server via the Metadata Console and how security rights are set up.
- Read up on configuring the DB2 and Oracle clients and ODBC connections.
- Get to know the dsenv file. Read the DataStage Installation Guide for the post-installation steps - a sketch of typical dsenv entries is in the next item.
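- For reference, here is a minimal sketch of the kind of entries that get appended to dsenv once the database clients are installed. The paths are hypothetical examples, not defaults - substitute your own client install locations:

      # dsenv is a shell script sourced by the DataStage engine
      # (example paths only - use your own client locations)
      ORACLE_HOME=/opt/oracle/client; export ORACLE_HOME
      DB2DIR=/opt/IBM/db2/V9; export DB2DIR
      LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib:$DB2DIR/lib; export LD_LIBRARY_PATH
      PATH=$PATH:$ORACLE_HOME/bin; export PATH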
- Identify tasks required to create and configure a project to be used for V.8.0 jobs.
- The DataStage Parallel Job Advanced Developer Guide has a section on project-specific environment variables covering reporting, C++ compiler options, tracing and optimisation. You do not have to memorise them but at least read through each setting.
- Get to know Create Project and the project settings in the DataStage Administrator.
- Given a configuration file, identify its components and its overall intended purpose.
- You can practice with different configuration files with one, two and four nodes and use the job monitor to see how the job is affected. Turn on job scoring (the APT_DUMP_SCORE environment variable) to see what the job really looks like. Read up on node pools and memorise what is in the manuals. A sample configuration file is sketched in the next item.
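- As a reference point, here is a minimal two-node configuration file sketch - the server name and resource paths are made-up examples. Each node entry names the server it runs on, the node pools it belongs to and its dataset and scratch disk locations:

      {
          node "node1"
          {
              fastname "etlserver1"
              pools ""
              resource disk "/data/datasets" {pools ""}
              resource scratchdisk "/data/scratch" {pools ""}
          }
          node "node2"
          {
              fastname "etlserver1"
              pools ""
              resource disk "/data/datasets" {pools ""}
              resource scratchdisk "/data/scratch" {pools ""}
          }
      }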
2. MetaData (5%)
- Demonstrate knowledge of Orchestrate schema.
- An Orchestrate schema is just a table definition in a file format. Try saving a couple to file and get to know how they are put together.
- Try reading and writing to a dataset using a schema file definition.
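- To give you an idea, a saved schema file looks something like this (the field names are made up):

      record
      (
          customer_id: int32;
          customer_name: string[max=50];
          order_date: date;
          order_amount: nullable decimal[10,2];
      )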
- Identify the method of importing metadata.
- Read my blog post Navigating the many paths of metadata for DataStage 8 - there are actually quite a few different ways to import metadata in version 8.
- Try imports through the metadata bridges, the DataStage Designer plugin import and the DataStage Designer Connector import.
- Given a scenario, demonstrate knowledge of runtime column propagation.
- Try an exercise where you drop and regain columns in a job using propagation. For example, generate a set of columns and push them through three Copy stages. Remove two columns from the middle Copy stage and add them back again in the third Copy stage with column propagation turned on.
3. Persistent Storage (10%)
- Given a scenario, explain the process of importing/exporting data to/from the framework (e.g., sequential file, external source/target).
- Get to know the Sequential File stage, the Column Import stage and the External Source stage. All three do roughly the same thing but have different options. They all provide a stream of data to the same import function that turns it into Orchestrate parallel data.
- Given a scenario, describe proper use of a sequential file.
- Read a sequential file.
- Read up on Sequential File stage options such as multiple readers, first line is column names, and reject rows.
- Compare using a file name to using a file wildcard.
- Try reading two files at once by configuring the stage with two file names.
- Given a scenario, describe proper usage of CFF (native, not plug-in).
- It can be hard to practice with a CFF stage if you do not have CFF data! Read up on it and see if you can find a COBOL copybook - try Googling for one. If you can get your hands on one, import it as a CFF file definition, load the metadata into the CFF stage and pretend you are using it to extract data. That will show you how the stage works.
- Have a look at all the tabs on the CFF stage for data formats, record filters etc.
- Describe proper usage of FileSets and DataSets.
- These are the hash files of Enterprise Edition. Try comparing a Lookup Fileset to a Dataset and watch what happens in the temp directory and node scratch directory for each type of job as the job is running.
- See what files are created behind the scenes.
- Compare a very large lookup to a very small lookup to see how fileset and dataset lookups differ.
- Describe use of the FTP stage for remote data.
- You can just read up on this. Basically DataStage wraps FTP and a sequential file read into a single stage so you read the file as it is transferred.
- Identify importing/exporting of XML data.
- It is very easy to get your hands on XML data - there is lots of it around. Try importing it into DataStage, read up on the XML Input stage and look for the XML tutorials in my blog post 40 DataStage Learning, Tutorial and Certification Online Resources. The trick is that the XML Input stage expects to receive either a huge chunk of XML data or an XML file name passed into it from a preceding stage. This means you have to pass it the data from something like a Sequential File stage or the External Source stage.
- Have a look at the XPATH statements created in the Description field of a Table Definition from an XML metadata import.
- Read up on how the key field of an XML Input stage determines how an XML file gets flattened to a flat table.
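- As a simple illustration of that flattening (made-up data): given XML like the fragment below, marking the repeating line element as the key should give you one output row per line, with the order-level values repeated on each row - two rows in this case.

      <order id="1001" customer="Smith">
          <line sku="A100" qty="2"/>
          <line sku="B200" qty="1"/>
      </order>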
4. Parallel Architecture (10%)
- Given a scenario, demonstrate proper use of data partitioning and collecting.
- The DataStage Parallel Job Developers Guide has a very good description of data partitioning that all developers should read.
- Given a scenario, demonstrate knowledge of parallel execution.
- A good way to learn this is to create a job that generates a sequence of numbers in an ID field and then run it on multiple nodes. Use the Monitor tool to watch what happens to the Transformer, Lookup and Sequential File stages in this job.
5. Databases (10%)
- Given a scenario, demonstrate proper selection of database stages and database-specific stage properties.
- The questions are likely to be about Oracle, DB2 and SQL Server - most likely the first two. You can read up on them in the Developers Guide and sometimes there is an additional PDF. You can also add these stages to a job even if you do not have access to the database itself and play with the properties of the stage.
- Identify source database options.
- Add each type of database stage to a job and play around with the different settings.
- Given a scenario, demonstrate knowledge of target database options.
- Compare inserts and updates to loads and bulk loads.
- Compare the generated SQL from the different types of upsert options (insert then update, update then insert, replace).
- Use www.dsxchange.com to search for discussions on array size and transaction size for database stages.
- Given a scenario, describe how to design a v.8.0 ETL job that will extract data from a DBMS, combine it with data from another source and load it to another DBMS target.
- Demonstrate knowledge of working with NLS database sources and targets.
- I did not get an NLS question in my version 7.5 exam but it may pay to read the NLS guide at least once so you are familiar with it. Focus on NLS with databases as this seems to be the only section that mentions NLS.
6. Data Transformation (10%)
- Given a scenario, demonstrate knowledge of default type conversions, output mappings, and associated warnings.
- Read the DataStage Developers Guide and understand the difference between implicit conversion - where you map a field of one data type to a different data type without any specific conversion code - and explicit conversion.
- Read up on explicit conversion, where you use a function like StringToTimestamp to change the data type during the mapping.
- Try putting a string field into a numeric field, putting an invalid date into a date field and putting a null into a non-nullable column to see what job warnings you get.
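- As a small example of the explicit style (the link and column names are made up): an implicit conversion just maps, say, a string column straight onto a timestamp column and relies on the default conversion, whereas an explicit conversion spells out the format in a Transformer derivation:

      StringToTimestamp(lnk_in.order_ts, "%yyyy-%mm-%dd %hh:%nn:%ss")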
- Given a scenario, demonstrate proper selection of the Transformer stage vs. other stages.
- Read my blog post Is the DataStage parallel transformer evil? for a discussion on what the Transformer does.
- For scenarios where you need the Modify stage, try DataStage Tutorial: How to become a Modify Stage Zen Master.
- Given a scenario, describe Transformer stage capabilities (including: stage variables, link variables, DataStage macros, constraints, system variables, link ordering, @PARTITIONNUM, functions).
- A good way to understand partitioning functions is to try and build your own Transformer parallel job counter.
- Go into a Transformer, click in a derivation field and explore every part of the right mouse click menu - try adding job start time, job name, @NULL, @DAY and various other macros to an output link and send it to the Peek stage to read the results.
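- One way to build that counter (a sketch only - it assumes you want a number that is unique across all partitions) is a derivation that combines the partitioning system variables so each partition generates a non-overlapping sequence:

      (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1

  On a two-node run this gives 1, 3, 5... on one partition and 2, 4, 6... on the other.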
- Demonstrate the use of Transformer stage variables (e.g., to identify key grouping boundaries on incoming data).
- Try a stage variable scenario where you compare values in the current row to values in the previous row of data by storing values in stage variables. Have a look at this DSXChange Vertical Pivot thread for ideas.
- Try a stage variable scenario where you do null handling of input columns in stage variables and then use the results in derivation functions.
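- A rough sketch of both patterns (stage variable and link names are made up; stage variables are evaluated top to bottom on each row, so the order matters):

      svIsNewKey = If lnk_in.CustomerID <> svPrevKey Then 1 Else 0
      svPrevKey  = lnk_in.CustomerID
      svCleanAmt = If IsNull(lnk_in.Amount) Then 0 Else lnk_in.Amount

  svIsNewKey flags the first row of each key group, svPrevKey remembers the key for the next row, and svCleanAmt is the null handling result you can reuse in output derivations.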
- Identify the process to add functionality not provided by existing DataStage stages (e.g., wrappers, buildops, user-defined functions/routines).
- Read the DataStage Advanced Parallel Job Developers Guide for a description of wrappers and buildops.
- Step through the parallel function tutorial in my post 40 DataStage Learning, Tutorial and Certification Online Resources. It's good to know how parallel routines work.
- Given a scenario, demonstrate proper use of the SCD stage.
- Read up on the SCD stage and have a look at the SCD online tutorial in my post 40 DataStage Learning, Tutorial and Certification Online Resources. It can be tricky understanding the SCD stage if you have never done star schemas, and I wouldn't spend a lot of time trying to learn them now. Just learn the dynamics of the stage and save your learning of star schemas for another time.
- Demonstrate job design knowledge of using RCP (modify, filter, dynamic transformer).
- This is a tough one - dynamic transformer? I am not even sure what that is.
- Practice column propagation through the Modify, Filter and Transformer stages.
7. Job Components (10%)
- Demonstrate knowledge of the Join, Lookup and Merge stages.
- There is a table in the Parallel Job Developers Guide that compares the three stages. Try to get to know the differences.
- Given a scenario, demonstrate knowledge of the Sort stage.
- Read my blog post Sorts to the left of me, sorts to the right to understand all the different ways to control sorting in a parallel job.
- Given a scenario, demonstrate understanding of the Aggregator stage.
- Read up on the stage and practice with it.
- Try doing a sum and a count at the same time! Look for the workaround where you create a column set to the constant 1 and sum it.
- Describe proper usage of Change Capture/Change Apply.
- Change Capture compares a before set of data to an after set of data and splits the stream into inserts, updates, deletes and unchanged rows by adding a change code field. Change Apply lets you apply those changes to a dataset.
- I don't really use Change Apply that much since I'm usually applying changes to a target database table.
- Demonstrate knowledge of real-time components.
- This is kind of vague. I am guessing they are referring to the stages in the Real Time folder such as XML Input and WISD Input. Read up on them and add them to a job to view the properties.
8. Job Design (10%)
- Demonstrate knowledge of shared containers.
- Create a big job and turn part of it into a shared container.
- Try using that container in two different jobs and examine how job parameters are shared.
- Try a shared container that works on a small number of columns and uses column propagation to pass through the other columns. Use this in multiple jobs.
- Given a scenario, describe how to minimize sorts and repartitions.
- Read my blog post Sorts to the left of me, sorts to the right to understand all the different ways to control sorting in a parallel job.
- Add several stages to a job that need sorted data - Remove Duplicates, Join and Sequential File. Compare what happens to the sort symbols on links when you configure these stages and choose different key fields.
- Demonstrate knowledge of creating restart points and methodologies.
- This one moves into methodology more than tool technical details. Do a search on www.dsxchange.com for discussions on vertical banding and rollback.
- Remember that in most job abort situations you cannot restart a job from where it left off - it is hard to calculate what row number a parallel partitioning job was up to, and it's usually safest to roll back all changes and start again or overwrite the target file/dataset.
- Given a scenario, demonstrate proper use of standards.
- Try reading the DataStage standards chapter from the IBM Redbook - see my post Excellent DataStage Documentation and Examples in New 660 Page IBM RedBook.
- Explain the process necessary to run multiple copies of the source (job multi-instance).
- Not many people use multiple instance jobs with parallel jobs since parallel jobs are able to partition data on the fly. Read up on multiple instance jobs in the DataStage Designer Guide.
- Demonstrate knowledge of real-time vs. batch job design.
- Read through the Information Services Director documentation just once to get the gist of turning a DataStage job into a Web Service. The main difference is that a real-time job will tend to input and output data as XML so it can share data with other SOA services.
9. Monitoring and Troubleshooting (10%)
- Given a scenario, demonstrate knowledge of the parallel job score.
- See the Advanced Parallel Developers Guide for a description of how to turn on job scoring.
- Turn it on and leave it on as you practice for the exam and keep looking at job scores until you understand how they work.
- Given a scenario, identify and define environment variables that control DataStage v.8.0 with regard to added functionality and reporting.
- The DataStage Parallel Job Advanced Developer Guide has a section on project-specific environment variables covering reporting, C++ compiler options, tracing and optimisation.
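- A few of the environment variables worth recognising on sight (a sample only - check the Advanced Developer Guide for the full list, and treat the values and path below as illustrations rather than recommendations):

      APT_CONFIG_FILE=/path/to/default.apt   # which configuration file the job runs with
      APT_DUMP_SCORE=True                    # write the job score to the job log
      APT_DISABLE_COMBINATION=True           # stop operators being combined into single processes
      APT_RECORD_COUNTS=True                 # log record counts per operator per partition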
- Given a process list (scenario), identify the conductor, section leader and player processes.
- Run a job on four nodes and watch the score to see what processes are created when it runs.
- Given a scenario, identify areas that may improve performance (e.g., buffer size, repartitioning, config files, operator combination, etc.).
- Try re-designing a job by adding and removing sorts, remove duplicates, copies, transformers and database stages, and keep an eye on the partitioning and repartitioning symbols that show up on the job links to see how they are impacted.
- Try a job with a Lookup stage and a Transformer together and put a reject link on the Transformer. See what happens when the Lookup stage doesn't find a lookup row. Switch operator combination on and off to compare the difference.
- Demonstrate knowledge of runtime metadata analysis and performance monitoring.
- Try out the various DataStage performance reports and use the Monitor tool to see how a job is progressing.
10. Job Management and Deployment (10%)
- Demonstrate knowledge of advanced find.
- Try finding column names, table names and wildcard searches on parts of job names.
- Given a scenario, demonstrate knowledge and the purpose of impact analysis.
- Try some impact analysis searching and reporting from inside the DataStage Designer.
- Demonstrate knowledge and purpose of job compare.
- Try comparing jobs in different projects that have minor changes.
- Given a scenario, articulate the change control process.
- Since IBM do not have a change control process in DataStage 8 this is kind of a stupid criterion. IBM decommissioned the change control tool from version 7.5, the new change control tool isn't ready yet, and an Excel-based change control tool developed internally is not being released to the public. The only change control in version 8 is the manual export and import of components between projects.
- Read up on how to lock a project to be read-only. Lock down a project and see how you can still set job parameter defaults.
11. Job Control and Runtime Management (10%)
- Demonstrate knowledge of message handlers.
- Find a warning message and an error message in a DataStage job. Right mouse click on the message, add it to a job message handler and downgrade it to a lower level. Re-run the job to see what happens.
- Read up on the difference between job message handlers and project message handlers.
- Identify the use of the dsjob command line utility.
- Read the DataStage Server Job Developers Guide for a description of the dsjob command.
- Do a search for dsjob on www.dsxchange.com to see all the sample code on how dsjob works.
- Use the DataStage Director to create a new DataStage batch job - add a job to the batch code using the combo box provided to generate your own dsjob code.
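- As a sketch of what dsjob usage looks like from the command line (the project, job and parameter names are made up - check the guide for the exact options available in your version):

      # run a job with a parameter; -jobstatus waits and returns the job status as the exit code
      dsjob -run -mode NORMAL -param SourceFile=/data/in/customers.txt -jobstatus MyProject MyJob

      # check the result and summarise the log
      dsjob -jobinfo MyProject MyJob
      dsjob -logsum MyProject MyJob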
- Given a scenario, demonstrate the ability to use job sequencers (e.g., exception handling, restartability, dependencies, passing return values from routines, parameter passing and job status).
- Try a scenario where you have several jobs in a row and one of them fails - see what happens when you restart the sequence job. See what happens when you reset the sequence job.
- Compare an aborted sequence job to an aborted parallel job.
Ref : Vincent McBurney