
Monday, July 21, 2014

Navigating the many paths of metadata for DataStage 8

Source : Navigating the many paths of metadata for DataStage 8

Looking at the methods for importing metadata table definitions into DataStage 8 ETL jobs.
All of the metadata import methods of DataStage 7 are in DataStage 8 and all execute the same way.  Developers familiar with previous versions will be right at home!  What is tricky to come to terms with are all the new ways to get metadata into DataStage 8 and the Metadata Server.  The Metadata Server can provide reporting on metadata from products outside of DataStage, such as BI tools, so in some cases you might be importing metadata for reporting rather than for ETL.
The list that follows covers just the techniques for importing metadata to be used by DataStage jobs.

1)  Information Analyzer

This product provides the richest source of metadata because it imports the known metadata about the source and adds column-level and cross-table analysis on top, giving you even better metadata.
If the table is defined and profiled by the Information Analyzer it can be shared with DataStage:

This turns a table definition created by the Information Analyzer tool into one that can be used by DataStage.  It is the only way to get information onto the new tab in DataStage 8 - Analyzer Summary.  That tab surfaces summary information from all the Information Analyzer profiling work in the DataStage stage properties windows whenever that Table Definition gets used.
There will be more about this in a future post plus a few sessions on the Information Analyzer (including mine) at Information on Demand 2007 in October.  To add my session to your agenda just look me up under the Speaker list to find my two presentations.  I'm very pleased to be co-presenting with Harald Smith, the Information Analyzer product manager who can cover the IA product much better than I can.

2)  DataStage Table Import

All our old favourite import commands are still there!  This remains the most direct way to get metadata into DataStage.  Many of these import wizards use the same method to get to the data as a DataStage job stage would.  

There are import commands for COBOL complex flat files, XML files, stored procedures and so on.  The Plug-in Meta Data Definition option lists the import brokers for the Enterprise database sources such as Oracle, DB2 and Teradata.  It will also list any additional special stages you have added, such as Netezza, Federation Server or DataStage TX.  In this example I used the ODBC Table Definition to bring in an MS Access table.
This produces very good metadata and often brings in description fields where they are available:

The importer worked out that the Memo field needed to be defined in DataStage as a LongVarChar field and that the Yes/No flag field could be stored as a bit. 
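What the ODBC importer does under the hood is read the source's data dictionary and map each column's declared type.  A minimal sketch of that lookup, using Python's standard sqlite3 module as a stand-in for an ODBC source (the table and columns here are made up for illustration):

```python
import sqlite3

# An in-memory table standing in for an ODBC data source.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        name    VARCHAR(50),
        notes   TEXT,     -- analogous to an MS Access Memo field
        active  BOOLEAN   -- analogous to an Access Yes/No flag
    )
""")

# Read column metadata from the database's data dictionary,
# much as a table-definition importer would.
columns = conn.execute("PRAGMA table_info(customers)").fetchall()
for cid, name, decl_type, notnull, default, pk in columns:
    print(f"{name}: {decl_type}")
```

A real importer then maps each declared type to a DataStage SQL type (Memo to LongVarChar, Yes/No to Bit), which is the step where descriptions and nullability come across too.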

3)  MITI Import bridges

DataStage comes with a bunch of metadata bridges supplied by a specialist vendor Meta Integration Technology Inc (MITI).

Meta Integration is the leading "Metadata Component Provider" to major database, data integration, business intelligence, repository, and modeling tool vendors. MITI has established itself as the "Switzerland of Metadata".
These bridges are great - they cover dozens of popular products and are kept up to date with the latest versions. (We had been waiting a long time for an ERwin 7 import broker.)
The trick here is that most of the import brokers you need are already in the Table Definition menu, so you don't need the MITI bridges for most of your DataStage work. The MITI bridges become useful when you are building end-to-end data lineage across products for Metadata Workbench reporting.
There are still times when a MITI bridge is helpful for a DataStage programmer. One scenario is where you have a data model defined in a modeling tool but not yet available in a database. The database will eventually be created by a DDL file from the modeling tool but to get your DataStage development underway you can import metadata straight from the data model.
Importing from a modeling tool is also great if you have a lot of sequential file sources where the metadata on them is stored in a data model. Good luck getting that - most projects seem to store that type of metadata in MS Office documents.
These are the modeling tools that MITI has an import bridge for:
CA ERwin Data Modeler
CA Gen (COOL:Gen)
CA COOL:BusinessTeam (GroundWorks)
CA COOL:DBA (Terrain)
CA COOL:Enterprise (ADW)
Embarcadero ER/Studio
IBM Rational Rose Data Modeler & XDE
IBM Rational Data Architect
Oracle Designer
Select SE
Silverrun RDM
Sybase PowerDesigner
Telelogic Popkin System Architect
Visible IE:Advantage

You get a menu listing the dozens of bridges available: 

Followed by whatever custom import options come with that bridge. These are the options for ERwin, and each has some help text within the tool: 

Going through all the wizard steps will turn the imported metadata into one or more table definitions.
MITI also has a COBOL copybook importer, but I haven't been able to get it to work on files that the DataStage importers handle.  Both the complex flat file and XML importers for DataStage have quite a good preview and debugging tool that makes the job easier than going via the MITI bridge.
There are two ways to run an import bridge: you can find it in DataStage on the Import menu to bring in the metadata as a DataStage Table Definition, or you can run the application stand-alone and bring the metadata into the Information Server.  The metadata then gets stored in the metadata repository and can be brought into DataStage as a shared table.  If you cannot find the stand-alone application, it may not have been installed; look for it under the client folder on your Information Server install disk, in an import/export folder.

4)  Orchestrate Schema import

This can give the most accurate metadata for a parallel job as it uses the parallel metadata importer to bring in the metadata.  It can also be less successful than option 2) as the DataStage table importers have a way of fixing metadata glitches. 
For example when I try to import an MS Access table with a memo field the Orchestrate import gives me this error:
Datatype not supported.
While the DataStage ODBC import works - it converts the memo field:
Once I take the memo field out and run the Orchestrate import again I get a parallel schema:
record {record_format={type=fixed}, binary, delim=none}
( istext: nullable string[max=50] {prefix=2};
  isnum: nullable int32;
  isdate: nullable timestamp {text, timestamp_format="%yyyy-%mm-%dd-%hh:%nn:%ss"};
  iscurrency: nullable decimal[19,4] {packed, nocheck};
  isyesno: uint8;
)

The first problem is that we lost the metadata description field - replaced by some schema descriptions.  The DataStage ODBC import brought across the Description field from MS Access while Orchestrate lost it. 
This definition caused an abort to my job as it didn't convert the MS Access Yes/No field correctly:
ODBC_Connector_4: [IIS-CONN-DAAPI-000066] Schema reconciliation detected a size mismatch for field isyesno. When moving data from source field type INT16 into target field type INT8, truncation, loss of precision or data corruption can occur. (CC_DBSchemaRules::reportSizeMismatch, file CC_DBSchemaRules.cpp, line 1,583)
The DataStage ODBC import chose a Bit SQL Type and the Orchestrate import chose TinyInt and the job fell over.  So the DataStage ODBC import wins that battle.  Orchestrate might do a better job on Enterprise parallel data sources such as Oracle and DB2 and might avoid some metadata warning messages at run time however these minor glitches can be fixed.  The lack of a description field cannot be fixed so this type of import is one I avoid.
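The abort comes down to how each importer maps the same source type to a DataStage SQL type. A minimal sketch of that divergence (the mapping tables here are illustrative, reconstructed from the behaviour described above, not the importers' actual internal tables):

```python
# Illustrative source-type to DataStage SQL type mappings.
# The Access Yes/No field is where the two importers diverge.
ODBC_IMPORT_MAP = {
    "MEMO": "LongVarChar",  # what the DataStage ODBC import chose
    "YESNO": "Bit",         # 1-bit flag: matches the source
}

ORCHESTRATE_IMPORT_MAP = {
    "YESNO": "TinyInt",     # uint8: caused the schema-reconciliation abort
}

def datastage_type(source_type, mapping):
    """Look up the DataStage SQL type for a source column type."""
    return mapping.get(source_type.upper(), "Unknown")

print(datastage_type("YesNo", ODBC_IMPORT_MAP))
print(datastage_type("YesNo", ORCHESTRATE_IMPORT_MAP))
```

When the connector reconciles the Bit column in the job design against the TinyInt the Orchestrate schema produced, it reports the INT16-into-INT8 size mismatch shown above.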

5) Write your own

If you export a table definition to a DSX or XML file you can see the format that DataStage saves it in.  You can write your own program that turns metadata into one of these Table Definition export files.  I've seen one that converts a specific MS Word table containing data source metadata (column name, type and description) into a DataStage table definition.
This isn't a great way to get metadata, since a direct import from a source database stays synchronised with the database data dictionary.  It can be useful for flat file sources - especially fixed-width flat files where a definition might be supplied in an MS Word specification.  If you are creating this type of table definition many times, writing a program to convert it might be easier than entering the metadata manually.
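The general shape of such a converter can be sketched in a few lines. The element and attribute names below are invented for illustration; a real converter would emit the exact structure found in a DataStage DSX/XML export, which you can discover by exporting an existing table definition first:

```python
import xml.etree.ElementTree as ET

# Column specs as they might be copied out of an MS Word table:
# (name, SQL type, description)
spec = [
    ("CUSTOMER_ID", "Integer", "Surrogate key"),
    ("CUSTOMER_NAME", "VarChar", "Full legal name"),
]

# Build a simplified table-definition document. Element names here
# are placeholders, not the real DataStage export schema.
table = ET.Element("TableDefinition", name="CUSTOMERS")
for name, sql_type, desc in spec:
    col = ET.SubElement(table, "Column", name=name, sqlType=sql_type)
    col.set("description", desc)

xml_out = ET.tostring(table, encoding="unicode")
print(xml_out)
```

The payoff of generating the file rather than typing columns in is that the description field from the specification survives the trip into DataStage.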

6) Manually enter data

In any stage in DataStage you can manually type metadata into the column grid.  In DataStage 8 you can even type in complex flat file definitions (more about that in a future post).  I've marked this as the method of last resort because of all the mistakes you can make entering metadata by hand.  If you have metadata in MS Word or MS Excel you can cut and paste across to DataStage to create a new table definition.
The sequential file metadata import wizard may also require some manual metadata tweaking, such as adjusting column widths or entering column names (where the column names are in the first row).
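The kind of tweaking the wizard needs can be approximated in code: read the column names from the header row and infer a width per column from the widest value seen. A minimal sketch, with made-up file contents:

```python
import csv
import io

# A small comma-delimited sample with column names in the first row.
sample = io.StringIO(
    "name,city,balance\n"
    "Alice,Melbourne,120.50\n"
    "Bob,Sydney,7.00\n"
)

reader = csv.reader(sample)
names = next(reader)   # column names taken from the header row
rows = list(reader)

# Infer a display width per column from the widest value seen.
widths = [max(len(row[i]) for row in rows) for i in range(len(names))]
for name, width in zip(names, widths):
    print(f"{name}: width {width}")
```

For fixed-width files you would do the reverse: start from the widths in the specification and slice each record, which is exactly the sort of detail worth checking against a data preview before running the job.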
