We have moved to www.dataGenX.net. Keep learning with us.

Monday, February 17, 2014

Datastage Coding Checklist

  1. Ensure that null handling properties are configured for all nullable fields. Do not set the null field value to a value that may legitimately occur in the source data.
  2. Ensure that all character fields are trimmed before any processing. Extra spaces in the data can cause errors, such as lookup mismatches, that are hard to detect.
  3. Always save the metadata (for source, target and lookup definitions) in the repository to ensure reusability and consistency.
  4. If the partition type needs to change in the next immediate stage, set ‘Preserve partitioning’ to ‘Clear’ in the current stage.
  5. Use appropriate partitioning and sorting in the stages wherever possible; this improves performance. Make sure you understand the partitioning being used; otherwise, leave it set to ‘Auto’.
  6. Make sure that pathname/format details are not hard-coded; use job parameters for them. These details are generally set as environment variables.
  7. Ensure that all file names from external sources are parameterized. This saves the developer the trouble of changing the job if a file name changes. File names/datasets created in the job for intermediate purposes can be hard-coded.
  8. Ensure that the environment variable $APT_DISABLE_COMBINATION is set to ‘False’.
  9. Ensure that $APT_STRING_PADCHAR is set to spaces.
  10. Parameters used across jobs should have the same name. This helps avoid unnecessary confusion.
  11. Use a 4-node configuration file for unit testing/system testing the job.
  12. If multiple jobs are to be run for the same module, archive the source files in the after-job routine of the last job.
  13. Check whether the file exists in the landing directory before moving the sequential file. The ‘mv’ command may move the landing directory itself if the file is not found.
  14. Verify that the appropriate after-job routine is called in the job.
  15. Verify that the correct link counts are used in the after-job routine for the ACR log file.
  16. Check whether the log statements are correct for that job.
  17. Ensure that the Unix files created by any DataStage job are owned by the same Unix user who ran the job.
  18. Check the Director log if the error message is not readable.
  19. Verify that the job name, stage names, link names, and input file names follow the naming standards defined for the software artifacts.
  20. Job description must be clear and readable.
  21. Make sure that the Short Job Description is filled in using ‘Description Annotation’ and that it contains the job name as part of the description. Don’t use a plain Annotation for the job description.
  22. Check that the parameter values are assigned to the jobs through the sequencer.
  23. Verify whether Runtime Column Propagation (RCP) is disabled.
  24. Ensure that reject links are output from the Sequential File stage that reads the data file, to log the records that are rejected.
  25. Check whether datasets are used instead of sequential files for intermediate storage between jobs. This enhances performance in a set of linked jobs.
  26. Reject records should be stored as sequential files. This makes analysis of rejected records outside DataStage easier.
  27. Ensure that a dataset from another job uses the same metadata that is saved in the repository.
  28. Verify that intermediate files used by downstream jobs have Unix read permission for all users.
  29. For fixed width files, final delimiter should be set to “none” in the file format property.
  30. Verify that all lookup reference files have Unix permission 744. This ensures that other users don’t overwrite or delete the reference file.
  31. Stage variables should be in the correct order. E.g., a stage variable used in a calculation must appear above the variable that performs the calculation, since stage variables are evaluated top to bottom.
  32. If any processing stage requires a key (like Remove Duplicates, Merge, Join, etc.), the stage keys, sorting keys and partitioning keys should be the same and in the same order.
  33. Make sure that sparse lookups are not used when large volumes of data are handled.
  34. Check that the lookup keys are correct.
  35. Do not validate a null field in a Transformer. Use appropriate data types for the stage variables, and use IsNull(), IsNotNull() or Seq() for such validations.
  36. In a Funnel stage, all input links must be hash-partitioned on the sort keys.
  37. Verify that no Transformer is set to run in sequential mode; it should run in parallel mode.
  38. RCP should be enabled in the Copy stage before a shared container.
  39. Verify that the column generated using the Column Generator stage is created using ‘part’ and ‘partcount’.
  40. In Remove Duplicate stage, ensure that the correct record (according to the requirements) is retained.
  41. Every database object referenced should be accessed through a schema-name parameter.
  42. Always qualify database objects with the schema name.
  43. Use Upper Case for column names and table names in SQL queries.
  44. For every Job Activity stage in a sequencer, ensure that “Reset if required, then run” is selected where relevant.
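Several of the file-handling checks above (archiving source files in the after-job routine, and guarding ‘mv’ against a missing file) come down to a little defensive shell. A minimal sketch, using hypothetical paths and a hypothetical file name — in a real job these would come from job parameters:

```shell
#!/bin/sh
# Hypothetical locations -- in practice these come from job parameters.
LANDING_DIR=/tmp/landing_demo
ARCHIVE_DIR=/tmp/archive_demo
SRC_FILE=customer_feed.dat

mkdir -p "$LANDING_DIR" "$ARCHIVE_DIR"
touch "$LANDING_DIR/$SRC_FILE"      # stand-in for the incoming data file

# Guard against the 'mv' pitfall: only move the file if it actually exists,
# so a missing file never results in the landing directory itself being moved.
if [ -f "$LANDING_DIR/$SRC_FILE" ]; then
    mv "$LANDING_DIR/$SRC_FILE" "$ARCHIVE_DIR/$SRC_FILE"
    echo "archived $SRC_FILE"
else
    echo "ERROR: $LANDING_DIR/$SRC_FILE not found" >&2
    exit 1
fi
```

Calling a script like this from the after-job routine of the last job in the module keeps the archive step in one place.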
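The Unix-permission checks (world-readable intermediate files, 744 on lookup reference files) can likewise be enforced from a shell routine. A minimal sketch with a hypothetical reference directory and file name:

```shell
#!/bin/sh
# Hypothetical reference-file location -- adjust to your project's layout.
REF_DIR=/tmp/ref_demo
mkdir -p "$REF_DIR"
touch "$REF_DIR/country_codes.lkp"   # stand-in for a lookup reference file

# Owner gets read/write/execute, everyone else read-only (744), so other
# users can read the reference file but cannot overwrite or delete it.
chmod 744 "$REF_DIR/country_codes.lkp"

# Show the resulting mode for the job log.
ls -l "$REF_DIR/country_codes.lkp"
```

Note that deleting a file is actually governed by the permissions on its directory, so locking down the directory itself may be needed as well.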
