Nuts & Bolts of DataStage: When to choose a Server or Parallel DataStage job

Sunday, July 01, 2012

When to choose a Server or Parallel DataStage job

1. The choice between Server and Parallel jobs depends on time to implement, required functionality, and cost.

2. Server jobs are a good fit when there is a lot of functionality to implement against a lower data volume, hardware is limited, and ease of implementation matters.

3. Parallel jobs are costlier because they need larger-scale hardware and are harder to implement. In return, they offer very high processing capacity for very large volumes, along with a wide array of operators for high-performance data manipulation.

4. When the data volume is low, a Server job is usually the better choice, because Parallel jobs can have a longer startup time.

5. When the data volume is high, a Parallel job is the better choice, and data volume is the obvious incentive for going parallel. Parallel jobs can remove bottlenecks and run across multiple nodes in a cluster for almost unlimited scalability; at that point they become the faster and easier option. Individual stages are faster too: a parallel Sort stage is a lot faster than its server counterpart, and a Transformer stage in a Parallel job is faster than a server Transformer with the same transformations. Even on one node with a compiled Transformer stage, the parallel version was three times faster. So even a one-node configuration, which offers little parallel processing, can still get big performance improvements from an Enterprise Edition job. The improvement can be 10 times or more on a 2-CPU machine with two nodes for most stages.

6. Parallel jobs take advantage of both pipeline parallelism and partitioning parallelism.

7. Server job performance can be improved by enabling inter-process row buffering, which lets stages exchange data as soon as it is available on the link. The IPC stage likewise lets one passive stage read data from another as soon as it is available. In other words, a stage does not have to wait for the entire set of records to be read before they are passed to the next stage. The Link Partitioner and Link Collector stages can be used to achieve a certain degree of partitioning parallelism.

8. A lookup against a sequential file is possible in Parallel jobs but not in Server jobs.
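
The two kinds of parallelism mentioned in point 6 can be pictured with a small Python sketch. This is not DataStage code and uses no DataStage API; the function names (read_rows, transform, hash_partition, run_partitioned) and the row layout are made up for illustration. Generators stand in for pipeline parallelism (each row flows to the next stage as soon as it is produced), and a hash split followed by a collect step stands in for partitioning parallelism, roughly what a partitioner and collector pair does. Python threads only show the structure here; they do not give the CPU-level speedup a real parallel engine would.

```python
from concurrent.futures import ThreadPoolExecutor


def read_rows(n):
    """Source stage (hypothetical): yields rows one at a time."""
    for i in range(n):
        yield {"id": i, "amount": i * 10}


def transform(rows):
    """Transformer-like stage: handles each row as soon as it arrives."""
    for row in rows:
        yield {**row, "doubled": row["amount"] * 2}


def hash_partition(rows, n_partitions, key="id"):
    """Split rows into n_partitions lists based on a hash of the key column."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def run_partitioned(rows, n_partitions):
    """Partition the rows, transform each partition in its own worker, then collect."""
    partitions = hash_partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(lambda part: list(transform(part)), partitions))
    # Collector step: merge the partitions back into one stream, ordered by id.
    collected = [row for part in results for row in part]
    return sorted(collected, key=lambda r: r["id"])


if __name__ == "__main__":
    # Pipeline only: rows stream from the source through the transformer.
    for row in transform(read_rows(3)):
        print(row)
    # Partitioned: the same transform applied to two partitions, then collected.
    print(run_partitioned(read_rows(6), n_partitions=2))
```

In a real Parallel job the number of partitions comes from the node configuration rather than from a function argument, so the same job design can run on one node or many.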

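The row buffering idea in point 7 is essentially a producer and a consumer sharing a bounded buffer. The Python sketch below illustrates that idea under stated assumptions: the names (producer, consumer, run_buffered) and the buffer size are hypothetical, and threads are used in place of the separate processes a server job would use. The downstream stage starts working on rows while the upstream stage is still producing them.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the row stream


def producer(rows, buffer):
    """Upstream stage: pushes each row into the buffer as soon as it is read."""
    for row in rows:
        buffer.put(row)  # blocks while the buffer is full
    buffer.put(SENTINEL)


def consumer(buffer, out):
    """Downstream stage: processes rows as they arrive, without waiting for all of them."""
    while True:
        row = buffer.get()
        if row is SENTINEL:
            break
        out.append(row.upper())


def run_buffered(rows, buffer_size=4):
    """Run both stages concurrently, linked by a bounded row buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    out = []
    stages = [
        threading.Thread(target=producer, args=(rows, buffer)),
        threading.Thread(target=consumer, args=(buffer, out)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return out


if __name__ == "__main__":
    print(run_buffered(["a", "b", "c"], buffer_size=1))
```

A small buffer keeps memory use low but makes the producer wait more often; a larger one smooths out bursts at the cost of memory.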
Enjoy the simplicity...
Atul Singh
