Wednesday, June 25, 2014
Fork n Join in DataStage
Fork/join parallelism is a style of parallel programming useful for exploiting the parallelism inherent in divide-and-conquer algorithms on shared-memory multiprocessors. The idea is quite simple: a larger task can be divided into smaller tasks whose solutions can then be combined. As long as the smaller tasks are independent, they can be executed in parallel. One important goal of this framework is that, ideally, no worker thread sits idle.
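To make the idea concrete, here is a minimal sketch in Python (not DataStage). It assumes the task is summing a list of numbers; the function name `fork_join_sum` and the `workers` parameter are made up for illustration. The list is forked into independent chunks, each chunk is summed by a worker, and the partial sums are joined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join_sum(values, workers=4):
    """Sum `values` by forking into independent chunks and joining the partial sums."""
    if not values:
        return 0
    # Fork: cut the input into at most `workers` independent slices.
    chunk_size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Each slice is summed by its own task; no task depends on another.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Join: combine the partial results into the final answer.
    return sum(partial_sums)


print(fork_join_sum(list(range(1, 101))))  # 5050
```

The thread pool here only illustrates the shape of the pattern. In CPython, threads will not speed up CPU-bound work like this; a process pool would be the usual choice for that.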
In DataStage:
In DataStage, the fork/join idea is much the same. When we have to do a calculation on the data that is not possible in one flow, we split the data, do the calculation, and join the flows back together for the result. However, it is not necessary for the design to have a JOIN or FILTER stage to do the fork and join. It can be done with any stage, as per the requirement or your design. Actually, in DataStage, fork/join is basically a pattern you see in the graphical representation of the job :-).
As in the design below, the data is divided into two flows for different processing, and the flows are then joined back together for the result.
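The job design itself lives on the DataStage canvas, but the same split-and-rejoin shape can be sketched in plain Python. This is only an analogy under assumptions: the rows and the field names (`dept`, `emp`, `salary`) are hypothetical, and the calculation (each salary as a share of its department total) is just one example of something that needs two flows. One flow passes the detail rows through, the other aggregates per key, and the two are joined back on the key.

```python
from collections import defaultdict

# Hypothetical input rows; the field names are illustrative only.
rows = [
    {"dept": "A", "emp": "e1", "salary": 100},
    {"dept": "A", "emp": "e2", "salary": 300},
    {"dept": "B", "emp": "e3", "salary": 200},
]


def fork_join_share(records):
    # Fork: the same input feeds two flows.
    detail_flow = list(records)   # flow 1: rows passed through unchanged
    totals = defaultdict(int)     # flow 2: aggregate salary per department
    for rec in records:
        totals[rec["dept"]] += rec["salary"]
    # Join: bring the aggregate back to each detail row on the key.
    return [
        {
            **rec,
            "dept_total": totals[rec["dept"]],
            "share": rec["salary"] / totals[rec["dept"]],
        }
        for rec in detail_flow
    ]


for out in fork_join_share(rows):
    print(out)
```

Notice that there is no explicit "fork" or "join" operation here either; the pattern is in how the data flows, which matches the point that any stage can play these roles.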