Tuesday, October 02, 2012

DataSet, FileSet and Seq File in DataStage


Seq File:
Extracting from or loading to a seq file was traditionally limited to 2 GB. The limit depends on the OS, and most operating systems now support files larger than 2 GB.
When used as a source, its data is converted from ASCII into the native format before it is processed.
Does not support null values by default.
A seq file can only be accessed on one node.
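
To make the ASCII-to-native conversion concrete, here is a minimal Python sketch of the idea. It is not DataStage code: the two-column layout (an integer id and a float amount), the comma delimiter, and the function names are all assumptions chosen for illustration.

```python
import struct

# Hypothetical record layout: a 32-bit integer id followed by a 64-bit float amount.
# "<" means little-endian with no padding, so each record is exactly 12 bytes.
RECORD = struct.Struct("<id")


def import_line(line: str) -> bytes:
    """Parse one comma-delimited ASCII line and pack it into a binary record."""
    raw_id, raw_amount = line.rstrip("\n").split(",")
    return RECORD.pack(int(raw_id), float(raw_amount))


def export_record(record: bytes) -> str:
    """Unpack a binary record and render it as a comma-delimited ASCII line."""
    rec_id, amount = RECORD.unpack(record)
    return f"{rec_id},{amount}"


packed = import_line("42,19.5\n")
print(len(packed), export_record(packed))
```

Every row read from a text file pays this parsing cost, which is the work a binary format avoids.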




DataSet:
Data set is the internal data format behind the Orchestrate framework. Any other source processed in a parallel job is first converted into data set format (handled by the "import" operator), and any other target is converted from data set format at the end (handled by the "export" operator). Because a data set needs no such conversion, it usually gives the highest performance.
It preserves partitioning. It stores data on the nodes, so when you read from a data set you don't have to repartition the data.
It stores data in binary, in the internal format of DataStage, so reading from or writing to a data set takes less time than with any other file type.
It cannot be viewed directly; you have to use the Data Set Management tool.



FileSet:
It stores data in a format similar to that of a sequential file, so you can open it directly to see the paths of the data files and the schema. The only advantage of using a file set over a seq file is that it preserves the partitioning scheme.
You can view the data, but in the order defined by the partitioning scheme.
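
The idea of a readable descriptor that points at per-partition data files can be sketched in Python as follows. This is only a conceptual model, not the real file set layout: the JSON data files, the descriptor keys, the modulo partitioning, and the function names are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def write_partitioned(records, key, num_partitions, directory):
    """Split records by key modulo num_partitions and write one text file per partition.

    Returns a descriptor holding the schema and the data file paths in partition order.
    """
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        partitions[rec[key] % num_partitions].append(rec)

    paths = []
    for i, part in enumerate(partitions):
        path = Path(directory) / f"part{i}.json"
        path.write_text(json.dumps(part))
        paths.append(str(path))
    return {"schema": sorted(records[0].keys()), "files": paths}


def read_partitioned(descriptor):
    """Read each data file back as its own partition, with no repartitioning."""
    return [json.loads(Path(p).read_text()) for p in descriptor["files"]]


records = [{"id": i, "value": i * 10} for i in range(6)]
with tempfile.TemporaryDirectory() as tmp:
    descriptor = write_partitioned(records, "id", 2, tmp)
    partitions = read_partitioned(descriptor)

print(descriptor["schema"])
print([[r["id"] for r in p] for p in partitions])
```

Reading the partitions back yields the rows grouped by partition rather than in their original order, which is why the data appears in the order defined by the partitioning scheme.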



njoy the simplicity.......
Atul Singh
