Seq File:
The maximum size for extracting from or loading to a seq file is 2 GB. (This
depends on the OS; most operating systems now support files larger than 2 GB.)
When a job uses it as a source, the compiled job converts the data from ASCII
into the native format as it is read.
It does not support null values.
A seq file can only be accessed on one node.
DataSet:
A data set is the internal data format behind the Orchestrate framework. Any
other data processed as a source in a parallel job is first converted into
data set format (handled by the "import" operator), and data written to a
target is converted from data set format last (handled by the "export"
operator). Hence, a data set usually gives the highest performance, because
neither conversion is needed.
It preserves partitioning: it stores the data on the nodes, so when you read
from a data set you don't have to repartition the data.
It stores data in binary, in the internal format of DataStage, so it takes
less time to read from or write to a data set than any other file type.
It cannot be viewed directly; you have to use the Data Set Management tool.
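To make the partition-preserving idea concrete, here is a minimal Python sketch. It is a toy model, not how DataStage stores data sets: the record layout, the file names, and the helper functions are all hypothetical. Rows are partitioned once, each partition is written to its own binary file, and reading the files back returns the same partitions without repartitioning.

```python
import os
import struct
import tempfile

# Hypothetical fixed-width record layout: 4-byte int key, 8-byte float amount.
RECORD = struct.Struct("<id")


def partition_by_key(rows, num_nodes):
    """Modulus-partition rows on their integer key, one bucket per node."""
    buckets = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        buckets[key % num_nodes].append((key, amount))
    return buckets


def write_dataset(buckets, directory):
    """Write one binary data file per partition and return the file paths."""
    paths = []
    for node, bucket in enumerate(buckets):
        path = os.path.join(directory, f"part_{node}.bin")
        with open(path, "wb") as handle:
            for row in bucket:
                handle.write(RECORD.pack(*row))
        paths.append(path)
    return paths


def read_dataset(paths):
    """Read each partition file back as-is; no repartitioning step is run."""
    buckets = []
    for path in paths:
        with open(path, "rb") as handle:
            data = handle.read()
        offsets = range(0, len(data), RECORD.size)
        buckets.append([RECORD.unpack_from(data, off) for off in offsets])
    return buckets


def round_trip(rows, num_nodes):
    """Partition, write, and read back; return (written, read) partitions."""
    with tempfile.TemporaryDirectory() as directory:
        written = partition_by_key(rows, num_nodes)
        return written, read_dataset(write_dataset(written, directory))
```

Calling `round_trip(rows, 2)` on a few `(key, amount)` tuples returns two identical lists of partitions. The binary files are also not human-readable, which mirrors why a data set needs a separate tool to be viewed.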
FileSet:
It stores data in a format similar to that of a sequential file, so you can
open the file set directly to see the paths of its data files and its schema.
The only advantage of using a file set over a seq file is that it preserves
the partitioning scheme.
You can view the data, but in the order defined by the partitioning scheme.
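The point about viewing order can be sketched in Python as well. This is only an illustration under assumed names: the descriptor layout (`schema:` and `file:` lines), the file names, and the round-robin partitioning are hypothetical and do not reproduce the real file set format. A readable descriptor lists the schema and the data file paths, and reading the files in that order returns the rows grouped by partition rather than in their original order.

```python
import os
import tempfile


def write_fileset(rows, num_nodes, directory):
    """Round-robin rows into text files and write a descriptor listing them."""
    buckets = [rows[node::num_nodes] for node in range(num_nodes)]
    paths = []
    for node, bucket in enumerate(buckets):
        path = os.path.join(directory, f"part_{node}.txt")
        with open(path, "w") as handle:
            handle.writelines(f"{row}\n" for row in bucket)
        paths.append(path)
    # Hypothetical descriptor: one schema line, then one line per data file.
    descriptor = os.path.join(directory, "demo.fs")
    with open(descriptor, "w") as handle:
        handle.write("schema: name:string\n")
        handle.writelines(f"file: {path}\n" for path in paths)
    return descriptor


def read_fileset(descriptor):
    """Read the data files in descriptor order, i.e. partition by partition."""
    with open(descriptor) as handle:
        paths = [
            line.split(": ", 1)[1].strip()
            for line in handle
            if line.startswith("file: ")
        ]
    rows = []
    for path in paths:
        with open(path) as handle:
            rows.extend(line.rstrip("\n") for line in handle)
    return rows


def demo():
    """Write five rows across two partitions and read them back."""
    with tempfile.TemporaryDirectory() as directory:
        descriptor = write_fileset(["a", "b", "c", "d", "e"], 2, directory)
        return read_fileset(descriptor)
```

With two partitions, the rows come back with the first partition's rows ahead of the second's, so the viewing order follows the partitioning scheme instead of the input order.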
Enjoy the simplicity.
Atul Singh