Nuts & Bolts of DataStage: FileSet in DataStage

Monday, December 17, 2012

FileSet in DataStage

DataStage can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are together called a file set. The data files are stored as ordinary Unix files, possibly in different locations, and are human-readable.

This capability is useful because some operating systems impose a 2 GB limit on the size of a file, so large data must be distributed across several files and nodes to avoid exceeding that limit. The amount of data that can be stored in each destination data file is limited by the characteristics of the file system and the amount of free disk space available.

The number of files created by a file set depends on:

· The number of processing nodes in the default node pool

· The number of disks in the export or default disk pool connected to each processing node in the default node pool

· The size of the partitions of the data set
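To make the effect of partition size concrete, here is a rough Python sketch. It is only an illustrative model, not DataStage's actual allocation logic: it assumes one partition per processing node, ignores disk pools entirely, and uses a hypothetical function name (estimate_file_count) and a hypothetical 2 GB per-file limit.

```python
import math

def estimate_file_count(partition_sizes, max_file_bytes):
    """Rough estimate of data files needed (illustrative model only).

    Assumption: every partition gets at least one data file, plus extra
    files whenever its size exceeds the per-file limit. Disk pools are
    not modelled here.
    """
    return sum(
        max(1, math.ceil(size / max_file_bytes))
        for size in partition_sizes
    )

GB = 1024 ** 3
# Four partitions (one per node) of 3 GB, 1 GB, 5 GB and 0 bytes.
sizes = [3 * GB, 1 * GB, 5 * GB, 0]
print(estimate_file_count(sizes, 2 * GB))  # prints 7
```

Under this model the 3 GB partition needs two files, the 5 GB partition needs three, and the small and empty partitions need one each, so more nodes or larger partitions both push the file count up.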

The File Set stage enables you to create and write to file sets, and to read data back from a file set.
Unlike data sets, file sets carry formatting information that describes the format of the files to be read or written.

Filesets are similar to datasets:

1. Partitioned

2. Implemented with a header file and data files

Filesets are different from datasets:

1. The data files of filesets are text files and hence are readable by other applications, whereas the data files of datasets are stored in a native internal format and are readable only by DataStage.
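Because the data files are plain text, another application can read them directly. The Python sketch below shows the idea under a loud assumption: it treats the list file as a simplified stand-in for a .fs file, with one data file path per line. A real .fs file also carries formatting information and is not parsed here, and the function name read_file_set is hypothetical.

```python
from pathlib import Path

def read_file_set(list_file):
    """Yield text records from every data file named in list_file.

    Assumption: list_file is a simplified stand-in for a .fs file,
    holding one data file path per line and nothing else.
    """
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        path = line.strip()
        if not path:
            continue  # skip blank lines in the list
        with open(path, encoding="utf-8") as data:
            for record in data:
                yield record.rstrip("\n")
```

A caller would loop over read_file_set("customers.fs") (a hypothetical file name) and get the records of each partition's data file in turn, with no DataStage libraries involved. The same approach would not work for a dataset, whose data files are in DataStage's internal format.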

Till then...
Enjoy the simplicity...