We have moved to www.dataGenX.net, Keep Learning with us.

Saturday, May 30, 2015

Data Warehouse Load Methods


A data warehouse is a warehouse that contains data. Sounds funny doesn't it? Imagine a warehouse as a storage structure that may be physical or logical. In the DW it's both...logical as in the data model representation and physical as to the hard disks and other computer equipment that support the data warehouse.

The first question should be "How does the data get there?" Here are three key areas to keep in mind....

Availability - will someone create a report with transactions completed 20 minutes ago or will it be with yesterday's data?


ETL Process - does our ETL process handle constant loads or is it a once a day process?

Hardware limitations - do we have the hardware resources to allow people to access data and create reports while we are inserting/updating data?

Types of ETL Processing:

1. Batch load - load all data once a day. At the company I work with we do a batch load process that begins at midnight and hopefully a majority is complete by 7 am. (pretty common)

2. Trickle feed - rows are loaded in the EDW as they are received in the collector (the hardest to implement)

3. Microbatch - small batches of data are processed on either a time threshold (every 2 hours) or a row threshold (20,000 records in the collector).


Like the Facebook Page & join Group
https://www.facebook.com/DataStage4you
https://www.facebook.com/groups/DataStage4you

https://twitter.com/datastage4you
https://groups.google.com/d/forum/datastage4you
For WHATSAPP group , drop a msg to 91-88-00-906098