ETL is an
abbreviation of the three words Extract, Transform and Load.
It is an ETL process to extract data, mostly from different
types of systems, transform it into a structure that’s more appropriate for
reporting and analysis and finally load it into the database and or cube(s).
As ETL stands for extraction, transformation and
loading. Etl is a process that involves the following tasks:
E – Extract from source
In this step we extract data from different internal and
external sources, structured and/or unstructured. Plain queries are sent to the
source systems, using native connections, message queuing, ODBC or
OLE-DB middleware. The data will be put in a so-called Staging Area (SA),
usually with the same structure as the source. In some cases we want only the
data that is new or has been changed, the queries will only return the changes.
Some ETL tools can do this automatically, providing a changed data capture
(CDC) mechanism.
T – Transform the data
Once the data is available in the Staging Area, it is
all on one platform and one database. So we can easily join and
union tables, filter and sort the data using specific attributes, pivot to
another structure and make business calculations. In this step of the ETL
process, we can check on data quality and cleans the data if necessary. After
having all the data prepared, we can choose to implement slowly changing
dimensions. In that case we want to keep track in our analysis and reports when
attributes changes over time, for example a customer moves from one region to
another.
L – Load into the data warehouse
Finally, data is loaded into a data
warehouse, usually into fact and dimension tables. From there the data can
be combined, aggregated and loaded into datamarts or cubes as is
deemed necessary.
The ETL process is also very often referred to as Data
Integration process and ETL tool as a Data Integration platform.
The terms closely related to and managed by ETL processes are: data migration, data management, data cleansing, data synchronization and data consolidation.
The terms closely related to and managed by ETL processes are: data migration, data management, data cleansing, data synchronization and data consolidation.
The main goal of maintaining an ETL process in an
organization is to migrate and transform data from the source OLTP systems to
feed a data warehouse and form data marts.
Two other things which is integrated with ETL..
Data profiling and data quality control
Profiling the data,
wil give direct insight in the data quality of the source
systems. It can display how many rows have missing or invalid values,
or what the distribution is of the values in a specific column. Based on this
knowledge, one can specify business rules in order to cleanse the data, or keep
really bad data out of the data warehouse. Doing data profiling before
designing your ETL process, you are better able to design a system that is
robust and has a clear structure.
Meta data management & ETL
Information about
all the data that is processed, from sources to targets by
transformations, is often put into a metadata repository; a
database containing all the metadata. The entire ETL process can
be ‘managed’ with metadata management, for example one can query how a specific
target attribute is built-up in the ETL process, called data lineage. Or, you
want to know what the impact of a change will be, for example the size of the
order identifier (id) is changed, and in which ETL steps this attribute plays a
role.
ETL Tools
At present the most popular and widely used ETL tools and
applications on the market are:
No. List of
ETL Tools ETL
Vendors
1. Oracle
Warehouse Builder (OWB) Oracle
2. Data
Services SAP
Business Objects
3. IBM
Information Server (Datastage) IBM
4. SAS Data
Integration Studio SAS
Institute
5. PowerCenter
Informatica Informatica
6. Elixir
Repertoire Elixir
7. Data
Migrator Information
Builders
8. SQL
Server Integration Services (SSIS) Microsoft
9. Talend
Open Studio for Data Integration Talend
10. DataFlow
Manager Pitney
Bowes Business Insight
11. Pervasive
Data Integrator Pervasive
Software
12. Open Text
Integration Center Open
Text
13. Oracle
Data Integrator (ODI) Oracle
14. Data
Manager/Decision Stream IBM
(Cognos)
15. Clover ETL Javlin
16. Centerprise Astera
17. DB2
Infosphere Warehouse Edition IBM
18. Pentaho
Data Integration Pentaho
19 Adeptia
Integration Suite Adeptia
20. DMExpress Syncsort
21. Expressor
Data Integration QlikTech
njoy the simplicity.......
Atul Singh