Something about DataStage, DataStage Administration, Job Designing,Developing, DataStage troubleshooting, DataStage Installation & Configuration, ETL, DataWareHousing, DB2, Teradata, Oracle and Scripting.
Friday, October 11, 2013
Difference between OLTP and OLAP
Online Transactional Processing databases are functional orientated, they are designed to provide real-time responses from concurrent users and applications. To be more specific, OLTP databases must provide real-time concurrent (multi-threaded) processing of all SQL transaction (writes/updates and reads). Another characteristic of an OLTP database, is the fact that its state (underlying data) is constantly changing. Examples of OLTP are databases that support e-commerce applications.
OLTP databases are highlyNormalizedrelational databases. This means that there is very little or no data redundancy. This ensures data consistency (part of the ACID standard). Normalization is the process of arranging data into logical, organized groups of tables, reducing data repetition or going so far as to completely eliminating it. As a result the data is logically grouped into tables, and these tables form relationships with one another through the use of primary and foreign keys. There are different levels of normalization and OLTP data models usually meet the 3rd Normal Form also known as the Entity Attribute Relationship Model.
The normalization process makes OLTP databases inherently optimized for writingand updating data, in otherwords, OLTP databases are optimized for real-time processing of concurrent SQL write and update transactions. However this does not necessarily mean that normalization hinders the reading of data. OLTP databases perform well for real-time processing of concurrent read transactions, because these read queries typically answer simple questions which are well anticipated, such as reading a few data records, and only then via previously created record keys. In other words OLTP databases are specifically designed to optimize common read queries most likely to be made by end users. Another way OLTP databases can optimise these read transactions is through read caches for the most common queries.
Howevera major drawback of normalizingOLTP databasesis that it can hinder real-time processing of concurrent AD-HOC read transactions. In otherwords, OLTP databases performance is very poor when dealing with real-time data analysis. This is because ad-hoc read queries may require real-time memory-hungry table join operations, because the requested data may be spread across multiple related tables. This is especially true for multi-dimensional queries, since the data is more likely to be separated into different tables. This means OLTP databases are not suitable for multi-dimension data analysis. Another reason why OLTP databases are not suitable for data analysis, is the fact that OLTP databases represent the data as it currently exists. That is OLTP databases do not adequately represent temporal data – the history of data as it changes over time.
OLTP databases should also comply with the ACID standard:
Atomicity states that database changes must follow an “all or nothing” rule. Atomicity guarantees that either all of the tasks of a transaction are performed or none of them are. For example, the transfer of funds from one account to another can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won’t be debited if the other is not credited.
Consistency ensures that any transactions moves the database from one consistent state to another consistent state. In other words the data in the database is always consistent. This is achieved through database normalization – the removal of redundant data. By removing redundant data, you remove the risk of data inconsistencies.
Isolation ensures that transactions are isolated from each other. This allows for multithreaded transactions for both reads and writes.
Durability ensure that transactions are not lost due to crashes.
3NF OLTP SCHEMA
Online Analytical Processing databases are subject orientated,they are designed to provide real-time analysis of stored data. The base data in OLAP databases is much more stable than OLTP. This is because the data in OLAP databases is historical in nature and therefore it is not subject to updates unlike OLTP which is update intensive.
OLAP databases are highly De normalized. This means that the data in OLAP databases is highly redundant – this is done to improve analytic performance, because SQL Reads do not require real-time memory-hungry table join operations. This means data in OLAP databases is multi-dimensional and OLAP databases are optimized for concurrent real-time analysis of multi-dimensional data. In otherwords they are optimized for processing very complex real-time ad-hoc read queries.
Howevera major drawback of DE-normalization is that OLAP database performance is very poor (and not appropriate) for real-time processing of concurrent write/update transactions. This is due to the vast amounts of data redundancy which can easily lead to inconsistencies. For this reason, OLAP databases process write and update transactions in scheduled batch jobs. The lack of support for real-time write/update transactions is not really an issue, since OLAP databases store non-volatile historical data – that is data which does not change much over time. Sometimes OLAP databases store data which is only Written Once but Read Many (WORM).
Also OLAP databases need not support the ACID standard, this is because OLAP databases are designed for real-time data analysis rather than real-time transaction processing.
Data-warehouseis a synonym for the stored data in OLAP databases. A Data-warehouse is defined as a repository of an organization electronically stored data. Data warehouse repositories are structured to facilitate reporting and analysis of the underlying data. A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject. In other words data marts are often derived from subsets of data in a data warehouse.
OLAP database or Data-warehouse structures can include the following:
Multi-dimensional Tables – Facts/Metrics/Measures (What is being measured) and Dimensions (Labels that provide context to the facts/metrics/measures) exist in the same table
Star schema relational tables – central Fact Tables (metrics or measures) related to Dimension Tables (Denormalised Labels – provide context to the facts/metrics/measures) using primary keys (id’s)
Snowflake schema relational tables – same as Star Schema except the Dimensions are Normalized.
Multi-dimensional Cubes/Hypercubes – cube metadata is typically created from the star-schema or snowflake-schema tables!
The star and snowflake schema are most common for dimensional models used by data warehouses and data marts.
In summary OLAP databases are structured in such a way where speed of data retrieval is more important than the efficiency of data manipulations.
Snowflake Shema Example
Main differences between OLTP and OLAP databases
OLTP databases are optimized for writing. However, they do not suffer for reading data, because the data access methods are generally well known and data is generally located by a well known record keys.
If OLTP data models are reasonably performant for writing and reading, why then do we need OLAP databases?
Although OLTP databases can perform well for reading data, there are many assumptions for that behavior. OLTP databases typically answer simple questions which were well anticipated, such as reading a few data records, and only then via previously created record keys. OLAP databases are able to read vast amounts of data, in unanticipated manners (ad-hoc basis), with support for huge aggregate calculations.
OLTP data models represent the data as it currently exists. However, they do not adequately represent temporal data. That is, a history of data as it changes over time. Therefore is it is not possible to use OLTP to answer questions such as “What was the inventory level by product each morning at 9 AM” or “what percentage did sale revenue change and net profit rise or fall with the last retail price change”. These are examples of the types of questions that OLTP databases have trouble answering, but that OLAP databases can easily answer.
OLAP databases might not perform well, such as when supporting intensive data writes, whereas OLTP databases can.
OLAP databses rely upon a great amount of redundant data (denormalized), unlike OLTP databases, which have little or no redundant data.
Redundant data supports the sweeping ad hoc reads, but would not perform as well in support of the OLTP writes.