ETL Testing: Trends
- Record count verification: run DWH backend/reporting queries against source and target as an initial check.
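A minimal sketch of this initial check, using an in-memory SQLite database with hypothetical `src_customers`/`tgt_customers` tables (all table and column names are assumptions for illustration):

```python
import sqlite3

# Illustrative record-count check between an assumed source table and
# its target table -- the first, cheapest validation after a load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customers (id INTEGER, name TEXT);
    CREATE TABLE tgt_customers (cust_key INTEGER, id INTEGER, name TEXT);
    INSERT INTO src_customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO tgt_customers VALUES (100, 1, 'Ann'), (101, 2, 'Bob');
""")

src_count = conn.execute("SELECT COUNT(*) FROM src_customers").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM tgt_customers").fetchone()[0]
assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"
```

In practice the two counts would come from the actual source system and the warehouse, often with a WHERE clause matching the load window.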
Verify data integrity between the various source tables and their relationships. Slowly changing dimensions should also be tested, as described below. The most common slowly changing dimension types are Types 1, 2, and 3.
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. Only the most recent data is kept in the target. This is most appropriate when correcting certain types of data errors, such as the spelling of a name (assuming you will never need to know how it was misspelled in the past).
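A Type 1 overwrite can be sketched as a plain UPDATE (the `dim_customer` table and the misspelled name are assumptions for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_id INTEGER, name TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Jhon Smith')")  # misspelled

# Type 1: overwrite in place -- the old (misspelled) value is lost forever.
conn.execute("UPDATE dim_customer SET name = 'John Smith' WHERE cust_id = 1")

rows = conn.execute("SELECT name FROM dim_customer WHERE cust_id = 1").fetchall()
# Only the corrected, most recent value remains; no history row exists.
```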
The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimension tables, with separate surrogate keys and/or different version numbers. It keeps full history in the target. With Type 2, we have unlimited history preservation, as a new record is inserted each time an update is made.
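The expire-and-insert mechanics can be sketched as follows; the schema (`surrogate_key`, `version`, `is_current`) is an assumed but typical Type 2 layout:

```python
import sqlite3

# Minimal Type 2 sketch: each change inserts a new row with a new
# surrogate key and version; the natural key (cust_id) stays the same.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
        cust_id INTEGER,      -- natural key
        city TEXT,
        version INTEGER,
        is_current INTEGER
    )
""")
conn.execute("INSERT INTO dim_customer (cust_id, city, version, is_current) "
             "VALUES (1, 'Boston', 1, 1)")

# Customer moves: expire the current row, then insert a new version.
conn.execute("UPDATE dim_customer SET is_current = 0 "
             "WHERE cust_id = 1 AND is_current = 1")
conn.execute("INSERT INTO dim_customer (cust_id, city, version, is_current) "
             "VALUES (1, 'Chicago', 2, 1)")

history = conn.execute(
    "SELECT city, version, is_current FROM dim_customer "
    "WHERE cust_id = 1 ORDER BY version").fetchall()
# Full history is preserved: both the Boston and Chicago rows survive.
```

A Type 2 test would verify exactly this: the old row is expired (not deleted), the new row carries a fresh surrogate key, and exactly one row per natural key is flagged current.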
The Type 3 method tracks changes using separate columns. Whereas Type 2 offers unlimited history preservation, Type 3 has limited history preservation, since it is limited to the number of columns designated for storing historical data. It keeps the current and previous information in the target. Where the original table structure in Types 1 and 2 was very similar, Type 3 adds additional columns to the tables.
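A minimal Type 3 sketch, assuming a pair of `current_city`/`previous_city` columns:

```python
import sqlite3

# Type 3: history lives in extra columns, so only one prior value survives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer "
             "(cust_id INTEGER, current_city TEXT, previous_city TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Boston', NULL)")

# On change, shift current into previous, then set the new current.
# (SQLite evaluates SET expressions against the old row values.)
conn.execute("""
    UPDATE dim_customer
    SET previous_city = current_city, current_city = 'Chicago'
    WHERE cust_id = 1
""")

row = conn.execute("SELECT current_city, previous_city "
                   "FROM dim_customer").fetchone()
# A second move would overwrite 'Boston' -- history depth is one value.
```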
The Type 4 method is usually referred to as using "history tables": one table keeps the current data, and an additional table is used to keep a record of some or all changes.
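Type 4 can be sketched as a copy-then-overwrite, with an assumed `dim_customer_history` side table:

```python
import sqlite3

# Type 4: current values live in the main table; every change is
# recorded in a separate history table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (cust_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_customer_history (cust_id INTEGER, city TEXT,
                                       changed_on TEXT);
    INSERT INTO dim_customer VALUES (1, 'Boston');
""")

# Before overwriting, copy the old row into the history table.
conn.execute("INSERT INTO dim_customer_history "
             "SELECT cust_id, city, date('now') FROM dim_customer "
             "WHERE cust_id = 1")
conn.execute("UPDATE dim_customer SET city = 'Chicago' WHERE cust_id = 1")

current = conn.execute("SELECT city FROM dim_customer "
                       "WHERE cust_id = 1").fetchone()[0]
hist = [r[0] for r in
        conn.execute("SELECT city FROM dim_customer_history").fetchall()]
# The main table holds only 'Chicago'; 'Boston' lives in the history table.
```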
Validate the various calculations based on the different transformation rules from source to target. A transformation can be a straight move, simple, or complex (depending on the complexity of the transformation rule). If the target field is timeline-related or default/system-generated, that kind of testing should also be covered in this scenario.
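One common way to test such a rule is to re-apply it on the source side and compare with what was loaded. A sketch, assuming a rule that the target `full_name` is the concatenation of the source `first_name` and `last_name`:

```python
import sqlite3

# Hypothetical transformation-rule check:
#   tgt.full_name == src.first_name || ' ' || src.last_name
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE tgt (id INTEGER, full_name TEXT);
    INSERT INTO src VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO tgt VALUES (1, 'Ann Lee'), (2, 'Bob Ray');
""")

# Re-apply the rule on the source and flag any rows that disagree.
mismatches = conn.execute("""
    SELECT s.id
    FROM src s JOIN tgt t ON s.id = t.id
    WHERE t.full_name <> s.first_name || ' ' || s.last_name
""").fetchall()
assert mismatches == [], f"transformation failures: {mismatches}"
```

An empty result set means the loaded target agrees with the rule; any returned ids are defects to investigate.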
Data Quality Validation
Check for missing data, negative values, and consistency. Field-by-field data verification can be done to check the consistency of source and target data. The scenarios covered are: Not Null, Unique, and Default values. Companies may require various DQC matrices/reports to analyze data statistics. For example, in a physician ranking system the business needs to know the total number of physicians, how many were rejected (for various reasons), how many were ranked, how many applied for ranking, etc.
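The Not Null and Unique checks, plus a simple DQC-style status breakdown, can be sketched against an assumed `tgt_physician` table:

```python
import sqlite3

# Field-level data quality sketch on an assumed target table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt_physician (phys_id INTEGER, name TEXT, status TEXT);
    INSERT INTO tgt_physician VALUES
        (1, 'Ann', 'RANKED'), (2, 'Bob', 'REJECTED'), (3, 'Cal', 'APPLIED');
""")

# Not Null check: no physician may be missing a name.
null_names = conn.execute(
    "SELECT COUNT(*) FROM tgt_physician WHERE name IS NULL").fetchone()[0]

# Unique check: phys_id must not repeat.
dup_keys = conn.execute(
    "SELECT COUNT(*) FROM (SELECT phys_id FROM tgt_physician "
    "GROUP BY phys_id HAVING COUNT(*) > 1)").fetchone()[0]

# Statistics for a DQC-style report: how many ranked/rejected/applied.
status_stats = dict(conn.execute(
    "SELECT status, COUNT(*) FROM tgt_physician GROUP BY status").fetchall())

assert null_names == 0 and dup_keys == 0
```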
One shot/Retro validation
This kind of testing involves verifying changes applied to past data for the in-scope tables, which cannot be covered directly by simple/complex transformation testing.
Regression testing will ensure that no other fields are impacted by code changes.
Join/ RI validation
To verify the join and referential integrity (RI) criteria between the test tables, as specified in the requirements document.
To verify that the views display only the information required for the user (from a security validation point of view).
To verify inserts and updates in the specific test tables.
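A standard RI check is the orphan query: find child rows whose foreign key has no matching parent. A sketch with assumed fact/dimension tables:

```python
import sqlite3

# RI validation sketch: detect fact rows whose dept_id has no parent
# in the dimension table (orphans violate the join/RI criteria).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_dept (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_emp (emp_id INTEGER, dept_id INTEGER);
    INSERT INTO dim_dept VALUES (10, 'Sales');
    INSERT INTO fact_emp VALUES (1, 10), (2, 99);  -- 99 has no parent
""")

orphans = conn.execute("""
    SELECT f.emp_id FROM fact_emp f
    LEFT JOIN dim_dept d ON f.dept_id = d.dept_id
    WHERE d.dept_id IS NULL
""").fetchall()
# Any row returned here is an RI defect to report.
```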
This scenario involves verifying business rules that will be applied to future data for the in-scope tables, which cannot be covered directly by other transformation scenarios.
This scenario involves prospective/regression validation in the production environment.
ETL Testing Challenges
ETL testing is quite different from conventional testing, and there are many challenges in performing data warehouse testing. Here is a list of a few ETL testing challenges:
- Incompatible and duplicate data.
- Loss of data during the ETL process.
- Unavailability of an inclusive test bed.
- Testers have no privileges to execute ETL jobs on their own.
- Very high volume and complexity of data.
- Faults in business processes and procedures.
- Trouble acquiring and building test data.
- Missing business flow information.
- Complex transformation rules to validate.
- A high number of SQL queries to execute.
Data is important for businesses to make critical business decisions. ETL testing plays a significant role in validating and ensuring that business information is accurate, consistent, and reliable. It also minimizes the hazard of data loss in production.