We have moved to www.dataGenX.net, Keep Learning with us.
Showing posts with label DataWareHouse. Show all posts

Saturday, May 30, 2015

Data Warehouse Load Methods


A data warehouse is a warehouse that contains data. Sounds funny, doesn't it? Imagine a warehouse as a storage structure that may be physical or logical. In a DW it's both: logical, as in the data model representation, and physical, as in the hard disks and other computer equipment that support the data warehouse.

The first question should be "How does the data get there?" Here are three key areas to keep in mind....

Availability - will someone create a report with transactions completed 20 minutes ago or will it be with yesterday's data?
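The availability question above is really a choice of load method: refresh everything, or load only what changed since the last run. A minimal sketch of that choice, with invented names (`source_rows`, `incremental_slice`) purely for illustration:

```python
from datetime import datetime

# Hypothetical sketch: choosing rows for a full vs. incremental load.
# Row shape and function names are made up for the example.

def incremental_slice(source_rows, last_load_time):
    """Return only rows changed since the previous load (delta load)."""
    return [r for r in source_rows if r["updated_at"] > last_load_time]

def full_slice(source_rows):
    """Return every row (full refresh) - simpler, but costlier at scale."""
    return list(source_rows)

rows = [
    {"id": 1, "updated_at": datetime(2015, 5, 29, 8, 0)},
    {"id": 2, "updated_at": datetime(2015, 5, 30, 9, 40)},
]
cutoff = datetime(2015, 5, 30, 0, 0)      # when the last load finished
delta = incremental_slice(rows, cutoff)   # only the row touched after the cutoff
```

The smaller the delta window, the fresher the reports can be, which is exactly the availability trade-off above.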

Monday, January 19, 2015

Some Musts To Avoid When Developing Your Data Warehouse Project


Here’s a list of things to avoid when developing a data warehouse. These are not necessarily in priority order. The priorities depend on your project.

1. Avoid technology for technology’s sake. Focus on business requirements and goals.

2. Avoid going without a high-ranking business sponsor. If you lose your current sponsor, immediately find a replacement in the business side of your organization.

3. Avoid trying to implement the entire data warehouse all at once. Approach the data warehouse project as a series of integrated sub-projects (data marts), and deliver each sub-project as it’s completed.

4. Avoid expending excess energy and budget on structuring the data in the warehouse. Do not over-normalize your star schema into a snowflake. Focus on delivering the best query performance you can, along with a quality, easy-to-use set of user interfaces.
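The schema advice in point 4 can be sketched with a toy star schema: the product category stays inline in the dimension instead of being snowflaked into its own table, so an analyst query is a single join. Table and column names are invented for illustration; SQLite is used only because it ships with Python:

```python
import sqlite3

# A minimal star schema: one fact table joined to one denormalized dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          name TEXT,
                          category TEXT);  -- kept inline, not snowflaked out
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (1, 10.0), (2, 5.0), (1, 2.5);
""")

# One join, one GROUP BY - the query shape a star schema is built for.
total_by_category = con.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_key)
    GROUP BY d.category
""").fetchall()
```

Snowflaking `category` into a separate table would add a second join to every such query without changing the answer.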

Monday, January 12, 2015

Glossary of data warehousing terms



attribute
A field or column of a dimension (or other) table.

BI
Business
Intelligence
A suite of software tools used primarily by business administrative staff to navigate through the data of the data warehouse. BI tools provide functionality including managed reporting, querying, data analysis, data visualization, etc.


data cube
cube
A multi-dimensional representation of business data in which the cells of the cube contain data measures (i.e. facts) and the edges of the cube represent the data dimensions.

Although a cube implies only 3 dimensions in geometry, a data cube may represent any number of dimensions.
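The cube-and-edges picture above can be sketched in a few lines: cells keyed by dimension tuples hold the measure, and "rolling up" along one edge sums that dimension away. The data and function names are invented for the example:

```python
from collections import defaultdict

# A toy "cube": cells hold a measure, keyed by (product, region, month).
cells = {
    ("widget", "east", "Jan"): 100,
    ("widget", "west", "Jan"): 80,
    ("gadget", "east", "Feb"): 50,
}

def roll_up(cube, axis):
    """Aggregate the measure over one dimension (drop that edge of the cube)."""
    out = defaultdict(int)
    for key, measure in cube.items():
        reduced = tuple(v for i, v in enumerate(key) if i != axis)
        out[reduced] += measure
    return dict(out)

by_product_month = roll_up(cells, axis=1)   # sum across the region edge
```

Because the key is just a tuple, the same code works for 3 dimensions or 30, matching the point that a data cube is not limited to three edges.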

Monday, June 23, 2014

Purpose of Parameters in ETL



In an ETL process, a parameter represents an attribute that is not hard coded or sourced from the transactional system. Parameters provide flexibility so that you can adapt ETL processes to fit your business requirements.
Or, in simple words: a job parameter is a way to change a property within a job without having to alter and recompile it. Parameters are input values to a job or process that can be changed from time to time as needed.
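The idea above can be sketched in plain Python: the "job" reads its properties from parameters (here, environment variables with defaults), so each run can point at a different source directory or load date without touching the code. All names (`SRC_DIR`, `LOAD_DATE`, `run_etl`) are illustrative, not a real tool's API:

```python
import os

def run_etl(params):
    """A stub 'job' that only reports which parameter values it would use."""
    return (f"extract from {params['SRC_DIR']}, "
            f"load rows dated after {params['LOAD_DATE']}")

# Defaults can be overridden per run via environment variables,
# which is the "change without recompiling" behavior described above.
params = {
    "SRC_DIR": os.environ.get("SRC_DIR", "/data/incoming"),
    "LOAD_DATE": os.environ.get("LOAD_DATE", "2014-06-23"),
}
print(run_etl(params))
```

In a real ETL tool the mechanism differs (job parameters, parameter sets, config files), but the principle is the same: the job is a template, the parameters are the per-run values.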



Sunday, June 15, 2014

Why Data Warehouses are Hard to Deploy?


Inherent Complexity of Data Warehouse Databases 

Data Warehouse Databases are Large
Since data warehouse databases are constructed by combining selected data from several operational databases, data warehouse databases are inherently large. They are often the largest databases within an organization. The very size of these databases can make them very difficult and expensive to query. 


Tuesday, May 27, 2014

Interview Questions : DataWareHouse - Part 6



What is BUS Schema?
BUS Schema is composed of a master suite of conformed dimensions and standardized definitions of facts.

What are the methodologies of Data Warehousing?
Every company has a methodology of its own; however, to name a few, the SDLC and AIM methodologies are commonly used standards.

Tuesday, May 20, 2014

Data Warehouse Testing Checklist


Unit testing checklist
A checklist helps database programmers systematically test their code before formal QA testing.

  • Check the mapping of fields that support data staging and in data marts.
  • Check for duplication of values generated using sequence generators.
  • Check the correctness of surrogate keys that uniquely identify rows of data.
  • Check for data-type constraints of the fields present in staging and core levels.
  • Check the data loading status and error messages after ETLs (extracts, transformations, loads).
  • Look for string columns that are incorrectly left or right trimmed.
  • Make sure all tables and specified fields were loaded from source to staging.
  • Verify that not-null fields were populated.
  • Verify that no data truncation occurred in each field.
  • Make sure data types and formats are as specified during database design.
  • Make sure there are no duplicate records in target tables.
  • Make sure data transformations are correctly based on business rules.
  • Verify that numeric fields are populated precisely.
  • Make sure every ETL session completed with only planned exceptions.
  • Verify all data cleansing, transformation, and error and exception handling.
  • Verify stored procedure calculations and data mappings.
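Two of the checklist items above (duplicate surrogate keys, and not-null fields actually populated) can be automated with plain SQL against the target. A hedged sketch using a toy table, with SQLite standing in for the warehouse database and all names invented:

```python
import sqlite3

# Toy target table with a deliberately duplicated surrogate key.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER, name TEXT);
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Beta'), (2, 'Beta');
""")

# Checklist item: duplicate surrogate keys in the target table.
dupes = con.execute("""
    SELECT customer_key, COUNT(*) FROM dim_customer
    GROUP BY customer_key HAVING COUNT(*) > 1
""").fetchall()

# Checklist item: not-null fields really were populated.
null_names = con.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE name IS NULL"
).fetchone()[0]
```

Here `dupes` flags surrogate key 2 as loaded twice, and `null_names` being zero means the not-null check passes; real test suites wrap queries like these in assertions per target table.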

Sunday, May 04, 2014

ETL Testing : Approach


Testing is undoubtedly an essential part of the DW life-cycle, but it has received little attention compared to other design phases.

DW testing specification:
 – Software testing is predominantly focused on program code, while DW testing is directed at data and information.
 – DW testing focuses on the correctness and usefulness of the information delivered to users.
 – Unlike generic software systems, DW testing involves huge data volumes, which significantly impacts performance and productivity.
 – DW systems are meant to support any view of the data, so the number of possible use scenarios is practically infinite and only a few of them are known from the beginning.
 – It is almost impossible to predict all the possible types of errors that will be encountered in real operational data.


Tuesday, March 11, 2014

Interview Questions : DataWareHouse - Part 5


How many different schemas or DW models can be used in Siebel Analytics? I know only STAR and SNOWFLAKE; is there any other model that can be used?
An integrated schema design can also be used. To define an integrated schema design we have to define the following concepts:

► Fact constellation
► Fact-less fact table
► Conformed dimension

A: A fact constellation is a schema in which two or more fact tables share dimension tables

B: A fact table without any facts is known as a fact-less fact table

C: A dimension that is reusable and fixed, shared by multiple fact tables, is known as a conformed dimension

What is an error log in Informatica and how is it maintained in a mapping?
The error log in Informatica is an output file created by the Informatica Server while running a session; it records error messages. It is created in the Informatica home directory.


Sunday, February 23, 2014

DataWareHouse (ETL) Designing Steps


A) Planning and Designing Steps

i) Requirements and Realities Gathering

1. Business Needs
2. Data Profiling and other data source realities
3. Compliance Requirements
4. Security Requirements
5. Data Integration
6. Data Latency
7. Archiving and Lineage
8. End user delivery interfaces
9. Available Development Skills
10. Available Management Skills
11. Legacy licenses

Thursday, February 20, 2014

Dimension Table and Its Type in Data WareHouse


A dimension is a structure that categorizes data in order to enable users to answer business questions. It contains attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst. Dimension tables contain hierarchies of attributes that aid in summarization. Calculations on the fact table are performed through dimensions.
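The paragraph above can be made concrete with a tiny sketch: fact rows carry a key and a measure, the dimension carries descriptive attributes for that key, and summarizing facts means rolling them up to the level of one dimension attribute. All data and names are invented for the example:

```python
facts = [  # fact rows: a dimension key plus a measure
    {"date_key": 20140220, "sales": 120.0},
    {"date_key": 20140221, "sales": 80.0},
    {"date_key": 20140221, "sales": 40.0},
]
dim_date = {  # dimension rows: descriptive attributes per key
    20140220: {"day": "Thursday", "month": "February"},
    20140221: {"day": "Friday",   "month": "February"},
}

def summarize_by(fact_rows, dim, attr):
    """Roll fact measures up to the level of one dimension attribute."""
    totals = {}
    for row in fact_rows:
        level = dim[row["date_key"]][attr]
        totals[level] = totals.get(level, 0.0) + row["sales"]
    return totals

by_day = summarize_by(facts, dim_date, "day")
```

Summarizing by `"month"` instead of `"day"` climbs one level up the dimension's hierarchy, which is exactly the "calculations through dimensions" idea.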


Dimension table fields


The dimension tables should have at least the fields listed below, which are used to group data during the database inquiry process.

These fields are of three types:

Wednesday, February 12, 2014

Interview Questions : DataWareHouse - Part 4


What are the types of Synonyms?
There are two types of Synonyms Private and Public

What is a Redo Log?
The set of redo log files records all changes made to the database; it is used to recover the database's data in case of a failure.

What is an Index Segment?
Each Index has an Index segment that stores all of its data.

Explain the relationship among Database, Table space and Data file?
Each database is logically divided into one or more tablespaces, and one or more data files are explicitly created for each tablespace.


Saturday, November 02, 2013

Interview Questions : DataStage - self-1


Sharing some collection of InterView Questions. Try these to rank your knowledge :-)


1    Types of Stages in DS? Explain with Examples
2    What are active stages and passive stages?
3    Can you filter data in hashed file? (No)
4    Difference between sequential and hashed file?
5    How do you populate time dimension?
6    Can we use target hashed file as lookup? (Yes)
7    What is Merge Stage?
8    What is Job Sequencer?
9    What are stages in sequences?
10    How do you pass parameters?
11    What parameters you used in your project?

Thursday, October 24, 2013

Interview Questions : DataWareHouse - Part 3



 What is data warehousing?
In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
A data warehouse maintains its functions in three layers: staging, integration, and access.
Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users.
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:

Friday, October 11, 2013

Difference between OLTP and OLAP


OLTP:
Online Transaction Processing databases are operationally oriented: they are designed to provide real-time responses to concurrent users and applications. To be more specific, OLTP databases must provide real-time concurrent (multi-threaded) processing of all SQL transactions (writes/updates and reads). Another characteristic of an OLTP database is that its state (underlying data) is constantly changing. Examples of OLTP are databases that support e-commerce applications.
OLTP databases are highly normalized relational databases. This means there is very little or no data redundancy, which ensures data consistency (part of the ACID standard). Normalization is the process of arranging data into logical, organized groups of tables, reducing data repetition or even eliminating it completely. As a result, the data is logically grouped into tables, and these tables form relationships with one another through primary and foreign keys. There are different levels of normalization, and OLTP data models usually meet third normal form (3NF), also known as the Entity Attribute Relationship model.
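The normalization idea above can be sketched with two tables: instead of repeating the customer's name and city on every order row, the customer details live in one table and orders reference them by key. Names are invented; SQLite is used only because it ships with Python:

```python
import sqlite3

# Normalized layout: customer details stored once, referenced by key.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id),
                     amount REAL);
INSERT INTO customer VALUES (1, 'Acme', 'Boston');
INSERT INTO orders VALUES (101, 1, 10.0), (102, 1, 20.0);  -- no repeated name/city
""")

# The join reconstructs the "denormalized" view on demand.
rows = con.execute("""
    SELECT o.order_id, c.name, c.city
    FROM orders o JOIN customer c USING (customer_id)
""").fetchall()
```

If Acme moves city, one row changes in `customer` and every order stays consistent, which is the consistency benefit the paragraph describes.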

Thursday, September 26, 2013

Interview Questions : DataStage - Part 2


For more : Visit HERE

What is the importance of Surrogate Key in Data warehousing?
Ans : A surrogate key is the primary key of a dimension table. Its main advantage is that it is independent of the underlying database, i.e. a surrogate key is not affected by changes going on in the source database.
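A minimal sketch of that independence: the warehouse hands out its own keys from a counter and maps each source (natural) key to one, so the warehouse key never changes even if the source system renumbers its records. The names here are illustrative, not a real tool's API:

```python
import itertools

_next_key = itertools.count(1)   # the warehouse's own key sequence
key_map = {}                     # natural key from the source -> surrogate key

def surrogate_key(natural_key):
    """Assign a stable warehouse key, independent of the source system."""
    if natural_key not in key_map:
        key_map[natural_key] = next(_next_key)
    return key_map[natural_key]

surrogate_key("CUST-0042")   # first customer seen gets key 1
surrogate_key("CUST-0099")   # next one gets key 2
surrogate_key("CUST-0042")   # still key 1 on every later load
```

Real implementations use database sequences or the ETL tool's key generator plus a persisted lookup table, but the contract is the same: the dimension's key belongs to the warehouse, not the source.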


What does a Config File in parallel extender consist of?
Ans: Config file consists of the following.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location.

Tuesday, September 24, 2013

Interview Questions : DataWareHouse - Part 2


For more : Visit HERE




What is real time data-warehousing?

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly.


What are conformed dimensions?
Conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. They are common across the cubes.


What is a conformed fact?
Conformed facts are measures that are defined identically (same name, definition, and units) so they can be used and compared across multiple data marts and fact tables.

Friday, August 16, 2013

Big Data - The Hadoop Data Warehouse - Part 1



Big data is going to change the way you do things in the future, how you gain insight, and how you make decisions. These videos help you get up to speed quickly on this technology and show you the unique things IBM is doing to turn freely available open-source big data technology into a big data platform. There's a major difference: the platform leverages the open-source technologies (without ever forking them) and marries them to enterprise capabilities provided by a technology leader that understands the benefits a platform can provide.

Thursday, August 08, 2013

Interview Questions : DataWareHouse - Part-1




What is Data Warehousing?
A data warehouse is the main repository of an organization’s historical data, its corporate memory. It contains the raw material for management’s decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. Data warehousing is the collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. A data warehouse is a repository of integrated information, available for queries and analysis.