Nuts & Bolts of DataStage: Interview Questions : DataWareHouse

What is data warehousing?
"In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
A data warehouse maintains its functions in three layers: staging, integration, and access.
Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to have a level of abstraction from users. The access layer is for getting data out for users.
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way:
"A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".
He defined the terms in the sentence as follows:
Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular time period.
Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed.

What is ETL process in data warehousing?
"Extract, Transform and Load (ETL) is a process in database usage and especially in data warehousing that involves:
Extracting data from outside sources
Transforming it to fit operational needs (which can include quality levels)
Loading it into the end target (database or data warehouse)
Extract - The first part of an ETL process involves extracting the data from the source systems. In many cases this is the most challenging aspect of ETL, as extracting data correctly will set the stage for how subsequent processes will go.
Transform - The transform stage applies a series of rules or functions to the extracted data from the source to derive the data for loading into the end target.
Load - The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely "

What are the steps to build the data warehouse?
Gathering business requirements>>Identifying Sources>>Identifying Facts>>Defining Dimensions>>Define Attributes>>Redefine Dimensions / Attributes>>Organize Attribute Hierarchy>>Define Relationship>>Assign Unique Identifiers

What is a source qualifier?
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads when it executes a session.

What do you mean by static and local variable?
Static variable is not created on function stack but is created in the initialized data segment and hence the variable can be shared across the multiple call of the same function. Usage of static variables within a function is not thread safe.On the other hand, local variable or auto variable is created on function stack and valid only in the context of the function call and is not shared across function calls.

What are the different types of data warehousing?
Types of data warehousing are:
1. Enterprise Data warehousing
2. Operational data store (ODS)

3. Data Mart

What is the data type of the surrogate key?
There is no data type for a Surrogate Key.
Requirement of a surrogate Key: UNIQUE
Recommended data type of a Surrogate key is NUMERIC.

What are Data Marts?
A data mart is a collection of tables focused on specific business group/department. It may have multi-dimensional or normalized. Data marts are usually built from a bigger data warehouse or from operational data.

What are the differences between star and snowflake schema?
Star schema: A single fact table with N number of Dimension
Snowflake schema: Any dimensions with extended dimensions are known as snowflake schema.

What is a cube in data warehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

What are the difference between Snow flake and Star Schema?
Star schema contains the dimension tables mapped around one or more fact tables. It is a denormalized model and no need to use complicated joins. Also queries results fast.
Snowflake schema: It is the normalized form of Star schema. It contains in-depth joins, because the tables are split in to many pieces. We can easily do modification directly in the tables. We have to use complicated joins, since we have more tables.There will be some delay in processing the query.

What is Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.

What are the steps involved in Database Startup?
Start an instance, Mount the Database and Open the Database.

What are the steps involved in Database Shutdown?
Close the Database; Dismount the Database and Shutdown the Instance.

What is Partial Backup?
A Partial Backup is any operating system backup short of a full backup, taken while the database is open or shut down.
What is a full backup?
A full backup is an operating system backup of all data files, on- line redo log files and control file that constitute ORACLE database and the parameter.

Can a View based on another View?
Yes

Can a Table space hold objects from different Schemes?
Yes

Can objects of the same Schema reside in different table spaces?
Yes

Do you View contain Data?
Views do not contain or store data.

What are the Referential actions supported by FOREIGN KEY integrity constraint?
Update And Delete Restrict - A referential integrity rule that disallows the update or deletion of referenced data. DELETE Cascade - When a referenced row is deleted all associated dependent rows are deleted.

For more : --> DWH Interview Questions.

Nuts & Bolts of DataStage

Thursday, October 24, 2013

Interview Questions : DataWareHouse - Part 3