We have moved to www.dataGenX.net, Keep Learning with us.

Tuesday, March 11, 2014

Interview Questions : DataWareHouse - Part 5


How Many different schemas or DW Models can be used in Siebel Analytics. I know Only STAR and SNOW FLAKE and any other model that can be used?
Integrated schema design is also used to define an integrated schema design we have to define the following concepts

► Fact constellation
► Act less fact table
► Onformed dimension

A: A fact constellation is the process of joining two or more fact tables

B: A fact table with out any facts is known as fact less fact table

C:A dimension which is re useful and fixed is known as conformed dimensionA dimension, which is, shared with multiple fact tables known as conformed dimension

What is an error log table in Informatica occurs and how to maintain it in mapping?
Error Log in Informatica is a one of output file created by Informatica Server while running the session for error messages. It is created in Informatica home directory.



What is loop in Data warehousing?
In DWH loops may exist between the tables. If loops exist, then query generation will take more time, because more than one path is available. It creates ambiguity also. Loops can be avoided by creating aliases of the table or by context.

Example: 4 Tables - Customer, Product, Time, Cost forming a close loop. Create alias for the cost to avoid loop.

How many clustered indexes can u create for a table in DWH? In case of truncate and delete command what happens to table, which has unique id.
You can have only one clustered index per table. If you use delete command, you can rollback... it fills your redo log files.

If you do not want records, you may use truncate command, which will be faster and does not fill your redo log file.

What is hybrid slowly changing dimension?
Hybrid SCDs are combination of both SCD 1 and SCD 2.It may happen that in a table, some columns are important and we need to track changes for them i.e. capture the historical data for them whereas in some columns even if the data changes, we don't care.For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.You can add that it is not an intelligent key but similar to a sequence number and tied to a timestamp typically!

Can a dimension table contain numeric values?
Yes. However, those data type will be char (only the values can numeric/char).Yes, dimensions even contain numerical because these are descriptive elements of our business.

How do you create Surrogate Key using Ab Initio?
There are many ways to create Surrogate key but it depends on your business logic. Here you can try these ways.1. Use next in sequence () function in your transform

2. Use Assign key values component (if your GDE is higher than 1.10)

3. Write a stored proc to this and call this store proc wherever you need.Yes, dimension table contains numeric but not contain measures and facts

What is the difference between star and snowflake schemas?
Star schema:
A single fact table with N number of DimensionSnowflake schema: Any dimensions with extended dimensions are known as snowflake schema.

What is a CUBE in data warehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

What is the difference between Snowflake and Star Schema? What are situations where Snowflake Schema is better than Star Schema when the opposite is true?
Star schema contains the dimension tables mapped around one or more fact tables.It is a renormalized model and no need to use complicated joins. Also Queries results fast.Snowflake schema is the normalized form of Star schema. It contains in-depth joins, because the tables are spited in to many pieces. We can easily do modification directly in the tables.We have to use complicated joins, since we have more tables. There will be some delay in processing the Query.

What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended and today it is commonly used for database design for the database designer, the utility of the ER model is: it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables. It is simple and easy to understand with a minimum of training. Therefore, the database designer to communicate the design to the end user can use the model. In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

What is degenerate dimension table?
Degenerate Dimensions: If a table contains the values, which r neither dimension nor measures is called degenerate dimensions. For example invoice id, employee no.A degenerate dimension is data that is dimensional in nature but stored in a fact table.

What is VLDB?
The perception of what constitutes a VLDB continues to grow. A one-terabyte database would normally be considered VLDB.Degenerate dimension: it does not have any link with dimensions and it will not have any attribute.

What is Dimensional Modeling?
Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.Dimension modeling is a method for designing data warehouse. Three types of modeling are there

1. Conceptual modeling

2. Logical modeling

3. Physical modeling

What are the various ETL tools in the Market?
Various ETL tools used in market are Informatica Data Stage Oracle Warehouse Builder Ab Initio Data Junction

What are the possible data marts in Retail sales?
Product information and sales information

What is meant by metadata in context of a Data warehouse and how it is important?
Metadata is the data about data; Business Analyst or data modeler usually capture information about data - the source (where and how the data is originated), nature of data (char, varchar, nullable, existence, valid values etc) and behavior of data (how it is modified / derived and the life cycle) in data dictionary a.k.a metadata.

Metadata is also presented at the Datamart level, subsets, fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis / DSS.

What is a linked cube?
Linked cube in which a sub-set of the data can be analyzed into detail. The linking ensures that the data in the cubes remain consistent.

What is surrogate key? Where we use it? Explain with examples.
Surrogate key is a substitution for the natural primary key.It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Info sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME OR CITY_NAME which are stated as the primary keys (according to the business users) but ,not only can these change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and as far as the client is concerned, you may display only the AIRPORT_NAME.

What are the data types present in BO? What happens if we implement view in the designer n report?
Three different data types: Dimensions, Measure, and DetailView is nothing but an alias and it can be used to resolve the loops in the universe.

What are data validation strategies for data mart validation after loading process?
Data validation is to make sure that the loaded data is accurate and meets the business requirements. Strategies are different methods followed to meet the validation requirements.