Nuts & Bolts of DataStage: Interview Questions : DataWareHouse

What is BUS Schema?
BUS Schema is composed of a master suite of confirmed dimension and standardized definition if facts.

What are the methodologies of Data Warehousing?
Every company has methodology of their own. However, to name a few SDLC Methodology, AIM methodology is standard used.

What is conformed fact?
Conformed dimensions are the dimensions, which can be used across multiple Data Marts in combination with multiple facts tables accordingly

What is Difference between E-R Modeling and Dimensional Modeling?
Basic difference is E-R modeling will have logical and physical model. Dimensional model will have only physical model. E-R modeling is used for normalizing the OLTP database design.Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

Why fact table is in normal form?
The fact table consists of the Index keys of the dimension/look up tables and the measures. So whenever we have the keys in a table. That it implies that the table is in the normal form.

What is the difference between view and materialized view?
View - store the SQL statement in the database and let you use it as a table. Every time you access the view, the SQL statement executes. Materialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that every time you run the query, the stored result set is used. Pros include quick query results.

What is the advantages data mining over traditional approaches?
Data Mining is used for the estimation of future. For example, if we take a company/business organization, by using the concept of Data Mining, we can predict the future of business in terms of Revenue (or) Employees (or) Customers (or) Orders etc.Traditional approaches use simple algorithms for estimating the future. However, it does not give accurate results when compared to Data Mining.

What is a Hash Cluster?
A row is stored in a hash cluster based on the result of applying a hash function to the row's cluster key value. All rows with the same hash key value are stores together on disk.

What is a Private Synonyms?
A Private Synonyms can be accessed only by the owner.

What is Database Link?
A database link is a named object that describes a "path" from one database to another.

What is a Table space?
A database is divided into Logical Storage Unit called table spaces. A table space is used to grouped related logical structures together.

What is Rollback Segment?
A Database contains one or more Rollback Segments to temporarily store "undo" information.

What are the Characteristics of Data Files?
A data file can be associated with only one database. Once created a data file can't change size. One or more data files form a logical unit of database storage called a table space.

How do you define Data Block size?
A data block size is specified for each ORACLE database when the database is created. A database users and allocated free database space in ORACLE data blocks. Block size is specified in INIT.ORA file and can't be changed latter.

What does a Control file Contain?
A Control file records the physical structure of the database. It contains the following information. Database Names and locations of a database's files and redo log files. And Time stamp of database creation.

What is the effect of setting the value "CHOOSE" for OPTIMIZER_GOAL, parameter of the ALTER SESSION Command?
The Optimizer chooses Cost based approach and optimizes with the goal of best throughput if statistics for at least one of the tables accessed by the SQL statement exist in the data dictionary. Otherwise the OPTIMIZER chooses RULE based approach.

What is the function of Optimizer?
The goal of the optimizer is to choose the most efficient way to execute a SQL statement.

What is Execution Plan?
The combination of the steps the optimizer chooses to execute a statement is called an execution plan.

What are the different approaches used by Optimizer in choosing an execution plan?
Rule-based and Cost-based

What is the difference between OLAP and OLTP?
OLAP stands for online analytical processing. In this, we have access to live data. This process contains historical information to analyze. Data needs to be integrated. We can create reports that are multi-dimensional, supported by time-based analysis and ideal for applications with unpredictable, ad hoc query requirements.OLTP stands for online transaction processing. OLTP databases are fully normalized and are designed for consistently store operational data, one transaction at a time. It performs day-to -day operations and not support historical data.

What is the difference between aggregate table and materialized view?
Aggregate tables are pre-computed totals in the form of hierarchical multidimensional structure., whereas materialized view ,is an database object which caches the query result in a concrete table and updates it from the original database table from time to time .Aggregate tables are used to speed up the query computing whereas materialized view speed up the data retrieval .

"A dimension table is wide but the fact table is deep," Explain the statement in your own words.
Dimension table has got all the detail in formations of their respective table ,for egg, customer dimension table will contain all the related info about customers whereas fact table contains the main data, which contains the surrogate keys of every dimension; along with other measures.

What is a data profile?
Data profiling is a way to find out what is the profile of the information contained in the source. E.g. In a table a column may be defined as alphanumeric. However, majority of the data may be numeric. Profiling tools will provide the statistical information about how many records have pure no. populated as against no. of records with alphanumeric data.Before data migration exercise, these tools provide vital clues about whether the exercise is going to be a success or a failure. This can help is changing the target schema or applying cleanse at the source level so that most of the records can get in the destination database.In DW these tools are used at the design stage for the same purpose. Some tool vendors who sell this as a product call this as data discovery phase.

Explain the advantages of RAID 1, 1/0, and 5. what type of RAID setup would you put your TX logs.
The basic advantage of RAID is to speed up the data reading from permanent storage device (hard disk).

What is the difference between mapping parameter & mapping variable in data warehousing?
Mapping Parameter defines the constant value and it cannot change the value throughout the session.Mapping Variables defines the value and it can be change throughout the session

What is the difference between metadata and data dictionary?
Meta data is nothing but information about data. It contains the information (i.e. data) about the graphs, its related files, abinitio commands, server information etc i.e. all kinds of information about project related information etc.

What is Virtual Data Warehousing?
A virtual or point-to-point data warehousing strategy means that end-users are allowed to get at operational databases directly using whatever tools are enabled to the "data access network"

Which technology should be used for interactive data querying across multiple dimensions for a decision making for a DW?
MOLAP

What is the difference between dependent data warehouse and independent data warehouse?
Dependent departments are those, which depend on a data ware to for their data.Independent department are those, which get their data directly from the operational data sources in the organization.

What is the main difference between star and snowflake star schema? Which one is better and why?
If u have one to may relation ship in the data then only we choose snowflake schema, as per the performance-wise every-one go for the Star schema. Moreover, if the ETL is concerned with reporting means choose for snowflake because this schema provides more browsing capability than the former schema.

What is critical column?
Let us take one ex: Suppose 'XYZ' is customer in Bangalore, he was residing in the city from the last 5 years, in the period of 5 years he has made purchases worth of 3 lacs. Now, he moved to 'HYD'. When you update the 'XYZ' city to 'HYD' in your Warehouse, all the purchases by him will show in city 'HYD' only. This makes warehouse inconsistent. Here CITY is the Critical Column. Solution is use Surrogate Key.

What is junk dimension? What is the difference between junk dimension and degenerated dimension?
Junk dimension: Grouping of Random flags and text attributes in a dimension and moving them to a separate sub dimension. Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table in order eliminate unnecessary joins while retrieving order information.

What is the main difference between Inmon and Kimball philosophies of data warehousing?
Both differed in the concept of building the data warehouse.According to Kimball, Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence, a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level.Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence, the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.

What is Data warehousing Hierarchy?
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.When designing hierarchies, you must consider the relationships in business structures. Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

Like the Facebook Page & join Group
https://www.facebook.com/DataStage4you
https://www.facebook.com/groups/DataStage4you

Nuts & Bolts of DataStage

Tuesday, May 27, 2014

Interview Questions : DataWareHouse - Part6