September 24, 2013 Slowly Changing Dimension/SCD in datastage To understand what is Slowly Changing Dimension, we first understand these: Most frequently, we insert into Fact tables . Very infrequently we update the facts that were loaded incorrectly. It is even less likely to delete rows from the fact table; the only time this might happen is if you wanted to archive the old data that is no longer relevant for the current portion of the data warehouse. Dimension tables, on the other hand are likely to see frequent updates. A classic example of this is a consumer dimension. People can change names, phone numbers, addresses, etc regularly. Dimensions that are changing over time are referred to as slowly changing dimensions (SCD). Slowly Changing Dimension Type 1: override the existing value Customer_key Customer_name Customer_city Customer_state 123 Ms. Brown Houston Texas 123 Ms. Brown Dallas Texas Slowly Changing Dimension – SCD Type 1 Type 2: create a new record with effective dates Customer_key Customer_name Customer_city Customer_state Effective_date Obsolete_date 123 Ms.Brown Houston Texas 1/1/2000 1/1/2005 234 Ms. Brown Dallas Texas 1/1/2005 NULL Slowly Changing Dimension – SCD Type 3 Type 3: create a new and old columns, no new rows Customer_key Customer_name Customer_city Customer_state Change_date New_city 123 Ms. Brown Houston Texas 1/1/2005 Dallas Slowly Changing Dimension – SCD Type 3
September 24, 2013 Materialized view A materialized view is like a query with a result that is materialized and stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. This technique improves the execution of the user query, because most of the query result has been pre-computed. The query transformer looks for any materialized views that are compatible with the user query and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has a lower cost than the plan generated with the materialized views.
September 17, 2013 Different Dimensional modelings Techniques Available There are 4 types of dimensional modelling available. Entity Relationship modeling. Dimensional modelling. Hierarchal modelling Network modelling —————–
September 17, 2013 Star Schema vs Snowflake Schema Star schema is preferred Why: Why because star schema is simple queries and logic, but snowflake schema is good for complex queries and logic. If the table space is not enough to maintain a STAR schema then we should go for snowflake instead of STAR Schema. I.e. The table should be spited in to multiple tables. Eg: If you want to maintain time data in one table like year, month, day in one table in star schema, you need to spit this data into 3 tables like year, month, and day in snowflake schema.
September 17, 2013 Difference between Star-Schema and Snowflake Schema Star-Schema: A Centralized fact table surrounded by n number of dimensions is known as fact table. Star-Schema contains one fact table and one or more dimension tables. Its Normalized model. Easy to use and understand, because of fewer joins contains. More time for execution, because of more joins. Snowflake Schema: 1) Any dimension with extended dimensions are known as snowflake schema. 2) Its De-Normalized model. 3) End users will get confused, because of complicated joins. 4) Execution is less, because of more joins.
September 15, 2013 Difference between SCD Type2 and SCD Type3 Type2 SCD: Type2 SCD maintains full historical information by adding a new row to the existing row. In DWH we use to implement the type2 SCD because it maintains the full historical information. One disadvantage with type2 SCD is the size of the database is going to increase hence it shows the performance degrader. Type3 SCD: It maintains the partial history by adding a new column to the existing row.
September 15, 2013 Types of Dimension Tables Conformed Dimension: If a dimension table is connected to more than one fact table, the granularity that is defined in the dimension table is common across the fact tables. Junk Dimension: The Dimension table, which contains only flags. Monster Dimension: If rapidly changes in Dimension are known as Monster Dimension. De-generative Dimension: It is line item-oriented fact table design.
September 15, 2013 Surrogate Key It is a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table. We can store up to 2 billion record. It is used for integrating the data may help better for primary key. Index maintenance, joins, table size, key updates, disconnected inserts and partitioning.
September 15, 2013 Snowflake schema It is a normalized multi dimensional model. It contains centralized fact table surrounded by normalized dimensional tables (Dimension table further divided in to one or more dimensional tables in a hierarchy)
September 15, 2013 Star Schema Star Schema is a de-normalized multi-dimensional model. It contains centralized fact tables surrounded by dimensions table. Dimension Table: It contains a primary key and description about the fact table. Fact Table: It contains foreign keys to the dimension tables, measures and aggregates.