Slowly Changing Dimension/SCD in datastage

To understand what is Slowly Changing Dimension, we first understand these:
Most frequently, we insert into Fact tables . Very infrequently we update the facts that were loaded incorrectly. It is even less likely to delete rows from the fact table; the only time this might happen is if you wanted to archive the old data that is no longer relevant for the current portion of the data warehouse.
Dimension tables, on the other hand are likely to see frequent updates. A classic example of this is a consumer dimension. People can change names, phone numbers, addresses, etc regularly.

Dimensions that are changing over time are referred to as slowly changing dimensions (SCD).

Slowly Changing Dimension

Slowly Changing Dimension

Type 1: override the existing value

Customer_key Customer_name Customer_city Customer_state
123 Ms. Brown Houston Texas
123 Ms. Brown Dallas Texas
Slowly Changing Dimension - SCD Type 1

Slowly Changing Dimension – SCD Type 1

Type 2: create a new record with effective dates

Customer_key Customer_name Customer_city Customer_state Effective_date Obsolete_date
123 Ms.Brown Houston Texas 1/1/2000 1/1/2005
234 Ms. Brown Dallas Texas 1/1/2005 NULL
Slowly Changing Dimension - SCD Type 3

Slowly Changing Dimension – SCD Type 3

Type 3: create a new and old columns, no new rows

Customer_key Customer_name Customer_city Customer_state Change_date New_city
123 Ms. Brown Houston Texas 1/1/2005 Dallas
Slowly Changing Dimension - SCD Type 3

Slowly Changing Dimension – SCD Type 3

Materialized view

A materialized view is like a query with a result that is materialized and stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. This technique improves the execution of the user query, because most of the query result has been pre-computed. The query transformer looks for any materialized views that are compatible with the user query and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has a lower cost than the plan generated with the materialized views.


Star Schema vs Snowflake Schema

Star schema is preferred

Why:
Why because star schema is simple queries and logic, but snowflake schema is good for complex queries and logic.
If the table space is not enough to maintain a STAR schema then we should go for snowflake instead of STAR Schema.
I.e. The table should be spited in to multiple tables.

Eg: If you want to maintain time data in one table like year, month, day in one table in star schema, you need to spit this data into 3 tables like year, month, and day in snowflake schema.

Difference between Star-Schema and Snowflake Schema

Star-Schema:
  1. A Centralized fact table surrounded by n number of dimensions is known as fact table.
Star-Schema contains one fact table and one or more dimension tables.
  1. Its Normalized model.
  2. Easy to use and understand, because of fewer joins contains.
  3. More time for execution, because of more joins.
Snowflake Schema:
1) Any dimension with extended dimensions are known as snowflake schema.
2) Its De-Normalized model.
3) End users will get confused, because of complicated joins.

4) Execution is less, because of more joins. 

Difference between SCD Type2 and SCD Type3

Type2 SCD:
Type2 SCD maintains full historical information by adding a new row to the existing row.
In DWH we use to implement the type2 SCD because it maintains the full historical information.
One disadvantage with type2 SCD is the size of the database is going to increase hence it shows the performance degrader.
Type3 SCD:

It maintains the partial history by adding a new column to the existing row.

Types of Dimension Tables

Conformed Dimension: If a dimension table is connected to more than one fact table, the granularity that is defined in the dimension table is common across the fact tables.

Junk Dimension: The Dimension table, which contains only flags.

Monster Dimension: If rapidly changes in Dimension are known as Monster Dimension.


De-generative Dimension: It is line item-oriented fact table design.

Surrogate Key

It is a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table. We can store up to 2 billion record.

It is used for integrating the data may help better for primary key.


Index maintenance, joins, table size, key updates, disconnected inserts and partitioning.

Snowflake schema

It is a normalized multi dimensional model. It contains centralized fact table surrounded by normalized dimensional tables (Dimension table further divided in to one or more dimensional tables in a hierarchy)

Star Schema

Star Schema is a de-normalized multi-dimensional model. It contains centralized fact tables surrounded by dimensions table.

Dimension Table: It contains a primary key and description about the fact table.


Fact Table: It contains foreign keys to the dimension tables, measures and aggregates.