Datastage | Datastage

September 24, 2013

Slowly Changing Dimension/SCD in datastage

To understand what is Slowly Changing Dimension, we first understand these:

Most frequently, we insert into Fact tables . Very infrequently we update the facts that were loaded incorrectly. It is even less likely to delete rows from the fact table; the only time this might happen is if you wanted to archive the old data that is no longer relevant for the current portion of the data warehouse.

Dimension tables, on the other hand are likely to see frequent updates. A classic example of this is a consumer dimension. People can change names, phone numbers, addresses, etc regularly.

Dimensions that are changing over time are referred to as slowly changing dimensions (SCD).

Slowly Changing Dimension

Type 1: override the existing value

Customer_key	Customer_name	Customer_city	Customer_state
123	Ms. Brown	Houston	Texas
123	Ms. Brown	Dallas	Texas

Slowly Changing Dimension – SCD Type 1

Type 2: create a new record with effective dates

Customer_key	Customer_name	Customer_city	Customer_state	Effective_date	Obsolete_date
123	Ms.Brown	Houston	Texas	1/1/2000	1/1/2005
234	Ms. Brown	Dallas	Texas	1/1/2005	NULL

Slowly Changing Dimension – SCD Type 3

Type 3: create a new and old columns, no new rows

Customer_key	Customer_name	Customer_city	Customer_state	Change_date	New_city
123	Ms. Brown	Houston	Texas	1/1/2005	Dallas

Slowly Changing Dimension – SCD Type 3

September 24, 2013

Materialized view

A materialized view is like a query with a result that is materialized and stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. This technique improves the execution of the user query, because most of the query result has been pre-computed. The query transformer looks for any materialized views that are compatible with the user query and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has a lower cost than the plan generated with the materialized views.

September 17, 2013

Different Dimensional modelings Techniques Available

There are 4 types of dimensional modelling available.

Entity Relationship modeling.
Dimensional modelling.
Hierarchal modelling
Network modelling

—————–

September 17, 2013

Star Schema vs Snowflake Schema

Star schema is preferred

Why:

Why because star schema is simple queries and logic, but snowflake schema is good for complex queries and logic.

If the table space is not enough to maintain a STAR schema then we should go for snowflake instead of STAR Schema.

I.e. The table should be spited in to multiple tables.

Eg: If you want to maintain time data in one table like year, month, day in one table in star schema, you need to spit this data into 3 tables like year, month, and day in snowflake schema.

September 17, 2013

Difference between Star-Schema and Snowflake Schema

Star-Schema:

A Centralized fact table surrounded by n number of dimensions is known as fact table.

Star-Schema contains one fact table and one or more dimension tables.

Its Normalized model.
Easy to use and understand, because of fewer joins contains.
More time for execution, because of more joins.

Snowflake Schema:

1) Any dimension with extended dimensions are known as snowflake schema.

2) Its De-Normalized model.

3) End users will get confused, because of complicated joins.

4) Execution is less, because of more joins.

September 15, 2013

Difference between SCD Type2 and SCD Type3

Type2 SCD:

Type2 SCD maintains full historical information by adding a new row to the existing row.

In DWH we use to implement the type2 SCD because it maintains the full historical information.

One disadvantage with type2 SCD is the size of the database is going to increase hence it shows the performance degrader.

Type3 SCD:

It maintains the partial history by adding a new column to the existing row.

September 15, 2013

Types of Dimension Tables

Conformed Dimension: If a dimension table is connected to more than one fact table, the granularity that is defined in the dimension table is common across the fact tables.

Junk Dimension: The Dimension table, which contains only flags.

Monster Dimension: If rapidly changes in Dimension are known as Monster Dimension.

De-generative Dimension: It is line item-oriented fact table design.

September 15, 2013

Surrogate Key

It is a 4-byte integer which replaces the transaction / business / OLTP key in the dimension table. We can store up to 2 billion record.

It is used for integrating the data may help better for primary key.

Index maintenance, joins, table size, key updates, disconnected inserts and partitioning.

September 15, 2013

Snowflake schema

It is a normalized multi dimensional model. It contains centralized fact table surrounded by normalized dimensional tables (Dimension table further divided in to one or more dimensional tables in a hierarchy)

September 15, 2013

Star Schema

Star Schema is a de-normalized multi-dimensional model. It contains centralized fact tables surrounded by dimensions table.

Dimension Table: It contains a primary key and description about the fact table.

Fact Table: It contains foreign keys to the dimension tables, measures and aggregates.