Slowly Changing Dimension (SCD) in DataStage

To understand what a Slowly Changing Dimension is, we first need to understand how fact tables and dimension tables change:
Most of the time, we only insert rows into fact tables. Very infrequently, we update facts that were loaded incorrectly. Deleting rows from a fact table is even less likely; the only time this might happen is when you want to archive old data that is no longer relevant to the current portion of the data warehouse.
Dimension tables, on the other hand, are more likely to see updates. A classic example is a customer dimension: people change their names, phone numbers, addresses, and so on.

Dimensions that are changing over time are referred to as slowly changing dimensions (SCD).

Type 1: overwrite the existing value

Customer_key  Customer_name  Customer_city  Customer_state
123           Ms. Brown      Houston        Texas           (before the change)
123           Ms. Brown      Dallas         Texas           (after the change)
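
A minimal Python sketch of the Type 1 behaviour shown in the table above. The function name apply_scd_type1 and the list-of-dicts representation of the dimension are assumptions made for illustration only; they are not part of DataStage.

```python
def apply_scd_type1(dimension, customer_key, changes):
    """Overwrite the changed attributes in place; the old values are lost."""
    for row in dimension:
        if row["Customer_key"] == customer_key:
            row.update(changes)
    return dimension


# Hypothetical in-memory copy of the customer dimension from the table above.
customers = [
    {"Customer_key": 123, "Customer_name": "Ms. Brown",
     "Customer_city": "Houston", "Customer_state": "Texas"},
]

# Ms. Brown moves from Houston to Dallas: same row, same key, new city.
apply_scd_type1(customers, 123, {"Customer_city": "Dallas"})
print(customers)
```

The row count stays the same and no trace of Houston remains, which is why Type 1 suits corrections rather than changes whose history matters.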

Type 2: create a new record with effective dates

Customer_key  Customer_name  Customer_city  Customer_state  Effective_date  Obsolete_date
123           Ms. Brown      Houston        Texas           1/1/2000        1/1/2005
234           Ms. Brown      Dallas         Texas           1/1/2005        NULL
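
The Type 2 table above can be sketched in Python as follows. The function name apply_scd_type2, its parameters, and the use of None for NULL are illustrative assumptions; a real job would normally generate the new key itself.

```python
from datetime import date


def apply_scd_type2(dimension, current_key, new_key, changes, change_date):
    """Expire the current row and append a new row that carries the changes."""
    for row in dimension:
        if row["Customer_key"] == current_key and row["Obsolete_date"] is None:
            row["Obsolete_date"] = change_date  # close the old version
            new_row = {**row, **changes,
                       "Customer_key": new_key,
                       "Effective_date": change_date,
                       "Obsolete_date": None}  # None stands in for NULL
            dimension.append(new_row)
            break
    return dimension


# Hypothetical in-memory copy of the dimension before the change.
customers = [
    {"Customer_key": 123, "Customer_name": "Ms. Brown",
     "Customer_city": "Houston", "Customer_state": "Texas",
     "Effective_date": date(2000, 1, 1), "Obsolete_date": None},
]

apply_scd_type2(customers, 123, 234, {"Customer_city": "Dallas"},
                date(2005, 1, 1))
for row in customers:
    print(row)
```

Both versions of the customer survive, so facts loaded before 1/1/2005 can still be reported against Houston.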

Type 3: add a new column to hold the new value alongside the old one; no new rows

Customer_key  Customer_name  Customer_city  Customer_state  Change_date  New_city
123           Ms. Brown      Houston        Texas           1/1/2005     Dallas
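
A small Python sketch of the Type 3 layout shown above, where the old city stays in Customer_city and the new one goes into New_city. The function name apply_scd_type3 and the dict-based rows are assumptions for illustration.

```python
from datetime import date


def apply_scd_type3(dimension, customer_key, new_city, change_date):
    """Keep the old value and record the new one in extra columns."""
    for row in dimension:
        if row["Customer_key"] == customer_key:
            row["Change_date"] = change_date
            row["New_city"] = new_city
    return dimension


# Hypothetical in-memory copy of the dimension before the change.
customers = [
    {"Customer_key": 123, "Customer_name": "Ms. Brown",
     "Customer_city": "Houston", "Customer_state": "Texas",
     "Change_date": None, "New_city": None},
]

apply_scd_type3(customers, 123, "Dallas", date(2005, 1, 1))
print(customers)
```

Only one previous value fits in this layout: a second move would overwrite New_city, so Type 3 keeps limited history.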

What is a Data Warehouse?

A data warehouse is a central, integrated database containing data from all the operational sources and archive systems in an organization. It contains a copy of transaction data specifically structured for query and analysis. This database can be accessed by all users, ensuring that each group in the organization is working with valuable, stable data.
Data Warehouse
A data warehouse is a structured repository of historic data.
It is developed in an evolutionary process by integrating data from non-integrated legacy systems.
It is usually:
✓ Subject oriented
✓ Integrated
✓ Time variant
✓ Non-volatile

OLAP cubes

OLAP – What is OLAP?

OLAP stands for On-Line Analytical Processing. OLAP is a technology that is used to organize large business databases and support business intelligence. OLAP databases are divided into one or more cubes, and each cube is organized and designed by a cube administrator to fit the way that you retrieve and analyze data.

OLAP database & cubes

Online Analytical Processing (OLAP) databases facilitate business-intelligence queries. OLAP is a database technology that has been optimized for querying and reporting rather than for processing transactions. The source data for OLAP comes from Online Transaction Processing (OLTP) databases and is commonly stored in data warehouses. OLAP data is derived from this historical data and aggregated into structures that permit sophisticated analysis. OLAP data is also organized hierarchically and stored in cubes instead of tables. It is a sophisticated technology that uses multidimensional structures to provide rapid access to data for analysis.

OLAP – volumes, complexity

Online Analytical Processing (OLAP) is characterized by a relatively low volume of transactions. The queries, however, are often very complex, so response times in an OLAP system tend to be longer. An OLAP database holds aggregated, historical data stored in multidimensional schemas.

The first attempt to define OLAP was made by Dr. Codd, who proposed 12 rules for OLAP.

OLAP is not to be confused with OLTP (Online Transaction Processing).


For people on the business side, the key feature of OLAP is “Multidimensional.” In other words, it is the ability to analyze metrics along different dimensions such as time, geography, gender, product, etc. For example, suppose sales for the company are up. What region is most responsible for this increase? Which store in this region is most responsible for the increase? What particular product category or categories contributed the most to the increase? Answering these types of questions in order means that you are performing an OLAP analysis.
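
The region, store, and category questions above amount to aggregating the same facts along one dimension after another. Here is a rough Python sketch of that drill-down; the roll_up helper and the sales figures are invented for illustration and do not come from any real OLAP product.

```python
from collections import defaultdict


def roll_up(facts, *dimensions):
    """Sum the 'sales' measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["sales"]
    return dict(totals)


# Hypothetical sales facts, each tagged with three dimensions.
sales = [
    {"region": "South", "store": "Houston", "category": "Toys", "sales": 120},
    {"region": "South", "store": "Houston", "category": "Books", "sales": 80},
    {"region": "South", "store": "Dallas", "category": "Toys", "sales": 300},
    {"region": "North", "store": "Chicago", "category": "Toys", "sales": 50},
    {"region": "North", "store": "Chicago", "category": "Books", "sales": 70},
]

# Step 1: which region contributes most?
by_region = roll_up(sales, "region")

# Step 2: drill down into that region by store.
south = [f for f in sales if f["region"] == "South"]
by_store = roll_up(south, "store")

# Step 3: drill down into the top store by product category.
dallas = [f for f in south if f["store"] == "Dallas"]
by_category = roll_up(dallas, "category")

print(by_region, by_store, by_category)
```

An OLAP cube typically precomputes aggregates like these, so each step of the drill-down does not have to rescan the detailed facts.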

Depending on the underlying technology used, OLAP can be broadly divided into two different camps: MOLAP and ROLAP.