September 7, 2013 Project transition from one vendor to another – Execution Phase Activities

Organization Chart
• Prepare project organization chart

Support Groups
• Interact with operations/helpdesk/testing/application teams
• Interact with business groups

Existing Process and Environment
• Study the existing hardware and software environments (Production/Development/Testing)
• Study the maintenance/development/testing process (flow) and methodology
• Study the escalation process and SLA/QA processes
• Create OPD and Induction Manual

Support Tools
• Identify existing support tools
• Identify transition support tools
• Introduce transition team to support tools
• Update Induction Manual

Configuration Management
• Understand existing configuration management procedures and tools
• Update configuration management procedure in OPD and modify Induction Manual
• Incorporate review changes
• Try check-in and check-out procedures on the tool

Testing Procedures
• Identify testing team from Client
• Contact testing team for procedures
• Study SIT set-up process
• Study UAT set-up process
• Study data population process
• Study other testing activities
• Document testing procedures in OPD
• Incorporate review changes

Access Procedure Check
• Check whether individual team members can access modules, tools and applications

Document Analysis and Organization (for each module)
• Identify the availability of Business Requirement documents
• Identify available Technical Specifications
• Prepare System Description Document
• Prepare overview documents on missing sub-module specifications
• Collect data from interviews about documents present
• Prepare documents reference list
• Incorporate review changes

Application System Study
• Detailed presentation by Client
• Question and answer session
• Study the business flow and process
• Study the application for functionality and design
• Collect inventory and identify the boundary of the system
• Transactions study and analysis – screen navigation
• Study batch jobs (daily, weekly, monthly, annual, ad hoc, etc.)
• Study the reports and forms
• Study the batch process (copy books, programs, JCLs, procedures)
• Study job flow dependencies
• Study interface systems
• Obtain recovery instructions
• Update OPD, SDD and Induction Manual
• Create business/screen/job/data flow diagrams
• Check errors from Causal Analysis Report (if available)

Database Analysis (for each module)
• Collect data model
• Understand current data model and organization
• Identify external system interfaces
• Update SDD

Update Inventory (for each module)
• Prepare/update the System Inventory Document

Backlog Management
• Identify unresolved issues in the system

Coding Standards
• Study documents on existing coding standards
• Prepare/enhance coding standards document for future development and enhancements
• Incorporate review changes
• Release Coding Standards Document

Analysis of Maintenance History
• Identify the number of problems and enhancements performed over a period of time
• Study the manner in which problems were closed
• Review error-prone and critical software items
• Review test plans and test specifications
• Prepare high-level document on the understanding and outcome of the analysis
• Prepare a detailed activity list of the maintenance, support and development work carried out by Client and update OPD

Analysis of Help-Desk Methodology
• Overview of the system
• Overview of helpdesk methodology
• Understand the miscellaneous helpdesk activities (daily batch checks, updates to the call DB, etc.)
• Identify the number of issues resolved
• Study how problems were received/resolved/closed
• Review error-prone and critical software items
• Identify unresolved issues in the system
• Prepare a detailed activity list of the helpdesk activities carried out by Client

Operations Procedures Study
• Study operations activities and functionalities
• Study operations tools and procedures
• Study the JCL and procedure libraries
• Study housekeeping jobs and environment-related jobs
• Study coding standards and naming conventions
• Prepare Operations Procedures document
• Incorporate review changes

Cross-Over Criteria
• Create Cross-Over Document
• Incorporate review changes
• Release Cross-Over Document

Defect Logging and Monitoring
• Briefing on existing procedures
• Prepare defect logging and monitoring procedures for the Acceptance Phase in OPD
• Incorporate review changes in the defect logging and monitoring procedures

Acceptance Procedures
• Identify contacts for deciding on acceptance procedures
• Discuss current procedures and sign-off
• Document acceptance procedures in OPD
• Incorporate review changes

Existing Vendor Interaction
• Identify contacts from existing vendors
• Gather system details and product information
• Identify key vendor resources
• Document vendor liaison procedures in OPD

System Description Documentation (SDD)
• Finalize the System Description Documentation
• Incorporate review changes
• Release the System Description Documentation

Finalize Manuals
• Finalize Induction Manual
• Incorporate review changes
• Finalize OPD
• Incorporate review changes
• Release Induction Manual and OPD

Study Existing Development Projects
• Study the existing projects
• Prepare a development take-on plan for assuming responsibility for development

Off-Shore Set-Up Procedures
• Identify procedures and infrastructure required for offshore work
• Offshore set-up requirements review by Client
• Start setting up the New Vendor offshore facility based on requirements

Transition Plan (Progress Monitoring)
• Update Transition Plan based on project progress and changes in the scope of the project
• Weekly reviews and playback sessions
September 7, 2013 Project transition from one vendor to another – Startup Activities

Kick-off Meeting
• Discuss approach, procedures and detailed plan
• Update procedures in detailed plan, if required

Induction of Team
September 7, 2013 Project transition from one vendor to another – Activities for Acceptance
• Refine SLAs and estimation guidelines (if required)
• Verify that the cross-over criteria have been met
• Transfer responsibility to the new vendor
September 7, 2013 Datastage Coding Checklist

General
• Ensure that the null handling properties are taken care of for all nullable fields. Do not set the null field value to a value that may also be present in the source.
• Ensure that all character fields are trimmed before any processing. Extra spaces in the data can lead to errors, such as lookup mismatches, that are hard to detect.
• Always save the metadata (for source, target or lookup definitions) in the repository to ensure reusability and consistency.
• If the partition type for the next immediate stage is to be changed, set 'Propagate partition' to 'Clear' in the current stage.
• Make sure that appropriate partitioning and sorting are used in the stages wherever possible; this improves performance. Make sure you understand the partitioning being used; otherwise leave it on Auto.
• Make sure that pathname/format details are not hard-coded and that job parameters are used instead. These details are generally set as environment variables.
• Ensure that all file names from external sources are parameterized. This saves the developer the trouble of changing the job if a file name changes. File names/datasets created within the job for intermediate purposes can be hard-coded.
• Ensure that the environment variable $APT_DISABLE_COMBINATION is set to 'False'.
• Ensure that $APT_STRING_PADCHAR is set to spaces.
• Parameters used across jobs should have the same names. This helps avoid unnecessary confusion.
• Use a 4-node configuration file for unit testing/system testing the job.
• If there are multiple jobs to be run for the same module, archive the source files in the after-job routine of the last job.
• Check whether the file exists in the landing directory before moving the sequential file. The 'mv' command will move the landing directory itself if the file is not found.
• Verify that the appropriate after-job routine is called in the job.
• Verify that the correct link counts are used in the after-job routine for the ACR log file.
• Check whether the log statements are correct for the job.
• Ensure that the Unix files created by any DataStage job are created by the same Unix user who ran the job.
• Check the Director log if the error message is not readable.
• Verify that the job name, stage names, link names and input file names are as per standards. Ensure that the job adheres to the naming standards defined for the software artifacts.
• The job description must be clear and readable.
• Make sure that the short job description is filled in using 'Description Annotation' and that it contains the job name as part of the description. Do not use an Annotation for the job description.
• Check that the parameter values are assigned to the jobs through the sequencer.
• Verify whether Runtime Column Propagation (RCP) is disabled.

File Stages
• Ensure that reject links are output from the sequential file stage that reads the data file, so that rejected records are logged.
• Check whether datasets are used instead of sequential files for intermediate storage between jobs. This improves performance in a set of linked jobs.
• Reject records should be stored as sequential files. This makes analysing rejected records outside DataStage easier.
• Ensure that a dataset read from another job uses the same metadata that is saved in the repository.
• Verify that intermediate files used by downstream jobs have Unix read permission for all users.
• For fixed-width files, the final delimiter should be set to 'none' in the file format properties.
• Verify that all lookup reference files have Unix permission 744. This ensures that other users do not overwrite or delete the reference file.

Processing Stages
• Stage variables should be in the correct order, e.g. a stage variable used in a calculation should come higher in the order than the variable holding the calculation.
• If a processing stage requires a key (such as Remove Duplicates, Merge or Join), the stage keys, sorting keys and partitioning keys should be the same and in the same order.
• Make sure that sparse lookups are not used when large volumes of data are handled.
• Check that the lookup keys are correct.
• Do not validate a null field in a Transformer. Use appropriate data types for the stage variables, and use IsNull(), IsNotNull() or Seq() for such validations.
• In a Funnel, all input links must be hash-partitioned on the sort keys.
• Verify whether any Transformer is set to run in sequential mode; it should run in parallel mode.
• RCP should be enabled in the Copy stage before a shared container.
• Verify whether a column generated using the Column Generator is created using 'part' and 'part count'.
• In the Remove Duplicates stage, ensure that the correct record (according to the requirements) is retained.

Database Stages
• Every database object referenced should be accessed through the schema name parameter.
• Always reference database objects using the schema name.
• Use upper case for column names and table names in SQL queries.

Job Sequencer
• Check that the parameter values are assigned to the jobs through the sequencer.
• For every Job Activity stage in the sequencer, ensure that "Reset if required, then run" is selected where relevant.
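Several of the checkpoints above concern after-job routines that archive or move source files only when they exist. As a rough illustration of the "check the file exists before moving it" item, here is a minimal Python sketch; the paths and the helper name are hypothetical, and real after-job routines in DataStage projects are typically shell scripts or server routines rather than Python:

```python
import os
import shutil

def safe_move(src_file, archive_dir):
    """Move a landing file to the archive only if it actually exists.
    This avoids the 'mv' pitfall noted above, where a missing file name
    can result in the landing directory itself being moved."""
    if not os.path.isfile(src_file):
        raise FileNotFoundError(f"Landing file not found: {src_file}")
    os.makedirs(archive_dir, exist_ok=True)
    return shutil.move(src_file, archive_dir)

# Hypothetical usage in an after-job step:
# safe_move("/data/landing/customers_20130907.dat", "/data/archive")
```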
September 5, 2013 Implementing SCD II in Datastage 7.X SCD II can be implemented in DataStage with the help of two fields: Effective Date (EFCT_DT) and Expiry Date (EXPR_DT). Through these two fields we can keep track of active records and expired records. To mark the active records we assign a standard high expiry date as the EXPR_DT (e.g. 2999-12-31); we will use 2999-12-31 as the EXPR_DT for active records in our example. So, to identify the active records, query the table for records with EXPR_DT = '2999-12-31'. To identify the expired records, query the table for records with EXPR_DT <> '2999-12-31'.

[Block diagram: DataStage job layout for implementing SCD II]
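As a rough illustration of this active/expired convention (not DataStage code), the split can be expressed as a small Python filter; the EXPR_DT field name and the high date follow the example above, and the list-of-dicts record layout is an assumption for illustration:

```python
HIGH_DATE = "2999-12-31"   # standard high expiry date assigned to active records

def split_active_expired(rows):
    """Separate active rows (EXPR_DT equals the high date) from expired ones."""
    active = [r for r in rows if r["EXPR_DT"] == HIGH_DATE]
    expired = [r for r in rows if r["EXPR_DT"] != HIGH_DATE]
    return active, expired
```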
September 5, 2013 Steps to be followed for implementing SCD II
• Read the incoming records through any input stage, such as a sequential file, dataset or table.
• Do the required processing on the incoming data.
• After the processing step, pass the data into the Change Capture stage.
• The Change Capture stage has two input links: one is the before dataset and the other is the after dataset. For our job, the before dataset is the set of active records present in the table, i.e. all records with EXPR_DT = '2999-12-31'. The after dataset is the incoming data passed into Change Capture after all the necessary processing.
• The Change Capture stage compares the before and after datasets and produces a change_code for each record. The four change codes are as follows:
"0" – Copy code (the after record is a copy of the before record)
"1" – Insert code (a new record exists in the after set that did not exist in the before set)
"2" – Delete code (a record in the before set has been deleted from the after set)
"3" – Edit code (the after record is an edited version of the before record)
Copy records are not passed out of the Change Capture stage, since only the insert and edit records are needed for the SCD II implementation.
• Use a Filter stage to separate the records that need to be inserted from those that need to be expired.
• Filter the records with change_code = 1 or 3 into the insert-records link, and the records with change_code = 3 into the update/expiry link.
• The records with change_code = 3 are edited records, so the original records corresponding to them must be made inactive (expired). We make a record inactive by changing its EXPR_DT to a value other than '2999-12-31', i.e. by setting EXPR_DT to a valid date. For example, you can set EXPR_DT to one day before the date on which you load the data into the table. We will assume that we are loading the data on 2008-08-15, so the EXPR_DT for the expired records becomes '2008-08-14'. The date 2008-08-15 becomes the EFCT_DT for the records to be inserted.
• To get the original records that need to be expired, look up the target table for all the records with change_code = 3, which were filtered out separately, and retrieve the original record along with its EFCT_DT. Then update those records' EXPR_DT to '2008-08-14' in the table. The original records are now inactive (expired).
• The edited records (change_code = 3) are then inserted into the table along with the new insert records (change_code = 1). This data comes from the Filter stage and is inserted into the table with EFCT_DT = the date of loading, i.e. '2008-08-15', and EXPR_DT = '2999-12-31' (see the sketch below).
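To make the change-capture and expiry logic concrete, here is a minimal Python sketch of the same flow. It is only an illustration, not DataStage code: the record layout, key column, value columns and helper names are assumptions, and the dates follow the 2008-08-15 load-date example above.

```python
from datetime import date, timedelta

HIGH_DATE = "2999-12-31"        # standard high expiry date for active records
LOAD_DATE = date(2008, 8, 15)   # assumed load date from the example

def change_capture(before, after, key, value_cols):
    """Compare active (before) records with incoming (after) records.
    Returns inserts (change_code 1) and edits (change_code 3); copies are dropped."""
    before_by_key = {r[key]: r for r in before}
    inserts, edits = [], []
    for rec in after:
        old = before_by_key.get(rec[key])
        if old is None:
            inserts.append(rec)                              # change_code = 1
        elif any(old[c] != rec[c] for c in value_cols):
            edits.append(rec)                                # change_code = 3
        # identical records (change_code = 0) are not passed on
    return inserts, edits

def apply_scd2(dimension, inserts, edits, key):
    """Expire the old versions of edited records, then insert the new rows."""
    expiry = (LOAD_DATE - timedelta(days=1)).isoformat()     # 2008-08-14
    edited_keys = {r[key] for r in edits}
    for row in dimension:
        if row[key] in edited_keys and row["EXPR_DT"] == HIGH_DATE:
            row["EXPR_DT"] = expiry                          # expire old version
    for rec in inserts + edits:
        dimension.append({**rec,
                          "EFCT_DT": LOAD_DATE.isoformat(),  # 2008-08-15
                          "EXPR_DT": HIGH_DATE})
    return dimension
```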
September 5, 2013 Type 3 Slowly Changing Dimension In Type 3 Slowly Changing Dimension, there are two columns for the attribute of interest: one holding the original value and one holding the current value. There is also a column that indicates when the current value became active. In our example, recall that we originally have the following table:

Customer Key  Name       State
1001          Christina  Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
· Customer Key
· Name
· Original State
· Current State
· Effective Date

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key  Name       Original State  Current State  Effective Date
1001          Christina  Illinois        California     15-Jan-2003

Advantages:
– This does not increase the size of the table, since the new information is updated in place.
– This allows us to keep some part of the history.

Disadvantages:
– Type 3 cannot keep all history where an attribute changes more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

Usage: Type 3 is rarely used in actual practice.

When to use Type 3: Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
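As a minimal illustration of a Type 3 update, here is a short Python sketch; the column names match the example table, and the single-row dictionary representation is an assumption for illustration only:

```python
def scd_type3_update(row, new_state, effective_date):
    """Type 3: keep one 'original' and one 'current' column for the attribute.
    Assumes the row started with a single 'state' column, as in the example,
    and is widened on the first change."""
    if "original_state" not in row:
        row["original_state"] = row.pop("state")   # preserve the first known value
    row["current_state"] = new_state               # overwrite the current value
    row["effective_date"] = effective_date
    return row

# {"customer_key": 1001, "name": "Christina", "state": "Illinois"}
# becomes, after scd_type3_update(row, "California", "15-Jan-2003"):
# {"customer_key": 1001, "name": "Christina", "original_state": "Illinois",
#  "current_state": "California", "effective_date": "15-Jan-2003"}
```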
September 5, 2013 Type 2 Slowly Changing Dimension In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key. In our example, recall that we originally have the following table:

Customer Key  Name       State
1001          Christina  Illinois

After Christina moved from Illinois to California, we add the new information as a new row in the table:

Customer Key  Name       State
1001          Christina  Illinois
1005          Christina  California

Advantages:
– This allows us to accurately keep all historical information.

Disadvantages:
– This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
– This also complicates the ETL process.

Usage: About 50% of the time.

When to use Type 2: Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
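For comparison with the other types, here is a tiny Python sketch of a Type 2 change; the dictionary layout and the surrogate key value 1005 follow the example table and are assumptions for illustration:

```python
def scd_type2_update(dimension, name, new_state, new_key):
    """Type 2: append a new row with its own surrogate key; the old row is kept."""
    dimension.append({"customer_key": new_key, "name": name, "state": new_state})
    return dimension

# dim = [{"customer_key": 1001, "name": "Christina", "state": "Illinois"}]
# scd_type2_update(dim, "Christina", "California", new_key=1005)
# dim now holds both the Illinois row (1001) and the California row (1005)
```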
September 5, 2013 Type 1 Slowly Changing Dimension In Type 1 Slowly Changing Dimension, the new information simply overwrites the original record. In our example, recall that we originally have the following table:

Customer Key  Name       State
1001          Christina  Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key  Name       State
1001          Christina  California

Advantages:
– This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:
– All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina previously lived in Illinois.

Usage: About 50% of the time.

When to use Type 1: Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
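As a minimal illustration, a Type 1 overwrite can be sketched in Python as follows; the single-row dictionary representation is assumed for illustration:

```python
def scd_type1_update(row, new_state):
    """Type 1: overwrite the attribute in place; no history is kept."""
    row["state"] = new_state
    return row

# {"customer_key": 1001, "name": "Christina", "state": "Illinois"}
# becomes, after scd_type1_update(row, "California"):
# {"customer_key": 1001, "name": "Christina", "state": "California"}
```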
September 5, 2013 SCD – Slowly Changing Dimension The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, it applies to cases where an attribute of a record varies over time. We give an example below:

Christina is a customer with ABC Inc. She first lived in Chicago, Illinois, so the original entry in the customer lookup table has the following record:

Customer Key  Name       State
1001          Christina  Illinois

At a later date, in January 2003, she moved to Los Angeles, California. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem. There are in general three ways to solve this type of problem, categorized as follows:

Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added to the customer dimension table. The customer is therefore treated essentially as two people.
Type 3: The original record is modified to reflect the change.

We next take a look at each of these scenarios and what the data model and the data look like for each of them. Finally, we compare and contrast the three alternatives.