September 7, 2013 Project transition from one vendor to another – Execution Phase Activities

Organization Chart
• Prepare project organization chart

Support Groups
• Interact with operations/helpdesk/testing/application teams
• Interact with business groups

Existing Process and Environment
• Study the existing hardware and software environments (Production/Development/Testing)
• Study the maintenance/development/testing process (flow) and methodology
• Study the escalation process and SLA/QA processes
• Create OPD and Induction Manual

Support Tools
• Identify existing support tools
• Identify transition support tools
• Introduce transition team to support tools
• Update Induction Manual

Configuration Management
• Understand existing configuration management procedures and tools
• Update configuration management procedure in OPD and modify Induction Manual
• Incorporate review changes
• Try check-in and check-out procedures on the tool

Testing Procedures
• Identify testing team from Client
• Contact testing team for procedures
• Study SIT set-up process
• Study UAT set-up process
• Study data population process
• Study other testing activities
• Document testing procedures in OPD
• Incorporate review changes

Access Procedure Check
• Check whether individual team members can access modules, tools and applications

Document Analysis and Organization (for each module)
• Identify the availability of Business Requirement documents
• Identify available Technical Specifications
• Prepare System Description Document
• Prepare overview documents on missing sub-module specifications
• Collect data from interviews about documents present
• Prepare documents reference list
• Incorporate review changes

Application System Study
• Detailed presentation by Client
• Question and answer session
• Study the business flow and process
• Study the application for functionality and design
• Collect inventory and identify the boundary of the system
• Transactions study and analysis – screen navigation
• Study batch jobs (daily, weekly, monthly, annual, ad hoc, etc.)
• Study the reports and forms
• Study the batch process (copy books, programs, JCLs, procedures)
• Study job flow dependencies
• Study interface systems
• Obtain recovery instructions
• Update OPD, SDD and Induction Manual
• Create business/screen/job/data flow diagrams
• Check errors from Causal Analysis Report (if available)

Database Analysis (for each module)
• Collect data model
• Understand current data model and organization
• Identify external system interfaces
• Update SDD

Update Inventory (for each module)
• Prepare/update the System Inventory Document

Backlog Management
• Identify unresolved issues in the system

Coding Standards
• Study documents on existing coding standards
• Prepare/enhance coding standards document for future development and enhancements
• Incorporate review changes
• Release Coding Standards Document

Analysis of Maintenance History
• Identify the number of problems and enhancements performed over a period of time
• Study the manner in which problems were closed
• Review error-prone and critical software items
• Review test plans and test specifications
• Prepare high-level document on the understanding and outcome of the analysis
• Prepare a detailed activity list of the maintenance, support and development work carried out by Client and update OPD

Analysis of Help-Desk Methodology
• Overview of the system
• Overview of helpdesk methodology
• Understand the miscellaneous helpdesk activities (daily batch checks, updates to the call DB, etc.)
• Identify the number of issues resolved
• Study how problems were received/resolved/closed
• Review error-prone and critical software items
• Identify unresolved issues in the system
• Prepare a detailed activity list of the helpdesk activities carried out by Client

Operations Procedures Study
• Study operations activities and functionalities
• Study operations tools and procedures
• Study the JCL and procedure libraries
• Study housekeeping jobs and environment-related jobs
• Study coding standards and naming conventions
• Prepare Operations Procedures document
• Incorporate review changes

Cross-Over Criteria
• Create Cross-Over Document
• Incorporate review changes
• Release Cross-Over Document

Defect Logging and Monitoring
• Briefing on existing procedures
• Prepare defect logging and monitoring procedures for the Acceptance Phase in OPD
• Incorporate review changes in the defect logging and monitoring procedures

Acceptance Procedures
• Identify contacts for deciding on acceptance procedures
• Discuss current procedures and sign-off
• Document acceptance procedures in OPD
• Incorporate review changes

Existing Vendor Interaction
• Identify contacts from existing vendors
• Gather system details and product information
• Identify key vendor resources
• Document vendor liaison procedures in OPD

System Description Documentation (SDD)
• Finalize the System Description Documentation
• Incorporate review changes
• Release the System Description Documentation

Finalize Manuals
• Finalize Induction Manual
• Incorporate review changes
• Finalize OPD
• Incorporate review changes
• Release Induction Manual and OPD

Study Existing Development Projects
• Study the existing projects
• Prepare a development take-on plan for assuming responsibility for development

Off-Shore Set-Up Procedures
• Identify procedures and infrastructure required for offshore work
• Offshore set-up requirements review by Client
• Start setting up the New Vendor offshore facility based on requirements

Transition Plan (Progress Monitoring)
• Update Transition Plan based on project progress and changes in the scope of the project
• Weekly reviews and playback sessions
September 7, 2013 Project transition from one vendor to another – Startup Activities

Kick-off Meeting
• Discuss approach, procedures and detailed plan
• Update procedures in detailed plan, if required

Induction of Team
September 7, 2013 Project transition from one vendor to another – Activities for Acceptance
• Refine SLAs and estimation guidelines (if required)
• Verify that the cross-over criteria have been met
• Transfer responsibility to the new vendor
September 7, 2013 Datastage Coding Checklist

General
• Ensure that the null handling properties are taken care of for all nullable fields. Do not set the null field value to a value that may also be present in the source.
• Ensure that all character fields are trimmed before any processing. Extra spaces in the data can lead to errors, such as lookup mismatches, that are hard to detect.
• Always save the metadata (for source, target or lookup definitions) in the repository to ensure reusability and consistency.
• If the partition type for the next immediate stage is to be changed, set 'Propagate partition' to 'Clear' in the current stage.
• Make sure that appropriate partitioning and sorting are used in the stages wherever possible; this improves performance. Make sure you understand the partitioning being used; otherwise leave it on Auto.
• Make sure that pathname/format details are not hard-coded and that job parameters are used instead. These details are generally set as environment variables.
• Ensure that all file names from external sources are parameterized. This saves the developer the trouble of changing the job if a file name changes. File names/datasets created within the job for intermediate purposes can be hard-coded.
• Ensure that the environment variable $APT_DISABLE_COMBINATION is set to 'False'.
• Ensure that $APT_STRING_PADCHAR is set to spaces.
• Parameters used across jobs should have the same names. This helps avoid unnecessary confusion.
• Use a 4-node configuration file for unit testing/system testing the job.
• If there are multiple jobs to be run for the same module, archive the source files in the after-job routine of the last job.
• Check whether the file exists in the landing directory before moving the sequential file. The 'mv' command will move the landing directory itself if the file is not found.
• Verify that the appropriate after-job routine is called in the job.
• Verify that the correct link counts are used in the after-job routine for the ACR log file.
• Check whether the log statements are correct for the job.
• Ensure that the Unix files created by any DataStage job are created by the same Unix user who ran the job.
• Check the Director log if the error message is not readable.
• Verify that the job name, stage names, link names and input file names are as per standards. Ensure that the job adheres to the naming standards defined for the software artifacts.
• The job description must be clear and readable.
• Make sure that the short job description is filled in using 'Description Annotation' and that it contains the job name as part of the description. Do not use an Annotation for the job description.
• Check that the parameter values are assigned to the jobs through the sequencer.
• Verify whether Runtime Column Propagation (RCP) is disabled.

File Stages
• Ensure that reject links are output from the sequential file stage that reads the data file, so that rejected records are logged.
• Check whether datasets are used instead of sequential files for intermediate storage between jobs. This improves performance in a set of linked jobs.
• Reject records should be stored as sequential files. This makes analysing rejected records outside DataStage easier.
• Ensure that a dataset read from another job uses the same metadata that is saved in the repository.
• Verify that intermediate files used by downstream jobs have Unix read permission for all users.
• For fixed-width files, the final delimiter should be set to 'none' in the file format properties.
• Verify that all lookup reference files have Unix permission 744. This ensures that other users do not overwrite or delete the reference file.

Processing Stages
• Stage variables should be in the correct order, e.g. a stage variable used in a calculation should come higher in the order than the variable holding the calculation.
• If a processing stage requires a key (such as Remove Duplicates, Merge or Join), the stage keys, sorting keys and partitioning keys should be the same and in the same order.
• Make sure that sparse lookups are not used when large volumes of data are handled.
• Check that the lookup keys are correct.
• Do not validate a null field in a Transformer. Use appropriate data types for the stage variables, and use IsNull(), IsNotNull() or Seq() for such validations.
• In a Funnel, all input links must be hash-partitioned on the sort keys.
• Verify whether any Transformer is set to run in sequential mode; it should run in parallel mode.
• RCP should be enabled in the Copy stage before a shared container.
• Verify whether a column generated using the Column Generator is created using 'part' and 'part count'.
• In the Remove Duplicates stage, ensure that the correct record (according to the requirements) is retained.

Database Stages
• Every database object referenced should be accessed through the schema name parameter.
• Always reference database objects using the schema name.
• Use upper case for column names and table names in SQL queries.

Job Sequencer
• Check that the parameter values are assigned to the jobs through the sequencer.
• For every Job Activity stage in the sequencer, ensure that "Reset if required, then run" is selected where relevant.
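Several of the checkpoints above concern after-job routines that archive or move source files only when they exist. As a rough illustration of the "check the file exists before moving it" item, here is a minimal Python sketch; the paths and the helper name are hypothetical, and real after-job routines in DataStage projects are typically shell scripts or server routines rather than Python:

```python
import os
import shutil

def safe_move(src_file, archive_dir):
    """Move a landing file to the archive only if it actually exists.
    This avoids the 'mv' pitfall noted above, where a missing file name
    can result in the landing directory itself being moved."""
    if not os.path.isfile(src_file):
        raise FileNotFoundError(f"Landing file not found: {src_file}")
    os.makedirs(archive_dir, exist_ok=True)
    return shutil.move(src_file, archive_dir)

# Hypothetical usage in an after-job step:
# safe_move("/data/landing/customers_20130907.dat", "/data/archive")
```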
September 5, 2013 Implementing SCD II in Datastage 7.X SCD II can be implemented in DataStage with the help of two fields: Effective Date (EFCT_DT) and Expiry Date (EXPR_DT). Through these two fields we can keep track of active records and expired records. To mark the active records we assign a standard high expiry date as the EXPR_DT (e.g. 2999-12-31); we will use 2999-12-31 as the EXPR_DT for active records in our example. So, to identify the active records, query the table for records with EXPR_DT = '2999-12-31'. To identify the expired records, query the table for records with EXPR_DT <> '2999-12-31'.

[Block diagram: DataStage job layout for implementing SCD II]
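As a rough illustration of this active/expired convention (not DataStage code), the split can be expressed as a small Python filter; the EXPR_DT field name and the high date follow the example above, and the list-of-dicts record layout is an assumption for illustration:

```python
HIGH_DATE = "2999-12-31"   # standard high expiry date assigned to active records

def split_active_expired(rows):
    """Separate active rows (EXPR_DT equals the high date) from expired ones."""
    active = [r for r in rows if r["EXPR_DT"] == HIGH_DATE]
    expired = [r for r in rows if r["EXPR_DT"] != HIGH_DATE]
    return active, expired
```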
September 5, 2013 Steps to be followed for implementing SCD II
• Read the incoming records through any input stage, such as a sequential file, dataset or table.
• Do the required processing on the incoming data.
• After the processing step, pass the data into the Change Capture stage.
• The Change Capture stage has two input links: one is the before dataset and the other is the after dataset. For our job, the before dataset is the set of active records present in the table, i.e. all records with EXPR_DT = '2999-12-31'. The after dataset is the incoming data passed into Change Capture after all the necessary processing.
• The Change Capture stage compares the before and after datasets and produces a change_code for each record. The four change codes are as follows:
"0" – Copy code (the after record is a copy of the before record)
"1" – Insert code (a new record exists in the after set that did not exist in the before set)
"2" – Delete code (a record in the before set has been deleted from the after set)
"3" – Edit code (the after record is an edited version of the before record)
Copy records are not passed out of the Change Capture stage, since only the insert and edit records are needed for the SCD II implementation.
• Use a Filter stage to separate the records that need to be inserted from those that need to be expired.
• Filter the records with change_code = 1 or 3 into the insert-records link, and the records with change_code = 3 into the update/expiry link.
• The records with change_code = 3 are edited records, so the original records corresponding to them must be made inactive (expired). We make a record inactive by changing its EXPR_DT to a value other than '2999-12-31', i.e. by setting EXPR_DT to a valid date. For example, you can set EXPR_DT to one day before the date on which you load the data into the table. We will assume that we are loading the data on 2008-08-15, so the EXPR_DT for the expired records becomes '2008-08-14'. The date 2008-08-15 becomes the EFCT_DT for the records to be inserted.
• To get the original records that need to be expired, look up the target table for all the records with change_code = 3, which were filtered out separately, and retrieve the original record along with its EFCT_DT. Then update those records' EXPR_DT to '2008-08-14' in the table. The original records are now inactive (expired).
• The edited records (change_code = 3) are then inserted into the table along with the new insert records (change_code = 1). This data comes from the Filter stage and is inserted into the table with EFCT_DT = the date of loading, i.e. '2008-08-15', and EXPR_DT = '2999-12-31' (see the sketch below).
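To make the change-capture and expiry logic concrete, here is a minimal Python sketch of the same flow. It is only an illustration, not DataStage code: the record layout, key column, value columns and helper names are assumptions, and the dates follow the 2008-08-15 load-date example above.

```python
from datetime import date, timedelta

HIGH_DATE = "2999-12-31"        # standard high expiry date for active records
LOAD_DATE = date(2008, 8, 15)   # assumed load date from the example

def change_capture(before, after, key, value_cols):
    """Compare active (before) records with incoming (after) records.
    Returns inserts (change_code 1) and edits (change_code 3); copies are dropped."""
    before_by_key = {r[key]: r for r in before}
    inserts, edits = [], []
    for rec in after:
        old = before_by_key.get(rec[key])
        if old is None:
            inserts.append(rec)                              # change_code = 1
        elif any(old[c] != rec[c] for c in value_cols):
            edits.append(rec)                                # change_code = 3
        # identical records (change_code = 0) are not passed on
    return inserts, edits

def apply_scd2(dimension, inserts, edits, key):
    """Expire the old versions of edited records, then insert the new rows."""
    expiry = (LOAD_DATE - timedelta(days=1)).isoformat()     # 2008-08-14
    edited_keys = {r[key] for r in edits}
    for row in dimension:
        if row[key] in edited_keys and row["EXPR_DT"] == HIGH_DATE:
            row["EXPR_DT"] = expiry                          # expire old version
    for rec in inserts + edits:
        dimension.append({**rec,
                          "EFCT_DT": LOAD_DATE.isoformat(),  # 2008-08-15
                          "EXPR_DT": HIGH_DATE})
    return dimension
```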
September 5, 2013 Type 3 Slowly Changing Dimension In Type 3 Slowly Changing Dimension, there are two columns for the attribute of interest: one holding the original value and one holding the current value. There is also a column that indicates when the current value became active. In our example, recall that we originally have the following table:

Customer Key  Name       State
1001          Christina  Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:
· Customer Key
· Name
· Original State
· Current State
· Effective Date

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key  Name       Original State  Current State  Effective Date
1001          Christina  Illinois        California     15-Jan-2003

Advantages:
– This does not increase the size of the table, since the new information is updated in place.
– This allows us to keep some part of the history.

Disadvantages:
– Type 3 cannot keep all history where an attribute changes more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

Usage: Type 3 is rarely used in actual practice.

When to use Type 3: Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
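As a minimal illustration of a Type 3 update, here is a short Python sketch; the column names match the example table, and the single-row dictionary representation is an assumption for illustration only:

```python
def scd_type3_update(row, new_state, effective_date):
    """Type 3: keep one 'original' and one 'current' column for the attribute.
    Assumes the row started with a single 'state' column, as in the example,
    and is widened on the first change."""
    if "original_state" not in row:
        row["original_state"] = row.pop("state")   # preserve the first known value
    row["current_state"] = new_state               # overwrite the current value
    row["effective_date"] = effective_date
    return row

# {"customer_key": 1001, "name": "Christina", "state": "Illinois"}
# becomes, after scd_type3_update(row, "California", "15-Jan-2003"):
# {"customer_key": 1001, "name": "Christina", "original_state": "Illinois",
#  "current_state": "California", "effective_date": "15-Jan-2003"}
```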
September 5, 2013 Type 2 Slowly Changing Dimension In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key. In our example, recall that we originally have the following table:

Customer Key  Name       State
1001          Christina  Illinois

After Christina moved from Illinois to California, we add the new information as a new row in the table:

Customer Key  Name       State
1001          Christina  Illinois
1005          Christina  California

Advantages:
– This allows us to accurately keep all historical information.

Disadvantages:
– This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
– This also complicates the ETL process.

Usage: About 50% of the time.

When to use Type 2: Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
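For comparison with the other types, here is a tiny Python sketch of a Type 2 change; the dictionary layout and the surrogate key value 1005 follow the example table and are assumptions for illustration:

```python
def scd_type2_update(dimension, name, new_state, new_key):
    """Type 2: append a new row with its own surrogate key; the old row is kept."""
    dimension.append({"customer_key": new_key, "name": name, "state": new_state})
    return dimension

# dim = [{"customer_key": 1001, "name": "Christina", "state": "Illinois"}]
# scd_type2_update(dim, "Christina", "California", new_key=1005)
# dim now holds both the Illinois row (1001) and the California row (1005)
```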
September 5, 2013 Type 1 Slowly Changing Dimension In Type 1 Slowly Changing Dimension, the new information simply overwrites the original record. In our example, recall that we originally have the following table:

Customer Key  Name       State
1001          Christina  Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key  Name       State
1001          Christina  California

Advantages:
– This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:
– All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina previously lived in Illinois.

Usage: About 50% of the time.

When to use Type 1: Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
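As a minimal illustration, a Type 1 overwrite can be sketched in Python as follows; the single-row dictionary representation is assumed for illustration:

```python
def scd_type1_update(row, new_state):
    """Type 1: overwrite the attribute in place; no history is kept."""
    row["state"] = new_state
    return row

# {"customer_key": 1001, "name": "Christina", "state": "Illinois"}
# becomes, after scd_type1_update(row, "California"):
# {"customer_key": 1001, "name": "Christina", "state": "California"}
```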
September 5, 2013 SCD – Slowly Changing Dimension The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, it applies to cases where an attribute of a record varies over time. We give an example below:

Christina is a customer with ABC Inc. She first lived in Chicago, Illinois, so the original entry in the customer lookup table has the following record:

Customer Key  Name       State
1001          Christina  Illinois

At a later date, in January 2003, she moved to Los Angeles, California. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem. There are in general three ways to solve this type of problem, categorized as follows:

Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added to the customer dimension table. The customer is therefore treated essentially as two people.
Type 3: The original record is modified to reflect the change.

We next take a look at each of these scenarios and what the data model and the data look like for each of them. Finally, we compare and contrast the three alternatives.