DataStage 7.5 Certification

According to the official IBM website (http://www-03.ibm.com/certify/tests/obj415.shtml), the following are the areas to prepare in order to pass the test:

Test 415 – IBM WebSphere IIS DataStage Enterprise Edition v7.5 

Section 1 – Installation and Configuration (5%) 

Describe how to properly install and configure DataStage EE 
Describe users and groups 
Describe the environment (e.g., dsenv, ODBC) 
Set up database connectivity 
Describe OS configuration/kernel 
Describe USS configuration 
Identify required components for server 
C++ compiler 
Identify the tasks required to create and configure a project to be used for EE jobs. 
Project location 
Assign DataStage EE roles 
Environment defaults 
Given a configuration file, identify its components and its overall intended purpose (see the example configuration file at the end of this section). 
Node names and fastnames 
Node pools and resource pools 
Resource disk and scratch disk definitions 
List the steps necessary to start/stop DataStage EE properly. 
netstat -a | grep ds 
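
For reference, a minimal one-node configuration file looks like the sketch below (hostname and paths are hypothetical). Its purpose is to tell the framework where to run (node), how to reach each node (fastname), which pools the node belongs to, and where data and scratch space live (resource disk/scratchdisk):

    {
      node "node1" {
        fastname "etlserver1"
        pools ""
        resource disk "/data/ds" {pools ""}
        resource scratchdisk "/scratch/ds" {pools ""}
      }
    }

For start/stop, a minimal sketch of the usual UNIX sequence, assuming a default engine install with $DSHOME set and no clients connected:

    cd $DSHOME
    ./bin/uv -admin -stop      # stop the server engine
    ./bin/uv -admin -start     # start it again
    netstat -a | grep ds       # confirm the dsrpcd listener is up (or down)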

Section 2 – Metadata (5%) 

Demonstrate knowledge of Orchestrate schema. 
Distinguish internal data type (Orchestrate schema) vs external data type 
Describe how to set extended properties for table definition 
Import metadata using plug-ins vs orchdbutil 
Explain nullable mapping rules (e.g., source vs target) 
NLS data types 
Identify the method of importing metadata. 
Flat sources (e.g., sequential file, Orchestrate schema, ODBC, etc.) 
COBOL CopyBook 
XML 
Given a scenario, demonstrate knowledge of runtime column propagation. 
Usage 
Impact on stage mapping and target databases 
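
To make the internal/external distinction concrete, here is a small hand-written Orchestrate schema (column names are hypothetical). External SQL types such as INTEGER and VARCHAR(30) map to the internal types shown, and nullability must be declared explicitly:

    record (
      order_id: int32;
      customer_name: nullable string[max=30];
      order_amount: decimal[10,2];
      order_date: date;
    )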

Section 3 – Persistent Storage (10%) 

Given a scenario, explain the process of importing/exporting data to/from framework (e.g., sequential file, external source/target). 
Explain use of various file stages (e.g., file, CFF, fileset, dataset) and where appropriate to use 
If USS, define the native file format (e.g., EBCDIC, VSAM) 
Given a scenario, describe proper use of a sequential file. 
Read in parallel (e.g., reader per node, multiple files) 
Handle various formats (e.g., fixed vs. variable length, delimited vs. non-delimited, etc.) 
Describe how to import and export nullable data 
Explain how to identify and capture rejected records (e.g., log counts, using reject link, options for rejection)
Given a scenario, describe proper usage of CFF (native not plug-in). 
Explain how to import data from a file that has multiple varying record types (e.g., COBOL CopyBook, EBCDIC to ASCII) 
Describe proper usage of FileSets and DataSets. 
Explain differences and similarities of FileSet and DataSet (e.g., header and data file segments; internal format (DataSet) vs. external format (FileSet)) 
Determine which tools can be used to manage FileSets and DataSets (GUI and CLI) 
Describe use of FTP stage for remote data (e.g., how it runs in parallel, plug-in vs enterprise). 
Restructure stages (e.g., column import/export) 
Identify importing/exporting of XML data. 
XML stage options and usage 
XPath and XSLT 
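
On the command line, DataSets and FileSets are managed with orchadmin (the GUI equivalent is the Data Set Management tool in Manager/Designer). A minimal sketch with a hypothetical DataSet path; exact options vary by release, so check orchadmin's usage text:

    orchadmin describe /data/orders.ds    # show the schema and segment layout
    orchadmin dump /data/orders.ds        # print the records
    orchadmin rm /data/orders.ds          # delete the descriptor AND the segment files

Never delete a .ds descriptor with a plain rm: the data segment files on each node would be left behind.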

Section 4 – Parallel Architecture (10%) 

Given a scenario, demonstrate proper use of data partitioning and collecting. 
Partitioning goals (e.g., data distribution, meeting data grouping requirements) 
Explain the differences and usage of round robin, entire, same, and hash partitioning 
Explain the differences between auto, ordered, and sort collector 
Identify partitioning type, parallel/sequential by analyzing a DataStage EE screen shot 
Explain the differences between partitioning keys and sorting (stage) keys through a scenario where these do not match 
Demonstrate proper use of partitioning methods based on given business requirements and DataStage EE technical requirements 
Explain the differences between a Funnel stage and a collector 
Describe the impact of partitioning and re-partitioning in an MPP/cluster environment 
Given a scenario, demonstrate knowledge of parallel execution. 
Given a job design and configuration file, provide estimates of the number of processes generated at runtime 
Explain the purpose and use of resource and node pools 
Given a source DataSet, describe the degree of parallelism using auto and same partitioning 
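
As a worked example of the runtime-process estimate (assuming no operator combination and no node-pool constraints): a job whose score shows 3 operators running in parallel on a 4-node configuration file starts roughly 3 × 4 = 12 player processes, plus one section leader per node (4) and one conductor, about 17 processes in total. Operator combination and sequentially executing operators lower that count.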

Section 5 – Databases (15%) 

Given a scenario, demonstrate proper selection of database stages and database specific stage properties. 
DB2 
Based on DB2 connectivity requirements, select appropriate stage (e.g., DB2 API, DB2 Enterprise) 
List environment variables needed to successfully run a DB2 job 
Teradata 
Based on functional requirements, select appropriate Teradata stage (e.g., TD Enterprise, TD MultiLoad (target only), TD API) 
Requested sessions and total sessions 
DRS 
ODBC 
Informix 
Sybase 
Oracle 
Remote Server 
Partition table 
Given a scenario using an Oracle load target, demonstrate proper use of index mode and $APT_ORACLE_LOAD_OPTIONS 
Identify source database options. 
Identify which stages/options read in parallel 
Explain the degree of parallelism for Oracle Enterprise and DB2 Enterprise 
Identify the use of “sparse” vs “normal” lookup (see the SQL sketch at the end of this section) 
When to use (e.g., 1:100) 
How to construct sparse lookup and SQL statements 
Given a scenario, demonstrate knowledge of target database options. 
Articulate benefits and limitations of using parallel load method (e.g., constraints, speed, exclusive locks, etc.) 
Explain the differences between upsert and load methods on target database stages 
Describe how to control restart of DB target (separate the “L” from ETL) 
Use OPEN, CLOSE, create temp, load, SELECT FROM … INSERT INTO, DROP 
Identify the impact of RCP when target has fewer columns than job flow 
Separate update and insert records before target upsert 
Given a scenario, describe how to design EE ETL jobs that will extract data from DB2 (or any DBMS), combine with data from another source and load to another DBMS target. 
Demonstrate knowledge of working with NLS database sources and targets. 
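
For the sparse lookup objective above, the user-defined SQL in a DB2/Oracle Enterprise stage references the driving link's key columns with the ORCHESTRATE. prefix, so one query fires per input row. A minimal sketch with hypothetical table and column names:

    SELECT description, unit_price
    FROM product_ref
    WHERE product_id = ORCHESTRATE.product_id

Because each input row costs a database round trip, a sparse lookup only pays off when the input stream is small relative to the reference table (the 1:100 rule of thumb above); otherwise use a normal (in-memory) lookup.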

Section 6 – Data Transformation (15%) 

Given a scenario, demonstrate knowledge of default type conversions, output mappings, and associated warnings. 
Demonstrate appropriate use of modify for NULL handling, string trim, non-default conversions 
Given a scenario, demonstrate proper selections of Transformer stage vs other stages. 
Copy stage/output mapping 
No BASIC Transformer 
Filter vs Transformer constraint (e.g., filter vs interpreted) 
Given a scenario, describe Transformer stage capabilities (including stage variables, link variables, DataStage macros, constraints, system variables, link ordering, @PARTITIONNUM, functions). 
Explain NULL handling within parallel transformer (e.g., reject rule, NULL functions) 
Demonstrate the use of Transformer stage variables (e.g., to identify key grouping boundaries on incoming data); see the sketch at the end of this section 
Use a Transformer only to generate a sequence of numbers in parallel 
Identify processes to add functionality not provided by existing DataStage stages (e.g., wrapper, buildops, user def functions/routines). 
Given a scenario, demonstrate how to avoid a closed-loop scenario when updating a reference table. 
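
Two small sketches for the modify and stage-variable objectives above (column and variable names are hypothetical). First, Modify stage specifications for NULL handling, trimming, and a non-default conversion, using the modify operator's conversion functions:

    order_status = handle_null(order_status, 'UNKNOWN')
    customer_name = string_trim[' '](customer_name)
    order_id:int32 = int32_from_decimal(order_id)

Second, the classic stage-variable pattern for detecting key-group boundaries on sorted input; evaluation order matters, so svIsNewGroup must be declared above svPrevKey:

    svIsNewGroup:  If In.CustomerID <> svPrevKey Then 1 Else 0
    svPrevKey:     In.CustomerID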

Section 7 – Combining and Sorting Data (10%) 

Demonstrate knowledge of Join, Lookup, and Merge stages. 
Explain the differences between Lookup, Join, and Merge stages and demonstrate when to use each
Explain link (e.g., input/output/reject) requirements 
Explain sorting requirements 
Explain full outer join usage 
Demonstrate understanding of link ordering with JOIN and MERGE stages 
Lookup with the Continue reject rule (behaves like an outer join) and its effect on non-nullable input columns 
Explain what happens with duplicate input column names on JOIN and MERGE 
Given a scenario, demonstrate knowledge of SORT stage. 
Describe the differences between parallel SORT, SORT, and sequential SORT 
Describe the differences between the Unique option in Sort and the Remove Duplicates (RemDup) stage 
Demonstrate understanding of SORT key column properties (e.g., don’t sort, previously sorted)
Demonstrate understanding of framework inserted sorts 
Examine score 
Set $APT_SORT_INSERTION_CHECK_ONLY 
Distinguish between SQL ORDER BY vs parallel sort 
Explain NLS collation sequences 
Given a scenario, demonstrate understanding of Aggregator stage. 
Memory impact of sort vs hash 
Demonstrate use of Aggregator properties 
Count vs calculation 
Explicitly create output columns and map 
Take output of Aggregator and join with detail records 
Describe the proper usage of change capture/change apply. 
Separate delta rows (e.g., inserts vs updates) 
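
For the framework-inserted-sort objective above, a common verification technique is to make inserted tsort operators check order instead of sorting, so the job fails if the data is not actually arriving sorted. It is normally set at the project or job level in Administrator/Designer; shown here as a shell export for brevity:

    export APT_SORT_INSERTION_CHECK_ONLY=1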

Section 8 – Automation and Production Deployment (10%) 

Given a scenario, articulate the change control process. 
“Dev to QA to prod” 
Explain the differences between compile and force compile 
Use of protected projects 
Import/Export 
Including compiled Transformer 
Managing custom components (e.g., buildops, wrappers) 
USS deployment topics (e.g., remote deployment, how to deploy) 
Identify the use of the dsjob command line utility (examples at the end of this section). 
Explain how a DataStage job may be run using a third-party scheduler (e.g., AutoSys, Control-M, cron, etc.) 
Explain how performance run statistics from a job run can be captured 
Capture log records to a text file (XML format) 
Explain how to use parameters to simplify the creation and maintenance of job designs. 
Explain the processes necessary to run multiple copies of the same job (job multi-instance) 
Given a scenario, demonstrate the ability to use job sequencers (e.g., exception handling, restartable sequences, dependencies, passing return values from routines, parameter passing and job status) 
Create different designs using Job Sequencer 
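
A minimal sketch of common dsjob invocations (project, job, and parameter names are hypothetical):

    # run with a parameter and wait; the exit code reflects job status
    dsjob -run -mode NORMAL -param TargetDB=DEV -jobstatus MyProject MyJob

    # run a second invocation of a multi-instance job
    dsjob -run MyProject MyJob.Invocation2

    # summarize the log, then capture a full run report as XML
    dsjob -logsum MyProject MyJob
    dsjob -report MyProject MyJob XML > MyJob_report.xml

Because -jobstatus makes dsjob's exit code reflect the job's finishing status, it drops straight into AutoSys, Control-M, or cron wrapper scripts.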

Section 9 – Monitoring and Troubleshooting (10%) 

Given a scenario, demonstrate knowledge of parallel job score. 
Show how it is captured and interpret its results 
Identify and interpret its contents 
Number of processes 
Number of nodes 
Number of DataSets 
Combined operators 
Parallel/Sequential execution 
Inserted buffer and sort operators 
Given a scenario, identify and define environment variables that control EE with regard to added functionality and reporting. 
Articulate ability to identify runtime metadata vs design metadata 
APT_PM_SHOW_PIDS, etc. 
Given a process list, identify conductor, section leader, and player processes. 
Given a scenario, identify areas that may improve performance (e.g., buffer size, repartitioning, config files, operator combination, etc.). 
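
Two reporting variables worth knowing cold for this section (normally set per project in Administrator; shell exports shown for brevity):

    export APT_DUMP_SCORE=1      # write the parallel job score to the job log
    export APT_PM_SHOW_PIDS=1    # log each player's process ID

With both set, you can match the score's operators against a ps listing: the conductor is the first osh process, each node runs one section leader, and the players are the section leaders' children.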

Section 10 – Job Design (10%) 

Demonstrate knowledge of shared containers. 
Reuse using RCP 
Using usage analysis and multi-job compile to recompile jobs that use a modified shared container 
Given a scenario, describe how to minimize SORTS and repartitions.
Demonstrate knowledge of creating restart points across jobs using intermediate DataSets.
Given a scenario, demonstrate proper use of standards. 
Naming conventions for stages and links 
Differences between annotation and description annotation 
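
For the standards objective, any consistent convention works; a common (hypothetical) pattern prefixes object names by type, e.g., lnk_CustomersIn for a link, sf_Customers for a Sequential File stage, and xfm_CleanseNames for a Transformer. On the second point: a description annotation displays the job's short or full description (one per job), while plain annotations hold free-text notes anywhere on the canvas.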
