Datastage | Datastage

May 13, 2014

DataStage Naming Conventions

DataStage Naming Conventions follows the guidelines of ETL Naming Conventions.

Contents

1 Job Name Prefixes
2 Stage Names
3 Link Names

Job Name Prefixes

Job prefixes are optional but they help to quickly identify the type of job and can make job navigation and job reporting easier. Parallel jobs – par Server jobs – ser Sequence jobs – seq Batch jobs – bat Mainframe jobs

Stage Names

The stage type prefix is used on all stage names so it appears on metadata reports that do not include a diagram of the stage or a description of the stage type. The name alone can be used to indicate the stage type.

Source and target stage names indentify the name of the entity such as a table name or a sequential file name. The stage name strips out any dynamic part of the name – such as a timestamp, and file extensions.

Database stage – db_table name
Dataset – ds_datasetname
Hash file – hf_hashfilename
Sequential file stage – sf_filename

The prefix identifies the source type, the rest of the name indicates how to find that source outside of DataStage or how to refer to that source in another DataStage job.

Transformation stages

Aggregation – AG_CalculatedContent (Prices, SalesAmounts, YTDPrices)
Changed Data Capture – CD
Funnel – FO_FunnelType (Continuous, round robin)
Lookup – LU
Pivot – PI
Remove Duplicates – RD
Sort – SO_SortFields
Transformer – TR_PrimaryFunction (HandleNulls, QA, Map)

Link Names

The link name describes what data is travelling down the link. Link names turn up in process metadata via the link count statistics so it is very important to use names that make process reporting user friendly.

Only some links in a job are important to project administrators. The link naming convention has two types of link names: – Links of importance have a five letter prefix followed by a double underscore followed by link details. – Intermediate links have a link name without a double underscore.

Links of Importance: – The first primary link in a job consists of SourceType(char2)pri(primary). – Any link from a reference source consists of SourceType(char2)ref(reference). – Any link loading to a target consists of TargetType(char2)UpdateAction(char3). – Any reject link SourceType(char2)rej(reject).

Any project can add new links of importance, such as the output count of a remove duplicates or aggregation stage.

Example: dbpri__stockitem is the first link in a job. dbups__stockitem is the link loading to a target database table with an upsert option. dbref__orgcodes is a reference lookup to of orgcodes to a database table. dbrej__stockitems is a reject of upserts to the stockitem table.

You can then produce a pivot report against the link row count statistics to show the row counts for a particular job using the five letter prefix as for each type of row count.

May 12, 2014

DataStage Environment variables list

1 Buffering	2 Building Custom stages
1.1 APT_BUFFER_FREE_RUN	2.1 DS_OPERATOR_BUILDOP_DIR
1.2 APT_BUFFER_MAXIMUM_MEMORY	2.2 OSH_BUILDOP_CODE
1.3 APT_BUFFER_MAXIMUM_TIMEOUT	2.3 OSH_BUILDOP_HEADER
1.4 APT_BUFFER_DISK_WRITE_INCREMENT	2.4 OSH_BUILDOP_OBJECT
1.5 APT_BUFFERING_POLICY	2.5 OSH_BUILDOP_XLC_BIN
1.6 APT_SHARED_MEMORY_BUFFERS	2.6 OSH_CBUILDOP_XLC_BIN

3 Compiler	4 DB2 Support
3.1 APT_COMPILER	4.1 APT_DB2INSTANCE_HOME
3.2 APT_COMPILEOPT	4.2 APT_DB2READ_LOCK_TABLE
3.3 APT_LINKER	4.3 APT_DBNAME
3.4 APT_LINKOPT	4.4 APT_RDBMS_COMMIT_ROWS
	4.5 DB2DBDFT
5 Debugging
5.1 APT_DEBUG_OPERATOR	6 Decimal Support
5.2 APT_DEBUG_MODULE_NAMES	6.1 APT_DECIMAL_INTERM_PRECISION
5.3 APT_DEBUG_PARTITION	6.2 APT_DECIMAL_INTERM_SCALE
5.4 APT_DEBUG_SIGNALS	6.3 APT_DECIMAL_INTERM_ROUND_MODE
5.5 APT_DEBUG_STEP
5.6 APT_DEBUG_SUBPROC	7 Disk IO
5.7 APT_EXECUTION_MODE	7.1 APT_BUFFER_DISK_WRITE_INCREMENT
5.8 APT_PM_DBX	7.2 APT_CONSISTENT_BUFFERIO_SIZE
5.9 APT_PM_GDB	7.3 APT_EXPORT_FLUSH_COUNT
5.10 APT_PM_SHOW_PIDS	7.4 APT_IO_MAP/APT_IO_NOMAP and APT_BUFFERIO_MAP/APT_BUFFERIO_NOMAP
5.11 APT_PM_XLDB	7.5 APT_PHYSICAL_DATASET_BLOCK_SIZE
5.12 APT_PM_XTERM
5.13 APT_SHOW_LIBLOAD	9 Job Monitoring
	9.1 APT_MONITOR_SIZE
8 General Job Admin	9.2 APT_MONITOR_TIME
8.1 APT_CHECKPOINT_DIR	9.3 APT_NO_JOBMON
8.2 APT_CLOBBER_OUTPUT	9.4 APT_PERFORMANCE_DATA
8.3 APT_CONFIG_FILE
8.4 APT_DISABLE_COMBINATION	11 Lookup
8.5 APT_EXECUTION_MODE	11.1 APT_LUTCREATE_NO_MMAP
8.6 APT_ORCHHOME
8.7 APT_STARTUP_SCRIPT	12 Network
8.8 APT_NO_STARTUP_SCRIPT	12.1 APT_IO_MAXIMUM_OUTSTANDING
8.9 APT_STARTUP_STATUS	12.2 APT_IOMGR_CONNECT_ATTEMPTS
8.10 APT_THIN_SCORE	12.3 APT_PM_CONDUCTOR_HOSTNAME
	12.4 APT_PM_NO_TCPIP
10 Misc	12.5 APT_PM_NODE_TIMEOUT
10.1 APT_COPY_TRANSFORM_OPERATOR	12.6 APT_PM_SHOWRSH
10.2 “APT_EBCDIC_VERSION” on page 65	12.7 APT_PM_USE_RSH_LOCALLY
10.3 APT_DATE_CENTURY_BREAK_YEAR	12.8 APT_RECVBUFSIZE
10.4 APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL	12.9 APT_RECVBUFSIZE
10.5 APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS
10.6 APT_INSERT_COPY_BEFORE_MODIFY	13 NLS
10.7 APT_OLD_BOUNDED_LENGTH	13.1 APT_COLLATION_SEQUENCE
10.8 APT_OPERATOR_REGISTRY_PATH	13.2 APT_COLLATION_STRENGTH
10.9 APT_PM_NO_SHARED_MEMORY	13.3 APT_ENGLISH_MESSAGES
10.10 APT_PM_NO_NAMED_PIPES	13.4 APT_IMPEXP_CHARSET
10.11 APT_PM_SOFT_KILL_WAIT	13.5 APT_INPUT_CHARSET
10.12 APT_PM_STARTUP_CONCURRENCY	13.6 APT_OS_CHARSET
10.13 APT_RECORD_COUNTS	13.7 APT_OUTPUT_CHARSET
10.14 APT_SAVE_SCORE	13.8 APT_STRING_CHARSET
10.15 APT_SHOW_COMPONENT_CALLS
10.16 APT_STACK_TRACE	14 Oracle Support
10.17 APT_WRITE_DS_VERSION	14.1 APT_ORACLE_LOAD_DELIMITED
10.18 OSH_PRELOAD_LIBS	14.2 APT_ORACLE_LOAD_OPTIONS
	14.3 APT_ORACLE_NO_OPS
15 Partitioning	14.4 APT_ORACLE_PRESERVE_BLANKS
15.1 APT_NO_PART_INSERTION	14.5 APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM
15.2 APT_PARTITION_COUNT	14.6 APT_ORA_WRITE_FILES
15.3 APT_PARTITION_NUMBER	14.7 APT_ORAUPSERT_COMMIT_ROW_INTERVAL APT_ORAUPSERT_COMMIT_TIME_INTERVAL

16 Reading/Writing files	17 Reporting
16.1 APT_DELIMITED_READ_SIZE	17.1 APT_DUMP_SCORE
16.2 APT_FILE_IMPORT_BUFFER_SIZE	17.2 APT_ERROR_CONFIGURATION
16.3 APT_FILE_EXPORT_BUFFER_SIZE	17.3 APT_MSG_FILELINE
16.4 APT_IMPORT_PATTERN_USES_FILESET	17.4 APT_PM_PLAYER_MEMORY
16.5 APT_MAX_DELIMITED_READ_SIZE	17.5 APT_PM_PLAYER_TIMING
16.6 APT_PREVIOUS_FINAL_DELIMITER_COMPATIBLE	17.6 APT_RECORD_COUNTS
16.7 APT_STRING_PADCHAR	17.7 OSH_DUMP
	17.8 OSH_ECHO
18 Sorting	17.9 OSH_EXPLAIN
18.1 APT_NO_SORT_INSERTION	17.10 OSH_PRINT_SCHEMAS
18.2 APT_SORT_INSERTION_CHECK_ONLY
	21 SAS Support
19 Teradata Support	21.1 APT_HASH_TO_SASHASH
19.1 APT_TERA_64K_BUFFERS	21.2 APT_NO_SASOUT_INSERT
19.2 APT_TERA_NO_ERR_CLEANUP	21.3 APT_NO_SAS_TRANSFORMS
19.3 APT_TERA_NO_PERM_CHECKS	21.4 APT_SAS_ACCEPT_ERROR
19.4 APT_TERA_NO_SQL_CONVERSION	21.5 APT_SAS_CHARSET
19.5 APT_TERA_SYNC_DATABASE	21.6 APT_SAS_CHARSET_ABORT
19.6 APT_TERA_SYNC_USER	21.7 APT_SAS_COMMAND
	21.8 APT_SASINT_COMMAND
20 Transport Blocks	21.9 APT_SAS_DEBUG
20.1 APT_AUTO_TRANSPORT_BLOCK_SIZE	21.10 APT_SAS_DEBUG_IO
20.2 APT_LATENCY_COEFFICIENT	21.11 APT_SAS_DEBUG_LEVEL
20.3 APT_DEFAULT_TRANSPORT_BLOCK_SIZE	21.12 APT_SAS_DEBUG_VERBOSE
20.4 APT_MAX_TRANSPORT_BLOCK_SIZE/ APT_MIN_TRANSPORT_BLOCK_SIZE	21.13 APT_SAS_NO_PSDS_USTRING
	21.14 APT_SAS_S_ARGUMENT
	21.15 APT_SAS_SCHEMASOURCE_DUMP
	21.16 APT_SAS_SHOW_INFO
	21.17 APT_SAS_TRUNCATION

May 12, 2014

InfoSphere DataStage Jobstatus returned Codes from dsjob

Equ DSJS.RUNNING To	0	This is the only status that means the job is actually running
Equ DSJS.RUNOK To	1	Job finished a normal run with no warnings
Equ DSJS.RUNWARN To	2	Job finished a normal run with warnings
Equ DSJS.RUNFAILED To	3	Job finished a normal run with a fatal error
Equ DSJS.QUEUED To	4	Job queued waiting for resource allocation
Equ DSJS.VALOK To	11	Job finished a validation run with no warnings
Equ DSJS.VALWARN To	12	Job finished a validation run with warnings
Equ DSJS.VALFAILED To	13	Job failed a validation run
Equ DSJS.RESET To	21	Job finished a reset run
Equ DSJS.CRASHED To	96	Job has crashed
Equ DSJS.STOPPED To	97	Job was stopped by operator intervention (can’t tell run type)
Equ DSJS.NOTRUNNABLE To	98	Job has not been compiled
Equ DSJS.NOTRUNNING To	99	Any other status

May 10, 2014

Alphabetical list of stages in DataStage

The stages in this document are included and available with IBM InfoSphere DataStage unless otherwise noted in the following key.

Palette category	Description
Data Quality¹	Available only with IBM InfoSphere QualityStage
Data Quality²	Available as a separately licensed add-on module to InfoSphere QualityStage
Application	Available as an add-on Pack to IBM InfoSphere DataStage and QualityStage

The following key includes additional notes about stages in the Designer client.

Stage name	Description
Stage name³	Available in the Stage Types folder in the repository tree, but is not displayed in the palette by default.

Alphabetic A – F: Stages in parallel jobs or server jobs

Stage name	Job type	Palette category	Delivery and notes
Address Verification	Parallel	Data Quality²
Aggregator	Parallel, server	Processing
BASIC Transformer3	Parallel	Processing	Superseded by the Transformer stage
CASS	Parallel	Data Quality²
CDC Transaction	Parallel	Real Time	Delivered as the CDC Transaction patch
Change Apply	Parallel	Processing
Change Capture	Parallel	Processing
Checksum	Parallel	Processing
Classic Federation	Parallel	Database
Column Export	Parallel	Restructure
Column Generator	Parallel	Development/Debug
Column Import	Parallel	Restructure
Combine Records	Parallel	Restructure
Command Stage	Server	Processing
Compare	Parallel	Processing
Complex Flat File	Parallel, server	File
Compress	Parallel	Processing
Copy	Parallel	Processing
Data Set	Parallel	File
DB2® Connector	Parallel, server	Database
DB2 Enterprise	Parallel	Database	Superseded by the DB2 Connector stage
DB2 UDB API3	Parallel, server	Database	Superseded by the DB2 Connector stage
DB2 UDB Load3	Parallel, server	Database	Superseded by the DB2 Connector stage
DB2Z 3	Parallel	Database	Superseded by the DB2 Connector stage
Decode	Parallel	Processing
Difference	Parallel	Processing
Direct Access for PeopleSoft Enterprise	Parallel, server	Application
Distributed Transaction	Parallel	Database
DPID	Parallel	Data Quality²
DRS Connector	Parallel, server	Database
Dynamic RDBMS	Parallel, server	Database	Superseded by the DRS Connector stage (Dynamic Relational Stage Connector stage)
Encode	Parallel	Processing
Essbase Connector	Parallel	Application
Expand	Parallel	Processing
External Filter	Parallel	Processing
External Source	Parallel	File
External Target	Parallel	File
File Set	Parallel	File
Filter	Parallel	Processing
Folder	Server	File
FTP Enterprise	Parallel	Processing
FTP Plug-in	Parallel, server	Processing	Superseded by the FTP Enterprise stage on the parallel canvas
Funnel	Parallel	Processing

Alphabetic G – P: Stages in parallel jobs or server jobs

Stage name	Job type	Palette category	Delivery and notes
Generic	Parallel	Processing
Hashed File	Server	File
Head	Parallel	Development/Debug
Hierarchy for PeopleSoft Enterprise	Parallel, server	Application
Informix® CLI	Parallel, server	Database
Informix Enterprise	Parallel	Database
Informix Load	Parallel, server	Database
Informix XPS Load3	Server	Database
InterProcess	Server	Processing
Investigate	Parallel	Data Quality¹
ISD Input	Parallel, server	Real Time
ISD Output	Parallel, server	Real Time
iWay Enterprise	Parallel	Database
Java Client	Parallel, server	Real Time
Java Transformer	Parallel, server	Real Time
JD Edwards EnterpriseOne	Parallel, server	Application
Join	Parallel	Processing
Link Collector	Server	Processing
Link Partitioner	Server	Processing
Lookup	Parallel	Processing
Lookup File Set	Parallel	File
Make Subrecord	Parallel	Restructure
Make Vector	Parallel	Restructure
MapStage (for TX)	Parallel	Application	Delivered as IBM WebSphere Transformation Extender for DataStage
Match Frequency	Parallel	Data Quality¹
Merge	Parallel, server	Processing
MNS (Multinational Standardization)	Parallel	Data Quality¹
Modify	Parallel	Processing
MS OLEDB	Server	Database
MS SQL Server Load	Server	Database
Netezza Enterprise	Parallel	Database	Documentation is available only with the installed product
ODBC	Server	Database	Superseded by the ODBC Connector stage
ODBC Connector	Parallel, server	Database
ODBC Enterprise3	Parallel	Database, Application	Superseded by the ODBC Connector stage
Oracle Applications Direct Access	Parallel, server	Application
Oracle Applications Hierarchy Access	Parallel, server	Application
Oracle Connector	Parallel, server	Database
Oracle Enterprise3	Parallel	Database	Superseded by the Oracle Connector stage
Oracle OCI (Oracle Call Interface)3	Server	Database	Superseded by the Oracle Connector stage
Oracle OCI Load3	Parallel, server	Database, Application	Superseded by the Oracle Connector stage
Peek	Parallel	Development/Debug
Pivot	Parallel, server	Processing	Superseded by the Pivot Enterprise stage on the parallel canvas
Pivot Enterprise	Parallel	Processing
Promote Subrecord	Parallel	Restructure

Alphabetic Q – Z : Stages in parallel jobs or server jobs

Stage name	Job type	Palette category	Delivery and notes
QualityStage Legacy	Parallel	Data Quality¹
RedBrick Load	Parallel, server	Database
Reference Match	Parallel	Data Quality¹
Remove Duplicates	Parallel	Processing
Row Generator	Parallel	Development/Debug
Row Merger	Server	Processing
Row Splitter	Server	Processing
Salesforce	Parallel	Application
Sample	Parallel	Development/Debug
(SAP BW) BW Load	Parallel, server	Application
(SAP BW) BW Open Hub	Parallel, server	Application
(SAP R/3) ABAP Extract	Parallel, server	Application
(SAP R/3) BAPI	Parallel, server	Application
(SAP R/3) IDoc Extract	Parallel, server	Application
(SAP R/3) IDoc Load	Parallel, server	Application
SAS	Parallel	Application
SAS Parallel Data Set	Parallel	Application
Sequential File	Parallel, server	File
SERP (Software Evaluation and Recognition Program)	Parallel	Data Quality²
Siebel BC Pack (Business Component)	Parallel, server	Application
Siebel DS Pack (Direct Access)	Parallel, server	Application
Siebel EIM Pack (Enterprise Integration Manager)	Parallel, server	Application
Slowly Changing Dimension	Parallel	Processing
Sort	Parallel, server	Processing
Split Subrecord	Parallel	Restructure
Split Vector	Parallel	Restructure
SQA (Standardization Quality Assessment)	Parallel	Data Quality¹
SQLServer Enterprise	Parallel	Database
Standardize	Parallel	Data Quality¹
Stored Procedure	Parallel, server	Database
Surrogate Key Generator	Parallel	Processing
Survive	Parallel	Data Quality¹
Switch	Parallel	Processing
Sybase BCP Load	Server	Database
Sybase Enterprise	Parallel	Database
Sybase IQ 12 Load	Parallel, server	Database
Sybase OC	Parallel, server	Database, Application
Tail	Parallel	Development/Debug
Teradata API3	Parallel, server	Database	Superseded by the Teradata Connector stage
Teradata Connector	Parallel, server	Database
Teradata Enterprise 3	Parallel	Database	Superseded by the Teradata Connector stage
Teradata Load3	Parallel, server	Database	Superseded by the Teradata Connector stage
Teradata Multiload3	Parallel, server	Database	Superseded by the Teradata Connector stage
Transformer	Parallel, server	Processing
Unduplicate Match	Parallel	Data Quality¹
UniData®	Server	Database
UniData 6	Server	Database	Superseded by the UniData stage
UniVerse	Server	Database
Wave Generator	Parallel	Processing
Web Services Client	Parallel, server	Real Time
Web Services Transformer	Parallel, server	Real Time
WebSphere® MQ3	Parallel, server	Real Time	Superseded by the WebSphere MQ Connector stage
WebSphere MQ Connector	Parallel, server	Real Time
Write Range Map	Parallel	Development/Debug
XML	Parallel, server	Real Time	Delivered as the XML Transformer patch
XML Input	Parallel, server	Real Time	Superseded by the XML stage
XML Output	Parallel, server	Real Time	Superseded by the XML stage
XML Transformer	Parallel, server	Real Time	Superseded by the XML stage
zOS File	Parallel	File

Stages that are available in sequence jobs

Stage name

Palette category

EndLoop Activity

Sequence

Exception Handler

Sequence

Execute Command

Sequence

Job Activity

Sequence

Nested Condition

Sequence

Notification Activity

Sequence

Routine Activity

Sequence

Sequencer

Sequence

StartLoop Activity

Sequence

Terminator Activity

Sequence

UserVariables Activity

Sequence

Wait For File Activity

Sequence

May 10, 2014

Datastage Webservices pack for SOAP over HTTP

V 8.0 and up already has a Wbeservices pack for Datastage. if in 7.5, this needs to be installed (acquired) separately

Stages:
WSClient Stage
WSTransformer Stage

Datastage can act as a SOAP client. DataStage Job can reach out and invoke a Web Service.

Get the WSDL from the webservices team for the API to be invoked

(WISD, or RTI, allows you to publish a DataStage Job or other Information Server asset “as” a Web Service)
for more information:
http://etlimpact.blogspot.in/2012/08/web-service-integration-through.html

http://www.dstagebox.com/2011/07/expose-datastage-job-as-a-web-service/

http://infospheredatastage.blogspot.in/2011/09/infosphere-datastage-and-web-service.html

May 10, 2014

Difference between Hash and Modulus partitioning

Hash and Modulus techniques are Key based partition techniques.

Hash and Modulus techniques are used for different purpose.

If Key column data type is textual then we use has partition technique for the job.

If Key column data type is numeric, we use modulus partition technique.

If one key column numeric and another text then also we use has partition technique.

if both the key columns are numeric data type then we use modulus partition technique.

May 10, 2014

2) Before-after job subroutines

3) Job Control routines

May 9, 2014

Cleanup after an aborted load or update

DROP TABLE TableName_LOG;

DROP TABLE TableName_ET;

DROP TABLE TableName_UV;

DROP TABLE TableName_WT;

RELEASE MLOAD TableName;

DataStage Naming Conventions

DataStage Environment variables list

InfoSphere DataStage Jobstatus returned Codes from dsjob

Alphabetical list of stages in DataStage

Alphabetic A – F: Stages in parallel jobs or server jobs

Alphabetic G – P: Stages in parallel jobs or server jobs

Alphabetic Q – Z : Stages in parallel jobs or server jobs

Stages that are available in sequence jobs

Datastage Webservices pack for SOAP over HTTP

Difference between Hash and Modulus partitioning

Job Dependencies

Protecting projects in datastage

Routines

Cleanup after an aborted load or update