DataStage Naming Conventions

DataStage Naming Conventions follows the guidelines of ETL Naming Conventions.
Contents
  • 1 Job Name Prefixes
  • 2 Stage Names
  • 3 Link Names
Job Name Prefixes
Job prefixes are optional but they help to quickly identify the type of job and can make job navigation and job reporting easier. Parallel jobs – par Server jobs – ser Sequence jobs – seq Batch jobs – bat Mainframe jobs
Stage Names
The stage type prefix is used on all stage names so it appears on metadata reports that do not include a diagram of the stage or a description of the stage type. The name alone can be used to indicate the stage type.
Source and target stage names indentify the name of the entity such as a table name or a sequential file name. The stage name strips out any dynamic part of the name – such as a timestamp, and file extensions. 
  • Database stage – db_table name
  • Dataset – ds_datasetname
  • Hash file – hf_hashfilename
  • Sequential file stage – sf_filename
The prefix identifies the source type, the rest of the name indicates how to find that source outside of DataStage or how to refer to that source in another DataStage job.
Transformation stages 
  • Aggregation – AG_CalculatedContent (Prices, SalesAmounts, YTDPrices)
  • Changed Data Capture – CD
  • Funnel – FO_FunnelType (Continuous, round robin)
  • Lookup – LU
  • Pivot – PI
  • Remove Duplicates – RD
  • Sort – SO_SortFields
  • Transformer – TR_PrimaryFunction (HandleNulls, QA, Map)
Link Names
The link name describes what data is travelling down the link. Link names turn up in process metadata via the link count statistics so it is very important to use names that make process reporting user friendly.
Only some links in a job are important to project administrators. The link naming convention has two types of link names: – Links of importance have a five letter prefix followed by a double underscore followed by link details. – Intermediate links have a link name without a double underscore.
Links of Importance: – The first primary link in a job consists of SourceType(char2)pri(primary). – Any link from a reference source consists of SourceType(char2)ref(reference). – Any link loading to a target consists of TargetType(char2)UpdateAction(char3). – Any reject link SourceType(char2)rej(reject).
Any project can add new links of importance, such as the output count of a remove duplicates or aggregation stage.
Example: dbpri__stockitem is the first link in a job. dbups__stockitem is the link loading to a target database table with an upsert option. dbref__orgcodes is a reference lookup to of orgcodes to a database table. dbrej__stockitems is a reject of upserts to the stockitem table.
You can then produce a pivot report against the link row count statistics to show the row counts for a particular job using the five letter prefix as for each type of row count.

DataStage Environment variables list

1 Buffering 2 Building Custom stages
1.1 APT_BUFFER_FREE_RUN 2.1 DS_OPERATOR_BUILDOP_DIR
1.2 APT_BUFFER_MAXIMUM_MEMORY 2.2 OSH_BUILDOP_CODE
1.3 APT_BUFFER_MAXIMUM_TIMEOUT 2.3 OSH_BUILDOP_HEADER
1.4 APT_BUFFER_DISK_WRITE_INCREMENT 2.4 OSH_BUILDOP_OBJECT
1.5 APT_BUFFERING_POLICY 2.5 OSH_BUILDOP_XLC_BIN
1.6 APT_SHARED_MEMORY_BUFFERS 2.6 OSH_CBUILDOP_XLC_BIN
3 Compiler 4 DB2 Support
3.1 APT_COMPILER 4.1 APT_DB2INSTANCE_HOME
3.2 APT_COMPILEOPT 4.2 APT_DB2READ_LOCK_TABLE
3.3 APT_LINKER 4.3 APT_DBNAME
3.4 APT_LINKOPT 4.4 APT_RDBMS_COMMIT_ROWS
4.5 DB2DBDFT
5 Debugging
5.1 APT_DEBUG_OPERATOR 6 Decimal Support
5.2 APT_DEBUG_MODULE_NAMES 6.1 APT_DECIMAL_INTERM_PRECISION
5.3 APT_DEBUG_PARTITION 6.2 APT_DECIMAL_INTERM_SCALE
5.4 APT_DEBUG_SIGNALS 6.3 APT_DECIMAL_INTERM_ROUND_MODE
5.5 APT_DEBUG_STEP
5.6 APT_DEBUG_SUBPROC 7 Disk IO
5.7 APT_EXECUTION_MODE 7.1 APT_BUFFER_DISK_WRITE_INCREMENT
5.8 APT_PM_DBX 7.2 APT_CONSISTENT_BUFFERIO_SIZE
5.9 APT_PM_GDB 7.3 APT_EXPORT_FLUSH_COUNT
5.10 APT_PM_SHOW_PIDS 7.4 APT_IO_MAP/APT_IO_NOMAP and APT_BUFFERIO_MAP/APT_BUFFERIO_NOMAP
5.11 APT_PM_XLDB 7.5 APT_PHYSICAL_DATASET_BLOCK_SIZE
5.12 APT_PM_XTERM
5.13 APT_SHOW_LIBLOAD 9 Job Monitoring
9.1 APT_MONITOR_SIZE
8 General Job Admin 9.2 APT_MONITOR_TIME
8.1 APT_CHECKPOINT_DIR 9.3 APT_NO_JOBMON
8.2 APT_CLOBBER_OUTPUT 9.4 APT_PERFORMANCE_DATA
8.3 APT_CONFIG_FILE
8.4 APT_DISABLE_COMBINATION 11 Lookup
8.5 APT_EXECUTION_MODE 11.1 APT_LUTCREATE_NO_MMAP
8.6 APT_ORCHHOME
8.7 APT_STARTUP_SCRIPT 12 Network
8.8 APT_NO_STARTUP_SCRIPT 12.1 APT_IO_MAXIMUM_OUTSTANDING
8.9 APT_STARTUP_STATUS 12.2 APT_IOMGR_CONNECT_ATTEMPTS
8.10 APT_THIN_SCORE 12.3 APT_PM_CONDUCTOR_HOSTNAME
12.4 APT_PM_NO_TCPIP
10 Misc 12.5 APT_PM_NODE_TIMEOUT
10.1 APT_COPY_TRANSFORM_OPERATOR 12.6 APT_PM_SHOWRSH
10.2 “APT_EBCDIC_VERSION” on page 65 12.7 APT_PM_USE_RSH_LOCALLY
10.3 APT_DATE_CENTURY_BREAK_YEAR 12.8 APT_RECVBUFSIZE
10.4 APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL 12.9 APT_RECVBUFSIZE
10.5 APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS
10.6 APT_INSERT_COPY_BEFORE_MODIFY 13 NLS
10.7 APT_OLD_BOUNDED_LENGTH 13.1 APT_COLLATION_SEQUENCE
10.8 APT_OPERATOR_REGISTRY_PATH 13.2 APT_COLLATION_STRENGTH
10.9 APT_PM_NO_SHARED_MEMORY 13.3 APT_ENGLISH_MESSAGES
10.10 APT_PM_NO_NAMED_PIPES 13.4 APT_IMPEXP_CHARSET
10.11 APT_PM_SOFT_KILL_WAIT 13.5 APT_INPUT_CHARSET
10.12 APT_PM_STARTUP_CONCURRENCY 13.6 APT_OS_CHARSET
10.13 APT_RECORD_COUNTS 13.7 APT_OUTPUT_CHARSET
10.14 APT_SAVE_SCORE 13.8 APT_STRING_CHARSET
10.15 APT_SHOW_COMPONENT_CALLS
10.16 APT_STACK_TRACE 14 Oracle Support
10.17 APT_WRITE_DS_VERSION 14.1 APT_ORACLE_LOAD_DELIMITED
10.18 OSH_PRELOAD_LIBS 14.2 APT_ORACLE_LOAD_OPTIONS
14.3 APT_ORACLE_NO_OPS
15 Partitioning 14.4 APT_ORACLE_PRESERVE_BLANKS
15.1 APT_NO_PART_INSERTION 14.5 APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM
15.2 APT_PARTITION_COUNT 14.6 APT_ORA_WRITE_FILES
15.3 APT_PARTITION_NUMBER 14.7 APT_ORAUPSERT_COMMIT_ROW_INTERVAL APT_ORAUPSERT_COMMIT_TIME_INTERVAL
16 Reading/Writing files 17 Reporting
16.1 APT_DELIMITED_READ_SIZE 17.1 APT_DUMP_SCORE
16.2 APT_FILE_IMPORT_BUFFER_SIZE 17.2 APT_ERROR_CONFIGURATION
16.3 APT_FILE_EXPORT_BUFFER_SIZE 17.3 APT_MSG_FILELINE
16.4 APT_IMPORT_PATTERN_USES_FILESET 17.4 APT_PM_PLAYER_MEMORY
16.5 APT_MAX_DELIMITED_READ_SIZE 17.5 APT_PM_PLAYER_TIMING
16.6 APT_PREVIOUS_FINAL_DELIMITER_COMPATIBLE 17.6 APT_RECORD_COUNTS
16.7 APT_STRING_PADCHAR 17.7 OSH_DUMP
17.8 OSH_ECHO
18 Sorting 17.9 OSH_EXPLAIN
18.1 APT_NO_SORT_INSERTION 17.10 OSH_PRINT_SCHEMAS
18.2 APT_SORT_INSERTION_CHECK_ONLY
21 SAS Support
19 Teradata Support 21.1 APT_HASH_TO_SASHASH
19.1 APT_TERA_64K_BUFFERS 21.2 APT_NO_SASOUT_INSERT
19.2 APT_TERA_NO_ERR_CLEANUP 21.3 APT_NO_SAS_TRANSFORMS
19.3 APT_TERA_NO_PERM_CHECKS 21.4 APT_SAS_ACCEPT_ERROR
19.4 APT_TERA_NO_SQL_CONVERSION 21.5 APT_SAS_CHARSET
19.5 APT_TERA_SYNC_DATABASE 21.6 APT_SAS_CHARSET_ABORT
19.6 APT_TERA_SYNC_USER 21.7 APT_SAS_COMMAND
21.8 APT_SASINT_COMMAND
20 Transport Blocks 21.9 APT_SAS_DEBUG
20.1 APT_AUTO_TRANSPORT_BLOCK_SIZE 21.10 APT_SAS_DEBUG_IO
20.2 APT_LATENCY_COEFFICIENT 21.11 APT_SAS_DEBUG_LEVEL
20.3 APT_DEFAULT_TRANSPORT_BLOCK_SIZE 21.12 APT_SAS_DEBUG_VERBOSE
20.4 APT_MAX_TRANSPORT_BLOCK_SIZE/ APT_MIN_TRANSPORT_BLOCK_SIZE 21.13 APT_SAS_NO_PSDS_USTRING
21.14 APT_SAS_S_ARGUMENT
21.15 APT_SAS_SCHEMASOURCE_DUMP
21.16 APT_SAS_SHOW_INFO
21.17 APT_SAS_TRUNCATION 

InfoSphere DataStage Jobstatus returned Codes from dsjob

Equ DSJS.RUNNING To 0  This is the only status that means the job is actually running
Equ DSJS.RUNOK To 1  Job finished a normal run with no warnings
Equ DSJS.RUNWARN To 2  Job finished a normal run with warnings
Equ DSJS.RUNFAILED To 3  Job finished a normal run with a fatal error
Equ DSJS.QUEUED To 4  Job queued waiting for resource allocation
Equ DSJS.VALOK To 11  Job finished a validation run with no warnings
Equ DSJS.VALWARN To 12  Job finished a validation run with warnings
Equ DSJS.VALFAILED To 13  Job failed a validation run
Equ DSJS.RESET To 21  Job finished a reset run
Equ DSJS.CRASHED To 96  Job has crashed
Equ DSJS.STOPPED To 97  Job was stopped by operator intervention (can’t tell run type)
Equ DSJS.NOTRUNNABLE To 98  Job has not been compiled
Equ DSJS.NOTRUNNING To 99  Any other status

Alphabetical list of stages in DataStage

The stages in this document are included and available with IBM InfoSphere DataStage unless otherwise noted in the following key.
Palette category Description
Data Quality1 Available only with IBM InfoSphere QualityStage
Data Quality2 Available as a separately licensed add-on module to InfoSphere QualityStage
Application Available as an add-on Pack to IBM InfoSphere DataStage and QualityStage
The following key includes additional notes about stages in the Designer client.

Stage name Description
Stage name3 Available in the Stage Types folder in the repository tree, but is not displayed in the palette by default.

Alphabetic A – F: Stages in parallel jobs or server jobs


Stage name Job type Palette category Delivery and notes
Address Verification Parallel Data Quality2
Aggregator Parallel, server Processing
BASIC Transformer3 Parallel Processing Superseded by the Transformer stage
CASS Parallel Data Quality2
CDC Transaction Parallel Real Time Delivered as the CDC Transaction patch
Change Apply Parallel Processing
Change Capture Parallel Processing
Checksum Parallel Processing
Classic Federation Parallel Database
Column Export Parallel Restructure
Column Generator Parallel Development/Debug
Column Import Parallel Restructure
Combine Records Parallel Restructure
Command Stage  Server Processing
Compare Parallel Processing
Complex Flat File Parallel, server File
Compress Parallel Processing
Copy Parallel Processing
Data Set  Parallel File
DB2® Connector Parallel, server Database
DB2 Enterprise Parallel Database Superseded by the DB2 Connector stage
DB2 UDB API3 Parallel, server Database Superseded by the DB2 Connector stage
DB2 UDB Load3 Parallel, server Database Superseded by the DB2 Connector stage
DB2Z 3 Parallel Database Superseded by the DB2 Connector stage
Decode Parallel Processing
Difference Parallel Processing
Direct Access for PeopleSoft Enterprise Parallel, server Application
Distributed Transaction Parallel Database
DPID Parallel Data Quality2
DRS Connector Parallel, server Database
Dynamic RDBMS Parallel, server Database Superseded by the DRS Connector stage (Dynamic Relational Stage Connector stage)
Encode Parallel Processing
Essbase Connector Parallel Application
Expand Parallel Processing
External Filter Parallel Processing
External Source Parallel File
External Target Parallel File
File Set Parallel File
Filter Parallel Processing
Folder  Server File
FTP Enterprise Parallel Processing
FTP Plug-in Parallel, server Processing Superseded by the FTP Enterprise stage on the parallel canvas
Funnel  Parallel Processing

Alphabetic G – P: Stages in parallel jobs or server jobs


Stage name Job type Palette category Delivery and notes
Generic Parallel Processing
Hashed File Server File
Head Parallel Development/Debug
Hierarchy for PeopleSoft Enterprise Parallel, server Application
Informix® CLI Parallel, server Database
Informix Enterprise Parallel Database
Informix Load Parallel, server Database
Informix XPS Load3 Server Database
InterProcess Server Processing
Investigate Parallel Data Quality1
ISD Input  Parallel, server Real Time
ISD Output Parallel, server Real Time
iWay Enterprise Parallel Database
Java Client Parallel, server Real Time
Java Transformer Parallel, server Real Time
JD Edwards EnterpriseOne Parallel, server Application
Join Parallel Processing
Link Collector Server Processing
Link Partitioner Server Processing
Lookup Parallel Processing
Lookup File Set Parallel File
Make Subrecord  Parallel Restructure
Make Vector  Parallel Restructure
MapStage (for TX) Parallel Application Delivered as IBM WebSphere Transformation Extender for DataStage
Match Frequency Parallel Data Quality1
Merge  Parallel, server Processing
MNS (Multinational Standardization) Parallel Data Quality1
Modify Parallel Processing
MS OLEDB Server Database
MS SQL Server Load Server Database
Netezza Enterprise Parallel Database Documentation is available only with the installed product
ODBC Server Database Superseded by the ODBC Connector stage
ODBC Connector Parallel, server Database
ODBC Enterprise3 Parallel Database, Application Superseded by the ODBC Connector stage
Oracle Applications Direct Access Parallel, server Application
Oracle Applications Hierarchy Access Parallel, server Application
Oracle Connector Parallel, server Database
Oracle Enterprise3 Parallel Database Superseded by the Oracle Connector stage
Oracle OCI (Oracle Call Interface)3 Server Database Superseded by the Oracle Connector stage
Oracle OCI Load3 Parallel, server Database, Application Superseded by the Oracle Connector stage
Peek Parallel Development/Debug
Pivot Parallel, server Processing Superseded by the Pivot Enterprise stage on the parallel canvas
Pivot Enterprise Parallel Processing
Promote Subrecord Parallel Restructure

Alphabetic Q – Z : Stages in parallel jobs or server jobs

Stage name Job type Palette category Delivery and notes
QualityStage Legacy Parallel Data Quality1
RedBrick Load Parallel, server Database
Reference Match Parallel Data Quality1
Remove Duplicates Parallel Processing
Row Generator Parallel Development/Debug
Row Merger Server Processing
Row Splitter Server Processing
Salesforce Parallel Application
Sample Parallel Development/Debug
(SAP BW) BW Load Parallel, server Application
(SAP BW) BW Open Hub Parallel, server Application
(SAP R/3) ABAP Extract Parallel, server Application
(SAP R/3) BAPI Parallel, server Application
(SAP R/3) IDoc Extract Parallel, server Application
(SAP R/3) IDoc Load Parallel, server Application
SAS Parallel Application
SAS Parallel Data Set Parallel Application
Sequential File Parallel, server File
SERP (Software Evaluation and Recognition Program) Parallel Data Quality2
Siebel BC Pack (Business Component) Parallel, server Application
Siebel DS Pack (Direct Access) Parallel, server Application
Siebel EIM Pack (Enterprise Integration Manager) Parallel, server Application
Slowly Changing Dimension Parallel Processing
Sort Parallel, server Processing
Split Subrecord Parallel Restructure
Split Vector Parallel Restructure
SQA (Standardization Quality Assessment) Parallel Data Quality1
SQLServer Enterprise Parallel Database
Standardize Parallel Data Quality1
Stored Procedure Parallel, server Database
Surrogate Key Generator Parallel Processing
Survive Parallel Data Quality1
Switch Parallel Processing
Sybase BCP Load Server Database
Sybase Enterprise Parallel Database
Sybase IQ 12 Load Parallel, server Database
Sybase OC Parallel, server Database, Application
Tail Parallel Development/Debug
Teradata API3 Parallel, server Database Superseded by the Teradata Connector stage
Teradata Connector  Parallel, server Database
Teradata Enterprise 3 Parallel Database Superseded by the Teradata Connector stage
Teradata Load3 Parallel, server Database Superseded by the Teradata Connector stage
Teradata Multiload3 Parallel, server Database Superseded by the Teradata Connector stage
Transformer Parallel, server Processing
Unduplicate Match Parallel Data Quality1
UniData® Server Database
UniData 6 Server Database Superseded by the UniData stage
UniVerse  Server Database
Wave Generator  Parallel Processing
Web Services Client Parallel, server Real Time
Web Services Transformer Parallel, server Real Time
WebSphere® MQ3 Parallel, server Real Time Superseded by the WebSphere MQ Connector stage
WebSphere MQ Connector Parallel, server Real Time
Write Range Map Parallel Development/Debug
XML Parallel, server Real Time Delivered as the XML Transformer patch
XML Input Parallel, server Real Time Superseded by the XML stage
XML Output Parallel, server Real Time Superseded by the XML stage
XML Transformer Parallel, server Real Time Superseded by the XML stage
zOS File Parallel File


Stages that are available in sequence jobs

 

Stage name Palette category
EndLoop Activity Sequence
Exception Handler Sequence
Execute Command Sequence
Job Activity Sequence
Nested Condition Sequence
Notification Activity Sequence
Routine Activity Sequence
Sequencer Sequence
StartLoop Activity Sequence
Terminator Activity Sequence
UserVariables Activity Sequence
Wait For File Activity Sequence

Datastage Webservices pack for SOAP over HTTP

V 8.0 and up already has a Wbeservices pack for Datastage. if in 7.5, this needs to be installed (acquired) separately

Stages:
WSClient Stage
WSTransformer Stage

Datastage can act as a SOAP client. DataStage Job can reach out and invoke a Web Service. 

Get the WSDL from the webservices team for the API to be invoked

(WISD, or RTI, allows you to publish a DataStage Job or other Information Server asset “as” a Web Service)
for more information:
http://etlimpact.blogspot.in/2012/08/web-service-integration-through.html

http://www.dstagebox.com/2011/07/expose-datastage-job-as-a-web-service/

http://infospheredatastage.blogspot.in/2011/09/infosphere-datastage-and-web-service.html




Difference between Hash and Modulus partitioning

Hash and Modulus techniques are Key based partition techniques. 

Hash and Modulus techniques are used for different purpose. 


If Key column data type is textual then we use has partition technique for the job. 

If Key column data type is numeric, we use modulus partition technique. 

If one key column numeric and another text then also we use has partition technique. 

if both the key columns are numeric data type then we use modulus partition technique.

Routines

Routines are stored in the
Routines branch of the DataStage Repository, where you can create, view or
edit. The following are different types of routines:

1) Transform functions

2) Before-after job subroutines

3) Job Control routines