Oracle column data type NCLOB

Officially, NCLOB is not supported in older DataStage versions. Is there a way to make this work in a bare-bones SRC -> TRANSFORMER -> TGT kind of job, or something more complex? Please let me know in the comments. In V11.5, define NCLOB as LongNVarChar.

https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.oracon.usage.doc/topics/data_map_to_ds_oracc.html

What is NCLOB?

NCLOB (National Character Large Object) is an Oracle data type that can hold up to 4 GB of character data. It’s similar to a CLOB, but the characters are stored in the database’s national character set (NLS), which is typically a multibyte character set.

Other details that may be useful

1. If you are using Java applications to modify these columns, you have to open streaming connections (ASCII or binary) to write the data.
2. From SQL*Plus it is not possible, because the maximum length that can be written in a single statement is 4,000 characters.
3. From PL/SQL, use the DBMS_LOB package (for example, DBMS_LOB.WRITEAPPEND) to insert and append the data; see the sketch after this list.
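
To illustrate item 3, here is a minimal sketch of the DBMS_LOB.WRITEAPPEND approach, run through SQL*Plus from a shell. The table MY_DOCS, its columns, and the connection string are hypothetical:

sqlplus myuser/mypassword@mydb <<'EOF'
-- Insert a row with an empty NCLOB locator, keep the locator, then append.
DECLARE
  v_lob NCLOB;
BEGIN
  INSERT INTO my_docs (id, doc)
  VALUES (1, EMPTY_CLOB())
  RETURNING doc INTO v_lob;

  -- WRITEAPPEND(locator, amount in characters, buffer)
  DBMS_LOB.WRITEAPPEND(v_lob, 5, N'hello');
  COMMIT;
END;
/
EOF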


DataStage Director – job running and viewing issues

The Operator role has very few options by default; check how it is set up in your environment.

You could recheck by extracting the job log via the command line (a sketch of the log options follows the commands below).

  • Sanity-test your environment:
    • RowGen -> Peek, then compile and run
    • RowGen -> Transformer -> Peek, then compile and run

$DSHOME/bin/dsjob -lprojects

$DSHOME/bin/dsjob -domain host:port -server host -user u124443 -password some43@$ -lprojects
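
For the log extraction mentioned above, dsjob also provides log options; a sketch with placeholder project and job names:

# Summary of the log entries for a job
$DSHOME/bin/dsjob -logsum myproject myjob

# Full detail of a single log event, identified by its event id
$DSHOME/bin/dsjob -logdetail myproject myjob 25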


‘Mapping file error’ – Lookup stage

Join versus Lookup

The Lookup stage memory-maps files. This means that you MUST have enough system memory available to store the entire contents of the file, AND you must have enough disk space on the resource disk (defined in the APT_CONFIG_FILE) to shadow the file in memory.

‘Each physical process can address only up to 4 GB of memory because it is a 32-bit application. The Windows version of the InfoSphere DataStage Parallel Engine is available only with 32-bit pointers.’

It is not that the system lacks memory overall, but that loading the whole map into a single 32-bit process hits this 4 GB limit. The issue was worked around by using a Join stage instead of a Lookup.

Another option is to change the job to repartition both the reference input and the primary input to the Lookup so that they match on the keys. Because the lookup runs 4-way parallel and the data has been explicitly partitioned, the Lookup disables memory sharing, and the per-process memory requirement on the reference input is reduced because the data is distributed across the partitions. This enables the job to complete.
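
As a quick check of the disk-space requirement described above, you can read the resource disk paths out of the configuration file and verify the free space on each; a sketch (the example path is an assumption, use the paths from your own config file):

# Resource disk paths are defined in the parallel configuration file
grep -i "resource disk" $APT_CONFIG_FILE

# Check free space on each listed path (example path only)
df -k /opt/IBM/InformationServer/Server/Datasets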


Fatal Error: APT_Communicator::shmemInitBuffers: createMapFile (/tmp) failed: Not enough space on node.

This message can also be caused by a system-wide limit on the number of mmap’ed shared memory segments. Often this issue occurs when the value of APT_DEFAULT_TRANSPORT_BLOCK_SIZE is set too high. Please check the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE. The default for this variable is 131072, or 128 KB. The maximum value is 1048576, or 1 MB. Please see page 79 of the Parallel Job Advanced Developer Guide for additional information about this environment variable.

Resolving the problem
To resolve this issue, ensure that there is adequate space in the assigned TMPDIR (temporary directory), or verify the setting of the environment variable APT_DEFAULT_TRANSPORT_BLOCK_SIZE.
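
Both checks can be done from the engine tier shell; a sketch (locations are typical defaults, not universal):

# 1. Is there adequate space where the map files are created?
df -k /tmp

# 2. Is APT_DEFAULT_TRANSPORT_BLOCK_SIZE set, and to what value?
grep APT_DEFAULT_TRANSPORT_BLOCK_SIZE $DSHOME/dsenv

# 3. If it is set above the 1048576 maximum, bring it back down, e.g.
export APT_DEFAULT_TRANSPORT_BLOCK_SIZE=131072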

Why do we use AutoSys or another job scheduler?

  1. AutoSys gives you various options, like JOB_ON_ICE and JOB_ON_HOLD.
  2. Scheduling is simple. If you have a job that you want to run every hour, then through DataStage you would have to schedule it 24 times, creating 24 distinct processes (PIDs), whereas AutoSys handles it with a single job definition.
  3. If you want to run a job on the first Monday of every month, you just have to set up a calendar in AutoSys; DataStage has no comparable option (see the JIL sketch after this list).
  4. Likewise, if you want to run a job on the first business day of every month (what counts as a business day may vary by client), you just set up a calendar in AutoSys.
  5. In short, it gives you various scheduling options with less effort. Reusability and maintenance are also factors.
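
As an illustration of items 3 and 4, a sketch of an AutoSys JIL definition that runs a DataStage wrapper script against a custom calendar; the job name, machine, owner, and calendar name are all hypothetical:

# Feed a JIL definition to AutoSys (names and calendar are placeholders)
jil <<'EOF'
insert_job: DS_MONTHLY_LOAD
job_type: CMD
command: /opt/etl/scripts/run_ds_job.sh
machine: etlserver1
owner: dsadm
date_conditions: 1
run_calendar: first_monday_cal
start_times: "02:00"
EOF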

How will you connect a DataStage job with AutoSys?

Irrespective of the scheduler you use (AutoSys, SeeBeyond, Control-M, at, cron, to name a few), you use the command-line interface dsjob to specify what you want DataStage to do.

You will have to write a wrapper shell script that uses the DataStage CLI (command-line interface). After creating the wrapper shell script, you just execute that script from AutoSys.
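
A minimal sketch of such a wrapper, assuming placeholder project and job names; -run starts the job and -jobstatus makes dsjob wait and return an exit code derived from the job's finishing status (check the dsjob documentation for the exact mapping in your release):

#!/bin/sh
# Wrapper for AutoSys: run a DataStage job and exit non-zero on failure.
PROJECT=myproject
JOB=myjob

$DSHOME/bin/dsjob -run -jobstatus $PROJECT $JOB
rc=$?

# Assumption: a clean finish maps to 0 or 1 depending on the release;
# anything else is treated here as warnings/failure.
case $rc in
  0|1) echo "Job $JOB finished OK (rc=$rc)" ; exit 0 ;;
  *)   echo "Job $JOB failed or finished with warnings (rc=$rc)" ; exit 1 ;;
esac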


Other Scheduler posts

Scheduler

Scheduler options

Transformer Stage Looping

Looping from V8.5 and up

Aggregation operations make use of a cache that stores input rows.

Two functions, SaveInputRecord() and GetSavedInputRecord(), are used to add input rows to the cache and retrieve them.

  • SaveInputRecord() is called when a stage variable is evaluated, and returns the count of rows currently in the cache (starting at 1 when the first row is added).
  • GetSavedInputRecord() is called when a loop variable is evaluated; it retrieves the next saved row from the cache and makes it the current input row for the output link derivations.

In the Transformer stage properties, the looping facility is configured through the following inputs.

Stage variables

Define the stage variables. A stage variable derivation:

  • can use functions, such as SaveInputRecord()
  • can use functions with link columns, such as LastRowInGroup(inlink.Col1)
  • can use IF THEN ELSE expressions, such as IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price

Loop condition

Enter the expression to use as the loop condition, for example:

@ITERATION <= NumRows

The loop continues to iterate for the count specified in the NumRows variable.
Loop variables

Define the loop variable, for instance:

  • SavedRowIndex – GetSavedInputRecord()

Output link metadata and derivations

Define the output link columns and their derivations, for example (a consolidated sketch of all the looping expressions follows this list):

  • Col1 – inlink.Col1
  • Price – inlink.Price
  • Percentage – (inlink.Price * 100)/TotalPrice
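
Putting the pieces together, a consolidated sketch of the expressions for this ‘percentage of group total’ pattern, assuming the stage variables are evaluated top to bottom in the order shown (inlink, Col1, and Price come from the examples above; the variable names are illustrative):

Stage variables:
  NumSavedRows  – SaveInputRecord()
  IsBreak       – LastRowInGroup(inlink.Col1)
  TotalPrice    – IF IsBreak THEN SummingPrice + inlink.Price ELSE 0
  SummingPrice  – IF IsBreak THEN 0 ELSE SummingPrice + inlink.Price
  NumRows       – IF IsBreak THEN NumSavedRows ELSE 0

Loop condition:
  @ITERATION <= NumRows

Loop variable:
  SavedRowIndex – GetSavedInputRecord()

Output link derivations:
  Col1       – inlink.Col1
  Price      – inlink.Price
  Percentage – (inlink.Price * 100)/TotalPrice

Each input row is saved to the cache; when the last row of a group arrives, NumRows becomes non-zero and the loop replays the saved rows, so every output row can carry its group’s TotalPrice.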


RUNTIME ERRORS

The number of calls to SaveInputRecord() and GetSavedInputRecord() must match for each loop. You can call SaveInputRecord() multiple times to add to the cache, but once you call GetSavedInputRecord(), you must call it enough times to empty the input cache before you can call SaveInputRecord() again. Not observing this rule generates runtime errors in the following circumstances:
  • If your Transformer stage calls GetSavedInputRecord before SaveInputRecord, then a fatal error similar to the following example is reported in the job log:
    APT_CombinedOperatorController,0: Fatal Error: get_record() called on 
    record 1 but only 0 records saved by save_record()
  • If your Transformer stage calls GetSavedInputRecord more times than SaveInputRecord is called, then a fatal error similar to the following example is reported in the job log:
    APT_CombinedOperatorController,0: Fatal Error: get_record() called on 
    record 3 but only 2 records saved by save_record()
  • If your Transformer stage calls SaveInputRecord but does not call GetSavedInputRecord, then a fatal error similar to the following example is reported in the job log:
    APT_CombinedOperatorController,0: Fatal Error: save_record() called on 
    record 3, but only 0 records retrieved by get_record()
  • If your Transformer stage does not call GetSavedInputRecord as many times as SaveInputRecord, then a fatal error similar to the following example is reported in the job log:
    APT_CombinedOperatorController,0: Fatal Error: save_record() called on 
    record 3, but only 2 records retrieved by get_record()

Datastage Job scheduler

1. Be sure that scheduling has been defined with a valid user in the Administrator client for the project you are working in.
2. In Director, highlight your job, click the ‘Add schedule’ tool button, and enter when you want it to run (which day of the week, or whatever schedule you want) and the time it should run.
3. If you want to add or remove ‘limits’ (maximum number of warnings and/or rows), set these at this time as well.
4. Then click ‘Schedule’.

ETL CLOB data type

What is CLOB

A Character Large OBject (or CLOB) is a collection of character data in a database management system, usually stored in a separate location that is referenced in the table itself.

A CLOB (character large object) value can be up to 2,147,483,647 characters long. A CLOB is used to store Unicode character-based data, such as large documents in any character set.

The length is given in number of characters, unless one of the suffixes K, M, or G is used, denoting multiples of 1024, 1024*1024, or 1024*1024*1024 respectively; for example, CLOB(2K) allows 2,048 characters. Length is always specified in characters (Unicode) for a CLOB.

From version 8.5, the Oracle connector supports reading and writing the Oracle LOB data types BFILE, BLOB, CLOB, NCLOB, LONG RAW, RAW, and XMLType.

Example

One of the columns in a particular table contains HTML or XML data stored as a CLOB; the length could be large, say 1048576.

Use the LongVarChar(n) data type within DataStage.

In the ETL job, replace CLOB with LongVarChar of the same size (1048576), since CLOB is not defined in DataStage. The value of “n” depends on what is actually in the CLOB field.

For example, if the DBA or analyst confirms that the field will never hold more than 1,500 characters, use VarChar(2000) rather than LongVarChar.

Images stored as CLOBs

BLOBs are generally used to store images, though CLOBs have sometimes been used for this as well. Sometimes only the path is stored in the column while the images are physically stored on disk.

In cases where the images can be encoded as text, with the encoding metadata stored alongside, a BLOB or CLOB is used to store the images themselves.

What is Unicode and what is it used for?

View Data may display non-ASCII characters incorrectly, so don’t trust what it shows you for such characters; check the final output (file or table) instead, using a viewer that can properly display the Euro symbol, accented characters, and so on.
Use Unicode to handle Latin source data and Chinese characters.

Apply StringToUstring(columnvalue, 'UTF-8') and set the column data type to NVarChar.

NLS-settings:
– Server Maps=UTF-8
– Parallel Maps=UTF-8

Code page IBM-1388 is the EBCDIC Simplified Chinese code page.

Error and resolution

APT_CombinedOperatorController(1),0: Invalid character(s) ([xE4]) found converting string (code point(s): [xEF][xBC][x8A][xE3][x80][x80][xE3][x80][x80][xE3][x80][x80]…[xE3]…) from codepage UTF-8 to Unicode, substituting.

Probable resolution:

The job (or project) was set up to default to the UTF-8 character set, but the data read by the source stage actually uses ISO-8859-1 characters. Set the NLS map on the source stage to match the actual encoding of the source data.

Useful links

http://unicode-table.com/en/#0000