Datastage | Datastage

January 12, 2013

Datastage export objects

January 10, 2013

Restart Commands for IIS Datstage 8.5 or 8.7 UNIX/Linux

We will stop in this order:

Stop DataStage
Stop Node Agents
Stop WebSphere

Log into your Unix/Linux machine as root (you can run first part as dsadm).
Make sure that no one is using DataStage.
Check if there are any processes left by executing the following commands:
ps -ef | grep phantom – This would show processes from jobs
ps -ef | grep osh – This would show processes for parallel jobs
ps -ef | grep dsapi – This would show processes for client connection
ps -ef | grep dscs – This would show processes for client connection
These commands shouldn’t return any results. If they return results, it means that there are still some processes. You need to wait for the users to stop their processes or kill the processes (using kill -9 PID) if you confirm that these are not longer valid connections .
Check if DataStage RPC Daemon (dsrpc) is running by executing:
netstat -a | grep dsrpc
If dsrpc is running then you should see one line with the status LISTEN. If you get more lines it means that there are still some connections and you need to wait. If you see that dsrpc or other connections are in status CLOSE_WAIT then you also need to wait until these disappear (you can also reboot the machine)
Stop DataStage (you have to run this from $DSHOME)
. ./dsenv
bin/uv -admin -stop

Check to see if there are any shared memory segments left:
ipcs -m | grep ade
Note: If you are not running with the default itag of “ade” use that instead of “ade”. For example if you have two DataStage instances one would be “ade” and the second would have another tag, say “801” or “efg”.
If you get a result, take note of the ID and remove it (you will need root access for this part)
ipcrm -m ID
Then check dsrpc again using
netstat -a | grep dsrpc
You shouldn’t get any results. If you get any and/or the dsrpc is in CLOSE_WAIT then you need to wait until this it disappears (you can also reboot the machine)
Stop the ASB Agent (you need to be root here)
Go to <InstallPath>/IBM/InformationServer/ASBNode/bin
Run ./NodeAgents.sh stop
Type ‘yes’ if you receive the following message: rm: remove write-protected regular file Agent.pid’?
Check the agents have stopped:
ps -ef | grep java
To see full command in Solaris:
/usr/ucb/ps -auxww | grep java
If still see the Logging Agent or ASB Agent then you need to kill these. e.g. kill -9 PID
Stop the WebSphere Server (you need to be root here)
Go to <InstallPath>/IBM/WebSphere/AppServer/bin
Run ./stopServer.sh server1
* if that does not work, try ./bin/stopServer.sh server1 -username wasadmin -password wasadmin
Now check that the WebSphere java processes is gone:
ps -ef | grep java
To see full command in Solaris:
/usr/ucb/ps -auxww | grep java
remove it if it exists with: kill PID (no -9 in the kill command)

Now, you can restart, in this order:

Start WebSphere Server
Start ASB Agent
Start DataStage

Start WebSphere Server as root
Go to <InstallPath>/IBM/WebSphere/AppServer/bin
Run ./startServer.sh server1
Wait until you get back a line saying:
Server server1 open for e-business;
Start ASB Agent as root
Go to <InstallPath>/IBM/InformationServer/ASBNode/bin
Run ./NodeAgents.sh start
Start DataStage (as dsadm)
Go to $DSHOME
Run bin/uv -admin -start

Reference: http://www-01.ibm.com/support/docview.wss?uid=swg21431700

January 10, 2013

Pass the Fatal Error Message between jobs as parameters

*A simple server routine to get the fatal errors from failed job and it can passed as parameter to next job as one string and further it can be stored into a file or database.

$INCLUDE DSINCLUDE JOBCONTROL.H
JobName = FIELD(Arg1,”|”,1)
evtMessage1 = “”
hJob1 = DSAttachJob(JobName, DSJ.ERRNONE)
jobStart = DSGetJobInfo (hJob1, DSJ.JOBSTARTTIMESTAMP)
latestID = DSGetNewestLogId(hJob1,DSJ.LOGANY)
For eventID = latestID To 1 Step -1
eventDetail = DSGetLogEntry(hJob1,eventID)
evtDate = FIELD(eventDetail,””,1)
evtDate = CHANGE(evtDate,”Date: “,””)
evtMessage = FIELD(eventDetail,””,4)
evtType = Field(eventDetail,””,3)
If evtDate < jobStart Then Exit

If evtType = “3” Then
evtMessage1 = evtMessage:”|”:evtMessage1
END
Next eventID
Ans = Lower(evtMessage1)

January 5, 2013

Datastage 9.1 – the latest version

Excel Compatibility

Excel is feeling the love at IBM IOD 2012 thanks to DataStage 9.1. The new version can read MS Excel naively from any platform – you can read Excel even when your DataStage server is install on Unix or Linux – without the pain of Unix to Windows bridges. Excel spreadsheets have never been easier to access as a data source. Previously DataStage installed on Windows read Excel files easily thanks to all the Windows libraries but was damn hard to get working from Unix or Linux as the operating system could not understand MS Office files.

DataStage 9.1 comes with an unstructured text stage that makes Excel easier to read even if it is not a perfected formatted table. DataStage can go in and find column headings whether they are on row 1 or row 10. It can parse the columns and turn them into relational data and even add on extra text strings such as a single comment field. DataStage 9.1 can overcome a lot of the problems you get from Excel as a source when people muck around with the spreadsheet format.

Support for Big Data

Balanced Optimization for DB2 z/OS and for Hadoop

Faster Database Writes

DataStage 9.1 is faster when writing to DB2 and Oracle thanks to improved big buffering of data. Oracle Bulk Load and DB2 Bulk Load are faster. Oracle Upsert is faster. The Connector stages are better.

Faster Netezza Table Copies

IBM introduces a new interface called Data Click – it promises to load tables from a source Oracle or DB2 table into Netezza with just two clicks. Two clicks! The new console will automatically generate the DataStage job required to move the tables selected into Netezza. Great for fast PoC or agile BI development. It can repeat the data movement later or automatically generate InfoSphere CDC table subscriptions to keep those tables up to date in real time.

IlOG comes back to DataStage

ILOG is now called Operational Decision Management. This must be licensed for the stage to work properly.

Many a year ago there was a JRules plugin for DataStage so you could run the iLog JRule engine in stream with a DataStage job. IBM has brought this back to DataStage 9.1 Parallel Jobs. What this means for DataStage is one of the best rules engines on the market is available to ETL processing. What it means for iLog is massive scalability, running iLog on a grid or cluster of DataStage parallel servers with automatic partitioning. It also means iLog can read and write data in ANY format – DataStage can read any database, flat file, mainframe file, InfoSphere CDC subscription, SOA message or hundreds of other data formats and write to or read from iLog JRules.

Second Generation Java Stage

In the new and improved Java Stage for DataStage you will be able to view and browse java classes and access a huge range of data sources or targets. Better access to web applications and web data and easier integration of Java code.

Write to Multiple Files

DataStage 9.1 makes it easier to take a data stream and write it to a bunch of output files, such as splitting data up by customer.

New Expressions

A couple old Server Edition functions finally make their way onto the Parallel canvas – such as the popular EReplace function for replacing values in text strings.

WorkLoad Management

The DataStage Operations Console, the web application introduced in DataStage 8.7, has a new tab for workload management. Included in the bundles of goodies is the ability to cap the number of jobs running at once, the amount of CPU being used and the amount of RAM being used by parallel jobs. This should avoid problems you get when you pass 100% RAM utilisation and stagger the start up of hundreds of simultaneous job requests. There is also the ability to create queues and apply workload priorities to those queues.

Datstage on Windows

For DataStage on Windows far more use is made of native Windows processes, and very little of DataStage now uses MKS Toolkit processes (MKS Toolkit still installs, for all those who have created invocations of “UNIX” commands/scripts from DataStage).

What DataStage 9.1 Does Not Have

It is with some disappointment that I report that the following oft requested features will not be in DataStage 9.1:

DataStage will not have a plugin that chops fruit and vegetables. It will not be possible to chop up an entire soup in three seconds using parallel chop-o-matics. I’m afraid we will need to continue to chop using the old fashioned methods.

DataStage 9.1 National Language Support lets you process data in different character sets such as Mandarin and Japanese. Unfortanately still no support for Klingon and it is still not possible to process customer sentiments on Twitter that are written in Klingon, which as we all know has a much richer dialect for insults.

Source: http://it.toolbox.com/blogs/infosphere/ibm-launches-datastage-91-at-ibm-iod2012-53350#4993356

January 5, 2013

dsjob return code 269 Datstage v8.5

dsjob command in V8.5 has error. if you want to revert back old functionality of dsjob –

Apply APAR JR43437 and set environment variable(APAR JR43437)

DSE_SLAVE_CLOSE_SOCKET_ON_EXEC=1 in $DSHOME/dsenv.

Workaround – Set the DataStage Administrator Client inactivity timeout value to a number of seconds that is greater than your longest-running job

Source: http://www-01.ibm.com/support/docview.wss?uid=swg21621459

January 5, 2013

Rename many/all jobs

Create an Excel spreadsheet with new and old names.
Export the whole project as a dsx file.
Write a Perl program, which can do a simple rename of the strings looking up the Excel file.
Then import the new dsx file probably into a new project for testing.
Recompile all jobs.

Be cautious that the name of the jobs has also been changed in your job control jobs or Sequencer jobs. So you have to make the necessary changes to these Sequencers.

January 5, 2013

Exception activity in datastage

The stages followed by exception activity will be executed whenever an unknown error occurs while running the job sequencer

January 5, 2013

Schedule Datastage ETL jobs – Basics

Read about Datastage Scheduler

What is the utility used to schedule the jobs on a UNIX server?

dsjob utility (UNIX commands) are used to schedule datastage jobs. dsjob commands allow us to:

Start a job (-run)
Stop a job -stop
List projects, jobs, stages, links, and parameters
Set an alias for a job
Retrieve information

Access log files (Important )- IBM reference

dsjob -log [ -info | -warn ] [ -useid ] project job|job_id

Generate a report

<h2>There are many tools that help scheduling</h2>

Ascential Director
AUTOSYS”: Thru autosys u can automate the job by invoking the shell script written to schedule the datastage jobs
- Autosys is the most popular scheduler
SeeBeyond,
ControlM,
at,
cron

For more information read – Popular schedulers in Datastage

January 5, 2013

SQL LOADER or OCI STAGE

When the source data is anormous or for bulk data we can use OCI and SQL loader depending upon the source

January 5, 2013

What are Stage Variables, Derivations and Constants?

Stage Variable – An intermediate processing variable that retains value during read and doesnt pass the value into target column.
Derivation – Expression that specifies value to be passed on to the target column.
Constant – Conditions that are either true or false that specifies flow of data with a link