Restart Commands for IIS Datstage 8.5 or 8.7 UNIX/Linux

We will stop in this order:
  • Stop DataStage
  • Stop Node Agents
  • Stop WebSphere
  1. Log into your Unix/Linux machine as root (you can run first part as dsadm).
  2. Make sure that no one is using DataStage.
  3. Check if there are any processes left by executing the following commands:
    ps -ef | grep phantom – This would show processes from jobs
    ps -ef | grep osh – This would show processes for parallel jobs
    ps -ef | grep dsapi – This would show processes for client connection
    ps -ef | grep dscs – This would show processes for client connection
    These commands shouldn’t return any results. If they return results, it means that there are still some processes. You need to wait for the users to stop their processes or kill the processes (using kill -9 PID) if you confirm that these are not longer valid connections .
  4. Check if DataStage RPC Daemon (dsrpc) is running by executing:
    netstat -a | grep dsrpc
    If dsrpc is running then you should see one line with the status LISTEN. If you get more lines it means that there are still some connections and you need to wait. If you see that dsrpc or other connections are in status CLOSE_WAIT then you also need to wait until these disappear (you can also reboot the machine)
  5. Stop DataStage (you have to run this from $DSHOME)
    . ./dsenv
    bin/uv -admin -stop

  1. Check to see if there are any shared memory segments left:
    ipcs -m | grep ade
    Note: If you are not running with the default itag of “ade” use that instead of “ade”. For example if you have two DataStage instances one would be “ade” and the second would have another tag, say “801” or “efg”.
    If you get a result, take note of the ID and remove it (you will need root access for this part)
    ipcrm -m ID
  2. Then check dsrpc again using
    netstat -a | grep dsrpc
    You shouldn’t get any results. If you get any and/or the dsrpc is in CLOSE_WAIT then you need to wait until this it disappears (you can also reboot the machine)
  3. Stop the ASB Agent (you need to be root here)
    Go to <InstallPath>/IBM/InformationServer/ASBNode/bin
    Run ./NodeAgents.sh stop
    Type ‘yes’ if you receive the following message: rm: remove write-protected regular file Agent.pid’?
  4. Check the agents have stopped:
    ps -ef | grep java
    To see full command in Solaris:
    /usr/ucb/ps -auxww | grep java
    If still see the Logging Agent or ASB Agent then you need to kill these. e.g. kill -9 PID
  5. Stop the WebSphere Server (you need to be root here)
    Go to <InstallPath>/IBM/WebSphere/AppServer/bin
    Run ./stopServer.sh server1
    * if that does not work, try ./bin/stopServer.sh server1 -username wasadmin -password wasadmin
    Now check that the WebSphere java processes is gone:
    ps -ef | grep java
    To see full command in Solaris:
    /usr/ucb/ps -auxww | grep java
    remove it if it exists with: kill PID (no -9 in the kill command)

Now, you can restart, in this order:

  • Start WebSphere Server
  • Start ASB Agent
  • Start DataStage
  1. Start WebSphere Server as root
    Go to <InstallPath>/IBM/WebSphere/AppServer/bin
    Run ./startServer.sh server1
    Wait until you get back a line saying:
    Server server1 open for e-business;
  2. Start ASB Agent as root
    Go to <InstallPath>/IBM/InformationServer/ASBNode/bin
    Run ./NodeAgents.sh start
  3. Start DataStage (as dsadm)
    Go to $DSHOME
    Run bin/uv -admin -start
Reference: http://www-01.ibm.com/support/docview.wss?uid=swg21431700

Pass the Fatal Error Message between jobs as parameters

*A simple server routine to get the fatal errors from failed job and it can passed as parameter to next job as one string and further it can be stored into a file or database.

$INCLUDE DSINCLUDE JOBCONTROL.H 
JobName = FIELD(Arg1,”|”,1) 
evtMessage1 = “”
hJob1 = DSAttachJob(JobName, DSJ.ERRNONE) 
jobStart = DSGetJobInfo (hJob1, DSJ.JOBSTARTTIMESTAMP) 
latestID = DSGetNewestLogId(hJob1,DSJ.LOGANY) 
For eventID = latestID To 1 Step -1 
         eventDetail = DSGetLogEntry(hJob1,eventID) 
         evtDate = FIELD(eventDetail,””,1) 
         evtDate = CHANGE(evtDate,”Date: “,””) 
         evtMessage = FIELD(eventDetail,””,4) 
         evtType = Field(eventDetail,””,3) 
         If evtDate < jobStart Then Exit 
         
          
         If evtType = “3” Then 
            evtMessage1 = evtMessage:”|”:evtMessage1
         END
Next eventID 
Ans = Lower(evtMessage1)

Datastage 9.1 – the latest version

Excel Compatibility
Excel is feeling the love at IBM IOD 2012 thanks to DataStage 9.1.  The new version can read MS Excel naively from any platform – you can read Excel even when your DataStage server is install on Unix or Linux – without the pain of Unix to Windows bridges. Excel spreadsheets have never been easier to access as a data source.  Previously DataStage installed on Windows read Excel files easily thanks to all the Windows libraries but was damn hard to get working from Unix or Linux as the operating system could not understand MS Office files.
DataStage 9.1 comes with an unstructured text stage that makes Excel easier to read even if it is not a perfected formatted table.  DataStage can go in and find column headings whether they are on row 1 or row 10.  It can parse the columns and turn them into relational data and even add on extra text strings such as a single comment field.  DataStage 9.1 can overcome a lot of the problems you get from Excel as a source when people muck around with the spreadsheet format.
Support for Big Data
Balanced Optimization for DB2 z/OS and for Hadoop
Faster Database Writes
DataStage 9.1 is faster when writing to DB2 and Oracle thanks to improved big buffering of data. Oracle Bulk Load and DB2 Bulk Load are faster.  Oracle Upsert is faster.  The Connector stages are better.
Faster Netezza Table Copies
IBM introduces a new interface called Data Click – it promises to load tables from a source Oracle or DB2 table into Netezza with just two clicks.  Two clicks!  The new console will automatically generate the DataStage job required to move the tables selected into Netezza.  Great for fast PoC or agile BI development.  It can repeat the data movement later or automatically generate InfoSphere CDC table subscriptions to keep those tables up to date in real time.
IlOG comes back to DataStage

ILOG is now called Operational Decision Management. This must be licensed for the stage to work properly. 

Many a year ago there was a JRules plugin for DataStage so you could run the iLog JRule engine in stream with a DataStage job.  IBM has brought this back to DataStage 9.1 Parallel Jobs.  What this means for DataStage is one of the best rules engines on the market is available to ETL processing.  What it means for iLog is massive scalability, running iLog on a grid or cluster of DataStage parallel servers with automatic partitioning.  It also means iLog can read and write data in ANY format – DataStage can read any database, flat file, mainframe file, InfoSphere CDC subscription, SOA message or hundreds of other data formats and write to or read from iLog JRules.
Second Generation Java Stage
In the new and improved Java Stage for DataStage you will be able to view and browse java classes and access a huge range of data sources or targets.  Better access to web applications and web data and easier integration of Java code.
Write to Multiple Files
DataStage 9.1 makes it easier to take a data stream and write it to a bunch of output files, such as splitting data up by customer.
New Expressions
A couple old Server Edition functions finally make their way onto the Parallel canvas – such as the popular EReplace function for replacing values in text strings.
WorkLoad Management
The DataStage Operations Console, the web application introduced in DataStage 8.7, has a new tab for workload management.  Included in the bundles of goodies is the ability to cap the number of jobs running at once, the amount of CPU being used and the amount of RAM being used by parallel jobs.  This should avoid problems you get when you pass 100% RAM utilisation and stagger the start up of hundreds of simultaneous job requests.  There is also the ability to create queues and apply workload priorities to those queues.
Datstage on Windows
For DataStage on Windows far more use is made of native Windows processes, and very little of DataStage now uses MKS Toolkit processes (MKS Toolkit still installs, for all those who have created invocations of “UNIX” commands/scripts from DataStage).
What DataStage 9.1 Does Not Have
It is with some disappointment that I report that the following oft requested features will not be in DataStage 9.1:
DataStage will not have a plugin that chops fruit and vegetables.  It will not be possible to chop up an entire soup in three seconds using parallel chop-o-matics.  I’m afraid we will need to continue to chop using the old fashioned methods.
DataStage 9.1 National Language Support lets you process data in different character sets such as Mandarin and Japanese.  Unfortanately still no support for Klingon and it is still not possible to process customer sentiments on Twitter that are written in Klingon, which as we all know has a much richer dialect for insults.
Source: http://it.toolbox.com/blogs/infosphere/ibm-launches-datastage-91-at-ibm-iod2012-53350#4993356 

dsjob return code 269 Datstage v8.5

dsjob command in V8.5 has error. if you want to revert back old functionality of dsjob –  

Apply APAR JR43437 and set environment variable(APAR JR43437)

DSE_SLAVE_CLOSE_SOCKET_ON_EXEC=1 in $DSHOME/dsenv.

Workaround – Set the DataStage Administrator Client inactivity timeout value to a number of seconds that is greater than your longest-running job

Source: http://www-01.ibm.com/support/docview.wss?uid=swg21621459

Rename many/all jobs

  • Create an Excel spreadsheet with new and old names. 
  • Export the whole project as a dsx file.
  • Write a Perl program, which can do a simple rename of the strings looking up the Excel file. 
  • Then import the new dsx file probably into a new project for testing. 
  • Recompile all jobs. 

Be cautious that the name of the jobs has also been changed in your job control jobs or Sequencer jobs. So you have to make the necessary changes to these Sequencers.

Schedule Datastage ETL jobs – Basics

Read about Datastage Scheduler

What is the utility used to schedule the jobs on a UNIX server?

dsjob utility (UNIX commands) are used to schedule datastage jobs. dsjob commands allow us to:

  • Start a job (-run)
  • Stop a job -stop
  • List projects, jobs, stages, links, and parameters
  • Set an alias for a job
  • Retrieve information
  • Access log files (Important )- IBM reference
    dsjob -log [ -info | -warn ] [ -useid ] project job|job_id
  • Generate a report


<h2>There are many tools that help scheduling</h2>

  • Ascential Director
  • AUTOSYS”: Thru autosys u can automate the job by invoking the shell script written to schedule the datastage jobs
    • Autosys is the most popular scheduler
  • SeeBeyond,
  • ControlM,
  • at,
  • cron

For more information read – Popular schedulers in Datastage

What are Stage Variables, Derivations and Constants?

Stage Variable – An intermediate processing variable that retains value during read and doesnt pass the value into target column.
Derivation – Expression that specifies value to be passed on to the target column.
Constant – Conditions that are either true or false that specifies flow of data with a link