January 10, 2013 Restart Commands for IIS Datstage 8.5 or 8.7 UNIX/Linux We will stop in this order: Stop DataStage Stop Node Agents Stop WebSphere Log into your Unix/Linux machine as root (you can run first part as dsadm). Make sure that no one is using DataStage. Check if there are any processes left by executing the following commands:ps -ef | grep phantom – This would show processes from jobsps -ef | grep osh – This would show processes for parallel jobsps -ef | grep dsapi – This would show processes for client connectionps -ef | grep dscs – This would show processes for client connectionThese commands shouldn’t return any results. If they return results, it means that there are still some processes. You need to wait for the users to stop their processes or kill the processes (using kill -9 PID) if you confirm that these are not longer valid connections . Check if DataStage RPC Daemon (dsrpc) is running by executing:netstat -a | grep dsrpcIf dsrpc is running then you should see one line with the status LISTEN. If you get more lines it means that there are still some connections and you need to wait. If you see that dsrpc or other connections are in status CLOSE_WAIT then you also need to wait until these disappear (you can also reboot the machine) Stop DataStage (you have to run this from $DSHOME). ./dsenvbin/uv -admin -stop Check to see if there are any shared memory segments left:ipcs -m | grep adeNote: If you are not running with the default itag of “ade” use that instead of “ade”. For example if you have two DataStage instances one would be “ade” and the second would have another tag, say “801” or “efg”.If you get a result, take note of the ID and remove it (you will need root access for this part)ipcrm -m ID Then check dsrpc again usingnetstat -a | grep dsrpcYou shouldn’t get any results. If you get any and/or the dsrpc is in CLOSE_WAIT then you need to wait until this it disappears (you can also reboot the machine) Stop the ASB Agent (you need to be root here)Go to <InstallPath>/IBM/InformationServer/ASBNode/binRun ./NodeAgents.sh stopType ‘yes’ if you receive the following message: rm: remove write-protected regular file Agent.pid’? Check the agents have stopped:ps -ef | grep javaTo see full command in Solaris:/usr/ucb/ps -auxww | grep javaIf still see the Logging Agent or ASB Agent then you need to kill these. e.g. kill -9 PID Stop the WebSphere Server (you need to be root here)Go to <InstallPath>/IBM/WebSphere/AppServer/binRun ./stopServer.sh server1* if that does not work, try ./bin/stopServer.sh server1 -username wasadmin -password wasadminNow check that the WebSphere java processes is gone:ps -ef | grep javaTo see full command in Solaris:/usr/ucb/ps -auxww | grep javaremove it if it exists with: kill PID (no -9 in the kill command) Now, you can restart, in this order: Start WebSphere Server Start ASB Agent Start DataStage Start WebSphere Server as rootGo to <InstallPath>/IBM/WebSphere/AppServer/binRun ./startServer.sh server1Wait until you get back a line saying:Server server1 open for e-business; Start ASB Agent as rootGo to <InstallPath>/IBM/InformationServer/ASBNode/binRun ./NodeAgents.sh start Start DataStage (as dsadm)Go to $DSHOMERun bin/uv -admin -start Reference: http://www-01.ibm.com/support/docview.wss?uid=swg21431700
January 10, 2013 Pass the Fatal Error Message between jobs as parameters *A simple server routine to get the fatal errors from failed job and it can passed as parameter to next job as one string and further it can be stored into a file or database. $INCLUDE DSINCLUDE JOBCONTROL.H JobName = FIELD(Arg1,”|”,1) evtMessage1 = “”hJob1 = DSAttachJob(JobName, DSJ.ERRNONE) jobStart = DSGetJobInfo (hJob1, DSJ.JOBSTARTTIMESTAMP) latestID = DSGetNewestLogId(hJob1,DSJ.LOGANY) For eventID = latestID To 1 Step -1 eventDetail = DSGetLogEntry(hJob1,eventID) evtDate = FIELD(eventDetail,””,1) evtDate = CHANGE(evtDate,”Date: “,””) evtMessage = FIELD(eventDetail,””,4) evtType = Field(eventDetail,””,3) If evtDate < jobStart Then Exit If evtType = “3” Then evtMessage1 = evtMessage:”|”:evtMessage1 ENDNext eventID Ans = Lower(evtMessage1)
January 5, 2013 Datastage 9.1 – the latest version Excel Compatibility Excel is feeling the love at IBM IOD 2012 thanks to DataStage 9.1. The new version can read MS Excel naively from any platform – you can read Excel even when your DataStage server is install on Unix or Linux – without the pain of Unix to Windows bridges. Excel spreadsheets have never been easier to access as a data source. Previously DataStage installed on Windows read Excel files easily thanks to all the Windows libraries but was damn hard to get working from Unix or Linux as the operating system could not understand MS Office files. DataStage 9.1 comes with an unstructured text stage that makes Excel easier to read even if it is not a perfected formatted table. DataStage can go in and find column headings whether they are on row 1 or row 10. It can parse the columns and turn them into relational data and even add on extra text strings such as a single comment field. DataStage 9.1 can overcome a lot of the problems you get from Excel as a source when people muck around with the spreadsheet format. Support for Big Data Balanced Optimization for DB2 z/OS and for Hadoop Faster Database Writes DataStage 9.1 is faster when writing to DB2 and Oracle thanks to improved big buffering of data. Oracle Bulk Load and DB2 Bulk Load are faster. Oracle Upsert is faster. The Connector stages are better. Faster Netezza Table Copies IBM introduces a new interface called Data Click – it promises to load tables from a source Oracle or DB2 table into Netezza with just two clicks. Two clicks! The new console will automatically generate the DataStage job required to move the tables selected into Netezza. Great for fast PoC or agile BI development. It can repeat the data movement later or automatically generate InfoSphere CDC table subscriptions to keep those tables up to date in real time. IlOG comes back to DataStage ILOG is now called Operational Decision Management. This must be licensed for the stage to work properly. Many a year ago there was a JRules plugin for DataStage so you could run the iLog JRule engine in stream with a DataStage job. IBM has brought this back to DataStage 9.1 Parallel Jobs. What this means for DataStage is one of the best rules engines on the market is available to ETL processing. What it means for iLog is massive scalability, running iLog on a grid or cluster of DataStage parallel servers with automatic partitioning. It also means iLog can read and write data in ANY format – DataStage can read any database, flat file, mainframe file, InfoSphere CDC subscription, SOA message or hundreds of other data formats and write to or read from iLog JRules. Second Generation Java Stage In the new and improved Java Stage for DataStage you will be able to view and browse java classes and access a huge range of data sources or targets. Better access to web applications and web data and easier integration of Java code. Write to Multiple Files DataStage 9.1 makes it easier to take a data stream and write it to a bunch of output files, such as splitting data up by customer. New Expressions A couple old Server Edition functions finally make their way onto the Parallel canvas – such as the popular EReplace function for replacing values in text strings. WorkLoad Management The DataStage Operations Console, the web application introduced in DataStage 8.7, has a new tab for workload management. Included in the bundles of goodies is the ability to cap the number of jobs running at once, the amount of CPU being used and the amount of RAM being used by parallel jobs. This should avoid problems you get when you pass 100% RAM utilisation and stagger the start up of hundreds of simultaneous job requests. There is also the ability to create queues and apply workload priorities to those queues. Datstage on Windows For DataStage on Windows far more use is made of native Windows processes, and very little of DataStage now uses MKS Toolkit processes (MKS Toolkit still installs, for all those who have created invocations of “UNIX” commands/scripts from DataStage). What DataStage 9.1 Does Not Have It is with some disappointment that I report that the following oft requested features will not be in DataStage 9.1: DataStage will not have a plugin that chops fruit and vegetables. It will not be possible to chop up an entire soup in three seconds using parallel chop-o-matics. I’m afraid we will need to continue to chop using the old fashioned methods. DataStage 9.1 National Language Support lets you process data in different character sets such as Mandarin and Japanese. Unfortanately still no support for Klingon and it is still not possible to process customer sentiments on Twitter that are written in Klingon, which as we all know has a much richer dialect for insults. Source: http://it.toolbox.com/blogs/infosphere/ibm-launches-datastage-91-at-ibm-iod2012-53350#4993356
January 5, 2013 dsjob return code 269 Datstage v8.5 dsjob command in V8.5 has error. if you want to revert back old functionality of dsjob – Apply APAR JR43437 and set environment variable(APAR JR43437) DSE_SLAVE_CLOSE_SOCKET_ON_EXEC=1 in $DSHOME/dsenv. Workaround – Set the DataStage Administrator Client inactivity timeout value to a number of seconds that is greater than your longest-running job Source: http://www-01.ibm.com/support/docview.wss?uid=swg21621459
January 5, 2013 Rename many/all jobs Create an Excel spreadsheet with new and old names. Export the whole project as a dsx file. Write a Perl program, which can do a simple rename of the strings looking up the Excel file. Then import the new dsx file probably into a new project for testing. Recompile all jobs. Be cautious that the name of the jobs has also been changed in your job control jobs or Sequencer jobs. So you have to make the necessary changes to these Sequencers.
January 5, 2013 Exception activity in datastage The stages followed by exception activity will be executed whenever an unknown error occurs while running the job sequencer
January 5, 2013 Schedule Datastage ETL jobs – Basics Read about Datastage Scheduler What is the utility used to schedule the jobs on a UNIX server? dsjob utility (UNIX commands) are used to schedule datastage jobs. dsjob commands allow us to: Start a job (-run) Stop a job -stop List projects, jobs, stages, links, and parameters Set an alias for a job Retrieve information Access log files (Important )- IBM reference dsjob -log [ -info | -warn ] [ -useid ] project job|job_id Generate a report <h2>There are many tools that help scheduling</h2> Ascential Director AUTOSYS”: Thru autosys u can automate the job by invoking the shell script written to schedule the datastage jobs Autosys is the most popular scheduler SeeBeyond, ControlM, at, cron For more information read – Popular schedulers in Datastage
January 5, 2013 SQL LOADER or OCI STAGE When the source data is anormous or for bulk data we can use OCI and SQL loader depending upon the source
January 5, 2013 What are Stage Variables, Derivations and Constants? Stage Variable – An intermediate processing variable that retains value during read and doesnt pass the value into target column.Derivation – Expression that specifies value to be passed on to the target column.Constant – Conditions that are either true or false that specifies flow of data with a link