Data Integration or ETL products

Top picks out in the market –
1. Informatica
2. IBM DataStage
3. Ab Initio
4. Oracle
5. SAP Business Objects Data Integrator
6. Microsoft SSIS

Top Challengers
1. SAS/DataFlux
2. iWay Software
3. Talend
4. Syncsort
5. Pervasive Software
6. Pitney Bowes Insight

Competitively priced vendors –
1. Informatica Cloud edition
2. expressor-software (pricing based on channels i.e. multiple processing)

Open Source vendors –
1. Talend
2. Apatar
3. Pentaho Kettle
4. SnapLogic

Cloud/SaaS Based vendors –
1. Informatica
2. SnapLogic
Both were started by entrepreneur Gaurav Dhillon.

Top Pipeline Partitioning vendors –
1. IBM DataStage (process based)
2. Informatica (multi-thread based)
3. expressor-software (hybrid based)
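
As a rough illustration of what partitioning a pipeline means, independent of how any of these vendors implement it, the sketch below hash-partitions rows by key and pushes each partition through the same transform stage on a thread pool. The function names and sample data are invented for illustration; a process-based variant could swap in `ProcessPoolExecutor` from the same module.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_rows(rows, key, num_partitions):
    """Split rows into num_partitions lists using a stable hash of row[key]."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def transform_partition(partition):
    """Hypothetical transform stage: upper-case the customer name."""
    return [{**row, "name": row["name"].upper()} for row in partition]


def run_partitioned(rows, key, num_partitions=4):
    """Run the transform on every partition concurrently, then merge."""
    partitions = partition_rows(rows, key, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(transform_partition, partitions))
    return [row for part in results for row in part]


# Sample data (made up for this sketch).
rows = [{"id": i, "name": f"customer{i}"} for i in range(10)]
output = run_partitioned(rows, key="id")
```

Every input row appears exactly once in `output`, although the order follows the partitions rather than the original input order.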

Top Message oriented/Real-time processing vendors –
1. IBM DataStage
2. Informatica

Best Integration vendors – 
1. Informatica (both Cloud edition based and adaptor based support)
2. IBM DataStage (Adaptor/Pack based support)

Top ELT architecture based vendors –
1. Talend (excellent ELT based objects to drag and drop in the designer)
2. IBM DataStage (provides options to create tables before loading)
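
To make the ELT idea concrete: the raw data is loaded into the target database first, and the transformation then runs inside the database as SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and sample rows are invented, and it does not represent how Talend or DataStage generate their jobs.

```python
import sqlite3

# Raw source rows, all text, as they might arrive from a flat file.
raw_rows = [
    ("1", " alice ", "120.50"),
    ("2", "BOB", "80"),
    ("3", "carol", "not-a-number"),
]

conn = sqlite3.connect(":memory:")

# Create the staging and target tables before loading.
conn.execute("CREATE TABLE staging_orders (id TEXT, customer TEXT, amount TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Extract and Load: raw values go into staging untouched.
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

# Transform: runs inside the database, after the load.
# Only rows whose amount starts with a digit are kept.
conn.execute(
    """
    INSERT INTO orders (id, customer, amount)
    SELECT CAST(id AS INTEGER), LOWER(TRIM(customer)), CAST(amount AS REAL)
    FROM staging_orders
    WHERE amount GLOB '[0-9]*'
    """
)

loaded = conn.execute(
    "SELECT id, customer, amount FROM orders ORDER BY id"
).fetchall()
```

In a classic ETL flow the trimming and type conversion would happen in the integration tool before the insert. Here the database engine does that work.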

Cloud computing in Datastage and Qualitystage

·         Is InfoSphere Information Server cloud enabled?
Yes, InfoSphere Information Server’s DataStage and QualityStage offerings can be deployed as a cloud-based data integration solution. Because of the versatility of the InfoSphere DataStage and QualityStage platform support, connectivity, and standards used, these offerings can be deployed in the cloud in much the same way as they are deployed on-premises within an enterprise today.
·         What are the benefits of running InfoSphere Information Server in the cloud?
IBM InfoSphere Information Server in the cloud opens up multiple new solutions to traditional and cloud-enabled information challenges. Coupled with the ability to leverage a pay-as-you-go pricing model, rapid deployment, and the massive scalability of InfoSphere Information Server, the following off-premises solutions can be delivered quickly:
o    Enable systems integrators to provide data consolidation services to support complex application rationalization and migration projects lasting 3 to 12 months.
o    Provide flexible development capacity for existing clients using InfoSphere Information Server.
o    Support ongoing data preparation for SaaS applications and business intelligence solutions.
Running in the cloud reduces the upfront time and expense involved in setting up hardware infrastructure and software licenses for projects lasting 3-12 months. With its collaborative, model-driven design environment and the massive scalability of its parallel processing architecture, InfoSphere Information Server is well suited to rapid deployment and to maximizing the throughput of trusted data per hour.
·         What kinds of cloud environments can InfoSphere be deployed in, today or as planned?
InfoSphere Information Server can be deployed in both private and public cloud scenarios. This announcement highlights the availability of the first InfoSphere offering on a public cloud provider, specifically Amazon Elastic Compute Cloud (Amazon EC2), a hosting service provided by Amazon Web Services. Clients building their own private clouds or using other cloud providers can leverage their existing InfoSphere Information Server licenses, provided they adhere to the license terms and prepare their own machine images.
·         What is Amazon Web Services (AWS)?
AWS delivers a set of integrated services that form a computing platform “in the cloud”. Learn more about Amazon Web Services and the IBM offerings on AWS.
·         What deployment models are available for InfoSphere Information Server on Amazon EC2?
You can deploy InfoSphere Information Server’s DataStage and QualityStage on Amazon EC2 in one of two ways:
o    Create your own InfoSphere Information Server-based Amazon Machine Images (AMIs) by using licenses that you already own.
o    Use the pre-built InfoSphere Information Server AMIs containing production-ready InfoSphere DataStage and InfoSphere QualityStage generated by IBM. There are hourly usage charges for the IBM generated AMIs including InfoSphere DataStage and QualityStage software licensing costs.
·         What does InfoSphere Information Server uniquely deliver on Amazon EC2?
On Amazon EC2, the pre-built InfoSphere Information Server AMI delivers an integrated ETL and data quality development environment that enables developers to cleanse, transform, and move data with the same tool using the same metadata. InfoSphere Information Server has a dynamic parallel execution engine that provides a design-once, deploy-anywhere capability, scaling up or down dynamically and seamlessly based on the hardware configuration and thereby simplifying deployment and administration. Lastly, InfoSphere QualityStage provides a probabilistic matching engine that ensures higher-quality trusted data when linking any data domain across multiple, complex data sources.
·         How does InfoSphere Information Server on Amazon save time, money, or worry?
This offering is primarily aimed at systems integrator partners who have enterprise application consolidation and migration service lines (e.g., SAP) and skilled resources trained on InfoSphere Information Server. These partners are actively helping clients use InfoSphere Information Server to consolidate, cleanse, and load the right trusted data into their target enterprise application of choice. The new “pay as you go” pricing offered by InfoSphere Information Server on EC2 is ideal for short-term projects lasting 3-12 months. Ordinarily, a client would need to purchase a perpetual data integration license and make a hardware investment to support a short-term application consolidation or migration project.
Secondarily, this offering can help enterprise IT departments who have an existing investment in InfoSphere Information Server skills and have the need for additional test and development capacity. In addition, it helps them to achieve demonstrable ROI before committing to the capital investment in additional hardware and data integration software licenses to support short-term enterprise projects, or cloud integration scenarios.
·         Can I create my own InfoSphere Information Server-based AMIs for use on Amazon EC2?
Yes, in addition to being able to use the InfoSphere DataStage and QualityStage AMI generated by IBM, you can create your own InfoSphere Information Server-based AMIs for EC2. If you create your own InfoSphere Information Server-based AMIs, you are responsible for ensuring you own licenses for the software (such as InfoSphere Information Server and the operating system) running on the AMI.
·         Does IBM provide any InfoSphere AMIs for use on Amazon EC2?
Yes, IBM has partnered with AWS to make available a pre-bundled InfoSphere Information Server AMI containing InfoSphere DataStage and InfoSphere QualityStage for production use. This AMI consists of InfoSphere DataStage and QualityStage version 8.1 pre-installed on Novell SUSE Linux Enterprise Server version 10 (SLES 10 SP2) for the server, and Windows Server 2003 for the InfoSphere Developer client licenses.
·         How is the production-ready InfoSphere Information Server AMI priced?
The InfoSphere Information Server AMI available from AWS has hourly pay-as-you-use pricing that depends on the size of the instance you run it on. You can also incur charges for any additional AWS services you use with InfoSphere. See the Amazon page for more information.
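
Because the charge accrues per instance-hour, a rough project estimate is simply hours multiplied by the hourly rate, which can then be compared with the upfront cost of a perpetual license plus hardware. The sketch below shows that arithmetic only. Every rate and cost in it is a made-up placeholder, not an actual AWS or IBM price.

```python
def hourly_cost(rate_per_hour, hours_per_day, days):
    """Total pay-as-you-go charge for one instance."""
    return rate_per_hour * hours_per_day * days


def break_even_days(upfront_cost, rate_per_hour, hours_per_day):
    """Days of usage at which cumulative hourly charges equal an upfront purchase."""
    return upfront_cost / (rate_per_hour * hours_per_day)


# Placeholder figures (assumptions for illustration only, not real prices).
PLACEHOLDER_RATES = {"large": 2.0, "extra_large": 4.0}
PLACEHOLDER_UPFRONT_COST = 50_000.0

# A hypothetical 120-day project running 10 hours a day on a "large" instance.
project_cost = hourly_cost(PLACEHOLDER_RATES["large"], hours_per_day=10, days=120)
days_to_break_even = break_even_days(
    PLACEHOLDER_UPFRONT_COST, PLACEHOLDER_RATES["large"], hours_per_day=10
)
```

With real figures substituted in, the same two functions give a quick check on whether a project is short enough for hourly pricing to come out ahead.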
·         How will customers get information into and out of the cloud environment?
Together IBM InfoSphere Information Server and Amazon EC2 provide a broad range of solutions for getting information into and out of the cloud.
o    Load or save information via files or file sets. Network file transfers can be used to move small and medium-sized data sets (gigabytes) from the customer's enterprise to Amazon. For larger data sets, the Amazon import/export services should be used.
o    Access databases such as Oracle, MySQL, and DB2, and leverage flat files and XML documents.
o    Leverage Web services or other service protocols to access or push information in and out of the cloud environment.
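
Whether a network transfer is practical depends mostly on data volume and available bandwidth. As a back-of-the-envelope sketch, the functions below estimate transfer time and flag when a shipped-media option such as the Amazon import/export services may be worth considering. The 80% link efficiency and the 24-hour cutoff are arbitrary assumptions, not guidance from IBM or AWS.

```python
def transfer_hours(size_gb, bandwidth_mbps, efficiency=0.8):
    """Estimated hours to move size_gb over a link of bandwidth_mbps.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def suggest_method(size_gb, bandwidth_mbps, max_hours=24):
    """Pick a transfer approach from the estimate (the cutoff is an assumption)."""
    hours = transfer_hours(size_gb, bandwidth_mbps)
    return "network transfer" if hours <= max_hours else "import/export service"


small = suggest_method(size_gb=10, bandwidth_mbps=100)
large = suggest_method(size_gb=5000, bandwidth_mbps=100)
```

Under these assumptions, the 10 GB data set goes over the network in well under an hour, while the 5,000 GB data set would take several days and is routed to the import/export option.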
·         What support is available for the InfoSphere Information Server production-ready AMI?
At this point, you can seek community assistance on the AWS forum. The documentation supplied with the AMI explains how to get the most out of the InfoSphere software in the EC2 environment. IBM also offers a full curriculum of InfoSphere educational courses with multiple delivery options.
·         Where can I access the production-ready InfoSphere Information Server AMI and related information?
Links to IBM Documents on Datastage Cloud computing on Amazon
Getting Started

Cloud computing in Datastage and Qualitystage – Amazon web services

Please note: Amazon has done away with DataStage on the cloud. To install DataStage on the cloud, the software and the license need to be provided to Amazon, and the installation will then be completed. (Thanks to the comments from ganther.) So some of the details below may no longer be valid.

IBM InfoSphere Information Server provides an integrated ETL and Data Quality development environment that delivers a trusted, scalable and “Pay as you go” data integration solution that helps organizations derive more value from the complex, heterogeneous


This AMI, built on a 64-bit Linux OS with kernel version 2.6.16, contains a pre-installed InfoSphere Information Server version 8.1 product. The installed components are InfoSphere DataStage and InfoSphere QualityStage. It also contains an instance of the DB2 9.5 database that hosts the metadata repository. The services tier is hosted by IBM WebSphere Application Server version
There is a separate Windows-based image with InfoSphere Information Server clients such as the web-based Administration Console, the DataStage and QualityStage Designer, the DataStage and QualityStage Administrator, and the DataStage and QualityStage Director.

Prerequisites and Resources:

Instance Type
The InfoSphere Information Server AMI, built on a 64-bit Linux OS with kernel version 2.6.16, can be used with the Large, Extra Large, or High-CPU Extra Large instance type, depending on project requirements.
Software Included
  • Linux OS based on kernel level 2.6.16
  • InfoSphere DataStage 8.1 and InfoSphere QualityStage 8.1 (64-bit)
Resources and Documentation
  • Link to the Information Server Client AMI : (AWS to add link to the Client Catalog Entry)
  • Frequently Asked Questions for Information Server AMIs.
  • Get Started with Information Server AMIs.
  • Cloud Computing and Information Server
  • Information Server on DeveloperWorks
  • For a full listing of IBM products featured on AWS, please see the IBM developerWorks Cloud Computing Resource Center
  • Please visit for extensive information about the IBM Information Server suite.
  • If you are interested in purchasing a new IBM InfoSphere Information Server license, please visit the IBM Software Online Catalog.
  • InfoSphere DataStage product number 5724-Q36
  • InfoSphere QualityStage product number 5724-Q36
  • DB2 9.5 product number
  • The IBM License Information (LI) is presented in the AMI for your acceptance in English only. If you would prefer to view the LI in another language, please search for the program license agreement here using the program numbers above. You’ll then be asked to select your language.

Amazon Web Services


Cost – Pay as you go, no long term commitment, no upfront costs
Elasticity – Deploy instantly and scale up or down as and when needed
Security managed by Amazon

AMI – Amazon Machine Image