Cloud computing in DataStage and QualityStage

·         Is InfoSphere Information Server cloud enabled?
Yes. InfoSphere Information Server’s DataStage and QualityStage offerings can be deployed as a cloud-based data integration solution. Because of the versatility of the platform support, connectivity, and standards that InfoSphere DataStage and QualityStage use, these offerings can be deployed in the cloud in much the same way as they are deployed on-premises within an enterprise today.
·         What are the benefits of running InfoSphere Information Server in the cloud?
IBM InfoSphere Information Server in the cloud opens up multiple new solutions to traditional and cloud-enabled information challenges. Coupled with the ability to leverage a pay-as-you-go pricing model, a rapid deployment paradigm, and the massive scalability of InfoSphere Information Server, the following off-premises solutions can be quickly delivered:
o    Data consolidation services from systems integrators to support complex application rationalization and migration projects lasting 3 to 12 months.
o    Flexible development capacity for existing clients using InfoSphere Information Server.
o    Ongoing data preparation for SaaS applications and business intelligence solutions.
Running in the cloud reduces the upfront time and expense involved in setting up hardware infrastructure and software licenses for projects lasting 3 to 12 months. With its collaborative, model-driven design environment and the massive scalability of its parallel processing architecture, InfoSphere Information Server is well suited to rapid deployment and to maximizing the throughput of trusted data per hour.
·         What kinds of cloud environments can InfoSphere be deployed in, now or as planned?
InfoSphere Information Server can be deployed in both private and public cloud scenarios. This announcement highlights the availability of the first InfoSphere offering on a public cloud provider, specifically Amazon Elastic Compute Cloud (Amazon EC2), a hosting service provided by Amazon Web Services. Clients building their own private clouds or using other cloud providers can leverage their existing InfoSphere Information Server licenses, provided they adhere to the license terms and prepare their own machine images.
·         What is Amazon Web Services (AWS)?
AWS delivers a set of integrated services that form a computing platform “in the cloud”. Learn more about Amazon Web Services and the IBM offerings on AWS.
·         What deployment models are available for InfoSphere Information Server on Amazon EC2?
You can deploy InfoSphere Information Server’s DataStage and QualityStage on Amazon EC2 in one of two ways:
o    Create your own InfoSphere Information Server-based Amazon Machine Images (AMIs) by using licenses that you already own.
o    Use the pre-built InfoSphere Information Server AMIs generated by IBM, which contain production-ready InfoSphere DataStage and InfoSphere QualityStage. The IBM-generated AMIs carry hourly usage charges that include the InfoSphere DataStage and QualityStage software licensing costs.
·         What does InfoSphere Information Server uniquely deliver on Amazon EC2?
On Amazon EC2, the pre-built InfoSphere Information Server AMI delivers an integrated ETL and data quality development environment that enables developers to cleanse, transform, and move data with the same tool using the same metadata. InfoSphere Information Server has a dynamic parallel execution engine that lets you design a job and deploy it anywhere. The engine scales up or down dynamically and seamlessly based on the hardware configuration, which simplifies deployment and administration. Lastly, InfoSphere QualityStage provides a probabilistic matching engine that ensures higher-quality trusted data when linking any data domain across multiple, complex data sources.
·         How does InfoSphere Information Server on Amazon save time, money, or worry?
This offering is primarily aimed at systems integrator partners who have enterprise application consolidation and migration service lines (e.g., SAP) and skilled resources trained on InfoSphere Information Server. These partners actively help clients use InfoSphere Information Server to consolidate, cleanse, and load the right trusted data into their target enterprise application of choice. The new “pay as you go” pricing offered by InfoSphere Information Server on EC2 is ideal for short-term projects lasting 3 to 12 months. Ordinarily, a client would need to purchase a perpetual data integration license and make a hardware investment to support a short-term application consolidation or migration project.
Secondarily, this offering can help enterprise IT departments that have an existing investment in InfoSphere Information Server skills and need additional test and development capacity. It also helps them achieve demonstrable ROI before committing to capital investment in additional hardware and data integration software licenses to support short-term enterprise projects or cloud integration scenarios.
·         Can I create my own InfoSphere Information Server-based AMIs for use on Amazon EC2?
Yes, in addition to being able to use the InfoSphere DataStage and QualityStage AMI generated by IBM, you can create your own InfoSphere Information Server-based AMIs for EC2. If you create your own InfoSphere Information Server-based AMIs, you are responsible for ensuring you own licenses for the software (such as InfoSphere Information Server and the operating system) running on the AMI.
·         Does IBM provide any InfoSphere AMIs for use on Amazon EC2?
Yes, IBM has partnered with AWS to make available a pre-bundled InfoSphere Information Server AMI containing InfoSphere DataStage and InfoSphere QualityStage for production use. This AMI consists of InfoSphere DataStage and QualityStage version 8.1 pre-installed on Novell SUSE Linux Enterprise Server version 10 (SLES 10 SP2) for the server and Windows Server 2003 for the InfoSphere Developer client licenses.
·         How is the production-ready InfoSphere Information Server AMI priced?
The InfoSphere Information Server AMI available from AWS has hourly pay-as-you-use pricing that depends on the instance size you run the InfoSphere AMI on. You can also incur charges for additional AWS services that you use with InfoSphere. See the Amazon page for more information.
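As a rough illustration of how hourly pay-as-you-use pricing adds up over a project, the sketch below estimates a bill from an hourly rate and expected usage. The rates, the instance-size names, and the `estimate_cost` helper are hypothetical placeholders, not actual AWS or IBM prices; substitute the figures from the Amazon pricing page.

```python
# Hypothetical hourly rates in USD per instance size; NOT real AWS/IBM prices.
HOURLY_RATE_USD = {"large": 2.00, "xlarge": 4.00}


def estimate_cost(instance_size, hours_per_day, days, extra_services_usd=0.0):
    """Estimate a pay-as-you-use bill for one running AMI instance.

    extra_services_usd is an assumed lump sum for any additional AWS services.
    """
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    if days < 0:
        raise ValueError("days must not be negative")
    rate = HOURLY_RATE_USD[instance_size]
    return rate * hours_per_day * days + extra_services_usd


if __name__ == "__main__":
    # Example: a 90-day project running 10 hours a day at the assumed "large" rate.
    print(f"Estimated cost: ${estimate_cost('large', 10, 90):,.2f}")
```

Because charges accrue only while an instance runs, the hours-per-day figure matters as much as the project length when comparing instance sizes.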
·         How will customers get information into and out of the cloud environment?
Together, IBM InfoSphere Information Server and Amazon EC2 provide a broad range of solutions for getting information into and out of the cloud.
o    Load or save information via files or file sets. Network file transfers can be used to move small and medium-sized data sets (gigabytes) from the enterprise to Amazon. For larger data sets, the Amazon import/export services should be used.
o    Access databases such as Oracle, MySQL, DB2 and leverage flat files and XML documents.
o    Leverage Web services or other service protocols to access or push information in and out of the cloud environment.
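When choosing between a network file transfer and the Amazon import/export services, it can help to estimate how long an upload would take. The sketch below does that arithmetic. The efficiency factor, the 24-hour cutoff, and both function names are assumptions chosen for illustration, not AWS guidance.

```python
def transfer_hours(size_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate the hours needed to move size_gb over a bandwidth_mbps link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def suggest_method(size_gb, bandwidth_mbps, max_hours=24.0):
    """Suggest a transfer method using a hypothetical max_hours cutoff."""
    if transfer_hours(size_gb, bandwidth_mbps) <= max_hours:
        return "network transfer"
    return "import/export service"


if __name__ == "__main__":
    for size_gb in (10, 5000):
        print(size_gb, "GB ->", suggest_method(size_gb, bandwidth_mbps=100))
```

The cutoff is a project decision rather than a technical limit: a team with a tight load window might pick a much smaller value than 24 hours.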
·         What support is available for the InfoSphere Information Server production-ready AMI?
At this point, you can seek community assistance on the AWS forum. The documentation provided with the AMI explains how to get the most out of the InfoSphere software in the EC2 environment. IBM also offers a full curriculum of InfoSphere educational courses with multiple delivery options.
·         Where can I access the production-ready InfoSphere Information Server AMI and related information?
Links to IBM Documents on Datastage Cloud computing on Amazon
Getting Started

6 thoughts on “Cloud computing in DataStage and QualityStage”

  1. miller says

    DataStage: What is Transformer Stage?

    Stage properties, Advanced tab:

    The Advanced tab provides the following options.

    Execution Mode: The stage can execute in sequential mode or parallel mode. In parallel mode, the data is processed by the available nodes as defined in the Configuration file, and by any node constraints specified on the Advanced tab. In sequential mode, the data is processed by the conductor node.

    Combinability mode: This is Auto by default, which allows WebSphere DataStage to combine the operators that underlie parallel stages so that they run in the same process, if that is sensible for this type of stage.

    Preserve partitioning: You can select Set or Clear. If you select Set, the stage requests that the next stage preserve the partitioning as is.
    Node pool and resource constraints: Choose this option to constrain parallel execution to the node pool or pools, or the resource pool or pools, specified in the grid. The grid lets you make choices from drop-down lists.
    Node map constraint: Choose this option to constrain parallel execution to the nodes in a defined node map. You can define a node map by typing node numbers into the text box, or by clicking the browse button to open the Available Nodes dialog box and selecting nodes from there. You are effectively defining a new node pool for this stage (in addition to any node pools defined in the Configuration file).

    Properties, Surrogate Key tab:
    Select the Source type as Flat File or DBSequence.
    Transformer stage: Input page
    Partitioning tab:
    The Partitioning tab allows you to specify how incoming data is partitioned or collected when it is input to the Transformer stage. It also lets you specify whether the data should be sorted on input.

    By default, the Transformer stage tries to preserve the partitioning of incoming data, or uses its own partitioning method, according to what the previous stage in the job dictates.

    The Partitioning tab also allows you to specify that data arriving on the input link should be sorted. The sort is always carried out within data partitions. If the stage is partitioning incoming data, the sort occurs after the partitioning. If the stage is collecting data, the sort occurs before the collection. The availability of sorting depends on the partitioning method selected.
    Perform Sort: Choose this to specify that data coming in on the link should be sorted. Choose the column or columns to sort on from the Available list.

    Preserve Sort Order:
    Choose this if you know that the rows being input to the Transformer stage have already been sorted and you want to maintain the sort order.
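
The partition-then-sort ordering described in this comment can be illustrated outside DataStage with a small Python sketch. It is not DataStage code: the modulus partitioner, the `id` column, and both function names are assumptions made for the illustration.

```python
def modulus_partition(rows, key, num_partitions):
    """Assign each row to a partition by the modulus of its integer key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


def sort_within_partitions(partitions, key):
    """Sort each partition independently; no order is implied across partitions."""
    return [sorted(part, key=lambda row: row[key]) for part in partitions]


if __name__ == "__main__":
    rows = [{"id": n} for n in (7, 2, 9, 4, 1, 6)]
    # Partition first, then sort inside each partition.
    parts = sort_within_partitions(modulus_partition(rows, "id", 2), "id")
    for index, part in enumerate(parts):
        print(index, [row["id"] for row in part])
```

Each partition comes out ordered on its own, but concatenating the partitions does not give a globally sorted result, which is why the sort has to happen before the data is collected.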

  2. miller says

    Could you explain the APT_RECORD_COUNTS setting?
    Also OSH_PRINT_SCHEMAS.
    What are APT_DUMP_SCORE and APT_CombinedOperatorController?

    I don't understand these UNIX commands:
    “cat -tev” or “od -xc”

  3. taruni nandu says

    Thanks for the update…
    The post is a good read and useful for people working toward their goals. We at IT Hub Online Training provide DataStage online training.

  4. jessycandy says

    Thanks for the best topic. Very useful information.
    We at IT Hub Online Training offer good DataStage training.
