Datastage | Datastage

May 9, 2014

Teradata Connector advantages

· Parallel MultiLoad capability including MPP configurations

· Parallel immediate lookups and writes

· Array support for better performance of immediate writes

· Reject link support for lookups and writes on DS Enterprise Edition

· Reject link support for bulk loads

· Cursor lookups (lookups that return more than one row)

· Restart capability for parallel bulk loads

· MultiLoad delete task support

· Support for BLOB and CLOB data types

· Reject link support for missing UPDATE or DELETE rows

· Error message and row count feedback for immediate lookups/writes

· Parallel synchronization table

May 4, 2014

Teradata Connector stage

The Teradata Connector includes the following features of the ODBC Connector:

• Source, target and lookup context

• Reject links

• Passing LOBs by reference

• Arrays

• SQL Builder

• Pre/post run statements

• Metadata import

It supports Teradata server versions V2R6.1 and V2R6.2 and Teradata client

TTU versions V8.1 and V8.2. The Connector uses CLIv2 API for immediate

operations (SELECT, INSERT, UPDATE, DELETE) and Parallel Transporter

Direct API (formerly TEL-API) for bulk load and bulk extract operations.

Parallel bulk load is supported through LOAD, UPDATE, and STREAM operators

in Parallel Transporter. This corresponds to the functionality provided by the

FastLoad, MultiLoad, and TPump Teradata utilities, respectively. When the

UPDATE operator is used it supports the option for deleting rows of data

(MultiLoad delete task).

Parallel bulk export is supported through the EXPORT operator in Parallel

Transporter. This corresponds to the functionality provided by the FastExport

Teradata utility. The Connector persists the bulk-load progress state and provides

sophisticated support for restarting the failed bulk-load operations. The

Connector uses a designated database table for synchronization of distributed

Connector instances in the parallel bulk-load.

May 2, 2014

Teradata Parallel Transporter

TPT is an object-oriented client suite that executes multiple instances of data extraction, transformations, and loading functions in a scalable, high-speed, parallel-processing environment.

Main features:

All-in-one! Single infrastructure for all loading/unloading needs using single scripting language.

Greatly reduces the amount of file I/O and significantly improve performance.

Push up and Push down features.

Provides unlimited symbolic substitution for the script language and application programming interface (API).

Combines functionality of TD FastLoad, MultiLoad, FastExport, TPump and API.

Respective modules or operators for the protocols of fastload, multiload, tpump and fastexport are named as Load, Update, Stream and Export.

Apart from the four operators, on API & ODBC front, there are operators like Selector, Inserter and more.

April 30, 2014

Multiload (as source) uses Fast export utility. Multiload (as target) is very efficient when you are doing maintenance activities on multiple large tables. At a time, Multiload stage can perform Inserts/Updates on upto 5 different tables in one pass. Work tables and error tables are created each time you perform an operation using Multiload also. It is automatically dropped once the job has run successfully. However if the job aborts, the work tables have to be manually dropped before the job is run again.

Teradata MultiLoad features:

• Target table can be populated regardless of it/they is/are empty or not.
• Allows inserts, updates, deletes, and upserts
• Loads into multiple target tables
• Allows non-unique secondary indexes automatically rebuilds them after loading

April 29, 2014

Why Data warehouse?

Why do we need a data warehouse?

Heterogeneities are everywhere

Enterprises tend to rely on different databases, files, external sources and even the internet and social media for all the information. This means:

Different interfaces
Different data representations
Duplicate and inconsistent information

Data Management in Large Enterprises

Vertical fragmentation of informational systems (vertical stove pipes)
Result of application (user)-driven development of operational systems

The goal of building a warehouse is to have a system that:

Collects and combines information
Provides integrated view, uniform user interface
Supports sharing

April 29, 2014

Introduction to Data Warehousing

April 28, 2014

Advantages and Disadvantages of Teradata API

Advantages:

· Better performance and speed (rows / sec) compared to generic ODBC stage.

· Support for TD client

· Simplified configuration on UNIX platforms.

· Adv. support for target table DDL (i.e. For create and drop)

· Native metadata import support

· Reject rows handling

Disadvantages:

· Does not support Non-ANSI SQL statements in stage generated SQL statements.

· Does not support byte data types.

· Does not generate TD version specific SQLs as stage generated SQL statements.

· Does not support ‘like’ in the user defined sql when used as lookup.

April 26, 2014

Teradata API

TD API has the option to run in parallel mode also. One can write/insert records using TD API in parallel mode and it gives much better performance than using the same in sequential mode. When used in parallel mode, you must set the partitioning property to HASH and select all the columns in the Primary Index of the target table for partitioning. You might get occasional blocks due to Teradata hash synonyms only, but this should be pretty minimal on most low volume loads. If you plan to implement restart strategy in a job using TD API for loading data, since there is no cleanup, it is advised to write as UPSERT. The API will generate upsert code automatically, but it will be 2 SQL statements, not atomic upsert. If you want ATOMIC upsert you will have to take the auto generated code and then modify it slightly to the ATOMIC upsert syntax.

April 24, 2014

Teradata Enterprise

TD EE (Source/Lookup) invokes FastExport which produces 1 answer set. All the parallel processing is performed on the TD Server, not in DataStage. FastExport cannot handle selects that use unique key constraint that would only return one row. You have to be careful when using unique key constraints. This is set from the TD side and not by Datastage. If you use this 2 or 3 times for source tables/lookups in a job and you run a couple of jobs at once, then you have invoked too many fastexports. For explicit exports or big data pulls, the TD EE can work fine, or you can use the MLOAD stage (which does Fastexport if it is used as source stage)

TD EE (Target) invokes FastLoad for bulk loading into TD Tables. Fastload does not support Secondary Indexes in Teradata. TD EE Stage will create a work table when an append to a Teradata table is selected for the job. This is because a FastLoad must load to an empty table which is impossible with an append operation. To get around this, DataStage FastLoads to a generated empty work table and then does insert into (select * from work table) on the database side. Append option will create an empty TD table with ALL fields but NO defaults. It generates this empty table from the Teradata metadata, NOT your DataStage job metadata. Also, unknown columns are replaced with null.

Operator	Equivalent Utility	Advantages	Disadvantages
Export	FastExport	Fastest export method.	Uses utility slot, No single- AMP SELECTs.
Load	FastLoad	Fastest load method.	Uses utility slot, INSERT only, Locks table, No views, No secondary indexes.
Update	MultiLoad	INSERT, UPDATE, DELETE, Views, Non-unique secondary indexes.	Uses utility slot, Locks table, No unique secondary indexes, Table inaccessible on abort.
Stream	TPump	INSERT, UPDATE, DELETE, Views, Secondary indexes, No utility slot, No table lock.	Slower than UPDATE operator.

Parallel Transport operators