Teradata MultiLoad stage

The MultiLoad stage, when used as a source, invokes the FastExport utility. As a target, MultiLoad is very efficient for maintenance operations on multiple large tables: in a single pass it can perform inserts and updates on up to five different tables. Work tables and error tables are created each time you perform an operation with MultiLoad; they are dropped automatically once the job runs successfully. However, if the job aborts, the work tables must be dropped manually before the job is run again.
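A minimal sketch of that manual cleanup after an abort, assuming a hypothetical target table named tgt_orders and the utility's default work/error table naming (the actual names depend on how the stage was configured, so verify them in DBC.TablesV first):

```sql
-- Release the MultiLoad lock left on the target table by the aborted job
RELEASE MLOAD tgt_orders;

-- Drop the leftover tables (default naming convention; names are assumptions)
DROP TABLE WT_tgt_orders;   -- work table
DROP TABLE ET_tgt_orders;   -- acquisition-phase error table
DROP TABLE UV_tgt_orders;   -- application-phase (uniqueness violation) error table
```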

Teradata MultiLoad features:

• Target tables can be populated whether or not they are empty
• Allows inserts, updates, deletes, and upserts
• Loads into multiple target tables
• Allows non-unique secondary indexes and automatically rebuilds them after loading
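Under the covers, the stage generates a MultiLoad control script. A hand-written sketch of such a script, updating one table and inserting into a second in one pass — the table names, field layout, and input file are all assumptions for illustration:

```sql
.LOGTABLE ml_restartlog;
.LOGON tdpid/dbuser,dbpassword;

.BEGIN IMPORT MLOAD TABLES tgt_orders, tgt_order_audit;

.LAYOUT order_layout;
.FIELD order_id  * INTEGER;
.FIELD status    * CHAR(10);

.DML LABEL upd_orders;
UPDATE tgt_orders SET status = :status WHERE order_id = :order_id;

.DML LABEL ins_audit;
INSERT INTO tgt_order_audit (order_id, status) VALUES (:order_id, :status);

.IMPORT INFILE orders.dat
  LAYOUT order_layout
  APPLY upd_orders APPLY ins_audit;

.END MLOAD;
.LOGOFF;
```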

Why a Data Warehouse?

Why do we need a data warehouse?

Heterogeneities are everywhere

Enterprises tend to rely on different databases, flat files, external sources, and even the internet and social media for their information. This means:

Different interfaces
Different data representations
Duplicate and inconsistent information

 
Data Management in Large Enterprises
Vertical fragmentation of informational systems (vertical stove pipes)
Result of application (user)-driven development of operational systems

The goal of building a warehouse is to have a system that:
 
Collects and combines information
Provides integrated view, uniform user interface
Supports sharing



Advantages and Disadvantages of Teradata API

Advantages:

·       Better performance and speed (rows/sec) compared to the generic ODBC stage.

·       Support for TD client

·       Simplified configuration on UNIX platforms.

·       Advanced support for target-table DDL (i.e., CREATE and DROP).

·       Native metadata import support

·       Reject rows handling

Disadvantages:

·       Does not support non-ANSI SQL in stage-generated SQL statements.

·       Does not support byte data types.

·       Does not generate Teradata version-specific SQL in stage-generated statements.

·       Does not support ‘LIKE’ in user-defined SQL when used as a lookup.

Teradata API

TD API can also run in parallel mode. Writing/inserting records with TD API in parallel mode gives much better performance than running the same load in sequential mode. When used in parallel mode, you must set the partitioning property to HASH and select all the columns of the target table's Primary Index for partitioning. You might see occasional blocking due to Teradata hash synonyms, but this should be minimal on most low-volume loads. If you plan to implement a restart strategy in a job that loads data through TD API, it is advisable to write as UPSERT, since there is no cleanup step. The stage will generate upsert code automatically, but as two SQL statements, not an atomic upsert. If you want an ATOMIC upsert, you have to take the auto-generated code and modify it slightly into the ATOMIC upsert syntax.
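A sketch of that last step, assuming a hypothetical target table tgt_orders with Primary Index order_id — the two stage-generated statements are folded into Teradata's single-statement UPDATE ... ELSE INSERT form:

```sql
-- Stage-generated upsert (two statements, not atomic):
UPDATE tgt_orders SET status = :status WHERE order_id = :order_id;
INSERT INTO tgt_orders (order_id, status) VALUES (:order_id, :status);

-- Hand-modified ATOMIC upsert (single statement):
UPDATE tgt_orders SET status = :status WHERE order_id = :order_id
ELSE INSERT INTO tgt_orders (order_id, status) VALUES (:order_id, :status);
```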

Teradata Enterprise

TD EE (Source/Lookup) invokes FastExport, which produces one answer set. All the parallel processing is performed on the Teradata server, not in DataStage. FastExport cannot handle selects that use a unique-key constraint and would return only one row, so be careful with unique-key constraints; these are set on the Teradata side, not by DataStage. If you use this stage two or three times for source tables/lookups in a job and run a couple of such jobs at once, you will have invoked too many FastExports. For explicit exports or big data pulls, TD EE works fine, or you can use the MultiLoad stage (which invokes FastExport when used as a source stage).

 

TD EE (Target) invokes FastLoad for bulk loading into Teradata tables. FastLoad does not support secondary indexes in Teradata. The TD EE stage creates a work table when the job appends to a Teradata table, because FastLoad must load into an empty table, which is impossible with an append operation. To get around this, DataStage FastLoads into a generated empty work table and then performs an insert-select from the work table on the database side. The Append option creates an empty Teradata table with ALL fields but NO defaults; it generates this empty table from the Teradata metadata, NOT your DataStage job metadata. Also, unknown columns are replaced with null.
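The database-side append step can be sketched as follows (the table names are hypothetical; the actual work-table name is generated by the stage):

```sql
-- FastLoad has already populated the empty work table wk_tgt_orders;
-- the stage then appends it to the real target in one insert-select:
INSERT INTO tgt_orders SELECT * FROM wk_tgt_orders;
DROP TABLE wk_tgt_orders;   -- work table is dropped after the append
```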

Target Teradata stages

Teradata Enterprise (Native Parallel)
Usage guidelines:
1) Writing a large number of rows in parallel
2) Supports OPEN and CLOSE commands
3) Limited to INSERT (new table) or APPEND (existing table)
4) Subject to the limits of Teradata FastLoad (but also supports APPEND)
5) Locks the target table in exclusive mode
Parallel write: Yes
Teradata utility limit: applies
Teradata MultiLoad (MultiLoad utility, Plug-In)
Usage guidelines:
1) Insert, Update, Delete, Upsert of moderate data volumes
2) Locks the target tables in exclusive mode
Parallel write: No
Teradata utility limit: applies
Teradata MultiLoad (TPump utility, Plug-In)
Usage guidelines:
1) Insert, Update, Delete, Upsert of small volumes of data in a large database
2) Does not lock the target tables
3) Must not be run in parallel, because each use on each node counts toward the system-wide Teradata utility limit
Parallel write: No
Teradata utility limit: applies
Teradata API (Plug-In)
Usage guidelines:
1) Insert, Update, Delete, Upsert of small volumes of data
2) Allows concurrent writes (does not lock the target)
3) Slower than TPump for equivalent operations
Parallel write: Yes
Teradata utility limit: none

Source Teradata stages

Teradata Enterprise (Native Parallel)
Usage guidelines:
1) Reading a large number of rows in parallel
2) Supports OPEN and CLOSE commands
3) Subject to the limits of Teradata FastExport
Parallel read: Yes
Teradata utility limit: applies

Teradata API (Plug-In)
Usage guidelines:
Reading a small number of rows sequentially
Parallel read: No
Teradata utility limit: none

Comparison with existing Teradata stages

The following list compares the Teradata Connector with the existing Teradata stages:

      Limited support for stored procedures and macros, but the Stored Procedure plug-in is still better suited for them.

      No more utilities, named pipes, control scripts, or report files.

      Interface with the Parallel Transporter is through a direct call-level API.

      Error messages are reported in the DataStage Director log.

      MultiLoad plug-in jobs that use advanced custom script features cannot be migrated to use the Teradata Connector.
      Number of players is determined by the PX Engine configuration file.