May 9, 2014 Parallel Transporter operators

Operator | Equivalent Utility | Advantages | Disadvantages
Export | FastExport | Fastest export method | Uses utility slot; no single-AMP SELECTs
Load | FastLoad | Fastest load method | Uses utility slot; INSERT only; locks table; no views; no secondary indexes
Update | MultiLoad | INSERT, UPDATE, DELETE; views; non-unique secondary indexes | Uses utility slot; locks table; no unique secondary indexes; table inaccessible on abort
Stream | TPump | INSERT, UPDATE, DELETE; views; secondary indexes; no utility slot; no table lock | Slower than the Update operator
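As a rough decision aid, the trade-offs in the table above can be sketched as a small helper (the function name and flags are my own illustration, not part of any Teradata or TPT API):

```python
def pick_load_operator(needs_views=False, needs_usi=False, avoid_utility_slot=False):
    """Pick a TPT load operator from the constraints in the table above (a sketch)."""
    if avoid_utility_slot or needs_usi:
        return "Stream"  # TPump protocol: no utility slot, no table lock, secondary indexes OK
    if needs_views:
        return "Update"  # MultiLoad protocol: views and non-unique secondary indexes
    return "Load"        # FastLoad protocol: fastest, but INSERT only into an empty table
```

For example, a load that must keep a unique secondary index intact falls through to the Stream operator, since neither Load nor Update supports it.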
May 7, 2014 Teradata Connector advantages
· Parallel MultiLoad capability, including MPP configurations
· Parallel immediate lookups and writes
· Array support for better performance of immediate writes
· Reject link support for lookups and writes on DS Enterprise Edition
· Reject link support for bulk loads
· Cursor lookups (lookups that return more than one row)
· Restart capability for parallel bulk loads
· MultiLoad delete task support
· Support for BLOB and CLOB data types
· Reject link support for missing UPDATE or DELETE rows
· Error message and row count feedback for immediate lookups/writes
· Parallel synchronization table
May 4, 2014 Teradata Connector stage
The Teradata Connector includes the following features of the ODBC Connector:
• Source, target and lookup context
• Reject links
• Passing LOBs by reference
• Arrays
• SQL Builder
• Pre/post run statements
• Metadata import
It supports Teradata server versions V2R6.1 and V2R6.2 and Teradata client TTU versions V8.1 and V8.2. The Connector uses the CLIv2 API for immediate operations (SELECT, INSERT, UPDATE, DELETE) and the Parallel Transporter Direct API (formerly TEL-API) for bulk load and bulk extract operations. Parallel bulk load is supported through the LOAD, UPDATE, and STREAM operators in Parallel Transporter, corresponding to the functionality provided by the FastLoad, MultiLoad, and TPump Teradata utilities, respectively. When the UPDATE operator is used, it supports the option for deleting rows of data (MultiLoad delete task). Parallel bulk export is supported through the EXPORT operator in Parallel Transporter, corresponding to the functionality provided by the FastExport Teradata utility. The Connector persists the bulk-load progress state and provides sophisticated support for restarting failed bulk-load operations. It uses a designated database table to synchronize distributed Connector instances during a parallel bulk load.
May 2, 2014 Teradata Parallel Transporter
TPT is an object-oriented client suite that executes multiple instances of data extraction, transformation, and loading functions in a scalable, high-speed, parallel-processing environment. Main features:
· All-in-one: a single infrastructure for all loading/unloading needs using a single scripting language
· Greatly reduces the amount of file I/O and significantly improves performance
· Push-up and push-down features
· Provides unlimited symbolic substitution for the script language and application programming interface (API)
· Combines the functionality of Teradata FastLoad, MultiLoad, FastExport, TPump and the API
The operators for the FastLoad, MultiLoad, TPump and FastExport protocols are named Load, Update, Stream and Export, respectively. Beyond these four, on the API and ODBC front there are operators such as Selector, Inserter and more.
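The operator-to-protocol correspondence above can be captured in a small lookup (a sketch; the dict name is my own, the pairings come from the text):

```python
# Legacy Teradata utility -> TPT operator that implements its protocol
TPT_OPERATOR_FOR = {
    "FastLoad": "Load",
    "MultiLoad": "Update",
    "TPump": "Stream",
    "FastExport": "Export",
}
```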
April 30, 2014 Teradata MultiLoad stage
MultiLoad (as source) uses the FastExport utility. MultiLoad (as target) is very efficient when you are doing maintenance activities on multiple large tables: in one pass, the MultiLoad stage can perform inserts/updates on up to 5 different tables. Work tables and error tables are created each time you perform an operation using MultiLoad; they are automatically dropped once the job has run successfully. However, if the job aborts, the work tables have to be dropped manually before the job is run again.
Teradata MultiLoad features:
• Target tables can be populated regardless of whether they are empty or not
• Allows inserts, updates, deletes, and upserts
• Loads into multiple target tables
• Allows non-unique secondary indexes and automatically rebuilds them after loading
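The manual cleanup after an abort can be sketched as the statements typically run against the target; a minimal sketch, assuming the conventional WT_/ET_/UV_ table-name prefixes (your site's naming may differ):

```python
def mload_cleanup_statements(table):
    """Build the SQL typically run to recover an aborted MultiLoad target (a sketch)."""
    return [
        f"RELEASE MLOAD {table}",  # release the MultiLoad lock on the target
        f"DROP TABLE WT_{table}",  # work table
        f"DROP TABLE ET_{table}",  # acquisition-phase error table
        f"DROP TABLE UV_{table}",  # application-phase (uniqueness violation) error table
    ]
```

Only after these succeed can the job be rerun against the same target table.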
April 29, 2014 Why Data warehouse?
Why do we need a data warehouse? Heterogeneities are everywhere: enterprises tend to rely on different databases, files, external sources and even the internet and social media for all their information. This means:
· Different interfaces
· Different data representations
· Duplicate and inconsistent information
Data Management in Large Enterprises:
· Vertical fragmentation of informational systems (vertical stove pipes)
· Result of application (user)-driven development of operational systems
The goal of building a warehouse is to have a system that:
· Collects and combines information
· Provides an integrated view and a uniform user interface
· Supports sharing
April 29, 2014 Introduction to Data Warehousing What is a Data warehouse? Definition of Data warehousing
April 28, 2014 Advantages and Disadvantages of Teradata API
Advantages:
· Better performance and speed (rows/sec) compared to the generic ODBC stage
· Support for the TD client
· Simplified configuration on UNIX platforms
· Advanced support for target table DDL (i.e. for create and drop)
· Native metadata import support
· Reject rows handling
Disadvantages:
· Does not support non-ANSI SQL statements in stage-generated SQL statements
· Does not support byte data types
· Does not generate TD version-specific SQL as stage-generated SQL statements
· Does not support 'LIKE' in user-defined SQL when used as a lookup
April 26, 2014 Teradata API
TD API can also run in parallel mode. Writing/inserting records with TD API in parallel mode gives much better performance than using it in sequential mode. When used in parallel mode, you must set the partitioning property to HASH and select all the columns of the target table's Primary Index for partitioning. You might get occasional blocks due to Teradata hash synonyms, but this should be minimal on most low-volume loads. If you plan to implement a restart strategy in a job that uses TD API for loading data, it is advisable to write as UPSERT, since there is no cleanup. The stage will generate upsert code automatically, but it will be two SQL statements, not an atomic upsert. If you want an atomic upsert, you have to take the auto-generated code and modify it slightly into the atomic upsert syntax.
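Teradata's atomic upsert is a single UPDATE ... ELSE INSERT statement rather than the separate UPDATE-then-INSERT pair the stage generates. A minimal sketch of building the atomic form (the helper name and ?-style parameter markers are my own assumptions, not the stage's generated code):

```python
def atomic_upsert(table, key_cols, data_cols):
    """Build a Teradata atomic upsert (UPDATE ... ELSE INSERT) with ? parameter markers."""
    set_clause = ", ".join(f"{c} = ?" for c in data_cols)
    where_clause = " AND ".join(f"{c} = ?" for c in key_cols)
    cols = list(data_cols) + list(key_cols)
    return (f"UPDATE {table} SET {set_clause} WHERE {where_clause} "
            f"ELSE INSERT INTO {table} ({', '.join(cols)}) "
            f"VALUES ({', '.join('?' for _ in cols)})")
```

For example, atomic_upsert("sales", ["order_id"], ["amount"]) yields one statement the database applies as a unit, which is what makes the restart strategy safe.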
April 24, 2014 Teradata Enterprise
TD EE (source/lookup) invokes FastExport, which produces one answer set. All the parallel processing is performed on the TD server, not in DataStage. FastExport cannot handle SELECTs on a unique key constraint that would return only one row (single-AMP SELECTs), so you have to be careful when using unique key constraints; this is set on the TD side, not by DataStage. If you use this two or three times for source tables/lookups in a job and you run a couple of jobs at once, you will have invoked too many FastExports. For explicit exports or big data pulls, TD EE can work fine, or you can use the MultiLoad stage (which uses FastExport when used as a source stage).
TD EE (target) invokes FastLoad for bulk loading into TD tables. FastLoad does not support secondary indexes in Teradata. The TD EE stage will create a work table when an append to a Teradata table is selected for the job, because FastLoad must load into an empty table, which is impossible with an append operation. To get around this, DataStage FastLoads into a generated empty work table and then runs an INSERT INTO the target SELECT * FROM the work table on the database side. The append option creates an empty TD table with ALL fields but NO defaults; it generates this empty table from the Teradata metadata, NOT your DataStage job metadata. Also, unknown columns are replaced with null.
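The append workaround above can be sketched as the sequence of statements the stage effectively issues (the table names and the CREATE TABLE ... WITH NO DATA phrasing are illustrative assumptions, not the stage's literal output):

```python
def append_via_work_table(target, work_table):
    """Sketch of the FastLoad append pattern: load an empty copy, then INSERT-SELECT."""
    return [
        # empty copy built from Teradata metadata (all fields, no defaults)
        f"CREATE TABLE {work_table} AS {target} WITH NO DATA",
        # ... FastLoad bulk-loads the rows into the empty work table here ...
        f"INSERT INTO {target} SELECT * FROM {work_table}",
        f"DROP TABLE {work_table}",
    ]
```

The INSERT-SELECT runs entirely on the database side, so the appended rows never make a second trip through DataStage.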