September 13, 2013 Types of Hashed Files Hashed File is classified broadly into 2 types. a) Static– Sub divided into 17 types based on Primary Key Pattern. b) Dynamic– sub divided into 2 types I) Generic II) Specific. Dynamic files do not perform as well as, designed static file, but do perform better than a badly designed one. By Default Hashed file is “Dynamic – Type Random 30 D”
September 13, 2013 SQLProcedures Lock occurs DB2 for z/OS 9.x DB2 stored procedure with IBM DataStage Check if DS_NO_PROCESS_METADATA was set to 0 (causes catalog call) Set the project environment variable DS_NO_PROCESS_METADATA to 1 to stop the lock from occurring.
September 13, 2013 last day of current month in transformer use the function DaysInMonth. example: Date = inputlink.dateVar Year = YearFromDate(Date) Month = MonthFromDate(Date) Day = DaysInMonth(Date) EndOfMonth = DateFromComponents(Year,Month,Day)
September 12, 2013 Datastage training sessions Sessions 1-3: Data Stage Course Content : •Introduction about Data Stage •Difference between Server Jobs and Parallel Jobs •Difference between Pipeline Parallelisms •Partition techniques (Round Robin, Random, Hash, Entire, Same, Modules,Range,DB2,Auto) •Configuration File •Difference between SMP/MPD architecture •Data Stage Components (Server components/Client Components) •Package Installer Sessions 1-3: Data Stage Course Content Sessions 4&5: Data Stage Administrator : •Creating project, Editing project and Deleting project •Permissions to user •Apt Config file •Environment variable creation, permission Sessions 4&5: Data Stage Administrator Sessions 6&7: Data stage director : •Introduction to Data stage Director •Job status View •View logs •Scheduling •Batches Creation Sessions 6&7: Data stage director Sessions 8-15: Designer : •Introduction about Designer •Repository • Palatte •Types of Links •File Stages •Sequential File •Data set File Sessions 8-15: Designer •Lookup file set •Difference between Sequential file/Dataset/File set •Database stages •Dynamic RDBMS •Oracle Enterprise •ODBC Enterprise •Stored Procedure Sessions 16-23: Processing stages : •Change Capture (Caption) •Compare stage •Difference Stage •Aggregate Stage •Transformer Stage •Surrogate Generator Stage •Join Generator Stage •Merge Generator Stage Sessions 16-23: Processing stages •Lookup Generator Stage •Difference between join/Lookup/Merge •Difference between join/Lookup •Remove Duplicates •Switch •Pivot •Modify •Funnel Sessions 24-27: Debugging stage : Head Tail Pea Row Generator Column Generator Sample Job Parameters Sessions 24-27: Debugging stage Sessions 28-29: Manager : Introduction About Data stage Manager Importing the Job Exporting the Job Importing Table Definition Importing Flat File Definition Routines Sessions 28-29: Manager Session 30: Containers : Difference between Local Container and Shared Container Local Container Shared Container Session 30: Containers
September 12, 2013 The Data warehouse Bus Structure: The Bottom-Up Approach Ralph Kimball designed the data warehouse with the data marts connected to it with a bus Structure. The bus structure contained all the common elements that are used by data marts such as conformed dimensions, measures etc defined for the enterprise as a whole. He felt that by using these conformed elements, users can query all data marts together. This architecture makes the data warehouse more of a virtual reality than a physical reality. All data marts could be located in one server or could be located on different servers across the enterprise while the data warehouse would be a virtual entity being nothing more than a sum total of all the data marts. In this context even the cubes constructed by using OLAP tools could be considered as data marts. In both cases the shared dimensions can be used for the conformed dimensions. This model strikes a good balance between centralized and localized flexibility. Data marts can be delivered more quickly and shared data structures along the bus eliminate the repeated effort expended when building multiple data marts in a non-architected structure. The conformed dimensions along the bus fit very well with the shared dimension and virtual cube capabilities of Microsoft’s OLAP services. The bottom-up approach reverses the positions of the Data warehouse and the Data marts. Data marts are directly loaded with the data from the operational systems through the staging area. The ODS may or may not exist depending on the business requirements. However, this approach increases the complexity of process coordination. The standard procedure where data marts are refreshed from the ODS and not from the operational databases ensures data consolidation and hence it is generally recommended approach The data flow in the bottom up approach starts with extraction of data from operational databases into the staging area where it is processed and consolidated and then loaded into the ODS. The data in the ODS is appended to or replaced by the fresh data being loaded. After the ODS is refreshed the current data is once again extracted into the staging area and processed to fit into the Data mart structure. The data from the Data Mart, then is extracted to the staging area aggregated, summarized and so on and loaded into the Data Warehouse and made available to the end user for analysis.
September 12, 2013 Loading data into fact & dimensional tables Fact table – Table with Collection of Foreign Keys corresponding to the Primary Keys in Dimensional table. Consists of fields with numeric values. Dimension table – Table with Unique Primary Key. Load – Data should be first loaded into dimensional table. Based on the primary key values in dimensional table, the data should be loaded into Fact table.
September 12, 2013 Surrogate Key in Data warehousing Surrogate Key is a Primary Key for a Dimension table. Most importance of using it is it is independent of underlying database. i.e. Surrogate Key is not affected by the changes going on with a database.
September 12, 2013 Types of Dimensional Modeling Dimensional modeling is again sub divided into 2 types. a) Star Schema – Simple & Much Faster. Denormalized form. b) Snowflake Schema – Complex with more Granularity. More normalized form.
September 12, 2013 Server Components of Datastage Datastage consists of client and server components. There are three server components which are installed on a server: ü Repository ü DataStage Server ü DataStage Package Installer
September 12, 2013 DATASTAGE JOB TYPES There are three basic types of DataStage job: Server jobs: These are compiled and run on the DataStage server. A server job will connect to databases on other machines as necessary, extract data, process it, then write the data to the target data warehouse. Parallel jobs: These are available only if you have Enterprise Edition installed. Parallel jobs are compiled and run on a DataStage UNIX server, and can be run in parallel on SMP, MPP, and cluster systems. Mainframe jobs: These are available only if you have Enterprise MVS Edition installed. A mainframe job is compiled and run on the mainframe. Data extracted by such jobs is then loaded into the data warehouse.