
7.6 Executing the MtrlDim job

After you save the MtrlDim data flow, execute the MtrlDim job.

1. In the project area, right-click JOB_MtrlDim and click Execute.


2. If prompted to save your work, click OK.
3. In the Execution Properties dialog box, click OK.
4. After the job completes, ensure there are no error or warning messages.
5. To view the captured sample data, in the project area select the data flow to open it in the workspace. Click the magnifying glass on the target MTRL_DIM table to view its six rows.

Or, use a query tool in your RDBMS to check the contents of the MTRL_DIM table.
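For example, a minimal check with a query tool might look like this (a sketch, assuming the target table is named MTRL_DIM in your target schema):

SELECT COUNT(*) FROM MTRL_DIM;  -- expect 6 rows after the load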

The next section describes an alternate way to capture XML data.

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous task: Validating that the MtrlDim data flow has been constructed properly [page 75]

Next task: Leveraging the XML_Pipeline [page 76]

7.7 Leveraging the XML_Pipeline

The main purpose of the XML_Pipeline transform is to extract parts of the XML file.

When you extract data from an XML file to load into a target data warehouse, you usually need only parts of the XML file. The Query transform can perform this partial extraction (as the previous exercise shows), and much more, because it supports many of the clauses of a SQL SELECT statement.

Because the XML_Pipeline transform focuses on partial extraction, it utilizes memory more efficiently and
performs better than the Query transform for this purpose.

● The XML_Pipeline transform uses less memory because it processes each instance of a repeatable
schema within the XML file, rather than building the whole XML structure first.
● The XML_Pipeline transform continually releases and reuses memory to steadily flow XML data through
the transform.
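To make "repeatable schema" concrete, consider a hypothetical fragment of a material list file. The element names below are inferred from the columns used later in this exercise; the actual structure of mtrl_list.xml may differ:

<MTRL_MASTER_LIST>
  <MTRL_MASTER>            <!-- one repeatable instance -->
    <MTRL_ID>100</MTRL_ID>
    <MTRL_TYPE>HAWA</MTRL_TYPE>
    <IND_SECTOR>M</IND_SECTOR>
    <MTRL_GROUP>13</MTRL_GROUP>
    <SHORT_TEXT>Sample material</SHORT_TEXT>
  </MTRL_MASTER>
  <MTRL_MASTER>            <!-- next instance; XML_Pipeline processes one at a time -->
    ...
  </MTRL_MASTER>
</MTRL_MASTER_LIST>

The XML_Pipeline transform streams each repeatable instance through the data flow as it is read, instead of materializing the whole document in memory first.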

You can use the XML_Pipeline transform as an alternate way to build the Material dimension table. The data flow for this approach consists of the following objects:

● The source XML file
● An XML_Pipeline transform to obtain a repeatable portion of the nested source schema
● A query to map the output of the XML_Pipeline transform to the flat target schema
● The target table into which the material dimension data loads

Setting up a job and data flow that uses the XML_Pipeline transform [page 77]
In this exercise, you will achieve the same outcome as in the previous exercise, but you use the XML_Pipeline transform for more efficient configuration and processing.

Configuring the XML_Pipeline and Query_Pipeline transforms [page 78]
Open the transform and the query to map input columns to output columns.

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous task: Executing the MtrlDim job [page 76]

Next: Summary and what to do next [page 79]

7.7.1 Setting up a job and data flow that uses the XML_Pipeline transform

In this exercise, you will achieve the same outcome as in the previous exercise, but you use the XML_Pipeline transform for more efficient configuration and processing.

1. Add a new job and name it JOB_Mtrl_Pipe.


2. Add a new work flow to the job and name it WF_Mtrl_Pipe.
3. Add a data flow to the work flow definition and name it DF_Mtrl_Pipe.
4. Click the name of the data flow to open the data flow definition.
5. In the object library on the Formats tab, expand Nested Schemas.
6. Drag the Mtrl_List file into the DF_Mtrl_Pipe definition workspace, drop it on the left side, and click Make
Source.
7. Click the Mtrl_List name in the workspace to configure it.
8. On the Source tab, ensure XML is selected.
9. Click the down arrow in File and click Select file.
10. Go to <LINK_DIR>\Tutorial Files\ and select mtrl_list.xml. Click Open to import the mtrl_list.xml file.
11. Select Enable Validation to enable comparison of the incoming data to the stored DTD format (a sketch of what such a DTD declares appears after these steps).
12. Click the back arrow to return to the data flow.
13. Select DF_Mtrl_Pipe to open the data flow workspace.
14. In the Local Object Library, open the Transforms tab and expand the Data Integrator transforms.
15. Drag the XML_Pipeline transform into the DF_Mtrl_Pipe definition workspace, and drop it to the right of the Mtrl_List source.
16. In the Transforms tab of the Local Object Library, expand the Platform transforms.
17. Drag the Query transform into the workspace, drop it to the right of XML_Pipeline, and name the query Query_Pipeline.
18. Open the Datastores tab in the Local Object Library and expand the Tables node under Target_DS.
19. Drag and drop the MTRL_DIM table to the workspace and click Make Target.
20. Connect the icons to indicate the flow of data from the source XML file through the XML_Pipeline and Query_Pipeline transforms to the target table.

21. Save all files.

 Note

Remember to periodically close the tabs in the workspace area.
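Step 11 enabled validation against the DTD stored with the imported format. As an illustration only, a DTD for the hypothetical fragment shown earlier might declare the following (the actual mtrl_list DTD may differ):

<!ELEMENT MTRL_MASTER_LIST (MTRL_MASTER+)>
<!ELEMENT MTRL_MASTER (MTRL_ID, MTRL_TYPE, IND_SECTOR, MTRL_GROUP, SHORT_TEXT)>
<!ELEMENT MTRL_ID (#PCDATA)>
<!ELEMENT MTRL_TYPE (#PCDATA)>
<!ELEMENT IND_SECTOR (#PCDATA)>
<!ELEMENT MTRL_GROUP (#PCDATA)>
<!ELEMENT SHORT_TEXT (#PCDATA)>

With validation enabled, an incoming document that omits a declared element or adds an undeclared one is flagged at run time.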

7.7.2 Configuring the XML_Pipeline and Query_Pipeline transforms

Open the transform and the query to map input columns to output columns.

Set up the job as instructed in Setting up a job and data flow that uses the XML_Pipeline transform [page 77].

 Note

Unlike the qryunnest Query, the XML_Pipeline transform allows you to map a nested column directly to a
flat target.

1. Click XML_Pipeline in the project area to open the transform editor.

The Schema In pane shows the nested structure of the source file.
2. Multiselect the following columns in the Schema In pane and drag them to the XML_Pipeline transform Schema Out pane:
○ MTRL_ID
○ MTRL_TYPE
○ IND_SECTOR
○ MTRL_GROUP
○ SHORT_TEXT
3. Click the back arrow from the icon menu.
4. Click Query_Pipeline to open the query editor.
5. Map each Schema In column to the corresponding column in the Schema Out pane.
When you drag each column from the Schema In pane to the Schema Out pane, the Type in the Schema Out pane remains the same, even though the input fields have the type varchar(1024).
Optional. For an experiment, remap one of the fields. After you drop the field into the Schema Out pane, a
popup menu appears. Choose Remap Column. The Remap Column option preserves the name and data
type in Schema Out.
6. In the Project Area, click the MTRL_DIM target table to open the target editor.

7. Open the Options tab in the lower pane and select Delete data from table before loading.

This option deletes existing data in the table before loading new data. If you do not select this option, the software appends data to the existing table. (A rough SQL analogy appears after these steps.)
8. In the project area, click DF_Mtrl_Pipe to return to the data flow.
9. Select the Validate icon from the menu.

The Warnings tab opens. The warnings indicate that each column will be converted to the data type in the
Schema Out pane.

There should not be any errors. If there are errors, you may have missed a step. Fix the errors and try to
validate again.
10. In the Project Area, right-click JOB_Mtrl_Pipe and click Execute.
11. If prompted to save your work, click OK.
12. Accept the default settings in Execution Properties and click OK.
13. After the job completes, ensure that there are no error or warning messages.
14. To view the captured sample data, in the project area select the data flow to open it in the workspace. Click the magnifying glass on the target MTRL_DIM table to view the six rows of data.
Alternatively, use a query tool in your RDBMS to check the contents of the MTRL_DIM table.
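The Delete data from table before loading option from step 7 behaves roughly like the following generic SQL; this is an illustrative sketch, not the statements the software actually issues:

DELETE FROM MTRL_DIM;            -- runs first when the option is selected
INSERT INTO MTRL_DIM ( ... );    -- the new rows are then loaded
-- Without the option, only the INSERTs run, appending to any existing rows.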

7.8 Summary and what to do next

In this section you learned two ways to process an XML file: with a Query transform and with the XML_Pipeline transform.

We walked you through using a Query transform to flatten a nested schema, and we worked with a document type definition (DTD) file for the source XML file.

If you are unclear about how Data Services processes XML files, and about nested data, see the Designer Guide
for more details.

At this point in the tutorial you have populated the following four tables in the sample data warehouse:

● Sales organization dimension from a flat file
● Time dimension using the Date Generation transform
● Customer dimension from a relational table
● Material dimension from a nested XML file

In the next section you will populate the sales fact table from more than one source.

Parent topic: Populate the Material Dimension from an XML File [page 68]

Previous task: Leveraging the XML_Pipeline [page 76]

Related Information

Designer Guide: Nested Data


Reference Guide: Transforms

8 Populate the Sales Fact Table from Multiple Relational Tables

In this exercise you learn how to use joins and functions to populate the Sales Fact table of the sales star schema with data from multiple relational tables.

The exercise joins data from two source tables and loads it into a target table.

1. Adding the SalesFact job, work flow, and data flow [page 82]
Use the basic skills that you have learned in earlier exercises to set up a new job named JOB_SalesFact.
2. Creating the SalesFact data flow [page 82]
Add objects to DF_SalesFact and connect the objects to set the flow of data.
3. Defining the details of the Query transform [page 83]
Set up a table join, a filter, and a Lookup expression in the query transform, and then map columns from the Schema In pane to the Schema Out pane.
4. Using a lookup_ext function for order status [page 85]
Create a Lookup expression to select a column from the ODS_DELIVERY table to include in the
SALES_FACT output table based on two conditions.
5. Validating the SalesFact data flow [page 89]
Use the skills you obtained from previous exercises to validate the data flow.
6. Executing the SalesFact job [page 89]
After you have performed the validation step and fixed any errors, the SalesFact job should execute
without errors.
7. Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]
Use the metadata reporting tool to browse reports about metadata associated with the SalesFact job.
The metadata reporting tool is a Web-based application.
8. Summary and what to do next [page 93]

In this section you joined two source tables and applied a filter, and you used a Lookup expression to add a column from a related table that was not one of the source tables.

Related Information

Reference Guide: Transforms, Platform transforms, Query transform, Joins in the Query transform
Designer Guide: Nested data, Operations on nested data

8.1 Adding the SalesFact job, work flow, and data flow

Use the basic skills that you have learned in earlier exercises to set up a new job named JOB_SalesFact.

1. Add a new job to the Class_Exercises project and name it JOB_SalesFact.


2. Add a workflow to JOB_SalesFact and name it WF_SalesFact.
3. Add a data flow to the workflow definition WF_SalesFact and name it DF_SalesFact.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Next task: Creating the SalesFact data flow [page 82]

Related Information

Adding a new job [page 39]


Adding a workflow [page 39]
Adding a data flow [page 40]

8.2 Creating the SalesFact data flow

Add objects to DF_SalesFact and connect the objects to set the flow of data.

Optional. The data flow has two sources. To make the workspace look more organized, change the appearance
of the data flow by following these steps:

1. Select Tools > Options.


2. Expand the Designer node and click Graphics.
3. Click the dropdown arrow in the Workspace Flow Type option and select Data Flow.
4. Select Horizontal/Vertical for the Line Type option. Click OK.

Follow these steps to set up the data flow:

1. Click the DF_SalesFact data flow in the Project Area to open the data flow workspace.
2. Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
3. Move the ODS_SALESITEM table to the left side of the workspace using drag and drop. Click Make Source.
4. Move the ODS_SALESORDER table to the left of the ODS_SALESITEM table in the workspace using drag and
drop. Click Make Source.

5. Add a query to the data flow from the tool palette.


6. Open the Datastores tab in the Local Object Library and expand the Tables node under Target_DS.
7. Move the SALES_FACT table to the workspace using drag and drop. Select Make Target.
8. In the data flow workspace, connect the icons to indicate the flow of data as shown in the following
diagram.

9. Save your work.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Adding the SalesFact job, work flow, and data flow [page 82]

Next task: Defining the details of the Query transform [page 83]

8.3 Defining the details of the Query transform

Set up a table join, a filter, and a Lookup expression in the query transform, and then map columns from the Schema In pane to the Schema Out pane.

1. Expand DF_SalesFact in the Project Area and click the query to open the editor.
2. Open the FROM tab in the options pane.

3. Click the dropdown arrow under the Left column heading in the Join pairs area and select
ODS_SALESORDER.

The ODS_SALESORDER table is now the left portion of the join.


4. Select the dropdown arrow under the Right column heading and select ODS_SALESITEM.

The ODS_SALESITEM table is now the right portion of the join. Leave the Join Type set to Inner join.

The software defines the relationship between the SalesItem and SalesOrder tables by using the key column SALES_ORDER_NUMBER. The inner join type generates a join expression based on primary and foreign keys and column names. The SALES_ORDER_NUMBER column is the primary key in the ODS_SALESORDER table and a foreign key in the ODS_SALESITEM table. The relationship states that the key values in the two tables must match before records are joined.

The resulting relationship appears in the From clause text box:

SALESITEM.SALES_ORDER_NUMBER = SALESORDER.SALES_ORDER_NUMBER

5. Click the ellipses icon next to the Right table name ODS_SALESITEM.

The Smart Editor opens.


6. Place your cursor at the end of the first line and press Enter.
7. Type the following two lines, using the casing as shown:

AND ODS_SALESORDER.ORDER_DATE >= to_date('2007.01.01','yyyy.mm.dd')
AND ODS_SALESORDER.ORDER_DATE <= to_date('2007.12.31','yyyy.mm.dd')

These lines filter the sales orders by date. All orders dated from January 1, 2007 through December 31, 2007 are moved into the target. (A rough SQL equivalent of the completed join and filter appears after these steps.)

 Tip

As you type the function names, the Smart Editor prompts you with options. Either ignore the prompts and keep typing, or select a highlighted option and press Enter. You can alternatively double-click a prompt to accept it.

8. Click OK.

The join conditions that you added in the Smart Editor appear in the Join Condition column and in the
FROM Clause area.
9. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop.

Source table    Source column         Target column      Column description
SALESITEM       SALES_ORDER_NUMBER    SLS_DOC_NO         Sales order number
SALESITEM       SALES_LINE_ITEM_ID    SLS_DOC_LINE_NO    Sales line item number
SALESITEM       MTRL_ID               MATERIAL_NO        Material ID
SALESITEM       PRICE                 NET_VALUE          Order item price
SALESORDER      CUST_ID               CUST_ID            Customer ID
SALESORDER      ORDER_DATE            SLS_DOC_DATE       Order date

10. Keep the Query Editor open for the next task.
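For reference, the completed query behaves roughly like the following generic SQL; this is an illustrative sketch rather than what the software generates internally:

SELECT SI.SALES_ORDER_NUMBER  AS SLS_DOC_NO,
       SI.SALES_LINE_ITEM_ID  AS SLS_DOC_LINE_NO,
       SI.MTRL_ID             AS MATERIAL_NO,
       SI.PRICE               AS NET_VALUE,
       SO.CUST_ID             AS CUST_ID,
       SO.ORDER_DATE          AS SLS_DOC_DATE
FROM ODS_SALESORDER SO
INNER JOIN ODS_SALESITEM SI
        ON SI.SALES_ORDER_NUMBER = SO.SALES_ORDER_NUMBER
WHERE SO.ORDER_DATE >= DATE '2007-01-01'   -- the two AND lines you typed
  AND SO.ORDER_DATE <= DATE '2007-12-31';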

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Creating the SalesFact data flow [page 82]

Next task: Using a lookup_ext function for order status [page 85]

8.4 Using a lookup_ext function for order status

Create a Lookup expression to select a column from the ODS_DELIVERY table to include in the SALES_FACT
output table based on two conditions.

Continue configuring the Query transform by setting up a lookup_ext function.

1. Select the ORD_STATUS column in the Schema Out pane.

You haven't mapped the ORD_STATUS column.


2. In the Mapping tab in the lower pane, click Functions....

The Select Function editor opens.


3. In the Function categories column at left, click Lookup Functions.
4. In the Function name column at right, select lookup_ext.
5. Click Next.

The dialog changes to show options for defining the LOOKUP_EXT() function.


6. The following table contains instructions for completing the LOOKUP_EXT() function.

 Note

We use the following two methods to add expressions in the Select Parameters dialog box:
○ Drag column names into the target columns under Condition, Output, and Order by sections.
○ Click the ellipses button to open the Smart Editor.

Lookup_ext option settings

Lookup table

 Note
The lookup table is where the LOOKUP_EXT() function obtains the value to put into the ORD_STATUS column.

1. Select the Lookup table dropdown arrow. The Input Parameter dialog box opens.
2. Select Datastore from the Look in dropdown arrow.
3. Select ODS_DS and click OK.
4. Select the ODS_DELIVERY table and click OK.

Leave Cache spec set to PRE_LOAD_CACHE.

Available parameters

You choose parameters to build conditions from the tables that you define here.

1. Expand the Lookup table node at left and then expand the ODS_DELIVERY node to expose the columns in the table.
2. Expand the Input Schema node and then expand the ODS_SALESITEM node to expose the columns in the table.

Conditions

 Note
Conditions identify the rules the software follows to determine what value to output for the ORD_STATUS column.

 Note
Set up two conditions for this expression because there is a one-to-many relationship between the SALES_ORDER_NUMBER column and the SALES_LINE_ITEM_ID column. For example, the SALES_ORDER_NUMBER column value PT22221000 has two SALES_LINE_ITEM_ID values: IT100 and IT102.

Condition 1:

ODS_DELIVERY.DEL_SALES_ORDER_NUMBER = ODS_SALESITEM.SALES_ORDER_NUMBER

1. Under ODS_DELIVERY, move the DEL_SALES_ORDER_NUMBER column to the Conditions area under the Column in lookup table column using drag and drop.
2. Verify that the operator column, OP.(&), automatically sets to =.
3. Click the ellipses under the Expressions column to open the Smart Editor.
4. Expand the ODS_SALESITEM node and move the SALES_ORDER_NUMBER column to the right side using drag and drop.
5. Click OK.

Condition 2:

ODS_DELIVERY.DEL_ORDER_ITEM_NUMBER = ODS_SALESITEM.SALES_LINE_ITEM_ID

The steps are similar to the steps for Condition 1:

1. Move the DEL_ORDER_ITEM_NUMBER column to the Conditions area under the Column in lookup table column using drag and drop.
2. Verify that the operator column, OP.(&), automatically sets to =.
3. Click the ellipses under the Expressions column and expand ODS_SALESITEM.
4. Move SALES_LINE_ITEM_ID to the right side using drag and drop.
5. Click OK.

Output

 Note
Output parameters specify the column in the lookup table that contains the value to put in the ORD_STATUS column in the query.

1. Move the DEL_ORDER_STATUS column from ODS_DELIVERY to the Output area under the Column in lookup table column using drag and drop.
2. Leave all other options as they are.

The following image shows the completed Select Parameters dialog box.

The final lookup function displays in the Mapping tab and looks as follows (a rough SQL analogy appears after these steps):

lookup_ext([ODS_DS.DBO.ODS_DELIVERY,'PRE_LOAD_CACHE','MAX'],
[DEL_ORDER_STATUS],[NULL],
[DEL_SALES_ORDER_NUMBER,'=',ODS_SALESITEM.SALES_ORDER_NUMBER,
DEL_ORDER_ITEM_NUMBER,'=',ODS_SALESITEM.SALES_LINE_ITEM_ID]) SET
("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0"
encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/>
</output_cols_info>' )

7. Click Finish.

8. Click the Back icon in the upper toolbar.


9. Save your work.

To look at the expression for ORD_STATUS again, select the ORD_STATUS column from the Schema Out
pane in the Query editor and open the Mapping tab in the options pane.
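Conceptually, the lookup_ext call fetches DEL_ORDER_STATUS from ODS_DELIVERY for each sales item row, matching on both key columns; the 'MAX' return policy returns the greatest value when more than one delivery row matches. A rough SQL analogy, shown only as a sketch of the logic:

SELECT SI.*,
       (SELECT MAX(D.DEL_ORDER_STATUS)
          FROM ODS_DELIVERY D
         WHERE D.DEL_SALES_ORDER_NUMBER = SI.SALES_ORDER_NUMBER
           AND D.DEL_ORDER_ITEM_NUMBER  = SI.SALES_LINE_ITEM_ID) AS ORD_STATUS
FROM ODS_SALESITEM SI;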

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Defining the details of the Query transform [page 83]

Next task: Validating the SalesFact data flow [page 89]

8.5 Validating the SalesFact data flow

Use the skills you obtained from previous exercises to validate the data flow.

1. Select DF_SalesFact in the Project Area.

2. Click Validation > Validate Current View.


3. If you followed all of the steps in the exercise correctly, there should be no errors or warnings.

Errors typically result from an incorrect join condition or another syntax error.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Using a lookup_ext function for order status [page 85]

Next task: Executing the SalesFact job [page 89]

8.6 Executing the SalesFact job

After you have performed the validation step and fixed any errors, the SalesFact job should execute without
errors.

1. Right-click JOB_SalesFact in the Project Area and select Execute.

No error notifications should appear in the status window. You might see a warning notification indicating
that a conversion from a date to datetime value occurred.
2. Accept the settings in the Execution Properties dialog box and click OK.
3. Click DF_SalesFact in the Project Area to open it in the workspace.
4. Click the magnifying-glass icon on the target table SALES_FACT to view 17 rows of data.

 Example

The following diagram shows how all of the tables are related and breaks down the steps that you completed in the Query editor, to help you understand how the three tables relate and why you set up two conditions for the Lookup expression.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Validating the SalesFact data flow [page 89]

Next task: Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]

8.7 Viewing Impact and Lineage Analysis for the SALES_FACT target table

Use the metadata reporting tool to browse reports about metadata associated with the SalesFact job. The
metadata reporting tool is a Web-based application.

View information about the Sales_Fact target table to find out when the table was last updated and used. Also
see the related source tables and column mappings.

1. In Designer, select Tools > Data Services Management Console.


2. Log in using the same credentials as you used to log in to Designer.

The Management Console main page opens.


3. Click the Impact and Lineage Analysis icon.

A browser window opens showing the repository information.


4. Click Settings in the upper right corner.

Use the Settings options to make sure that you are viewing the applicable repository and to refresh source
and column data.
5. Check the name in Repository to make sure that it contains the current repository.
6. Open the Refresh Usage Data tab to make sure that it lists the current job server.
7. Click Calculate Column Mapping.

The software calculates the current column mapping and notifies you when it completes successfully.
8. Click Close.
9. In the file tree at left, expand Datastores and then Target_DS to view the list of tables.
10. Expand Data Flow Column Mapping Calculation in the right pane to view the calculation status of each data
flow.
11. Double-click the SALES_FACT table under Target_DS in the file tree.

The Overview tab for SALES_FACT table opens at right. The Overview tab displays general information
about the table such as the table datastore name and the table type.
12. Click the Lineage tab.

The Lineage tab displays the sources for the SALES_FACT target table. When you move the pointer over a source table icon, the name of the datastore, data flow, and owner appear.
