Informatica Hands-On Challenge: Super_Store Analysis
• Introduction: You are provided with a sample dataset from a retail store,
Super_Store. This dataset contains information about orders, customers, products,
and sales. Your task involves cleaning the data, analyzing sales, customer orders,
customer geography, and order processing time using Informatica PowerCenter.
Data Preparation:
• Oracle SQL Setup:
• Log in to Oracle SQL Developer using the Admin connection with the following credentials:
• Username: system
• Password: Admin
Create a table named Super_Store with the provided structure
Row_ID INT
Order_Date DATE
Ship_Date DATE
Ship_Mode VARCHAR(50)
Customer_ID VARCHAR(50)
Customer_Name VARCHAR(50)
Segment VARCHAR(50)
Country VARCHAR(50)
City VARCHAR(50)
State VARCHAR(50)
Postal_Code VARCHAR(50)
Region VARCHAR(50)
Product_ID VARCHAR(250)
Category VARCHAR(250)
Sub_Category VARCHAR(250)
Product_Name VARCHAR(250)
Sales INT
• NOTE : While loading data into the table, set the date format for Order_Date and Ship_Date to DD/MM/YYYY.
Load superstore_data.csv into the Super_Store table.
The dataset is provided in "~\Desktop\Project\miniproject-informatica-super_store\"
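The table described above can be created with a DDL statement along the following lines (a sketch; the session-level date format is an optional convenience, since the SQL Developer import wizard also lets you set the format per column):

```sql
CREATE TABLE Super_Store (
    Row_ID        INT,
    Order_Date    DATE,
    Ship_Date     DATE,
    Ship_Mode     VARCHAR(50),
    Customer_ID   VARCHAR(50),
    Customer_Name VARCHAR(50),
    Segment       VARCHAR(50),
    Country       VARCHAR(50),
    City          VARCHAR(50),
    State         VARCHAR(50),
    Postal_Code   VARCHAR(50),
    Region        VARCHAR(50),
    Product_ID    VARCHAR(250),
    Category      VARCHAR(250),
    Sub_Category  VARCHAR(250),
    Product_Name  VARCHAR(250),
    Sales         INT
);

-- Optional: make DD/MM/YYYY the default date format for this session
-- before importing superstore_data.csv.
ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';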
Informatica Repository Setup:
Connect to the Informatica repository manager using the following credentials:
• Username: Administrator
• Password: Administrator
Create a folder named Super_Store in the repository manager.
How to Import Source Table in Source Analyzer
Following are the steps to import a source table in the Informatica Source Analyzer:
Step 1) Go to the "Sources" menu
In the Source Analyzer:
1. Click on the "Sources" tab in the main menu.
2. Select the "Import from Database" option; the ODBC Connection dialog will open.
Step 2) Create ODBC connection
• We will now create an ODBC connection:
1. Click on the button next to ODBC data source (...).
2. In the window that opens, select the User DSN tab and click the Add button.
3. Select Oracle Wire Protocol.
4. On the General tab, enter the database details, then click Connect.
• Data Source name : oracle
• Host : localhost
• port : 1521
• sid : xe
Create Connections for Workflow Manager
To Create a Relational Connection
Step 1: In Workflow Manager
• Click on the Connection menu
• Select Relational Option
Step 2: In the pop up window
• Select Oracle in type
• Click on the new button
Step 3: In the new window of connection object definition
• Enter Connection Name (system)
• Enter username - system
• Enter password – Admin
• Enter connection string - xe
• Leave the other settings at their defaults and click the OK button
Note : For additional credentials (e.g., for the Designer), kindly check the Readme file.
Note : Please follow the naming conventions in the problem statement.
Data Cleaning:
• Mapping Name: Map_Cleaned_Data
• Workflow Name: Workflow_Cleaned_Data
• Session Name: Session_Cleaned_Data
• Target Table: Super_Store_Cleaned_Data
Operations:
• Remove duplicates from the dataset to ensure data integrity.
• Filter records where Country is 'United States' to focus on domestic orders.
• Extract the numeric part of Customer_ID to standardize customer identification (e.g., for CH-1234, extract 1234).
• Concatenate the extracted ID and Customer_Name with '-' to create a unique identifier for each customer (e.g., 1234-Charlies, i.e., Extracted_ID-Customer_Name) and store it in the Customer_Id_Name column.
• Drop the Customer_ID and Customer_Name columns.
• After cleaning, load the data into the Super_Store_Cleaned_Data target table (for the columns, check the sample output).
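In PowerCenter, these operations would typically be a duplicate-removal step (e.g., a Sorter with distinct output), a Filter, and an Expression transformation. As a rough SQL sketch of the intended result (not the required PowerCenter implementation):

```sql
SELECT DISTINCT
       -- digits after the '-' in Customer_ID, joined to the name
       SUBSTR(Customer_ID, INSTR(Customer_ID, '-') + 1)
           || '-' || Customer_Name AS Customer_Id_Name,
       Order_Date, Ship_Date, Ship_Mode, Segment,
       Country, City, State, Postal_Code, Region,
       Product_ID, Category, Sub_Category, Product_Name, Sales
FROM   Super_Store
WHERE  Country = 'United States';
```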
• After completing the mapping, switch to the Workflow Manager.
• We are going to connect to the repository “Repo_etlclass”, so double-click on the folder to connect.
• Enter the username and password, then click the “Connect” button.
Username : Administrator
Password : Administrator
• Create the session Session_Cleaned_Data
• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window to modify the task properties.
• Step 2 – In the edit task window
1. Select the mapping tab
2. Select the connection property
3. Assign the connection we created in the earlier steps to the source and target. (In the target properties, set the load type to normal and check the Insert check box.)
4. Click the OK button
Creating workflow
• Step 1 – In workflow designer
1. Select workflows menu
2. Select create option
• Step 2 – In create workflow window
1. Enter workflow name - Workflow_Cleaned_Data
2. Select OK Button ( leave other options as default)
->When you create a workflow, it does not contain any tasks. So, to execute any task in a workflow, you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder
2. Drag and drop the session task into the Workflow Designer
• Then link the task to the workflow and start the workflow.
->Kindly check in the Workflow Monitor whether the run succeeded.
Sample Output : the remaining columns plus an additional CUSTOMER_ID_NAME column.
CUSTOMER_ID_NAME
21925-Zushuss Donatelli
16585-Ken Black
21520-Tracy Blumstein
NOTE : The Super_Store_Cleaned_Data table is the source for every task below.
Analysis Tasks:
Task 1: Sales Summary
• Mapping Name: Map_Sales_Summary
• Workflow Name: Workflow_Sales_Summary
• Session Name: Session_Sales_Summary
• Target Table: Sales_Summary
Problem Statement: Summarize total sales and average sales for each customer. Identify
customers with significant contribution to overall sales.
Operations:
• For each customer (grouped by 'Customer_ID_Name'), calculate the sum of sales and store it in a Total_Sales column, and the average of sales and store it in an Avg_Sales column.
• Order the summarized data in descending order based on the total sales ('Total_Sales').
• Filter customers with total sales greater than 3000 and average sales greater than 300 to focus on significant contributors.
• Drop the unnecessary columns, kindly check the sample output.
• Load data into the Sales_Summary target table (For columns check sample
output)
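In PowerCenter this is typically an Aggregator followed by a Filter and a Sorter; the equivalent result can be sketched in SQL as a GROUP BY with a HAVING clause (a sketch, not the required implementation):

```sql
SELECT   Customer_Id_Name,
         SUM(Sales) AS Total_Sales,
         AVG(Sales) AS Avg_Sales
FROM     Super_Store_Cleaned_Data
GROUP BY Customer_Id_Name
HAVING   SUM(Sales) > 3000
   AND   AVG(Sales) > 300
ORDER BY Total_Sales DESC;
```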
• After completing the mapping, switch to the Workflow Manager.
• We are going to connect to the repository “Repo_etlclass”, so double-click on the folder to connect.
• Enter the username and password, then click the “Connect” button.
Username : Administrator
Password : Administrator
• Create the session Session_Sales_Summary
• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window to modify the task properties.
• Step 2 – In the edit task window
1. Select the mapping tab
2. Select the connection property
3. Assign the connection we created in the earlier steps to the source and target. (In the target properties, set the load type to normal and check the Insert check box.)
4. Click the OK button
Creating workflow
• Step 1 – In workflow designer
1. Select workflows menu
2. Select create option
• Step 2 – In create workflow window
1. Enter workflow name - Workflow_Sales_Summary
2. Select OK Button ( leave other options as default)
->When you create a workflow, it does not contain any tasks. So, to execute any task in a workflow, you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder
2. Drag and drop the session task into the Workflow Designer
• Then link the task to the workflow and start the workflow.
->Kindly check in the Workflow Monitor whether the run succeeded.
Sample Output:
CUSTOMER_ID_NAME        TOTAL_SALES    AVG_SALES
11140-Becky Martin      10540          1506
14635-Grant Thornton    8167           4084
20290-Sean Braxton      5580           2790
Task 2: Customer Order Analysis
• Mapping Name: Map_Order_Analysis
• Workflow Name: Workflow_Order_Analysis
• Session Name: Session_Order_Analysis
• Target Table: Order_Analysis
Problem Statement: Analyze customer orders to determine the most frequent buyers
and their order patterns.
Operations:
• Filter records where Category is 'Office Supplies' and City is 'San Francisco' to analyze local customer behavior.
• Create a new column Orders_Count and calculate the count of orders for each customer to determine their order frequency.
• Sort the results by order count in descending order to identify the most frequent buyers, and keep only the top 10 records.
• Drop the unnecessary columns, kindly check the sample output.
• Load data into the Order_Analysis target table (For columns check sample output)
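In PowerCenter, a common choice for the top-10 step is a Rank transformation after the Aggregator. A SQL sketch of the intended result (the FETCH FIRST clause assumes Oracle 12c or later; older versions would use ROWNUM on a subquery):

```sql
SELECT   Customer_Id_Name,
         COUNT(*) AS Orders_Count
FROM     Super_Store_Cleaned_Data
WHERE    Category = 'Office Supplies'
  AND    City = 'San Francisco'
GROUP BY Customer_Id_Name
ORDER BY Orders_Count DESC
FETCH FIRST 10 ROWS ONLY;
```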
• After completing the mapping, switch to the Workflow Manager.
• We are going to connect to the repository “Repo_etlclass”, so double-click on the folder to connect.
• Enter the username and password, then click the “Connect” button.
Username : Administrator
Password : Administrator
• Create the session Session_Order_Analysis
• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window to modify the task properties.
• Step 2 – In the edit task window
1. Select the mapping tab
2. Select the connection property
3. Assign the connection we created in the earlier steps to the source and target. (In the target properties, set the load type to normal and check the Insert check box.)
4. Click the OK button
Creating workflow
• Step 1 – In workflow designer
1. Select workflows menu
2. Select create option
• Step 2 – In create workflow window
1. Enter workflow name - Workflow_Order_Analysis
2. Select OK Button ( leave other options as default)
->When you create a workflow, it does not contain any tasks. So, to execute any task in a workflow, you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder
2. Drag and drop the session task into the Workflow Designer
• Then link the task to the workflow and start the workflow.
->Kindly check in the Workflow Monitor whether the run succeeded.
Sample output:
CUSTOMER_ID_NAME       ORDERS_COUNT
15520-Jeremy Pistek    4
16510-Keith Herrera    4
Task 3: Customer Geography Analysis
• Mapping Name: Map_Geography_Analysis
• Workflow Name: Workflow_Geography_Analysis
• Session Name: Session_Geography_Analysis
• Target Table: Geography_Analysis
Problem Statement: Analyze customer distribution across different regions to identify
potential market segments.
Operations:
• Filter records for customers in state 'California' to focus on specific geographical
areas.
• Group customers by region (North, South, East, West) based on their location data.
• Calculate the count of customers in each region to understand the geographical
distribution.
• Drop the unnecessary columns, kindly check the sample output.
• Load data into the Geography_Analysis target table (For columns check sample
output)
• After completing the mapping, switch to the Workflow Manager.
• We are going to connect to the repository “Repo_etlclass”, so double-click on the folder to connect.
• Enter the username and password, then click the “Connect” button.
Username : Administrator
Password : Administrator
• Create the session Session_Geography_Analysis
• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window to modify the task properties.
• Step 2 – In the edit task window
1. Select the mapping tab
2. Select the connection property
3. Assign the connection we created in the earlier steps to the source and target. (In the target properties, set the load type to normal and check the Insert check box.)
4. Click the OK button
Creating workflow
• Step 1 – In workflow designer
1. Select workflows menu
2. Select create option
• Step 2 – In create workflow window
1. Enter workflow name - Workflow_Geography_Analysis
2. Select OK Button ( leave other options as default)
->When you create a workflow, it does not contain any tasks. So, to execute any task in a workflow, you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder
2. Drag and drop the session task into the Workflow Designer
• Then link the task to the workflow and start the workflow.
->Kindly check in the Workflow Monitor whether the run succeeded.
Sample output:
CUSTOMER_ID_NAME        REGION    STATE         REGION_ORDERS_COUNT
14095-Eudokia Martin    West      California    182
Task 4: Order Processing Time Analysis
• Mapping Name: Map_Order_Processing
• Workflow Name: Workflow_Order_Processing
• Session Name: Session_Order_Processing
• Target Table: Order_Processing
Problem Statement: Evaluate order processing efficiency by analyzing the time taken between order placement and shipment.
Operations:
• Calculate the processing days for each order by subtracting the order date from
the ship date and store it in new column Processing_days.
• Categorize the processing days (e.g., less than 1 day: Immediate delivery; 1 to 3 days: Moderate Delivery; 3 or more days: Long Term delivery).
• Count the number of orders falling within each processing-days category to analyze the distribution of processing times.
• Drop the unnecessary columns, kindly check the sample output.
• Load data into the Order_Processing target table (For columns check sample
output)
• After completing the mapping, switch to the Workflow Manager.
• We are going to connect to the repository “Repo_etlclass”, so double-click on the folder to connect.
• Enter the username and password, then click the “Connect” button.
Username : Administrator
Password : Administrator
• Create the session Session_Order_Processing
• Step 1 – Double-click on the session object in the Workflow Manager. It will open a task window to modify the task properties.
• Step 2 – In the edit task window
1. Select the mapping tab
2. Select the connection property
3. Assign the connection we created in the earlier steps to the source and target. (In the target properties, set the load type to normal and check the Insert check box.)
4. Click the OK button
Creating workflow
• Step 1 – In workflow designer
1. Select workflows menu
2. Select create option
• Step 2 – In create workflow window
1. Enter workflow name - Workflow_Order_Processing
2. Select OK Button ( leave other options as default)
->When you create a workflow, it does not contain any tasks. So, to execute any task in a workflow, you have to add the task to it.
• Step 1 – To add the session task that we created in the Task Developer to the Workflow Designer:
1. In the navigator tree, expand the Tasks folder
2. Drag and drop the session task into the Workflow Designer
• Then link the task to the workflow and start the workflow.
->Kindly check in the Workflow Monitor whether the run succeeded.
Sample Output:
CATEGORISE_PROCESSING_DAYS    ORDERS_COUNT
Immediate delivery            32
Long Term delivery            641
After completing the challenge, double-click on the SAMPLE_TEST.EXE file to get the sample score.
Click on the Submit button to validate your solution.