[go: up one dir, main page]

0% found this document useful (0 votes)
29 views35 pages

Chapter 2 - Preparing Data for Analysis

Uploaded by

sylviawinee
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
29 views35 pages

Chapter 2 - Preparing Data for Analysis

Uploaded by

sylviawinee
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 35

1

CHAPTER 2

PREPARING DATA FOR


ANALYSIS

201123 – ACCOUNTING ANALYTICS


2 OBJECTIVES

• Understand how data are organized in an accounting information system.


• Understand how data are stored in a relational database.
• Explain and apply extraction, transformation, and loading (ETL) techniques.
• Describe the ethical considerations of data collection and data use

201123 – ACCOUNTING ANALYTICS


3 CONTENTS

1. • How are data used and stored in the accounting cycle?


2. • How are data stored in relational databases?
3. • Data dictionaries
4. • What does it mean to extract, transform, and load
5. • Ethical considerations of data collection and use

201123 – ACCOUNTING ANALYTICS


4 LOOK BACK

Chapter 1 defined Data Analytics and explained that the value of Data Analytics is in the
insights it provides. We described:
 the Data Analytics Process using the IMPACT cycle model; and
 explained how this process is used to address both business and accounting questions.
We specifically emphasized the importance of identifying appropriate questions that Data
Analytics might be able to address.

201123 – ACCOUNTING ANALYTICS


1. HOW ARE DATA USED AND STORED IN THE
ACCOUNTING CYCLE?

Data can be found throughout various systems.

In most cases, you need to know which tables and


attributes contain the relevant data.

Unified Modeling Language (UML) is one way to


understand databases.
While Exhibit 2-2 Procure-to-Pay Database Schema (Simplified)

201123 – ACCOUNTING ANALYTICS


INTERNAL AND EXTERNAL DATA SOURCES
Data may come from a number of different sources, either internal or external to the organization.
Internal data sources include:
• accounting information system: records, processes, reports, and communicates the results of
business transactions.
• supply chain management system: active vendors, the orders made to date or demand
schedules for what component of the final product is needed when.
• customer relationship management system: detail about the customer.
• human resource management system: is an information system for managing all interactions
with current and potential employees.

201123 – ACCOUNTING ANALYTICS


INTERNAL AND EXTERNAL DATA SOURCES (CON’T)
Data may come from a number of different sources, either internal or external to the
organization. Internal data sources include:
• accounting information system
• supply chain management system
• customer relationship management system
• human resource management system.
Enterprise Resource Planning (ERP) (also known as Enterprise Systems) is a category of
business management software that integrates applications from throughout the business (such as
manufacturing, accounting, finance, human resources, etc.) into one system.

201123 – ACCOUNTING ANALYTICS


INTERNAL AND EXTERNAL
DATA SOURCES (CON’T)

• External data sources include economic,


financial, governmental, and other sources.
Each of these may be useful in addressing
accounting and business questions.

201123 – ACCOUNTING ANALYTICS


1. HOW ARE DATA USED AND STORED IN THE
ACCOUNTING CYCLE?

 Accounting Data and Accounting Information Systems


 There are a variety of applications that support relational databases (these are
referred to as Relational Database Management Systems or RDBMS). For
example: Microsoft Access, SQLite, and Microsoft SQL Server.

 There are many other examples of relational database management systems:


Teradata, MySql, Oracle RDBMS, IBM DB2,Amazon RDS, and PostGreSQL.

201123 – ACCOUNTING ANALYTICS


II. HOW ARE DATA STORED IN RELATIONAL DATABASES?

RELATIONAL DATABASES ENSURE THAT DATA

• Are complete or include all data.


• Aren’t redundant, so they don’t take up too much space.
• Follow business rules and internal controls.
• Aid communication and integration of business processes.

201123 – ACCOUNTING ANALYTICS


2. HOW ARE DATA STORED IN
RELATIONAL DATABASES?

• Primary keys are unique identifiers.


• Foreign keys are attributes that point to a
primary key in another table.
• Composite keys are a combination of two or
more attributes to create a unique identifier.
• Descriptive attributes include everything else.

Exhibit 2-4 Purchase Order Table

201123 – ACCOUNTING ANALYTICS


2. HOW ARE DATA STORED IN RELATIONAL DATABASES?

Examples of two tables, attributes, and data. Notice the PK-FK relationship.

Exhibit 2-3 Line Items Table:


Purchase Order Detail Table
Exhibit 2-4 Purchase Order Table

201123 – ACCOUNTING ANALYTICS


2. HOW ARE DATA STORED IN RELATIONAL DATABASES?

201123 – ACCOUNTING ANALYTICS


DATA DICTIONARIES DEFINE
WHAT DATA ARE ACCEPTABLE.

• FOR EACH ATTRIBUTE, WE LEARN:


WHAT TYPE OF KEY IT IS.
WHAT DATA ARE REQUIRED.
WHAT DATA CAN BE STORED IN IT.
HOW MUCH DATA IS STORED.

Exhibit 2-6 Supplier Data Dictionary

201123 – ACCOUNTING ANALYTICS


LENDING
CLUB DATA
DICTIONARY
FOR REJECTED
LOAN DATA

201123 – ACCOUNTING ANALYTICS


ABBREVIATED
DATA
DICTIONARY
FOR VENDOR
DATA EXTRACT

201123 – ACCOUNTING ANALYTICS


3. WHAT DOES IT MEAN TO EXTRACT, TRANSFORM, AND
LOAD
The ETL process begins with identifying which data you need and is complete when the clean
data are loaded in the appropriate format into the tool to be used for analysis.The Requesting
data is an iterative practice involving 5 steps:
• Extract
Step 1: Determine the purpose and scope of the data request.
Step 2: Obtain the data.
• Transform
Step 3: Validate the data for completeness and integrity.
Step 4: Clean the data.
• Load
Step 5: Load the data for data analysis.

201123 – ACCOUNTING ANALYTICS


EXTRACT
Step 1: Determine the purpose and scope of the data request.
Ask a few questions before beginning the process:
• What is the purpose of the data request?

• What do you need the data to solve?


• What business problem will it address?
• What risk exists in data integrity (for example, reliability, usefulness)?
• What is the mitigation plan?
• What other information will impact the nature, timing, and extent of the data analysis?

201123 – ACCOUNTING ANALYTICS


EXTRACT
Step 2: Obtain the Data – Methods
There are a couple options:
• Obtain data through a data request to the IT department.

• Obtain data yourself.

201123 – ACCOUNTING ANALYTICS


EXTRACT
Step 2: Obtain the Data – Methods
There are a couple options:
• Obtain data through a data request to the IT department.

201123 – ACCOUNTING ANALYTICS


EXHIBIT 2-7 Example Standard Data Request Form

EXAMPLE STANDARD DATA REQUEST FORM – HEADER

201123 – ACCOUNTING ANALYTICS


EXAMPLE STANDARD DATA REQUEST FORM – RESPONSE

EXHIBIT 2-7 Example Standard Data Request Form

201123 – ACCOUNTING ANALYTICS


OBTAIN THE DATA YOURSELF

• If you have direct access to a data warehouse, you can use SQL and other tools to pull the data
yourself.
• Identify the tables that contain the information you need. You can do this by looking through the data
dictionary or the relationship model.
• Identify which attributes, specifically, hold the information you need in each table.
• Identify how those tables are related to each other

201123 – ACCOUNTING ANALYTICS


•SELECT liệt kê các cột mà bạn muốn truy xuất:
CustomerName (từ bảng Customers), OrderDate (từ
bảng Orders), ProductName, Quantity, và UnitPrice (từ
bảng OrderDetails).
•FROM Customers C xác định bảng Customers với bí
danh C.
•INNER JOIN Orders O ON C.CustomerID =
O.CustomerID kết nối bảng Customers và bảng Orders
dựa trên cột CustomerID.
•INNER JOIN OrderDetails OD ON O.OrderID =
OD.OrderID kết nối bảng Orders và bảng OrderDetails
dựa trên cột OrderID.
•WHERE O.OrderDate BETWEEN '2023-01-01' AND
'2023-12-31' lọc các đơn hàng trong khoảng thời gian từ
ngày 1 tháng 1 năm 2023 đến ngày 31 tháng 12 năm 2023.
•ORDER BY O.OrderDate sắp xếp kết quả theo ngày đặt
hàng.

201123 – ACCOUNTING ANALYTICS


TRANSFORM
Step 3: Validating the data for completeness and integrity
• Chances are the data you request isn’t complete. Before you begin, do a little work to
make sure your data are valid:
Compare the number of records.
Compare descriptive statistics for numeric fields.

Validate Date/Time fields.


Compare string limits for text fields.

201123 – ACCOUNTING ANALYTICS


TRANSFORM
Step 4: Clean the data.
• Once you have valid data, there is still some work that needs to be done to make sure
it is consistent and ready for analysis:
Remove headings or subtotals.
Clean leading zeroes and nonprintable characters.

Format negative numbers.


Correct inconsistencies across data, in general.

201123 – ACCOUNTING ANALYTICS


KNOWLEDGE CHECK

201123 – ACCOUNTING ANALYTICS


Q: IN COLUMN 3,
WHICH OF THE FOLLOWING PROBLEMS DO YOU FIND?

 a. data consistency error


 b. data imputation error
 c. data contradiction error
 d. violated attribute dependencies

201123 – ACCOUNTING ANALYTICS


Q: IN COLUMN 2, ROW 7,
WHICH OF THE FOLLOWING PROBLEMS DO YOU FIND?

 a. data threshold violation


 b. data entry error
 c. violated attribute dependencies
 d. dichotomous variable problem

201123 – ACCOUNTING ANALYTICS


A NOTE ABOUT DATA QUALITY.

 Dates (e.g., 7/6/2023 or 6/7/2023 or 2023-07-06)


 Numbers (e.g., 1 or I, 7 or seven)
 International characters and encoding (e.g., * or “ or TAB)
 Languages and measures (e.g., Arkansas or AR, $ or €)
 Human error (e.g., 23 or 32) o The Microsoft Track includes Excel, Power Query,
Power BI, and Power Automate

201123 – ACCOUNTING ANALYTICS


FORMAT CELLS
WINDOW IN
EXCEL

201123 – ACCOUNTING ANALYTICS


LOAD

Step 5: Load the data for data analysis


• Finally, you can now import your data into the tool of your choice and expect the
functions to work properly.

201123 – ACCOUNTING ANALYTICS


POTENTIAL ETHICAL ISSUES SURROUND HOW DATA
ARE COLLECTED AND HOW THEY ARE SHARED.
1. How does the company use data, and to what extent are they integrated into firm strategy
2. Does the company send a privacy notice to individuals when their personal data are collected?
3. Does the company assess the risks linked to the specific type of data the company uses?
4. Does the company have safeguards in place to mitigate the risks of data misuse?
5. Does the company have the appropriate tools to manage the risks of data misuse?
6. Does our company conduct appropriate due diligence when sharing with or acquiring data
from third parties?

201123 – ACCOUNTING ANALYTICS


SUMMARY CHAPTER II
• The first step in the IMPACT cycle is to identify the questions that you intend to answer Once you have the data, they will need to be validated for
through your data analysis project. Once a data analysis problem or question has been completeness and integrity—that is, you will need to ensure that all
identified, the next step in the IMPACT cycle is mastering the data, which can be broken of the data you need were extracted, and that all data are correct.
down to mean obtaining the data needed and preparing it for analysis. Sometimes when data are extracted, some formatting or
sometimes even entire records will get lost, resulting in
• In order to obtain the right data, it is important to have a firm grasp of what data are
inaccuracies. Correcting the errors and cleaning the data is an
available to you and how that information is stored.
integral step in mastering the data.
• Data are often stored in a relational database, which helps to ensure that an
• Finally, after the data have been cleaned, there may be one last
organization’s data are complete and to avoid redundancy. Relational databases are made
step of mastering the data, which is to load them into the tool that
up of tables with uniquely identified records (this is done through primary keys)and are
will be used for analysis. Often, the cleaning and correcting of data
related through the usage of foreign keys.
occur in Excel and the analysis will also be done in Excel. In this
• To obtain the data, you will either have access to extract the data yourself or you will case, there is no need to load the data elsewhere. However, if you
need to request the data from a database administrator or the information systems team. intend to do more rigorous statistical analysis than Excel provides,
If the latter is the case, you will complete a data request form, indicating exactly which or if you intend to do more robust data visualization than can be
data you need and why. done in Excel, it may be necessary to load the data into another
tool following the transformation process.

201123 – ACCOUNTING ANALYTICS


Thank you

201123 – ACCOUNTING ANALYTICS

You might also like