DAA - Chapter 02

The document discusses how data is organized and used in accounting information systems and relational databases. It covers how data flows through the accounting cycle from various internal and external sources, and is stored in relational database tables with attributes, primary keys, foreign keys and descriptive fields. The document also explains data dictionaries that define acceptable data elements, and the extract-transform-load (ETL) process for obtaining data from various sources and preparing it for use in databases and analytics.


6/03/2024

CHAPTER 02
Mastering Data

Prepared by Nguyen Huu Binh - SOA-COB-UEH - huubinh_ais@ueh.edu.vn

Objectives

• Understand how data are organized in an accounting information system.
• Understand how data are stored in a relational database.
• Explain and apply extraction, transformation, and loading (ETL) techniques.
• Describe the ethical considerations of data collection and data use.


Contents

• How are data used and stored in the accounting cycle?
• How are data stored in relational databases?
• Data dictionaries
• What does it mean to Extract, Transform, and Load?
• Ethical considerations of data collection and use


How are data used and stored in the accounting cycle?

Data can be found throughout various systems.

In most cases, you need to know which tables and attributes contain the relevant data.

Exhibit 2-2 Procure-to-Pay Database Schema (Simplified)


How are data used and stored in the accounting cycle?

Internal and External Data Sources


Data may come from a number of different sources, either internal or external to the
organization. Internal data sources include:
▪ Accounting information system
▪ Supply chain management system
▪ Customer relationship management system
▪ Human resource management system
Enterprise Resource Planning (ERP) (also known as Enterprise Systems) is a category of
business management software that integrates applications from throughout the
business (such as manufacturing, accounting, finance, human resources, etc.) into one
system.


How are data used and stored in the accounting cycle?

Accounting data and Accounting information systems

• A variety of applications support relational databases; these are
referred to as Relational Database Management Systems (RDBMS).
Examples include Microsoft Access, SQLite, and Microsoft SQL Server.
• Other relational database management systems include Teradata,
MySQL, Oracle RDBMS, IBM DB2, Amazon RDS, and PostgreSQL.


How are data stored in relational databases?

Relational databases ensure that data:


• Are complete or include all data.
• Aren’t redundant, so they don’t take up too much space.
• Follow business rules and internal controls.
• Aid communication and integration of business processes.


How are data stored in relational databases?

• Primary keys are unique identifiers.
• Foreign keys are attributes that point to a primary key in another table.
• Composite keys are a combination of two or more attributes to create a unique identifier.
• Descriptive attributes include everything else.

Purchase Order Table

PO_Number | Date | Created By | Approved By | Supplier ID | Employee ID | Cash Disbursement ID
1787 | 11/1/2020 | 1001 | 1010 | 1 | 52 | 2001
1788 | 11/1/2020 | 1005 | 1010 | 2 | 52 | 2003
1789 | 11/8/2020 | 1002 | 1010 | 1 | 52 | 2004
1790 | 11/15/2020 | 1005 | 1010 | 1 | 52 | 2004

Exhibit 2-4 Purchase Order Table
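
The four kinds of attributes can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions, not the textbook's schema: the table and column names (supplier, purchase_order, po_line_item, product_id, quantity) and the supplier names are hypothetical, and the dates from Exhibit 2-4 are rewritten in ISO form on the assumption that they are month/day/year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

# Hypothetical schema illustrating each kind of attribute.
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,     -- primary key: unique identifier
    supplier_name TEXT                     -- descriptive attribute
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,       -- primary key
    po_date     TEXT,                      -- descriptive attribute
    supplier_id INTEGER REFERENCES supplier(supplier_id)  -- foreign key
);
CREATE TABLE po_line_item (
    po_number  INTEGER REFERENCES purchase_order(po_number),
    product_id INTEGER,
    quantity   INTEGER,                    -- descriptive attribute
    PRIMARY KEY (po_number, product_id)    -- composite key
);
""")

conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                 [(1, "Hypothetical Supplier A"), (2, "Hypothetical Supplier B")])
conn.executemany("INSERT INTO purchase_order VALUES (?, ?, ?)",
                 [(1787, "2020-11-01", 1), (1788, "2020-11-01", 2)])
conn.commit()


def try_insert(sql, params):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A repeated primary key and a foreign key with no matching supplier are rejected.
duplicate_pk = try_insert("INSERT INTO purchase_order VALUES (?, ?, ?)", (1787, "2020-11-08", 1))
dangling_fk = try_insert("INSERT INTO purchase_order VALUES (?, ?, ?)", (1791, "2020-11-08", 99))
# The composite key allows one row per (po_number, product_id) pair.
first_line = try_insert("INSERT INTO po_line_item VALUES (?, ?, ?)", (1787, 5, 10))
repeat_line = try_insert("INSERT INTO po_line_item VALUES (?, ?, ?)", (1787, 5, 3))
print(duplicate_pk, dangling_fk, first_line, repeat_line)
```

Letting the database enforce the keys is one way a relational design supports completeness and internal controls: invalid rows are refused at entry rather than discovered during analysis.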

How are data stored in relational databases?

• Examples of two tables, attributes, and data. Notice the PK-FK relationship.

Exhibit 2-3 Line Items Table: Purchase Order Detail Table

Exhibit 2-4 Purchase Order Table



Data dictionaries define what data are acceptable.


• For each attribute, we learn:
▪ What type of key it is.
▪ What data are required.
▪ What data can be stored in it.
▪ How much data is stored.

Primary or Foreign Key? | Required | Attribute Name | Description | Data Type | Default Value | Field Size | Notes
PK | Y | Supplier ID | Unique Identifier for each Supplier | Number | n/a | 10 |
 | N | Supplier Name | First and Last Name | Short Text | n/a | 30 |
FK | N | Supplier Type | Type Code for Different Supplier Categories | Number | Null | 10 | 1: Vendor, 2: Misc

Exhibit 2-6 Supplier Data Dictionary
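
One way to see how a data dictionary defines acceptable data is to express it as a lookup structure and check records against it. The sketch below transcribes the three attributes of Exhibit 2-6. The Python field names, the `validate` helper, the sample records, and the reading of "field size" as a maximum number of characters are all assumptions made for illustration.

```python
# Data dictionary transcribed from Exhibit 2-6; the Python key names are assumptions.
DATA_DICTIONARY = {
    "supplier_id":   {"key": "PK", "required": True,  "type": int, "size": 10},
    "supplier_name": {"key": None, "required": False, "type": str, "size": 30},
    "supplier_type": {"key": "FK", "required": False, "type": int, "size": 10,
                      "allowed": {1, 2}},  # 1: Vendor, 2: Misc
}


def validate(record):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name, rule in DATA_DICTIONARY.items():
        value = record.get(name)
        if value is None:
            if rule["required"]:
                problems.append(f"{name}: required value is missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        # Assumption: field size is treated as a maximum character count.
        if len(str(value)) > rule["size"]:
            problems.append(f"{name}: longer than {rule['size']} characters")
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: {value!r} is not an allowed code")
    return problems


# Hypothetical records: one acceptable, one with three problems.
good = {"supplier_id": 1, "supplier_name": "Hypothetical Supplier A", "supplier_type": 1}
bad = {"supplier_name": "X" * 40, "supplier_type": 7}
print(validate(good))
print(validate(bad))
```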


Data dictionaries define what data are acceptable.

Abbreviated Data Dictionary for Vendor Data Extract

Romney et al., 2021



What does it mean to Extract, Transform, and Load?

The ETL process begins with identifying which data you need and is
complete when the clean data are loaded in the appropriate format
into the tool to be used for analysis. Requesting data is an iterative
practice involving five steps.


What does it mean to Extract, Transform, and Load?

Step 1: Determine the purpose and scope of the data request.
Step 2: Obtain the data.
Step 3: Validate the data for completeness and integrity.
Step 4: Clean the data.
Step 5: Load the data for data analysis.

Extract

Step 1: Determine the purpose and scope of the data request.


Ask a few questions before beginning the process:
• What is the purpose of the data request?
• What do you need the data to solve?
• What business problem will it address?
• What risk exists in data integrity?
• What is the mitigation plan?
• What other information will impact the nature, timing, and extent
of the data analysis?

Extract

Step 2: Obtain the Data – Methods


There are a couple of options:
• Obtain data through a data request to the IT department.
• Obtain data yourself.


Example Standard Data Request Form – Header


Section 1: Request Details
Requestor Name:
Requestor Contact Number:
Requestor Email Address:
Please provide a description of the information needed (indicate which tables
and which fields you require):
What will the information be used for?
Frequency (circle one): One-Off / Annually / Termly / Other: ___________
Format you wish the data to be delivered in (circle one): Spreadsheet / Word Document / Text File / Other: ____________
Request Date:
Required Date:
Intended Audience:
Customer (if not requestor):

EXHIBIT 2-7 Example Standard Data Request Form



Example Standard Data Request Form – Response

Section 2: To be Completed by Information Systems Department
Request Number:
Date Received:
Received by:
Assigned to:
Initial review comments (discussion with client: revisions required? agreement to proceed? etc.)
Feedback from client (if applicable)
Work in progress comments (additional notes and comments during production of data)

Section 3: Completion Details
Date Completed:
Date Provided:
Revisions Required:

EXHIBIT 2-7 Example Standard Data Request Form



Extract

Step 2: Obtain the Data – Methods


Obtain the data yourself

• If you have direct access to a data warehouse, you can use SQL and
other tools to pull the data yourself.
• Identify the tables that contain the information you need. You can do
this by looking through the data dictionary or the relationship model.
• Identify which attributes, specifically, hold the information you need in
each table.
• Identify how those tables are related to each other.
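
The three identification steps map directly onto the clauses of a SQL query. The sketch below runs against a small in-memory SQLite database built for the purpose; the table names, column names, and supplier names are hypothetical, and a real data warehouse would have its own schema and connection method.

```python
import sqlite3

# Build a small stand-in database (hypothetical names and sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,
    po_date     TEXT,
    supplier_id INTEGER REFERENCES supplier(supplier_id)
);
INSERT INTO supplier VALUES (1, 'Hypothetical Supplier A'), (2, 'Hypothetical Supplier B');
INSERT INTO purchase_order VALUES
    (1787, '2020-11-01', 1), (1788, '2020-11-01', 2),
    (1789, '2020-11-08', 1), (1790, '2020-11-15', 1);
""")

# 1. Tables that hold the information: purchase_order and supplier (FROM / JOIN).
# 2. Attributes needed from each table: po_number, po_date, supplier_name (SELECT).
# 3. How the tables relate: purchase_order.supplier_id (FK) points to
#    supplier.supplier_id (PK) (ON).
query = """
SELECT po.po_number, po.po_date, s.supplier_name
FROM purchase_order AS po
JOIN supplier AS s ON s.supplier_id = po.supplier_id
WHERE po.po_date >= ?
ORDER BY po.po_number
"""
rows = conn.execute(query, ("2020-11-08",)).fetchall()
for row in rows:
    print(row)
```

Passing the cutoff date as a parameter rather than pasting it into the query text keeps the scope of the request explicit and easy to change.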

Transform

Step 3: Validate the data for completeness and integrity.

• Chances are the data you receive are not complete. Before you
begin, do a little work to make sure your data are valid:
▪ Compare the number of records.
▪ Compare descriptive statistics for numeric fields.
▪ Validate Date/Time fields.
▪ Compare string limits for text fields.
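
The four checks above can be sketched as comparisons between the extract and figures reported by the source system. Everything in this example is hypothetical: the control totals, the extracted rows, the field layout, and the assumption that dates arrive as month/day/year text.

```python
from datetime import datetime

# Hypothetical control figures reported by the source system.
SOURCE_CONTROL = {"record_count": 4, "amount_total": 2000.00, "max_name_length": 30}

# Hypothetical extract: (po_number, po_date, supplier_name, amount)
extract = [
    (1787, "11/1/2020", "Hypothetical Supplier A", 500.00),
    (1788, "11/1/2020", "Hypothetical Supplier B", 250.00),
    (1789, "11/8/2020", "Hypothetical Supplier A", 1000.00),
]


def is_valid_date(text, fmt="%m/%d/%Y"):
    """True when the text parses as a real date in the expected format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


amounts = [row[3] for row in extract]
checks = {
    # Compare the number of records.
    "record count matches": len(extract) == SOURCE_CONTROL["record_count"],
    # Compare a descriptive statistic (here the total) for a numeric field.
    "amount total matches": abs(sum(amounts) - SOURCE_CONTROL["amount_total"]) < 0.005,
    # Validate date fields.
    "all dates valid": all(is_valid_date(row[1]) for row in extract),
    # Compare string lengths with the field size limit.
    "names within size": all(len(row[2]) <= SOURCE_CONTROL["max_name_length"]
                             for row in extract),
}
for name, passed in checks.items():
    print(f"{name}: {'ok' if passed else 'INVESTIGATE'}")
```

In this made-up extract one record is missing, so both the record count and the total disagree with the source while the date and length checks pass.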


Transform

Step 4: Clean the data.


• Once you have valid data, there is still some work that needs to
be done to make sure it is consistent and ready for analysis:
▪ Remove headings or subtotals.
▪ Clean leading zeroes and nonprintable characters.
▪ Format negative numbers.
▪ Correct inconsistencies across data, in general.
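
A minimal sketch of these cleaning tasks follows. The raw rows, the labels treated as headings or subtotals, and the decision to strip leading zeroes from identifiers are assumptions for illustration; in some data sets leading zeroes are significant and should be kept.

```python
import re

# Hypothetical raw export rows: (label, amount text)
raw_rows = [
    ("Supplier", "Amount"),        # repeated heading
    ("00042", "1,250.00"),
    ("00043\u00a0", "(300.50)"),   # non-breaking space, accounting-style negative
    ("Subtotal", "949.50"),        # subtotal row
    ("00044\t", "75.25-"),         # tab character, trailing minus sign
]

SKIP_LABELS = {"supplier", "subtotal", "total"}  # assumed heading/subtotal labels


def clean_id(text):
    """Drop nonprintable and whitespace characters, then leading zeroes."""
    kept = "".join(ch for ch in text if ch.isprintable() and not ch.isspace())
    return kept.lstrip("0") or "0"


def clean_amount(text):
    """Convert '1,250.00', '(300.50)', '75.25-' or '-5' to a float."""
    text = text.strip().replace(",", "")
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )
    number = float(re.sub(r"[()\-]", "", text))
    return -number if negative else number


cleaned = [
    (clean_id(label), clean_amount(amount))
    for label, amount in raw_rows
    if label.strip().lower() not in SKIP_LABELS
]
print(cleaned)
```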


Knowledge check


In column 3, which of the following problems do you find?

a. data consistency error


b. data imputation error
c. data contradiction error
d. violated attribute dependencies


In column 5, which of the following problems do you find?

a. data pivoting error


b. violated attribute dependencies
c. data consistency error
d. cryptic values


In row 8 and row 9, which of the following problems do you find?

a. data contradiction error


b. data concatenation error
c. data aggregation error
d. duplicate values


In column 2, row 7, which of the following problems do you find?

a. data threshold violation


b. data entry error
c. violated attribute dependencies
d. dichotomous variable problem


A note about data quality.


• Dates (e.g., 7/6/2023 or 6/7/2023 or 2023-07-06)
• Numbers (e.g., 1 or I, 7 or seven)
• International characters and encoding (e.g., * or “ or TAB)
• Languages and measures (e.g., Arkansas or AR, $ or €)
• Human error (e.g., 23 or 32)
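
The date example is worth a closer look, because the same text can be read as two different days. The sketch below uses a hypothetical `parse_date` helper in which the caller states the agreed convention explicitly; the ISO form 2023-07-06 is unambiguous either way.

```python
from datetime import datetime


def parse_date(text, day_first):
    """Parse an ISO or slash-separated date; day_first states the agreed convention."""
    formats = ["%Y-%m-%d", "%d/%m/%Y" if day_first else "%m/%d/%Y"]
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


us = parse_date("7/6/2023", day_first=False)    # read as month/day/year
intl = parse_date("7/6/2023", day_first=True)   # read as day/month/year
iso = parse_date("2023-07-06", day_first=False)
print(us.isoformat(), intl.isoformat(), iso.isoformat())
```

The same string yields July 6 under one convention and June 7 under the other, so the convention has to be confirmed with the data owner rather than inferred from the values.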


Format Cells Window in Excel


Load

Step 5: Load the data for data analysis


• Finally, you can now import your data into the tool of your
choice and expect the functions to work properly.
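
As one possible illustration of the load step, the sketch below reads a cleaned CSV extract and loads it into a SQLite table, then confirms that the loaded row count and total agree with the file. The column names and amounts are hypothetical, and the in-memory text stands in for a real file; the analysis tool could just as well be Excel or a visualization package.

```python
import csv
import io
import sqlite3

# Hypothetical cleaned extract; io.StringIO stands in for a CSV file on disk.
cleaned_csv = io.StringIO(
    "po_number,po_date,supplier_id,amount\n"
    "1787,2020-11-01,1,500.00\n"
    "1788,2020-11-01,2,250.00\n"
    "1789,2020-11-08,1,1000.00\n"
    "1790,2020-11-15,1,125.00\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_order ("
    "po_number INTEGER PRIMARY KEY, po_date TEXT, supplier_id INTEGER, amount REAL)"
)

# Convert each text field to the type the target table expects.
rows = [
    (int(r["po_number"]), r["po_date"], int(r["supplier_id"]), float(r["amount"]))
    for r in csv.DictReader(cleaned_csv)
]
with conn:  # commit the whole load as a single transaction
    conn.executemany("INSERT INTO purchase_order VALUES (?, ?, ?, ?)", rows)

# Confirm that what was loaded matches what was read.
loaded_count, loaded_total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM purchase_order"
).fetchone()
print(loaded_count, loaded_total)
```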


Potential ethical issues surround how data are collected and how they are shared.
1. How does the company use data, and to what extent are they integrated
into firm strategy?
2. Does the company send a privacy notice to individuals when their
personal data are collected?
3. Does the company assess the risks linked to the specific type of data the
company uses?
4. Does the company have safeguards in place to mitigate the risks of data
misuse?
5. Does the company have the appropriate tools to manage the risks of
data misuse?
6. Does the company conduct appropriate due diligence when sharing data with
or acquiring data from third parties?

Chapter 2 Summary
• The first step in the IMPACT cycle is to identify the questions that you intend to answer through your data analysis project. Once a data analysis problem or question has been identified, the next step in the IMPACT cycle is mastering the data, which can be broken down to mean obtaining the data needed and preparing it for analysis.

• In order to obtain the right data, it is important to have a firm grasp of what data are available to you and how that information is stored.
  • Data are often stored in a relational database, which helps to ensure that an organization’s data are complete and to avoid redundancy. Relational databases are made up of tables with uniquely identified records (this is done through primary keys) and are related through the usage of foreign keys.

• To obtain the data, you will either have access to extract the data yourself or you will need to request the data from a database administrator or the information systems team. If the latter is the case, you will complete a data request form, indicating exactly which data you need and why.

• Once you have the data, they will need to be validated for completeness and integrity; that is, you will need to ensure that all of the data you need were extracted, and that all data are correct. Sometimes when data are extracted, some formatting or sometimes even entire records will get lost, resulting in inaccuracies. Correcting the errors and cleaning the data is an integral step in mastering the data.

• Finally, after the data have been cleaned, there may be one last step of mastering the data, which is to load them into the tool that will be used for analysis. Often, the cleaning and correcting of data occur in Excel and the analysis will also be done in Excel. In this case, there is no need to load the data elsewhere. However, if you intend to do more rigorous statistical analysis than Excel provides, or if you intend to do more robust data visualization than can be done in Excel, it may be necessary to load the data into another tool following the transformation process.
