CHAPTER 8
CAATTs for Data Extraction and Analysis
(Computer Aided Audit Tools and Techniques)
USES OF DATA EXTRACTION (SUBSTANTIVE TESTS)
- Determining the correct value of inventory
- Determining the accuracy of prepayments and accruals
- Confirming accounts receivable with customers
- Searching for unrecorded liabilities
DATA STRUCTURES 2 FUNDAMENTAL COMPONENTS
- Organization
o Physical layout/location of the records in a file
o Types:
Sequential (e.g., alphabetical, chronological, numbered)
Random
- Access method
o How you locate the record
o Types:
Direct
Sequential
FILE PROCESSING OPERATIONS (Individual records)
1. Retrieve a record by key
2. Insert a record
3. Update a record
4. Read a file
5. Find next record
6. Scan a file
7. Delete a record
DATA STRUCTURES (2)
- Flat File Structures
o Sequential
All records in contiguous storage spaces in specified sequence (key field)
Sequential files are simple & easy to process
Application reads from beginning in sequence
If only small portion of file being processed, inefficient method
Does not permit accessing a record directly
Efficient: 4, 5 – sometimes 3
Inefficient: 1, 2, 6, 7 – usually 3
- Indexed structure
o In addition to data file, separate index file
o Contains physical address in data file of each indexed record
o Indexed random file
Records are created without regard to physical proximity to other related records
Physical organization of index file itself may be sequential or random
Random indexes are easier to maintain, sequential more difficult
Advantage over sequential: rapid searches
Other advantages: processing individual records, efficient usage of disk storage
Efficient: 1, 2, 3, 7
Inefficient: 4
o Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM)
Large files, routine batch processing
Moderate degree of individual record processing
Used for files across cylinders
Uses number of indexes, with summarized content
Access time for single record is slower than Indexed Sequential or Indexed Random
Disadvantage: does not perform record insertions efficiently – requires physical relocation of all
records beyond that point – SOS
Has 3 physical components: indexes, prime data storage area, overflow area [Figure 8-4]
Might have to search index, prime data area, and overflow area – slowing down access time
Integrating overflow records into prime data area, then reconstructing indexes reorganizes ISAM
files
Very Efficient: 4, 5, 6
Moderately Efficient: 1, 3
Inefficient: 2, 7
- Hashing Structure
o Employs algorithm to convert primary key into physical record storage address [Figure 8-5]
No separate index necessary
Advantage: access speed
Disadvantages:
Inefficient use of storage
Different keys may create same address
Efficient: 1, 2, 3, 6
Inefficient: 4, 5, 7
- Pointer Structure
o Stores the address (pointer) of related record in a field with each data record [Figure 8-6]
Records stored randomly
Pointers provide connections b/w records
Pointers may also provide links of records b/w files [Figure 8-7]
Types of pointers [Figure 8-8]:
Physical address – actual disk storage location
o Advantage: Access speed
o Disadvantage: if related record moves, pointer must be changed & w/o logical
reference, a pointer could be lost causing referenced record to be lost
Relative address – relative position in the file (135th)
o Must be manipulated to convert to physical address
Logical address – primary key of related record
o Key value is converted by hashing to physical address
Efficient: 1, 2, 3, 6
Inefficient: 4, 5, 7
- Hierarchical and network structures
o Uses explicit linkages b/w records to establish relationship
o Figure 8-9 is M:N example (many to many)
- Relational structure
o Uses implicit linkages b/w records to establish relationship: foreign keys / primary keys
o User views
Data a particular user needs to achieve his/her assigned tasks
A single view, or view without user input, leads to problems in meeting the diverse needs of the
enterprise
Trend today: capture data in sufficient detail and diversity to sustain multiple user views
User views MUST be consolidated into a single “logical view” or schema
Data in the logical view MUST be normalized
o Creating views
Designing output reports, documents, and input screens needed by users or groups
Physical documents help designer understand relationships among the data
3 user views: Table 8-2, Figure 8-12, Table 8-3
Then apply normalization principles to the conceptual user views to design the database tables
o Importance of data normalization
Critical to success of DBMS
Effective design in grouping data
Several levels: 1NF, 2NF, 3NF, etc.
Un-normalized data suffers from:
Insertion anomalies
Deletion anomalies
Update anomalies
One or more of these anomalies will exist in tables < 3NF
o Normalization process
Un-normalized data [Table 8-4]
Eliminates the 3 anomalies if:
All non-key attributes are dependent on the primary key
There are no partial dependencies (on part of the primary key)
There are no transitive dependencies; non-key attributes are not dependent on other
non-key attributes
“Split” tables are linked via embedded “foreign keys”
Normalized database tables examples: Figures 8-13, 8-14
o Creating physical tables
Created on paper so far
Then create physical files and populate data
Physical views can be produced from DBMS
o Query function
Allows users to create customized lists from database
Users stipulate, using English-like commands, which tables, records, fields, filtering criteria
needed to produce the desired list
Result is virtual table derived from actual database tables
SQL
SELECT, FROM, WHERE [Figure 8-16]
De facto standard query language
o Auditors and data normalization
Database normalization is a technical matter that is usually the responsibility of systems
professionals.
The subject has implications for internal control that make it the concern of auditors also.
Most auditors will never be responsible for normalizing an organization’s databases; they should
have an understanding of the process and be able to determine whether a table is properly
normalized.
In order to extract data from tables to perform audit procedures, the auditor first needs to know
how the data are structured.
EMBEDDED AUDIT MODULE
- Identify important transactions live while they are being processed and extract them [Figure 8-18]
o Examples
Errors
Fraud
Compliance
SAS 78, SAS 94, SAS 99 / S-OX
- Disadvantages:
o Operational efficiency – can decrease performance, especially if testing is extensive
o Verifying EAM integrity – such as environments with a high level of program maintenance
o Status: increasing need, demand, and usage of COA/EAM/CA
GENERALIZED AUDIT SOFTWARE
- Brief history
- Most widely used CAATT [Figure 8-19]
- Usages include:
o Footing and balancing entire files or selected data items (e.g., extending inventory)
o Selecting and reporting detail data
o Selecting stratified statistical samples from data files
o Formatting results into audit reports (auto work papers!)
o Printing confirmations
o Screening / filtering data
o Comparing multiple files for differences
o Recalculating values in data
- Popular because:
o GAS software is easy to use and requires little computer background
o Many products are platform independent, works on mainframes and PCs
o Auditors can perform tests independently of IT staff
o GAS can be used to audit the data currently being stored in most file structures and formats
- Simple structures [Figure 8-19]
- Complex structures [Figures 8-20, 8-21]
- Auditing issues:
o Auditor must sometime rely on IT personnel to produce files/data
o Risk that data integrity is compromised by extraction procedures
o Auditors skilled in programming better prepared to avoid these pitfalls
AUDIT COMMAND LANGUAGE (ACL)
- ACL is a proprietary version of GAS
- Leader in the industry
- Designed as an auditor-friendly meta-language (i.e., contains commonly used auditor tests)
- Access to data generally easy with ODBC interface
- See ACL tutorial #1
- Input file definition
- Customizing a view [Figure 8-23]
- Filtering data [Figures 8-24 thru 8-27]
- Stratifying data [Figure 8-28]
- Statistical analysis