
Chapter 7 Understanding Data

Chapter 7 focuses on understanding data, emphasizing its importance in decision-making across various fields. It covers topics such as types of data, data collection, storage, processing, and statistical techniques for analyzing data. The chapter also includes real-life examples and exercises to reinforce learning.

STUDENT NOTES || CHAPTER 7 UNDERSTANDING DATA April 23, 2025

Contents

LICENSE

CHAPTER 7: UNDERSTANDING DATA
    CHAPTER NOTES
    7.1 Introduction to Data
        7.1.1 Importance of Data
        7.1.2 Types of Data
    7.2 Data Collection
    7.3 Data Storage
    7.4 Data Processing
        Figure 7.2: Real-Life Data Processing Examples
    7.5 Statistical Techniques for Data Processing
    Glossary of Key Terms

MULTIPLE CHOICE QUESTIONS (MCQs)

FILL IN THE BLANKS

2 MARKS QUESTIONS

3 MARKS QUESTIONS

5 MARKS QUESTIONS

CHAPTER END EXERCISES

L R MOHAMMED MATHEEN M.C.A., M.A., B.ED., UGC NET.,


LECTURER, PRIMUS PU COLLEGE, BANGALORE - 560 035

LICENSE

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

“Karnataka Second PUC Computer Science Study Material / Student Notes” by L R Mohammed Matheen is licensed under CC BY-NC-ND 4.0.

Figure 1: Licence

Portions of this work may include material under separate copyright. These materials are not covered by this Creative Commons license and are used by permission or under applicable copyright exceptions.


CHAPTER 7: UNDERSTANDING DATA

CHAPTER NOTES

7.1 Introduction to Data

Data is essential for human decision-making in nearly every field of life. Whether we are selecting a
college, making business plans, monitoring health, or predicting weather, we rely on data.

Data are raw facts and figures which, when processed, yield information that helps in making decisions.

For instance:
• Colleges maintain placement data of students to attract new students.
• Governments conduct a census to gather demographic data.
• Banks maintain account-related data for every customer.
• Sports teams analyze opponent data to form strategies.

Definition: Data is a collection of characters, numbers, or symbols that represent values of some
variables.

Examples of Common Data Types:

• Personal information: Name, Age, Gender, Contact
• Transactional data: Online or offline purchases
• Multimedia: Images, videos, graphics, audio
• Documents and web content
• Social media posts
• Sensor signals (like IoT data)
• Satellite data (weather, earth observation, etc.)

7.1.1 Importance of Data


Humans and computers both rely heavily on data. Computers can process large volumes of data quickly and reveal patterns or traits not easily visible.

Real-life Examples:

• ATM transactions: The system deducts the withdrawn amount from your bank account.
• Weather monitoring: Meteorological departments analyze satellite data for warnings.
• Businesses: Analyze market trends, customer feedback, and dynamic pricing.
• Transportation: Cab apps adjust pricing based on demand and supply.
• Restaurants: Offer discounts during “happy hours” by analyzing customer visits.


Other Scenarios:

• EVMs: Record and count votes digitally.


• Scientific research: Involves collecting experimental data.
• Pharmaceuticals: Analyze medicine effectiveness through trials.
• Libraries: Store book and membership data.
• Search Engines: Analyze web content to show relevant results.
• Weather Systems: Use satellite data to generate alerts.

7.1.2 Types of Data

(A) Structured Data: Organized in a tabular format (rows and columns), structured data can be easily processed using spreadsheets or databases. Each row is a record, and each column is an attribute.

Table 7.1: Structured Inventory Data

ModelNo  ProductName      UnitPrice  Discount(%)  Items_in_Inventory
ABC1     Water Bottle     126        8            13
ABC2     Melamine Plates  320        5            45
ABC3     Dinner Set       4200       10           8
GH67     Jug              80         0            10

With structured data, you can calculate:
• Total items in inventory
• Total inventory value
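As a rough illustration, assuming the four rows of Table 7.1 are typed into a Python list (the variable name `inventory` is hypothetical), both totals can be computed column by column. This sketch values stock at the listed unit price and ignores the Discount(%) column; that choice is an assumption, since the notes do not say which price to use.

```python
# Rows of Table 7.1: (ModelNo, ProductName, UnitPrice, Discount%, Items_in_Inventory)
inventory = [
    ("ABC1", "Water Bottle", 126, 8, 13),
    ("ABC2", "Melamine Plates", 320, 5, 45),
    ("ABC3", "Dinner Set", 4200, 10, 8),
    ("GH67", "Jug", 80, 0, 10),
]

# Total items: add up the Items_in_Inventory column (index 4)
total_items = sum(row[4] for row in inventory)

# Total inventory value: UnitPrice x Items_in_Inventory for each row, then summed.
# Assumption: stock is valued at list price, so the Discount(%) column is ignored.
total_value = sum(row[2] * row[4] for row in inventory)

print("Total items in inventory:", total_items)   # Total items in inventory: 76
print("Total inventory value:", total_value)      # Total inventory value: 50438
```

Because every row has the same columns in the same order, the same one-line `sum` works no matter how many products are added.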

Table 7.2: More Structured Examples
• Books: Title, Author, Price, Year
• Fee Payments: Student Name, Roll No., Amount
• ATM Withdrawals: Account No., Amount, ATM ID, Date & Time

(B) Unstructured Data: Lacks a fixed format or structure. Examples:
• News articles
• Emails
• Social media posts
• Documents with images, audio, video

Example: A newspaper page can have a mix of text, ads, and images with no fixed layout.

Metadata: Data about data


E.g., For a photo → size, type (JPEG/PNG), resolution. For an email → subject, sender, attachments.
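Files stored on a computer carry metadata too. A minimal Python sketch, assuming a throwaway text file created just for this demo, reads a few metadata fields (name, type from the extension, size in bytes, last-modified time) with the standard `os` module without looking at the file's content. Image-specific fields such as resolution would need an image library and are not shown here.

```python
import os
import tempfile
from datetime import datetime

# Create a small sample file so the example runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Hello, data!")
    path = f.name

info = os.stat(path)  # file-system metadata, not the file's content

metadata = {
    "name": os.path.basename(path),
    "type": os.path.splitext(path)[1],   # file extension, e.g. ".txt"
    "size_bytes": info.st_size,          # 12 characters written above
    "last_modified": datetime.fromtimestamp(info.st_mtime),
}

for key, value in metadata.items():
    print(key, ":", value)

os.remove(path)  # clean up the sample file
```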

7.2 Data Collection

Before data processing, you must collect or identify relevant data.


3 Scenarios:

1. Manual Records (e.g., diaries): Must be converted to digital format.


2. Digital Files: Like CSV files, can be directly processed.
3. No existing data: Need to create software (e.g., using Python) to store and retrieve new data.
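For scenario 2, a CSV file can be read directly with Python's built-in `csv` module. The sketch below assumes a hypothetical sales file with columns `Date`, `Item`, `Quantity` and `Amount`; to keep it self-contained, the file contents are held in a string and wrapped in `io.StringIO`, which behaves like an open file.

```python
import csv
import io

# Hypothetical contents of a shop's sales file (for example, sales.csv)
csv_text = """Date,Item,Quantity,Amount
2025-04-01,Pen,10,100
2025-04-01,Notebook,5,250
2025-04-02,Pen,4,40
"""

# DictReader uses the first line as column names; each row becomes a dict
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are read as text, so convert them before doing arithmetic
total_sales = sum(int(row["Amount"]) for row in rows)

print("Records read:", len(rows))           # Records read: 3
print("Total sales amount:", total_sales)   # Total sales amount: 390
```

With a real file on disk, `open("sales.csv", newline="")` would take the place of the `io.StringIO(...)` call; the file name here is only an example.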

Our daily digital activities constantly generate data (e.g., browsing, transactions, healthcare records).

Example Applications:

• Hospitals: Use patient data for service improvements.
• Shopping malls: Analyze co-purchased items to optimize product placement.
• Social Media: Public sentiment analysis before elections.
• Global Institutions: Use economic data for forecasting.

7.3 Data Storage

After collection, data needs to be stored safely for current and future use. The volume of data is increasing rapidly, making storage challenging.

Common Storage Devices:

• Hard Disk Drive (HDD)
• Solid State Drive (SSD)
• CD/DVD
• Tape Drives
• Pen Drives
• Memory Cards

Data such as documents, images, audio, video, etc., are stored as files. For complex requirements, we use a Database Management System (DBMS) instead of simple file storage.
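A DBMS lets a program store records once and later retrieve just the ones it needs with a query. As a small sketch, Python's built-in `sqlite3` module stands in for a DBMS here; the table reuses the rows of Table 7.1, and the table and column names are illustrative assumptions. An in-memory database (`":memory:"`) is used so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database held in RAM
cur = conn.cursor()

# Define the structure (schema) once: every record has the same columns
cur.execute(
    """CREATE TABLE inventory (
           ModelNo     TEXT PRIMARY KEY,
           ProductName TEXT,
           UnitPrice   INTEGER,
           Discount    INTEGER,
           Items       INTEGER
       )"""
)
cur.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
    [
        ("ABC1", "Water Bottle", 126, 8, 13),
        ("ABC2", "Melamine Plates", 320, 5, 45),
        ("ABC3", "Dinner Set", 4200, 10, 8),
        ("GH67", "Jug", 80, 0, 10),
    ],
)
conn.commit()

# Retrieve only the products with fewer than 12 items in stock
low_stock = cur.execute(
    "SELECT ProductName, Items FROM inventory WHERE Items < 12 ORDER BY Items"
).fetchall()
print(low_stock)  # [('Dinner Set', 8), ('Jug', 10)]
conn.close()
```

With plain files, the program would have to read and filter every line itself; the query language is what makes retrieval easy in a DBMS.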



7.4 Data Processing



Just storing data is not enough. Data needs to be processed to extract information; raw data alone cannot be used for analysis or decision-making.

The Data Processing Cycle diagram from the textbook is redrawn below, followed by an explanation of each step:


Figure 7.1: Data Processing Cycle

+------------------------+
|    Data Collection     |
+-----------+------------+
            |
            v
+------------------------+
|    Data Preparation    |
| (cleaning, formatting) |
+-----------+------------+
            |
            v
+------------------------+
|       Data Entry       |
|  (input into system)   |
+-----------+------------+
            |
            v
+------------------------+
|      Data Storage      |
|   (files/databases)    |
+-----------+------------+
            |
            v
+------------------------+
|     Data Retrieval     |
|   (fetching for use)   |
+-----------+------------+
            |
            v
+------------------------+
|    Data Processing     |
|   (classify, update,   |
|   analyze, compute)    |
+-----------+------------+
            |
            v
+------------------------+
|      Data Output       |
|   (Reports, results,   |
|    visualizations)     |
+------------------------+

Explanation of Each Step:

1. Data Collection
Gathering raw data from various sources like forms, sensors, logs, or digital inputs.
2. Data Preparation
Cleaning data (removing errors, duplicates), converting formats, and making it consistent.
3. Data Entry
Inputting the cleaned data into digital systems (manually or via automated processes).
4. Data Storage
Saving data in files, spreadsheets, or databases for persistent use.
5. Data Retrieval
Accessing stored data when needed for processing.
6. Data Processing
Analyzing, classifying, computing, and transforming data to generate meaningful results.
7. Data Output
Presenting the results in the form of charts, tables, summaries, or reports.
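The seven steps can be traced in a toy Python program. This is only a sketch: the student records, the function names and the dictionary used as "storage" are hypothetical stand-ins for real forms, files or databases, and data entry and storage are combined into one function for brevity.

```python
# 1. Data collection: hypothetical raw records (roll number, marks as typed)
raw_records = [
    ("S1", " 78"),
    ("S2", "91"),
    ("S2", "91"),    # duplicate entry
    ("S3", ""),      # missing value
    ("S4", "abc"),   # invalid value
    ("S5", "65 "),
]

def prepare(records):
    """2. Data preparation: trim spaces, drop duplicates, blanks and non-numbers."""
    seen = set()
    cleaned = []
    for roll, marks in records:
        marks = marks.strip()
        if roll in seen or not marks.isdigit():
            continue
        seen.add(roll)
        cleaned.append((roll, int(marks)))
    return cleaned

storage = {}  # stands in for a file or database

def enter_and_store(key, records):
    """3-4. Data entry and storage: save the cleaned records under a key."""
    storage[key] = list(records)

def retrieve(key):
    """5. Data retrieval: fetch stored records when they are needed."""
    return storage[key]

def process(records):
    """6. Data processing: compute simple results from the marks."""
    marks = [m for _, m in records]
    return {
        "count": len(marks),
        "highest": max(marks),
        "average": sum(marks) / len(marks),
    }

def output(summary):
    """7. Data output: present the results as a small report."""
    for name, value in summary.items():
        print(f"{name}: {value}")

enter_and_store("class_test", prepare(raw_records))
summary = process(retrieve("class_test"))
output(summary)  # count: 3, highest: 91, average: 78.0 (one per line)
```

Only three of the six raw records survive preparation, which shows why cleaning comes before any calculation.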

Figure 7.2: Real-Life Data Processing Examples

Problem Statement      Inputs                          Processing                          Output
Exam registration      Student details, payment info   Validate data, assign roll no.      Admit card
ATM withdrawal         Account info, PIN               Verify & deduct amount              Currency, receipt
Train ticket booking   Journey info, payment           Check availability, assign berth    E-ticket
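The ATM row of the table can be sketched the same way: inputs go in, processing verifies and updates the stored data, and an output is produced. The account number, PIN and balance below are invented for illustration, and the returned message is a stand-in for dispensing cash and printing a receipt; a real bank system would read accounts from a secured database rather than a Python dictionary.

```python
# Hypothetical account record; a real bank would keep this in a secured database
accounts = {"1234567890": {"pin": "4321", "balance": 5000}}

def withdraw(account_no, pin, amount):
    """Inputs: account number, PIN, amount. Processing: verify, then deduct.
    Output: a message standing in for the cash and receipt."""
    account = accounts.get(account_no)
    # Verify: the account must exist and the PIN must match
    if account is None or account["pin"] != pin:
        return "Transaction declined: invalid account or PIN"
    if amount > account["balance"]:
        return "Transaction declined: insufficient balance"
    # Deduct: update the stored balance
    account["balance"] -= amount
    return f"Dispensed {amount}. Remaining balance: {account['balance']}"

print(withdraw("1234567890", "4321", 2000))  # Dispensed 2000. Remaining balance: 3000
print(withdraw("1234567890", "0000", 500))   # Transaction declined: invalid account or PIN
```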

7.5 Statistical Techniques for Data Processing

7.5.1 Measures of Central Tendency


(A) Mean (Average)

\[ \text{Mean} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

• Gives the average value.
• Not reliable if the data has outliers (extreme values).

Example: [90, 102, 110, 115, 85, 90, 100, 110, 110]
→ Mean = 101.33
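The mean of the height data can be checked in Python, either directly from the formula or with `statistics.mean` from the standard library. Adding one hypothetical extreme value (300) shows how a single outlier pulls the mean upward.

```python
import statistics

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]

# Mean from the formula: sum of the values divided by how many there are
mean_formula = sum(heights) / len(heights)
print(round(mean_formula, 2))                  # 101.33
print(round(statistics.mean(heights), 2))      # 101.33

# One made-up outlier shifts the mean noticeably
with_outlier = heights + [300]
print(round(statistics.mean(with_outlier), 2))  # 121.2
```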

(B) Median

• Middle value in sorted data.
• If odd count → middle item
• If even count → average of two middle items

Sorted Example: [85, 90, 90, 100, 102, 110, 110, 110, 115]
→ Median = 102
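A hand-written median function makes the odd/even rule explicit. It is a learning sketch; `statistics.median` in Python's standard library gives the same results. With a made-up outlier (300) added, the median moves only from 102 to 106, whereas the mean of the same data would jump from 101.33 to 121.2.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                          # odd count: single middle item
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # even count: average of two middle items

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
print(median(heights))               # 102
print(median(heights + [300]))       # 106.0
print(statistics.median(heights))    # 102
```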

(C) Mode

• Most frequent value.
• Can be used for both numeric and non-numeric data.
• May have one, multiple, or no mode.

Example: Mode = 110 (occurs 3 times)
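Python's `statistics` module provides `mode`, which returns the most common value, and `multimode` (Python 3.8 or later), which returns every value tied for the highest count. The list of car colours is a made-up example of non-numeric data with two modes.

```python
import statistics

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
car_colours = ["red", "white", "red", "black", "white"]

print(statistics.mode(heights))            # 110 (occurs 3 times)
print(statistics.multimode(car_colours))   # ['red', 'white'] (2 times each)
print(statistics.multimode([1, 2, 3]))     # [1, 2, 3]: every value ties, so no single mode
```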



7.5.2 Measures of Variability



(A) Range

\[ \text{Range} = \text{Max} - \text{Min} \]

• Shows spread of data.
• Sensitive to outliers.

Example: Heights → Max = 115, Min = 85 → Range = 30 cm
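Range needs only the largest and smallest values, so it takes one line of Python. The second call adds a hypothetical outlier (300) to show why range is sensitive to outliers: a single extreme value changes it from 30 to 215. The function is named `data_range` so that it does not hide Python's built-in `range`.

```python
heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]

def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

print(data_range(heights))            # 30
print(data_range(heights + [300]))    # 215
```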



(B) Standard Deviation (σ): Measures how values differ from the mean.

\[ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \]

Table 7.3: SD Calculation for Heights


• Mean (x̄) = 101.33
• Sum of squared deviations, Σ(xᵢ − x̄)² = 938
• σ = √(938/9) ≈ 10.21

Lower SD → tightly clustered data
Higher SD → widely spread data
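The calculation can be reproduced step by step in Python, following the formula for σ. Note that `statistics.pstdev` divides by n, as the formula in these notes does, whereas `statistics.stdev` divides by n − 1 (the sample standard deviation) and would give a slightly larger value.

```python
import math
import statistics

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
n = len(heights)
mean_value = sum(heights) / n

# Square each value's deviation from the mean, then add them up
squared_deviations = [(x - mean_value) ** 2 for x in heights]
sum_sq = sum(squared_deviations)

# Population standard deviation: divide by n, then take the square root
sigma = math.sqrt(sum_sq / n)

print(round(sum_sq, 2))                       # 938.0
print(round(sigma, 2))                        # 10.21
print(round(statistics.pstdev(heights), 2))   # 10.21
```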

Important Statistical Technique Selection

Task                           Best Technique
Disparity in salaries          Standard Deviation
Average class performance      Mean
Compare height in 2 cities     Mean or SD
Popular car color              Mode
Find dominant data value       Mode
Compare incomes                Mean + SD

Glossary of Key Terms

Term                     Explanation
Data                     Raw, unorganised facts and figures
Information              Processed data that is meaningful
Structured Data          Tabular data with rows and columns
Unstructured Data        Data without a predefined format
Metadata                 Data about other data
Data Collection          Gathering data from various sources
Data Storage             Saving data on physical/digital devices
Data Processing          Converting raw data into useful output
Mean                     Average of numerical data
Median                   Middle value in sorted data
Mode                     Most frequently occurring value
Range                    Difference between max and min values
Standard Deviation (σ)   Measures the spread of values around the mean
Outlier                  Data point that is far from other values
DBMS                     Software to manage large datasets efficiently

MULTIPLE CHOICE QUESTIONS (MCQs)

1. What is the primary purpose of collecting data?
A. To decorate webpages
B. To store data indefinitely
C. To make decisions based on analysis
D. To reduce computer memory usage
Answer: C. To make decisions based on analysis

2. Which of the following is an example of structured data?
A. A newspaper article
B. An image on a website
C. A table showing inventory items in a shop
D. A video file
Answer: C. A table showing inventory items in a shop

3. What does metadata represent?
A. Multimedia content
B. Data about data
C. Only numerical data
D. None of the above
Answer: B. Data about data

4. Which of the following is an outlier?
A. A value that is identical to the mean
B. A value that occurs most frequently
C. An extremely high or low value compared to others
D. A value with zero frequency
Answer: C. An extremely high or low value compared to others


5. What statistical technique is most affected by outliers?
A. Median
B. Mode
C. Mean
D. Standard Deviation
Answer: C. Mean

6. In structured data, what is an attribute?
A. The format of an image
B. A column representing a characteristic
C. A row of values
D. A description of metadata
Answer: B. A column representing a characteristic

7. Which of the following is not a measure of central tendency?
A. Mean
B. Median
C. Mode
D. Range
Answer: D. Range

8. What is the correct order of the data processing cycle?
A. Input → Output → Processing
B. Collection → Preparation → Entry → Storage → Processing → Output
C. Output → Processing → Storage
D. Entry → Output → Collection
Answer: B. Collection → Preparation → Entry → Storage → Processing → Output

9. Which of the following is used to summarize data for easy understanding?
A. Audio processing
B. Metadata generation
C. Statistical techniques
D. Data hiding
Answer: C. Statistical techniques

10. What is the standard deviation used for?
A. To find the middle value
B. To count data entries
C. To measure the spread of data
D. To identify structured data
Answer: C. To measure the spread of data


11. What is the mode of the dataset [34, 34, 27, 28, 27, 34, 34]?
A. 27
B. 34
C. 28
D. 33
Answer: B. 34

12. Which of the following storage devices is volatile?
A. Pen Drive
B. CD/DVD
C. HDD
D. None of the above
Answer: D. None of the above

13. Unstructured data can include all except:
A. Audio files
B. Web pages with multimedia
C. Tables with rows and columns
D. Social media messages
Answer: C. Tables with rows and columns

14. Which of the following tasks represents data collection?
A. Calculating standard deviation
B. Retrieving data from a file
C. Filling student details in an online form
D. Printing a report card
Answer: C. Filling student details in an online form

15. Which tool is suggested for data processing in future chapters?
A. Java
B. MySQL
C. Python
D. HTML
Answer: C. Python

17. Which statistical technique is suitable for finding income disparity?
A. Mode
B. Median
C. Mean
D. Standard Deviation
Answer: D. Standard Deviation


18. What does the median represent in a dataset?
A. The average of all values
B. The value that occurs most frequently
C. The value that appears at the center after sorting
D. The maximum value in the list
Answer: C. The value that appears at the center after sorting

19. Which of the following best describes unstructured data?
A. Can be easily stored in tables
B. Has a clear format with fixed fields
C. Lacks predefined structure
D. Cannot be stored digitally
Answer: C. Lacks predefined structure

20. What is the formula for standard deviation (σ) as per the chapter?
A. σ = Mean of values
B. σ = (Maximum − Minimum)
C. σ = √(Σ(xᵢ − x̄)² / n)
D. σ = Σ(xᵢ + x̄) / n
Answer: C. σ = √(Σ(xᵢ − x̄)² / n)

The following 10 Assertion and Reason-type MCQs are based on Chapter 7: Understanding Data from the Class XII Computer Science textbook (2021–22). Each question gives an assertion (A) and a reason (R), four options (A to D), and the correct answer.

21. Assertion (A): Raw data by itself is usually not meaningful for decision-making.
Reason (R): Raw data can be interpreted directly without the need for processing.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: C. A is true, but R is false.

22. Assertion (A): Structured data is easier to manage and analyze using spreadsheet or database software.
Reason (R): Structured data is organized in a well-defined format, typically using rows and columns where each column represents a data attribute.
Options:


A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.

23. Assertion (A): Metadata helps in identifying and organizing unstructured data.
Reason (R): Metadata provides information about data, such as file type, size, resolution, and content description.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.

24. Assertion (A): Median is a preferred measure of central tendency when data contains outliers.
Reason (R): Median is calculated by averaging all the data values in the dataset.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: C. A is true, but R is false.

25. Assertion (A): Standard deviation gives a more accurate measure of variability than range.
Reason (R): Standard deviation considers the deviation of each data point from the mean, while range considers only the highest and lowest values.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.

26. Assertion (A): Dynamic pricing models in cab and airline services rely on real-time data analysis.


Reason (R): These pricing models are based on historical sales reports compiled over a period of several months.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: C. A is true, but R is false.

27. Assertion (A): Unstructured data cannot be processed efficiently using traditional database tools.
Reason (R): Traditional database tools are designed to work with structured data that has a fixed schema, like tables with defined columns.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.

28. Assertion (A): Range and standard deviation are both used to measure how spread out data values are from each other.
Reason (R): Range and standard deviation are used to find the most frequently occurring value in a dataset.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: C. A is true, but R is false.

29. Assertion (A): Data entry is an essential step in the data processing cycle.
Reason (R): Data entry is the same as the process of collecting real-world data from various sources.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.


Answer: C. A is true, but R is false.

30. Assertion (A): Mode can be calculated for both numeric and non-numeric data values.
Reason (R): Mode identifies the value(s) that occur most frequently in the dataset.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.

FILL IN THE BLANKS

1. Data which is organised and can be recorded in a well-defined format is called ________.
Answer: Structured Data

2. Data which do not follow any fixed structure or format are called ________.
Answer: Unstructured Data

3. The singular form of the word ‘data’ is ________.
Answer: Datum

4. The process of collecting, storing, and analysing data for decision making is known as ________.
Answer: Data Processing

5. ________ is a measure of central tendency that represents the average of a set of values.
Answer: Mean

6. The ________ is the middle value in a sorted list of data values.
Answer: Median

7. The value that occurs most frequently in a data set is called the ________.
Answer: Mode

8. The difference between the maximum and minimum values in a data set is called the ________.
Answer: Range

9. The standard deviation is represented by the Greek letter ________.
Answer: Sigma (σ)

10. The data describing other data is referred to as ________.
Answer: Metadata


11. Examples of digital storage devices include HDD, SSD, CD/DVD, Pen Drive, and ________.
Answer: Memory Card

12. Statistical techniques used to summarise data include mean, median, mode, range, and ________.
Answer: Standard Deviation

13. The process of obtaining data from reliable sources before processing is called ________.
Answer: Data Collection

14. ICT revolution has led to the generation of ________ volume of data at a very fast pace.
Answer: Large

15. The structured data is generally stored in a ________ format in computers.
Answer: Tabular

2 MARKS QUESTIONS

1. What is the difference between data and information?
Answer:
Data refers to unorganised facts that need to be processed, while information is the processed form of data that is meaningful and useful for decision making.

2. Define the term ‘datum’.
Answer:
‘Datum’ is the singular form of the word ‘data’. It represents a single piece of information or value.

3. What is metadata? Give an example.
Answer:
Metadata is data about data. For example, in an image file, metadata may include image size, type (JPEG, PNG), and resolution.

4. Differentiate between structured and unstructured data.
Answer:
- Structured Data: Organised in rows and columns (e.g., database tables).
- Unstructured Data: Lacks a defined format (e.g., emails, web pages, videos).

5. Give two examples of structured data.
Answer:
1. School fee payment records with fields like StudentName, RollNo, and FeesAmount.
2. Inventory table with fields like ProductName, UnitPrice, and Quantity.


6. Mention two examples of unstructured data.
Answer:
1. Social media posts with text, images, and videos.
2. Email content with body text and attachments.

7. What is the significance of data in decision making?
Answer:
Data helps identify trends, draw conclusions, and support decisions in areas such as business, education, healthcare, and governance.

8. Define mean and write its formula.
Answer:
Mean is the average of numeric values.
Formula: Mean = (x₁ + x₂ + ... + xₙ)/n

9. What is median? How is it calculated for an even number of values?
Answer:
Median is the middle value in an ordered list. For an even number of values, it is the average of the two middle values.

10. Define mode with an example.
Answer:
Mode is the value that appears most frequently in a dataset.
Example: In [34, 34, 28, 27, 34], mode is 34.

11. What is meant by range in a dataset?
Answer:
Range is the difference between the maximum and minimum values in a dataset.
Formula: Range = Maximum – Minimum

12. What does standard deviation measure?
Answer:
Standard deviation measures the spread or dispersion of values around the mean. It considers all data points in the dataset.

13. Name two commonly used digital storage devices.
Answer:
1. Hard Disk Drive (HDD)
2. Solid State Drive (SSD)

14. Mention any two statistical techniques used for data summarisation.
Answer:
1. Measures of Central Tendency (Mean, Median, Mode)

2. Measures of Variability (Range, Standard Deviation)

15. Differentiate between range and standard deviation.
Answer:
- Range: Difference between the maximum and minimum values.
- Standard Deviation: Measures the average spread of all values from the mean.

16. What type of data is stored in an electronic voting machine?
Answer:
Structured data such as votes cast, which are accumulated and processed for quick result declaration.

17. Give two scenarios where data is used for making decisions.
Answer:
1. Meteorological data used to predict cyclones.
2. Sales data used by businesses to offer discounts or change product placements.

18. How can Python help in data processing and analysis?
Answer:
Python provides libraries that allow efficient data processing, statistical analysis, and visualisation of large data sets.

19. What are the steps involved in data processing?
Answer:
1. Data Collection
2. Data Preparation
3. Data Entry
4. Storage and Retrieval
5. Classification and Update
6. Generation of Reports/Results

3 MARKS QUESTIONS

1. Explain the three commonly used measures of central tendency with examples.
Answer:
- Mean is the average of all values.
Example: Mean of [90, 100, 110] = (90+100+110)/3 = 100
- Median is the middle value in a sorted list.
Example: Median of [85, 90, 100, 110, 115] = 100
- Mode is the value that occurs most frequently.
Example: Mode of [90, 110, 110, 110, 100] = 110


2. Differentiate between structured data and unstructured data with examples.


Answer:
- Structured Data: Organised in rows and columns, easy to store and analyse.
Example: Table of student records with Roll No, Name, Marks.
- Unstructured Data: Lacks predefined format; difficult to analyse.
Example: Social media posts with images and text.

3. Define standard deviation. Write its formula and explain its significance.
Answer:
Standard deviation (σ) measures the spread of data around the mean.
Formula:

\[ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

It gives insights into data variability. A smaller σ means values are closer to the mean; a larger σ indicates more spread.

4. What are metadata? Give three examples from different digital files.
Answer:
Metadata are data about data. They describe content and structure.
Examples:
- In an image file: resolution, format (JPEG/PNG)
- In an email: subject, recipient, date sent
- In a document: author name, word count, creation date

5. Describe the role of data in business decision-making with any two examples.
Answer:
Businesses use data to understand market trends and improve performance.
Examples:
1. Analysing customer feedback to improve products.
2. Using sales data to implement dynamic pricing (e.g., discount in happy hours based on past data).

6. Explain the data processing cycle with the help of a diagram or steps.
Answer:
The data processing cycle includes the following steps:
1. Input – Collecting and entering data.
2. Processing – Manipulating data to produce results.
3. Output – Presenting the results.
4. Storage and Retrieval – Saving for future use.
These steps convert raw data into useful information.


7. Distinguish between range and standard deviation with formula and example.
Answer:
- Range: Difference between maximum and minimum values.
Formula: Range = Max – Min
Example: For [85, 90, 115], Range = 115 – 85 = 30
- Standard Deviation: Measures average spread from the mean.
Formula: σ = √(Σ(xᵢ − x̄)² / n)
Example: For [90, 100, 110], the mean is 100, so σ = √((100 + 0 + 100)/3) = √66.67 ≈ 8.16.

8. Give three different scenarios of data collection and describe the method to convert them into digital format.

Answer:
1. Manual Record (e.g., shopkeeper’s diary): Enter the data into a spreadsheet manually.
2. Digital File (e.g., CSV): Use the data directly for analysis with software tools.
3. No prior data: Develop software (e.g., a Python program with a MySQL database) to store and manage sales digitally.
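
For scenario 2, data already in a CSV file can be read straight into a program. The sketch below is a minimal illustration using Python's built-in `csv` module; the column names (`date`, `item`, `quantity`, `unit_price`) and the sample rows are hypothetical, and the text is held in a string only so the example runs without an external file.

```python
import csv
import io

# Hypothetical sales file contents; in practice you would open a real .csv file.
SAMPLE_CSV = """date,item,quantity,unit_price
2025-04-01,pen,10,5
2025-04-01,notebook,4,40
2025-04-02,pen,6,5
"""

def total_sales(csv_text):
    """Read sales rows from CSV text and return the total revenue."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    for row in reader:
        # Every CSV field is read as a string, so convert before arithmetic.
        total += int(row["quantity"]) * int(row["unit_price"])
    return total

print(total_sales(SAMPLE_CSV))  # 10*5 + 4*40 + 6*5 = 240
```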

9. What are the limitations of file processing and how does DBMS help overcome them?
Answer:
File Processing Limitations:
- Difficult to handle large data
- Data redundancy and poor data integrity
- No concurrent access or security control

DBMS Benefits:
- Centralised management
- Easy retrieval and update
- Ensures data consistency and security, and reduces redundancy

10. A teacher wants to compare students’ test results from five months. Which statistical technique is suitable and why?

Answer:
Mean is the suitable technique for comparing average performance over the five months. It gives a quick understanding of how the class performed each month and highlights trends in overall class performance.

5 MARKS QUESTIONS

1. What are the different types of data? Explain structured and unstructured data with examples.
Answer:
Data can be broadly categorised into:
1. Structured Data – Organised in a defined format like tables (rows and columns). Each column represents an attribute and each row represents an observation.
Examples:
- School records (RollNo, Name, Marks)
- ATM withdrawal data (AccountNo, Date, Amount)
2. Unstructured Data – Data not arranged in a predefined format; it lacks a fixed structure.
Examples:
- Social media posts
- Email content
- News articles with images, videos, and text

2. Explain the role of data in various real-life sectors. Give at least five examples.
Answer:
Data plays a crucial role in decision-making across various domains.

Examples:
1. Education: Placement data helps students choose colleges.
2. Government: Census data is used for planning policies.
3. Healthcare: Hospitals collect patient data for treatment analysis.
4. Meteorology: Weather offices use satellite data to predict storms.
5. Business: Sales data is analysed for discounts, inventory planning, and marketing decisions.

3. Define and differentiate Mean, Median, and Mode. Include examples.

Answer:
- Mean: Average of numeric values.
Formula: Mean = (Sum of all values) / Number of values
Example: [90, 100, 110] → Mean = (90+100+110)/3 = 100
- Median: Middle value in a sorted list (for an even number of values, the average of the two middle values).
Example: [85, 90, 100, 110, 115] → Median = 100
- Mode: Most frequently occurring value.
Example: [90, 90, 100, 110, 110, 110] → Mode = 110

Difference:
- Mean considers all values, so it is affected by outliers.
- Median is less sensitive to outliers.
- Mode shows the most common value.
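
Python's standard `statistics` module provides all three measures, so the examples above can be checked directly. A minimal sketch using the same lists:

```python
import statistics

marks_for_mean = [90, 100, 110]
marks_for_median = [85, 90, 100, 110, 115]
marks_for_mode = [90, 90, 100, 110, 110, 110]

print(statistics.mean(marks_for_mean))      # 100
print(statistics.median(marks_for_median))  # 100 (middle of the sorted list)
print(statistics.mode(marks_for_mode))      # 110 (occurs three times)
```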

4. What is standard deviation? How is it calculated? Explain with an example.

Answer:
Standard deviation measures the dispersion or spread of data around the mean.
Formula:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Steps:
1. Find the mean of the dataset.
2. Subtract the mean from each value.
3. Square the differences.
4. Find the average of the squared differences.
5. Take the square root of the result.

Example:
For heights: [90, 102, 110, 115, 85, 90, 100, 110, 110]
- Mean = 912/9 ≈ 101.33
- Sum of squared differences = 938
- σ = √(938/9) ≈ √104.22 ≈ 10.2

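The five steps translate almost line for line into code. This is a sketch assuming the population formula above (dividing by n); the helper name `std_dev_steps` is illustrative, and `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics

def std_dev_steps(values):
    """Population standard deviation, following the five steps above."""
    n = len(values)
    mean = sum(values) / n                          # Step 1: mean
    deviations = [x - mean for x in values]         # Step 2: value minus mean
    squared = [d * d for d in deviations]           # Step 3: square each deviation
    variance = sum(squared) / n                     # Step 4: average of the squares
    return mean, sum(squared), math.sqrt(variance)  # Step 5: square root

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
mean, sum_sq, sigma = std_dev_steps(heights)
print(round(mean, 2), round(sum_sq, 2), round(sigma, 2))  # 101.33 938.0 10.21

# Cross-check with the standard library's population standard deviation.
print(round(statistics.pstdev(heights), 2))  # 10.21
```
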
5. Differentiate between Range and Standard Deviation. Explain with examples and formula.

Answer:

| Feature | Range | Standard Deviation |
|---|---|---|
| Definition | Difference between highest and lowest values | Average spread of all values from the mean |
| Formula | Max – Min | $\sigma = \sqrt{\sum (x - \bar{x})^2 / n}$ |
| Data Used | Only two values (max and min) | All values |
| Sensitivity to Outliers | Highly sensitive | Less affected |

Example:
Data: [85, 90, 90, 100, 102, 110, 110, 110, 115]
- Range = 115 – 85 = 30
- σ ≈ 10.2 (calculated using the mean and all values)

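A quick way to compare the two measures on the same data is sketched below: `max() - min()` gives the range from just the two extreme values, while `statistics.pstdev` (population standard deviation, dividing by n as in the formula above) uses every value.

```python
import statistics

data = [85, 90, 90, 100, 102, 110, 110, 110, 115]

data_range = max(data) - min(data)   # uses only the two extreme values
sigma = statistics.pstdev(data)      # uses every value (divides by n)

print(data_range)       # 30
print(round(sigma, 1))  # 10.2
```
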
6. Explain the data processing cycle with a real-life example.

Answer:
The data processing cycle includes:
1. Data Collection – Gather raw data.
2. Data Preparation – Organise, clean, and validate data.
3. Data Entry – Input data into the system.
4. Processing – Apply algorithms and logic to get results.
5. Storage/Retrieval – Save and retrieve data as needed.
6. Output/Reporting – Present results in a meaningful form.

Example: In online exam registration:
- Collect name, marks, payment details
- Check eligibility
- Generate roll numbers and admit cards
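
The registration example can be sketched as a small program that follows the cycle: collected records go in, an eligibility check is applied, and roll numbers are generated as output. The eligibility rule (at least 50% marks and fee paid), the starting roll number 1001, the class `Applicant`, the function `register` and the sample names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical eligibility rule for illustration: at least 50% marks and fee paid.
MIN_MARKS_PERCENT = 50

@dataclass
class Applicant:
    name: str
    marks_percent: float
    fee_paid: bool

def register(applicants):
    """Return admit-card lines for eligible applicants, with sequential roll numbers."""
    # Processing: keep only eligible applicants.
    eligible = [a for a in applicants
                if a.marks_percent >= MIN_MARKS_PERCENT and a.fee_paid]
    # Output: generate roll numbers and admit-card text.
    return [f"Roll No {roll_no}: {a.name}"
            for roll_no, a in enumerate(eligible, start=1001)]

# Collection + entry: made-up applicant records.
applicants = [
    Applicant("Asha", 72.5, True),
    Applicant("Ravi", 45.0, True),    # below the marks threshold
    Applicant("Meena", 88.0, False),  # fee not paid
    Applicant("John", 60.0, True),
]
for card in register(applicants):
    print(card)
# Roll No 1001: Asha
# Roll No 1002: John
```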

7. How is data collected in digital environments? Explain three different scenarios.

Answer:
1. Manual to Digital:
- Shopkeeper keeps records in a diary.
- Data is entered into a spreadsheet or software.
2. Already in Digital Format:
- Data is in CSV or database format.
- Can be directly imported and processed.
3. Fresh Data Collection:
- A new system is developed (e.g., a Python program with a MySQL database) to record and store sales or transactions digitally.

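Scenario 3 mentions building a new system with Python and MySQL. As a hedged stand-in, the sketch below uses Python's built-in `sqlite3` module instead of MySQL so it runs without a database server; the `sales` table, its columns and the helper `record_sale` are hypothetical. With MySQL the table design would be similar, though the connection code and parameter placeholders differ.

```python
import sqlite3

# In-memory database for the sketch; a real shop system would use a file
# or a server database such as MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER, price REAL)"
)

def record_sale(item, quantity, price):
    """Store one sale as a new row (hypothetical helper)."""
    conn.execute(
        "INSERT INTO sales (item, quantity, price) VALUES (?, ?, ?)",
        (item, quantity, price),
    )
    conn.commit()

record_sale("pen", 10, 5.0)
record_sale("notebook", 4, 40.0)

# Retrieval: total sales value across all stored rows.
total = conn.execute("SELECT SUM(quantity * price) FROM sales").fetchone()[0]
print(total)  # 210.0
```
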
8. Explain metadata with three examples. How is it useful in processing unstructured data?
Answer:
Metadata is data about data. It helps identify, describe, and process unstructured data.
Examples:
1. Image File: Metadata includes image size, type, resolution.
2. Email: Subject, recipient, time sent.
3. Document: Author name, date created, word count.

Usefulness:
Metadata helps organise, search, and process unstructured content like emails, images, and documents.

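To illustrate how metadata makes unstructured content searchable, the sketch below stores hypothetical metadata for three image files as Python dictionaries and filters them by tag, format and width. The file names, tags and the function `search_images` are made up for this example; real images usually carry metadata in formats such as EXIF, which need a separate library to read.

```python
# Hypothetical metadata for three image files. The pixels themselves are
# unstructured, but their metadata fits neatly into named fields.
images = [
    {"name": "beach.jpg", "format": "JPEG", "width": 1920, "height": 1080, "tags": ["sea", "holiday"]},
    {"name": "logo.png", "format": "PNG", "width": 512, "height": 512, "tags": ["school"]},
    {"name": "trip.jpg", "format": "JPEG", "width": 800, "height": 600, "tags": ["holiday", "bus"]},
]

def search_images(records, tag=None, fmt=None, min_width=0):
    """Return names of images whose metadata matches all the given filters."""
    results = []
    for r in records:
        if tag is not None and tag not in r["tags"]:
            continue
        if fmt is not None and r["format"] != fmt:
            continue
        if r["width"] < min_width:
            continue
        results.append(r["name"])
    return results

print(search_images(images, tag="holiday"))              # ['beach.jpg', 'trip.jpg']
print(search_images(images, fmt="JPEG", min_width=1000))  # ['beach.jpg']
```
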
9. List and explain any five real-life applications of statistical techniques in data processing.
Answer:
1. Education: Teachers analyse marks using mean and median.
2. Business: Use mode to identify popular products.
3. Health: Use standard deviation to assess variability in patient recovery times.
4. Elections: Calculate each party's vote share as a percentage of total votes polled.
5. Weather Forecasting: Analyse the range of temperatures to predict extremes.
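
For the business example, the mode of a list of sold items is simply the item that occurs most often. A minimal sketch with a hypothetical sales list, using `collections.Counter`:

```python
from collections import Counter

# Hypothetical list of products, one entry per item sold.
sold = ["pen", "notebook", "pen", "eraser", "pen", "notebook"]

counts = Counter(sold)
popular, times = counts.most_common(1)[0]   # mode = most frequent product
print(popular, times)  # pen 3
```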

10. Compare the use of Mean, Median, and Mode with suitable scenarios.
Answer:

| Measure | Best Used When | Example |
|---|---|---|
| Mean | No extreme outliers; an overall average is needed | Average marks of students |
| Median | Outliers present; a typical central value is needed | Income data where a few people are very rich |
| Mode | The most frequent value is needed | Popular shoe size sold in a store |

Each measure helps summarise data based on the context of variability, frequency, or central position.
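
The income scenario in the table is where mean and median part ways. In the hypothetical monthly incomes below (in Rs.), one very large value pulls the mean far above what most people earn, while the median stays near the typical value:

```python
import statistics

# Hypothetical monthly incomes (Rs.) of six people; the last one is very rich.
incomes = [20000, 22000, 25000, 27000, 30000, 500000]

print(statistics.mean(incomes))    # 104000, pulled up by one outlier
print(statistics.median(incomes))  # 26000.0, average of the two middle values
```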

CHAPTER END EXERCISES

1. Identify data required to be maintained to perform the following services:

a) Declare exam results and print e-certificates

Answer:
- Student name
- Roll number
- Class and section
- Subjects and marks
- Grades
- Attendance
- Certificate ID
- Signature and date of issue

b) Register participants in an exhibition and issue biometric ID cards

Answer:
- Participant name
- Age and gender
- Contact details
- Photograph and biometric (fingerprint/iris scan)
- Address and ID proof
- Event or category

c) To search for an image by a search engine

Answer:
- Image metadata (name, format, size, resolution)
- Image content tags or labels
- User search keywords
- Date uploaded
- Website source

d) To book an OPD appointment with a hospital in a specific department

Answer:
- Patient name
- Age and gender
- Contact number
- Department
- Doctor preference
- Appointment date and time
- Previous patient history (if available)

2. A school with 500 students wants to identify beneficiaries of the merit-cum-means scholarship: students achieving more than 75% for two consecutive years and having a family income of less than Rs.5 lakh per annum. Briefly describe the data processing steps to be taken by the school.

Answer:
1. Data Collection: Collect student academic data and income certificates.
2. Data Entry: Enter marks of the last two years and family income.
3. Processing:
- Check if the student scored >75% in both years.
- Check if family income < Rs.5 lakh.
4. Filtering: Select students fulfilling both conditions.
5. Output: Generate a list of eligible beneficiaries.
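
The processing and filtering steps can be sketched in Python. The student records below are hypothetical; the two conditions follow the question: more than 75% in both years (strictly greater) and family income below Rs.5 lakh (5,00,000) per annum.

```python
# Hypothetical records: (name, % in year 1, % in year 2, family income in Rs. per annum)
students = [
    ("Anil", 82.0, 79.5, 350000),
    ("Bhavya", 91.0, 74.0, 200000),   # fails the marks condition in year 2
    ("Chitra", 77.0, 88.0, 650000),   # fails the income condition
    ("Deepak", 76.5, 80.0, 480000),
]

MARKS_CUTOFF = 75        # must score MORE than 75% in both years
INCOME_LIMIT = 500000    # family income must be LESS than Rs.5 lakh

def eligible_beneficiaries(records):
    """Processing + filtering: keep students meeting both conditions."""
    result = []
    for name, year1, year2, income in records:
        merit = year1 > MARKS_CUTOFF and year2 > MARKS_CUTOFF
        means = income < INCOME_LIMIT
        if merit and means:
            result.append(name)
    return result

print(eligible_beneficiaries(students))  # ['Anil', 'Deepak']
```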

3. A bank ‘xyz’ wants to know about its popularity among the residents of a city ‘ABC’ on the basis of the number of bank accounts each family has and the average monthly account balance of each person. Briefly describe the steps to be taken for collecting data and what results can be checked through processing of the collected data.

Answer:
Steps for Data Collection:
- Collect family-wise account details from the branch database.
- Record the number of accounts per family.
- Note the average balance per person using bank statement data.

Results through Processing:
- Identify the total number of families using bank services.
- Calculate the average number of accounts per family.
- Assess the trend in average monthly balance.
- Compare with population data to estimate market penetration.
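
Once the data is collected, the results listed above reduce to simple counts and averages. This is a sketch on hypothetical survey data: each family maps to a list of average monthly balances, one per account held with bank 'xyz', and an empty list means the family has no account there. The "penetration" figure here is only the share of surveyed families with an account, standing in for a comparison with real population data.

```python
import statistics

# Hypothetical data: family -> average monthly balance (Rs.) of each account with bank 'xyz'.
families = {
    "F1": [12000, 8000],
    "F2": [25000],
    "F3": [],                  # surveyed family with no account at the bank
    "F4": [5000, 7000, 3000],
}

families_with_accounts = [f for f, accts in families.items() if accts]
accounts_per_family = [len(accts) for accts in families.values()]
all_balances = [b for accts in families.values() for b in accts]

print(len(families_with_accounts))                           # 3 families use the bank
print(statistics.mean(accounts_per_family))                  # 1.5 accounts per family
print(statistics.mean(all_balances))                         # 10000 average balance
print(f"{len(families_with_accounts) / len(families):.0%}")  # 75% of surveyed families
```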


4. Identify type of data being collected/generated in the following scenarios:

a) Recording a video – Unstructured


b) Marking attendance by teacher – Structured
c) Writing tweets – Unstructured
d) Filling an application form online – Structured

5. Consider the temperature (in Celsius) of 7 days of a week as 34, 34, 27, 28, 27, 34, 34. Identify the appropriate statistical technique to be used to calculate the following:

a) Find the average temperature.

Answer: Mean = (34+34+27+28+27+34+34)/7 = 218/7 ≈ 31.14°C

b) Find the temperature range of that week.

Answer: Max = 34, Min = 27 → Range = 34 – 27 = 7°C

c) Find the standard deviation of temperature.
Answer:
Mean ≈ 31.14

Sum of squared deviations:

$$(34 - 31.14)^2 + (34 - 31.14)^2 + (27 - 31.14)^2 + (28 - 31.14)^2 + (27 - 31.14)^2 + (34 - 31.14)^2 + (34 - 31.14)^2 \approx 76.86$$

Standard deviation = √(76.86/7) ≈ √10.98 ≈ 3.31°C
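
All three answers can be checked with Python's `statistics` module. `pstdev` divides by n = 7, matching the formula used in this chapter; `statistics.stdev` would divide by n − 1 and give a slightly larger value, so the choice matters when comparing answers.

```python
import statistics

temps = [34, 34, 27, 28, 27, 34, 34]   # daily temperatures in °C

mean_t = statistics.mean(temps)         # (a) average
range_t = max(temps) - min(temps)       # (b) range
sum_sq = sum((t - mean_t) ** 2 for t in temps)
sigma_t = statistics.pstdev(temps)      # (c) population SD, divides by n = 7

print(round(mean_t, 2))   # 31.14
print(range_t)            # 7
print(round(sum_sq, 2))   # 76.86
print(round(sigma_t, 2))  # 3.31
```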

6. A school teacher wants to analyse results. Identify the appropriate statistical technique to be used, along with its justification, for the following cases:

a) Teacher wants to compare performance in terms of division secured by students in Class XII A and Class XII B, where each class strength is the same.

Answer:
Median – It gives the central tendency and helps compare the two classes even if outliers exist.

b) Teacher has conducted five unit tests for that class in the months July to November and wants to compare the class performance in these five months.

Answer:
Mean – It provides an average of class performance across all months.
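
A sketch of case (b): the marks below are hypothetical (four students, tests out of 50), and the helper `monthly_means` computes the class mean for each month so the five months can be compared side by side.

```python
import statistics

# Hypothetical unit-test marks (out of 50), one list per month.
unit_tests = {
    "July":      [38, 42, 29, 45],
    "August":    [40, 44, 31, 47],
    "September": [35, 39, 28, 41],
    "October":   [42, 46, 35, 48],
    "November":  [44, 47, 36, 49],
}

def monthly_means(tests):
    """Return the class mean mark for each month."""
    return {month: statistics.mean(marks) for month, marks in tests.items()}

for month, avg in monthly_means(unit_tests).items():
    print(f"{month:<10} {avg:.2f}")
```

Printing the means in month order makes any rising or falling trend easy to spot.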

7. Suppose the annual day of your school is to be celebrated. The school has decided to felicitate those parents of students studying in classes XI and XII who are alumni of the same school. In this context, answer the following questions:

a) Which statistical technique should be used to find out the number of students whose both parents are alumni of this school?

Answer:
Mode – To find the most frequently occurring case or count how many have both parents as alumni.

b) How varied are the ages of parents of the students of that school?
Answer:
Standard Deviation – To measure the spread or variation in parents’ ages.

8. For the annual day celebrations, the teacher is looking for an anchor in a class of 42 students. The teacher would select an anchor on the basis of singing skill, writing skill, as well as monitoring skill.

a) Which mode of data collection should be used?
Answer:
Observation or Rating Sheet/Survey – Teacher can assess or collect peer feedback.

b) How would you represent the skill of students as data?
Answer:
Using structured rating scales (1 to 5) for each skill.
Example:
- Student A: Singing – 4, Writing – 5, Monitoring – 3
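
Representing the rating sheet as structured data makes selection a simple calculation. In this sketch, Student A's ratings come from the example above, while Students B and C are made up; summing the three ratings assumes every skill carries equal weight, which the teacher may choose to change.

```python
# Rating sheet: each skill rated 1 to 5. Student A matches the example above;
# Students B and C are hypothetical.
ratings = {
    "Student A": {"singing": 4, "writing": 5, "monitoring": 3},
    "Student B": {"singing": 5, "writing": 3, "monitoring": 5},
    "Student C": {"singing": 3, "writing": 4, "monitoring": 4},
}

def total_score(skills):
    """Assumption: all three skills are weighted equally."""
    return sum(skills.values())

best = max(ratings, key=lambda name: total_score(ratings[name]))
print(best, total_score(ratings[best]))  # Student B 13
```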

9. Differentiate between structured and unstructured data giving one example. The principal of a school wants to do the following analysis on the basis of food items procured and sold in the canteen:

a) Compare the purchase and sale price of fruit juice and biscuits.
b) Compare sales of fruit juice, biscuits and samosa.
c) Variation in sale price of fruit juices of different companies for the same quantity (in ml).

Create an appropriate dataset for these items (fruit juice, biscuits, samosa) by listing their purchase price and sale price. Apply basic statistical techniques to make the comparisons.

Answer:
Structured vs Unstructured Data:
- Structured: Tabular format (e.g., canteen inventory)
- Unstructured: Text or multimedia (e.g., student reviews)

Example dataset:

| Item | Purchase Price | Sale Price | Quantity Sold |
|---|---|---|---|
| Fruit Juice A | Rs.20 | Rs.30 | 100 |
| Biscuits | Rs.10 | Rs.15 | 150 |
| Samosa | Rs.8 | Rs.12 | 120 |

Statistical Techniques:
- Use mean to find average sale price.
- Use range to compare variation in prices.
- Use mode to identify the best-selling item (the item sold most often).
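
The comparisons can be computed directly from the example dataset. A minimal sketch: the mean and range are taken over the three sale prices, the margin per unit answers part (a), and the best seller by quantity answers part (b). Part (c) would need prices from several juice brands, which this dataset does not include.

```python
import statistics

# Example dataset: item -> (purchase price, sale price, quantity sold), prices in Rs.
canteen = {
    "Fruit Juice A": (20, 30, 100),
    "Biscuits": (10, 15, 150),
    "Samosa": (8, 12, 120),
}

sale_prices = [sale for _, sale, _ in canteen.values()]
print(statistics.mean(sale_prices))          # 19 (average sale price)
print(max(sale_prices) - min(sale_prices))   # 18 (range of sale prices)

# (a) margin per unit = sale price - purchase price
for item, (buy, sell, _) in canteen.items():
    print(item, "margin per unit:", sell - buy)

# (b) best-selling item by quantity sold
best_seller = max(canteen, key=lambda item: canteen[item][2])
print(best_seller)                           # Biscuits
```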

For more information Visit:

https://matheenhere.blogspot.com
