Chapter 7 Understanding Data
STUDENT NOTES || CHAPTER 7 UNDERSTANDING DATA April 23, 2025
Contents
LICENSE
7.1.2 Types of Data
7.2 Data Collection
7.3 Data Storage
7.4 Data Processing
Figure 7.2: Real-Life Data Processing Examples
7.5 Statistical Techniques for Data Processing
Glossary of Key Terms
2 MARKS QUESTIONS
3 MARKS QUESTIONS
5 MARKS QUESTIONS
LICENSE
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
“Karnataka Second PUC Computer Science Study Material / Student Notes” by L R Mohammed Matheen
is licensed under CC BY-NC-ND 4.0.
Figure 1: Licence
Portions of this work may include material under separate copyright. These materials are not covered by this Creative Commons license and are used by permission or under applicable copyright exceptions.
CHAPTER NOTES
Data is essential for human decision-making in nearly every field of life. Whether we are selecting a
college, making business plans, monitoring health, or predicting weather, we rely on data.
Data are raw facts and figures which, when processed, yield information that helps in making decisions.
For instance:
• Colleges maintain placement data of students to attract new students.
• Governments conduct a census to gather demographic data.
• Banks maintain account-related data for every customer.
• Sports teams analyze opponent data to form strategies.
Definition: Data is a collection of characters, numbers, or symbols that represent values of some
variables.
Examples of Common Data Types:
Humans and computers both rely heavily on data. Computers can process large volumes of data quickly
Real-life Examples:
• ATM transactions: The system deducts the withdrawn amount from your bank account.
• Weather monitoring: Meteorological departments analyze satellite data for warnings.
• Businesses: Analyze market trends, customer feedback, and dynamic pricing.
• Transportation: Cab apps adjust pricing based on demand and supply.
• Restaurants: Offer discounts during “happy hours” by analyzing customer visits.
Other Scenarios:
7.1.2 Types of Data
(A) Structured Data: Organized in a tabular format (rows and columns), structured data can be easily processed using spreadsheets or databases. Each row is a record, and each column is an attribute.
Table 7.1: Structured Inventory Data
GH67 Jug 80 0 10
With structured data, you can calculate:
• Total items in inventory
• Total inventory value
Table 7.2: More Structured Examples
• Books: Title, Author, Price, Year
• Fee Payments: Student Name, Roll No., Amount
• ATM Withdrawals: Account No., Amount, ATM ID, Date & Time
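Because every row of structured data has the same columns, totals like the ones listed above reduce to simple loops over the records. The following is a minimal Python sketch, assuming a hypothetical inventory with the columns code, name, price and quantity (these column names and values are illustrative and are not taken from Table 7.1):

```python
# Hypothetical inventory records: each dict is one row (record),
# each key is one column (attribute).
inventory = [
    {"code": "A101", "name": "Cup", "price": 40, "quantity": 25},
    {"code": "A102", "name": "Plate", "price": 60, "quantity": 15},
    {"code": "A103", "name": "Bowl", "price": 50, "quantity": 10},
]

def total_items(rows):
    """Total number of items in stock: the sum of the quantity column."""
    return sum(row["quantity"] for row in rows)

def total_value(rows):
    """Total inventory value: price x quantity for each row, summed."""
    return sum(row["price"] * row["quantity"] for row in rows)

print("Total items:", total_items(inventory))
print("Total inventory value:", total_value(inventory))
```

The same two questions would be much harder to answer for unstructured data, where there are no fixed columns to sum.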
(B) Unstructured Data: Lacks a fixed format or structure. Examples:
• News articles
• Emails
• Social media posts
• Documents with images, audio, video
Example: A newspaper page can have a mix of text, advertisements and images with no fixed layout.
3 Scenarios:
Our daily digital activities constantly generate data (e.g., browsing, transactions, healthcare records).
Example Applications:
• Shopping malls: Analyze co-purchased items to optimize product placement.
• Social Media: Public sentiment analysis before elections.
• Global Institutions: Use economic data for forecasting.
7.3 Data Storage
After collection, data needs to be stored safely for current and future use. The volume of data is increasing
rapidly, making storage challenging.
• Tape Drives
• Pen Drives
• Memory Cards
Data such as documents, images, audio, video, etc., are stored as files. For complex requirements, we
Just storing data is not enough. Data needs to be processed to extract information. Raw data alone cannot
be used for analysis or decision-making.
The Data Processing Cycle from the textbook is redrawn below, followed by an explanation of each stage:
+------------------------+
|    Data Collection     |
+-----------+------------+
            |
            v
+------------------------+
|    Data Preparation    |
| (cleaning, formatting) |
+-----------+------------+
            |
            v
+------------------------+
|       Data Entry       |
|  (input into system)   |
+-----------+------------+
            |
            v
+------------------------+
|      Data Storage      |
|   (files/databases)    |
+-----------+------------+
            |
            v
+------------------------+
|     Data Retrieval     |
+-----------+------------+
            |
            v
+------------------------+
|    Data Processing     |
|   (classify, update,   |
|    analyze, compute)   |
+-----------+------------+
            |
            v
+------------------------+
|      Data Output       |
|   (Reports, results,   |
|    visualizations)     |
+------------------------+
1. Data Collection
Gathering raw data from various sources like forms, sensors, logs, or digital inputs.
2. Data Preparation
Cleaning data (removing errors, duplicates), converting formats, and making it consistent.
3. Data Entry
Inputting the cleaned data into digital systems (manually or via automated processes).
4. Data Storage
Saving data in files, spreadsheets, or databases for persistent use.
5. Data Retrieval
Accessing stored data when needed for processing.
6. Data Processing
Analyzing, classifying, computing, and transforming data to generate meaningful results.
7. Data Output
Presenting the results in the form of charts, tables, summaries, or reports.
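The seven stages above can be traced end to end in a short program. This is a minimal sketch, not the textbook's method: the raw "name, mark" records are hypothetical, preparation is reduced to trimming spaces and dropping incomplete or duplicate rows, and Python's built-in `sqlite3` module with an in-memory database stands in for the storage layer.

```python
import sqlite3

# 1. Data Collection: hypothetical raw records as they might arrive from a form
raw_records = [" Asha, 78", "Ravi,85 ", "Asha, 78", "Meena, ", "Kiran, 92"]

# 2. Data Preparation: trim spaces, drop incomplete rows and exact duplicates
def prepare(records):
    cleaned = []
    seen = set()
    for line in records:
        name, _, mark = line.partition(",")
        name, mark = name.strip(), mark.strip()
        if not name or not mark.isdigit():
            continue  # incomplete or invalid record
        key = (name, int(mark))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(key)
    return cleaned

# 3 and 4. Data Entry and Data Storage: insert the cleaned rows into a table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (name TEXT, mark INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?)", prepare(raw_records))
conn.commit()

# 5. Data Retrieval: read the stored rows back
rows = conn.execute("SELECT name, mark FROM marks").fetchall()
conn.close()

# 6. Data Processing: compute summary results
average = sum(mark for _, mark in rows) / len(rows)
top_name, top_mark = max(rows, key=lambda r: r[1])

# 7. Data Output: present the results
print(f"Records: {len(rows)}")
print(f"Average mark: {average:.2f}")
print(f"Top scorer: {top_name} ({top_mark})")
```

Here preparation removes one duplicate ("Asha, 78") and one incomplete entry ("Meena, "), so only three records reach storage.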
Figure 7.2: Real-Life Data Processing Examples
Example | Input | Processing | Output
Exam registration | Student details, payment info | Validate data, assign roll no. | Admit card
ATM withdrawal | Account info, PIN | Verify & deduct amount | Currency, receipt
Train ticket booking | Journey info, payment | Check availability, assign berth | E-ticket
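The ATM row of the table can be read as one input, processing and output step. The sketch below is purely illustrative: the `accounts` dictionary, the `withdraw` function, and the account number, PIN and balance are hypothetical, and a real ATM system involves far more (network authorisation, cash handling, logging).

```python
# Hypothetical account data: account number -> PIN and balance (in Rs.)
accounts = {"1001": {"pin": "4321", "balance": 5000}}

def withdraw(account_no, pin, amount):
    """Input: account info, PIN and amount.
    Processing: verify the PIN and balance, then deduct the amount.
    Output: a receipt message (standing in for cash and a printed receipt)."""
    account = accounts.get(account_no)
    if account is None or account["pin"] != pin:
        return "Transaction declined: invalid account or PIN"
    if amount <= 0:
        return "Transaction declined: invalid amount"
    if amount > account["balance"]:
        return "Transaction declined: insufficient balance"
    account["balance"] -= amount
    return f"Dispensed Rs.{amount}. Available balance: Rs.{account['balance']}"

print(withdraw("1001", "4321", 2000))  # valid PIN: amount deducted
print(withdraw("1001", "0000", 500))   # wrong PIN: nothing deducted
```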
7.5 Statistical Techniques for Data Processing
(A) Mean
Example: [90, 102, 110, 115, 85, 90, 100, 110, 110]
→ Mean = 101.33
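The mean is the sum of all values divided by the number of values. A minimal Python sketch that reproduces the example above (912 / 9):

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

# The example dataset from this section (Table 7.3 treats these as heights)
heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
print(round(mean(heights), 2))  # 101.33
```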
(B) Median
• If odd count → middle item
• If even count → average of two middle items
Sorted Example: [85, 90, 90, 100, 102, 110, 110, 110, 115]
→ Median = 102
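The two median rules (odd count and even count) translate directly into code once the data is sorted. A minimal sketch; the second call uses a small four-value list chosen here only to show the even case:

```python
def median(values):
    """Middle value of the sorted data; for an even count,
    the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                               # odd count -> middle item
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count

print(median([90, 102, 110, 115, 85, 90, 100, 110, 110]))  # 102
print(median([85, 90, 100, 110]))                          # 95.0
```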
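(C) Mode
The value that occurs most frequently in the dataset. In the example above, 110 occurs three times, so Mode = 110.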
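The mode can be found by counting how often each value appears. The sketch below uses `collections.Counter` and returns every value that ties for the highest count, since a dataset may have more than one mode. The colour list is a hypothetical example showing that mode also works for non-numeric data:

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (there may be several)."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([90, 102, 110, 115, 85, 90, 100, 110, 110]))  # [110]
print(modes(["red", "white", "red", "black"]))            # ['red']
```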
(A) Range
Difference between the maximum and minimum values. For the example above: Range = 115 − 85 = 30
(B) Standard Deviation (σ): Measures how much the values differ from the mean.
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \]
Table 7.3: SD Calculation for Heights
• Mean = 101.33
• Sum of squared deviations \( \sum (x_i - \bar{x})^2 = 938 \)
• \( SD = \sqrt{938/9} \approx 10.21 \)
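Range and standard deviation can be checked with a few lines of Python. This sketch follows the formula above exactly: it divides by n (population standard deviation), not by n − 1.

```python
import math

def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

def std_dev(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)

heights = [90, 102, 110, 115, 85, 90, 100, 110, 110]
print(data_range(heights))         # 30
print(round(std_dev(heights), 2))  # 10.21
```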
Task | Best Technique
Disparity in salaries | Standard Deviation
Average class performance | Mean
Compare height in 2 cities | Mean or SD
Popular car color | Mode
Find dominant data value | Mode
Compare incomes | Mean + SD
Term | Explanation
DBMS | Software to manage large datasets efficiently
MULTIPLE CHOICE QUESTIONS (MCQs)
1. What is the primary purpose of collecting data?
A. To decorate webpages
B. To store data indefinitely
C. To make decisions based on analysis
D. To reduce computer memory usage
Answer: C. To make decisions based on analysis
B. An image on a website
C. A table showing inventory items in a shop
D. A video file
Answer: C. A table showing inventory items in a shop
A. Multimedia content
B. Data about data
A. The format of an image
B. A column representing a characteristic
C. A row of values
D. A description of metadata
Answer: B. A column representing a characteristic
7. Which of the following is not a measure of central tendency?
A. Mean
B. Median
C. Mode
D. Range
Answer: D. Range
A. Audio processing
B. Metadata generation
C. Statistical techniques
D. Data hiding
Answer: C. Statistical techniques
11. What is the mode of the dataset [34, 34, 27, 28, 27, 34, 34]?
A. 27
B. 34
C. 28
D. 33
Answer: B. 34
A. Pen Drive
B. CD/DVD
C. HDD
D. None of the above
Answer: D. None of the above
13. Unstructured data can include all except:
A. Audio files
B. Web pages with multimedia
C. Tables with rows and columns
D. Social media messages
Answer: C. Tables with rows and columns
A. Java
B. MySQL
C. Python
D. HTML
Answer: C. Python
A. Can be easily stored in tables
B. Has a clear format with fixed fields
C. Lacks predefined structure
D. Cannot be stored digitally
Answer: C. Lacks predefined structure
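20. What is the formula for standard deviation (σ) as per the chapter?
A. \( \sigma = \text{Mean of values} \)
B. \( \sigma = \text{Maximum} - \text{Minimum} \)
C. \( \sigma = \sqrt{\sum (x_i - \bar{x})^2 / n} \)
D. \( \sigma = \sum (x_i + \bar{x}) / n \)
Answer: C. \( \sigma = \sqrt{\sum (x_i - \bar{x})^2 / n} \)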
The following 10 Assertion and Reason-type MCQs are based on Chapter 7: Understanding Data of the Class XII Computer Science textbook (2021–22).
21. Assertion (A): Raw data by itself is usually not meaningful for decision-making.
Reason (R): Raw data can be interpreted directly without the need for processing.
Options:
23. Assertion (A): Metadata helps in identifying and organizing unstructured data.
Reason (R): Metadata provides information about data, such as file type, size, resolution, and content description.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.
24. Assertion (A): Median is a preferred measure of central tendency when data contains outliers.
Reason (R): Median is calculated by averaging all the data values in the dataset.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
25. Assertion (A): Standard deviation gives a more accurate measure of variability than range.
Reason (R): Standard deviation considers the deviation of each data point from the mean, while range
Options:
26. Assertion (A): Dynamic pricing models in cab and airline services rely on real-time data analysis.
Reason (R): These pricing models are based on historical sales reports compiled over a period of several months.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: C. A is true, but R is false.
27. Assertion (A): Unstructured data cannot be processed efficiently using traditional database tools.
Reason (R): Traditional database tools are designed to work with structured data that has a fixed schema, like tables with defined columns.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
28. Assertion (A): Range and standard deviation are both used to measure how spread out data values are from each other.
Reason (R): Range and standard deviation are used to find the most frequently occurring value in a dataset.
Options:
A. Both A and R are true, and R is the correct explanation of A.
29. Assertion (A): Data entry is an essential step in the data processing cycle.
Reason (R): Data entry is the same as the process of collecting real-world data from various sources.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
30. Assertion (A): Mode can be calculated for both numeric and non-numeric data values.
Reason (R): Mode identifies the value(s) that occur most frequently in the dataset.
Options:
A. Both A and R are true, and R is the correct explanation of A.
B. Both A and R are true, but R is not the correct explanation of A.
C. A is true, but R is false.
D. A is false, but R is true.
Answer: A. Both A and R are true, and R is the correct explanation of A.
FILL IN THE BLANKS
1. Data which is organised and can be recorded in a well-defined format is called ________.
Answer: Structured Data
2. Data which do not follow any fixed structure or format are called ________.
Answer: Unstructured Data
4. The process of collecting, storing, and analysing data for decision making is known as ________.
Answer: Data Processing
5. ________ is a measure of central tendency that represents the average of a set of values.
Answer: Mean
Answer: Median
7. The value that occurs most frequently in a data set is called the ________.
Answer: Mode
8. The difference between the maximum and minimum values in a data set is called the ________.
Answer: Range
11. Examples of digital storage devices include HDD, SSD, CD/DVD, Pen Drive, and ________.
Answer: Memory Card
12. Statistical techniques used to summarise data include mean, median, mode, range, and
________.
Answer: Standard Deviation
13. The process of obtaining data from reliable sources before processing is called ________.
Answer: Data Collection
14. The ICT revolution has led to the generation of a ________ volume of data at a very fast pace.
Answer: Large
Answer: Tabular
2 MARKS QUESTIONS
1. What is the difference between data and information?
Answer:
Data refers to unorganised facts that need to be processed, while information is the processed form of data that is meaningful and useful for decision making.
Answer:
Metadata is data about data. For example, in an image file, metadata may include image size, type (JPEG,
Answer:
- Structured Data: Organised in rows and columns (e.g., database tables).
- Unstructured Data: Lacks a defined format (e.g., emails, web pages, videos).
healthcare, and governance.
Answer:
Mean is the average of numeric values.
Formula: \( \text{Mean} = \frac{x_1 + x_2 + \dots + x_n}{n} \)
9. What is median? How is it calculated for an even number of values?
Answer:
Median is the middle value in an ordered list. For an even number of values, it is the average of the two middle values.
Answer:
Mode is the value that appears most frequently in a dataset.
Range is the difference between the maximum and minimum values in a dataset.
Answer:
Standard deviation measures the spread or dispersion of values around the mean. It considers all data
14. Mention any two statistical techniques used for data summarisation.
Answer:
1. Measures of Central Tendency (Mean, Median, Mode)
Structured data such as votes cast, which are accumulated and processed for quick result declaration.
17. Give two scenarios where data is used for making decisions.
Answer:
1. Meteorological data used to predict cyclones.
2. Sales data used by businesses to offer discounts or change product placements.
Answer:
1. Data Collection
2. Data Preparation
3. Data Entry
4. Storage and Retrieval
5. Data Processing
6. Generation of Reports/Results
3 MARKS QUESTIONS
1. Explain the three commonly used measures of central tendency with examples.
Answer:
- Mean is the average of all values.
Example: Mean of [90, 100, 110] = (90+100+110)/3 = 100
- Median is the middle value in a sorted list.
Example: Median of [85, 90, 100, 110, 115] = 100
- Mode is the value that occurs most frequently.
Example: Mode of [90, 110, 110, 110, 100] = 110
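The three worked examples above can be cross-checked with Python's standard `statistics` module. This sketch only recomputes the same examples; the results should agree with the values in the answer:

```python
import statistics

mean_value = statistics.mean([90, 100, 110])
median_value = statistics.median([85, 90, 100, 110, 115])
mode_value = statistics.mode([90, 110, 110, 110, 100])

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```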
3. Define standard deviation. Write its formula and explain its significance.
Answer:
Standard deviation (σ) measures the spread of data around the mean.
Formula:
\[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
It gives insights into data variability. A smaller σ means values are closer to the mean; a larger σ indicates more spread.
4. What are metadata? Give three examples from different digital files.
Answer:
Metadata are data about data. They describe content and structure.
Examples:
5. Describe the role of data in business decision-making with any two examples.
Answer:
Businesses use data to understand market trends and improve performance.
Examples:
6. Explain the data processing cycle with the help of a diagram or steps.
Answer:
The data processing cycle includes the following steps:
1. Input – Collecting and entering data.
2. Processing – Manipulating data to produce results.
3. Output – Presenting the results.
4. Storage and Retrieval – Saving for future use.
These steps convert raw data into useful information.
7. Distinguish between range and standard deviation with formula and example.
Answer:
- Range: Difference between maximum and minimum values.
Formula: Range = Max – Min
Example: For [85, 90, 115], Range = 115 – 85 = 30
- Standard Deviation: Measures the average spread of values from the mean.
Formula: \( \sigma = \sqrt{\sum (x_i - \bar{x})^2 / n} \)
Example: For [90, 100, 110], the mean is 100 and the squared deviations are 100, 0 and 100, so \( \sigma = \sqrt{200/3} \approx 8.16 \).
8. Give three different scenarios of data collection and describe the method to convert them into digital format.
Answer:
1. Manual Record (e.g., shopkeeper’s diary): Enter the data into a spreadsheet manually.
2. Digital File (e.g., CSV): Directly use the data for analysis using software tools.
3. No prior data: Develop software (e.g., in Python or MySQL) to store and manage sales digitally.
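For the second scenario, data that already exists as a CSV file can be read straight into a program for analysis. A minimal sketch using Python's `csv` module; the column names (item, quantity, price) and the rows are hypothetical, and an in-memory string stands in for a real file such as `open("sales.csv", newline="")`:

```python
import csv
import io

# Hypothetical CSV content standing in for a file on disk
csv_text = """item,quantity,price
Pen,10,5
Notebook,4,40
Pen,6,5
"""

def load_sales(file_obj):
    """Read CSV rows into dicts and convert the numeric columns to int."""
    rows = []
    for row in csv.DictReader(file_obj):
        rows.append({
            "item": row["item"],
            "quantity": int(row["quantity"]),
            "price": int(row["price"]),
        })
    return rows

sales = load_sales(io.StringIO(csv_text))
revenue = sum(r["quantity"] * r["price"] for r in sales)
print(len(sales), "records, total revenue:", revenue)
```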
9. What are the limitations of file processing and how does DBMS help overcome them?
Answer:
File Processing Limitations:
- Difficult to handle large data
DBMS Benefits:
- Centralised management
10. A teacher wants to compare students’ test results from five months. Which statistical technique is suitable and why?
Answer:
Mean is the suitable technique to compare average performance over five months. It provides a quick understanding of how the class performed each month and highlights trends in overall class performance.
5 MARKS QUESTIONS
1. What are the different types of data? Explain structured and unstructured data with examples.
Answer:
Data can be broadly categorized into:
1. Structured Data – Organised in a defined format like tables (rows and columns). Each column
- News articles with images, videos, and text
2. Explain the role of data in various real-life sectors. Give at least five examples.
Answer:
Data plays a crucial role in decision-making across various domains.
Examples:
1. Education: Placement data helps students choose colleges.
2. Government: Census data is used for planning policies.
3. Healthcare: Hospitals collect patient data for treatment analysis.
4. Meteorology: Weather offices use satellite data to predict storms.
5. Business: Sales data is analysed for discounts, inventory planning, and marketing decisions.
Difference:
- Mean considers all values.
Steps:
- \( \sigma = \sqrt{938/9} \approx 10.2 \)
5. Differentiate between Range and Standard Deviation. Explain with examples and formula.
Answer:
Feature | Range | Standard Deviation
Definition | Difference between highest and lowest values | Average spread of all values from the mean
Formula | Max − Min | \( \sigma = \sqrt{\sum (x_i - \bar{x})^2 / n} \) (see Question 3 of the 3 Marks Questions)
Data Used | Only two values (max and min) | All values
Sensitivity to Outliers | Highly sensitive | Less affected
Example:
Data: [85, 90, 90, 100, 102, 110, 110, 110, 115]
- Range = 115 – 85 = 30
- Standard Deviation = √(938/9) ≈ 10.21
- Data is entered into a spreadsheet or software.
- Data is in CSV or database format.
- Can be directly imported and processed.
3. Fresh Data Collection:
- A new system is developed (e.g., using Python/MySQL) to record and store sales or transactions digitally.
8. Explain metadata with three examples. How is it useful in processing unstructured data?
Answer:
Metadata is data about data. It helps identify, describe, and process unstructured data.
Examples:
1. Image File: Metadata includes image size, type, resolution.
Usefulness:
Metadata helps organise, search, and process unstructured content like emails, images, and documents.
9. List and explain any five real-life applications of statistical techniques in data processing.
Answer:
10. Compare the use of Mean, Median, and Mode with suitable scenarios.
Answer:
Each measure helps summarise data based on the context of variability, frequency, or central position.
CHAPTER END EXERCISES
1. Identify data required to be maintained to perform the following services:
- Attendance
- Certificate ID
- Signature and date of issue
Answer:
- Participant name
- Contact details
- Photograph and biometric (fingerprint/iris scan)
- Event or category
- Date uploaded
- Website source
- Department
- Doctor preference
- Appointment date and time
- Previous patient history (if available)
2. A school having 500 students wants to identify beneficiaries of the merit-cum-means scholarship, awarded to students achieving more than 75% for two consecutive years and having a family income of less than 5 lakh per annum. Briefly describe the data processing steps to be taken by the school.
Answer:
1. Data Collection: Collect student academic data and income certificates.
2. Data Entry: Enter marks of the last two years and family income.
3. Processing:
- Check if the student scored >75% in both years.
- Check if family income < Rs.5 lakh.
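The two processing checks can be combined into a single eligibility test applied to every student record. A minimal sketch; the field names and the four student records are hypothetical, while the cut-offs (more than 75% in both years, income below Rs. 5 lakh) come from the question:

```python
# Hypothetical student records after data entry
students = [
    {"name": "Anu",    "marks_year1": 82, "marks_year2": 79, "income": 350000},
    {"name": "Bharat", "marks_year1": 91, "marks_year2": 74, "income": 200000},
    {"name": "Chitra", "marks_year1": 88, "marks_year2": 90, "income": 650000},
    {"name": "Dev",    "marks_year1": 76, "marks_year2": 80, "income": 480000},
]

MARKS_CUTOFF = 75      # percentage, must be exceeded in both years
INCOME_LIMIT = 500000  # Rs. 5 lakh per annum

def is_eligible(student):
    """Eligible if > 75% in both years AND family income < Rs. 5 lakh."""
    return (student["marks_year1"] > MARKS_CUTOFF
            and student["marks_year2"] > MARKS_CUTOFF
            and student["income"] < INCOME_LIMIT)

beneficiaries = [s["name"] for s in students if is_eligible(s)]
print("Beneficiaries:", beneficiaries)
```

Bharat fails the marks condition in the second year and Chitra fails the income condition, so only Anu and Dev are selected.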
3. A bank ‘xyz’ wants to know about its popularity among the residents of a city ‘ABC’ on the basis of number of bank accounts each family has and the average monthly account balance of each person. Briefly describe the steps to be taken for collecting data and what results can be
Answer:
Steps for Data Collection:
5. Consider the temperature (in Celsius) of 7 days of a week as 34, 34, 27, 28, 27, 34, 34. Identify the appropriate statistical technique to be used to calculate the following:
a) Find the average temperature.
Answer: Mean = (34+34+27+28+27+34+34)/7 = 31.14°C
b) Find the range of temperature.
Answer: Max = 34, Min = 27 → Range = 34 – 27 = 7°C
c) Find the standard deviation of temperature.
Answer:
Mean = 31.14
Sum of squared deviations:
\( (34 - 31.14)^2 + (34 - 31.14)^2 + (27 - 31.14)^2 + (28 - 31.14)^2 + (27 - 31.14)^2 + (34 - 31.14)^2 + (34 - 31.14)^2 \approx 76.86 \)
\( \sigma = \sqrt{76.86 / 7} \approx 3.31 \)°C
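All three answers can be verified in one short script. This sketch uses the chapter's population formula for standard deviation (dividing by n) and, for completeness, also reports the mode of the same readings:

```python
import math
from collections import Counter

temps = [34, 34, 27, 28, 27, 34, 34]  # degrees Celsius, from the exercise

n = len(temps)
mean = sum(temps) / n
temp_range = max(temps) - min(temps)
sd = math.sqrt(sum((t - mean) ** 2 for t in temps) / n)  # divides by n
mode = Counter(temps).most_common(1)[0][0]

print(f"Mean  = {mean:.2f}")    # 31.14
print(f"Range = {temp_range}")  # 7
print(f"SD    = {sd:.2f}")      # 3.31
print(f"Mode  = {mode}")        # 34
```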
6. A school teacher wants to analyse results. Identify the appropriate statistical technique to be used along with its justification for the following cases:
a) Teacher wants to compare performance in terms of division secured by students in Class XII A
Answer:
Median – It gives central tendency and helps compare two classes even if outliers exist.
b) Teacher has conducted five unit tests for that class in months July to November and wants to compare the class performance in these five months.
Answer:
7. Suppose annual day of your school is to be celebrated. The school has decided to felicitate those
parents of the students studying in classes XI and XII, who are the alumni of the same school. In
this context, answer the following questions:
a) Which statistical technique should be used to find out the number of students whose both parents
are alumni of this school?
Answer:
Mode – To find the most frequently occurring case or count how many have both parents as alumni.
b) How varied are the age of parents of the students of that school?
Answer:
Standard Deviation – To measure the spread or variation in parents’ ages.
8. For the annual day celebrations, the teacher is looking for an anchor in a class of 42 students. The teacher would make selection of an anchor on the basis of singing skill, writing skill, as well as monitoring skill.
a) Which mode of data collection should be used?
Answer:
Observation or Rating Sheet/Survey – Teacher can assess or collect peer feedback.
b) How would you represent the skill of students as data?
Answer:
Using structured rating scales (1 to 5) for each skill.
Example:
- Student A: Singing – 4, Writing – 5, Monitoring – 3
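Once the skills are stored as 1 to 5 ratings, the data becomes structured and easy to compare. A sketch under stated assumptions: Student A's ratings are from the example above, Students B and C are hypothetical, and choosing the anchor by the simple total of the three ratings is an assumed rule (a teacher might weight the skills differently):

```python
# Ratings on a 1-5 scale; Student A is from the notes, B and C are hypothetical
ratings = {
    "Student A": {"singing": 4, "writing": 5, "monitoring": 3},
    "Student B": {"singing": 5, "writing": 4, "monitoring": 4},
    "Student C": {"singing": 3, "writing": 4, "monitoring": 3},
}

def total_score(skills):
    """Assumed selection rule: unweighted sum of the three skill ratings."""
    return sum(skills.values())

for name, skills in ratings.items():
    print(name, skills, "total =", total_score(skills))

best = max(ratings, key=lambda name: total_score(ratings[name]))
print("Highest total:", best)
```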
9. Differentiate between structured and unstructured data giving one example. The principal of a school wants to do the following analysis on the basis of food items procured and sold in the canteen:
a) Compare the purchase and sale price of fruit juice and biscuits.
Create an appropriate dataset for these items (fruit juice, biscuits, samosa) by listing their purchase price and sale price. Apply basic statistical techniques to make the comparisons.
Answer:
Example dataset:
Statistical Techniques:
- Use mean to find average sale price.
- Use range to compare variation in prices.
- Use mode if any item sells the most.
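As the question asks for a dataset to be created, the sketch below uses a small hypothetical one; all prices are illustrative assumptions, not real canteen figures. It compares purchase and sale price per item (the margin) and then applies mean and range to the sale prices:

```python
# Hypothetical canteen dataset (Rs. per unit); values are illustrative only
items = {
    "fruit juice": {"purchase": 15, "sale": 20},
    "biscuits":    {"purchase": 8,  "sale": 10},
    "samosa":      {"purchase": 6,  "sale": 10},
}

# Compare purchase and sale price of each item
for name, p in items.items():
    margin = p["sale"] - p["purchase"]
    print(f"{name}: purchase {p['purchase']}, sale {p['sale']}, margin {margin}")

# Basic statistics on the sale prices
sale_prices = [p["sale"] for p in items.values()]
mean_sale = sum(sale_prices) / len(sale_prices)
sale_range = max(sale_prices) - min(sale_prices)
print(f"Mean sale price = {mean_sale:.2f}, range = {sale_range}")
```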
https://matheenhere.blogspot.com