
Problem Solving and Data Analysis (Continue)

The document discusses concepts of data analysis, focusing on measures of center (mean, median, mode) and spread (range, standard deviation). It includes various questions and examples related to data sets, employee absences, student surveys, and outliers. The document aims to enhance understanding of statistical analysis and interpretation through practical applications.

Problem Solving and Data Analysis (continue)

Data analysis: center and spread

1. Center: mean, median, and mode

- Mean (arithmetic average): the mean of a set of numerical values is the sum of all the values divided by the number of values in the set.
- Median: the median of a set of numerical values is the middle value when the values are listed in increasing (or decreasing) order. If the set has an even number of values, the median is the mean of the two middle values.
Example: 12 14 16 18 20 22 24
● Mean: 126 ÷ 7 = 18
● Median: 18 (the 4th of the 7 values)
Example: 30 32 34 36 38 40 42 44
● Mean: 296 ÷ 8 = 37
● Median: (36 + 38) ÷ 2 = 37
- Mode: the mode of a set of numerical values is the value that appears most frequently in the set.
Example: 12 14 16 18 18 22 24
⇒ The mode of the set is 18, which appears twice.
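The three measures of center can be checked with a short Python sketch. It uses the standard-library `statistics` module and the example lists above; the variable names are chosen here for illustration only.

```python
from statistics import mean, median, multimode

# The three example lists from the definitions above.
odd_set = [12, 14, 16, 18, 20, 22, 24]
even_set = [30, 32, 34, 36, 38, 40, 42, 44]
mode_set = [12, 14, 16, 18, 18, 22, 24]

# Odd number of values: the median is the single middle value.
odd_mean = mean(odd_set)
odd_median = median(odd_set)

# Even number of values: the median is the mean of the two middle values.
even_mean = mean(even_set)
even_median = median(even_set)

# multimode returns every most-frequent value as a list.
modes = multimode(mode_set)

print(odd_mean, odd_median)
print(even_mean, even_median)
print(modes)
```

`multimode` is used instead of `mode` so that a set with several equally frequent values reports all of them.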

Question 1.
Data set A consists of 10 positive integers less than 60. The list shown gives 9 of the integers from data set A.

43, 45, 44, 43, 38, 39, 40, 46, 40

The mean of these 9 integers is 42. If the mean of data set A is an integer that is greater than 42, what is the
value of the largest integer from data set A?

Question 2.
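| Value | Data set A frequency | Data set B frequency |
|-------|----------------------|----------------------|
| 30    | 2                    | 9                    |
| 34    | 4                    | 7                    |
| 38    | 5                    | 5                    |
| 42    | 7                    | 4                    |
| 46    | 9                    | 2                    |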

Data set A and data set B each consists of 27 values. The table shows the frequencies of the values for each data
set. Which of the following statements best compares the means of the two data sets?
A. The mean of data set A is greater than the mean of data set B.
B. The mean of data set A is less than the mean of data set B.
C. The mean of data set A is equal to the mean of data set B.
D. There is not enough information to compare the means of the data sets.
Question 3.

Two data sets of 23 integers each are summarized in the histograms shown. For each of the histograms, the first
interval represents the frequency of integers greater than or equal to 10, but less than 20. The second interval
represents the frequency of integers greater than or equal to 20, but less than 30, and so on. What is the smallest
possible difference between the mean of data set A and the mean of data set B?
A.​ 0 C.​ 10
B.​ 1 D.​ 23

Question 4.
Employee Absences

| Number of days | Number of employees |
|----------------|---------------------|
| 0              | 8                   |
| 1              | 4                   |
| 2              | 3                   |
| 4              | 4                   |
| 5              | 5                   |
| 13             | 1                   |

The frequency table above shows the distribution of the number of days each of the 25 employees of a company
was absent last month. What is the median number of days absent for the 25 employees last month?
A.​ 1 C.​ 4
B.​ 2 D.​ 5

Question 5.

Ned runs a soybean farm and recorded the yields for 175 different one-acre sections. The results are shown in
the graph above. Which of the following could be the median yield of Ned’s soybean acres?
A.​ 44 bushels C.​ 52 bushels
B.​ 48 bushels D.​ 56 bushels
Question 6.
Each of the following frequency tables represents a data set.
Which data set has the greatest mean?
A. B. C. D.

Question 7.
Data set F consists of 55 integers between 170 and 290. Data set G consists of all of the integers in data set F as
well as the integer 10. Which of the following must be less for data set F than for data set G?
I. The mean
II. The median
A.​ I only C.​ I and II
B.​ II only D.​ Neither I nor II

Question 8.
Data set A consists of the heights of 75 buildings and has a mean of 32 meters. Data set B consists of the
heights of 50 buildings and has a mean of 62 meters. Data set C consists of the heights of the 125 buildings
from data sets A and B. What is the mean in meters of data set C?

Question 9.
A sociologist chose 300 students at random from each of two schools and asked each student how many siblings
he or she has. The results are shown in the table below.

There are a total of 2,400 students at Lincoln School and 3,300 students at Washington School. What is the
median number of siblings for all the students surveyed?
A.​ 0 C.​ 2
B.​ 1 D.​ 3

Question 10.
Let M be the median, and m the mode, of the following set of numbers: 10, 70, 20, 40, 70, 90. What is the
average (arithmetic mean) of M and m?
A.​ 55 C.​ 62.5
B.​ 60 D.​ 65
Question 11.

Say that Ed scored 16 points in the fourth game. What should be added to Jay's median score to equal the
median of Ed's scores?

Question 12.
Blaire, Chen, Erin, Liz, and Mauro all participate in a one-mile race. The average (arithmetic mean) for the
times of all girls is 6.5 minutes. If the average time for Blaire, Chen, and Erin is 6.7 minutes, what is the
average time for Liz and Mauro?
A.​ 6.1 minutes C.​ 6.3 minutes
B.​ 6.2 minutes D.​ 6.4 minutes

Question 13.
There were 10 softball teams in a tournament. The mean number of runs for all the teams is 6.4. Only 8 of the
teams made it to the playoffs. The average number of runs for the teams that made it to the playoffs was 7.1,
rounded to the nearest tenth. What was the combined number of runs for the two teams that did not make it to
the playoffs?
A.​ 5 C.​ 7
B.​ 6 D.​ 8

Question 14.
In a review of the cost of a gallon of gasoline from all service stations in a county, it was found that the mode of
the price of a gallon of gas was much lower than the average for the price of a gallon of gas. Which of the
following would explain the reason for this difference?
A. The average price of a gallon of gas was based on all the gas sold, but the mode was just based on some of
the prices.
B. The mode shows a cluster of prices for the lowest prices, while the mean shows a result for all the prices.
C. Mode and mean are both measures of central tendency, and it just happens that in this case the mode is lower.
D. The mean will always be more than the mode because the mean includes higher prices.

Question 15.
The table above shows the number of items that 130 customers purchased from a stationery store on Sunday.
Which of the following can be obtained from the information in the table?
I.​ The average (arithmetic mean) number of items
II.​ The median number of items
III.​ The mode of the number of items
A.​ I only C.​ III only
B.​ II only D.​ I and II only
2. Spread: Range and standard deviation
- Range = maximum value – minimum value
- A larger range indicates a greater spread in the data.
- Standard deviation measures how far the values in a data set typically lie from the mean: it is the square root of the average squared distance between a value and the mean.
- Larger standard deviation indicates greater spread in the data.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}}$$
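As a rough illustration, the range and the standard deviation can be computed directly from their definitions. The sketch below assumes the population form of the standard deviation (dividing by N) and reuses the list 12, 14, ..., 24 from the first section. The helper name `population_std` is hypothetical, and the result is cross-checked against `statistics.pstdev`.

```python
import math
from statistics import pstdev

def population_std(values):
    """Square root of the average squared distance from the mean."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [12, 14, 16, 18, 20, 22, 24]

# Range = maximum value - minimum value
data_range = max(data) - min(data)

# Squared distances from the mean 18 are 36, 16, 4, 0, 4, 16, 36 (sum 112),
# so the standard deviation is sqrt(112 / 7) = sqrt(16) = 4.
sigma = population_std(data)

print(data_range, sigma)
```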

Question 1.

The dot plot represents the 15 values in data set A. Data set B is created by adding 56 to each of the values in
data set A. Which of the following correctly compares the medians and the ranges of data sets A and B?
A. The median of data set B is equal to the median of data set A, and the range of data set B is equal to the
range of data set A.
B. The median of data set B is equal to the median of data set A, and the range of data set B is greater than the
range of data set A.
C. The median of data set B is greater than the median of data set A, and the range of data set B is equal to the
range of data set A.
D. The median of data set B is greater than the median of data set A, and the range of data set B is greater than
the range of data set A.

Question 2.

Each of the dot plots shown represents the number of glue sticks brought in by each student for two classes,
class A and class B. Which statement best compares the standard deviations of the numbers of glue sticks
brought in by each student for these two classes?
A. The standard deviation of the number of glue sticks brought in by each student for class A is less than the
standard deviation of the number of glue sticks brought in by each student for class B.
B. The standard deviation of the number of glue sticks brought in by each student for class A is equal to the
standard deviation of the number of glue sticks brought in by each student for class B.
C. The standard deviation of the number of glue sticks brought in by each student for class A is greater than the
standard deviation of the number of glue sticks brought in by each student for class B.
D. There is not enough information to compare these standard deviations.
Question 3.
The 22 students in a health class conducted an experiment in which they each recorded their pulse rates, in beats
per minute, before and after completing a light exercise routine. The dot plots below display the results.

Let 𝑠1 and 𝑟1 be the standard deviation and range, respectively, of the data before exercise, and let 𝑠2 and 𝑟2 be
the standard deviation and range, respectively, of the data after exercise. Which of the following is true?
A.​ 𝑠1 = 𝑠2 and 𝑟1 = 𝑟2 C.​ 𝑠1 > 𝑠2 and 𝑟1 > 𝑟2
B.​ 𝑠1 < 𝑠2 and 𝑟1 < 𝑟2 D.​ 𝑠1 ≠ 𝑠2 and 𝑟1 = 𝑟2

Question 4.
The dot plots represent the distributions of values in data sets A and B.
Which of the following statements must be true?
I.​ The median of data set A is equal to the median of data set B.
II.​ The standard deviation of data set A is equal to the standard deviation of data set B.
A.​ I only C.​ I and II
B.​ II only D.​ Neither I nor II

Question 5.
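Data set P

| Value     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----------|---|---|---|---|---|---|---|---|---|---|
| Frequency | 1 | 1 | 2 | 3 | 6 | 5 | 4 | 3 | 3 | 2 |

Data set Q

| Value     | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|-----------|---|---|---|---|---|---|----|----|----|----|
| Frequency | 1 | 1 | 2 | 3 | 6 | 5 | 4  | 3  | 3  | 2  |
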
Which statement best compares the mean 𝑎 and standard deviation 𝑏 of data set P with the mean 𝑐 and standard
deviation 𝑑 of data set Q?
A.​ 𝑎 < 𝑐; 𝑏 < 𝑑 C.​ 𝑎 > 𝑐; 𝑏 = 𝑑
B.​ 𝑎 < 𝑐; 𝑏 = 𝑑 D.​ 𝑎 > 𝑐; 𝑏 > 𝑑

Question 6.
The tables below give the distribution of low temperatures in degrees Celsius (°C) for City A and City B over
the same 31 days in August.
Which of the following is true about the data shown for these 31 days?
A. The standard deviation of temperatures in these cities cannot be calculated with the data provided.
B. The standard deviation of temperatures in City A is the same as that of City B.
C. The standard deviation of temperatures in City A is larger.
D. The standard deviation of temperatures in City B is larger.

Question 7.
Locks are sections of canals in which the water level can be mechanically changed to raise and lower boats. The
table below shows the number of locks for 10 canals in France.
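| Name   | Aisne | Alsace | Rhone | Centre | Garonne | Lalinde | Midi | Oise | Vosges | Sambre |
|--------|-------|--------|-------|--------|---------|---------|------|------|--------|--------|
| #Locks | 27    | 25     | 5     | 30     | 23      | 27      | 32   | 27   | 93     | 29     |
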
Removing which of the following two canals from the data would result in the greatest decrease in the standard
deviation of the number of locks in each canal?
A.​ Aisne and Lalinde C.​ Centre and Midi
B.​ Alsace and Garonne D.​ Rhone and Vosges

Question 8.
A data set of 27 different numbers has a mean of 33 and a median of 33. A new data set is created by adding 7
to each number in the original data set that is greater than the median and subtracting 7 from each number in the
original data set that is less than the median. Which of the following measures does NOT have the same value
in both the original and new data sets?
A.​ Median C.​ Sum of the numbers
B.​ Mean D.​ Standard deviation

Question 9.
The following table shows the number of books read by two book clubs over the past 6 months:

Which book club likely has a higher standard deviation for the number of books read per month?
A.​ Book Club A
B.​ Book Club B
C.​ The standard deviations are equal
D.​ It cannot be determined from the given information.
3. Outlier
- An outlier is a value in a data set that significantly differs from other values.
- Effect on the range and standard deviation: The inclusion of outliers increases the spread of data, leading
to larger range and standard deviation. Conversely, removing outliers decreases the spread of data, leading to
smaller range and standard deviation.
- Effect on the mean, median, and mode:

| A really high outlier                              | A really low outlier                               |
|----------------------------------------------------|----------------------------------------------------|
| The mean will increase significantly               | The mean will decrease significantly               |
| The median will increase slightly / stay unchanged | The median will decrease slightly / stay unchanged |
| The mode will stay the same                        | The mode will stay the same                        |
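
The effects of a high outlier can be seen in a small numerical experiment. The sketch below uses a made-up list of five values and a made-up high outlier of 100; neither comes from the worksheet. It compares the measures of center and spread before and after the outlier is included.

```python
from statistics import mean, median, multimode, pstdev

# Hypothetical data, invented for illustration only.
base = [20, 22, 22, 24, 26]
with_high = base + [100]  # same data plus one really high outlier

def summary(values):
    """Collect the measures of center and spread discussed above."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),
        "range": max(values) - min(values),
        "std": pstdev(values),
    }

before = summary(base)
after = summary(with_high)

# Print each measure without the outlier, then with it.
for key in before:
    print(f"{key:>6}: {before[key]} -> {after[key]}")
```

In this example the mean rises from 22.8 to about 35.7, the median moves only from 22 to 23, the mode stays at 22, and both the range and the standard deviation grow.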

Question 1.
Age of 25 babies when they started walking

The dot plot above gives the ages, in months, at which 25 babies began walking. Which of the following is true
about the mean and the median of the data?
A.​ The mean is greater than the median
B.​ The mean is less than the median
C.​ The mean is equal to the median
D.​ The relationship between the mean and the median cannot be determined from the dot plot.

Question 2.

Data set X and data set Y are represented by the dot plots shown. Which of the following is (are) true?
I.​ The median of data set X is greater than the median of data set Y.
II.​ The mean of data set X is greater than the mean of data set Y.
A.​ I only
B.​ II only
C.​ I and II
D.​ Neither I nor II
Question 3.
52, 58, 12, 62, 62, 70, 66, 68, 53, 67
The list above shows the number of points a basketball team scored in each of 10 games. During the 10 games,
the team scored a total of 570 points. If the outlier is removed from this data set, how will the mean and median
change?
A.​ Both the mean and the median will increase
B.​ Both the mean and the median will decrease
C.​ The mean will increase and the median will not change
D.​ The mean will decrease and the median will not change
Question 4.
Percentage of on-time flights

For each of the 50 largest airlines in the world, its percentage of on-time flights is plotted to the left to the
nearest 5 percent. According to the dot plot, which of the following is true?
A.​ There are an equal number of points to the left of 72.5 as to the right of 72.5, so the median percentage
of on-time flights is 72.5
B.​ There are an equal number of points to the left of 72.5 as to the right of 72.5, so the mean percentage of
on-time flights is 72.5
C.​ There is a smaller concentration of points on the left side of the dot plot, so the median percentage of
on-time flights is less than the mean.
D.​ There is a smaller concentration of points on the left side of the dot plot, so the mean percentage of
on-time flights is less than the median.
Question 5.

The dot plot depicts the heights in inches of players on a professional basketball team. What would happen to
the standard deviation of the data set if the lowest and highest heights were removed?
A.​ It would increase
B.​ It would decrease
C.​ It would remain the same
D.​ There is not enough information to determine the impact on the standard deviation of the data set.
Question 6.

A class used catapults to launch 2 kinds of gummy candies. The dot plots at left recorded the distances the
gummy candies travelled, in inches. Which statement best compares the standard deviations and the means of
the distances travelled of the 2 kinds of candies?
A.​ The worm gummy distances have a greater standard deviation and mean than the fish gummy distances
B.​ The worm gummy distances have a greater standard deviation, but a lower mean than the fish gummy
distances
C.​ The fish gummy distances have a greater standard deviation and mean than the worm gummy distances
D.​ The fish gummy distances have a greater standard deviation, but a lower mean than the worm gummy
distances.

Question 7.
An employer wanted to compare the commute times between the 1st and 2nd shift employees. Both shifts had a
mean commute of 17 minutes. The histograms below summarize the average commute times of the employees.
Which of the following statements best compares the standard deviations of the shifts?

A.​ The standard deviations are equal.


B.​ The standard deviation for the 1st shift employees is greater.
C.​ The standard deviation for the 2nd shift employees is greater.
D.​ Histograms do not provide enough information to compare standard deviation.
4. Boxplot

Question 1.

Which of the following statements about the data represented in the box plot above must be true?
A.​ There are more data between 61 and 84 than between 51 and 61.
B.​ There are no data between 37 and 51.
C.​ The mean of the data is 61
D.​ The range of the data is 50.

Question 2.

The box plot summarizes the distribution of 100 distinct values in a data set. Which of the following intervals
contains the most data values?
A.​ 10 ≤ 𝑥 ≤ 40
B.​ 40 ≤ 𝑥 ≤ 70
C.​ 70 ≤ 𝑥 ≤ 90
D.​ 90 ≤ 𝑥 ≤ 100

Question 3.

The box plots summarize the masses, in kilograms, of two groups of gazelles. Based on the box plots, which of
the following statements must be true?
A.​ The mean mass of group 1 is greater than the mean mass of group 2.
B.​ The mean mass of group 1 is less than the mean mass of group 2.
C.​ The median mass of group 1 is greater than the median mass of group 2.
D.​ The median mass of group 1 is less than the median mass of group 2.
Question 4.

Which of the following box plots could represent Amir’s survey data?
A.

B.

C.

D.
