Basic Statistics (Module -2)
Q1) Identify the Data type (Continuous/Discrete) for the Following:
Activity                                Data Type
Number of beatings from Wife            Discrete
Results of rolling a die                Discrete
Weight of a person                      Continuous
Weight of Gold                          Continuous
Distance between two places             Continuous
Length of a leaf                        Continuous
Dog's weight                            Continuous
Blue Color                              Discrete
Number of kids                          Discrete
Number of tickets in Indian railways    Discrete
Number of times married                 Discrete
Gender (Male or Female)                 Discrete
Voltage                                 Continuous
Speed of the car                        Continuous
Distance between planets                Continuous
The size of a two-bedroom flat          Continuous
Wind speed                              Continuous
Facebook likes                          Discrete
Votes in election                       Discrete
Make-up kits purchased                  Discrete
Death toll in flood disaster            Discrete
The waiting time of customers in bank   Continuous
Price of iPhone in the market           Continuous
Stolen Cars                             Discrete
© 2013 - 2020 360DigiTMG. All Rights Reserved.
Q2) Identify the Data type of each of the following, choosing from:
Nominal (names/labels), Ordinal (order/direction), Interval (ordered with equal spacing, but no true zero),
Ratio (equal spacing with a true zero, so ratios are meaningful)
Data                                    Data Type
Gender                                  Nominal
High School Class Ranking               Ordinal
Celsius Temperature                     Interval
Weight                                  Ratio
Hair Color                              Nominal
Socioeconomic Status                    Ordinal
Fahrenheit Temperature                  Interval
Height                                  Ratio
Type of living accommodation            Nominal
Level of Agreement                      Ordinal
IQ (Intelligence Scale)                 Interval
Sales Figures                           Ratio
Blood Group                             Nominal
Time of Day                             Ordinal
Time on a Clock with Hands              Interval
Number of Children                      Ratio
Religious Preference                    Nominal
Barometer Pressure                      Ratio
SAT Scores                              Interval
Years of Education                      Ratio
Size of egg                             Ordinal
Monthly Income                          Ratio
Unemployment rate                       Ratio
Military Rank                           Ordinal
Shoe size                               Ordinal
Pulse rate                              Ratio
Vital capacity                          Ratio
Favorite candy bar                      Nominal
Name of the Grains                      Nominal
Pesticides level                        Ratio
Tribe of origin                         Nominal
Help Desk Service Satisfaction Score    Ordinal
Ethnicity                               Nominal
Marital status                          Nominal
Type of Residence                       Nominal
Swimming level                          Ordinal
Amount of Money                         Ratio
Colors of paint                         Nominal
Weekly Food spending                    Ratio
Q3) Identify whether the Data is Qualitative or Quantitative for the Following:
Data                                        Data Type (Qualitative/Quantitative)
I bought Strawberry lipstick today          Qualitative
Happiness rating                            Quantitative
Duration of red-light signal                Quantitative
I like butterscotch ice cream               Qualitative
Setosa belongs to Iris family of flowers    Qualitative
Cold coffee                                 Qualitative
The Tea smells good                         Qualitative
Dress Size                                  Quantitative
Q4) Identify whether the Data is Categorical or Numerical for the Following:
Data                            Data Type (Categorical / Numerical)
Product type                    Categorical
Native language                 Categorical
Type of teaching approach       Categorical
Virus in a System               Categorical
Covid-19 Positive Cases         Numerical
Lockdown Days                   Numerical
Q5) Identify whether the Data is Structured or Unstructured for the following:
Data                            Data Type (Structured/Unstructured)
Credit card numbers             Structured
Transaction information         Structured
Text files                      Unstructured
Images                          Unstructured
Music files                     Unstructured
Q6) Three Coins are tossed, find the probability that two heads and one tail are obtained?
Ans) Sample space = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, so the total number of outcomes = 8
Favourable outcomes (two heads and one tail) = {HHT, HTH, THH} = 3
Probability = 3/8 = 0.375
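The count above can be cross-checked by brute-force enumeration. The following is a minimal Python sketch, assuming three fair coins; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 = 8 equally likely outcomes of tossing three fair coins.
sample_space = list(product("HT", repeat=3))

# Favourable outcomes: exactly two heads (and therefore exactly one tail).
favourable = [outcome for outcome in sample_space if outcome.count("H") == 2]

probability = Fraction(len(favourable), len(sample_space))
print(["".join(o) for o in favourable])
print(probability, float(probability))
```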
Q7) Two Dice are rolled, find the probability that sum is
a) Equal to 1
b) Less than or equal to 4
c) Sum is divisible by 2 and 3
Ans) Total number of outcomes when two dice are rolled=6*6=36.
(1, 1)(1, 2)(1, 3)(1, 4)(1, 5)(1, 6)
(2, 1)(2, 2)(2, 3)(2, 4)(2, 5)(2, 6)
(3, 1)(3, 2)(3, 3)(3, 4)(3, 5)(3, 6)
(4, 1)(4, 2)(4, 3)(4, 4)(4, 5)(4, 6)
(5, 1)(5, 2)(5, 3)(5, 4)(5, 5)(5, 6)
(6, 1)(6, 2)(6, 3)(6, 4)(6, 5)(6, 6)
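a) Sum equal to 1: the smallest possible sum is 2, so the probability is 0/36 = 0
b) Sum less than or equal to 4: (1, 1)(1, 2)(1, 3)(2, 1)(2, 2)(3, 1), so the probability is 6/36 = 1/6
c) Sum is divisible by both 2 and 3, i.e. divisible by 6. The sums for the 36 outcomes are:
{ 2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12}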
A sum of 6 occurs 5 times and a sum of 12 occurs once, so Probability = 6/36 = 1/6
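All three parts can be checked by enumerating the 36 outcomes. This is a small Python sketch, assuming two fair six-sided dice; the helper name prob is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability that the sum of the two dice satisfies `event`."""
    hits = sum(1 for a, b in outcomes if event(a + b))
    return Fraction(hits, len(outcomes))

p_sum_is_1 = prob(lambda s: s == 1)                      # part a
p_sum_le_4 = prob(lambda s: s <= 4)                      # part b
p_div_2_and_3 = prob(lambda s: s % 2 == 0 and s % 3 == 0)  # part c

print(p_sum_is_1, p_sum_le_4, p_div_2_and_3)
```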
Q8) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at random. What is the
probability that none of the balls drawn is blue?
Ans) Total number of events = nCr = 7C2 = 7! / (2! * 5!) = 21
Interested events (both balls drawn from the 5 non-blue balls) = 5C2 = 5! / (2! * 3!) = 10
Probability that none of the balls is blue = 10/21 = 0.476
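The two combination counts can be verified with Python's math.comb. This is only a sketch of the same calculation; the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

red, green, blue = 2, 3, 2

# Ways to draw any 2 balls out of all 7.
total_ways = comb(red + green + blue, 2)

# Ways to draw 2 balls using only the 5 non-blue balls.
non_blue_ways = comb(red + green, 2)

p_no_blue = Fraction(non_blue_ways, total_ways)
print(total_ways, non_blue_ways, p_no_blue, round(float(p_no_blue), 3))
```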
Q9) Calculate the Expected number of candies for a randomly selected child
Below are the probabilities of count of candies for children (ignoring the nature of the child-Generalized
view)
CHILD   Candies count   Probability
A       1               0.015
B       4               0.20
C       3               0.65
D       5               0.005
E       6               0.01
F       2               0.120
Child A: probability of having 1 candy = 0.015
Child B: probability of having 4 candies = 0.20
Ans) Expected number = E(X) = sum of x * P(x) = 1*0.015 + 4*0.20 + 3*0.65 + 5*0.005 + 6*0.01 + 2*0.120 = 3.09
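The weighted sum can be reproduced in a few lines of Python. The sketch below also checks that the listed probabilities add up to 1; the dictionary layout is an illustrative choice, not part of the original answer.

```python
# child -> (candies count, probability), copied from the table in the question
candies = {
    "A": (1, 0.015),
    "B": (4, 0.20),
    "C": (3, 0.65),
    "D": (5, 0.005),
    "E": (6, 0.01),
    "F": (2, 0.120),
}

# A valid probability distribution must sum to 1.
total_probability = sum(p for _, p in candies.values())

# Expected value: each count weighted by its probability.
expected_candies = sum(x * p for x, p in candies.values())

print(round(total_probability, 6), round(expected_candies, 2))
```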
Q10) Calculate Mean, Median, Mode, Variance, Standard Deviation, Range & comment about the
values / draw inferences, for the given dataset
- For Points, Score, Weigh
Ans) # Measures of central tendency and dispersion
# For Points
> mean(ex1$points)
[1] 3.596563
> median(ex1$points)
[1] 3.695
> getmode(ex1$points)
[1] 3.92
> var(ex1$points)
[1] 0.2858814
> sd(ex1$points)
[1] 0.5346787
> range(ex1$points)
[1] 2.76 4.93
> #For Score
> mean(ex1$score)
[1] 3.21725
> median(ex1$score)
[1] 3.325
> getmode(ex1$score)
[1] 3.44
> var(ex1$score)
[1] 0.957379
> sd(ex1$score)
[1] 0.9784574
> range(ex1$score)
[1] 1.513 5.424
> #For Weight
> mean(ex1$weight)
[1] 17.84875
> median(ex1$weight)
[1] 17.71
> getmode(ex1$weight)
[1] 17.02
> var(ex1$weight)
[1] 3.193166
> sd(ex1$weight)
[1] 1.786943
> range(ex1$weight)
[1] 14.5 22.9
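Inferences: in each column the mean and the median are close to each other, which suggests that none of
the three variables is strongly skewed.
Note on the mode: R's built-in mode() function returns the storage mode of an object (e.g. "numeric"),
not the most frequent value. To get the statistical mode we first have to write our own function. In the
above code, getmode (it can be any name) is the name of that function.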
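The body of getmode is not shown in the transcript, so the following is only a Python sketch of what such a helper typically does: count each value and return the most frequent one. The function name get_mode and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def get_mode(values):
    """Return the most frequent value; on a tie, the value seen first wins."""
    counts = Counter(values)
    # max() returns the first key with the highest count, and Counter keeps
    # keys in first-seen order, so ties resolve to the earliest value.
    return max(counts, key=counts.get)

# Hypothetical sample, for illustration only.
sample = [3.90, 2.76, 3.92, 3.08, 3.92]
print(get_mode(sample))
```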
Q11) Calculate Expected Value for the problem below
a) The weights (X) of patients at a clinic (in pounds), are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected Value of the Weight of
that patient?
Ans: Each of the 9 patients is equally likely to be chosen, so
EV = Σx/n = (108+110+123+134+135+145+167+187+199) / 9 = 1308/9 = 145.33
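Because every patient is equally likely to be picked, the expected value is just the arithmetic mean. A minimal Python check, with illustrative variable names:

```python
from statistics import mean

weights = [108, 110, 123, 134, 135, 145, 167, 187, 199]  # pounds

# Each patient has probability 1/9, so E(X) equals the plain average.
expected_weight = mean(weights)

print(sum(weights), round(expected_weight, 2))
```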
1. Look at the data given below. Plot the data, find the outliers and find out μ, σ, σ²
Name of company         Measure X
Allied Signal           24.23%
Bankers Trust           25.53%
General Mills           25.41%
ITT Industries          24.14%
J.P. Morgan & Co.       29.62%
Lehman Brothers         28.25%
Marriott                25.81%
MCI                     24.39%
Merrill Lynch           40.26%
Microsoft               32.95%
Morgan Stanley          91.36%
Sun Microsystems        25.99%
Travelers               39.42%
US Airways              26.71%
Warner-Lambert          35.00%
Ans)
> boxplot(Book1$`Measure X`,horizontal = T)
Outlier: Morgan Stanley (91.36%), the only point that falls beyond the boxplot whiskers.
> mean(Book1$`Measure X`)
[1] 0.3327133
> var(Book1$`Measure X`)
[1] 0.02871466
> sd(Book1$`Measure X`)
[1] 0.169454
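The same figures can be cross-checked outside R. The Python sketch below assumes the common 1.5 × IQR rule for flagging outliers and uses inclusive quartiles, which for this dataset coincide with the hinges a boxplot uses. The mean, variance and standard deviation are the sample versions (divisor n - 1). The dictionary name measure_x is illustrative.

```python
from statistics import mean, quantiles, stdev, variance

# Measure X as fractions (24.23% -> 0.2423), copied from the table above.
measure_x = {
    "Allied Signal": 0.2423, "Bankers Trust": 0.2553, "General Mills": 0.2541,
    "ITT Industries": 0.2414, "J.P. Morgan & Co.": 0.2962,
    "Lehman Brothers": 0.2825, "Marriott": 0.2581, "MCI": 0.2439,
    "Merrill Lynch": 0.4026, "Microsoft": 0.3295, "Morgan Stanley": 0.9136,
    "Sun Microsystems": 0.2599, "Travelers": 0.3942, "US Airways": 0.2671,
    "Warner-Lambert": 0.3500,
}
values = sorted(measure_x.values())

# Quartiles and the 1.5 * IQR fences (assumed outlier rule).
q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = {name: x for name, x in measure_x.items()
            if x < lower_fence or x > upper_fence}

mu = mean(values)
sigma_squared = variance(values)  # sample variance
sigma = stdev(values)             # sample standard deviation

print(outliers)
print(round(mu, 7), round(sigma_squared, 8), round(sigma, 6))
```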
2. AT&T was running commercials in 1990 aimed at luring back customers who had switched to
one of the other long-distance phone service providers. One such commercial shows a
businessman trying to reach Phoenix and mistakenly getting Fiji, where a half-naked native on a
beach responds incomprehensibly in Polynesian. When asked about this advertisement, AT&T
admitted that the portrayed incident did not actually take place but added that this was an
enactment of something that “could happen.” Suppose that one in 200 long-distance telephone
calls is misdirected. What is the probability that at least one in five attempted telephone calls
reaches the wrong number? (Assume independence of attempts.)
Ans) Let A be the event that a single call is misdirected.
P(A) = 1/200 = 0.005
The probability that a single call reaches the right number is
1 - P(A) = 1 - 1/200 = 199/200 = 0.995
As the attempts are independent, the probability that all five calls are connected correctly is (0.995)^5,
so the probability that at least one of the five calls reaches the wrong number is
1 - (0.995)^5 = 0.02475, i.e. about a 2.5% chance.
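The complement calculation is easy to reproduce. A short Python sketch, assuming independent calls with a 1-in-200 misdirection rate as stated in the question; the variable names are illustrative.

```python
p_misdirected = 1 / 200   # chance that a single call is misdirected
n_calls = 5

# All five calls connect correctly only if each one does (independence).
p_all_correct = (1 - p_misdirected) ** n_calls

# "At least one wrong" is the complement of "all correct".
p_at_least_one_wrong = 1 - p_all_correct

print(round(p_all_correct, 5), round(p_at_least_one_wrong, 5))
```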
3. Returns on a certain business venture, to the nearest $1,000, are known to follow the following
probability distribution
X        P(x)
-2,000   0.1
-1,000   0.1
0        0.2
1,000    0.2
2,000    0.3
3,000    0.1
(i) What is the most likely monetary outcome of the business venture?
Ans) The most likely outcome of this business venture is a return of $2000 as it has the
highest probability of occurrence.
(ii) Is the venture likely to be successful? Explain
Ans) Success of the venture can be defined in multiple ways, but based on the data provided
we can treat a positive return as a measure of success.
The probability distribution gives the long-run chance of earning each value of
return x. Adding the probabilities of the positive returns gives
P(X > 0) = 0.2 + 0.3 + 0.1 = 0.6, so there is a 60% probability that the venture would be successful.
(iii) What is the long-term average earning of business ventures of this kind? Explain
Ans) The question asks about similar business ventures whose returns follow the same
distribution as this one. In that case the long-term average is the expected value of
the returns:
E(X) = (-2000*0.1)+(-1000*0.1)+(0*0.2)+(1000*0.2)+(2000*0.3)+(3000*0.1) = 800
Therefore the long-term average earning for ventures of this type would be around
$800.
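Parts (i) to (iii) all follow from the same probability table, which can be checked with a short Python sketch. The dictionary name returns is an illustrative choice.

```python
# return in dollars -> probability, copied from the distribution above
returns = {-2000: 0.1, -1000: 0.1, 0: 0.2, 1000: 0.2, 2000: 0.3, 3000: 0.1}

# (i) Most likely outcome: the return with the highest probability.
most_likely = max(returns, key=returns.get)

# (ii) Probability of a positive return.
p_positive = sum(p for x, p in returns.items() if x > 0)

# (iii) Long-term average: the probability-weighted mean of the returns.
expected_return = sum(x * p for x, p in returns.items())

print(most_likely, round(p_positive, 2), round(expected_return, 2))
```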
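(iv) What is a good measure of the risk involved in a venture of this kind? Compute this
measure
Ans) Risk stems from the variability of the returns around their expected value. Therefore a good
measure of the risk for a venture of this kind is the variance or standard deviation of the
variable X, computed with the probabilities P(x) as weights:
E(X^2) = (4000000*0.1)+(1000000*0.1)+(0*0.2)+(1000000*0.2)+(4000000*0.3)+(9000000*0.1) = 2800000
Var(X) = E(X^2) - [E(X)]^2 = 2800000 - 800^2 = 2160000
SD(X) = sqrt(2160000) = 1469.69
Note: calling sd(ex$x) and var(ex$x) on the six x values alone gives 1870.829 and 3500000, but those
figures treat every outcome as equally likely and ignore P(x).
A standard deviation of about $1,470 against an average return of $800 indicates that this venture
is highly risky.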
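For a discrete distribution, the variance weights each squared deviation by its probability. The Python sketch below computes that probability-weighted variance and standard deviation. For comparison, it also computes the unweighted sample standard deviation of the six x values, which matches the sd() figure of 1870.829 but ignores P(x). The variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# return in dollars -> probability, copied from the distribution above
returns = {-2000: 0.1, -1000: 0.1, 0: 0.2, 1000: 0.2, 2000: 0.3, 3000: 0.1}

# Probability-weighted mean, variance and standard deviation of X.
mean_x = sum(x * p for x, p in returns.items())
variance_x = sum(p * (x - mean_x) ** 2 for x, p in returns.items())
sd_x = sqrt(variance_x)

# Unweighted sample SD of the six outcomes (treats them as equally likely).
unweighted_sd = stdev(returns)

print(round(mean_x, 2), round(variance_x, 2), round(sd_x, 2))
print(round(unweighted_sd, 3))
```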