Indian Institute of Technology Patna
CS244: Data Science
END SEM                                     26th April 2022
TIME: 3 HOURS                               Full Marks: 50
[Figure 1: Standard Normal Table]
[Figure 2: Chi-square Table]

 1. A lawyer commutes daily from his suburban home to his midtown office. The average time for a one-way trip is 24 minutes, with a standard deviation of 3.8 minutes. Assume the distribution of trip times to be normally distributed. If the office opens at 9:00 A.M. and the lawyer leaves his house at 8:40 A.M. daily, what percentage of the time is he late for work? [3]
    He is late whenever the trip takes longer than the 20 minutes available, so we want P(X > 20) = P(Z > (20 − 24)/3.8) = P(Z > −1.05). For z = 1.05 the table area is 0.8531, so P(Z > 1.05) = 1 − 0.8531 = 0.1469; by symmetry P(Z < −1.05) = 0.1469, and hence P(Z > −1.05) = 1 − 0.1469 = 0.8531. He is late about 85.31% of the time.
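As a quick numerical check of this calculation (a sketch assuming SciPy is available; the table value 0.8531 corresponds to rounding z to −1.05):

```python
from scipy.stats import norm

# P(trip > 20 min) for X ~ N(24, 3.8^2): the lawyer is late whenever
# the one-way trip exceeds the 20 minutes he allows himself.
p_late = norm.sf(20, loc=24, scale=3.8)   # survival function, 1 - CDF
print(f"P(late) = {p_late:.4f}")          # ~0.8538 exact; 0.8531 from the table
```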
 2. A certain machine makes electrical resistors having a mean resistance of 40 ohms and a standard deviation of 2 ohms. Assuming that the resistance follows a normal distribution, find the percentage of resistances exceeding 43 ohms if resistance is measured to the nearest ohm. [3]
    We assign a measurement of 43 ohms to all resistors whose resistances are greater than 42.5 and less than 43.5; we are actually approximating a discrete distribution by means of a continuous normal distribution. z = (43.5 − 40)/2 = 1.75, so P(X > 43.5) = P(Z > 1.75) = 1 − P(Z < 1.75) = 1 − 0.9599 = 0.0401, i.e., about 4.01% of resistances exceed 43 ohms.
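A matching check with the continuity correction (again assuming SciPy):

```python
from scipy.stats import norm

# "Measured to the nearest ohm" means a reading of 43 covers true
# resistances in (42.5, 43.5), so readings above 43 correspond to
# X > 43.5 under X ~ N(40, 2^2).
p = norm.sf(43.5, loc=40, scale=2)   # z = (43.5 - 40)/2 = 1.75
print(f"P(X > 43.5) = {p:.4f}")      # ~0.0401
```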
 3. The average height of females in the freshman class of a certain college has historically been 162.5 centimeters with a standard deviation of 6.9 centimeters. Is there reason to believe that there has been a change in the average height if a random sample of 50 females in the present freshman class has an average height of 165.2 centimeters? Take α = 0.05. State the null and alternative hypotheses, and find the critical value and the test statistic. [3]
    The hypotheses are
    H0: µ = 162.5 centimeters,
    H1: µ ≠ 162.5 centimeters.
    Now z = (165.2 − 162.5)/(6.9/√50) = 2.77. For α = 0.05 and a two-tailed test the critical value is 1.96. Since 2.77 > 1.96, we reject the null hypothesis in favour of the alternative hypothesis.
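The same one-sample z-test, scripted (assuming SciPy; math.sqrt(50) supplies the √50 in the standard error):

```python
import math
from scipy.stats import norm

xbar, mu0, sigma, n, alpha = 165.2, 162.5, 6.9, 50, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic, ~2.77
z_crit = norm.ppf(1 - alpha / 2)            # two-tailed critical value, ~1.96
print(f"z = {z:.2f}, critical = {z_crit:.2f}, reject H0: {abs(z) > z_crit}")
```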
 4. A manufacturer of car batteries claims that the life of the company's batteries is approximately normally distributed with a standard deviation equal to 0.9 year. If a random sample of 10 of these batteries has a standard deviation of 1.2 years, do you think that σ > 0.9 year? Use a 0.05 level of significance. [3]
    H0: σ² = 0.81,
    H1: σ² > 0.81.
    α = 0.05. Critical region: χ² > 16.919 (chi-square with n − 1 = 9 degrees of freedom).
    Computations: s² = 1.44 (since s = 1.2 is given), n = 10, and χ² = (9)(1.44)/0.81 = 16.0.
    Decision: the χ²-statistic is not significant at the 0.05 level, so we fail to reject H0.
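A check of the χ² computation (assuming SciPy for the critical value):

```python
from scipy.stats import chi2

s2, sigma0_2, n, alpha = 1.2**2, 0.9**2, 10, 0.05
stat = (n - 1) * s2 / sigma0_2          # (9)(1.44)/0.81 = 16.0
crit = chi2.ppf(1 - alpha, df=n - 1)    # 16.919 for 9 degrees of freedom
print(f"chi2 = {stat:.1f}, critical = {crit:.3f}, reject H0: {stat > crit}")
```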
 5. The average zinc concentration recovered from a sample of zinc measurements at 36 locations in a river is found to be 2.6 grams per milliliter. Find the 95% confidence interval for the mean zinc concentration in the river. Assume that the population standard deviation is 0.3. [3]
    The point estimate of µ is x̄ = 2.6, and the z value leaving an area of 0.025 to the right is z0.025 = 1.96. Hence the 95% confidence interval is
    2.6 − 1.96(0.3/√36) < µ < 2.6 + 1.96(0.3/√36),
    that is, 2.5 < µ < 2.7.
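The interval can be reproduced as follows (assuming SciPy for z0.025):

```python
import math
from scipy.stats import norm

xbar, sigma, n, conf = 2.6, 0.3, 36, 0.95
z = norm.ppf(1 - (1 - conf) / 2)         # 1.96
half = z * sigma / math.sqrt(n)          # 1.96 * 0.3 / 6 ~ 0.098
print(f"{xbar - half:.2f} < mu < {xbar + half:.2f}")   # 2.50 < mu < 2.70
```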
 6. Naive Bayes: Suppose we are given the following dataset, where A, B, C are input binary random variables and y is a binary output whose value we want to predict. How would a naive Bayes classifier predict y given this input: A = 0, B = 0, C = 1? Assume that in case of a tie the classifier always prefers to predict 0 for y. [4]
    [Dataset table]
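The exam's dataset table is not reproduced in this copy, so the sketch below runs on a hypothetical table of (A, B, C, y) rows purely to illustrate the naive Bayes mechanics, including the tie-break toward y = 0:

```python
import numpy as np

# Hypothetical rows (A, B, C, y); the exam's actual table is not reproduced.
data = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 1, 1],
                 [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]])
X, y = data[:, :3], data[:, 3]
query = np.array([0, 0, 1])

scores = {}
for label in (0, 1):
    rows = X[y == label]
    prior = len(rows) / len(X)
    # Naive assumption: multiply per-feature conditionals P(x_j | y).
    likelihood = np.prod([np.mean(rows[:, j] == query[j]) for j in range(3)])
    scores[label] = prior * likelihood

# Tie-break: prefer y = 0, hence >= when comparing the score for 0.
pred = 0 if scores[0] >= scores[1] else 1
print(scores, "->", pred)
```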
 7. Deep Learning: Suppose you are given the predictions of n different experts (or automated learners) as to whether a given email message is SPAM (1) or EMAIL (0). Your goal is to output a single prediction per message that is as accurate as possible. For this purpose, you would like to implement a majority-voting mechanism: if more than half of the experts predict SPAM, then your final prediction should be SPAM for that instance; otherwise, the final prediction should be EMAIL. (a) Suggest a neural network that implements majority voting when there are 4 experts overall (named A, B, C, D). Specify the network structure and weights. (b) Explain briefly how to adapt the network structure and weights to the general case of n experts. [3+1]
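One possible answer sketch, covering both parts at once: a single threshold unit with all weights equal to 1 and bias −n/2 (an illustration, not the only valid construction):

```python
import numpy as np

def majority_vote(preds):
    """Single threshold unit: all weights 1, fires iff more than half vote 1."""
    n = len(preds)
    weights = np.ones(n)
    bias = -(n / 2)                      # net input > 0 iff sum(preds) > n/2
    return int(weights @ preds + bias > 0)

# 4 experts A, B, C, D: SPAM requires at least 3 of the 4 votes.
print(majority_vote(np.array([1, 1, 1, 0])))   # 1 (SPAM)
print(majority_vote(np.array([1, 1, 0, 0])))   # 0 (EMAIL: not MORE than half)
```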
 8. Linear regression: We are interested here in a particular 1-dimensional linear regression problem. The dataset corresponding to this problem has n examples (x1, y1), …, (xn, yn), where xi and yi are real numbers for all i. Let w∗ = [w0∗, w1∗]ᵀ be the least-squares solution; in other words, w∗ minimizes J(w) = (1/n) Σᵢ (yi − w0 − w1·xi)², where the sum runs over i = 1, …, n. You can assume for our purposes here that the solution is unique. Writing x̄ = (Σᵢ xi)/n, find the value of each of the following expressions, with justification:
    (a) (1/n) Σᵢ (yi − w0∗ − w1∗·xi)(xi − x̄)
    (b) (1/n) Σᵢ (yi − w0∗ − w1∗·xi)(w0∗ + w1∗·xi)
    [2+2]
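A numerical sanity check with NumPy on randomly generated data: at the least-squares solution the normal equations make the residuals orthogonal to both the constant column and x, so both expressions come out (numerically) zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3 * x + 1 + rng.normal(size=50)

# Least-squares fit with design matrix [1, x].
X = np.column_stack([np.ones_like(x), x])
(w0, w1), *_ = np.linalg.lstsq(X, y, rcond=None)

r = y - w0 - w1 * x                      # residuals at the optimum
print(np.mean(r * (x - x.mean())))       # expression (a): ~0
print(np.mean(r * (w0 + w1 * x)))        # expression (b): ~0
```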
 9. Decision Tree: You are given a dataset for training a decision tree. The goal is to predict the label (‘+’ or ‘−’) given the features A, B, and C.
    [Dataset table]
    (A) First, consider building a decision tree by greedily splitting according to information gain. (a) Which features could be at the root of the resulting tree? (b) How many edges are there in the longest path of the resulting tree?
    (B) Now, consider building a decision tree with the smallest possible height. (a) Which features could be at the root of the resulting tree? (b) How many edges are there in the longest path of the resulting tree? [2+2+2+1]
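The exam's training table is again not reproduced, so the helper below computes information gain on a hypothetical set of (A, B, C, y) rows, just to show the quantity that the greedy root split maximizes:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG = H(y) - sum over values v of P(feature = v) * H(y | feature = v)."""
    h = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

# Hypothetical binary rows (A, B, C, y); the exam's table is not reproduced.
data = np.array([[0, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 0, 0]])
for j, name in enumerate("ABC"):
    print(name, information_gain(data[:, j], data[:, 3]))
```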
10. PageRank: Consider the following diagram, which depicts the connectivity among 4 web pages (nodes 1-4). You need to compute the PageRank of each node. Assume the damping factor is 1. [Hint: try to avoid the iterative method.] [4]
    [Diagram: directed links among nodes 1, 2, 3, 4]
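The figure's link structure is not recoverable from this copy, so the sketch below uses a hypothetical set of links; the point is the hinted non-iterative method: with damping factor 1, the PageRank vector is the eigenvector of the column-stochastic link matrix for eigenvalue 1:

```python
import numpy as np

# Hypothetical links (the original figure's edges are not reproduced).
links = {1: [2, 3], 2: [3], 3: [4], 4: [1]}
n = 4
M = np.zeros((n, n))                     # column-stochastic link matrix
for src, outs in links.items():
    for dst in outs:
        M[dst - 1, src - 1] = 1 / len(outs)

# Damping factor 1: PageRank solves M r = r, i.e. the eigenvalue-1 eigenvector.
vals, vecs = np.linalg.eig(M)
r = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
print(r / r.sum())                       # normalised so the ranks sum to 1
```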
11. SVM: Suppose we only have four training examples in two dimensions, as follows: P1 = (0, 0), P2 = (2, 2), P3 = (h, 1), P4 = (0, 3), where 0 ≤ h ≤ 3. The positive examples are P1 and P2, and the other two points are negative examples. (a) How large can h ≥ 0 be so that the training points are still linearly separable? (b) What is the margin achieved by the maximum-margin boundary, as a function of h? (c) Assume that we can only observe the second component of the input vectors. Without the other component, the labeled training points reduce to (0, +), (2, +), (1, −), and (3, −). What is the lowest order p of polynomial kernel that would allow us to correctly classify these points? [2+2+1]
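A numerical probe for parts (a) and (b), using scikit-learn with h = 0.5 (an arbitrary choice in the separable range); a large C approximates the hard-margin SVM:

```python
import numpy as np
from sklearn.svm import SVC

h = 0.5                                   # hypothetical value, 0 <= h <= 3
X = np.array([[0, 0], [2, 2], [h, 1], [0, 3]])
y = np.array([1, 1, -1, -1])

# Hard-margin linear SVM, approximated by a very large C.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
margin = 2 / np.linalg.norm(clf.coef_)    # width between the two margin lines
print(f"h = {h}: margin = {margin:.3f}")
```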
12. K-Means clustering: Consider performing K-Means clustering on a one-dimensional dataset containing four data points, 5, 7, 10, 12, using k = 2, Euclidean distance, and the initial cluster centers c1 = 3.0 and c2 = 13.0. (a) What are the initial cluster assignments? (That is, which examples are in cluster c1 and which examples are in cluster c2?) (b) What are the new cluster centers after making the assignments in (a)? (c) State true or false: K-Means clustering is guaranteed to converge. [1+1+1]
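One K-Means iteration, computed directly in NumPy:

```python
import numpy as np

X = np.array([5.0, 7.0, 10.0, 12.0])
centers = np.array([3.0, 13.0])

# Step (a): assign each point to the nearest center.
assign = np.argmin(np.abs(X[:, None] - centers[None, :]), axis=1)
print("assignments:", assign)            # [0 0 1 1]: {5, 7} to c1, {10, 12} to c2

# Step (b): recompute each center as the mean of its assigned points.
centers = np.array([X[assign == k].mean() for k in (0, 1)])
print("new centers:", centers)           # [ 6. 11.]
```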
13. Spectral clustering: Write the Laplacian matrix of the following graph (Fig: SC) for spectral clustering. All the edges have weight 1 (the similarity measure). [2]
    [Fig: SC, a graph on nodes 1, 2, 3, 4; edges as in the original figure]
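Fig: SC's exact edge set is not recoverable from this copy, so the sketch below builds the unnormalised Laplacian L = D − W for a hypothetical 4-cycle on the nodes:

```python
import numpy as np

# Hypothetical edges (the original figure's edges are not reproduced).
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
n = 4
W = np.zeros((n, n))                     # weighted adjacency (similarity) matrix
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0

D = np.diag(W.sum(axis=1))               # degree matrix
L = D - W                                # unnormalised graph Laplacian
print(L)
```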
14. Linear algebra: Consider the following set of points x (Fig: LA, a rectangular region), transformed using a matrix A having one eigenvalue equal to 2 with corresponding eigenvector e1 = [0.707, −0.707]ᵀ; the other eigenvalue is 0. Draw the plot of Ax. [2]
    [Fig: LA, a rectangular region of points x]
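The eigendata determine A only up to the unspecified second eigenvector; one consistent choice, assuming A is symmetric, is the rank-1 matrix A = 2·e1e1ᵀ, under which the rectangle collapses onto the line y = −x (the rectangle corners below are hypothetical):

```python
import numpy as np

# e1 is the unit-norm eigenvector for eigenvalue 2; the other eigenvalue is 0.
# Assuming A is symmetric, its spectral decomposition gives A = 2 * e1 e1^T.
e1 = np.array([0.707, -0.707])
A = 2 * np.outer(e1, e1)

# Corners of a hypothetical rectangular region of input points x.
corners = np.array([[0, 0], [4, 0], [4, 2], [0, 2]], dtype=float)
print(corners @ A.T)   # every image lies on the line y = -x (direction of e1)
```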