Universidad Carlos III de Madrid
Bachelor in Data Science and Engineering
Final Exam, January 2022
Statistical Learning
Time: 2h
1. (2 points) Decide whether the following statements are true or false, arguing your answer:
(a) If a dataset is standardized, the correlations do not change.
(b) As the number of observations goes to infinity in a dataset, a model trained on that data will have lower bias.
(c) In clustering, there is no tool to estimate the number of clusters.
(d) In k-means, the cluster centers do not need to be observed data points.
(e) In a Lasso regression model, if we increase the penalty parameter, the prediction bias is increased and the variance is decreased.
(f) If a linear regression is trained using only half the data, the bias will be smaller.
(g) Support vector machines, like logistic regression models, give a probability distribution over the possible labels given a new observation.
(h) When the number of observations is large, overfitting is more likely.
(i) Holding out 20% of the observations for testing is faster but less accurate than performing 5-fold cross-validation.
(j) Both linear regression and logistic regression are special cases of neural networks.
Solution:
(a) True. Standardization affects the means and the variances, not the correlations.
(b) False. The bias is the same; the variance is smaller.
(c) False. There are several ways to estimate it. For instance, a Gaussian mixture model can be used to estimate the number of groups (see the sketch after this solution).
(d) True. The coordinates of the centroid are the (arithmetic) mean of the variables over all the observations in the cluster. It does not need to be one of the data points.
(e) True.
(f) False. Bias depends on the model used, not on the number of observations.
(g) False. SVMs do not give a probability distribution, just the label predictions.
(h) False. Overfitting is more likely when the number of variables (features) is high.
(i) True.
(j) True.
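A minimal Python sketch (not part of the original exam) illustrating the idea in answer (c): a Gaussian mixture model can be used to estimate the number of groups, here by picking the number of components with the lowest BIC. The synthetic data and the candidate range 1 to 6 are illustrative assumptions.

```python
# Sketch: estimating the number of clusters with a Gaussian mixture model.
# The synthetic data and the range 1..6 are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit mixtures with 1..6 components and keep the BIC of each fit.
bics = []
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gm.bic(X))

best_k = int(np.argmin(bics)) + 1  # BIC is minimized at the chosen number of groups
print("Estimated number of groups:", best_k)
```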
2. (1 point) The Mahalanobis distance between two observations $x$ and $y$ in a population is defined as
$$d(x, y) = (x - y)^T \Sigma^{-1} (x - y),$$
where $\Sigma$ denotes the covariance matrix of the population.
Prove that this distance is equivalent to the Euclidean distance in a transformed space (indicating the corresponding transformation).
Solution: The Mahalanobis distance is defined as
$$d(x, y) = (x - y)^T \Sigma^{-1} (x - y),$$
which can be expanded in the following way:
$$\begin{aligned}
d(x, y) &= (x - y)^T \Sigma^{-1} (x - y) \\
&= (x - y)^T \Sigma^{-1/2} \Sigma^{-1/2} (x - y) \\
&= \left(\Sigma^{-1/2}(x - y)\right)^T \Sigma^{-1/2}(x - y) \\
&= \left(\Sigma^{-1/2}x - \Sigma^{-1/2}y\right)^T \left(\Sigma^{-1/2}x - \Sigma^{-1/2}y\right).
\end{aligned}$$
The last expression is equivalent to the (squared) Euclidean distance between the vectors $\Sigma^{-1/2}x$ and $\Sigma^{-1/2}y$. Hence, the transformation is $x' = \Sigma^{-1/2}x$.
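A small Python sketch (not part of the original exam) that checks the derivation numerically: the Mahalanobis distance equals the squared Euclidean distance between the points mapped through $\Sigma^{-1/2}$. The covariance matrix and the two points are arbitrary illustrative values.

```python
# Numerical check: Mahalanobis distance = squared Euclidean distance after
# transforming the points with Sigma^(-1/2). Values below are illustrative.
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # assumed population covariance matrix
x = np.array([1.0, 2.0])
y = np.array([3.0, 0.0])

d = x - y
mahalanobis_sq = d @ np.linalg.inv(Sigma) @ d          # (x - y)^T Sigma^{-1} (x - y)

vals, vecs = np.linalg.eigh(Sigma)
T = vecs @ np.diag(vals ** -0.5) @ vecs.T              # Sigma^{-1/2}
euclidean_sq = np.sum((T @ x - T @ y) ** 2)            # squared Euclidean distance

print(mahalanobis_sq, euclidean_sq)                    # the two values coincide
```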
3. (2 points)
(a) In what sense are the linear combinations defined by PCA optimal?
(b) Which are the main differences between Principal Component Analysis and Factor Analysis?
(c) When is it more convenient to apply PCA (with respect to FA)?
(d) If PCA is performed on an uncorrelated data set, which are the variances of the principal components? Why?
Solution:
(a) PCA finds the linear combinations of the responses that have most of the variance.
(b) In FA, the focus is to explain the correlations between indicators due to common factors (latent variables), whereas in PCA it is to explain the total variance. In PCA, the factors (components) are uncorrelated, whereas in FA they are not. In PCA the factors (components) are linear combinations of the indicators, whereas in FA the indicators are linear combinations of the factors. That implies FA is more interpretable, but the definition of the factors is not explicit (more black-box).
(c) When the indicators are causing the latent factors.
(d) The variances are equal to the eigenvalues of the correlation matrix. Since the data set is uncorrelated, that matrix is the identity, so all its eigenvalues (and hence the variances of the components) are equal to 1.
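A short Python sketch (not part of the original exam) illustrating answer (d): for simulated uncorrelated data, the variances of the principal components equal the eigenvalues of the correlation matrix and are all close to 1. The simulated data set is an illustrative assumption.

```python
# Sketch for answer (d): PCA on (simulated) uncorrelated data. The component
# variances match the eigenvalues of the correlation matrix, all close to 1.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))           # 4 independent (uncorrelated) variables

Z = StandardScaler().fit_transform(X)    # standardize, i.e. work with correlations
pca = PCA().fit(Z)

eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
print(pca.explained_variance_)           # component variances, all close to 1
print(eigvals)                           # eigenvalues of the correlation matrix
```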
4. (1 point) Which of the following figures correspond to possible values that PCA may return for the first principal component? Explain your answer.
[Figure: several labelled panels showing candidate first principal components; the image did not survive extraction.]
Solution: a) and c), because eigenvectors (or principal components) are unique except for the sign.
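A tiny Python sketch (not part of the original exam) of the reason given in the solution: an eigenvector of the covariance matrix is only determined up to its sign, so a principal component and its negation are both valid. The matrix below is an arbitrary illustrative covariance.

```python
# Sketch: eigenvectors (principal components) are unique only up to the sign.
import numpy as np

S = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # arbitrary illustrative covariance matrix
vals, vecs = np.linalg.eigh(S)
v = vecs[:, -1]                          # first principal direction

# Both v and -v satisfy the eigenvector equation S v = lambda v.
print(np.allclose(S @ v, vals[-1] * v))          # True
print(np.allclose(S @ (-v), vals[-1] * (-v)))    # True
```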
5. (1 point) Which are the advantages of hierarchical clustering with respect to partitioning methods?
Solution: No need to pre-define the number of clusters. Easy to understand the output. Clusters are more interpretable.
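A brief Python sketch (not part of the original exam) of the first advantage: hierarchical clustering builds one merge tree (dendrogram) that can be cut afterwards at any number of clusters, so k does not have to be fixed in advance. The synthetic data are an illustrative assumption.

```python
# Sketch: one hierarchical tree, cut afterwards at different numbers of clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

Z = linkage(X, method="ward")            # the full merge tree, computed once

# Cutting the same tree at different levels yields different numbers of clusters.
labels_2 = fcluster(Z, t=2, criterion="maxclust")
labels_3 = fcluster(Z, t=3, criterion="maxclust")
print(len(set(labels_2)), len(set(labels_3)))    # 2 and 3
```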
6. (1 point) Consider the next training set for classification:
[Training-set figure with labelled points; it did not survive extraction.]
What is the prediction for a 3-nearest neighbor classifier at the point (1, 1)?
Solution: +
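A Python sketch (not part of the original exam) of the majority-vote mechanics behind a 3-nearest-neighbor prediction at (1, 1). The original training set was lost in the scan, so the points and labels below are purely hypothetical and chosen only so that the prediction comes out as "+".

```python
# Sketch: mechanics of a 3-nearest-neighbor prediction at (1, 1).
# Hypothetical training points; the exam's actual training set is not available.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0, 1], [1, 0], [2, 2], [3, 3], [-1, -1]])
y_train = np.array(["+", "+", "+", "-", "-"])

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[1, 1]]))   # majority label among the 3 closest training points
```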
7. (1 point) For the following confusion matrix, does this classifier have a better accuracy than that of the majority-class benchmark?

                Predicted
                 0     1
    Actual  0   30    12
            1    8    56

Solution: Accuracy of the classifier: (30 + 56)/106 ≈ 81%; accuracy of the naive (majority-class) benchmark: (8 + 56)/106 ≈ 60%. So yes, the classifier is better.
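A small Python sketch (not part of the original exam) reproducing the arithmetic in the solution directly from the confusion matrix given in the statement.

```python
# Sketch: classifier accuracy vs. majority-class benchmark from the confusion matrix.
import numpy as np

#                 predicted 0  predicted 1
cm = np.array([[30, 12],    # actual 0
               [ 8, 56]])   # actual 1

n = cm.sum()                                   # 106 observations
classifier_acc = np.trace(cm) / n              # (30 + 56) / 106 ≈ 0.81
majority_acc = cm.sum(axis=1).max() / n        # (8 + 56) / 106 ≈ 0.60

print(classifier_acc, majority_acc)            # the classifier is better
```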
8. (1 point) Given the following coefficients of a logistic regression model trying to predict the target Diabetes:

    Intercept   SugarInBlood   Age     Weight
    4.6         2.34           -0.12   0.35

write precisely the formula to compute the probability of having Diabetes.
Solution: Let p be the probability of having Diabetes. Given the regressors, the model for p is
$$\log\left(\frac{p}{1-p}\right) = 4.6 + 2.34 \cdot \text{SugarInBlood} - 0.12 \cdot \text{Age} + 0.35 \cdot \text{Weight}.$$
Taking the exponential on both sides and transforming:
$$\frac{p}{1-p} = \exp\left\{4.6 + 2.34 \cdot \text{SugarInBlood} - 0.12 \cdot \text{Age} + 0.35 \cdot \text{Weight}\right\}$$
$$p = (1 - p)\exp\left\{4.6 + 2.34 \cdot \text{SugarInBlood} - 0.12 \cdot \text{Age} + 0.35 \cdot \text{Weight}\right\}$$
$$p = \frac{\exp\left\{4.6 + 2.34 \cdot \text{SugarInBlood} - 0.12 \cdot \text{Age} + 0.35 \cdot \text{Weight}\right\}}{1 + \exp\left\{4.6 + 2.34 \cdot \text{SugarInBlood} - 0.12 \cdot \text{Age} + 0.35 \cdot \text{Weight}\right\}}.$$
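A minimal Python sketch (not part of the original exam) that evaluates this probability using the coefficients from the table; the feature values of the example patient are arbitrary assumptions.

```python
# Sketch: evaluating the fitted probability of Diabetes from the logistic model.
# Coefficients are taken from the statement; the example inputs are assumptions.
import math

def prob_diabetes(sugar_in_blood, age, weight):
    eta = 4.6 + 2.34 * sugar_in_blood - 0.12 * age + 0.35 * weight  # linear predictor
    return math.exp(eta) / (1 + math.exp(eta))                      # logistic transform

print(prob_diabetes(sugar_in_blood=1.0, age=50.0, weight=70.0))
```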