
Birla Institute of Technology & Science - Pilani, Hyderabad Campus

Summer Semester 2019-20


BITS F464: Machine Learning
Mid Semester Test
Type: Closed | Time: 90 mins | Max Marks: 60 | Date: 01.03.2020

All parts of the same question should be answered together.

1. a. Suppose we collect data for a group of students in a statistics class with variables X1 = hours
studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression model and obtain the
estimated coefficients w0 = −6, w1 = 0.05, w2 = 1. How many hours would a student with an undergrad
GPA of 3.5 need to study to have a 50% chance of getting an A in the class? [4 Marks]

Note: X1 and X2 are two features and Y is the target attribute.
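
For reference, a 50% chance corresponds to a zero logit, i.e. w0 + w1·X1 + w2·X2 = 0; a minimal Python sketch of that arithmetic (variable names are illustrative):

# Sketch: solve w0 + w1*x1 + w2*x2 = 0 for x1, then confirm the sigmoid
# returns 0.5 at that point.
import math

w0, w1, w2 = -6.0, 0.05, 1.0
x2 = 3.5                                    # undergrad GPA

x1 = -(w0 + w2 * x2) / w1                   # hours of study giving a zero logit
p = 1.0 / (1.0 + math.exp(-(w0 + w1 * x1 + w2 * x2)))
print(x1, p)                                # 50.0 hours, probability 0.5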


1.b. Let σ(a) = 1/(1 + exp(−a)) be the sigmoid function. Show that dσ(a)/da = σ(a)(1 − σ(a)). Using this
result and the chain rule of calculus, derive an expression for the gradient of the log-likelihood for logistic
regression. You may assume the number of features and training examples as needed. [6 Marks]
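
As a sanity check, the identity and the resulting gradient ∇ log L = Xᵀ(y − σ(Xw)) can be verified numerically; a short sketch, assuming N examples with D features, binary labels, and no regularization (all names illustrative):

# Sketch: finite-difference check of d(sigma)/da = sigma(a)(1 - sigma(a)) and
# of the log-likelihood gradient X^T (y - sigma(Xw)).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a, eps = 0.7, 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
print(abs(numeric - sigmoid(a) * (1 - sigmoid(a))))       # ~0

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                               # 5 examples, 3 features
y = rng.integers(0, 2, size=5).astype(float)
w = rng.normal(size=3)

def log_lik(w):
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

grad = X.T @ (y - sigmoid(X @ w))                         # analytic gradient
num_grad = np.array([(log_lik(w + eps * e) - log_lik(w - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.max(np.abs(grad - num_grad)))                    # ~0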

1.c. Suppose that we wish to predict whether a given stock will issue a dividend this year (“Yes” or “No”)
based on X, last year’s percent profit. We examine a large number of companies and discover that the
mean value of X for companies that issued a dividend was X̄ = 10 (X̄ denotes the sample mean), while the
mean for those that did not was X̄ = 0. In addition, the variance of X for these two sets of companies was
σ² = 36. Finally, 80% of companies issued dividends. Assuming that X follows a normal distribution, predict
the probability that a company will issue a dividend this year given that its percent profit was X = 4
last year. [6 Marks]
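
By Bayes' rule with Gaussian class-conditionals, P(Yes | X = 4) = 0.8·N(4; 10, 36) / (0.8·N(4; 10, 36) + 0.2·N(4; 0, 36)); a minimal numeric sketch of that computation:

# Sketch: posterior P(dividend = Yes | X = 4) under the stated Gaussian model.
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

prior_yes, prior_no = 0.8, 0.2
num = prior_yes * normal_pdf(4, 10, 36)
den = num + prior_no * normal_pdf(4, 0, 36)
print(num / den)                            # roughly 0.75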

2.a. Fisher’s linear discriminant: Suppose you have found the optimal direction, w, that maximizes the
separation between the projected class means while minimizing the within-class variance of the projected N
training examples with D features. Find the decision boundary in the original feature space and determine
whether or not the decision boundary is linear. [6 Marks]
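
For intuition, the Fisher direction is w ∝ S_W⁻¹(m2 − m1) and classification thresholds the projection wᵀx; a small numeric sketch on synthetic 2-D data (S_W is the within-class scatter matrix; the data and the midpoint threshold are purely illustrative):

# Sketch: compute the Fisher direction and threshold the projection; the
# resulting boundary w^T x = c is a hyperplane, i.e. linear in feature space.
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))     # class-1 samples
X2 = rng.normal(loc=[3.0, 2.0], size=(50, 2))     # class-2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_W, m2 - m1)                 # Fisher direction

c = w @ (m1 + m2) / 2                             # one common threshold choice
print(w, c, np.mean(X2 @ w >= c))                 # fraction of class 2 above threshold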

2.b. Suppose there is a learning problem with only one feature and K target classes. The prior probabilities
that a data point, x, belongs to class 1, class 2, …, class K are p1, p2, …, pK, respectively. Suppose the
feature in each class k follows a normal distribution with mean µ_k and variance σ_k². Find the posterior
probability of a test example with respect to each class. [6 Marks]
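
The general form is p(C_k | x) = p_k N(x; µ_k, σ_k²) / Σ_j p_j N(x; µ_j, σ_j²); a short sketch of that computation (function and argument names are illustrative):

# Sketch: posterior over K classes, each with a univariate Gaussian likelihood.
import numpy as np

def gaussian_posteriors(x, priors, means, variances):
    priors, means, variances = map(np.asarray, (priors, means, variances))
    likelihoods = np.exp(-(x - means) ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
    joint = priors * likelihoods                  # p_k * N(x; mu_k, sigma_k^2)
    return joint / joint.sum()                    # normalize over classes

# Example: the two-class setting of question 1.c recovered as a special case.
print(gaussian_posteriors(4.0, priors=[0.8, 0.2], means=[10.0, 0.0], variances=[36.0, 36.0]))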

2.c. Suppose you are solving a binary classification problem (classes C1 and C2) with one feature (x).
The class-conditional densities are normal: p(x|C1) has mean m1 and variance s², and p(x|C2) has mean
m2 and variance t². The prior probabilities are P(C1) = p1 and P(C2) = p2. Find the discriminant point
analytically. [6 Marks]
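
The discriminant point satisfies p1·N(x; m1, s²) = p2·N(x; m2, t²); taking logs gives a quadratic in x when s ≠ t. A numeric sketch with made-up parameter values:

# Sketch: solve the quadratic obtained by equating the two log joint densities.
import numpy as np

m1, s2, p1 = 0.0, 1.0, 0.5                  # hypothetical class-1 parameters
m2, t2, p2 = 2.0, 4.0, 0.5                  # hypothetical class-2 parameters

# log p1 - 0.5*log s2 - (x-m1)^2/(2 s2) = log p2 - 0.5*log t2 - (x-m2)^2/(2 t2)
a = 1 / (2 * t2) - 1 / (2 * s2)
b = m1 / s2 - m2 / t2
c = m2 ** 2 / (2 * t2) - m1 ** 2 / (2 * s2) + np.log(p1 / p2) + 0.5 * np.log(t2 / s2)
print(np.roots([a, b, c]))                  # candidate discriminant point(s)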

3.a. Suppose there are D binary features (taking only the values 1 or 0) for a learning problem. It so happens
that for each of the 2^D distinct training instances, the target attribute takes the value 1 if the number of
non-zero features is odd and the value 0 if the number of non-zero features is even. Prove or disprove the
following: “Irrespective of the value of D, there exists a linear discriminant function that separates the
positive and negative points.” [6 Marks]
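
For D = 2 this is the parity (XOR) problem; the sketch below illustrates, but does not prove, the claim for that case by brute-force search over a coarse grid of linear discriminants:

# Sketch: no linear discriminant on the grid classifies all four parity-labelled
# points correctly (illustration only; the written answer needs a general argument).
import itertools

points = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
grid = [i / 4 for i in range(-8, 9)]              # candidate weights in [-2, 2]

separable = any(
    all((w1 * x1 + w2 * x2 + b > 0) == bool(t) for (x1, x2), t in points)
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print(separable)                                  # False
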
3.b. Derive the loss function of a two-layer network with K target variables for each of the following
problems: [6 Marks]
1. Regression
2. Standard multiclass classification problem in which each input is assigned to one of K mutually
exclusive classes.
Also write down the output activation function for each of the above problems.
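
For reference, the standard pairings are a linear output with a sum-of-squares loss for regression and a softmax output with a cross-entropy loss for 1-of-K classification; a minimal sketch of both losses (array shapes and values are illustrative):

# Sketch: sum-of-squares loss (regression) and softmax cross-entropy loss
# (1-of-K classification) for a batch of network outputs.
import numpy as np

def sum_of_squares(y_pred, y_true):
    return 0.5 * np.sum((y_pred - y_true) ** 2)

def softmax_cross_entropy(logits, one_hot_targets):
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.sum(one_hot_targets * np.log(probs))

print(sum_of_squares(np.array([[1.2, -0.3]]), np.array([[1.0, 0.0]])))
print(softmax_cross_entropy(np.array([[2.0, 0.1, -1.0]]), np.array([[1, 0, 0]])))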

3.c. Suppose you are given a two-layer network with D features, M hidden nodes and K target attributes.
a. Find the number of parameters (or weights) of the network. [1 Mark]
b. If the activation function for each of the hidden units is assumed to be the tanh function, derive a lower
bound on the number of weight vectors that give rise to the same error. [3 Marks]
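
Assuming bias weights are included, the count in part (a) is (D + 1)M + (M + 1)K, and the tanh sign-flip and hidden-unit permutation symmetries behind part (b) give at least 2^M · M! equivalent weight vectors; a short numeric sketch with illustrative sizes:

# Sketch: parameter count and symmetry count for a two-layer network with biases.
import math

D, M, K = 10, 5, 3                          # illustrative sizes
n_params = (D + 1) * M + (M + 1) * K        # input-to-hidden plus hidden-to-output
print(n_params)                             # 73 for these sizes
print((2 ** M) * math.factorial(M))         # 3840 equivalent weight vectors for M = 5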

4.a. The following data was collected on several different days, along with whether people rated the day
as “nice” or “not nice”. Temperature can be either “warm” or “cold”, Precipitation can be either “low” or
“high”, and Foggy? can be either “yes” or “no”. [6 Marks]

Suppose now you are given a new description of a day:


D7: Temperature = warm, Precipitation = high, Foggy? = yes, Rating = ?

How would a Naive Bayes classifier, trained on D1-D6, classify D7?


Note: You only need to compute the probabilities relevant to the test example.
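
The Naive Bayes comparison for D7 multiplies each class prior by the per-feature conditional probabilities estimated from the table; a small sketch of that scoring (the actual counts must be read off the D1–D6 table and are deliberately not filled in here):

# Sketch: Naive Bayes score for one class; compare the scores for "nice" and
# "not nice" and predict the larger one. Probabilities come from the D1-D6 table.
def nb_score(class_prior, cond_probs):
    score = class_prior
    for p in cond_probs:                    # P(feature value | class) terms
        score *= p
    return score

# Classify D7 = (warm, high, yes) by comparing, for example:
#   nb_score(P(nice),     [P(warm|nice),     P(high|nice),     P(yes|nice)])
#   nb_score(P(not nice), [P(warm|not nice), P(high|not nice), P(yes|not nice)])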

4.b. If we train a Naïve Bayes classifier using infinite training data, then it will achieve zero training error
over these training examples. [True / False] Give appropriate reasoning for your answer. [2 Marks]

4.c. If we train a Naïve Bayes classifier using infinite training data, then it will achieve zero true error
over test examples drawn from the same distribution. [True / False] Give appropriate reasoning for your
answer. [2 Marks]
