Indian Institute of Technology Patna
CS244: Data Science
END SEM                                     26th April 2022
TIME: 3 HOURS                               Full Marks: 50
[Figure 1: Standard Normal Table]
[Figure 2: Chi-square Table]

 1. A lawyer commutes daily from his suburban home to his midtown office. The average time for a one-way trip is 24 minutes, with a standard deviation of 3.8 minutes. Assume the distribution of trip times to be normally distributed. If the office opens at 9:00 A.M. and the lawyer leaves his house at 8:40 A.M. daily, what percentage of the time is he late for work? [3]
    He is late whenever the trip takes longer than the 20 minutes available, so we want P(X > 20) = P(Z > (20 − 24)/3.8) = P(Z > −1.05). For z = 1.05 the table area is 0.8531, so P(Z > 1.05) = 1 − 0.8531 = 0.1469; by symmetry P(Z < −1.05) = 0.1469, and hence P(Z > −1.05) = 1 − 0.1469 = 0.8531. He is late about 85.31% of the time.
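As a quick numerical check of this calculation (a sketch assuming SciPy is available; the table value 0.8531 corresponds to rounding z to −1.05):

```python
from scipy.stats import norm

# P(trip > 20 min) for X ~ N(24, 3.8^2): the lawyer is late whenever
# the one-way trip exceeds the 20 minutes he allows himself.
p_late = norm.sf(20, loc=24, scale=3.8)   # survival function, 1 - CDF
print(f"P(late) = {p_late:.4f}")          # ~0.8538 exact; 0.8531 from the table
```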
 2. A certain machine makes electrical resistors having a mean resistance of 40 ohms and a standard deviation of 2 ohms. Assuming that the resistance follows a normal distribution, find the percentage of resistances exceeding 43 ohms if resistance is measured to the nearest ohm. [3]
    We assign a measurement of 43 ohms to all resistors whose resistances are greater than 42.5 and less than 43.5; we are actually approximating a discrete distribution by means of a continuous normal distribution. z = (43.5 − 40)/2 = 1.75, so P(X > 43.5) = P(Z > 1.75) = 1 − P(Z < 1.75) = 1 − 0.9599 = 0.0401, i.e., about 4.01% of resistances exceed 43 ohms.
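A matching check with the continuity correction (again assuming SciPy):

```python
from scipy.stats import norm

# "Measured to the nearest ohm" means a reading of 43 covers true
# resistances in (42.5, 43.5), so readings above 43 correspond to
# X > 43.5 under X ~ N(40, 2^2).
p = norm.sf(43.5, loc=40, scale=2)   # z = (43.5 - 40)/2 = 1.75
print(f"P(X > 43.5) = {p:.4f}")      # ~0.0401
```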
 3. The average height of females in the freshman class of a certain college has historically been 162.5 centimeters with a standard deviation of 6.9 centimeters. Is there reason to believe that there has been a change in the average height if a random sample of 50 females in the present freshman class has an average height of 165.2 centimeters? Take α = 0.05. State the null and alternative hypotheses, and find the critical value and the test statistic. [3]
    The hypotheses are
    H0: µ = 162.5 centimeters,
    H1: µ ≠ 162.5 centimeters.
    Now z = (165.2 − 162.5)/(6.9/√50) = 2.77. For α = 0.05 and a two-tailed test the critical value is 1.96. Since 2.77 > 1.96, we reject the null hypothesis in favour of the alternative hypothesis.
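The same one-sample z-test, scripted (assuming SciPy; math.sqrt(50) supplies the √50 in the standard error):

```python
import math
from scipy.stats import norm

xbar, mu0, sigma, n, alpha = 165.2, 162.5, 6.9, 50, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic, ~2.77
z_crit = norm.ppf(1 - alpha / 2)            # two-tailed critical value, ~1.96
print(f"z = {z:.2f}, critical = {z_crit:.2f}, reject H0: {abs(z) > z_crit}")
```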
 4. A manufacturer of car batteries claims that the life of the company's batteries is approximately normally distributed with a standard deviation equal to 0.9 year. If a random sample of 10 of these batteries has a standard deviation of 1.2 years, do you think that σ > 0.9 year? Use a 0.05 level of significance. [3]
    H0: σ² = 0.81,
    H1: σ² > 0.81.
    α = 0.05. Critical region: χ² > 16.919 (chi-square with n − 1 = 9 degrees of freedom).
    Computations: s² = 1.44 (since s = 1.2 is given), n = 10, and χ² = (9)(1.44)/0.81 = 16.0.
    Decision: the χ²-statistic is not significant at the 0.05 level, so we fail to reject H0.
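A check of the χ² computation (assuming SciPy for the critical value):

```python
from scipy.stats import chi2

s2, sigma0_2, n, alpha = 1.2**2, 0.9**2, 10, 0.05
stat = (n - 1) * s2 / sigma0_2          # (9)(1.44)/0.81 = 16.0
crit = chi2.ppf(1 - alpha, df=n - 1)    # 16.919 for 9 degrees of freedom
print(f"chi2 = {stat:.1f}, critical = {crit:.3f}, reject H0: {stat > crit}")
```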
 5. The average zinc concentration recovered from a sample of zinc measurements at 36 locations in a river is found to be 2.6 grams per milliliter. Find the 95% confidence interval for the mean zinc concentration in the river. Assume that the population standard deviation is 0.3. [3]
    The point estimate of µ is x̄ = 2.6, and the z value leaving an area of 0.025 to the right is z0.025 = 1.96. Hence the 95% confidence interval is
    2.6 − 1.96(0.3/√36) < µ < 2.6 + 1.96(0.3/√36),
    that is, 2.5 < µ < 2.7.
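The interval can be reproduced as follows (assuming SciPy for z0.025):

```python
import math
from scipy.stats import norm

xbar, sigma, n, conf = 2.6, 0.3, 36, 0.95
z = norm.ppf(1 - (1 - conf) / 2)         # 1.96
half = z * sigma / math.sqrt(n)          # 1.96 * 0.3 / 6 ~ 0.098
print(f"{xbar - half:.2f} < mu < {xbar + half:.2f}")   # 2.50 < mu < 2.70
```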
 6. Naive Bayes: Suppose we are given the following dataset, where A, B, C are input binary random variables and y is a binary output whose value we want to predict. How would a naive Bayes classifier predict y given this input: A = 0, B = 0, C = 1? Assume that in case of a tie the classifier always prefers to predict 0 for y. [4]
    [Dataset table]
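The exam's dataset table is not reproduced in this copy, so the sketch below runs on a hypothetical table of (A, B, C, y) rows purely to illustrate the naive Bayes mechanics, including the tie-break toward y = 0:

```python
import numpy as np

# Hypothetical rows (A, B, C, y); the exam's actual table is not reproduced.
data = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 1, 1],
                 [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]])
X, y = data[:, :3], data[:, 3]
query = np.array([0, 0, 1])

scores = {}
for label in (0, 1):
    rows = X[y == label]
    prior = len(rows) / len(X)
    # Naive assumption: multiply per-feature conditionals P(x_j | y).
    likelihood = np.prod([np.mean(rows[:, j] == query[j]) for j in range(3)])
    scores[label] = prior * likelihood

# Tie-break: prefer y = 0, hence >= when comparing the score for 0.
pred = 0 if scores[0] >= scores[1] else 1
print(scores, "->", pred)
```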
 7. Deep Learning: Suppose you are given the predictions of n different experts (or automated learners) as to whether a given email message is SPAM (1) or EMAIL (0). Your goal is to output a single prediction per message that is as accurate as possible. For this purpose, you would like to implement a majority-voting mechanism: if more than half of the experts predict SPAM, then your final prediction should be SPAM for that instance; otherwise, the final prediction should be EMAIL. (a) Suggest a neural network that implements majority voting when there are 4 experts overall (named A, B, C, D). Specify the network structure and weights. (b) Explain briefly how to adapt the network structure and weights to the general case of n experts. [3+1]
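One possible answer sketch, covering both parts at once: a single threshold unit with all weights equal to 1 and bias −n/2 (an illustration, not the only valid construction):

```python
import numpy as np

def majority_vote(preds):
    """Single threshold unit: all weights 1, fires iff more than half vote 1."""
    n = len(preds)
    weights = np.ones(n)
    bias = -(n / 2)                      # net input > 0 iff sum(preds) > n/2
    return int(weights @ preds + bias > 0)

# 4 experts A, B, C, D: SPAM requires at least 3 of the 4 votes.
print(majority_vote(np.array([1, 1, 1, 0])))   # 1 (SPAM)
print(majority_vote(np.array([1, 1, 0, 0])))   # 0 (EMAIL: not MORE than half)
```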
 8. Linear regression: We are interested here in a particular 1-dimensional linear regression problem. The dataset corresponding to this problem has n examples (x1, y1), …, (xn, yn), where xi and yi are real numbers for all i. Let w∗ = [w0∗, w1∗]ᵀ be the least-squares solution; in other words, w∗ minimizes J(w) = (1/n) Σᵢ (yi − w0 − w1·xi)², where the sum runs over i = 1, …, n. You can assume for our purposes here that the solution is unique. Writing x̄ = (Σᵢ xi)/n, find the value of each of the following expressions, with justification:
    (a) (1/n) Σᵢ (yi − w0∗ − w1∗·xi)(xi − x̄)
    (b) (1/n) Σᵢ (yi − w0∗ − w1∗·xi)(w0∗ + w1∗·xi)
    [2+2]
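A numerical sanity check with NumPy on randomly generated data: at the least-squares solution the normal equations make the residuals orthogonal to both the constant column and x, so both expressions come out (numerically) zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3 * x + 1 + rng.normal(size=50)

# Least-squares fit with design matrix [1, x].
X = np.column_stack([np.ones_like(x), x])
(w0, w1), *_ = np.linalg.lstsq(X, y, rcond=None)

r = y - w0 - w1 * x                      # residuals at the optimum
print(np.mean(r * (x - x.mean())))       # expression (a): ~0
print(np.mean(r * (w0 + w1 * x)))        # expression (b): ~0
```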
 9. Decision Tree: You are given a dataset for training a decision tree. The goal is to predict the label (‘+’ or ‘−’) given the features A, B, and C.
    [Dataset table]
    (A) First, consider building a decision tree by greedily splitting according to information gain. (a) Which features could be at the root of the resulting tree? (b) How many edges are there in the longest path of the resulting tree?
    (B) Now, consider building a decision tree with the smallest possible height. (a) Which features could be at the root of the resulting tree? (b) How many edges are there in the longest path of the resulting tree? [2+2+2+1]
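The exam's training table is again not reproduced, so the helper below computes information gain on a hypothetical set of (A, B, C, y) rows, just to show the quantity that the greedy root split maximizes:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG = H(y) - sum over values v of P(feature = v) * H(y | feature = v)."""
    h = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

# Hypothetical binary rows (A, B, C, y); the exam's table is not reproduced.
data = np.array([[0, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 0, 0]])
for j, name in enumerate("ABC"):
    print(name, information_gain(data[:, j], data[:, 3]))
```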
10. PageRank: Consider the following diagram, which depicts the connectivity among 4 web pages (nodes 1-4). You need to compute the PageRank of each node. Assume the damping factor is 1. [Hint: try to avoid the iterative method.] [4]
    [Diagram: directed links among nodes 1, 2, 3, 4]
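The figure's link structure is not recoverable from this copy, so the sketch below uses a hypothetical set of links; the point is the hinted non-iterative method: with damping factor 1, the PageRank vector is the eigenvector of the column-stochastic link matrix for eigenvalue 1:

```python
import numpy as np

# Hypothetical links (the original figure's edges are not reproduced).
links = {1: [2, 3], 2: [3], 3: [4], 4: [1]}
n = 4
M = np.zeros((n, n))                     # column-stochastic link matrix
for src, outs in links.items():
    for dst in outs:
        M[dst - 1, src - 1] = 1 / len(outs)

# Damping factor 1: PageRank solves M r = r, i.e. the eigenvalue-1 eigenvector.
vals, vecs = np.linalg.eig(M)
r = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
print(r / r.sum())                       # normalised so the ranks sum to 1
```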
11. SVM: Suppose we only have four training examples in two dimensions, as follows: P1 = (0, 0), P2 = (2, 2), P3 = (h, 1), P4 = (0, 3), where 0 ≤ h ≤ 3. The positive examples are P1 and P2, and the other two points are negative examples. (a) How large can h ≥ 0 be so that the training points are still linearly separable? (b) What is the margin achieved by the maximum-margin boundary, as a function of h? (c) Assume that we can only observe the second component of the input vectors. Without the other component, the labeled training points reduce to (0, +), (2, +), (1, −), and (3, −). What is the lowest order p of polynomial kernel that would allow us to correctly classify these points? [2+2+1]
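A numerical probe for parts (a) and (b), using scikit-learn with h = 0.5 (an arbitrary choice in the separable range); a large C approximates the hard-margin SVM:

```python
import numpy as np
from sklearn.svm import SVC

h = 0.5                                   # hypothetical value, 0 <= h <= 3
X = np.array([[0, 0], [2, 2], [h, 1], [0, 3]])
y = np.array([1, 1, -1, -1])

# Hard-margin linear SVM, approximated by a very large C.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
margin = 2 / np.linalg.norm(clf.coef_)    # width between the two margin lines
print(f"h = {h}: margin = {margin:.3f}")
```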
12. K-Means clustering: Consider performing K-Means clustering on a one-dimensional dataset containing four data points, 5, 7, 10, 12, using k = 2, Euclidean distance, and the initial cluster centers c1 = 3.0 and c2 = 13.0. (a) What are the initial cluster assignments? (That is, which examples are in cluster c1 and which examples are in cluster c2?) (b) What are the new cluster centers after making the assignments in (a)? (c) State true or false: K-Means clustering is guaranteed to converge. [1+1+1]
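One K-Means iteration, computed directly in NumPy:

```python
import numpy as np

X = np.array([5.0, 7.0, 10.0, 12.0])
centers = np.array([3.0, 13.0])

# Step (a): assign each point to the nearest center.
assign = np.argmin(np.abs(X[:, None] - centers[None, :]), axis=1)
print("assignments:", assign)            # [0 0 1 1]: {5, 7} to c1, {10, 12} to c2

# Step (b): recompute each center as the mean of its assigned points.
centers = np.array([X[assign == k].mean() for k in (0, 1)])
print("new centers:", centers)           # [ 6. 11.]
```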
13. Spectral clustering: Write the Laplacian matrix of the following graph (Fig: SC) for spectral clustering. All the edges have weight 1 (the similarity measure). [2]
    [Fig: SC, a graph on nodes 1, 2, 3, 4; edges as in the original figure]
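Fig: SC's exact edge set is not recoverable from this copy, so the sketch below builds the unnormalised Laplacian L = D − W for a hypothetical 4-cycle on the nodes:

```python
import numpy as np

# Hypothetical edges (the original figure's edges are not reproduced).
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
n = 4
W = np.zeros((n, n))                     # weighted adjacency (similarity) matrix
for i, j in edges:
    W[i - 1, j - 1] = W[j - 1, i - 1] = 1.0

D = np.diag(W.sum(axis=1))               # degree matrix
L = D - W                                # unnormalised graph Laplacian
print(L)
```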
14. Linear algebra: Consider the following set of points x (Fig: LA, a rectangular region), transformed using a matrix A having one eigenvalue equal to 2 with corresponding eigenvector e1 = [0.707, −0.707]ᵀ; the other eigenvalue is 0. Draw the plot of Ax. [2]
    [Fig: LA, a rectangular region of points x]
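The eigendata determine A only up to the unspecified second eigenvector; one consistent choice, assuming A is symmetric, is the rank-1 matrix A = 2·e1e1ᵀ, under which the rectangle collapses onto the line y = −x (the rectangle corners below are hypothetical):

```python
import numpy as np

# e1 is the unit-norm eigenvector for eigenvalue 2; the other eigenvalue is 0.
# Assuming A is symmetric, its spectral decomposition gives A = 2 * e1 e1^T.
e1 = np.array([0.707, -0.707])
A = 2 * np.outer(e1, e1)

# Corners of a hypothetical rectangular region of input points x.
corners = np.array([[0, 0], [4, 0], [4, 2], [0, 2]], dtype=float)
print(corners @ A.T)   # every image lies on the line y = -x (direction of e1)
```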