DSDBA Sppu Dsbda QP
PA-10288                                                              [Total No. of Pages : 1
                                     [6009]-322
                   T.E. (Computer Engineering) (Insem.)
            DATA SCIENCE AND BIG DATA ANALYTICS
                (2019 Pattern) (Semester - II) (310251)
Time : 1 Hour]                                                        [Max. Marks : 30
Instructions to the candidates:
     3) Figures to the right side indicate full marks.
     4) Assume suitable data if necessary.
                                                                               [6]
                                       OR
Q2) a)     Differentiate between Data Science, Machine Learning and AI.                   [4]
Q3) a)     Write a short note on contingency table, explain with example.                 [4]
     b)    With an example, explain Bayes' theorem. Also explain its key terms.            [5]
     c)    Is there a correlation between the variables in the following data set? [6]
           Hours      9    15    25    14    10    18    19    16    20    18
           Marks     39    56    93    61    50    75    42    70    66    32
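
One way to approach the correlation question is to compute the Pearson correlation coefficient for the tabulated Hours and Marks values. The sketch below is an illustration only, not a model answer: it assumes Pearson's r is the intended measure, and the function name `pearson_r` is a hypothetical helper introduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations about the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Values copied from the table in the question.
hours = [9, 15, 25, 14, 10, 18, 19, 16, 20, 18]
marks = [39, 56, 93, 61, 50, 75, 42, 70, 66, 32]

r = pearson_r(hours, marks)
print(f"r = {r:.3f}")
```

For this data r works out to roughly 0.58, which is usually read as a moderate positive correlation between hours and marks.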
                                           OR
Total No. of Questions : 8]                                   SEAT No. :
P812                            [5870] - 1133
                                                                    [Total No. of Pages : 2
                        T.E. (Computer Engineering)
                 DATA SCIENCE AND BIG DATA ANALYTICS
                   (2019 Pattern) (Semester - II) (310251)
Time : 2½ Hours]                                                         [Max. Marks : 70
Instructions to the candidates:
     1) Answer Q.1 or Q.2, Q.3 or Q.4, Q.5 or Q.6, Q.7 or Q.8.
     2) Neat diagrams must be drawn whenever necessary.
     4) Assume suitable data, if necessary.
Q1) a) What is driving data deluge? Explain with one example. [9]
     b)    What is data science? Differentiate between Business Intelligence and
           Data Science.                                                      [9]
                                                 OR
Q2) a)     What are the sources of Big Data? Explain model building phase with
           example.                                                        [9]
           discovery phase. Explain with example.                         [9]
                                           OR
           i)      Linear Regression
          i)    Time series Analysis
          ii)   TF - IDF.                                                         [9]
    b)    What is clustering? With a suitable example, explain the steps involved in
          the k-means algorithm.                                              [9]
                                       OR
          i)    Confusion matrix
          ii)   AUC - ROC curve                                                   [9]
    b)    Discuss Holdout method and Random Sub Sampling methods.                 [9]
Q7) a)    With a suitable example explain Histogram and explain its usages.       [8]
          in brief.                                                             [9]
                                       OR
Q8) a)    With a suitable example explain and draw a Box plot and explain its
          usages.                                                        [8]
    b)    Describe the challenges of data visualization. Draw box plot and explain
Total No. of Questions : 8]                                 SEAT No. :
P-3153                                                           [Total No. of Pages : 2
                                     [6003]-354
                      T.E. (Computer Engineering)
                  Data Science and Big Data Analytics
                (2019 Pattern) (Semester - II) (310251)
Time : 2½ Hours]                                                   [Max. Marks : 70
     2) Neat diagram must be drawn whenever necessary.
     3) Figures to the right indicate full marks.
Q1) a)     What is Model Building elaborate this phase of data analytics with the
           science.                                                        [8]
                                         OR
Q2) a)     What are the three characteristics of Big Data and what are the main
     b)    Explain Descriptive, Diagnostic, Predictive analytics.                   [9]
Q3) a)     Explain why decision trees are used. Draw a sample decision tree and
                                         OR
Q5) a)    What is text processing? Explain TF-IDF with example.                   [8]
    b)    With a suitable example, explain the steps involved in the k-means algorithm. [9]
                                        OR
Q6) a)    Define following terms with respect to confusion matrix :               [8]
          i)    Accuracy
          ii)   Precision
          iii) Recall
          iv) AUC-ROC
    b)    Explain k-fold Cross Validation & Random Subsampling.                   [9]
Q7) a)    With a suitable example, draw a histogram and a box plot, and explain their
          usages.                                                       [9]
    b)    Describe the data visualization tool Tableau. List data visualization
          tools.                                                               [9]
                                        OR
                                                                                 [9]
    b)    Explain architecture of Apache-Pig.                                     [9]
Total No. of Questions : 8]                                   SEAT No. :
PA-1449                                                            [Total No. of Pages : 3
                                         [5926]-65
                         T.E. (Computer Engg.)
              DATA SCIENCE AND BIG DATA ANALYTICS
                  (2019 Pattern) (Semester-II) (310251)
Time : 2½ Hours]                                                         [Max. Marks : 70
Instructions to the candidates:
     1) Answer Q1 or Q2, Q3 or Q4, Q5 or Q6, and Q7 or Q8.
     2) Neat diagram must be drawn wherever necessary.
          and steam tables is allowed.
     5) Assume suitable data if necessary.
Q1) a)     Draw the diagram of data analytics life cycle in big data and briefly explain
           its phases.                                                               [8]
     b)    Explain in detail how the team carries out the model building phase of the data
           analytics life cycle.                                                     [9]
                                         OR
Q2) a)     List and explain the steps in data preparation phase of data analytics life
           cycle.                                                                    [8]
           i)    ETL
           iii) Model selection for data analytics.
Q3) a)     What are the types of analytics in big data? Explain in brief.            [9]
    b)     Calculate the support and confidence value for all the possible item sets. [9]
                                           OR
Q4) a)    Explain the use of logistic function in logistic regression in detail.      [9]
    b)    Write short note on the following:
          i)     Removing duplicates from data set.
          ii)    Handling missing data
          iii)   Data transformation.                                                 [9]
Q5) a)    Suppose that, for the given data, the task is to cluster points (with (x, y)
          representing location) into three clusters, where the points are:
          The distance function is Euclidean distance. Suppose initially we assign
          A1, B1 and C1 as the center of each cluster, respectively. Use the k-means
          algorithm to show only the three cluster centers after the first
          round of execution, with steps.                                     [9]
    b)    Explain the following text analysis steps with suitable example.            [8]
          i)  Part of speech (POS) tagging
          ii)    Lemmatization
          iii)   Stemming
                                         OR
Q6) a)    Given the confusion matrix, calculate Accuracy, Precision, Recall, Error
                                                      Predicted classes
                                         Risk-yes                Risk-No
                 Heart Attack
Q7) a)    List the data visualization tools and discuss any four applications of data
          visualization along with the use of the suitable plot.                  [9]
    b)    List the challenges of data visualization. Explain the types of visualization
          with example.                                                            [9]
                                        OR
Q8) a)    Explain in detail the Hadoop Ecosystem with suitable diagram             [9]
    b)    Write a short note on the following                                      [9]
          i)     Map reduce.
          ii)    Pig
          iii)   Hive
Total No. of Questions : 8]                                  SEAT No. :
P-7545                                                             [Total No. of Pages : 3
                            [6180]-53
                  T.E. (Computer Engineering)
          DATA SCIENCE AND BIG DATA ANALYTICS
             (2019 Pattern) (Semester - II) (310251)
Time : 2½ Hours]                                                      [Max. Marks : 70
       3) Figures to the right side indicate full marks.
       4) Assume suitable data if necessary.
       5) Use of Scientific calculator is permitted.
Q1) a)     Explain Data Analytics Cycle with suitable diagram and its phases.         [8]
                                                                                 [9]
                                          OR
Q2) a)     List and explain the key roles for successful analytics project.           [8]
           i)    Common Tools for the Model Building
Q3) a)     List and explain the various types of analytics in Big data.               [9]
    b)     Calculate the support and confidence values for all possible item sets.    [9]
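
The transaction data this question refers to does not appear in the text above, so the following is only a minimal sketch on a small hypothetical basket dataset. The `transactions` list and the helper names `support` and `confidence` are assumptions made for illustration. Support is taken as the fraction of transactions containing an item set, and confidence of a rule X -> Y as support(X and Y together) divided by support(X).

```python
from itertools import combinations

# Hypothetical transactions (NOT taken from the question paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent united with consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Enumerate every non-empty item set and report its support.
items = sorted(set().union(*transactions))
for size in range(1, len(items) + 1):
    for itemset in combinations(items, size):
        print(itemset, round(support(itemset, transactions), 2))

# Confidence of the hypothetical rule {bread} -> {milk}.
print(round(confidence({"bread"}, {"milk"}, transactions), 2))
```

With real exam data, only the `transactions` list would need to change; the enumeration over `combinations` covers all possible item sets.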
                                            OR
Q4) a)    Explain the need of logistic regression along with its various types. [9]
    b)    Explain the following terms with suitable example.                         [9]
          i)    Removing Duplicates from dataset.
          ii)   Handling Missing Data
Q5) a)    Suppose that, for the given data, the task is to cluster points (with (x, y)
          representing location) into three clusters, where the points are A1(2, 10),
          A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The
          distance function is Euclidean distance. Suppose initially we assign A1,
          Use the k-means algorithm to show only the first round of
          execution with the cluster centers.
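
One round of k-means on the eight points above can be sketched as follows. The text naming the initial centers is cut off after "A1", so the choice of A1, B1 and C1 as starting centers is an assumption made purely for illustration, as is the helper name `kmeans_round`. The sketch assigns each point to its nearest center by Euclidean distance and then recomputes each center as the mean of its members.

```python
from math import dist  # Euclidean distance, Python 3.8+

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}

# ASSUMPTION: the source names only A1; B1 and C1 are illustrative choices.
centers = [points["A1"], points["B1"], points["C1"]]


def kmeans_round(points, centers):
    """Run one assignment step and one update step of k-means."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        # Assign the point to the index of the closest current center.
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)

    new_centers = []
    for members, old in zip(clusters, centers):
        if not members:  # keep the old center if a cluster ends up empty
            new_centers.append(old)
            continue
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centers


clusters, new_centers = kmeans_round(points, centers)
print(clusters)
print(new_centers)
```

Under the assumed starting centers, the first round groups A1 alone, puts A3, B1, B2, B3 and C2 together, and pairs A2 with C1; a different choice of initial centers would generally give a different first round.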
    b)    Explain the following Text Analysis steps with suitable example            [9]
          i)    Part-of-speech (POS) tagging
          ii)   Lemmatization
                                         OR
Q6) a)    Given the confusion matrix, Calculate Accuracy, Precision, Recall, Error
                                     Predicted classes
                                     -Yes             -No
          classes    Yes
                     No
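
The cell values of the confusion matrix are not present in the text above, so the following is only a sketch using hypothetical counts. It assumes "Yes" is the positive class; the function name `classification_metrics` and the numbers passed to it are illustrative, not taken from the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard metrics from the four confusion-matrix cells.

    tp: actual Yes, predicted Yes    fn: actual Yes, predicted No
    fp: actual No,  predicted Yes    tn: actual No,  predicted No
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp),  # correct share of predicted Yes
        "recall": tp / (tp + fn),     # recovered share of actual Yes
        "error_rate": 1 - accuracy,   # misclassified share of all cases
    }


# Hypothetical counts (NOT from the question paper).
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Substituting the actual matrix entries for the four keyword arguments gives the values the question asks for.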
Q7) a)    List a few data visualization tools and discuss any four applications of
          data visualization along with the use of the various plots with Python/R
                                         OR
Q8) a)    Explain in detail the Hadoop Ecosystem with suitable diagram along with
          the various components.                                             [9]
    b)    Write a short note on the following.                                     [9]
          a)   Map Reduce
          b)   Pig