Midsem I 31 03 2023
No. of Pages: 3
Name of the Faculty : Dr. S.VASANTHARATHNA
MATHEMATICAL FOUNDATIONS
Population and Sample - Measures of Central Tendency - Measures of Deviation - Measures of Shape - Correlation
Analysis - Bayes Theorem - Probability Density Functions and Distributions - Hypothesis Testing - Dimensionality
Reduction - Principal Component Analysis - Analysis of Variance (ANOVA)
Q. No. Question Marks CO BT PI
PART A ( 10 X 1 = 10)
1. Which of the following is an example of big data? 1 1 1 2.1.3
a) NETFLIX b) Oracle Database
c) Excel spreadsheet d) Library database
2. If a technical issue arises at random in a computer, it can 1 1 2 2.2.1
recover by creating a backup from a copy of the data that it
had automatically saved, without the user knowing what had
happened. This is
a) Yet Another Resource Negotiator (YARN)
b) Hadoop Distributed File System (HDFS)
c) Hadoop
d) Mapper-Reducer
3. Classification of cars is carried out by comparing a car with 1 1 2 4.1.4
others. Cluster cars into a tree structure based on the data.
Which characteristic of Big Data will the problem have?
a) Veracity b) Velocity c) Volume d) Variety
4. In Hadoop, ------ is the framework for job scheduling and 1 1 1 4.3.4
cluster resource management.
a) Yet Another Resource Negotiator (YARN)
b) Hadoop Distributed File System (HDFS)
c) NameNode and DataNode
d) Mapper-Reducer
PART B ( 5 X 2 = 10)
11. An airline operates several domestic flights from 4 major 2 2 4 4.2.2
airports of India: (a) New Delhi, (b) Mumbai, (c) Bangalore,
and (d) Hyderabad. The percentage of flights operated from
New Delhi, Mumbai, Bangalore, and Hyderabad are 40%,
25%, 25%, and 10%, respectively. The percentage of delayed
flights at these four airports are 10%, 8%, 7%, and 6%. If a
flight is delayed, what is the probability that the flight
originated from Bangalore airport?
12. Differentiate between data and big data. 2 1 2 5.1.2
13. With a detailed block diagram, show the MapReduce Process 2 2 3 4.3.2
for the following input words
Car, Van, Bus,
Lorry, Lorry, Bus,
Car, Lorry, Van
14. For 4 data points of two correlated variables x and y, it is 2 1 1 4.1.3
given that
∑x = 24, ∑y = 11, ∑x^2 = 202, ∑xy = 84, ∑y^2 = 39
PART C ( 3 X 10 = 30)
16a) Discuss how Hadoop streams real-time data. 5 1 2 5.3.2
OR
19a) A company faces the challenge that a few students accept the 4 2 2 2.3.1
job offer but do not join (renege). The percentage of reneging
is 20%. If the company offers 8 jobs,
i) What is the probability that all 8 candidates who accept
the offer join?
ii) What is the probability that exactly 2 candidates will
not join?
19b) The time-to-failure distribution of a lithium-ion battery unit 6 2 2 2.1.3
follows an exponential distribution with a mean time between
failures of 20000 hours.
i) What is the probability that the battery will survive for at
least 10000 hours?
ii) The manufacturer would like to decide on the warranty
such that not more than 5% of batteries fail during
the warranty period. What should the duration of the
warranty period be?
20a) Write the Hessian H_g and the discriminant D_g for the 5 2 3 2.4.2
following function:
g(x, y) = x^3 + 2y^2 + 3xy^2
What do the Hessian and the discriminant signify?
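A worked sketch of the second-order partial derivatives for the function g in question 20, offered as a check rather than as the official answer. The symbols g_x, g_y, g_xx, g_xy and g_yy for the partial derivatives are notation assumed here, and D_g is taken to be the determinant of H_g.

```latex
\[
\begin{aligned}
g_x &= 3x^2 + 3y^2, \qquad g_y = 4y + 6xy,\\
g_{xx} &= 6x, \qquad g_{xy} = g_{yx} = 6y, \qquad g_{yy} = 4 + 6x,\\
H_g &= \begin{pmatrix} 6x & 6y \\ 6y & 4 + 6x \end{pmatrix},\\
D_g &= \det H_g = 6x(4 + 6x) - (6y)^2 = 36x^2 + 24x - 36y^2 .
\end{aligned}
\]
```

At a critical point, D_g > 0 with g_xx > 0 indicates a local minimum, D_g > 0 with g_xx < 0 a local maximum, D_g < 0 a saddle point, and D_g = 0 leaves the second-derivative test inconclusive.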
20b) Suggest and justify the type of data analytics to be 5 1 5 4.2.2
performed for the following use cases
i. Manufacturing companies often record the
runtime, downtime, and work queue for various
machines and then analyze the data to better plan the
workloads so the machines operate closer to peak
capacity.
ii. Gaming companies use data analytics to set reward
schedules for players that keep the majority of
players active in the game.
iii. If the likelihood of a hot summer, measured as an
average of five weather models, is above 58%, it is
preferable to add fruit juices to all food servings and
rent an additional tank to increase water storage.
OR
21a) Provide a sample that could correspond to this box plot: 5 2 5 4.2.2
Q. No. Question Marks CO BT PI
PART A ( 10 X 1 = 10)
1. Which of the following is an example of big data? 1 1 1 2.1.3
a) NETFLIX b) Oracle Database c) Excel spreadsheet d) Library database
Answer: NETFLIX
2. If a technical issue arises at random in a computer, it can recover by 1 1 2 2.2.1
creating a backup from a copy of the data that it had automatically
saved, without the user knowing what had happened. This is
i. Yet Another Resource Negotiator (YARN)
ii. Hadoop Distributed File System (HDFS)
iii. Hadoop
iv. Mapper-Reducer
Answer: Hadoop Distributed File System (HDFS)
A. Reduce, Map
B. Map, Reduce
C. Map, Stream
D. Stream, Reduce
6. If the values of two variables move in the same direction, the 1 2 3 1.1.1
correlation is said to be
Answer: positive
7. The matrix with rows [0, -4 + i] and [4 + i, 0] is 1 2 4 1.1.2
a) Symmetric b) Skew-Symmetric
c) Hermitian d) Skew-Hermitian
Answer: d) Skew-Hermitian
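The answer to question 7 can be checked numerically. The sketch below reads the flattened matrix as the 2 x 2 matrix with rows [0, -4 + i] and [4 + i, 0], which is an assumption about the original layout, and tests whether its conjugate transpose equals its negative. The helper names are hypothetical.

```python
# Assumed reading of the matrix in question 7 (2 x 2, row by row).
A = [[0 + 0j, -4 + 1j],
     [4 + 1j, 0 + 0j]]


def conjugate_transpose(m):
    """Return the conjugate transpose of a square matrix stored as nested lists."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def transpose(m):
    """Return the plain transpose of a square matrix stored as nested lists."""
    n = len(m)
    return [[m[j][i] for j in range(n)] for i in range(n)]


def negate(m):
    """Return the matrix with every entry multiplied by -1."""
    return [[-v for v in row] for row in m]


is_symmetric = transpose(A) == A
is_skew_symmetric = transpose(A) == negate(A)
is_hermitian = conjugate_transpose(A) == A
is_skew_hermitian = conjugate_transpose(A) == negate(A)

print(is_symmetric, is_skew_symmetric, is_hermitian, is_skew_hermitian)
```

Under this reading only the skew-Hermitian check holds, which agrees with the stated answer.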
PART B ( 5 X 2 = 10)
11. An airline operates several domestic flights from 4 major airports of 2 2 4 4.2.2
India: (a) New Delhi, (b) Mumbai, (c) Bangalore, and (d) Hyderabad.
The percentage of flights operated from New Delhi, Mumbai,
Bangalore, and Hyderabad are 40%, 25%, 25%, and 10%,
respectively. The percentage of delayed flights at these four airports
are 10%, 8%, 7%, and 6%. If a flight is delayed, what is the
probability that the flight originated from Bangalore airport?
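A minimal sketch of the Bayes' theorem calculation for question 11, using only the percentages given in the question. The dictionary and variable names are illustrative and not part of the question paper.

```python
# Prior: share of flights operated from each airport (from the question).
share = {"New Delhi": 0.40, "Mumbai": 0.25, "Bangalore": 0.25, "Hyderabad": 0.10}
# Likelihood: fraction of flights delayed at each airport (from the question).
delay_rate = {"New Delhi": 0.10, "Mumbai": 0.08, "Bangalore": 0.07, "Hyderabad": 0.06}

# Total probability that a randomly chosen flight is delayed.
p_delayed = sum(share[a] * delay_rate[a] for a in share)

# Posterior: P(airport | delayed) = P(airport) * P(delayed | airport) / P(delayed).
posterior = {a: share[a] * delay_rate[a] / p_delayed for a in share}

print(round(p_delayed, 4))               # 0.0835
print(round(posterior["Bangalore"], 4))  # 0.2096
```

The posterior for Bangalore is 0.0175 / 0.0835, or roughly 21%.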
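13. With a detailed block diagram, show the MapReduce Process for the 2 2 3 4.3.2
following input words
Input: Car, Van, Bus, Lorry, Lorry, Bus, Car, Lorry, Van
Splitting: [Car, Van, Bus] [Lorry, Lorry, Bus] [Car, Lorry, Van]
Mapping (MAP): [Car 1, Van 1, Bus 1] [Lorry 1, Lorry 1, Bus 1] [Car 1, Lorry 1, Van 1]
Shuffling: [Bus 1, Bus 1] [Car 1, Car 1] [Lorry 1, Lorry 1, Lorry 1] [Van 1, Van 1]
Reducing (REDUCE): Bus 2, Car 2, Lorry 3, Van 2
Final Result: Bus 2, Car 2, Lorry 3, Van 2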
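The word-count flow for the input words in question 13 can be sketched in plain Python. This is an in-memory illustration of the splitting, mapping, shuffling and reducing stages, not Hadoop code. The function names are hypothetical, and the three splits are assumed to follow the three input lines of the question.

```python
from collections import defaultdict

# Splitting: one split per input line of the question (assumed).
splits = [
    ["Car", "Van", "Bus"],
    ["Lorry", "Lorry", "Bus"],
    ["Car", "Lorry", "Van"],
]


def map_phase(split):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in split]


def shuffle_phase(mapped_splits):
    """Shuffling: group the emitted values by key, with keys sorted."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reducing: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


mapped = [map_phase(s) for s in splits]
shuffled = shuffle_phase(mapped)
result = reduce_phase(shuffled)

print(shuffled)  # {'Bus': [1, 1], 'Car': [1, 1], 'Lorry': [1, 1, 1], 'Van': [1, 1]}
print(result)    # {'Bus': 2, 'Car': 2, 'Lorry': 3, 'Van': 2}
```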
14. For 4 data points of two correlated variables x and y, it is given that 2 1 1 4.1.3
OR
17. Discuss any three data analyses, highlighting the data sources, the type 5 1 3 6.1.1
of analysis, and the impact, that would benefit each of the following sectors:
a) Banking Sector
b) Industrial Automation Sector
18. Two persons X and Y appear in an interview for two vacancies in the same 5 2 2 2.1.3
post, the probability of X’s selection is 1/5 and that of Y’s selection is
X and Y appear for an interview for two posts. The probability of X’s
selection is 1/3 and that of Y’s selection is 1/5. Find the probability
that at least one of them is selected.
Alternately
Thus,
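A minimal sketch of the calculation for the complete statement of question 18 (selection probabilities 1/3 for X and 1/5 for Y). It assumes the two selections are independent, which the question does not state explicitly, and reaches the result by two routes: the complement rule and the addition rule. Variable names are illustrative.

```python
from fractions import Fraction

p_x = Fraction(1, 3)  # P(X is selected), from the question
p_y = Fraction(1, 5)  # P(Y is selected), from the question

# Route 1 (complement rule): 1 - P(neither is selected).
# Assumes independence, so P(neither) = (1 - p_x) * (1 - p_y).
p_neither = (1 - p_x) * (1 - p_y)
p_at_least_one = 1 - p_neither

# Route 2 (addition rule): P(X) + P(Y) - P(X and Y), again assuming independence.
p_union = p_x + p_y - p_x * p_y

print(p_neither)       # 8/15
print(p_at_least_one)  # 7/15
print(p_union)         # 7/15
```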
OR
19a) A company faces the challenge that a few students accept the job offer 4 2 2 2.3.1
but do not join (renege). The percentage of reneging is 20%. If the
company offers 8 jobs,
i) What is the probability that all 8 candidates who accept
the offer join?
ii) What is the probability that exactly 2 candidates will
not join?
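A minimal sketch for the job-offer question, modelling the number of candidates who do not join as a binomial random variable with n = 8 and p = 0.20. Treating the candidates' decisions as independent is an assumption, and `binom_pmf` is a hypothetical helper name.

```python
from math import comb

n = 8            # job offers made (from the question)
p_renege = 0.20  # probability that a candidate does not join (from the question)


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# i) All 8 join, i.e. zero candidates renege: 0.8 ** 8.
p_all_join = binom_pmf(0, n, p_renege)

# ii) Exactly 2 candidates do not join: C(8, 2) * 0.2**2 * 0.8**6.
p_two_renege = binom_pmf(2, n, p_renege)

print(round(p_all_join, 4))    # 0.1678
print(round(p_two_renege, 4))  # 0.2936
```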
19b) The time-to-failure distribution of a lithium-ion battery unit follows an 6 2 2 2.1.3
exponential distribution with a mean time between failures of 20000
hours.
i) What is the probability that the battery will survive for at least
10000 hours?
ii) The manufacturer would like to decide on the warranty such
that not more than 5% of batteries fail during the warranty
period. What should the duration of the warranty period
be?
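A minimal sketch for the battery question, using the exponential survival function P(T >= t) = exp(-t / mean) with the stated mean of 20000 hours. The function and variable names are illustrative.

```python
from math import exp, log

mtbf = 20000.0  # mean time between failures in hours (from the question)


def survival(t):
    """P(T >= t) for an exponential lifetime with mean mtbf."""
    return exp(-t / mtbf)


# i) Probability that the battery survives at least 10000 hours: exp(-0.5).
p_survive = survival(10000)

# ii) Longest warranty w with P(T <= w) = 1 - exp(-w / mtbf) <= 0.05,
#     which gives w = -mtbf * ln(0.95).
max_fail_fraction = 0.05
warranty_hours = -mtbf * log(1 - max_fail_fraction)

print(round(p_survive, 4))       # 0.6065
print(round(warranty_hours, 1))  # 1025.9
```

Any warranty shorter than about 1025 hours keeps the expected failure fraction at or below 5%.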
20b) Suggest and justify the type of data analytics to be performed for the 5 1 5 4.2.2
following use cases
i. Manufacturing companies often record the runtime,
downtime, and work queue for various machines and then
analyze the data to better plan the workloads so the machines
operate closer to peak capacity.
ii. Gaming companies use data analytics to set reward schedules
for players that keep the majority of players active in the
game.