Big Data Analytics
ST2BIG
2020 - 2021
December 16, 2020
Lecturer: Issam Falih
Time Limit: 1 hour 45 minutes
  Firstname:
  Lastname:
  Promotion:
  Group:
  Rules (please read carefully)
 - You have 1 hour and 45 minutes for the exam.
   - Before you start, write down your name and student number on this page.
   - The use of material (book, slides, laptop, etc.) during the exam is not allowed.
   - Every multiple-choice question has just one correct answer. To select an answer, circle
     the letter.
 - Any answers or marks on pages other than the answer sheet, except for the last two
   questions, will be completely ignored even if correct.
 - If you picked an answer and then would like to change it, make it very clear with an
   additional circle around the newly chosen answer. Any ambiguity will result in no
   points being granted.
 - Only use a black or a blue pen. DO NOT use a pencil. DO NOT use a red pen.
 - The total number of available points is 40.
Good luck!
1        Multiple-choice questions [15 pts]
 1. Which type of data can Hadoop deal with?

     (a) structured
     (b) semi-structured
     (c) unstructured
     (d) All of the above

 2. What is a Resilient Distributed Dataset?

     (a) An immutable distributed collection of elements
     (b) A mutable distributed collection of elements
     (c) A write-enabled distributed collection of elements
     (d) A spilled distributed collection of elements

 3. In Hadoop, the optimal input split size is the same as the

     (a) block size
     (b) average file size in the cluster
     (c) minimum hard disk size in the cluster
     (d) number of DataNodes

 4. How does the DataNode protocol work?

     (a) The NameNode always initiates the connection, and the DataNodes only answer
     (b) The DataNode always initiates the connection, and the NameNode only answers
     (c) The client initiates the connection to the DataNode
     (d) Both the NameNode and the DataNode may initiate the connection

 5. Which among the following is the Resource Management Layer?

     (a) MapReduce
     (b) YARN
     (c) HDFS
     (d) HIVE

 6. Which one of these technologies deals with graph data?

     (a) Google BigTable
     (b) MongoDB
     (c) Apache HBase
     (d) Neo4j

 7. In MapReduce, the shuffling phase takes place...

     (a) Before and right after mapping
     (b) Before mapping
     (c) After mapping and before reducing
     (d) After reducing

 8. Which one of these input formats cannot be processed by MapReduce at all?

     (a) Key-value pairs
     (b) Unstructured lines of text
     (c) Tables
     (d) None of them: all three formats are in fact supported by MapReduce

 9. For a 129 MB file, how many blocks will be created?

     (a) 3
     (b) 1
     (c) 2
     (d) 4

10. Which languages are most appropriate for querying graph databases?

     (a) Cypher
     (b) Java
     (c) XQuery, JSONiq
     (d) SQL

11. The default replication factor for the HDFS file system in Hadoop is which of the
    following?

     (a) 1
     (b) 4
     (c) 2
     (d) 3

12. When the primary NameNode crashes, the secondary NameNode takes over. NameNodes
    do not persistently store (i.e. write to disk) the location of blocks. How does the
    secondary NameNode learn about the blocks' locations in the cluster?

     (a) DataNodes send regular heartbeat messages, which include information about the
         blocks they maintain
     (b) The secondary NameNode sends a special message to all DataNodes, asking for
         their block information
     (c) Before a crash, the primary NameNode always copies its memory content to the
         secondary NameNode
     (d) The secondary NameNode replays the edit log, which contains the blocks'
         locations

13. To start a Hadoop cluster, it is necessary to start both of which two clusters?

     (a) HDFS and YARN
     (b) SPARK and CLOUDERA
     (c) YARN and SPARK
     (d) NoSQL and HDFS

14. Let's assume we have a Hadoop cluster with 12 Petabytes of disk space and replication
    factor 4. What can you say about the maximum possible file size?

     (a) The maximum size of a file is restricted to the disk size of the largest DataNode.
     (b) The maximum size of a file cannot exceed 3 Petabytes.
     (c) The maximum size of a file is restricted by the physical disk space available on
         the NameNode.
     (d) Files of any size can be processed in the cluster.

15. Bob has a Hadoop cluster of 20 machines with the following Hadoop setup: replication
    factor 2, 128MB input split size. Each machine has 500GB of HDFS disk space. The
    cluster is currently empty (no job, no data). Bob intends to upload 4 Terabytes of
    plain text (in 4 files of approximately 1 Terabyte each), followed by running Hadoop's
    standard WordCount job. What is going to happen?

     (a) The data upload fails at the first file: it is too large to fit onto a DataNode.
     (b) The data upload fails at a later stage: the disks are full.
     (c) WordCount fails: too many input splits to process.
     (d) WordCount runs successfully.

16. In Hadoop, the optimal input split size is the same as the

     (a) average file size in the cluster.
     (b) block size.
     (c) minimum hard disk size in the cluster.
     (d) number of DataNodes.

17. The time it takes for a Hadoop job's Map task to finish mostly depends on:

     (a) the placement of the NameNode in the cluster.
     (b) the placement of the blocks required for the Map task.
     (c) the duration of the job's shuffle & sort phase.
     (d) the duration of the job's Reduce task.

18. HDFS is inspired by which of the following Google projects?

     (a) BigTable
     (b) GFS
     (c) MapReduce
     (d) MongoDB

19. What happens if the number of reducers is set to 0?

     (a) A reduce-only job takes place
     (b) A map-only job takes place
     (c) The reducer output will be the final output

20. Which of the following is the correct sequence of the MapReduce flow?

     (a) Map > Combine > Reduce
     (b) Combine > Reduce > Map
     (c) Map > Reduce > Combine
     (d) Reduce > Combine > Map

21. Which of the following phases occur simultaneously?

     (a) Shuffle and Map
     (b) Reduce and Sort
     (c) Shuffle and Sort

22. Which statement is true about the passive NameNode in Hadoop?

     (a) It is a standby NameNode
     (b) It simply acts as a slave
     (c) It provides a fast failover
     (d) All of these

23. Which concept is not part of the "3 V's of Big Data"?

     (a) Velocity
     (b) Variety
     (c) Valorisation
     (d) Volume

24. Which command is used to check the status of all daemons running in HDFS?

     (a) jps
     (b) fsck
     (c) distcp
     (d) None of the above

25. Which of the following statements is NOT true? It is possible to run a Hadoop job
    which consists only of a (two correct answers)

     (a) Reducer
     (b) Combiner
     (c) Mapper
     (d) None of the above

26. Which statement is true about NameNode High Availability?

     (a) It solves the single point of failure
     (b) It provides high scalability
     (c) It reduces storage overhead to 50%
     (d) All of the above

27. Which of the following statements is true?

     (a) The input to the Mapper is the output of the Reducer.
     (b) The input to the Combiner is the output of the Reducer.
     (c) The input to the Combiner is the input of the Mapper.
     (d) The input to the Reducer is the output of the Mapper.

28. Where in the Hadoop framework is the mapping from files to blocks stored?

     (a) DataNode.
     (b) BlockNode.
     (c) NameNode.
     (d) FileNode.

29. Which among the following is the ultimate authority that arbitrates resources among
    all the applications in the system?

     (a) NodeManager
     (b) ResourceManager
     (c) ApplicationMaster
     (d) All of the above
2 Multiple Answer Questions [5 pts]
    1. Which of the following operations require the client to communicate with the NameN-
       ode?
       (a) A client deleting a file from HDFS.
       (b) A client writing to a new file on HDFS.
       (c) A client appending data to the end of an existing file on HDFS.
       (d) A client reading a file from HDFS.
    2. Bob has a Hadoop cluster of 20 machines with the following Hadoop setup: replication
       factor 2, 128MB input split size. Each machine has 500GB of HDFS disk space. The
       cluster is currently empty (no job, no data). Bob intends to upload 4 Terabytes of plain
       text (in 4 files of approximately 1 Terabyte each), followed by running Hadoop’s standard
       WordCount job. What is going to happen?
        (a) The data upload fails at the first file: it is too large to fit onto a DataNode
       (b) The data upload fails at a later stage: the disks are full
        (c) WordCount fails: too many input splits to process
       (d) WordCount runs successfully
    3. The distributed file systems GFS and HDFS were devised with a number of use cases
       (data scenarios) in mind. Consider the following data storage scenarios:
       (S1) A global company dealing with the data of its one hundred million employees (salary,
       bonuses, age, performance, etc.)
       (S2) A Web search engine’s query log (each search request by a user is logged)
       (S3) A hospital’s medical imaging data generated during an MRI scan
       (S4) Data sent by the Hubble telescope to the Space Telescope Science Institute
       For which of these scenarios are GFS or HDFS a good choice?
        (a) Scenarios (S1) and (S4)
       (b) Scenarios (S2) and (S3)
        (c) Scenarios (S2) and (S4)
       (d) Scenarios (S1) and (S3)
3 Free Form Questions [20 pts]
    1. In the shuffle & sort phase, a job with m mappers and r reducers may involve up to m × r
       distinct copy operations. In which scenario are exactly m × r copy operations necessary?
    2. How do you recover a NameNode when it is down?
3. Why is Hadoop used for Big Data Analytics?
4. A large cluster runs HDFS on 100 nodes. Each node in the cluster, including the NameN-
   ode, has 16 Terabytes of hard disk storage and 64 Gigabytes of main memory available.
   The cluster uses a block-size of 64 Megabytes and a replication factor of 3. The master
   maintains 64 bytes of metadata for each 64MB block.
   (a) What is the cluster’s disk storage capacity? Explain your answer.
   (b) A client downloads a 1 Gigabyte file from the cluster: explain precisely how data
       flows between the client, NameNode and the DataNodes.
5. Name three different purposes of Heartbeat messages in a Hadoop cluster.
6. Write MapReduce pseudocode for the following problems. You will be graded on how
   appropriate your solution is to the MapReduce framework and on the quality of your
   descriptions. Note that for some of these problems you may have to write more than
   one pair of map/reduce functions.
Problem 1: Anagrams
           Given a file containing text, the program should output key/value pairs where the
           value is a comma-separated list of lines in the file that are anagrams, i.e. they use
           exactly the same letters (ignoring spaces). The output key is up to you and will
           depend on your implementation.
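           For reference, one possible model answer is sketched below in Python (all names
           are illustrative, and the small driver merely stands in for the shuffle & sort
           that the MapReduce framework would normally perform between map and reduce):

from collections import defaultdict

def map_anagram(line):
    # Key: the line's letters, lowercased, with spaces removed, in sorted
    # order; lines that are anagrams of each other share the same key.
    key = "".join(sorted(line.replace(" ", "").lower()))
    yield key, line

def reduce_anagram(key, lines):
    # Value: comma-separated list of all input lines sharing this key.
    yield key, ",".join(lines)

def simulate_mapreduce(records, mapper, reducer):
    # Stand-in for the framework's shuffle & sort phase.
    groups = defaultdict(list)
    for record in records:
        for k, v in mapper(record):
            groups[k].append(v)
    for k in sorted(groups):
        yield from reducer(k, groups[k])

if __name__ == "__main__":
    lines = ["listen", "silent", "the eyes", "they see"]
    for key, value in simulate_mapreduce(lines, map_anagram, reduce_anagram):
        print(key, value)

           Choosing a canonical form of the letters (sorted, case-folded, spaces dropped) as
           the key lets the framework's grouping collect all anagrams together on its own.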
Problem 2: Feature normalization
             Given a file containing a list of examples of the form:
             <label> <feature_1> <feature_2> <feature_3> ... <feature_m>
             we want to generate a version of this file where the feature values have been
             mean-centered. The output examples should include the label and should also appear
             in the same order as in the original file.
             You may assume that each example has all of the features defined.
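             For reference, one possible model answer uses two passes, sketched below in the
             same illustrative Python style: a first map/reduce pair computes the per-feature
             means, and a second, map-only pass subtracts them. In real Hadoop the means would
             be shipped to the mappers as side data (e.g. via the distributed cache); keying
             records by their original line number is one way to preserve the input order.

from collections import defaultdict

def map_sum(line_no, line):
    # parts[0] is the label; the remaining fields are feature values.
    parts = line.split()
    for idx, value in enumerate(parts[1:]):
        yield idx, float(value)

def reduce_mean(idx, values):
    # One mean per feature index.
    yield idx, sum(values) / len(values)

def map_center(line_no, line, means):
    # Map-only second pass: subtract each feature's mean, keep the label,
    # and key by line number so the original order can be restored.
    parts = line.split()
    centered = [f"{float(v) - means[i]:g}" for i, v in enumerate(parts[1:])]
    yield line_no, " ".join([parts[0]] + centered)

def run(lines):
    groups = defaultdict(list)  # stand-in for shuffle & sort
    for no, line in enumerate(lines):
        for k, v in map_sum(no, line):
            groups[k].append(v)
    means = {k: m for k, vs in groups.items() for _, m in reduce_mean(k, vs)}
    for no, line in enumerate(lines):
        yield from map_center(no, line, means)

if __name__ == "__main__":
    examples = ["+1 2.0 10.0", "-1 4.0 30.0", "+1 6.0 20.0"]
    for _, example in sorted(run(examples)):
        print(example)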
THE END.