Multiple Response Tasks

The document contains multiple response tasks that cover various topics related to data processing, big data tools, and system architecture. It includes questions on schema approaches, YARN, NoSQL databases, data lineage, and ETL pipelines, among others. Additionally, it describes a task to construct a ski slope using SQL queries and an ETL job for analyzing electric vehicle charge point data using PySpark.

Task 1

Multiple response task

This is a set of multiple response questions.

Please choose as many answers as you think are correct.
It is possible for all the answers or none to be correct.
Also, incorrect answers will negatively impact your score.


1.

Which of the following are the advantages of the ‘schema on read’ approach over ‘schema on write’?

A: Support for unstructured data

B: Faster loads to the storage layer

C: The flexibility of how data is consumed

D: Faster reads from the storage layer

2.

When it comes to big data tools, what does the acronym YARN stand for?

A: Yet Another Resource Network

B: Yet Another Release Note

C: Yet Another Routing Network

D: Yet Another Resource Negotiator

3.

You are trying to decide whether to use a single machine or cluster computing tools in your next
project. Which of the following is the premise for using single machine architecture?

A: Your load might increase drastically over time.

B: You are only expecting to be loading a small amount of data.

C: You are expecting your tasks to be very memory-intensive.

4.

Which of the following are useful Python packages for data processing and analysis projects?

A: Antigravity

B: Pandas

C: Seaborn

D: Pyglet
5.

Your system is getting more traction and starts to require more computing power. Which of the
following are reasons for scaling your system horizontally as opposed to vertically?

A: You are looking for more computing flexibility.

B: You are concerned about downtime when upgrading your machine.

C: You are unable to split your app into smaller logical blocks.

D: You want stable costs.

6.

Which MapReduce phase is usually the one we would like to get rid of, but might also be the most
memory-intensive?

A: Map

B: Shuffle

C: Reduce

7.

Which of the following statements about ELT are true?

A: An ELT model enables faster loading times than ETL.

B: An ELT model is an alternative to ETL.

C: With an ELT model, users can run transformations directly on the raw data.

D: An ELT model increases the time data spends in transit.

8.

Which of the following are equivalent to AWS S3?

A: Google BigQuery

B: Azure Blob Storage

C: Google Cloud Storage

D: Azure Data Factory

9.

In terms of a Hadoop cluster, what is the heartbeat?

A: It is a signal sent from a name node to data nodes informing them about cluster health.

B: It is a signal sent from a name node to external applications informing them about cluster health.

C: It is a signal sent from external applications to a name node asking about system health.

D: It is a signal sent from data nodes to a name node informing it about node health.
10.

Match the following technologies with their application:

1. Spark, 2. Cassandra, 3. Zookeeper, 4. Kafka, 5. Keras, 6. Superset

A. Database, B. Visualization, C. Orchestration, D. Analytics, E. Machine Learning, F. Streaming

A: 1F, 2A, 3C, 4A, 5B, 6E

B: 1D, 2A, 3C, 4F, 5E, 6B

C: 1D, 2F, 3E, 4A, 5C, 6B

D: 1B, 2A, 3D, 4F, 5E, 6C

Please review your answers. After submitting you won't be able to change them.


Task 2

Multiple response task

This is a set of multiple response questions.

Please choose as many answers as you think are correct.
It is possible for all the answers or none to be correct.
Also, incorrect answers will negatively impact your score.


1.

Which of the following sentences about eventual consistency are true?

A: Eventually consistent writes are often faster than strongly consistent ones.

B: Eventually consistent systems implement BASE properties.

C: In eventually consistent systems, reads are often faster than in strongly consistent ones.

D: Eventually consistent systems might not always return the same result.

2.

Which of the following are the characteristics of a distributed system?

A: Shared global clock

B: Concurrency of components

C: Independent failure of components

D: Scalability

3.
What are the main Lambda architecture layers?

A: Process Layer, Serving Layer

B: Batch Layer, Speed Layer, Data Layer

C: Stream Layer, Data Layer, Storage Layer

D: Batch Layer, Speed Layer, Serving Layer

4.

In which of the following situations would you recommend using a NoSQL database?

A: The input data structure is expected to change often.

B: The database is expected to serve complex queries on structured tables.

C: ACID support is required.

D: The database is expected to be able to serve changing workloads.

5.

Which of the following statements describe data lineage?

A: Data lineage is a process of discovering patterns in large data sets.

B: Data lineage gives visibility while greatly simplifying the ability to trace errors.

C: Data lineage provides a way of tracking data from its origin to its destination.

D: Data lineage is a process of extracting data from output coming from another program.

6.

You are looking for a distributed data store that will continue to work if one of the nodes fails, and
will deliver the same most recent result to all clients. Which two guarantees of the CAP theorem
should your system fulfill?

A: CA

B: CP

C: AP

7.

Which of the following statements about big data file formats are true?

A: AVRO offers better schema evolution than the other two formats.

B: Parquet and ORC are row-based whereas AVRO is a column-based format.

C: All three are machine-readable binary formats.

D: Parquet is better optimized for use with Apache Spark, whereas ORC works better with Hive.

8.
Which of the following would be reasons for building a data lake as opposed to a data warehouse for
your next project?

A: The input data might have different formats.

B: You want to store your data in a transformed and structured way.

C: You want to store as much data as possible and decide how to use it later.

D: You have a set of predefined queries.

9.

You work for a financial institution and are trying to decide whether to build a serverless data
platform using tools offered by one of the cloud providers. Which of the following would be
drawbacks of the serverless architecture?

A: No access to virtual machines

B: Security concerns over multitenancy problems

C: More server maintenance required

D: High upfront costs

10.

What are some of the challenges when using real-time data processing?

A: Dealing with repeated data

B: Dealing with structured data

C: Dealing with out-of-order events

D: Dealing with small numbers of events

Please review your answers. After submitting you won't be able to change them.



Task 3

Task description

A ski resort company is planning to construct a new ski slope using a pre-existing network of
mountain huts and trails between them. A new slope has to begin at one of the mountain huts, have
a middle station at another hut connected with the first one by a direct trail, and end at the third
mountain hut which is also connected by a direct trail to the second hut. The altitude of the three
huts chosen for constructing the ski slope has to be strictly decreasing.

You are given two SQL tables, mountain_huts and trails, with the following structure:

create table mountain_huts (
  id integer not null,
  name varchar(40) not null,
  altitude integer not null,
  unique(name),
  unique(id)
);

create table trails (
  hut1 integer not null,
  hut2 integer not null
);

Each entry in the table trails represents a direct connection between huts with IDs hut1 and hut2.
Note that all trails are bidirectional.

Create a query that finds all triplets (startpt, middlept, endpt) representing the mountain huts that
may be used for construction of a ski slope. Output returned by the query can be ordered in any way.

Examples:

1. Given the tables:

mountain_huts:

+----+----------+----------+
| id | name     | altitude |
+----+----------+----------+
|  1 | Dakonat  |     1900 |
|  2 | Natisa   |     2100 |
|  3 | Gajantut |     1600 |
|  4 | Rifat    |      782 |
|  5 | Tupur    |     1370 |
+----+----------+----------+

trails:

+------+------+
| hut1 | hut2 |
+------+------+
|    1 |    3 |
|    3 |    2 |
|    3 |    5 |
|    4 |    5 |
|    1 |    5 |
+------+------+

your query should return:

+----------+----------+-------+
| startpt  | middlept | endpt |
+----------+----------+-------+
| Dakonat  | Gajantut | Tupur |
| Dakonat  | Tupur    | Rifat |
| Gajantut | Tupur    | Rifat |
| Natisa   | Gajantut | Tupur |
+----------+----------+-------+
2. Given the tables:

mountain_huts:

+----+-----------+----------+
| id | name      | altitude |
+----+-----------+----------+
|  1 | Adam      |     2100 |
|  2 | Emily     |     1800 |
|  3 | Diana     |     1800 |
|  4 | Bobs Inn  |     1400 |
|  5 | Carls Inn |     1350 |
|  6 | Hannah    |     2300 |
+----+-----------+----------+

trails:

+------+------+
| hut1 | hut2 |
+------+------+
|    2 |    1 |
|    2 |    3 |
|    2 |    4 |
|    2 |    5 |
|    3 |    1 |
|    3 |    4 |
|    3 |    5 |
|    3 |    6 |
+------+------+

your query should return:

+---------+----------+-----------+
| startpt | middlept | endpt     |
+---------+----------+-----------+
| Adam    | Diana    | Bobs Inn  |
| Adam    | Diana    | Carls Inn |
| Adam    | Emily    | Bobs Inn  |
| Adam    | Emily    | Carls Inn |
| Hannah  | Diana    | Bobs Inn  |
| Hannah  | Diana    | Carls Inn |
+---------+----------+-----------+

Assume that:

- there is no trail going from a hut back to itself;

- for every two huts there is at most one direct trail connecting them;

- each hut from table trails occurs in table mountain_huts.
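
One way to express this query is to normalize each bidirectional trail into both directions and then chain two hops with strictly decreasing altitudes. The following is a sketch of a possible solution, not the only valid query; the CTE name t is illustrative:

-- Make each trail usable in both directions, then join twice to form
-- start -> middle -> end chains with strictly decreasing altitude.
with t as (
  select hut1, hut2 from trails
  union all
  select hut2, hut1 from trails
)
select h1.name as startpt,
       h2.name as middlept,
       h3.name as endpt
from t t1
join t t2 on t2.hut1 = t1.hut2
join mountain_huts h1 on h1.id = t1.hut1
join mountain_huts h2 on h2.id = t1.hut2
join mountain_huts h3 on h3.id = t2.hut2
where h1.altitude > h2.altitude
  and h2.altitude > h3.altitude;

Against the first example this yields the four expected rows; for instance, Dakonat (1900) -> Gajantut (1600) -> Tupur (1370) uses trails 1-3 and 3-5 with strictly decreasing altitudes.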



Task 4

Task description

To access the CSV data sets, download the zipped files.

ETL Pipeline

The objective of this task is to create an ETL job which will read data from a file, transform it into the
desired state and save it to an output location.

NOTE: This task runs against Spark version 3.1.1.

The input file electric-chargepoints-2017.csv (available in input_path inside the ChargePointsETLJob
class) contains a sample of data published by the UK Department for Transport and presents
information about the usage of electric vehicle charge points in 2017.

Here are five random rows from this file:

+---------+---------------+------------+-----------+------------+----------+--------+--------------------+
| CPID    | ChargingEvent | StartDate  | StartTime | EndDate    | EndTime  | Energy | PluginDuration     |
+---------+---------------+------------+-----------+------------+----------+--------+--------------------+
| AN07263 | 15554472      | 2017-10-29 | 13:30:00  | 2017-10-29 | 17:08:00 | 5.3    | 3.6333333333333333 |
| AN15092 | 15329256      | 2017-10-14 | 17:37:00  | 2017-10-15 | 05:26:00 | 19.2   | 11.816666666666666 |
| AN22594 | 2344473       | 2017-06-02 | 16:10:19  | 2017-06-03 | 13:03:21 | 11.5   | 20.88388888888889  |
| AN10218 | 12184545      | 2017-03-20 | 21:43:37  | 2017-03-21 | 20:18:29 | 12.1   | 22.58111111111111  |
| AN02137 | 11984777      | 2017-03-07 | 10:21:17  | 2017-03-08 | 18:10:15 | 7.8    | 31.816111111111113 |
+---------+---------------+------------+-----------+------------+----------+--------+--------------------+

The PluginDuration column stores the plugin duration in hours.

For each charge point, identified by its unique ID (CPID), we would like to know the duration (in
hours) of the longest plugin and the duration (in hours) of the average plugin.
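
As a sanity check on the sample data: in the first row the session runs from 13:30:00 to 17:08:00 on the same day, i.e. 3 hours 38 minutes, and 3 + 38/60 ≈ 3.63 hours matches that row's PluginDuration value.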
Requirements

To achieve this, please use PySpark and complete the ETL pipeline containing the following three
methods:

extract – this method should return a Spark dataframe which contains the raw data from the input
file in input_path

transform – this method should get the raw dataframe as an input parameter and return a dataframe
containing the following three columns: chargepoint_id, max_duration, avg_duration

load – this method should take the transformed dataframe as an input parameter and save the data
in parquet format to the output path in output_path

Example output

Here's an example row from the transformed dataframe returned by the transform method and saved
to the output parquet file:

+----------------+--------------+--------------+
| chargepoint_id | max_duration | avg_duration |
+----------------+--------------+--------------+
| AN06056        | 11.98        | 4.76         |
+----------------+--------------+--------------+

Hints

Please make sure that the output file contains one row for each charge point.

Please make sure that the columns are named correctly.

Please round numbers to two decimal places.


Solution

A possible implementation of the provided scaffold is shown below. The method bodies are a sketch based on the requirements above, assuming the input CSV has a header row:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F


class ChargePointsETLJob:
    input_path = 'data/input/electric-chargepoints-2017.csv'
    output_path = 'data/output/chargepoints-2017-analysis'

    def __init__(self):
        self.spark_session = (SparkSession.builder
                              .master("local[*]")
                              .appName("ElectricChargePointsETLJob")
                              .getOrCreate())

    def extract(self):
        # Read the raw CSV, keeping the header row as column names and
        # letting Spark infer column types (PluginDuration becomes a double).
        return (self.spark_session.read
                .option("header", "true")
                .option("inferSchema", "true")
                .csv(self.input_path))

    def transform(self, df):
        # For each charge point, compute the longest and the average plugin
        # duration in hours, rounded to two decimal places as required.
        return (df.groupBy("CPID")
                  .agg(F.round(F.max("PluginDuration"), 2).alias("max_duration"),
                       F.round(F.avg("PluginDuration"), 2).alias("avg_duration"))
                  .withColumnRenamed("CPID", "chargepoint_id"))

    def load(self, df):
        # Save the transformed dataframe in parquet format to the output path.
        df.write.mode("overwrite").parquet(self.output_path)

    def run(self):
        self.load(self.transform(self.extract()))
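
To run the job end to end, a hypothetical entry point (not part of the original scaffold) could look like this:

if __name__ == "__main__":
    # Runs extract -> transform -> load against the paths defined on the class.
    ChargePointsETLJob().run()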
