Spark Step-by-Step Interview Guide (Draft)

The document contains a comprehensive list of questions related to Apache Spark, covering topics such as differences between Spark and Hadoop, RDDs, transformations and actions, Spark ecosystem, data formats, and Spark Streaming. It also addresses performance considerations, memory management, fault tolerance, and various Spark functionalities. Overall, the document serves as a guide for understanding key concepts and practical applications of Spark.

1. What is the difference between Spark and Hadoop?

2. What are the differences between functional and imperative languages, and why is functional
programming important?
3. What is a resilient distributed dataset (RDD)? Explain with diagrams.
4. Explain transformations and actions (in the context of RDDs)
5. What are the Spark use cases?
6. Why do we need transformations? What is lazy evaluation and why is it useful?
7. What is ParallelCollectionRDD?
8. Explain how reduceByKey and groupByKey work (see the sketch after this list).
9. What is the common workflow of a Spark program?
10. Explain the Spark environment for the driver. Ref
11. What are the transformations and actions that you have used in Spark?
12. How can you minimize data transfers when working with Spark?
13. What is a lineage graph?
14. Describe the major libraries that constitute the Spark Ecosystem
15. What are the different file formats that can be used in Spark SQL?
16. What are Pair RDDs?
17. What is the difference between persist() and cache()? (See the sketch after this list.)
18. What are the various levels of persistence in Apache Spark? Ref
19. Which storage level should you choose? Ref
20. Explain the advantages and drawbacks of RDDs.
21. Explain why Datasets are preferred over RDDs.
22. How to share data from Spark RDD between two applications?
23. Does Apache Spark provide checkpointing?
24. Explain the internal workings of caching.
25. What is the function of the Block Manager?
26. Why does Spark SQL consider the support of indexes unimportant?
27. How do you convert existing Hive UDTFs to Scala functions and use them from Spark SQL? Explain with an example. Ref 23
28. Why use DataFrames and Datasets when we have RDDs? Ref Video
29. What is the Catalyst optimizer and how does it work? Ref
30. What are the top challenges developers face while writing Spark applications? Ref Video
31. Explain the difference in implementation between DataFrames and Datasets.
32. How is memory handled in Datasets?
33. What are the limitations of Datasets?
34. What are the sources of memory contention?
35. Show the command to run Spark in YARN client mode.
36. Show the command to run Spark in YARN cluster mode.
37. What are Standalone and YARN modes?
38. Explain client mode and cluster mode in Spark?
39. Which cluster managers are supported by Spark?
40. What is Executor memory?
41. What is a DStream, and what is the difference between a batch and a DStream in Spark Streaming? (See the sketch after this list.)
42. How does Spark Streaming work?
43. What is the difference between map() and flatMap()? (See the sketch after this list.)
44. What is the reduce() action? Is there any difference between reduce() and reduceByKey()? (See the sketch after this list.)
45. What is the disadvantage of the reduce() action, and how can we overcome this limitation?
46. What are accumulators, and when are they truly reliable? (Accumulators and broadcast variables are sketched together after this list.)
47. What are broadcast variables, and what advantages do they provide?
48. What is piping? Demonstrate with an example of a data pipeline.
49. What is a driver?
50. What does a Spark Engine do?
51. What are the steps that occur when you run a Spark application on a cluster?
52. What is a schema RDD/DataFrame?
53. What are Row objects?
54. How does Spark achieve fault tolerance?
55. What parameter is set if cores need to be defined across executors?
56. Name a few Spark master system properties.
57. Define partitions in reference to Spark's implementation.
58. What are the differences between how Spark and MapReduce manage cluster resources under YARN? Ref
59. What is GraphX and what is PageRank? Ref
60. What does MLlib do? Ref
61. What is a Parquet file?
62. Why is Parquet used for Spark SQL? Ref
63. What is schema evolution and what are its disadvantages? Explain schema merging in reference to Parquet files. Ref
64. Will Spark replace MapReduce?
65. What is Spark Executor?
66. Name the different types of Cluster Managers in Spark.
67. In how many ways can we create RDDs? Show examples. (See the sketch after this list.)
68. How do you flatten rows in Spark? Explain with an example (see the sketch after this list). Ref
69. What is Hive on Spark?
70. Explain Spark Streaming Architecture?
71. What are the types of Transformations on DStreams?
72. What is a Receiver in Spark Streaming, and can you build custom receivers?
73. Explain the process of storing live-streamed DStream data to a database. Ref
74. How is Spark streaming fault tolerant?
75. Explain the transform() method used on DStreams. Ref
76. What file systems does Spark support?
77. How is data security achieved in Spark?
78. Explain Kerberos security? Ref
79. Name the various types of distribution that Spark supports.
80. Show some example queries using the Scala DataFrame API (see the sketch after this list). Ref
81. Under what conditions can the Spark driver parallelize datasets as RDDs?
82. Can the repartition() operation decrease the number of partitions? (See the sketch after this list.) Ref
83. What are the drawbacks of the repartition() and coalesce() operations?
84. In a join operation, for example val joinVal = rddA.join(rddB), will it generate partitions?
85. Consider the following code in Spark: what is the final value in the fVal variable?
86. Scala pattern matching: show the various ways the code can be written (see the sketch after this list).
87. What is the returned result when a query is executed using Spark SQL or Hive? Hint: an RDD or a DataFrame/Dataset?
88. If we want to display just the schema of a DataFrame/Dataset, what method is called? (See the sketch after this list.)
89. Show various implementations for the following query in Spark?
90. What are the most important factors you want to consider when you start a machine learning project?
91. As a data scientist, which algorithm would you suggest if legal aspects and ease of explanation to non-technical people are the main criteria?
92. For a supervised learning algorithm, what percentage of the data is split between the training and test datasets?
93. Compare the performance of the Avro and Parquet file formats and their usage (in the context of Spark).
94. When should you not use Spark?
95. Can you use Spark to access and analyze data stored in Cassandra databases?
96. With which mathematical properties can you achieve parallelism?
97. What are various types of Partitioning in Apache Spark?
98. How to set partitioning for data in Apache Spark?
99. The Spark master exposes a set of REST APIs to submit and monitor applications. Which data format is used by these web services?
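
Illustrative Scala sketches for selected questions above. These are minimal, non-authoritative examples intended as memory joggers, written as if typed into a spark-shell session (where sc and spark are predefined); all file paths, column names and sample values below are made-up placeholders.

Sketch for Q8 (reduceByKey vs. groupByKey). Both produce the same per-key sums here; the point is that reduceByKey combines values on each partition before the shuffle, while groupByKey ships every value across the network first. The data is invented.

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // reduceByKey performs a map-side combine before shuffling.
    val summedReduce = pairs.reduceByKey(_ + _)             // RDD[(String, Int)]

    // groupByKey shuffles every value, then we sum the grouped values ourselves.
    val summedGroup  = pairs.groupByKey().mapValues(_.sum)  // same result, more shuffle traffic

    summedReduce.collect().foreach(println)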
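
Sketch for Q17-18 (persist() vs. cache()). For RDDs, cache() is simply persist() with the default MEMORY_ONLY level, while persist() lets you pick a storage level explicitly. The input paths are placeholders.

    import org.apache.spark.storage.StorageLevel

    val lines1 = sc.textFile("hdfs:///tmp/a.txt")    // placeholder path
    val lines2 = sc.textFile("hdfs:///tmp/b.txt")    // placeholder path

    lines1.cache()                                    // shorthand for persist(StorageLevel.MEMORY_ONLY) on an RDD
    lines2.persist(StorageLevel.MEMORY_AND_DISK_SER)  // an explicitly chosen storage level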
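
Sketch for Q41-42 (DStreams and Spark Streaming). The classic network word count: a StreamingContext slices a live socket stream into small batches (here, 10 seconds each), and each batch is processed as an RDD. The host and port are placeholders, and the application needs at least two cores so one can run the receiver.

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc   = new StreamingContext(sc, Seconds(10))     // 10-second batch interval
    val lines = ssc.socketTextStream("localhost", 9999)   // placeholder host/port

    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()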
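
Sketch for Q43 (map() vs. flatMap()). map produces exactly one output element per input element; flatMap produces zero or more.

    val lines = sc.parallelize(Seq("hello world", "spark"))

    val mapped    = lines.map(_.split(" "))      // RDD[Array[String]], one array per line
    val flattened = lines.flatMap(_.split(" "))  // RDD[String], individual words

    flattened.collect()                          // Array(hello, world, spark)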
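
Sketch for Q44-45 (reduce() vs. reduceByKey()). reduce() is an action that folds the whole RDD down to a single value returned to the driver, so the combining function must be associative and commutative; reduceByKey() is a transformation that aggregates per key and stays distributed.

    val nums  = sc.parallelize(1 to 10)
    val total = nums.reduce(_ + _)                  // 55, a single value on the driver

    val pairs  = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val perKey = pairs.reduceByKey(_ + _)           // RDD[(String, Int)], still distributed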
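
Sketch for Q46-47 (accumulators and broadcast variables). The lookup map and the notion of a "bad record" are invented for illustration. Accumulator updates are only guaranteed to be counted exactly once when they happen inside an action (such as foreach), not inside a transformation that may be re-executed.

    val badRecords = sc.longAccumulator("badRecords")       // driver-visible counter
    val lookup     = sc.broadcast(Map("a" -> 1, "b" -> 2))  // read-only value shipped once per executor

    val data = sc.parallelize(Seq("a", "b", "x"))
    data.foreach { k =>                                     // foreach is an action, so the count is reliable
      if (!lookup.value.contains(k)) badRecords.add(1)
    }
    println(badRecords.value)                               // 1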
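
Sketch for Q67 (ways to create RDDs): parallelizing a local collection, loading an external dataset, and transforming an existing RDD. The path is a placeholder.

    val fromCollection = sc.parallelize(Seq(1, 2, 3, 4))
    val fromFile       = sc.textFile("hdfs:///data/sample.txt")   // placeholder path
    val derived        = fromCollection.map(_ * 2)                // RDDs also come from transformations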
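
Sketch for Q68 (flattening rows). One common approach is exploding an array column into one row per element; the column names and sample data are invented.

    import org.apache.spark.sql.functions.explode
    import spark.implicits._

    val orders = Seq(("order1", Seq("apple", "pear")), ("order2", Seq("plum"))).toDF("id", "items")

    // Produces one row per (id, item) pair: order1/apple, order1/pear, order2/plum.
    orders.select($"id", explode($"items").as("item")).show()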
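
Sketch for Q80 (Scala DataFrame API). A few basic queries against a hypothetical JSON file; the path and the name/age columns are assumptions.

    import spark.implicits._

    val people = spark.read.json("examples/people.json")   // placeholder path

    people.select("name", "age").show()
    people.filter($"age" > 21).show()
    people.groupBy("age").count().orderBy($"age").show()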
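
Sketch for Q82-83 (repartition() vs. coalesce()). repartition() can increase or decrease the number of partitions but always performs a full shuffle; coalesce() only decreases it and, by default, merges existing partitions without a shuffle.

    val rdd = sc.parallelize(1 to 1000, numSlices = 8)

    val more  = rdd.repartition(16)     // 16 partitions, full shuffle
    val fewer = rdd.coalesce(2)         // 2 partitions, no shuffle by default
    println(fewer.getNumPartitions)     // 2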
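
Sketch for Q86 (Scala pattern matching). Plain Scala, showing constant, typed (with a guard), tuple and wildcard patterns.

    def describe(x: Any): String = x match {
      case 0               => "zero"                           // constant pattern
      case n: Int if n > 0 => s"positive int $n"               // typed pattern with a guard
      case (a, b)          => s"pair of $a and $b"             // tuple pattern
      case s: String       => s"string of length ${s.length}"  // typed pattern
      case _               => "something else"                 // wildcard
    }

    describe(0); describe(42); describe(("a", 1)); describe("spark")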
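
Sketch for Q88 (displaying only the schema). printSchema() prints the schema tree of a DataFrame/Dataset without scanning the data; the Parquet path is a placeholder.

    val events = spark.read.parquet("hdfs:///data/events.parquet")   // placeholder path
    events.printSchema()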
