Pyspark vs Spark SQL: Moving Average Analysis


Scenario Based Interview

Pyspark vs Spark SQL

Ganesh. R
# Problem Statement
You are the restaurant owner and you want to analyze a possible expansion (there will be at least one customer every day).

Compute the moving average of how much the customers paid in a seven-day window (i.e., the current day + the 6 days before). average_amount should be rounded to two decimal places.

Return the result table ordered by visited_on in ascending order.
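As a quick hand-check of the definition, the first full seven-day window ends on 2019-01-07 and reaches back to 2019-01-01; the daily totals used here come from the sample data further below.

```python
# Worked example: the 7-day window ending on 2019-01-07,
# using the daily totals from the sample data (one row per day here).
window = [100, 110, 120, 130, 110, 140, 150]
total = sum(window)            # 860
average = round(total / 7, 2)  # 122.86
```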

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import col, sum, avg, round, row_number
from pyspark.sql.window import Window

# Initialize the Spark session
spark = SparkSession.builder.appName("MovingAverage").getOrCreate()

# Sample data
data = [
(1, "Jhon", "2019-01-01", 100),
(2, "Daniel", "2019-01-02", 110),
(3, "Jade", "2019-01-03", 120),
(4, "Khaled", "2019-01-04", 130),
(5, "Winston", "2019-01-05", 110),
(6, "Elvis", "2019-01-06", 140),
(7, "Anna", "2019-01-07", 150),
(8, "Maria", "2019-01-08", 80),
(9, "Jaze", "2019-01-09", 110),
(1, "Jhon", "2019-01-10", 130),
(3, "Jade", "2019-01-10", 150),
]

# Create DataFrame
columns = ["customer_id", "name", "visited_on", "amount"]
df = spark.createDataFrame(data, schema=columns)

df.show()
df.printSchema()

root
|-- customer_id: long (nullable = true)
|-- name: string (nullable = true)
|-- visited_on: string (nullable = true)
|-- amount: long (nullable = true)

# Define a window specification: the current row plus the 6 preceding rows
window_spec = Window.orderBy("visited_on").rowsBetween(-6, 0)
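Note that `rowsBetween(-6, 0)` defines a rows-based frame: each row aggregates itself plus up to six physically preceding rows in the ordering, not a literal six-day date range. The two coincide here only because, after grouping, there is exactly one row per day and the problem guarantees at least one customer every day. A pure-Python sketch of the frame semantics, using the grouped daily totals from the sample data:

```python
# Daily totals after groupBy("visited_on") on the sample data
# (2019-01-10 has two visits: 130 + 150 = 280).
daily_totals = [100, 110, 120, 130, 110, 140, 150, 80, 110, 280]

# rowsBetween(-6, 0): slice of the current row and up to 6 rows before it
frames = [daily_totals[max(0, i - 6): i + 1] for i in range(len(daily_totals))]

assert frames[0] == [100]   # early rows have fewer than 7 rows available
assert len(frames[6]) == 7  # first full 7-row frame
assert sum(frames[6]) == 860
```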

# Calculate the rolling 7-day sum and average per day
result_df = (
    df.groupBy("visited_on")
    .agg(sum("amount").alias("daily_amount"))
    .withColumn("amount", sum("daily_amount").over(window_spec))
    .withColumn("average_amount",
                round(avg("daily_amount").over(window_spec), 2))
)

# Keep only days with a full 7-day history (row_number >= 7)
result_df = (
    result_df.withColumn("row_number",
                         row_number().over(Window.orderBy("visited_on")))
    .filter(col("row_number") >= 7)
    .select("visited_on", "amount", "average_amount")
)
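The whole pipeline is small enough to verify by hand; here is a pure-Python sketch of the same logic (group by day, roll a window of the current day plus up to 6 prior days, keep only full windows) applied to the sample data:

```python
# Daily totals for the sample data (2019-01-10: 130 Jhon + 150 Jade = 280)
daily_totals = {
    "2019-01-01": 100, "2019-01-02": 110, "2019-01-03": 120,
    "2019-01-04": 130, "2019-01-05": 110, "2019-01-06": 140,
    "2019-01-07": 150, "2019-01-08": 80, "2019-01-09": 110,
    "2019-01-10": 280,
}
days = sorted(daily_totals)
result = []
for i in range(6, len(days)):  # start at the 7th day: first full window
    window = [daily_totals[d] for d in days[i - 6: i + 1]]
    result.append((days[i], sum(window), round(sum(window) / 7, 2)))
# result -> [("2019-01-07", 860, 122.86), ("2019-01-08", 840, 120.0),
#            ("2019-01-09", 840, 120.0), ("2019-01-10", 1000, 142.86)]
```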

# Show the result (display() is available on Databricks; use show() elsewhere)
result_df.display()

# Register the source DataFrame as a temp view for the SQL version
df.createOrReplaceTempView("Customer")

%sql
WITH CustomerGrouped AS (
SELECT
visited_on,
SUM(amount) AS total_amount
FROM
Customer
GROUP BY
visited_on
),
MovingAverage AS (
SELECT
visited_on,
total_amount,
SUM(total_amount) OVER (
ORDER BY
visited_on ROWS BETWEEN 6 PRECEDING
AND CURRENT ROW
) AS sum_amount_7d
FROM
CustomerGrouped
)
SELECT
visited_on,
sum_amount_7d AS amount,
ROUND(sum_amount_7d / 7, 2) AS average_amount
FROM
MovingAverage
WHERE
DATEDIFF(
visited_on,
(
SELECT
MIN(visited_on)
FROM
CustomerGrouped
)
) >= 6
ORDER BY
visited_on;
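The SQL version filters with `DATEDIFF(visited_on, min_day) >= 6` instead of a row number; the two are equivalent here only because every day has at least one customer, so day offsets and row positions line up. A small sketch of that filter on the sample date range:

```python
# The DATEDIFF filter keeps days at least 6 days after the earliest visit;
# with daily data this matches the row_number() >= 7 filter above.
from datetime import date, timedelta

min_day = date(2019, 1, 1)
days = [min_day + timedelta(days=i) for i in range(10)]  # sample range
kept = [d for d in days if (d - min_day).days >= 6]
# kept starts at 2019-01-07, the first day with a full 7-day window
```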
IF YOU FOUND
THIS POST
USEFUL, PLEASE
SAVE IT.

Ganesh. R
+91-9030485102. Hyderabad, Telangana. rganesh0203@gmail.com

