4.1 The Spark UI - Databricks

SPARK DataBricks

Uploaded by

Javier Melendrez

29/4/2021 4.1 The Spark UI - Databricks

4.1 The Spark UI

%run ../Includes/Classroom-Setup

Mounting course-specific datasets to /mnt/training...

Datasets are already mounted to /mnt/training from s3a://databricks-corp-training/common

res1: Boolean = false

res2: Boolean = false

DROP TABLE IF EXISTS People10M;

CREATE TABLE People10M
USING csv
OPTIONS (
path "/mnt/training/dataframes/people-10m.csv",
header "true");

DROP TABLE IF EXISTS ssaNames;

CREATE TABLE ssaNames USING parquet OPTIONS (
path "/mnt/training/ssn/names.parquet"
);

Catalog Error

file:///home/reivajmc/Documentos/SparkSQL/4.1 The Spark UI.html 1/7

SELECT
firstName,
lastName,
birthDate
FROM
People10M
WHERE
year(birthDate) > 1990
AND gender = 'F'

   firstName  lastName     birthDate
1  An         Cowper       1992-02-08T05:00:00.000Z
2  Caroyln    Cardon       1994-05-15T04:00:00.000Z
3  Yesenia    Goldring     1997-07-09T04:00:00.000Z
4  Hedwig     Pendleberry  1998-12-02T05:00:00.000Z
5  Kala       Lyfe         1994-06-23T04:00:00.000Z
6  Gussie     McKeeman     1991-11-15T05:00:00.000Z
7  Pansy      Shrieves     1991-05-24T04:00:00.000Z
Showing the first 1000 rows.

Plan Optimization Example

CREATE OR REPLACE TEMPORARY VIEW joined AS
SELECT People10m.firstName,
to_date(birthDate) AS date
FROM People10m
JOIN ssaNames ON People10m.firstName = ssaNames.firstName;

CREATE OR REPLACE TEMPORARY VIEW filtered AS
SELECT firstName, count(firstName)
FROM joined
WHERE
date >= "1980-01-01"
GROUP BY
firstName, date;

SELECT * FROM filtered;

   firstName  count(firstName)
1  Ellan      49
2  Charline   117
3  Latisha    72
4  Tonita     73
5  Gwenn      76
6  Nidia      67
7  Torri      91
Showing the first 1000 rows.

CACHE TABLE filtered;

SELECT * FROM filtered;

   firstName  count(firstName)
1  Ellan      49
2  Charline   117
3  Latisha    72
4  Tonita     73
5  Gwenn      76
6  Nidia      67
7  Torri      91
Showing the first 1000 rows.

SELECT * FROM filtered WHERE firstName = "Latisha";

   firstName  count(firstName)
1  Latisha    72
2  Latisha    72
3  Latisha    72
4  Latisha    72
5  Latisha    72
6  Latisha    72
7  Latisha    72
Showing all 513 rows.

UNCACHE TABLE IF EXISTS filtered;

SELECT * FROM filtered WHERE firstName = "Latisha";

   firstName  count(firstName)
1  Latisha    72
2  Latisha    72
3  Latisha    72
4  Latisha    72
5  Latisha    72
6  Latisha    72
7  Latisha    72
Showing all 513 rows.

Set Partitions

DROP TABLE IF EXISTS bikeShare;
CREATE TABLE bikeShare
USING csv
OPTIONS (
path "/mnt/training/bikeSharing/data-001/hour.csv",
header "true");

SELECT
*
FROM
bikeShare
WHERE
hr = 10

   instant  dteday      season  yr  mnth  hr
1  11       2011-01-01  1       0   1     10
2  34       2011-01-02  1       0   1     10
3  56       2011-01-03  1       0   1     10
4  79       2011-01-04  1       0   1     10
5  102      2011-01-05  1       0   1     10
6  125      2011-01-06  1       0   1     10
7  148      2011-01-07  1       0   1     10
Showing all 727 rows.

DROP TABLE IF EXISTS bikeShare_partitioned;

CREATE TABLE bikeShare_partitioned
PARTITIONED BY (p_hr)
AS
SELECT
instant,
dteday,
season,
yr,
mnth,
hr as p_hr,
holiday,
weekday,
workingday,
weathersit,
temp
FROM
bikeShare

Query returned no results

SELECT * FROM bikeShare_partitioned WHERE p_hr = 10

   instant  dteday      season  yr  mnth  p_hr
1  11       2011-01-01  1       0   1     10
2  34       2011-01-02  1       0   1     10
3  56       2011-01-03  1       0   1     10
4  79       2011-01-04  1       0   1     10
5  102      2011-01-05  1       0   1     10
6  125      2011-01-06  1       0   1     10
7  148      2011-01-07  1       0   1     10
Showing all 727 rows.
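
The partitioned and unpartitioned queries return the same 727 rows. What can differ is how much data has to be read to produce them. The plain-Python sketch below imitates that idea with an in-memory dictionary keyed by partition value. It is an illustration under stated assumptions, not a description of how Spark stores or scans data: the toy rows are made up, and the helper names `partition_rows`, `scan_all`, and `scan_pruned` are hypothetical.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows by the partition column, like one directory per distinct value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def scan_all(rows, key, value):
    """Unpartitioned read: every row is inspected to apply the filter."""
    matches = [row for row in rows if row[key] == value]
    return matches, len(rows)


def scan_pruned(partitions, value):
    """Partitioned read: only the partition matching the filter is inspected."""
    matches = list(partitions.get(value, []))
    return matches, len(matches)


# Toy data (assumption): 3 days x 24 hours, loosely shaped like bikeShare.
rows = [
    {"instant": day * 24 + hr + 1, "p_hr": hr}
    for day in range(3)
    for hr in range(24)
]
partitions = partition_rows(rows, "p_hr")

full_matches, full_inspected = scan_all(rows, "p_hr", 10)
pruned_matches, pruned_inspected = scan_pruned(partitions, 10)

print(full_inspected, pruned_inspected)  # 72 3
```

Both reads return the same rows, but the pruned read touches only the rows stored under p_hr = 10. In the Spark UI, the comparable thing to look for would be a smaller amount of input read by the query against bikeShare_partitioned.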

Beware of small files!

DROP TABLE IF EXISTS bikeShare_parquet;
CREATE TABLE bikeShare_parquet
PARTITIONED BY (p_instant)
AS
SELECT
instant AS p_instant,
dteday,
season,
yr,
mnth,
hr,
holiday,
weekday,
workingday,
weathersit,
temp
FROM
-- Assumes the CSV-backed bikeShare table created earlier is the intended source.
bikeShare
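
The warning follows from the choice of partition column. Judging by the query results above, instant appears to be a distinct value per row, so partitioning on it would give every row its own partition and, in turn, a very large number of very small files. The sketch below counts partitions for a low-cardinality and a high-cardinality column on made-up data. The helper name `partition_stats` and the toy rows are assumptions for illustration only.

```python
from collections import Counter


def partition_stats(rows, key):
    """Return (number of partitions, average rows per partition) for a column."""
    counts = Counter(row[key] for row in rows)
    return len(counts), len(rows) / len(counts)


# Toy data (assumption): 30 days x 24 hours, with a unique `instant` per row.
rows = [
    {"instant": day * 24 + hr + 1, "hr": hr}
    for day in range(30)
    for hr in range(24)
]

by_hr = partition_stats(rows, "hr")
by_instant = partition_stats(rows, "instant")

print(by_hr)       # (24, 30.0)
print(by_instant)  # (720, 1.0)
```

Partitioning by hr keeps the partition count fixed at 24 however many rows arrive, while partitioning by instant grows one partition per row.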

%run ../Includes/Classroom-Cleanup

Citations
Bike Sharing Data

[1] Fanaee-T, Hadi, and Gama, Joao, Event labeling combining ensemble detectors
and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15,
Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.

@article{
  year={2013},
  issn={2192-6352},
  journal={Progress in Artificial Intelligence},
  doi={10.1007/s13748-013-0040-3},
  title={Event labeling combining ensemble detectors and background knowledge},
  url={http://dx.doi.org/10.1007/s13748-013-0040-3},
  publisher={Springer Berlin Heidelberg},
  keywords={Event labeling; Event detection; Ensemble learning; Background knowledge},
  author={Fanaee-T, Hadi and Gama, Joao},
  pages={1-15}
}

Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache
Software Foundation (http://www.apache.org/).

© 2020 Databricks, Inc. All rights reserved.
