Data Engineer Things
Insights and ideas on data and engineering.
ETL
Data Architecture
Optimization
Interview Guide
Career Growth
AI in Data Engineering
PySpark Interview Questions for Data Engineers || Part I
The most frequently asked PySpark questions in data engineering interviews.
Vishal Barvaliya
Feb 27
Trending Now
How Did LinkedIn Handle 7 Trillion Messages Daily With Apache Kafka?
Was adding more machines enough?
Vu Trinh
Aug 14
How did Facebook design their Real-Time Processing ecosystem
Hundreds of GBs per Second
Vu Trinh
Aug 17
I spent 7 hours diving deep into Apache Iceberg
More details on how everything works
Vu Trinh
Aug 31
No, Data Engineers Don’t NEED dbt.
But It Sure Does Solve a Lot of Problems
Leo Godin
Jul 19
Data Engineering Using Modern Data-Stack
Experience dbt, Fivetran, PostgreSQL, and Apache Airflow for ETL
Temidayo Omoniyi
Aug 19
How to build a real-time CDC pipeline with Estuary Flow
An end-to-end project transferring data from Google Sheets to PostgreSQL
Ana Escobar
Aug 23
Latest Stories
Troubleshooting Spark Jobs: Overcoming Errors and Performance Challenges
A comprehensive guide for data engineers to identify, troubleshoot, and resolve common Spark job errors, optimize performance, and boost efficiency
Pritam Deb
Sep 11
Looking to Enhance Your Data Quality? This is for You
Practical techniques to implement data verification and validation processes for your data
Rahul Madhani
Sep 11
Why Would Someone Execute Databricks API From Azure Data Factory?
Scenarios where calling the Databricks REST API from ADF is essential for specific tasks, with implementation
Rahul Madhani
Sep 11
I spent 6 hours learning how Apache Spark plans the execution for us.
Catalyst, Adaptive Query Execution, and how Airbnb leverages Spark 3.
Vu Trinh
Sep 11
How to Decide if Databricks Is the Right Tool for You
Essential Questions You Need to Answer Before Adopting Databricks
Eduard Popa
Sep 6
Software Engineering Principles That Also Apply to Data Engineering
Applying Software Design Principles like KISS, DRY and SOLID to Modern Data Architecture
Santosh Joshi
Sep 2
Advanced Data Engineering Interview Questions - Part 3
This interview guide is part of a series:
Arpita Mishra
Aug 31
Demoing DuckDB on Jupyter and Docker
How to share demos and proofs-of-concept with Jupyter on Docker
Chad Isenberg
Aug 28
Quick Tips That Reduced Our Lake Size by 100 TB
Effective Approaches for Streamlining Lake Storage and Managing Azure Costs
Santosh Joshi
Aug 26
Making Data Pipeline Production Ready — Taking dbt Model to Production with Astronomer
Using Astro CLI to orchestrate dbt models
Temidayo Omoniyi
Aug 25
Enhancing Data Lakehouse Security: Cryptography as a Service for Personal Data Protection
Learn about data encryption and decryption using cryptography
Caesario Kisty
Aug 25
Introduction to Databricks
A Beginner’s Guide to Databricks
Pavan Kumar
Aug 24
I spent 8 hours learning Parquet. Here’s what I discovered
I finally sat down and learned about it.
Vu Trinh
Aug 24
Locking Mechanisms in High-Load Systems
In the world of concurrent systems, especially when it comes to highly loaded distributed environments, finding a balance between data…
Kirill Bobrov
Aug 23
Unlock SQL Window Functions: 10 Minutes to Pro Level!
Exploring Concepts and Use Cases of SQL Window Functions for Data Professionals
Santosh Joshi
Aug 22
How to set up an AWS Lambda function | Lambdas for Data Engineers
AWS Lambda offers a lot of power for data engineers. In this article, we explore the basics and how data teams should leverage serverless
Hugo Lu
Aug 22
Essential Linux Commands Every Data Engineer Should Know: dd, scp, setacl, and mailx
Master These 4 Powerful Linux Commands to Boost Your Data Engineering Skills and Efficiency
Naveenkumar Murugan
Aug 21
How did Discord evolve to handle trillions of data points
From in-house solutions to the modern data stack
Vu Trinh
Aug 20
Can We Use Databricks CLI Without Installing It? You Will Be Amazed Like Me
I was amazed to discover a way to use the Databricks CLI without needing to install any executable files on my machine
Rahul Madhani
Aug 16
Creating Business Value with Databricks: The Role of Solution Architects
Bridging the gap between stakeholders and data teams to bring valuable data solutions into production
Eduard Popa
Aug 15
Timeless Skills for Navigating the Evolving World of Data Engineering
What technologies and programming languages should you learn to become a data engineer?
Ben Rogojan
Aug 12
Perhaps the ultimate Orchestration Tool was in front of us all along
Hopefully you’ve been using this all along
Hugo Lu
Aug 11
I spent 4 hours learning Apache Iceberg. Here’s what I found.
The table format’s overview and architecture
Vu Trinh
Aug 10
Understanding Flight Cancellations and Rescheduling in Airlines Using Databricks and PySpark
Using Databricks and PySpark for Enhanced Flight Operations in the Airline Industry.
Brahma, The Data Engineer.
Aug 9
Big-O Essentials for Data Engineers in 5 Minutes
Essential Concepts to Enhance Your Coding Efficiency
Santosh Joshi
Aug 8