0% found this document useful (0 votes)

90 views63 pages

Parallel Query Execution in DBMS

The document discusses parallel query execution in databases. It describes three process models for parallel execution: process per worker, process pool, and thread per worker. It also discusses inter-query and intra-query parallelism. Intra-query parallelism can be achieved through intra-operator parallelism, where operators are decomposed into fragments that operate on different data subsets, or through inter-operator parallelism by pipelining operators.

Uploaded by

akshay

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

90 views63 pages

Parallel Query Execution in DBMS

Uploaded by

akshay

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 63

Query Execution

13 Part II

Intro to Database Systems Andy Pavlo

15-445/15-645
Fall 2019 AP Computer Science
Carnegie Mellon University
2

ADMINISTRIVIA

Homework #3 is due Today @ 11:59pm

Mid-Term Exam is Wed Oct 16th @ 12:00pm

Project #2 is due Sun Oct 20th @ 11:59pm

CMU 15-445/645 (Fall 2019)

QUERY EXECUTION
SELECT R.id, S.cdate
We discussed last class how to FROM R JOIN S
compose operators together to ON R.id = S.id
execute a query plan. WHERE S.value > 100

We assumed that the queries execute

p R.id, S.value

with a single worker (e.g., thread).

⨝ R.id=S.id

We now need to talk about how to

execute with multiple workers…
s value>100

R S
CMU 15-445/645 (Fall 2019)
4

W H Y C A R E A B O U T PA R A L L E L E X E C U T I O N ?

Increased performance.
→ Throughput
→ Latency

Increased responsiveness and availability.

Potentially lower total cost of ownership (TCO).

CMU 15-445/645 (Fall 2019)

PA R A L L E L V S . D I S T R I B U T E D

Database is spread out across multiple resources

to improve different aspects of the DBMS.

Appears as a single database instance to the

application.
→ SQL query for a single-resource DBMS should generate
same result on a parallel or distributed DBMS.

CMU 15-445/645 (Fall 2019)

PA R A L L E L V S . D I S T R I B U T E D

Parallel DBMSs:
→ Resources are physically close to each other.
→ Resources communicate with high-speed interconnect.
→ Communication is assumed to cheap and reliable.

Distributed DBMSs:
→ Resources can be far from each other.
→ Resources communicate using slow(er) interconnect.
→ Communication cost and problems cannot be ignored.

CMU 15-445/645 (Fall 2019)

T O D AY ' S A G E N D A

Process Models
Execution Parallelism
I/O Parallelism

CMU 15-445/645 (Fall 2019)

PROCESS MODEL

A DBMS’s process model defines how the system

is architected to support concurrent requests from
a multi-user application.

A worker is the DBMS component that is

responsible for executing tasks on behalf of the
client and returning the results.

CMU 15-445/645 (Fall 2019)

PROCESS MODELS

Approach #1: Process per DBMS Worker

Approach #2: Process Pool

Approach #3: Thread per DBMS Worker

CMU 15-445/645 (Fall 2019)

PROCESS PER WORKER

Each worker is a separate OS process.

→ Relies on OS scheduler.
→ Use shared-memory for global data structures.
→ A process crash doesn’t take down entire system.
→ Examples: IBM DB2, Postgres, Oracle

Dispatcher Worker
CMU 15-445/645 (Fall 2019)
11

PROCESS POOL

A worker uses any process that is free in a pool

→ Still relies on OS scheduler and shared memory.
→ Bad for CPU cache locality.
→ Examples: IBM DB2, Postgres (2015)

Dispatcher Worker Pool

CMU 15-445/645 (Fall 2019)
12

THREAD PER WORKER

Single process with multiple worker threads.

→ DBMS manages its own scheduling.
→ May or may not use a dispatcher thread.
→ Thread crash (may) kill the entire system.
→ Examples: IBM DB2, MSSQL, MySQL, Oracle (2014)

Worker Threads
CMU 15-445/645 (Fall 2019)
13

PROCESS MODELS

Using a multi-threaded architecture has several

advantages:
→ Less overhead per context switch.
→ Do not have to manage shared memory.

The thread per worker model does not mean that

the DBMS supports intra-query parallelism.

Andy is not aware of any new DBMS from last 10

years that doesn’t use threads unless they are
Postgres forks.
CMU 15-445/645 (Fall 2019)
14

SCHEDULING

For each query plan, the DBMS decides where,

when, and how to execute it.
→ How many tasks should it use?
→ How many CPU cores should it use?
→ What CPU core should the tasks execute on?
→ Where should a task store its output?

The DBMS always knows more than the OS.

CMU 15-445/645 (Fall 2019)

I N T E R- V S . I N T R A - Q U E R Y PA R A L L E L I S M

Inter-Query: Different queries are executed

concurrently.
→ Increases throughput & reduces latency.

Intra-Query: Execute the operations of a single

query in parallel.
→ Decreases latency for long-running queries.

CMU 15-445/645 (Fall 2019)

I N T E R- Q U E R Y PA R A L L E L I S M

Improve overall performance by allowing multiple

queries to execute simultaneously.

If queries are read-only, then this requires little

coordination between queries.

If multiple queries are updating the database at the

same time, then this is hard to do correctly…

CMU 15-445/645 (Fall 2019)

I N T R A - Q U E R Y PA R A L L E L I S M

Improve the performance of a single query by

executing its operators in parallel.

Think of organization of operators in terms of a

producer/consumer paradigm.

There are parallel algorithms for every relational

operator.
→ Can either have multiple threads access centralized data
structures or use partitioning to divide work up.

CMU 15-445/645 (Fall 2019)

PA R A L L E L G R A C E H A S H J O I N

Use a separate worker to perform the join for each

level of buckets for R and S after partitioning.

R(id,name) HTR HTS

0 S(id,value,cdate)
1
h1 2 h1
⋮ ⋮
max

CMU 15-445/645 (Fall 2019)

PA R A L L E L G R A C E H A S H J O I N

Use a separate worker to perform the join for each

level of buckets for R and S after partitioning.

R(id,name) HTR HTS

1 0 S(id,value,cdate)
2 1
h1 3 2 h1
⋮ ⋮
n max

CMU 15-445/645 (Fall 2019)

I N T R A - Q U E R Y PA R A L L E L I S M