[go: up one dir, main page]

0% found this document useful (0 votes)
24 views17 pages

PostgreSQL Masterclass Tutor Guide

The PostgreSQL Masterclass Tutor Guide provides a structured 28-day plan for learning database management, data analysis, and data science using PostgreSQL and pgAdmin. Each day includes key concepts, worked examples, and exercises to reinforce learning, covering topics from basic SQL queries to advanced features like indexing, transactions, and ETL processes. Prerequisites include PostgreSQL 13+ and pgAdmin, with a focus on practical application through hands-on exercises and real-world datasets.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views17 pages

PostgreSQL Masterclass Tutor Guide

The PostgreSQL Masterclass Tutor Guide provides a structured 28-day plan for learning database management, data analysis, and data science using PostgreSQL and pgAdmin. Each day includes key concepts, worked examples, and exercises to reinforce learning, covering topics from basic SQL queries to advanced features like indexing, transactions, and ETL processes. Prerequisites include PostgreSQL 13+ and pgAdmin, with a focus on practical application through hands-on exercises and real-world datasets.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 17

PostgreSQL Masterclass — Tutor Guide &

Daily Practice Plan


Instructor: Your AI SQL Tutor

Target: Database Management • Data Analysis • Data Science (PostgreSQL + pgAdmin)

How to Use This Guide


 Run the provided SQL setup script first: sql_practice_setup.sql (creates schema and
synthetic data).
 Use pgAdmin’s Query Tool or psql to execute the queries.
 Follow the 28-day plan (approx. 60–90 minutes per day).
 Each day includes concepts, worked examples, and exercises.
 An exercises scaffold is provided in daily_exercises.sql; solutions appear later in this guide.

Prerequisites & Environment


 Installed: PostgreSQL 13+ and pgAdmin.
 Optional: psql CLI for faster iteration and EXPLAIN ANALYZE.
 Set search path: SET search_path TO academy;

SET search_path TO academy;

Dataset Overview (created by sql_practice_setup.sql)


 customers • products • orders • order_items • payments (commerce domain)
 clinics • patients • visits (healthcare domain)
 rides • events (mobility & behavioral analytics)
 Indexes on common query paths (date, foreign keys, category).

28-Day Master Plan

Day 1 — SELECT, WHERE, ORDER BY, LIMIT


Key Concepts:

• Concepts: result sets, projection, filtering, sorting, pagination.

• Tip: Always start with SELECT ... FROM ...; expand iteratively.

Worked Examples:
-- Newest 10 customers
SELECT customer_id, full_name, city, signup_date
FROM customers
ORDER BY signup_date DESC
LIMIT 10;

-- Filter & sort


SELECT product_id, name, category, price
FROM products
WHERE category = 'Electronics' AND price < 100
ORDER BY price ASC;

Exercises:

□ List newest 15 customers from Lagos or Oyo by signup_date.

□ Find the 20 cheapest products in 'Health' or 'Grocery'.

Day 2 — Predicates & Pattern Matching


Key Concepts:

• Concepts: AND/OR/NOT, IN, BETWEEN, LIKE/ILIKE, NULL handling (IS NULL).

• Tip: Prefer ILIKE for case-insensitive search.

Worked Examples:

-- Case-insensitive search
SELECT product_id, name
FROM products
WHERE name ILIKE '%product 1%';

-- Range filter
SELECT *
FROM orders
WHERE order_date::date BETWEEN CURRENT_DATE - INTERVAL '30 days' AND
CURRENT_DATE;

Exercises:

□ Customers with missing email (NULL) — show id + name.

□ Orders in last 7 days with status NOT IN ('cancelled','refunded').

Day 3 — Aggregations & GROUP BY


Key Concepts:

• Concepts: COUNT/SUM/AVG/MIN/MAX, GROUP BY, HAVING.


• Tip: HAVING filters groups after aggregation.

Worked Examples:

-- Orders per status


SELECT status, COUNT(*) AS cnt
FROM orders
GROUP BY status
ORDER BY cnt DESC;

-- Average product price by category (only active products)


SELECT category, AVG(price)::numeric(10,2) AS avg_price
FROM products
WHERE active = TRUE
GROUP BY category
ORDER BY 1;

Exercises:

□ Total revenue (sum of total_amount) per state (from customers->orders).

□ Top 5 categories by total quantity sold (order_items).

Day 4 — Joins (INNER/LEFT/RIGHT/FULL)


Key Concepts:

• Concepts: keys, FK relationships, join types, join selectivity.

• Tip: Start with INNER JOIN; use LEFT JOIN to include non-matching rows.

Worked Examples:

-- Order lines with customer & product


SELECT o.order_id, o.order_date, c.full_name, p.name AS product,
oi.quantity, (oi.quantity*oi.unit_price) AS line_total
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
ORDER BY o.order_date DESC
LIMIT 20;

-- Orders without payments (LEFT JOIN + NULL filter)


SELECT o.order_id, o.status, o.total_amount
FROM orders o
LEFT JOIN payments p ON p.order_id = o.order_id
WHERE p.order_id IS NULL;

Exercises:
□ For each state, total orders and total paid orders.

□ Patients with no clinic visits in the last 90 days.

Day 5 — Subqueries & CTEs


Key Concepts:

• Concepts: scalar subquery, EXISTS/IN, common table expressions (WITH).

• Tip: CTEs improve readability; Postgres may inline for perf.

Worked Examples:

-- Customers above 100k total spend


WITH spend AS (
SELECT o.customer_id, SUM(o.total_amount) AS total_spend
FROM orders o
GROUP BY o.customer_id
)
SELECT c.customer_id, c.full_name, s.total_spend
FROM spend s JOIN customers c ON c.customer_id = s.customer_id
WHERE s.total_spend > 100000
ORDER BY s.total_spend DESC;

-- Latest order per customer


WITH ranked AS (
SELECT o.*, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY
order_date DESC) AS rn
FROM orders o
)
SELECT customer_id, order_id, order_date
FROM ranked
WHERE rn = 1;

Exercises:

□ Find products never ordered.

□ Find top 3 customers by total rides distance.

Day 6 — Window Functions I


Key Concepts:

• Concepts: OVER(), PARTITION BY, ORDER BY; row_number, rank, dense_rank.

• Tip: Use window functions for analytics without collapsing rows.

Worked Examples:
-- Rank customers by spend within each state
WITH spend AS (
SELECT c.state, o.customer_id, SUM(o.total_amount) AS total_spend
FROM orders o JOIN customers c ON c.customer_id = o.customer_id
GROUP BY c.state, o.customer_id
)
SELECT state, customer_id, total_spend,
RANK() OVER (PARTITION BY state ORDER BY total_spend DESC) AS
rnk
FROM spend
ORDER BY state, rnk;

-- 7-day moving sum of orders (by order date::date)


WITH daily AS (
SELECT order_date::date AS d, SUM(total_amount) AS revenue
FROM orders
GROUP BY order_date::date
)
SELECT d, revenue,
SUM(revenue) OVER (ORDER BY d ROWS BETWEEN 6 PRECEDING AND
CURRENT ROW) AS rev_7d
FROM daily
ORDER BY d;

Exercises:

□ Compute rolling 30-day rides revenue per city.

□ Compute cumulative customer count over time (by signup_date).

Day 7 — Date/Time Toolbox


Key Concepts:

• Concepts: date_trunc, intervals, age, extract, time zones.

• Tip: Store timestamps in UTC; display in local time when needed.

Worked Examples:

-- Monthly orders & revenue


SELECT date_trunc('month', order_date) AS month,
COUNT(*) AS orders, SUM(total_amount) AS revenue
FROM orders
GROUP BY 1
ORDER BY 1;

-- Clinic billing by quarter


SELECT date_trunc('quarter', visit_date) AS qtr,
clinic_id, SUM(amount_billed) AS billed
FROM visits
GROUP BY 1,2
ORDER BY 1,2;

Exercises:

□ Average order value by day-of-week.

□ Visits trend (last 6 months) per clinic.

Day 8 — Set Ops (UNION/INTERSECT/EXCEPT)


Key Concepts:

• Concepts: dedup with UNION, overlap via INTERSECT, anti-join via EXCEPT.

Worked Examples:

-- Customers who placed orders and raised support tickets


WITH orderers AS (SELECT DISTINCT customer_id FROM orders),
tickets AS (
SELECT DISTINCT customer_id FROM events WHERE
event_type='support_ticket'
)
SELECT * FROM orderers INTERSECT SELECT * FROM tickets;

-- Customers who took rides but never ordered


WITH riders AS (SELECT DISTINCT customer_id FROM rides),
orderers AS (SELECT DISTINCT customer_id FROM orders)
SELECT * FROM riders EXCEPT SELECT * FROM orderers;

Exercises:

□ Customers who purchased and later refunded (ids).

□ Product categories that were ordered vs never ordered.

Day 9 — Nulls, Coalesce, Conditional Logic


Key Concepts:

• Concepts: NULL behavior, COALESCE, CASE WHEN ... THEN.

Worked Examples:

-- Default to 0 when missing


SELECT p.product_id, p.name,
COALESCE(SUM(oi.quantity),0) AS qty_sold
FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.product_id
GROUP BY p.product_id, p.name
ORDER BY qty_sold DESC;

-- Categorize orders by value


SELECT order_id, total_amount,
CASE
WHEN total_amount >= 50000 THEN 'high'
WHEN total_amount >= 20000 THEN 'medium'
ELSE 'low'
END AS value_band
FROM orders;

Exercises:

□ Compute NPS-like bands for product price (low/med/high).

□ Show city with 0 rides as 0, not NULL, in daily series (hint: generate_series).

Day 10 — Advanced Aggregations


Key Concepts:

• Concepts: DISTINCT aggregates, FILTER clause, GROUPING SETS/ROLLUP/CUBE.

• Tip: FILTER lets you aggregate subsets in one pass.

Worked Examples:

-- Multi-metric in one query


SELECT
COUNT(*) AS orders,
SUM(total_amount) AS revenue,
SUM(total_amount) FILTER (WHERE status='refunded') AS refunded
FROM orders;

-- Rollup by state then city


SELECT c.state, c.city, COUNT(*) AS customers
FROM customers c
GROUP BY ROLLUP (c.state, c.city)
ORDER BY c.state NULLS LAST, c.city NULLS LAST;

Exercises:

□ Compute category revenue plus revenue for ALL categories using ROLLUP.

□ Use CUBE on state x status for order counts.

Day 11 — Window Functions II (frames, percentiles)


Key Concepts:
• Concepts: frame clauses, lag/lead, percentiles, ntile.

Worked Examples:

-- Median order total per state


SELECT state,
percentile_cont(0.5) WITHIN GROUP (ORDER BY total_amount) AS
median_order
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY state
ORDER BY state;

-- Month-over-month change
WITH m AS (
SELECT date_trunc('month', order_date) AS m, SUM(total_amount) AS
revenue
FROM orders GROUP BY 1
)
SELECT m,
revenue,
LAG(revenue) OVER (ORDER BY m) AS prev_rev,
(revenue - LAG(revenue) OVER (ORDER BY m)) AS delta
FROM m;

Exercises:

□ Top 3 products by monthly revenue (use dense_rank partitioned by month).

□ Compute 90th percentile ride fare by city.

Day 12 — Text Search & Trigrams (optional)


Key Concepts:

• Concepts: LIKE vs trigram (pg_trgm), basic full-text search approach.

Worked Examples:

-- Trigram similarity (requires pg_trgm)


SELECT name, similarity(name, 'produckt 12') AS sim
FROM products
WHERE name % 'produckt 12'
ORDER BY sim DESC
LIMIT 10;

Exercises:

□ Find near-duplicates among product names using % operator.


Day 13 — JSONB (semi-structured data)
Key Concepts:

• Concepts: jsonb structure, ->, ->>, ?, @>, indexing with GIN.

Worked Examples:

-- Example: temporary jsonb data


WITH tmp AS (
SELECT jsonb_build_object(
'customer_id', 1,
'prefs', jsonb_build_object('theme','dark','alerts',true)
) AS doc
)
SELECT doc->'prefs'->>'theme' AS theme FROM tmp;

Exercises:

□ Add a jsonb column (e.g., products.meta JSONB) and store extra attributes; query keys and
values.

Day 14 — Data Modeling & Normalization


Key Concepts:

• Concepts: 1NF/2NF/3NF/BCNF, surrogate vs natural keys, denormalization trade-offs.

Worked Examples:

-- Add NOT NULL/UNIQUE constraints


ALTER TABLE customers
ALTER COLUMN full_name SET NOT NULL,
ADD CONSTRAINT unique_email UNIQUE(email);

Exercises:

□ Sketch a normalized schema for a hospital pharmacy (entities: drugs, prescriptions,


prescribers, patients).

Day 15 — DDL Deep Dive & Constraints


Key Concepts:

• Concepts: CHECK, DEFAULT, FK options, ON DELETE/UPDATE, generated columns.

Worked Examples:

-- Add a CHECK constraint for positive quantities


ALTER TABLE order_items
ADD CONSTRAINT chk_quantity_positive CHECK (quantity > 0);
Exercises:

□ Add FK to ensure payments.order_id references orders, delete cascading behavior (already


applied).

□ Add a generated column (e.g., line_total) to order_items (Postgres 12+ use expression or
view).

Day 16 — Transactions & Isolation


Key Concepts:

• Concepts: ACID, BEGIN/COMMIT/ROLLBACK, isolation levels, deadlocks.

Worked Examples:

BEGIN;
UPDATE products SET price = price * 0.9 WHERE category='Health';
-- Check results then COMMIT or ROLLBACK
COMMIT;

-- Read committed vs repeatable read (open two sessions and observe).

Exercises:

□ Simulate a transfer mistake and use ROLLBACK to restore state.

Day 17 — Indexing Strategies


Key Concepts:

• Concepts: B-tree, hash, GIN/GiST; covering indexes; multi-column order.

• Tip: Index predicates you filter/join on frequently; beware over-indexing.

Worked Examples:

-- Create index for common filter


CREATE INDEX idx_orders_status_date ON orders(status, order_date);

-- Inspect plan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status='paid' AND order_date > NOW() -
INTERVAL '30 days';

Exercises:

□ Create an index that speeds up: top products by category in last 90 days.

□ Verify with EXPLAIN ANALYZE before/after.


Day 18 — Query Tuning
Key Concepts:

• Concepts: EXPLAIN (ANALYZE, BUFFERS), work_mem, join order, CTE inlining.

Worked Examples:

EXPLAIN (ANALYZE, BUFFERS)


SELECT c.state, SUM(o.total_amount)
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date > NOW() - INTERVAL '180 days'
GROUP BY c.state;

Exercises:

□ Compare performance of correlated subquery vs JOIN for: total spend per customer.

Day 19 — Views & Materialized Views


Key Concepts:

• Concepts: logical layering, refresh cadence, dependency management.

Worked Examples:

CREATE MATERIALIZED VIEW mv_customer_spend AS


SELECT o.customer_id, SUM(o.total_amount) AS total_spend
FROM orders o
GROUP BY o.customer_id;

-- Refresh when needed


REFRESH MATERIALIZED VIEW mv_customer_spend;

Exercises:

□ Create a view for monthly category revenue; query top categories.

Day 20 — Partitioning (Time/Range)


Key Concepts:

• Concepts: declarative partitioning, pruning, local indexes.

Worked Examples:

-- Example skeleton (events by month)


-- CREATE TABLE events_p (... ) PARTITION BY RANGE (event_time);
-- CREATE TABLE events_2025_01 PARTITION OF events_p FOR VALUES FROM
('2025-01-01') TO ('2025-02-01');
Exercises:

□ Design partitions for the events table by month for the last 6 months.

Day 21 — Security & Roles


Key Concepts:

• Concepts: roles, grants, row-level security (RLS).

Worked Examples:

-- Read-only role for analysts


CREATE ROLE analyst LOGIN PASSWORD 'changeme';
GRANT USAGE ON SCHEMA academy TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA academy TO analyst;
ALTER DEFAULT PRIVILEGES IN SCHEMA academy GRANT SELECT ON TABLES TO
analyst;

Exercises:

□ Create a limited role that can only SELECT from views, not base tables.

Day 22 — Backup & Restore


Key Concepts:

• Concepts: pg_dump, pg_restore, plain vs custom format, point-in-time basics.

Worked Examples:

-- Backup (shell)
-- pg_dump -U postgres -d yourdb -Fc -f backup.dump
-- Restore
-- pg_restore -U postgres -d yourdb backup.dump

Exercises:

□ Practice exporting only the academy schema and restoring to a new database.

Day 23 — ETL with SQL


Key Concepts:

• Concepts: staging tables, upserts, deduping, slowly changing dimensions (SCD).

Worked Examples:

-- Upsert (on conflict)


INSERT INTO products(name, category, price, active)
VALUES ('Product 5','Electronics', 99.99, TRUE)
ON CONFLICT (name) DO UPDATE SET price = EXCLUDED.price;
Exercises:

□ Load a CSV into a staging table (COPY ... FROM) and merge into products.

Day 24 — Reporting Patterns


Key Concepts:

• Concepts: cohort retention, funnels, ABC segmentation, pivoting (crosstab).

Worked Examples:

-- crosstab example (requires tablefunc)


-- SELECT * FROM crosstab(
-- $$ SELECT date_trunc('month', order_date), category,
SUM(total_amount)
-- FROM orders o JOIN order_items oi USING(order_id) JOIN products
p USING(product_id)
-- GROUP BY 1,2 ORDER BY 1,2 $$
-- ) AS ct(month timestamp, electronics numeric, fashion numeric,
grocery numeric);

Exercises:

□ Build a cohort table of signup month vs returning buyers at month+1, +2, +3.

Day 25 — Time-Series Extras


Key Concepts:

• Concepts: generate_series for calendar tables, gaps & islands, seasonality.

Worked Examples:

-- Calendar spine join


WITH cal AS (
SELECT d::date AS d
FROM generate_series(CURRENT_DATE - INTERVAL '90 days', CURRENT_DATE,
INTERVAL '1 day') AS g(d)
)
SELECT cal.d, COALESCE(SUM(total_amount),0) AS revenue
FROM cal
LEFT JOIN orders o ON o.order_date::date = cal.d
GROUP BY cal.d
ORDER BY cal.d;

Exercises:

□ Detect gaps in daily orders; fill with zeros for a chart-ready series.
Day 26 — Data Science Bridge
Key Concepts:

• Concepts: creating feature tables with SQL; percentiles, bucketing, joins.

Worked Examples:

-- Customer features (example)


WITH spend AS (
SELECT customer_id, SUM(total_amount) AS total_spend, COUNT(*) AS
orders_count,
MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id
)
SELECT c.customer_id, c.state, c.signup_date, s.total_spend,
s.orders_count,
EXTRACT(DAY FROM (NOW() - s.last_order)) AS recency_days
FROM customers c LEFT JOIN spend s USING(customer_id);

Exercises:

□ Create a product features table: price band, total sold, last sold date, refund rate.

Day 27 — Capstone Project (Pick One)


Key Concepts:

• Project A (Commerce): Build a full revenue dashboard: daily revenue, AOV, top categories,
repeat rate.

• Project B (Healthcare): Analyze clinic performance: visits, billing, common diagnoses, patient
retention.

• Project C (Mobility): City ride KPIs: rides/day, avg fare, surge periods, heavy user segments.

Worked Examples:

Deliverables:
1) A documented .sql file with all queries and views.
2) A short write-up (1–2 pages) explaining your metrics and insights.
3) (Optional) A materialized view layer to speed up dashboards.

Exercises:

□ Tip: Use views/materialized views for clean layers. Consider indexes on filter/join keys.

Day 28 — Review & Next Steps


Key Concepts:
• Revisit weak areas; practice with EXPLAIN plans and window functions.

• Next: PostGIS for geospatial, TimescaleDB for time-series, dbt for analytics engineering.

Worked Examples:

Checklist:
- Comfortable with joins, GROUP BY, window functions
- Can design schemas with constraints
- Can index and tune common queries
- Can build views/materialized views for analytics

Exercises:

□ Optional: Try migrating your capstone to a different schema optimized for BI.

Selected Exercise Solutions

Day 1 — Newest Lagos/Oyo customers


SELECT customer_id, full_name, state, city, signup_date
FROM customers
WHERE state IN ('Lagos','Oyo')
ORDER BY signup_date DESC
LIMIT 15;

Day 3 — Revenue per state


SELECT c.state, SUM(o.total_amount) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.state
ORDER BY revenue DESC;

Day 4 — Paid vs all orders per state


WITH stats AS (
SELECT c.state,
COUNT(*) AS orders,
SUM((o.status IN ('paid','shipped','refunded'))::int) AS
paidish
FROM customers c JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.state
)
SELECT state, orders, paidish, (paidish::numeric/orders)::numeric(5,2)
AS pay_rate
FROM stats
ORDER BY pay_rate DESC;
Day 6 — Rolling 30-day rides revenue per city
WITH d AS (
SELECT ride_date::date AS d, city, SUM(fare) AS revenue
FROM rides GROUP BY 1,2
)
SELECT d, city, revenue,
SUM(revenue) OVER (PARTITION BY city ORDER BY d
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS
rev_30d
FROM d
ORDER BY city, d;

Day 8 — Purchased and later refunded


WITH purchased AS (
SELECT DISTINCT customer_id FROM events WHERE event_type='purchase'
),
refunded AS (
SELECT DISTINCT customer_id FROM events WHERE event_type='refund'
)
SELECT * FROM purchased INTERSECT SELECT * FROM refunded;

Day 11 — 90th percentile ride fare by city


SELECT city,
percentile_cont(0.9) WITHIN GROUP (ORDER BY fare) AS p90_fare
FROM rides
GROUP BY city
ORDER BY city;

Day 18 — Correlated vs Join (join version)


SELECT o.customer_id, SUM(o.total_amount) AS total_spend
FROM orders o
GROUP BY o.customer_id; -- Compare with correlated subquery via EXPLAIN

Admin & Ops Cheat Sheet


 List active connections: SELECT * FROM pg_stat_activity;
 Cancel a backend PID: SELECT pg_cancel_backend(pid);
 Table sizes: SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM
pg_catalog.pg_statio_user_tables ORDER BY 2 DESC;
 Index usage: SELECT relname, idx_scan FROM pg_stat_user_indexes ORDER BY idx_scan;
 Vacuum/Analyze: VACUUM (ANALYZE, VERBOSE) table_name;

Next Steps & Learning Path


 Build a real mini-warehouse: staging → core models → mart views.
 Learn dbt for analytics engineering workflows.
 Explore PostGIS for mapping/geo analytics (Lagos traffic, clinic coverage).
 Practice with EXPLAIN ANALYZE weekly; maintain a personal query diary.

You might also like