[go: up one dir, main page]

0% found this document useful (0 votes)
19 views7 pages

Notes On SQL For AI - ML

The document provides an overview of SQL fundamentals essential for AI/ML, covering data handling, retrieval, cleaning, transformation, and preparation for machine learning models. It discusses various SQL commands, data aggregation, joins, and advanced analytical queries, emphasizing the importance of clean data and feature engineering. Additionally, it highlights the integration of SQL with ML tools and real-world applications, such as customer churn analysis.

Uploaded by

bezatibe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views7 pages

Notes On SQL For AI - ML

The document provides an overview of SQL fundamentals essential for AI/ML, covering data handling, retrieval, cleaning, transformation, and preparation for machine learning models. It discusses various SQL commands, data aggregation, joins, and advanced analytical queries, emphasizing the importance of clean data and feature engineering. Additionally, it highlights the integration of SQL with ML tools and real-world applications, such as customer churn analysis.

Uploaded by

bezatibe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Notes on SQL for AI/ML

Part I – SQL Fundamentals for Data Handling


1. What is SQL?

• SQL (Structured Query Language) is used to store, retrieve, and manipulate data in
relational databases.
• Databases are essential in ML/AI for data storage, labeling, and preprocessing before
modeling.

Popular SQL engines: MySQL, PostgreSQL, SQLite, MSSQL, Google BigQuery.

2. Basic SQL Commands

Type Purpose Examples


Define or modify database
DDL (Data Definition Language) CREATE, ALTER, DROP
structure
DML (Data Manipulation INSERT, UPDATE,
Add or modify data
Language) DELETE
DQL (Query Language) Retrieve data SELECT
DCL (Control Language) Control permissions GRANT, REVOKE

3. Data Retrieval
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1 DESC
LIMIT 10;

Example:

SELECT age, salary


FROM employees
WHERE salary > 50000
ORDER BY age ASC;

4. Filtering and Conditions

Operator Meaning
= Equal
<> or != Not equal
Notes on SQL for AI/ML

Operator Meaning
> < >= <= Comparison
AND, OR, NOT Logical
BETWEEN, IN, LIKE Range & pattern filters

Example:

SELECT * FROM sales


WHERE region IN ('Asia', 'Europe') AND revenue BETWEEN 10000 AND 50000;

5. Aggregations

Useful for summarizing or grouping data before feeding into ML models.

SELECT department, AVG(salary), MAX(salary)


FROM employees
GROUP BY department
HAVING AVG(salary) > 40000;

Common Functions: COUNT(), SUM(), AVG(), MAX(), MIN()

6. Joins – Combining Data Sources

Data for ML often comes from multiple tables.

Join Type Description


INNER JOIN Matching rows in both tables
LEFT JOIN All rows from left + matching from right
RIGHT JOIN All from right + matching from left
FULL JOIN All records from both sides

Example:

SELECT e.name, d.department_name


FROM employees e
JOIN departments d ON e.dept_id = d.id;

Part II – SQL for Data Analysis & Feature Engineering


7. Data Cleaning with SQL
Notes on SQL for AI/ML

AI models need clean, consistent data.


Common cleaning operations:

• Handling missing data

SELECT * FROM data WHERE value IS NULL;


UPDATE data SET value = 0 WHERE value IS NULL;

• Removing duplicates

DELETE FROM table


WHERE id NOT IN (
SELECT MIN(id) FROM table GROUP BY column1, column2
);

• Fixing inconsistent text

UPDATE users
SET city = INITCAP(city); -- Standardize case

8. Data Transformation for Features

SQL can create derived variables (features) for ML models.

• Creating new columns

ALTER TABLE customers ADD COLUMN income_level TEXT;


UPDATE customers
SET income_level = CASE
WHEN income > 80000 THEN 'High'
WHEN income BETWEEN 40000 AND 80000 THEN 'Medium'
ELSE 'Low'
END;

• Scaling / normalization (conceptually)


You can compute min/max or z-score values:

SELECT (salary - AVG(salary)) / STDDEV(salary) AS z_score


FROM employees;

• Binning / Categorization

SELECT
CASE
WHEN age < 20 THEN 'Teen'
WHEN age BETWEEN 20 AND 35 THEN 'Adult'
ELSE 'Senior'
END AS age_group,
COUNT(*) AS count
FROM users
Notes on SQL for AI/ML

GROUP BY age_group;

9. Statistical Analysis in SQL

SQL supports basic descriptive statistics directly — useful before exporting to Python/R.

SELECT
AVG(age) AS mean_age,
STDDEV(age) AS std_age,
MIN(age),
MAX(age)
FROM users;

Correlations (approximation):
While not native in SQL, some engines (e.g., PostgreSQL) allow:

SELECT CORR(income, spending_score) FROM customers;

10. Time-Series and Sequential Data

Used in AI for forecasting, behavior tracking, or IoT.

SELECT
DATE(timestamp) AS day,
AVG(sensor_value) AS avg_value
FROM sensors
GROUP BY day
ORDER BY day;

You can also calculate moving averages:

SELECT
timestamp,
AVG(sensor_value)
OVER (ORDER BY timestamp ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS
moving_avg
FROM sensors;

11. CTEs (Common Table Expressions)

They simplify complex analysis.

WITH avg_sales AS (
SELECT product_id, AVG(revenue) AS avg_rev
FROM sales
GROUP BY product_id
)
SELECT p.name, a.avg_rev
Notes on SQL for AI/ML

FROM products p
JOIN avg_sales a ON p.id = a.product_id;

This is handy when building multi-step data pipelines for ML.

Part III – SQL in ML Pipelines & Advanced Analytics


12. Preparing Data for ML

SQL can output data directly to CSV or ML tools.

Example pipeline:

1. Collect: from raw tables (sales, users, transactions)


2. Clean: remove duplicates/nulls
3. Engineer: new features (avg_purchase, days_active)
4. Export: to ML tool

SELECT
user_id,
COUNT(order_id) AS total_orders,
AVG(amount) AS avg_order_value,
MAX(order_date) AS last_order_date,
EXTRACT(DAY FROM CURRENT_DATE - MAX(order_date)) AS days_since_last_order
FROM orders
GROUP BY user_id;

13. Integrating SQL with ML Tools

SQL often acts as the data source for:

• Python (pandas, scikit-learn) → pd.read_sql()


• R → dbGetQuery()
• Power BI / Tableau / BigQuery ML

BigQuery ML example:

CREATE MODEL my_model


OPTIONS(model_type='linear_reg') AS
SELECT
age, income, spending_score
FROM customers;

14. Advanced Analytical Queries


Notes on SQL for AI/ML

• Ranking and Window Functions

SELECT
user_id,
SUM(amount) AS total_spent,
RANK() OVER (ORDER BY SUM(amount) DESC) AS rank
FROM orders
GROUP BY user_id;

• Anomaly Detection Prep

SELECT *,
CASE
WHEN amount > (AVG(amount) + 3 * STDDEV(amount)) THEN 'Outlier'
ELSE 'Normal'
END AS anomaly_flag
FROM transactions;

15. SQL for AI Use Cases

Use Case SQL Role


Data preprocessing Clean, join, filter, and aggregate raw data
Feature engineering Create categorical/continuous features
Model training Export structured data to ML framework
Monitoring AI models Track predictions and metrics in SQL tables
Explainability Compare predicted vs actual outcomes

16. Real-World Example: Customer Churn Dataset


SELECT
customer_id,
AVG(call_duration) AS avg_call_time,
COUNT(complaints) AS complaint_count,
CASE WHEN SUM(payment_late) > 0 THEN 1 ELSE 0 END AS payment_flag,
CASE WHEN churned = 'Yes' THEN 1 ELSE 0 END AS churn_label
FROM telecom_data
GROUP BY customer_id;

Then export this dataset to Python for model training (classification task).

17. Key Takeaways

• SQL is the foundation for all ML/AI data preparation.


• Use SQL for:
Notes on SQL for AI/ML

o Cleaning, joining, and aggregating data


o Feature creation and basic statistics
o Time-series and anomaly tracking
• Combine with Python/R for actual model training

You might also like