Module I
Introduction to Relational
Database
Introduction
Database is a collection of related data and data is a collection
of facts and figures that can be processed to produce
information.
Data represents recordable facts.
The processed data is called Information, which is based on
facts. For example, if we have data about marks obtained by all
students, we can then conclude about toppers and average
marks.
Data, Information and knowledge
Data is unprocessed facts and figures without any added interpretation or
analysis. Example : "The price of crude oil is $80 per barrel."
Information is data that has been interpreted so that it has meaning for the user.
"The price of crude oil has risen from $70 to $80 per barrel" gives meaning to
someone who tracks oil prices.
Knowledge is a combination of information, experience and insight that may
benefit the individual or the organization. "When crude oil prices go up by $10
per barrel, it's likely that petrol prices will rise by 2p per litre" is knowledge. So
the question is should we fill the tank today? (decision?)
Database Management System
A Database Management System (DBMS) stores data in
such a way that it becomes easier to retrieve, manipulate,
and produce information.
Example : MySQL, Oracle, Access, Microsoft SQL server,
MongoDB, NoSQL, PostgreSQL etc.,
Three types of Databases
• RDBMS (Relational Database Management System)
• NoSQL (Non Relational Database model)
• OLAP database
Who use database?
Analysts Data Science Specialist
Characteristics Database
Real-world entity − A DBMS should be more realistic and represent
real-world entities to design its architecture. For example, a school
database may use students as an entity and their age as an attribute.
Relation-based tables − DBMS allows entities and relations among them
to form tables. A user can understand the architecture of a database just by
looking at the table names.
Isolation of data and application − A database system is entirely different
than its data.
Database characteristics
Less redundancy − DBMS follows the rules of normalization, which
splits a relation when any of its attributes is having redundancy in values.
Normalization is a mathematically rich and scientific process that reduces
data redundancy.
Security − DBMS offers methods to impose constraints while
entering data into the database and retrieving the same at a later stage.
Architectural view- DBMS
Application (Middle)
User (Presentation)
Tier − For a user, this
Tier − At this layer,
application tier presents
multiple views of the
an abstracted view of the
database can be provided
database. End-users are
by the application.
unaware of any existence
of the database beyond
the application. At the
other end, the database
tier is not aware of any
other user beyond the
Database (Data) Tier − application tier.
The database resides along
with its query processing
languages.
Data models
Data models define how the logical structure of a database is
modeled.
Data models define how data is connected to each other and
how they are processed and stored inside the system.
Earlier data models were not so scientific, hence they were prone to
introduce lots of duplication and update anomalies.
What is SQL (pronounced ess-que-ell, or sequel).
What is SQL?
SQL stands for Structured Query Language
SQL lets you access and manipulate databases
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the
International Organization for Standardization (ISO) in 1987
What Can SQL do?
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert records in a database
SQL can update records in a database
SQL can delete records from a database
SQL can create new databases
SQL can create new tables in a database
SQL can create stored procedures in a database
SQL can create views in a database
SQL can set permissions on tables, procedures, and views
Database – View
Relation
The data in RDBMS is stored in database objects called tables. A table is a collection of
related data entries and it consists of columns and rows.
Example: Relation/Table
Example – Relation/Table
Source: Google images
Characteristics of a Relation
Data is stored in tables called relations.
Relations can be normalized.
In normalized relations, values saved are atomic values.
Each row in a relation contains a unique value.
Each column in a relation contains values from a same domain.
Information system components
Fig. Example Manufacturing information system view
Example: Relation Between tables
Source: Google images
Product – Sales relations
Employee-Department relations
Enterprise databases
Example : College Registration Database
DWH star schema – Relational model
The figure on the left side
shows an example Sales
Analysis Star Schema in
Data Warehouse
MySQL - RDBMS
MySQL is the most popular Open Source Relational SQL
Database Management System.
MySQL is one of the best RDBMS being used for developing
various web-based software applications and standalone applications.
MySQL is open source software provided under the GPL License.
Most of the SQL database programs also have their own proprietary
extensions in addition to the SQL standard!
E-R: Peterchen and Bachman
notations
Entity Relationship (ER) Diagram is diagrammatic
representation of data in databases.
Entity - student Notation
Student-course relationship
A relationship type represents the association between entity types. For
example,‘Enrolled in’ is a relationship type that exists between entity type Student and
Course. In ER diagram, relationship type is represented by a diamond and connecting the
entities with lines.
1…1 indicate cardinality.
Student database view
1..1 cardinality using set notation
* A set represents a set of entities/relations
Cardinalities
When entities in one entity set can take part only once in the relationship set
and entities in other entity set can take part more than once in the relationship
set, cardinality is many to one.
The above fig represents that a student can take only One Course but one course can
be taken by Many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one course.
Sample tables for E-R diagram
Write sample tables from the following:
Students: S01,S02,S03….
A student takes only one course.
Courses: C01,C02,C03…
A student takes more than one course
Student name : alphabets
Write the student no, name and course taken
Case: Internet Sales Model
Source: Google images
Case: Example
E-R diagrams captures business rules
Complete Participation
In the above diagram we have total participation at E1 end. Only the primary key of E1,
which is in total participation should be allowed as the primary key of the reduced table,
since if the primary key of E2 is used, it might have null values for many of its entries, since
its participation is only partial and may not have corresponding entries for all its values.
E-R diagram [order-products]
*Source: Google images E-R diagram
E-R diagram[Rental database]
*Source: Google images E-R diagram
Key terms
Database − A database is a collection of tables, with related data.
Table − A table is a matrix with data. A table in a database looks like a simple
spreadsheet.
Column − One column (data element) contains data of one and the same kind, for
example the column postcode.
Row − A row (= tuple, entry or record) is a group of related data, for example the data
of one subscription.
Redundancy − Storing data twice, redundantly to make the system faster.
Primary Key − A primary key is unique. A key value can not occur twice in one table.
With a key, you can only find one row.
Foreign Key − A foreign key is the linking pin between two tables.
Compound Key − A compound key (composite key) is a key that consists of multiple
columns, because one column is not sufficiently unique.
Index − An index in a database resembles an index at the back of a book.
Referential Integrity − Referential Integrity makes sure that a foreign key value
always points to an existing row.
References
Data warehousing, Fundamentals for IT Professions, Paulraj
Ponniah, Wiley publications. 2 nd edition, ISBN: 978-81-265-
3729-7(Print), 2015 (reprint).
Data Base –SQL, Complete Reference.
DWHDM
Module – II: Introduction to Data Warehouse
S.Srinivas
BUISNESS CHALLENGE ?
Converting Data into knowledge and making it
available throughout the organization.
2
INTELLIGENCE-FUNDAMENTAL PRINCIPLES
Data –Raw facts and numbers.
Information –
Assigning meaning to data
Simplified: data in context.
Knowledge
- is the subjective interpretation of Information and
approach to act upon in the mind of perceiver.
Intelligence
- Intelligence or wisdom embodies awareness, insight,
moral judgments, and principles to construct new
knowledge and improve upon existing ones.
3
WHAT BUSINESS DO?
Most companies collect a large amount of data from
their business operations. To keep track of that
information, a business need to use a wide range of
software programs , such as Excel, Access and
different database applications for various
departments throughout their organization. Using
multiple software programs makes it difficult to
retrieve information in a timely manner and to
perform analysis of the data.
4
WHAT BI DO?
Business Intelligence is a tool that places the right
information, into the right hands, at the right time.
5
WHAT BI DO….CONTINUE
BI is a technology-driven process for analyzing data and presenting
actionable information to help corporate executives, business
managers and other end users make more informed business
decisions.
BI encompasses a variety of tools, applications and methodologies
that enable organizations to collect data from internal systems and
external sources, prepare it for analysis, develop and run queries
against the data, and create reports, dashboards and data
visualizations to make the analytical results available to corporate
decision makers as well as operational workers.- TechTarget.com
6
BI ADVANTAGES
Improve Decision making
Optimizing internal business processes
Increase operational efficiency
Driving new revenues
Competitive advantages over business rivals.
BI systems can also help companies identify
market trends and spot business problems that
need to be addressed.
BI data can include historical information
Enable BI analysis to support both strategic and
tactical decision-making processes. 7
8
WHAT BI DO …
Traditional approach:
BI tools were primarily used by data analysts
and other IT professionals who ran analyses and
produce reports with query results for business
users.
o Current Trend:
Business executives and workers are using BI
software themselves. i.e. self-service BI
9
DWH IS ENVIRONMENT NOT
PRODUCT
The end result is the creation of a new computing
environment for the purpose of providing the strategic
information for every business needs
10
DATA WAREHOUSE DEFINITION
According to Bill Inmon, A data warehouse is a
subject oriented, integrated, nonvolatile, and time
variant collection of data in support of
management decisions.
According to Sean Kelly, Data warehouse is
separate, available, integrated, Time stamped,
Subject oriented, Non volatile and accessible.
11
SUBJECT ORIENTED
Data is not stored by operational applications, but
by business subjects.
Example operational applications: order processing,
customer billing, customer accounts, savings
accounts, consumer loans, Account receivables ,
claim processing etc.,
DWH Subjects: sales, products, customer, account,
policy etc.,
12
INTEGRATED
Savings
A/C
Subject=Account
Checking
A/C
Note: Integration require standardization:
codes, naming convention, attributes,
Measurements
Loan A/C 13
SUBJECT….
The subject oriented design enhances user ability
to quickly analyze large multidimensional data
sets.
14
TIME-VARIANT
Operational systems support day-to-day data.
DWH data is used analysis and decision making
Data is stored as snapshots over past and current
trends.
Data contains time element
Allows for analysis
Relates information to the present
Enable forecast
Example Query: what are the total sales of product A for the
15
last 3 years on New Year day across region “Y”
NONVOLATILE
Loads
OLTP DWH
Read, Add/Change, Delete Read
16
EXERCISES
1) For an airline company, how can strategic
information increases the number of frequent
flyers? Write specific details.
17
WHAT IS BUSINESS INTELLIGENCE?
Business Intelligence is a technology based on
customer and profit oriented models that reduces
operating costs and provide increased profitability by
improving productivity, sales, service and helps to
make decision making capabilities at no time.
18
BI DEFINITION
Converting data into knowledge and making it available
throughout the organization are the job of processes and
applications known as “Business Intelligence”.
BI is a term that encompasses a broad range of analytical
software and solutions for gathering, consolidating, analyzing
and providing access to information in a way that is supposed
to let an enterprise’s users make better business decisions.
19
BI SOFTWARES
The term BI includes software for ETL, data
warehousing, database query and reporting,
multidimensional/OLAP data analysis, data mining
and visualization.
20
BI-BUSINESS QUESTIONS
Business Intelligence: Finance
What is the net income, expenses, gross profit,
and net profit for this quarter, year?
Business Intelligence: Accounts
What is the sales amount this month and what is
the outstanding pending payment?
Business Intelligence: Purchase:
Who is the vendor to be contacted to purchase
products?
21
BI-BUSINESS QUESTIONS
Business Intelligence: Production
How many products are manufactured in each
production unit today, weekly, monthly?
Business Intelligence: Sales
How many products have been sold in each area today,
weekly, monthly?
Business Intelligence: Quality
How many products have been defective today, weekly,
monthly, quarterly, yearly?
Business Intelligence: Service
Are the customers satisfied with the quality?
22
EXAMPLE.
Data: The numbers 100 or 5.
Information: Principal amount : $100, Interest rate, 5%
Knowledge: At the end of Year I get $105 back
Intelligence: Concept of growth.
23
KPI’S
Business intelligence often uses KPI”s(key
Performance Indicators) to assess the present state of
business and to prescribe a course of action.
24
KPI’S
Key Performance Indicators (KPI) are financial and
non-financial metrics used to help an organization
define and measure progress toward organizational
goals.
Example Marketing KPIS’
Customer related numbers
Number of customers acquired
Customer attrition
Turnover
Outstanding balances held by customers etc.,
25
BUSINESS INTELLIGENCE -CORPORATE
Business Intelligence Tools help to gather, store, access and
analyze corporate data to aid in decision-making.
Example applications:
Customer profiling
Customer support
Market research
Market segmentation
Product profitability
Statistical analysis
Inventory and distribution analysis.
26
BUSINESS INTELLIGENCE -CORPORATE
Business Intelligence Tools help to gather, store, access and
analyze corporate data to aid in decision-making.
Example applications:
Customer profiling
Customer support
Market research
Market segmentation
Product profitability
Statistical analysis
Inventory and distribution analysis.
27
CORPORATE BI SYSTEM
28
Source: IBM Redbook, IBM Framework for e-business, SG-24-6248-00, Sept., 2001, ibm.com/redbooks
29
BI SOLUTIONS ARCHITECTURE
Relationship
Solutions Management
Billing/Payment
Enabling OLAP Systems
MOLAP Data Mining
Technologies
XML Advertising/
Data Warehousing Promotions
Core Messaging
Technologies Databases Data Marts
Virtual Data Views Personalization
Meta Data
Data Networks
Expert Systems Security Supply Chain
Management
CORBA/IIOP
COM/DCOM Knowledge
Business Management
Performance
Source: Kalakota, e-business 2.0 Measurement
Roadmap for Success 30
MAJOR BI INFORMATION COMPONENTS
OLTP – online transaction processing for
gathering operational data
Data Warehouse – operational data
collected & stored in a form for analyzing
OLAP – online analytical processing for
interactive insight and decision-making
Data Mining – Extract previously
unknown, comprehensible information for
decision support
Metadata – used to describe the kinds of
information stored 31
DATA WAREHOUSE
ARCHITECTURES
32
DWH ARCHITECTURE –1 FOR BI
EIS /DSS
Metadata
Select Query Tools
Extract
Transform Data
Integrate Warehouse OLAP/ROLAP
Maintain
Web Browsers
Operational
Systems/Data Middleware/
API Data Mining
Data
Preparation 33
WAREHOUSE ARCHITECTURE - 2
Metadata
EIS /DSS
Data Mart
Metadata
Select Query Tools
Extract
Transform Data Mart
Integrate OLAP/ROLAP
Maintain Metadata
Web Browsers
Operational Data Mart
Systems/Data Middleware/
Data API Data Mining
Preparation 34
REFERENCE BOOK
Data warehousing, Fundamentals for IT
Professions, Paulraj Punniah, Wiley publications.
2 nd edition, ISBN: 978-81-265-3729-7(Print),
2015 (reprint).
35
HOW IT DIFFERS FROM
OPERATIONAL DATABASES
Data Warehousing Operational data usually
Accessing data organized by application
Transforming data
Distributing data
requirements
Storing data Data needs to be organized
Automating processes
Managing metadata
differently for decision-making
Finding and understanding data purposes
Data Analysis BI requires consolidation of data
Getting answers
Creating OLAP applications
across multiple systems.
Analyzing the data Tools are needed to extract,
Other analysis tools
cleanse, and transform data
36
SQL Basics
Module - III
SQL Introduction
Standard language for querying and manipulating data
Structured Query Language
Many standards out there:
• ANSI SQL, SQL92 (a.k.a. SQL2), SQL99 (a.k.a. SQL3), ….
• Vendors support various subsets: watch for fun discussions in class !
SQL
• Data Definition Language (DDL)
– Create/alter/delete tables and their attributes
– Following lectures...
• Data Manipulation Language (DML)
– Query one or more tables
– Insert/delete/modify tuples in tables
Table name Attribute names
Tables in SQL
Product
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Tuples or rows
SQL CONSTRAINTS
Following are some of the most commonly used constraints available in
SQL.
•NOT NULL Constraint − Ensures that a column cannot have NULL value.
•DEFAULT Constraint − Provides a default value for a column when none is
specified.
•UNIQUE Constraint − Ensures that all values in a column are different.
•PRIMARY Key − Uniquely identifies each row/record in a database table.
•FOREIGN Key − Uniquely identifies a row/record in any of the given database
table.
•CHECK Constraint − The CHECK constraint ensures that all the values in a
column satisfies certain conditions.
•INDEX − Used to create and retrieve data from the database very quickly.
Why integrity constraints?
Integrity constraints are used to ensure accuracy and
consistency of the data in a relational database. Data integrity
is handled in a relational database through the concept of
Date in MySQL
• DATE - Stores a date value in the form YYYY-MM-DD.
For example 2008-10-23.
• CREATE TABLE `MySampleDB`.`orders` (`order_no` INT NOT NULL
AUTO_INCREMENT, `order_item` varchar(30) not null, `order_date`
DATE NOT NULL, `order_delivery` DATE NOT NULL, PRIMARY
KEY (`order_no`) ) ;
• SELECT MONTHNAME(NOW());
• SELECT NOW();
• Curdate( ) :Returns the current date
• Year () : Returns the year
• Week():Returns the week number
• DATETIME - Stores a date and time value of the form YYYY-MM-DD
HH:MM:SS. For example 2008-10-23 10:37:22.
• 2008-10-23 10:37:22 : store form
Statistics – Aggregations using MySQL
• |1,WildTech 250Gb 1700 | SATA Disk Drive | 120 |
• | 2 | Moto Razr | Mobile Phone | 200 |
• | 3 | Microsoft 10-20 Keyboard | Ergonmoc Keyboard | 49
• | 4 | EasyTech Mouse 7632 | Cordless Mouse | 49 |
• | 5 | Dell XPS 400 | Desktop PC | 999 |
• | 6 | Buffalo AirStation Turbo G | Wireless Ethernet Bridge | 60 |
• | 7 | Apple iPod Touch | Portable Music/Movie Player | 199 |
• | 8 | Apple iPhone 8Gb | Smart Phone | 39| 9 |
• SELECT AVG(prod_price) AS price_ag FROM products;
• SELECT AVG(prod_price) AS price_avg FROM products
WHERE prod_price BETWEEN 10 and 199;
• SELECT COUNT(*) FROM products;
• SELECT COUNT(prod_price) AS low_price_items FROM
products WHERE prod_price < 200;
• ELECT MAX(prod_price) AS max_price FROM products;
• SELECT MIN(prod_price) AS min_price FROM products;
Aggregate funcitons in MySQL
• SELECT MAX(prod_price) AS max_price,
MIN(prod_price) AS max_price FROM
product;
References
• Data warehousing, Fundamentals for IT
Professions, Paulraj Ponniah, Wiley
publications. 2 nd edition, ISBN: 978-81-
265-3729-7(Print), 2015 (reprint).
• Data Base –SQL, Complete Reference.
Tables Explained
• The schema of a table is the table name and its
attributes:
Product(PName, Price, Category, Manfacturer)
• A key is an attribute whose values are unique;
we underline a key
Product(PNumber, Price, Category, Manfacturer)
Data Types in SQL
• Atomic types:
– Characters: CHAR(20), VARCHAR(50)
– Numbers: INT, BIGINT, SMALLINT, FLOAT
– Others: MONEY, DATETIME, …
• Every attribute must have an atomic type
– Hence tables are flat
– Why ?
Tables Explained
• A tuple = a record
– Restriction: all attributes are of atomic type
• A table = a set of tuples
– Like a list…
– …but it is unorderd:
no first(), no next(), no last().
SQL Query
Basic form: (plus many many more bells and whistles)
SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
Simple SQL Query
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT *
FROM Product
WHERE category=‘Gadgets’
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
“selection” Powergizmo $29.99 Gadgets GizmoWorks
Simple SQL Query
Product PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
PName Price Manufacturer
“selection” and SingleTouch $149.99 Canon
“projection” MultiTouch $203.99 Hitachi
Notation
Input Schema
Product(PName, Price, Category, Manfacturer)
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
Answer(PName, Price, Manfacturer)
Output Schema
Details
• Case insensitive:
– Same: SELECT Select select
– Same: Product product
– Different: ‘Seattle’ ‘seattle’
• Constants:
– ‘abc’ - yes
– “abc” - no
The LIKE operator
SELECT *
FROM Products
WHERE PName LIKE ‘%gizmo%’
• s LIKE p: pattern matching on strings
• p may contain two special symbols:
– % = any sequence of characters
– _ = any single character
Eliminating Duplicates
Category
SELECT DISTINCT category Gadgets
FROM Product Photography
Household
Compare to:
Category
Gadgets
SELECT category Gadgets
FROM Product Photography
Household
Ordering the Results
SELECT pname, price, manufacturer
FROM Product
WHERE category=‘gizmo’ AND price > 50
ORDER BY price, pname
Ordering is broken by the second attribute on the ORDER BY
list, etc.
Ordering is ascending, unless you specify the DESC keyword.
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
SELECT DISTINCT category
FROM Product
ORDER BY category ?
SELECT Category
FROM Product
ORDER BY PName
?
SELECT DISTINCT category
FROM Product
ORDER BY PName
?
Keys and Foreign Keys
Company
CName StockPrice Country
GizmoWorks 25 USA
Key
Canon 65 Japan
Hitachi 15 Japan
Product
PName Price Category Manufacturer
Foreign
Gizmo $19.99 Gadgets GizmoWorks
key
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Aggregation
SELECT avg(price) SELECT count(*)
FROM Product FROM Product
WHERE maker=“Toyota” WHERE year > 1995
SQL supports several aggregation operations:
sum, count, min, max, avg
Except count, all aggregations apply to a single attribute
Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:
SELECT Count(category) same as Count(*)
FROM Product
WHERE year > 1995
We probably want:
SELECT Count(DISTINCT category)
FROM Product
WHERE year > 1995
More Examples
Purchase(product, date, price, quantity)
SELECT Sum(price * quantity)
FROM Purchase
What do
they mean ?
SELECT Sum(price * quantity)
FROM Purchase
WHERE product = ‘bagel’
Purchase
Simple Aggregations
Product Date Price Quantity
Bagel 10/21 1 20
Banana 10/3 0.5 10
Banana 10/10 1 10
Bagel 10/25 1.50 20
SELECT Sum(price * quantity)
FROM Purchase 50 (= 20+30)
WHERE product = ‘bagel’
Grouping and Aggregation
Purchase(product, date, price, quantity)
Query: Find total sales after 10/1/2005 per product.
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
Let’s see what this means…
Grouping and Aggregation
1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUPBY
3. Compute the SELECT clause: grouped attributes and aggregates.
1&2. FROM-WHERE-GROUPBY
Product Date Price Quantity
Bagel 10/21 1 20
Bagel 10/25 1.50 20
Banana 10/3 0.5 10
Banana 10/10 1 10
3. SELECT
Product Date Price Quantity Product TotalSales
Bagel 10/21 1 20
Bagel 10/25 1.50 20 Bagel 50
Banana 10/3 0.5 10
Banana 10/10 1 10
Banana 15
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
GROUP BY v.s. Nested Quereis
SELECT product, Sum(price*quantity) AS TotalSales
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
SELECT DISTINCT x.product, (SELECT Sum(y.price*y.quantity)
FROM Purchase y
WHERE x.product = y.product
AND y.date > ‘10/1/2005’)
AS TotalSales
FROM Purchase x
WHERE x.date > ‘10/1/2005’
Another Example
What does
it mean ?
SELECT product,
sum(price * quantity) AS SumSales
max(quantity) AS MaxQuantity
FROM Purchase
GROUP BY product
HAVING Clause
SELECT product, Sum(price * quantity)
FROM Purchase
WHERE date > ‘10/1/2005’
GROUP BY product
HAVING Sum(quantity) > 30
*HAVING clause contains conditions on aggregates.
General form of Grouping and Aggregation
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak Why ?
HAVING C2
S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER
ATTRIBUTES
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions
General form of Grouping and Aggregation
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2
Evaluation steps:
1. Evaluate FROM-WHERE, apply condition C1
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result
Insertions
General form:
INSERT INTO R(A1,…., An) VALUES (v1,…., vn)
Example: Insert a new purchase to the database:
INSERT INTO Purchase(buyer, seller, product, store)
VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’,
‘The Sharper Image’)
Missing attribute NULL.
May drop attribute names if give them in order.
Insertions
INSERT INTO PRODUCT(name)
SELECT DISTINCT Purchase.product
FROM Purchase
WHERE Purchase.date > “10/26/01”
The query replaces the VALUES keyword.
Here we insert many tuples into PRODUCT
Insertion: an Example
Product(name, listPrice, category)
Purchase(prodName, buyerName, price)
prodName is foreign key in Product.name
Suppose database got corrupted and we need to fix it:
Purchase
Product
prodName buyerName price
name listPrice category
camera John 200
gizmo 100 gadgets gizmo Smith 80
camera Smith 225
Task: insert in Product all prodNames from Purchase
Insertion: an Example
INSERT INTO Product(name)
SELECT DISTINCT prodName
FROM Purchase
WHERE prodName NOT IN (SELECT name FROM Product)
name listPrice category
gizmo 100 Gadgets
camera - -
Insertion: an Example
INSERT INTO Product(name, listPrice)
SELECT DISTINCT prodName, price
FROM Purchase
WHERE prodName NOT IN (SELECT name FROM Product)
name listPrice category
gizmo 100 Gadgets
camera 200 -
camera ?? 225 ?? - Depends on the implementation
Deletions
Example:
DELETE FROM PURCHASE
WHERE seller = ‘Joe’ AND
product = ‘Brooklyn Bridge’
Factoid about SQL: there is no way to delete only a single
occurrence of a tuple that appears twice
in a relation.
Updates
Example:
UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN
(SELECT product
FROM Purchase
WHERE Date =‘Oct, 25, 1999’);
NULL values
A NULL value is different from a zero value or a field that
contains spaces. A field with a NULL value is one that has been
left blank during record creation
COALESCE(cus
tomer_name,
'John Doe') .
SELECT
order_id,
product_name,
discount,
CASE
WHEN discount IS
NULL THEN 0
ELSE discount
END
FROM orders;