[go: up one dir, main page]

0% found this document useful (0 votes)
10 views164 pages

Essential Postgres_ Database Development Using PostgreSQL

Essential Postgres is a comprehensive guide for learning PostgreSQL, a popular open-source relational database management system. The book covers key topics such as selecting and manipulating data, creating and altering tables, and utilizing built-in functions, aimed at beginners and intermediate developers. It emphasizes practical examples over theoretical concepts, making it suitable for anyone interested in Postgres development.

Uploaded by

the_omen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views164 pages

Essential Postgres_ Database Development Using PostgreSQL

Essential Postgres is a comprehensive guide for learning PostgreSQL, a popular open-source relational database management system. The book covers key topics such as selecting and manipulating data, creating and altering tables, and utilizing built-in functions, aimed at beginners and intermediate developers. It emphasizes practical examples over theoretical concepts, making it suitable for anyone interested in Postgres development.

Uploaded by

the_omen
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 164

Essential Postgres: Database Development using PostgreSQL

Copyright 2020 © Rick Silva

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted,
in any form without prior written permission of the author.

Every effort has been made to ensure the accuracy of the information presented. However, the
information contained in the book is sold without warranty, either express or implied. The author will
not be held liable for any damages caused directly or indirectly by this book.

The image of the elephant on the front cover by Ksyu Deniska/Shutterstock.com

ISBN: 9798687854184 (paperback)


To Patti, Jacki, and Rj - the best family in the history of families.
TABLE OF CONTENTS

Introduction

Audience

Conventions
Ellipses Used in Examples
Parens

1 Selecting Data from a Postgres Database


Introducing Database Tables
Introducing SQL
Ordering Rows
Null Values
Formatting SQL
Adding Comments to SQL
Calling Postgres Functions
The upper( ) Function
The coalesce( ) Function
Calling Aggregate Functions
Using “Group By” with Aggregate Functions
A Rule of Thumb for Using “Group By”
Using “Having” in SQL
Don’t Repeat Yourself
Joining Database Tables
Table Aliasing
Types of Joins
Inner Joins
Outer Joins
Cross Joins
Self Joins
Column Aliasing
Alternate Join Syntax
Using Parens
Joining with “Using” Instead of “On”
Old-School SQL
Complex Joins with Multiple Tables.
Complex SQL Statements with Inner and Outer Joins
String Concatenation
Comparing Values in SQL
Greater Than
Greater Than or Equal to
Less Than
Less Than or Equal to
Not Equal
In
Between
Like
Is Null
Not
Subqueries
Correlated Subqueries
Subqueries that Return One Row
Distinct
Limit

2 Creating and Changing Database Tables


Data Types
Commonly-Used Data Types
Text
Integer
Numeric
Date
Time Without Time Zone
Time With Time Zone
Timestamp Without Time Zone
Timestamp With Time Zone
Boolean
Character
Less Commonly-Used Data Types
Numbers
Formatted Data
Serial
Bits and Bytes
Geometry-Related
Specialty Addresses
Miscellaneous
Constraints
Primary Keys
Single Column Primary Keys
Multi-Column Primary Keys
Foreign Keys
Not Null Columns
Unique
Check
Indexes
Dropping Tables
Altering Tables
Adding and Modifying Data
Inserting Data
The Number of Columns Must Match the Number of Values
Always List your Columns
Inserting Data from Another Table
Updating Data
Deleting Data
The Delete Statement
The Truncate Statement

3 More Database Objects


Views
Using Views to Hide Sensitive Data
Materialized Views
Sequences
Creating Functions
Parameters and Arguments
SQL vs. PL/pgSQL
Writing Functions with SQL
Writing Functions with PL/PgSQL
Return Types
Dollar Quoting with Tags
Parameters by Position vs Name
Parameter Modes: In, Out, Inout, and Variadic
Procedures
Triggers
Audit Triggers
Creating the Audit Table
Creating the Trigger Function
Creating the Trigger
Watching it Work
Auditing Updates and Deletes
Special Variables for Triggers
Altering Triggers
Dropping Triggers
Disabling Triggers
Enabling Triggers

4 Built-In Functions
Aggregate Functions
abs( )
upper( )
lower( )
initcap( )
round( )
trunc( )
ceil( )
floor( )
length( )
substr( )
trim( )
ltrim( )
rtrim( )
left( )
right( )
lpad( )
replace( )
format( )
extract( )
exp( )
pi( )
power( )
random( )
position( )
version( )

5 Tips and Tricks


PgAdmin
Connecting PgAdmin to a Postgres Server
Production Servers and Development Servers
Creating the Server Connections
The Tree Control Pane
The Query Tool
Changing a Table’s Column Order
Debugging with Raise
All Balls
Casting Data Types
Functions for Casting
to_date( )
to_timestamp( )
to_char( )
to_number( )
Using Parens for Math
Transferring Data To and From Files
INTRODUCTION

PostgreSQL is a popular free and open source relational database


management system. It is also widely known as “Postgres”. I will refer to it
by its more informal alias, “Postgres”, in this book rather than as
“PostgreSQL” because that’s the way I hear it said most often in the real
world.

A relational database management system (RDMS) or “relational database”,


at its heart, is a system to store and retrieve data from a database. The term
“relation” describes a structure that stores data in a grid of rows and columns.
A relation is essentially a database table, so a relational database is a database
system that stores data in tables that have rows and columns.

Structured Query Language (SQL) is the language we use to interact with


relational databases, including Postgres. SQL allows us to store, modify, and
delete data from our database. It also allows us to create and remove tables,
query our data, and much more.

Postgres is a great technology to learn. It is a powerful, secure, and fast


relational database system that is growing in popularity. It is a mature, stable
database that has been around for over 30 years. Postgres runs on a wide
variety of operating systems, including Linux, Windows, macOS, and it even
runs in the cloud. Postgres helps you to maintain the integrity of your data. It
has a robust set of features, and is extensible if you ever need to do
something unconventional that it doesn’t already do.

This book, Essential Postgres, will give you exposure to the most used – and
useful – parts of Postgres development, as well as tips and insights that I have
picked up over my years of working with Postgres. Writing SQL statements,
creating tables, functions, triggers, views, and data integrity are explained.
AUDIENCE

This book is suitable for anybody who wants to learn to use Postgres. That
includes folks who are new to Postgres and databases, intermediate-level
developers who would like a refresher, and even seasoned software
developers who are transitioning from another database to Postgres. This
book is for anybody who is interested in learning about the essentials of using
the Postgres database.

The aim of this book is to show you how to do things with the Postgres
database. The book is short on theory and long on examples. If you are the
type of person who likes that sort of thing, read on.

Since this book focuses on Postgres development rather than administration,


if you are a Postgres Database Administrator (DBA) you may want to choose
another book. This book does not discuss backup, recovery, security or other
DBA-related issues.
CONVENTIONS

In this book, I have made some stylistic choices that you might want to be
aware of.

ELLIPSES USED IN EXAMPLES


There are several examples in this book that show data being returned by the
Postgres database. In cases where many rows are returned, I have shown a
few of the rows and added an ellipsis (“…”) rather than showing all the
results and making the book 900 pages long. When you see “…” be aware
that I am not showing all of the results in order to save space. For example:

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
… … …
43 George W. Bush Republican
44 Barack Obama Democratic
45 Donald Trump Republican

In this case, a full set of all 45 presidents was returned from the database, but
I omitted presidents 4 – 42 for the sake of brevity.

PARENS
I often refer to parentheses as “parens” because that’s the way I most often
hear them referred to among developers.

Also, I add parens to the end of function names. For example, Postgres has a
function called “upper” that I write as “upper( )”. You would say this aloud
as “the upper function”, but in this book I add the “parens” to the end of
function name to make it clear that “upper( )” is a function. Functions are
explained later in this book, and we’ll see what the “parens” are for then.
1
SELECTING DATA FROM A
POSTGRES DATABASE

INTRODUCING DATABASE TABLES


In Postgres, data is stored in database tables. Think of a table as a grid of
rows and columns where data can be saved. You can create your own tables
and choose the names of the columns that you want, and specify what type of
data they should contain. Each record is saved as a row in the table.

For example, here is a table called us_president that has been defined with
three columns: president_id, president_name, and president_party.

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
7 Andrew Jackson Democratic
8 Martin Van Buren Democratic
9 William Harrison Whig
10 John Tyler Whig
11 James Polk Democratic
12 Zachary Taylor Whig
13 Millard Fillmore Whig
14 Franklin Pierce Democratic
15 James Buchanan Democratic
16 Abraham Lincoln Republican
17 Andrew Johnson Democratic
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Abram Garfield Republican
21 Chester Arthur Republican
22 Grover Cleveland Democratic
23 Benjamin Harrison Republican
24 Grover Cleveland Democratic
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
28 Woodrow Wilson Democratic
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
32 Franklin Roosevelt Democratic
33 Harry Truman Democratic
34 Dwight Eisenhower Republican
35 John Kennedy Democratic
36 Lyndon Johnson Democratic
37 Richard Nixon Republican
38 Gerald Ford Republican
39 Jimmy Carter Democratic
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
42 Bill Clinton Democratic
43 George W. Bush Republican
44 Barack Obama Democratic
45 Donald Trump Republican

This table contains 45 rows of data, one row for each president.

When naming database tables, it’s a good idea to stick with singular names
like us_president rather than plural names like us_presidents. Although
Postgres will allow you to name tables in the plural form, table names should
be singular by convention.

INTRODUCING SQL
In order to interact with Postgres, you will use Structured Query Language
(SQL). You will use SQL often for interacting with Postgres. Using SQL,
you can create tables, show the contents of tables, create new rows, update
values, and much more.

For example, you could see the data in our us_president table by using this
SQL select statement.

selectpresident_id,
president_name,
president_party
from us_president;

This select statement returns data from the us_president database table and
displays it for you. Here we chose to select the president_id, president_name,
and president_party columns. Note that you should type a semi-colon at the
end of your SQL statements. The results are:

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
… … …
43 George W. Bush Republican
44 Barack Obama Democratic
45 Donald Trump Republican

The query returns all 45 US presidents (although I have not shown presidents
4 – 42 for brevity).

Now let’s say you want to show only the rows for Republicans in the table.
You could make a change to the SQL and add a “where” clause.

selectpresident_id,
president_name,
president_party
from us_president
where president_party = 'Republican';

president_id president_name president_party


16 Abraham Lincoln Republican
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Garfield Republican
21 Chester Arthur Republican
23 Benjamin Harrison Republican
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
34 Dwight Eisenhower Republican
37 Richard Nixon Republican
38 Gerald Ford Republican
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
43 George W. Bush Republican
45 Donald Trump Republican

The SQL query now returns 19 rows because there are 19 Republicans in the
table.

Since we are now retrieving a list of Republicans only, let’s change the SQL
to not select the president_party column. Let’s select just the president_id and
president_name columns.

selectpresident_id,
president_name
from us_president
where president_party = 'Republican';

president_id president_name
16 Abraham Lincoln
18 Ulysses Grant
19 Rutherford Hayes
20 James Garfield
21 Chester Arthur
23 Benjamin Harrison
25 William McKinley
26 Theodore Roosevelt
27 William Taft
29 Warren Harding
30 Calvin Coolidge
31 Herbert Hoover
34 Dwight Eisenhower
37 Richard Nixon
38 Gerald Ford
40 Ronald Reagan
41 George Herbert Walker Bush
43 George W. Bush
45 Donald Trump
The SQL query didn’t show the party_name column because we didn’t select
it, and again, the query returned 19 rows.

ORDERING ROWS
Now let’s display all the rows and columns in the table ordered by political
party and then by the president’s ID. To do that, we can add an “order by”
clause to our SQL statement.

select president_id,
president_name,
president_party
from us_president
order by president_party,
president_id;

president_id president_name president_party


7 Andrew Jackson Democratic
8 Martin Van Buren Democratic
11 James Polk Democratic
14 Franklin Pierce Democratic
15 James Buchanan Democratic
17 Andrew Johnson Democratic
22 Grover Cleveland Democratic
24 Grover Cleveland Democratic
28 Woodrow Wilson Democratic
32 Franklin Roosevelt Democratic
33 Harry Truman Democratic
35 John Kennedy Democratic
36 Lyndon Johnson Democratic
39 Jimmy Carter Democratic
42 Bill Clinton Democratic
44 Barack Obama Democratic
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
2 John Adams Federalist
16 Abraham Lincoln Republican
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Garfield Republican
21 Chester Arthur Republican
23 Benjamin Harrison Republican
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
34 Dwight Eisenhower Republican
37 Richard Nixon Republican
38 Gerald Ford Republican
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
43 George W. Bush Republican
45 Donald Trump Republican
9 William Harrison Whig
10 John Tyler Whig
12 Zachary Taylor Whig
13 Millard Fillmore Whig
1 George Washington null

Adding the “order by” clause had the effect of showing us the rows ordered
alphabetically by the president_party column and then ordered by the
president_id column in lowest to highest order numerically. Democrats were
shown first because the string “Democrat” comes alphabetically before the
other political parties (“Democratic-Republican”, “Federalist”, “Republican”,
and “Whig”). Of the Democratic presidents, Andrew Jackson appeared before
Martin Van Buren because Jackson’s president_id value was lower.

You can also specify whether you want to order in ascending or descending
order. That is, in low-to-high order or in high-to-low order.

select president_id,
president_name,
president_party
from us_president
order by president_party desc,
president_id asc;

president_id president_name president_party


1 George Washington null
9 William Harrison Whig
10 John Tyler Whig
12 Zachary Taylor Whig
13 Millard Fillmore Whig
16 Abraham Lincoln Republican
18 Ulysses Grant Republican
… … …
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
7 Andrew Jackson Democratic
8 Martin Van Buren Democratic
11 James Polk Democratic
14 Franklin Pierce Democratic
15 James Buchanan Democratic
17 Andrew Johnson Democratic
… … …

The query returned all 45 rows (Again, I haven’t shown all the rows for
brevity). The data is displayed in descending (reverse) order for the
president_party column and then displayed in ascending (low-to-high) order
for the president_id column. If you don’t specify descending or ascending
(which can be abbreviated “desc” or “asc”), Postgres defaults to ascending
order and you will see your results in low-to-high order.

NULL VALUES
What’s going on with George Washington? Why is his political party listed
as null? Null is a special value in Postgres that represents an empty value.
George Washington was never a member of a political party so his
political_party value is set to null in our database table.

To show only the rows for presidents that have a null political_party, you
could run this query:

selectpresident_id,
president_name,
president_party
from us_president
where president_party is null;

That SQL query would return results for only George.

president_id president_name president_party


1 George Washington null
On the other hand, if you wanted to exclude rows that have a null
president_party, you would run this query:

selectpresident_id,
president_name,
president_party
from us_president
where president_party is not null;

All presidents except George Washington would be returned by this query


because we specified that we want to see rows where the president_party
column is not null.

Null is a special value in Postgres. Null is not the same as zero, and it is not
the same as an empty string or a space character. It is its own value, and
Postgres has special syntax to help you deal with it.

The word “syntax”, in software development speak, means the words and
symbols you can use that are part of the language. Some of the special syntax
Postgres has to handle null values includes “is null”, “is not null”, and
“coalesce”. (We will see “coalesce” in a moment).

FORMATTING SQL
So far, the SQL we have used has been in a nice readable format.

1selectpresident_id,
2 president_name,
3 president_party
4from us_president;

The column names and the table name all line up together vertically. There is
only one column name on each of lines 1 - 3. It is a good idea to write your
SQL statements in a neat, maintainable format like this, but SQL will also
allow you to write SQL statements that look like this:
select president_id, president_name, president_party from
us_president;

or like this:
select president_id, president_name, president_party
from us_president;

As long as SQL can figure out what you intended, it will run your statement.
You can add extra spaces between commas if you want. You can write the
entire statement on one line or you can use many lines.

You can also use the asterisk wildcard character as a shortcut to tell SQL to
retrieve all columns.
select * from us_president;

would return the same results as:


select president_id, president_name, president_party from
us_president;

“select * from” means to select all columns from the table. Since
president_id, president_name, and president_party are all of the columns we
have in the us_president table, “select *” will produce the same results as
listing the column names individually. People will often say “select * from”
aloud as “select star from”.

ADDING COMMENTS TO SQL


Sometimes you will want to add comments to your SQL code to add
explanations that will help developers maintain the code in the future. That’s
a good idea, especially because the developer who has to maintain the code in
the future may be you.

To add single-line comments, use “--” (dash dash).


-- This SQL selects all US Presidents from the database
select * from us_president;

You can also use “--” to add a comment at the end of a line of SQL.

selectpresident_id,
president_name,
president_party -- Political party, not birthday
from us_president;

To add comments that will be spread over more than one line, you can also
use “/*” at the beginning and “*/” at the end of comment.
/*
This query retrieves all the US Presidents.
There sure are a lot of them!
*/
select * from us_president;

A well-placed comment that explains a complex SQL statement will save


time for anybody who has to maintain your code in the future. But avoid
adding comments that are obvious or can be easily understood by simply
reading the code. In the example above, for instance, it isn’t worth the time to
comment that “This SQL selects all US Presidents from the database” when
that is obvious from the SQL on the next line.

CALLING POSTGRES FUNCTIONS


Postgres comes with several pre-written functions that you can call. We’ll
learn more about these functions in the “Built-In Functions” section a little
later, but let’s start by taking a look at the upper( ) function and the coalesce(
) function.

The upper( ) Function


One function Postgres comes with is the upper( ) function. The upper( )
function converts a string to all uppercase letters. You could call it from a
SQL statement like this:
select upper('Say it loud');

The string “Say it loud” is an argument to the function. In other words, the
string “Say it loud” is sent to the upper( ) function, and the upper( ) function
will use that value to do something. In this case, it returns the uppercase
version of it, which is “SAY IT LOUD”.

Note that Postgres allows you to use a “select” statement without the “from”
clause. Normally, you would use a select statement to select data from a
database table, but in this case, since we are passing a hardcoded string (“Say
it loud”) into the function, we do not need to specify which database table to
get the data from. We don’t need to say “from table_name” because there is
no table involved. We can just “select upper(‘Say it loud’)” without the word
“from”.

To call the upper( ) function and pass a value to it from the database, we can
use a SQL statement like this:

selectupper(president_name)
from us_president
where president_id = 10;

The result is:

upper
JOHN TYLER

In our us_president table, president 10 is “John Tyler”. We got “John Tyler”


from the president_name column of the us_president table and then used it as
an argument to the upper( ) function, which used it to return the text “JOHN
TYLER”.

The coalesce( ) Function


Another example of a Postgres function is the coalesce( ) function. coalesce(
) helps us to deal with null values in our data. Sometimes when we select a
null value from a table, we want to display something other than null.

For example, when we selected from the us_president table, we saw that
George Washington’s political party was set to null.

selectpresident_name,
president_party
from us_president;

president_name president_party
George Washington null
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …

If we want to display “No Political Party” instead of null, we could use the
coalesce( ) function.

selectpresident_name,
coalesce(president_party, 'No Political Party')
from us_president;

president_name president_party
George Washington No Political Party
John Adams Federalist
Thomas Jefferson Democratic-Republican
… …

The coalesce( ) function takes a list of values and returns the first value that
isn’t null. In this case, we sent the coalesce( ) function two values: the
president_party column from the us_president table, and the text “No
Political Party”. If the president_party column in the us_president table is not
null, it will be returned. But if the president_party column in the us_president
table is null, the text “No Political Party” will be returned. The effect is that
each president’s political party is shown, and for George Washington, null is
replaced by the text “No Political Party”.

Calling Aggregate Functions


An aggregate function is a function that can take multiple values in the
database and return a single value. Some examples of aggregate functions
are:

count()Returns a count of rows


max() Returns the maximum, or largest, of a set
min() Returns the minimum, or smallest, of a set
avg() Returns the average of a set of numbers
sum() Returns the sum of a set of numbers

Let’s take a look at how we might call the count( ) function.

selectcount(*)
from us_president;
This count( ) function will return the number of rows in the us_president
table, which is 45. Note that with the count( ) function you use an asterisk
between the parentheses, like this: count(*).

Now let’s look at calling the max( ) function.

selectmax(president_id)
from us_president;

This max( ) function returns the maximum president_id value in the table.
The row in this table with the highest president_id has a president_id of 45,
so 45 is returned.

Here is how we can call the min( ) function.

selectmin(president_id)
from us_president;

This min( ) function returns the minimum president_id value in the table. The
lowest president_id in the table is 1, so 1 is returned.

Using “Group By” with Aggregate Functions


Let’s say we wanted to get a count of how many presidents there are of each
political party. We could use this SQL query.

select president_party,
count(*)
from us_president
group by president_party;

president_party count
null 1
Republican 19
Democratic-Republican 4
Federalist 1
Democratic 16
Whig 4

The query’s results show us that there was one president without a political
party, 19 Republicans, 4 Democratic-Republicans, 1 Federalist, 16
Democrats, and 4 Whigs.

Let’s look at that last line of SQL that says “group by president_party”. We
said that an aggregate function returns a single value based on multiple
values in the database. For example, there are 19 rows in the table with
Republican presidents and the count( ) function returns just one row for them
that shows the total of 19. The “group by” in the SQL statement tells Postgres
to group the results by the president_party column.

A common mistake is to use an aggregate function and forget to use a “group


by” clause. For example, if you were to run this SQL statement without the
words “group by”:

selectpresident_party,
count(*)
from us_president;

Postgres would return this error:


ERROR: column "us_president.president_party" must appear in the GROUP
BY clause or be used in an aggregate function
LINE 1: select president_party, count(*) from us_president;
^
SQL state: 42803
Character: 8

To correct the problem, add the “group by” clause and tell SQL which
column to group by.

A Rule of Thumb for Using “Group By”


Here is a rule of thumb for figuring out which column or columns to group
by: The part of the query between the word “select” and the word “from” is
called the “select list”. It lists the things that we want to select from the
database. In this case there are two things: president_name and count(*).
Determine which ones are calls to aggregate functions and which are not, and
group by the ones that are not. In other words, we want to group by the non-
aggregates.

In this example, count(*) is a call to an aggregate function and


president_name is not, so we want to group by the president_name column.

Here is another example. Let’s say we want to list the highest president_id
value for each political party. We want to do something like this:

selectpresident_party,
max(president_id)
from us_president;

But the query fails because we forgot the “group by” statement.
ERROR: column "us_president.president_party" must appear in the GROUP
BY clause or be used in an aggregate function
LINE 1: select president_party, max(president_id)
^
SQL state: 42803
Character: 8

To figure out what things should go in the “group by statement”, we look at


the select list. We are selecting two things: the president_party column and
max(president_id). Remember, we almost always want to group by all of the
non-aggregates. The max( ) function is an aggregate function so the only non-
aggregate being selected is the president_party column. If we group by the
president_party column our query works nicely.

select president_party,
max(president_id)
from us_president
group by president_party;

president_party max
null 1
Republican 45
Democratic-Republican 6
Federalist 2
Democratic 44
Whig 13

Here is a more complex example using a table containing world population


data:
selectcontinent,
country,
sum(population),
sum(area)
from world_population;

This query will not run because we forgot the “group by” clause. But what
should be in the “group by” clause? Everything in the select clause that isn’t
an aggregate function.

selectcontinent,
country,
sum(population),
sum(area)
from world_population
group by continent,
country;

continent country sum sum


Africa Algeria 35978000 2381741
Africa Angola 18992708 1246700
South America Uruguay 3356584 176215
South America Venezuela 28833845 716445

USING “HAVING” IN SQL


There is an interesting aspect of our president data that is exposed by this
query:

selectpresident_name,
count(*)
from us_president

group by president_name;

I would expect this query to return a list of all the presidents with a count
of 1. After all, it’s not like there were two presidents named George
Washington. But when we run the query, we see one president comes back
with a count of 2.

president_name count
… …
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …

What is going on here? If we run a query to look at the data in the table, we
can see an interesting wrinkle in the data.
select * from us_president;

president_id president_name president_party


… … …
20 James Garfield Republican
21 Chester Arthur Republican
22 Grover Cleveland Democratic
23 Benjamin Harrison Republican
24 Grover Cleveland Democratic
25 William McKinley Republican
26 Theodore Roosevelt Republican
… … …

It turns out that Grover Cleveland served two non-consecutive terms as


president. That makes him both the 22nd and 24th president. Our query that
selected a count of 2 actually worked correctly because Grover Cleveland
was a glutton for punishment!

When we ran this query, it returned 44 rows:

select president_name,
count(*)
from us_president
group by president_name;

president_name count
… …
Franklin Pierce 1
Ronald Reagan 1
John Tyler 1
Grover Cleveland 2
Chester Arthur 1
George Washington 1
… …
We had to look through all of those 44 resulting rows to see if there were any
counts that were more than 1. But there is an easier way. We can display only
counts that are higher than 1 by using the “having” statement.

select president_name,
count(*)
from us_president
group by president_name
having count(*) > 1;

Now only Grover Cleveland is shown in the results.

president_name count
Grover Cleveland 2

The “having” clause is a bit like the “where” clause, but it can be used after a
“group by” clause to operate on grouped rows.

DON’T REPEAT YOURSELF


“Don’t Repeat Yourself”, or DRY, is a principle of software development
that we should adhere to. One thing that isn’t ideal about the way the
us_president table was designed is that there is a lot of repeated data in the
table. For example, there are 19 Republicans in the table, and each one of
them has the text “Republican” repeated in the president_party column.

selectpresident_id,
president_name,
president_party
from us_president
where president_party = 'Republican';

president_id president_name president_party


16 Abraham Lincoln Republican
18 Ulysses Grant Republican
19 Rutherford Hayes Republican
20 James Garfield Republican
21 Chester Arthur Republican
23 Benjamin Harrison Republican
25 William McKinley Republican
26 Theodore Roosevelt Republican
27 William Taft Republican
29 Warren Harding Republican
30 Calvin Coolidge Republican
31 Herbert Hoover Republican
34 Dwight Eisenhower Republican
37 Richard Nixon Republican
38 Gerald Ford Republican
40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
43 George W. Bush Republican
45 Donald Trump Republican

A better approach is to create a second table that contains just a list of


political parties and an identifier to represent each one.
select * from political_party;

party_id party_name
1 Republican
2 Democratic-Republican
3 Federalist
4 Democrat
5 Whig

We should also change the us_president table and replace the president_party
column with a party_id column that matches the party_id from the
political_party table.
select * from us_president;

president_id president_name party_id


1 George Washington null
2 John Adams 3
3 Thomas Jefferson 2
… … …
43 George W. Bush 1
44 Barack Obama 4
45 Donald Trump 1

Now the text “Republican”, “Democrat” and “Whig” isn’t repeated over and
over in the table, which is an improvement.

JOINING DATABASE TABLES


Because we added the political_party table, our president data isn’t all in one
table anymore. It is spread across two tables. So now, in order to get a list of
presidents and their political parties, we would need to join those two tables.
That is, we need to link the two tables on the party_id column of each table
and then select data from columns in both the tables.

1selectus_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4from us_president
5inner join political_party
6on us_president.party_id = political_party.party_id;

The results of the query are:

president_id president_name party_name


2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
6 John Quincy Adams Democratic-Republican
7 Andrew Jackson Democratic
8 Martin Van Buren Democratic
9 William Harrison Whig
… … …

We are selecting from two database tables: the us_president table in line 4 of
our SQL and the political_party table in line 5. Both tables have a party_id
column so we join the tables based on the party_id column of both tables in
line 6.

Postgres allows you to do different types of joins, like inner joins, outer joins,
and cross joins. These will be explained momentarily, but you can see in line
5 that this query does an inner join.
Since we are now selecting from two tables in our query, every time we
reference a column name, we should specify which table the column comes
from. We can do that by using the table name followed by a period followed
by the column name. For example us_president.president_id refers to the
president_id column in the us_president table.

1selectus_president.president_id,
2 us_president.president_name,
3 political_party.party_name
4from us_president
5inner join political_party
6on us_president.party_id = political_party.party_id;

In this query, the president_id and president_name columns come from the
us_president table, and the party_name column comes from the
political_party column.

TABLE ALIASING
One time-saving shortcut we can use when writing SQL is to make a short
alias for our table names. A table alias is a short, temporary, substitute name
for the table. We can save keystrokes by using an alias of “a” for the
us_president table, and “b” for political_party table.

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5inner join political_party b
6on a.party_id = b.party_id;

This query returns the same information as the previous query, but it requires
us to do a lot less typing to create the query.

On line 4 we declare that “a” will be the alias for the us_president table. On
line 5 we declare that “b” will be the alias for the political_party table. We
can choose any letter or letters we want, but this example uses “a” and “b”.
Now we can type just “a” or “b” instead of the table names in lines 1, 2, 3,
and 6. The benefit of table aliasing, as you can see, is that it saves us quite a
bit of typing.
You may see queries that use a slight variation of this syntax, using the word
“as” to define table aliases.

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president as a
5inner join political_party as b
6on a.party_id = b.party_id;

You can see on lines 4 and 5 the word “as” was used between the table names
and the alias names. It’s fine to use the “as” syntax, but since the objective of
using table aliases is to cut down on typing, personally I prefer not to use the
word “as”.

TYPES OF JOINS

Inner Joins
Inner joins are the most common type of join. In an inner join, there must be
a match in both tables in order for data to be retrieved from the database.
Let’s take another look at the inner join from the example above.

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5inner join political_party b
6on a.party_id = b.party_id;

The query joins the us_president and political_party tables based on the
party_id columns of each of the tables.

We know there are 45 presidents in the us_president table, but strangely,


when we run this query only 44 rows are returned.

Why?

The reason is that the political_party table has one row for each of these
parties: Democrat, Democratic-Republican, Federalist, Republican, and
Whig. But there is one president in the us_president table that isn’t from any
of those parties. George Washington’s party in the us_president table is null.
So, when we joined the two tables, there was no match for George
Washington’s party_id and he did not appear in the results of the query.

That is the nature of an inner join. In an inner join, there must be a match in
both tables in order for the data to be retrieved.

Also, where the “inner join” is specified on line 5, the word “inner” is
optional. The default join type is inner. So this query:

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5inner join political_party b
6on a.party_id = b.party_id;

Will produce the same results as this query that does not use the word “inner”
on line 5:

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a
5join political_party b
6on a.party_id = b.party_id;

Let’s look at another example of an inner join using different tables. Let’s say
we have a customer table and an item_ordered table. The customer table and
the item_ordered table both have a customer_id column so we can join the
tables based on customer_id.

Here are the contents of the customer table:

customer_id customer_name
1 Penny Sato
2 Dixie Gooseman
3 Molly Terry
Here is what the item_ordered table looks like:

order_id customer_id item_ordered


90 1 Hat
91 2 Gloves

We can use this inner join to see which customer ordered which item.

selecta.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;

Remember, the word “inner” is optional. This query does an inner join even
though the syntax says “join” instead of “inner join”. Inner is the default type
of join.

In this query we saved typing by aliasing the “customer” table as “a”, and the
“item_ordered” table as “b”.

Here are the results:

customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves

Since this was an inner join, results appeared only when there was a match in
both tables. For this query, that means only customers who ordered an item
will appear in the results. “Molly Terry” is in the customer table but she
doesn’t have an order in the item_ordered table so she doesn’t appear in the
results at all.

Outer Joins
Outer joins work a little differently. An outer join will display all rows from
one table, whether or not there is a matching row in the other table. If there
are matching rows in the other table, it will show the columns from that table
too.
1selecta.customer_name,
2 b.item_ordered
3from customer a left outer join item_ordered b
4on a.customer_id = b.customer_id;

customer_name item_ordered
Penny Sato Hat
Dixie Gooseman Gloves
Molly Terry null

Using an outer join, Molly Terry appears in the results of the query, but the
item_ordered column appears as null because there was no matching row in
the item_ordered table. In other words, SQL can’t show us which item she
ordered because she didn’t order an item, so it displays null.

The customer table is the left table because it is to the left of the “left outer
join” syntax on line 3. The item_ordered table is the right table because it is
to the right of the “left outer join” syntax.

Here is another outer join example using our US Presidents data.

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a left outer join political_party b
5on a.party_id = b.party_id;

president_id president_name party_name


… … …
9 William Harrison Whig
10 John Tyler Whig
1 George Washington null

“left outer join” returns all the rows from the left table (us_president) even if
there is no match in the right table (the political_party table).

Using this outer join, the query returns all 45 presidents. George Washington
appears in the results even though he has no matching political party in the
political_party table. This is different than the inner join we did on these
tables that returned only 44 rows.
The outer join worked as advertised. It showed all of the rows in the
us_president table and showed matching party_name columns from the
political_party table where there were some.

I like to think of the tables that are joined in an outer join as a mandatory
table and an optional table. In order for the president data to appear in the
results, it is mandatory that rows exist in the us_president table. But having
matching rows in the political_party table is optional. Rows will be returned
in our result set even if there isn’t a matching row in the political_party table.

To my way of thinking, us_president is the mandatory table and


political_party is the optional table. The “left outer join” syntax tells Postgres
to do an outer join and treat the table on the left as the mandatory table.

The first table mentioned on line 4 is the us_president table. It is the left table
because it is to left of the “left outer join” syntax. The political_party table is
the right table because it is to the right of the “left outer join” syntax. Or
more simply stated, the us_president table is on the left in line 4 and the
political_party table is on the right.

This is how Postgres knows to select all rows from the us_president table and
only matching rows from the political_party table. “left outer join” returns all
the rows from the left table (us_president) even if there are no matches in the
right table (political_party).

Also note that the word “outer” is optional. I recommend that you use it for
clarity, but you will undoubtedly come across outer joins that use the syntax
“left join” instead of “left outer join”. The same results will be returned.

You can also do an outer join with the “right outer join” syntax rather than
“left outer join”. The “right outer join” syntax tells Postgres to do an outer
join and treat the table on the right side as the mandatory table.

These two outer joins will return the same results:

Left outer join:


1selecta.president_id,
2 a.president_name,
3 b.party_name
4from us_president a left outer join political_party b
5on a.party_id = b.party_id;

Right outer join:

1selecta.president_id,
2 a.president_name,
3 b.party_name
4from political_party b right outer join us_president a
5on a.party_id = b.party_id;

Notice that in the SQL statement with the right outer join, the order of the
tables has been switched. The political_party table is now the left table and
the us_president table is now on the right side.

Cross Joins
Cross joins are used less frequently than inner and outer joins, but there are
occasions when they come in handy.

Let’s say a chain of bookstores wants to send one copy of each new book to
each of their stores. They have a table called bookstore that looks like this:

store_id store_name
1 Mall Store
2 College Campus Bookstore
3 Suburban Store
4 Cellar Store

And they have a table called new_book that looks like this:

book_id book_name
100 Essential Postgres
200 How to Knit
300 Poetry for Dogs

To get a list of which books should go to which stores, we could do a query


that does a cross join between the two tables.
select b.store_name,
n.book_name
from bookstore b
cross join new_book n;

store_name book_name
Mall Store Essential Postgres
Mall Store How to Knit
Mall Store Poetry for Dogs
College Campus Bookstore Essential Postgres
College Campus Bookstore How to Knit
College Campus Bookstore Poetry for Dogs
Suburban Store Essential Postgres
Suburban Store How to Knit
Suburban Store Poetry for Dogs
Cellar Store Essential Postgres
Cellar Store How to Knit
Cellar Store Poetry for Dogs

A cross join joins every row of one table to every row of another table. There
are four rows in our bookstore table and three rows in our new_book table.
So it makes sense that 12 rows are returned by the cross join because 4 x
3 = 12.

Self Joins
A self join is a join where the same table is joined to itself.

Let’s take a look at the church_hierarchy table.

position_id position_desc supervisor_id


1 God null
2 Pope 1
3 Cardinal 2
4 Bishop 3
5 Priest 4

Each row represents one position in the church. The supervisor_id column
represents the position_id of the supervisor. For example, position_id 5 –
“Priest” – has a supervisor_id of 4. If we look at the row in the table that has
a position_id of 4, we see Bishop. So, we know that the Priest’s supervisor is
the Bishop.
In order to select an org chart from the database, we can join the
church_hierarchy table to itself.

selecta.position_desc,
b.position_desc
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;

position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest

In the result set, the left column is a list of supervisors and the right column is
a list of their subordinates.

By the way, when I named the column “position_desc” in the table I used
“desc” to stand for “description”. That column is a text description of the
position.

The table alias “a” represents the church_hierarchy table, and the table alias
“b” also represents the church_hierarchy table, but we are treating them as
different tables. We joined on the position_id column of the “a” table to the
supervisor_id column of the “b” table.

COLUMN ALIASING
We have seen that we can create aliases for tables. Now let’s create aliases
for columns.

The query above gave us the information we needed for our org chart, but
both columns names appeared as “position_desc”.

position_desc position_desc
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest
These column headings could be confusing to our end users. Really, the left
column is a list of supervisors and the right column is a list of workers. Let’s
change those column headings to show “Supervisor” and “Worker” instead of
the column headings from the table, “position_desc” and “position_desc”.

selecta.position_desc Supervisor,
b.position_desc Worker
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;

Now the query returns the same results, but the column headings make the
meaning of the data more clear.

Supervisor Worker
God Pope
Pope Cardinal
Cardinal Bishop
Bishop Priest

Just as with table aliases, you may see queries that use the word “as” when
defining column aliases, like this:

selecta.position_desc as Supervisor,
b.position_desc as Worker
from church_hierarchy a
join church_hierarchy b
on a.position_id = b.supervisor_id;

Either way is fine. You can use the word “as” or you can leave it out.

ALTERNATE JOIN SYNTAX

Using Parens
Postgres allows you to write SQL queries that accomplish the same results
using different syntax. We saw this query earlier.

1selecta.customer_name,
2 b.item_ordered
3from customer a
4join item_ordered b
5on a.customer_id = b.customer_id;

You may come across queries where they use parens on line 5.

1selecta.customer_name,
2 b.item_ordered
3from customer a
4join item_ordered b
5on (a.customer_id = b.customer_id);

That syntax produces the same results.

Joining with “Using” Instead of “On”


Also, because we are joining the tables on columns that are called
“customer_id” in both tables, we can use the following “using” syntax
instead of the “on” syntax used above.

selecta.customer_name,
b.item_ordered
from customer a
join item_ordered b
using (customer_id);

Either approach is fine. You will come across queries that use both the “on”
and the “using” styles.

Old-School SQL
This inner join between the customer and item_ordered tables looked like
this.

selecta.customer_name,
b.item_ordered
from customer a
join item_ordered b
on a.customer_id = b.customer_id;

But you may see this query rewritten like this:

selecta.customer_name,
b.item_ordered
from customer a,
item_ordered b
where a.customer_id = b.customer_id;

This is an older style of SQL that is still supported by Postgres. This syntax
doesn’t include the word “join”; it just lists the table names separated by
commas. I recommend that you use the syntax shown in the first example, but
you may come across queries written using this older “ANSI-89” style shown
in the second example.

COMPLEX JOINS WITH MULTIPLE TABLES.


So far all of our SQL examples have selected from only two tables. But in the
real world, your SQL queries will get more complicated.

Let’s take a look at the tables for a group of music stores.

store
store_id store_name
1 Bill's Banjos
2 Mike's Music Emporium
3 Tennessee Instruments

inventory
store_id product_id quantity
1 11 5
2 22 9
2 11 2
3 33 12
3 11 4

product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300
supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.

Let’s join these four tables to get an inventory list of the products at all of our
music stores:

1selecta.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5from store a
6join inventory d on d.store_id = a.store_id
7join product b on b.product_id = d.product_id
8join supplier c on c.supplier_id = b.supplier_id
9order by a.store_name;

Here are the results:

store_name product_name supplier_name quantity


Bill's Banjos Banjo Missing Teeth Mfg. 5
Mike's Music Banjo Missing Teeth Mfg. 2
Emporium
Mike's Music Guitar Peg's Board Guitar 9
Emporium Co.
Tennessee Banjo Missing Teeth Mfg. 4
Instruments
Tennessee Fiddle Scratchy Cat Fiddle 12
Instruments Co.

That query worked nicely. It was more complicated than our previous SQL
examples because the data we needed for this query was spread across four
tables. That required us to join across all four tables in order to produce our
inventory list.

In lines 5 – 8, we defined tables aliases for all four tables. We used “a” for
the store table, “b” for the product table, “c” for the supplier table, and “d”
for the inventory table. That saved us quite a bit of typing.

In lines 6 – 8, we used inner joins on the tables and specified which columns
the tables should be matched on. Recall that the “join” syntax is shorthand for
“inner join”.

In line 6, we said “join inventory d on d.store_id = a.store_id”. That defined


“d” as the table alias for the inventory table and joined the inventory table to
the store table using the store_id columns of both tables. In line 5, “a” had
already defined as the table alias for the store table, so when we joined where
“d.store_id = a.store_id”, Postgres knew to join the store and inventory tables
based on the store_id columns of both of those tables.

So far all of the table aliases we have seen have been “a”, “b”, “c”, or “d”,
but you can make up your own table aliases. When you are writing a complex
query, you can make your SQL easier to read by making the table aliases
more meaningful. For example, you could write the query like this:

1selectst.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5from store st
6join inventory i on i.store_id = st.store_id
7join product p on p.product_id = i.product_id
8join supplier sp on sp.supplier_id = p.supplier_id
9order by st.store_name;

Instead of using “a”, “b”, “c”, and “d”, this time we used table aliases that are
shorthand for the table names. For example, “p” for the product table and “i”
for the inventory table. That can make it easier for you to write the join
condition on say, line 7. It’s harder to lose sight of which tables you are
joining when the table alias gives you a hint like the first letter of the table.

Unfortunately, in this join, two of the tables start with the same letter: store
and supplier. Since there are two tables that start with the letter “s”, for those
tables we used two-letter table aliases of “st” for the store table, and “sp” for
the supplier table.

Also, when you write queries that are joining many tables, note that the order
of the joins matters. For example, let’s switch the order of lines 6 and 7 in the
query above.
1selectst.store_name,
2 p.product_name,
3 sp.supplier_name,
4 i.quantity
5from store st
6join product p on p.product_id = i.product_id
7join inventory i on i.store_id = st.store_id
8join supplier sp on sp.supplier_id = p.supplier_id
9order by st.store_name;

When we run the query, Postgres returns this error:


ERROR: missing FROM-clause entry for table "i"
LINE 6: join product p on p.product_id = i.product_id
^
SQL state: 42P01
Character: 125

The reason is that line 6 makes reference to a table alias of “i” but “i” isn’t
defined until the “inventory i” part of line 7. We tried to use the table alias “i”
before it was defined. The fix is to change the order of lines 6 and 7 back to
their original positions.

COMPLEX SQL STATEMENTS WITH INNER AND


OUTER JOINS
Let’s say the “Tennessee Instruments” store discovered a new mandolin in
their stockroom. They added it to our database tables, but they don’t know
who the supplier is. The inventory table would now look like this:

store_id product_id quantity


1 11 5
2 22 9
2 11 2
3 33 12
3 11 4
3 44 1

The last row represents the new mandolin in store_id 3 (“Tennessee


Instruments”). The new product_id is 44 (“Mandolin”), and the quantity is 1.

The mandolin has also been added to the product table, but the supplier was
set to null because we don’t know the supplier.

product_id product_name supplier_id


11 Banjo 100
22 Guitar 200
33 Fiddle 300
44 Mandolin null

The trouble is that now our inventory list query isn’t showing the new
mandolin.

1selecta.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5from store a
6join inventory d on d.store_id = a.store_id
7join product b on b.product_id = d.product_id
8join supplier c on c.supplier_id = b.supplier_id
9order by a.store_name;

store_name product_name supplier_name quantity


Bill's Banjos Banjo Missing Teeth Mfg. 5
Mike's Music Banjo Missing Teeth Mfg. 2
Emporium
Mike's Music Guitar Peg's Board Guitar 9
Emporium Co.
Tennessee Banjo Missing Teeth Mfg. 4
Instruments
Tennessee Fiddle Scratchy Cat Fiddle 12
Instruments Co.

These are the same results that we saw before. Why isn’t the mandolin
appearing?

You can see that this query uses all inner joins. Lines 6 – 8 use just the word
“join”, which is shorthand for an inner join. In order to display a product that
is in the product table but doesn’t have a supplier in the supplier table, we
need to do an outer join between the product and supplier tables. But we want
the other joins in the query to continue to be inner joins.

It turns out that all we need to do is add the word “left” to the query to
change the inner join to an outer join.

1select a.store_name,
2 b.product_name,
3 c.supplier_name,
4 d.quantity
5from store a
6join inventory d on d.store_id = a.store_id
7join product b on b.product_id = d.product_id
8left join supplier c on c.supplier_id = b.supplier_id
9order by a.store_name;

Recall that “left join” is shorthand for “left outer join”. So now we have a
query that does two inner joins (on lines 6 and 7) and one outer join (on line
8). Now the mandolin appears in our inventory list.

store_name product_name supplier_name quantity


Bill's Banjos Banjo Missing Teeth Mfg. 5
Mike's Music Banjo Missing Teeth Mfg. 2
Emporium
Mike's Music Guitar Peg's Board Guitar 9
Emporium Co.
Tennessee Banjo Missing Teeth Mfg. 4
Instruments
Tennessee Fiddle Scratchy Cat Fiddle 12
Instruments Co.
Tennessee Mandolin null 1
Instruments

STRING CONCATENATION
Let’s say we want to get a list of presidents in this format: “President Name
(Political Party)”. To do that, we can take the president_name column and
add a space, then and a left paren to it, then add the president_party value,
then add a right paren to it, like this:
select president_name || ' (' || president_party || ')'
from us_president;

The SQL will return:


Bill Clinton (Democratic)
George W. Bush (Republican)
Barack Obama (Democratic)
Donald Trump (Republican)

Concatenation is a fancy word for joining things together. The string


concatenation operator is “||”. It is made by typing two vertical bars, or “pipe”
characters. The pipe character is usually on the right side of the keyboard
above the backslash character.

COMPARING VALUES IN SQL


So far, all of our joins have compared columns based on equality, like
“a.position_id = b.supervisor_id”. But we can also check for these conditions:

> Greater Than


>= Greater Than or Equal to
< Less Than
<= Less Than or Equal to
!= or <>Not Equal

And we can use these keywords in our queries:

in Checks that a value is in a list of values


betweenChecks that a value is in a range of values
like Checks that a string value is like some pattern
is nullChecks that a value is null
not Combined with another statement, checks that the condition is
not true

Greater Than
Using “>” we can list all the presidents with a president_id that is more than,
say, 40:

select*
from us_president
where president_id > 40;

president_id president_name president_party


41 George Herbert Walker Bush Republican
42 Bill Clinton Democratic
43 George W. Bush Republican
44 Barack Obama Democratic
45 Donald Trump Republican
Notice that president 40, Ronald Reagan, doesn’t appear in our results. To
show Reagan as well, we can use “>=” instead of “>”.

Greater Than or Equal to


Using “>=” we can list all the presidents with a president_id that is greater
than 40 or equal to 40. Now Reagan appears with the others.

select*
from us_president
where president_id >= 40;

president_id president_name president_party


40 Ronald Reagan Republican
41 George Herbert Walker Bush Republican
42 Bill Clinton Democratic
43 George W. Bush Republican
44 Barack Obama Democratic
45 Donald Trump Republican

Less Than
Using “<” we can list all the presidents with a president_id that is less than,
say, 5:

select*
from us_president
where president_id < 5;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican

Notice that president 5, James Monroe, doesn’t appear in our results. To


show Monroe as well, we can use “<=” instead of “<”.

Less Than or Equal to


Using “<=” we can list all the presidents with a president_id that is less than
5, or equal to 5. Now Monroe appears with the others.

select*
from us_president
where president_id <= 5;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican

Not Equal
There are two ways to represent “not equal” in SQL: “!=” and “<>”.

Here is a query using the != syntax:

select*
from product
where product_name != 'Mandolin'
and product_id != 11;

The “!” is the “not” operator. So “!=” means “not equal to”. This query
selects products where the product_name is not Mandolin and the product_id
is not 11 (Banjo).

Here is the same query using the “<>” syntax:

select*
from product
where product_name <> 'Mandolin'
and product_id <> 11;

I think of the “<>” syntax as saying, “show me everything that is less than (<)
11, and also show me everything that is greater than (>) 11”, which is really
another way to say, “not equal to 11”.

Both queries produce the same results: They both produce a list of all the
products except for Mandolin and Banjo, because Banjo is product_id 11.

product_id product_name supplier_id


22 Guitar 200
33 Fiddle 300
I personally like to use the “!=” syntax more than the “<>” syntax, but they
do the same thing. There is no advantage to using one over the other.

In
The “in” keyword checks that a value is in a list of values.

select*
from store
where store_id in (1,3);

store_id store_name
1 Bill's Banjos
3 Tennessee Instruments

Our query returns all rows from the store table where the store_id is a 1 or a
3. Another way to write this query is using the “or” keyword:

select*
from store
where store_id = 1
or store_id = 3;

The advantage to using the “in” syntax is that it’s less typing than using “or”,
and with the “in” clause, you can also select the list of values from the
database, like this:

select *
from store
where store_id in
(
selectstore_id
from south_east_store
);

The part of the SQL statement within the parens is called a subquery. We’ll
explore subqueries in more detail momentarily.

Between
The “between” keyword checks that a value is within a range of values.

select *
from product
where product_id between 11 and 33;

product_id product_name supplier_id


11 Banjo 100
22 Guitar 200
33 Fiddle 300

Postgres returns a list of rows from the product table where the product_id is
between 11 and 33. Note that “between” returns rows where the product_id
equals 11 or equals 33, as well as any values that are between 11 and 33.

Like
The “like” keyword checks that a string value matches some pattern. In the
following query, we are returning values from the us_president table where
the values for president_name are like the pattern “George%”.

selectpresident_name
from us_president
where president_name like 'George%';

president_name
George Washington
George Herbert Walker Bush
George W. Bush

This query returns the names of presidents where the president_name value
starts with the text “George”. The percent character represents any characters.
So, the pattern “George%” checks for “George” followed by any characters at
all.

The percent sign will also match no characters. If a president were named just
“George” with no last name, the “where president_name like ‘George%’”
condition would return him as well.

The percent sign can be placed at any position in the string. In the query
below, the “%” is placed in the middle of the string, right between “George”
and “Bush”.
selectpresident_name
from us_president
where president_name like 'George%Bush';

president_name
George Herbert Walker Bush
George W. Bush

When using “like”, you can also use the underscore character (“_”) to
represent one character. The following query searches for presidents with
names that have the pattern of “_onald%”. That is, any one character
followed by the text “onald” followed by anything at all.

selectpresident_name
from us_president
where president_name like'_oNaLd%';

president_name
Ronald Reagan
Donald Trump

“Like” is case-sensitive. If we search for lower case letters, we won’t find


uppercase letters. The following query won’t match any rows:

selectpresident_name
from us_president
where president_name like '_oNaLd%';

If you want to do a case-insensitive “like”, use the ilike keyword instead of


like. ilike will return president_name values regardless of whether the letters
are upper or lowercase.

selectpresident_name
from us_president
where president_name ilike '_oNaLd%';

president_name
Ronald Reagan
Donald Trump

Is Null
“is null” checks if a value is null. Recall that null is a special value in
Postgres that represents an empty value.

select*
from product
where supplier_id is null;

product_id product_name supplier_id


44 Mandolin null

Postgres comes with special keywords to deal with null values. A common
mistake is to use the syntax “where supplier_id = null”. That will return
unexpected results. The correct syntax is “where supplier_id is null”.

Not
The word “not” can be used to negate other keywords. For example:

select*
from product
where supplier_id is not null;

select*
from product
where supplier_id not in (11, 22);

selectpresident_name
from us_president
where president_name not like '_onald%';

select*
from product
where product_id not between 11 and 33;

“Not” has the effect of reversing the search criteria. “where supplier_id in
(11, 22)” returns only 11 and 22, but “where supplier_id not in (11, 22)”
returns everything except 11 and 22.

SUBQUERIES
A subquery is a query that is nested within another query.
selecttornado_alert_message
from emergency_message
where exists
(
select*
from tornado_alert
where alert_flag = 'Y'
);

This SQL statement has an inner query and an outer query. The part of the
SQL statement that is the inner query is also called a subquery. In the SQL
statement below, I have added comments to show which part is the outer
query and which part is the inner query or subquery.

1 select tornado_alert_message -- outer query


2 from emergency_message -- outer query
3 where exists -- outer query
4(
5 select * -- inner query (subquery)
6 from tornado_alert -- inner query (subquery)
7 where alert_flag = 'Y' -- inner query (subquery)
8 );

The word “exists” on line 3 checks for the existence of rows in the subquery.
The inner query runs first, and checks the tornado_alert table to see if any
rows exist where the alert_flag column is set to “Y”. Let’s take a look at the
tornado_alert table.
select * from tornado_alert;

State alert_flag
Massachusetts N
North Carolina Y
Florida N

There is a row in the table with an alert_flag of Y. It is the row for North
Carolina. Since at least one row with a Y exists, the outer query then selects
the tornado_alert_message column from the emergency_message table and
returns it. The results are:

tornado_alert_message
There is a tornado alert
If there had been more than one row with an alert_flag of Y, the results
would have been the same. “Exists” checks that one or more rows exist in the
subquery.

If there had not been any rows returned by the subquery with an alert flag of
Y, nothing would have been returned by the SQL statement at all.

Correlated Subqueries
Let’s take another look at our music store data. Let’s get a list of all stores
that have more than 10 of an item in stock.

1 select store_name
2 from store s
3 where exists
4 (
5 select i.store_id
6 from inventory i
7 where i.store_id = s.store_id
8 and i.quantity > 10
9 );

In this SQL statement, we have an inner query and an outer query like in the
last example, but there is something different about this SQL statement. In
line 7, the store table from the outer query is joined with the inventory table
from the inner query. This is a correlated subquery because there is a
correlation between the inner and outer queries. In other words, the inner and
outer queries are joined.

Because the inner and outer queries are correlated, the query can show us not
only that there is a store with more than 10 of an item in stock, but it can
show us which store it is.

The result of this SQL statement is:

store_name
Tennessee Instruments

Subqueries that Return One Row


If we know for sure that the subquery will only return one row, we could
rewrite the SQL statement this way.

1 select store_name
2 from store s
3 where s.store_id =
4 (
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8 );

Instead of using the “exists” keyword, we are checking that the store_id in
the outer query matches the store_id in the inner query, on lines 3 and 5.

That SQL statement produces the same result.

store_name
Tennessee Instruments

The trouble is that if the data in the inventory table changes, and a second
store begins stocking more than 10 of an item, our query would fail.

Right now, if we were to run the just the inner query by itself, we would see
that it returns one row that represents store_id 3, “Tennessee Instruments”.

selecti.store_id
from inventory i
where i.quantity > 10;

store_id
3

But now if another store, say “Bill’s Banjos”, got a new shipment of banjos,
the inner query would return two rows.

selecti.store_id
from inventory i
where i.quantity > 10;

store_id
3
1

Now when we run our complete query, we get an error:

1select store_name
2from store s
3where s.store_id =
4(
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8);

ERROR: more than one row returned by a subquery used as an expression


SQL state: 21000

That is because the equal sign on line 3 tells SQL to expect only one value to
be returned from the subquery. When the subquery returns more than one
value, Postgres returns an error.

This is a relatively common problem. To avoid it, we shouldn’t have used an


equal sign on line 3. Unless you are sure that there is no possibility of the
subquery returning more than one row, use “in” instead of “=”.

1select store_name
2from store s
3where s.store_id in
4(
5 select i.store_id
6 from inventory i
7 where i.quantity > 10
8);

Now that we have used “in” on line 3, our results come back nicely:

store_name
Bill's Banjos
Tennessee Instruments

Now the outer query isn’t surprised when the inner query returns multiple
rows. The “in” keyword can handle a set of values while “=” expects just one
value.
DISTINCT
The “distinct” keyword removes duplicates from our result set. If we select
the president_party column from our us_president table, we get lots of
duplicates.

selectpresident_party
from us_president;

president_party
null
Federalist
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democratic-Republican
Democratic
Democratic
Whig
Whig

That query returns 45 values, one showing the party for each president. We
could use “distinct” if we want to see each party only once.

selectdistinct president_party
from us_president;

president_party
null
Republican
Democratic-Republican
Federalist
Democratic
Whig

Now the query returns only 6 rows. Selecting the distinct parties shows each
party value only once.

LIMIT
The “limit” keyword allows us to select a limited number of rows. If we are
interested in seeing what the us_president table looks like, but we don’t need
to see all 45 rows in it, we could limit the results to 5 rows.

select *
from us_president
limit 5;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
2
CREATING AND CHANGING
DATABASE TABLES

So far, we have been selecting from database tables that already exist. Now
let’s look at how to create tables in the first place.

To create a table in Postgres, we use the “create table” statement.

1create table us_president (


2 president_id int,
3 president_name text,
4 president_party text
5);

In line 1, we provide the name of the table we want to create, us_president in


this case. In lines 2 – 4, we provide the column names we want, and the data
type for each column.

We can make up our own names for tables and columns. The names must be
63 characters or fewer, and they need to start with a letter or an underscore.
For readability, I recommend keeping your table and column names under 30
characters, and starting table and column names with a letter. Underscores are
often used between words in table and column names, but dashes are not
allowed.

By convention, tables are given singular names, not plural names. For
example, we called this table us_president instead of us_presidents.
In lines 2 – 4, we defined three columns for the table: president_id,
president_name, and president_party. We specified a data type for each
column. president_id is an integer, president_name is text, and
president_party also has a text data type.

DATA TYPES
Postgres provides lots of data types you can use to create your columns.
Some data types you will use frequently and some you will use once in a
while or only when some special situation arises.

Commonly-Used Data Types


Postgres gives us a choice of many data types that we can use for our
columns. Here are the most commonly-used data types.

text A character string of any length


integer An integer (often abbreviated as “int”)
numeric A floating point number (also called “decimal”)
date A calendar date with year, month, and day
time without time zone The time of day without a time zone (also called “time”)
time with time zone The time of day with a time zone (also called “timetz”)
timestamp without time zone Date and time without a time zone (also called “timestamp”)
timestamp with time zone Date and time with a time zone (also called “timestamptz”)
boolean A true or false value (often abbreviated as “bool”)
character A fixed-length character string (often abbreviated as “char”)

Text
The text data type stores a string of characters. A string can be made up of
almost any characters, like letters, numbers, underscores, dashes,
ampersands, and lots more. Here the product_name column of the product
table is created with the text data type.

create table product


(
product_id int,
product_name text,
supplier_id int
);

When dealing with text values in Postgres, we use single quotes not double-
quotes.

-- Surrounding Banjo with single quotes works nicely


select *
from product
where product_name = 'Banjo';

product_id product_name supplier_id


11 Banjo 100

-- Surrounding Banjo with double quotes produces an error. Don't do


this.
select *
from product
where product_name = "Banjo";

ERROR: column "Banjo" does not exist


LINE 3: where product_name = "Banjo";
^
SQL state: 42703
Character: 47

If you are creating a table and you need a column for text data of unknown
length, use the text data type.

Integer
The integer data type is used for storing whole numbers. An example of a
whole number is 3. int would not be the right data type to store 3.1415,
because that is not a whole number. That is because 3.1415 has a fractional
part (the part after the decimal point).

To create an integer column, you can save keystrokes by abbreviating


“integer” as “int”.

create table product


(
product_id int,
product_name text,
supplier_id int
);

The int data type can store numbers between -2147483648 and +2147483647.
If you need to store larger numbers you can use the bigint data type instead.
If you are dealing with smaller numbers and you want to save disk space, you
can use the smallint data type.

Data Type Range of Values


int -2147483648 to +2147483647
smallint -32768 to +32767
bigint -9223372036854775808 to +9223372036854775807

The smallint and bigint data types are available if you need them, but most
times, if you need to store numbers that don’t have a decimal point, the int
data type is what you want.

Numeric
The “numeric” data type is used for storing numbers that have a fractional
amount.

For example, let’s take a look at a table called special_number. The


special_value column in the table has been defined using the numeric data
type.

In Postgres, the “numeric” data type and the “decimal” data type are the
same. You can create the column using either word.

create table special_number


(
name text,
special_value numeric
);

Is the same as:

create table special_number


(
name text,
special_value decimal
);

We might see values like these in the special_number table:


select * from special_number;
name special_value
Pi 3.14159
Golden Ratio 1.61803
Euros to Dollars 1.17
US National Debt 28523782444392.17

Numbers like these that have a decimal point are said to have a precision and
scale. Precision is the total number of digits. Scale is the number of digits
after the decimal point. For example, 3.14159 has a precision of 6 and a scale
of 5.

We can use the “numeric” data type without specifying a precision and scale,
like we did above, or we can define a precision and scale for our numeric
column. When we created the special_number table, we could have created
the special_value column with a precision of, say, 10 and a scale of 4.

create table special_number


(
name text,
special_valuenumeric(10,4)
);

Now the values appear with 4 digits to the right of the decimal point, and up
to 6 digits to the left of the decimal point. Since only 10 digits total are
allowed in the column, and 4 are to the right of the decimal point, that leaves
us with 6 digits to the left of the decimal point.

select * from special_number;


name special_value
Pi 3.14159
Golden Ratio 1.61803
Euros to Dollars 1.17

We could not add the US National Debt to this column anymore because it is
too large for our column definition. The number has more than 6 digits to the
left of the decimal point.

Date
To store dates in Postgres, we can use the “date” data type. For example, here
is a table called clock_change that has one column called clock_change_date.
The clock_change_date column was created with the “date” data type, so we
can store dates in that column.
create table clock_change
(
clock_change_date date
);

We might see dates like these in the table.

select*
from clock_change;

clock_change_date
2020-11-01
2021-03-14
2021-11-07
2022-03-13
2022-11-06

Postgres typically displays dates in YYYY-MM-DD format. That is, as a 4-


digit year, followed by a dash, a 2-digit month, another dash, and a 2-digit
day. We can use that same format to select a particular date from the table.

1select*
2from clock_change
3where clock_change_date = '2021-11-07';

The “date” data type is handy when you need to store a date but when you
don’t need to store the time of day or a time zone. Notice that the date is
surrounded by single quotes on line 3.

Postgres provides an easy way to get today’s date called current_date. The
current_date function gives you today’s date as a “date” data type.

Strangely, the current_date function is one of a handful of Postgres functions


that you call without parens. The syntax to call the current_date function is
“select current_date” not “select current_date( )”.
select current_date;

current_date
2020-12-27

We could also use current_date to check for dates in our clock_change table
that have today’s date.

select *
from clock_change
where clock_change_date = current_date;

You can also get the current_date and then add or subtract from it. For
example, the following SQL statement displays all rows in the table with a
clock_change_date from the last ten days:

select *
from clock_change
where clock_change_date > current_date – 10;

We are checking that the clock_change_date column is newer than today’s


date minus 10 days. This uses, but doesn’t change the value of, current_date.
The next time you make use of current_date it will be set to the current date.

Time Without Time Zone


The “time without time zone” data type stores a time of day without a date or
time zone.
create table start_time
(
twotz time without time zone
);

I named this column twotz because it stands for “Time Without Time Zone”.
The following query returns the column twotz from the start_time table:

selecttwotz
from start_time;

The query might return this value.


twotz
15:55:25.832142

The time that was returned consists of several parts. 15 is military time for
3pm because 15 minus 12 is 3. 55 is the minutes. 25 is the seconds. 832142 is
the milliseconds. So, this time represents 3:55 pm and 25 seconds and
832142 milliseconds.

Notice that the “time without time zone” data type doesn’t include a date or a
time zone.

When using the “time without time zone” data type, you can either spell out
“time without time zone” or shorten it to “time”. The default for “time” is
“without time zone”. These two tables would be the same:
create table start_time
(
twotz time without time zone
);
create table start_time
(
twotz time
);

Time With Time Zone


The “time with time zone” data type stores a time of day without a date, but
with a time zone.

selecttwtz
from start_time;

This query returns the twtz column from the start_time table.

twtz
15:55:25.832142-04:00

This returned the same time as the “time without time zone” data type
example above, but now there is a “-4:00” at the end of it.
The “-4:00” is an offset that represents the time zone. This offset tells us the
difference between the time zone of this time and the UTC (Universal Time
Coordinated) time zone. The time “15:55:25.832142-04:00” is still 3:55 pm
and 25 seconds and 832142 milliseconds, but the “-4:00” tells us that it is in
the EDT time zone. EDT is Eastern Daylight Time, which is a time zone used
in the eastern part of the United States. EDT is 4 hours behind UTC, so the
offset is negative 4.

When using the “time with time zone” data type, you can either spell out
“time with time zone” or shorten it to “timetz”. These two tables would be
the same:
create table start_time
(
twtz time with time zone
);

create table start_time


(
twtz timetz
);

Postgres provides an easy way to get the current time called current_time.
The current_time function gives you the current time as a “time with time
zone” data type.
select current_time;

14:34:19.274711-04:00

You could compare the times in our table to the current time using this query.

select*
from start_time
where twtz = current_time;

We are unlikely to have any rows in the table with a time that is exactly the
same as the current time including the milliseconds, but we can check for
times within the last hour using a query like this:
select*
from start_time
where twtz > (current_time - interval '1 hour');

The “interval” syntax allows us to do math on dates or times.

Timestamp Without Time Zone


The “timestamp without time zone” data type stores a date and time without a
time zone. The format of the field is “2020-12-29 14:54:56.710457”. That is
the date in YYYY-MM-DD format, followed by the time in hours, minutes,
seconds, and milliseconds.

The “timestamp without time zone” data type can also be referred to as
simply “timestamp”. These two tables would be the same:
create table start_time
(
tswotz timestamp without time zone
);

create table start_time


(
tswotz timestamp
);

Timestamp With Time Zone


The “timestamp with time zone” data type stores a date and time with a time
zone.

The “timestamp with time zone” data type can also be shortened to
“timestamptz”. These two tables would be the same:
create table start_time
(
tswtz timestamp with time zone
);

create table start_time


(
tswtz timestamptz
);
Postgres provides a function called now( ) that is an easy way for you to get a
current “timestamp with time zone” value.
select now();

now
2020-12-29 14:54:56.710457-04

You can also use the Postgres current_timestamp function, which does this
same thing as now( ).
select current_timestamp;

current_timestamp
2020-12-29 14:54:56.710457-04

You could also use now( ) or current_timestamp to check the table to see if
there are any values with the current timestamp, like this:

select*
from start_time
where tswtz = now();

or:

select*
from start_time
where tswtz = current_timestamp;

Boolean
The boolean data type, often abbreviated as “bool”, stores a value that
represents true or false. Let’s take a look at the presidential_hair table.
create table presidential_hair (
president_name text,
good_hair bool
);

You might see data like this in the table:


select * from presidential_hair;

president_name good_hair
Richard Nixon false
Martin Van Buren false
John F. Kennedy true

To select all the presidents with good hair, you can say “where good_hair” in
your query.

selectpresident_name
from presidential_hair
where good_hair;

president_name
John F. Kennedy

You may see queries that use a slightly different syntax for checking
booleans, for example:
select president_name from presidential_hair where good_hair is true;
select president_name from presidential_hair where good_hair = true;
select president_name from presidential_hair where good_hair = 't';

All of those statements do the same thing.

To check a boolean for a value of false, you can use the “not” syntax.
selectpresident_name
from presidential_hair
where not good_hair;

president_name
Richard Nixon
Martin Van Buren

And again, you may come across queries that use a slightly different syntax
to do the same thing.
select president_name from presidential_hair where good_hair is
false;
select president_name from presidential_hair where good_hair = false;
select president_name from presidential_hair where good_hair = 'f';

In some databases, you will see columns named with the suffix “_flag” that
have a boolean data type. Since a boolean is either true or false, columns will
be given names like “active_user_flag” because if the user is active, setting
the column to true can be compared to raising a flag.

create table application_user (


user_name text,
active_user_flag bool,
speaks_english_flag bool,
speaks_spanish_flag bool
);

That table could contain data like this.

User_name active_user_flag speaks_english_flag speaks_spanish_flag


Rick Silva true true false
Kendra Duncan true true true
Jose Gonzalez false false true

If you wanted a list of active users who speak Spanish, you could use this
query:

select*
from application_user
where active_user_flag
and speaks_spanish_flag;
User_name active_user_flag speaks_english_flag speaks_spanish_flag
Kendra Duncan true true true

Character
The “character” data type allows us to store fixed-length character strings. It
is often abbreviated as “char”.

For example, the following table was created to store data about states in the
USA:
create table us_state (
state_code char(2),
state_name text
);

The state_code column is defined as a character data type with a length of 2.


This table might store data that looks like this:
select * from us_state;

state_code state_name
MA Massachusetts
NC North Carolina
HI Hawaii

Since we defined the state_code column as char(2), Postgres won’t allow us


to enter more than 2 characters in that column. On the other hand, the
state_name column was defined with the “text” data type because we don’t
know how many characters a state name will have.

Less Commonly-Used Data Types


Postgres is a robust database. Sometimes it feels like they have thought of
everything. Postgres provides so many data types that you can use to create
your columns that it’s hard to imagine using them all. Some of these less
commonly-used data types you will use once in a while, some you will use
only when some special situation arises, and some you will probably never
use.

Some of the data types - like point, circle, and polygon - are for geographical
data. There are data types for binary data, MAC addresses, host addresses,
and all manner of other things. You can find a complete description of all of
the Postgres data types on the Postgres website. You can even create your
own data types using the “create type” command if you ever need a really
unconventional data type.

Numbers
bigint A larger version of "int"
smallint A smaller version of "int"
money A currency amount
real A floating point number
double precision A larger floating point number (also
called "double" or "float")

Formatted Data
json Stores textual JSON (JavaScript Object Notation)
data
jsonb Stores binary JSON data, decomposed
xml Stores XML (Extensible Markup Language) data

Serial
serial An autoincrementing integer
smallserial A smaller autoincrementing integer
bigserial A larger autoincrementing integer

Bits and Bytes


bit A fixed-length bit string
bit varying A variable-length bit string
bytea Binary data ("byte array")

Geometry-Related
box A rectangular box on a plane
circle A circle on a plane
line An infinite line on a plane
lseg A line segment on a plane
path A geometric path on a plane
point A geometric point on a plane
polygon A closed geometric path on a plane

Specialty Addresses
cidr An IPv4 or IPv6 network address
inet An IPv4 or IPv6 host address
macaddr A MAC (Media Access Control) address
macaddr8 A MAC (Media Access Control) address
(EUI-64 format)

Miscellaneous
interval A time span
pg_lsn A Postgres log sequence number
tsquery A text search query
tsvector A text search document
txid_snapshot A user-level transaction ID snapshot
uuid A universally unique identifier
character varying A variable-length character string

CONSTRAINTS
When you create your own database tables, Postgres gives you a way to put
“constraints” on your columns. Constraints are rules about the data that can
be saved in your columns. Can there be two rows in the table with the same
value in this column? Are null values allowed for this column? Using
constraints, you can define rules about the data that is allowed, and Postgres
will enforce those rules for you.

Constraints help us with “data integrity”. That is, they help us to keep the
data in our database accurate and consistent.

Primary Keys
A “primary key” is a column, or more than one column, that uniquely
identifies the rows in a table. When you create a database table, one of the
most important questions to consider is “what should the primary key for this
table be?”
Single Column Primary Keys
In our us_president table, which column uniquely identifies each row?

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
… … …

The answer is the president_id column. That is because no two rows can have
the same president_id. There will never be two different rows with the same
president_id.

That can’t be said of the president_party column. We can see that there are
multiple rows that have a president_party of “Democratic-Republican”.

It can’t even be said of the president_name column. It is possible that there


could be two presidents with exactly the same name.

But we can always be sure that president_id 3 will refer to just one row in the
table. So, when we create the us_president table, we should make the
president_id column the primary key.

To designate the president_id column the primary key for the table, we can
use the “primary key” syntax:

1create table us_president (


2 president_id int primary key,
3 president_name text,
4 president_party text
5);

In line 2, we named the column “president_id”, we set its data type to integer,
and we defined the column as the primary key for this table by using the
syntax “primary key”.

Making president_id the primary key for this table does three good things for
us.

First, it prevents duplicate President IDs from being inserted into the table. If
a user of our database tries to add president_id 40 and there already is a
president_id 40 in the table, Postgres will stop them and give them an error
message.

Secondly, making president_id the primary key prevents users from adding a
null value to the president_id column. When we defined president_id as the
primary key, we designated it as a special column that cannot be null.

Those two things fall under the category of “data integrity”. They help to
keep the data in our database accurate and consistent. Once we define this
primary key, we can be assured that all rows in the table will have a unique
president_id, and that no president_id will be null. Postgres will enforce that
for us, and that will help to keep the data in our database of a high quality.

The third advantage to creating a primary key is that it causes Postgres to


create an index for us. An index will help to speed up queries that select from
this table. We’ll look at indexes more in a moment.

Multi-Column Primary Keys


Sometimes a primary key consists of more than one column. For example, in
the following table we are storing the daily high temperature by city.

city temp_date high_temp


Boston 2020-12-01 36
Boston 2020-12-02 31
Boston 2020-12-03 27
Raleigh 2020-12-01 54
Raleigh 2020-12-02 56
Raleigh 2020-12-03 49

For this table, the primary key should consist of the city and temp_date
columns. That is because there should be only one row with the same city and
date. For example, we have a row for Boston for 2020-12-01 with a high
temperature of 36. If we make the city and temp_date the primary key for this
table, Postgres will prevent users from adding a second row for Boston for
2020-12-01.

There would never be a reason to add a second row for Boston for 2020-12-
01, so if we make city and temp_date the primary key, Postgres will make
sure that doesn’t happen. To make city and temp_date the primary key for
this table, we can use the “primary key” syntax.

1create table high_temperature (


2 city text,
3 temp_date date,
4 high_temp int
5 primary key (city,
temp_date)
6);

In line 5, we told Postgres that the primary key should be comprised of the
city and temp_date columns. When a primary key is made up of more than
one column this way, it is called a “composite” key.

Postgres doesn’t require you to define a primary key for the tables you create,
but I recommend that you do. Defining primary keys for your tables sets
Postgres up to help you with data integrity. It also helps you to understand
your data and the best way to define your tables. Most times, if you can’t
figure out what the primary key should be for a new table, that means you
need to rethink your table design. Except in rare cases, every table should
have a primary key.

Foreign Keys
A foreign key is a column, or more than one column, that has a relationship
with the primary key of another table. You can use a table’s foreign key to
join it to another table. A foreign key consists of the columns that are used to
match this table to another table’s primary key columns.

Let’s take another look at the tables for our music stores:

supplier
supplier_id supplier_name
100 Missing Teeth Mfg.
200 Peg's Board Guitar Co.
300 Scratchy Cat Fiddle Co.

product
product_id product_name supplier_id
11 Banjo 100
22 Guitar 200
33 Fiddle 300

When we create the supplier table, we should make the primary key the
supplier_id column because we never want there to be two rows in the
supplier table with the same supplier_id.

create table supplier


(
supplier_id int primary key,
supplier_nametext
);

The product table also has a supplier_id column. That column can be used to
link the product table to the supplier table. Product 11 – Banjo – in the
product table has a supplier_id of 100. We can go to the supplier table and
see that supplier_id 100 is “Missing Teeth Mfg” so we know that the supplier
of the banjo is Missing Teeth Mfg.

When we create the product table, we should define the supplier_id column
as a foreign key that references the supplier_id column of the supplier table.

1create table product


2(
3 product_id int,
4 product_name text,
5 supplier_id int,
6 foreign key (supplier_id) references supplier (supplier_id)
7);

In line 6, we defined the supplier_id column of the product table as a foreign


key. We also told Postgres that the supplier_id column of the product table is
related to the supplier_id column of the supplier table.
Now if a user tries to add a product to the product table with a supplier_id of,
say, 900 then Postgres will prevent them from adding the product and will
display an error message. Postgres will recognize that there is no supplier_id
900 in the supplier table, so adding that row to the product table would
violate the foreign key we set up when we defined the table.

This is known as “referential integrity”. By setting up this foreign key, we


can be assured that every supplier_id in the product table refers to a
supplier_id that exists in the supplier table.

Not Null Columns


Sometimes we want to create a database table with columns that should not
allow null values. To do that we can use the “not null” syntax.

1create table us_president (


2 president_id int primary key,
3 president_name textnot null,
4 president_partytext
5);

On line 3, we set the president_name column to not allow null values. It


doesn’t make sense to have a null value in the president_name column
because if we don’t know the president’s name, why would we add the row in
the table at all?

Now that we have defined the president_name column as “not null”, if a user
in our database tries to add a null value for president_name, Postgres will
stop them and display an error message.

By the way, the president_id column defined in line 2 will also not allow null
values because it has been defined as the table’s primary key. Recall that
primary keys columns cannot have a null value. Because it is the primary
key, we didn’t need to specify “not null” for the president_id column.
Postgres does that for us automatically.

Unique
Sometimes we want to prevent duplicate values in a column. We might want
to create a customer table, for example, that will not allow duplicate email
addresses.

1create table customer (


2customer_id int primary key,
3customer_nametextnot null,
4email_addresstextunique
5);

On line 4 of our “create table” SQL statement, we used the word “unique”.
Because the table was created this way, Postgres will not allow two
customers in the table to have the same email address. Postgres will display
an error message if a user tries to create a customer with an email address that
is already in the table.

Check
We can use the “check” constraint to make sure that a column contains
certain values, or a certain range of values. We can define the
high_temperature table this way.

create table high_temperature (


city text,
temp_datedate check (temp_date between '2020-01-01' and '2030-01-
01'),
high_tempint check (high_temp < 150)
);

If a user tries to add a row to the high_temperature table with a temp_date


before January 1, 2020 or after January 1, 2030, Postgres will prevent the row
from being added and will display an error message. If a user tries to add a
row with a temperature that is 150 or higher, Postgres will likewise prevent
the row from being added, and will display an error message.

INDEXES
Postgres lets us create indexes on our tables that will help to speed up the
SQL queries that use those tables. You can create an index using the “create
index” syntax.

create index product_supplier_index on product(supplier_id);

This SQL command creates an index named product_supplier_index on the


supplier_id column of the product table. The new index won’t change
anything about the way we use the table. It will just speed up access to it.

Again, you don’t need to create indexes for columns that have been defined
as primary keys. When you create a primary key for a table, Postgres
automatically indexes those columns for you.

DROPPING TABLES
The syntax to drop a table is “drop table”. Be careful with this command
because it will remove the table and all of the data in it.
drop table product;

ALTERING TABLES
The alter command lets us add columns to the table. We can also use it to
drop a column, change the data type of a column, rename a column, and
more.
alter table customer add column county text;

alter table customer drop column zip;

alter table customer rename column city to city_or_town;

ADDING AND MODIFYING DATA


Once we have created our database tables, we will certainly want to insert
data, update that data, and delete data. Often, we will use SQL to add,
change, and delete data. The SQL commands to do that are insert, update,
delete, and truncate. We will see examples of those commands in a moment.
One way to execute SQL commands against a Postgres database is using a
graphical database management tool like PgAdmin. PgAdmin comes with a
query tool that allows us to enter SQL and run that SQL against a Postgres
database. We will take a closer look at PgAdmin momentarily.

There are other ways to insert, update, and delete data from a Postgres
database in addition to using PgAdmin. Many applications have a User
Interface (UI) that allows end users to enter and change data and to have
those changes affect a Postgres database. An application’s UI might be web-
based, or it might be part of a mobile application, or it may be part of a
desktop application. Often, an application’s user interface sends SQL
commands like insert, update, and delete to a Postgres database, where those
commands are executed.

Inserting Data
To use SQL to insert data into a table, use the “insert” command.

1insert into us_president (president_id, president_name,


president_party)
2values (33, 'Harry Truman', 'Democratic');

Here we are inserting a row into the us_president table. In line 1, we are
telling Postgres that we want to insert this data into the president_id,
president_name, and president_party columns. Then on line 2, we provide the
data to insert, in this case, 33, “Harry Truman”, and “Democratic”. “Harry
Truman” and “Democratic” have single quotes around them because they are
text fields. 33 is an integer so it does not have quotes around it. This SQL
statement will add a new row to the us_president table.

Notice that we listed three columns in line 1 and three values in line 2.

The Number of Columns Must Match the Number of Values


A common mistake is to provide more or fewer values than columns. For
example:

1 -- This is wrong. Too few columns for the data.


2 insert into us_president (president_id, president_name)
3 values (33, 'Harry Truman', 'Democratic');
Postgres looks for a column to insert the text “Democratic” in, but since we
removed the reference to the “president_party” column from line 2, it can’t
find one. We are trying to insert three values into two columns, so Postgres
returns this error message:
ERROR: INSERT has more expressions than target columns
LINE 2: values (33, 'Harry Truman', 'Democratic');
^
SQL state: 42601
Character: 85

Always List your Columns


One “gotcha” that has burned many a developer is caused by the fact that
Postgres doesn’t require us to list the columns in the table we are inserting
data into. Instead of specifying the president_id, president_name, and
president_party columns in our insert statement, we can leave those column
names out entirely, like this:
1 -- Don't do this.
2 insert into us_president
3 values (33, 'Harry Truman', 'Democratic');

Even though we aren’t specifying any columns to insert the data into on line
2, Postgres accepts this syntax and adds the row to the table. Postgres sees
that there are three columns in the table and since we provided three values, it
writes the values to those three columns in the table without displaying an
error.

The problem comes later when we decide to make a change to the


us_president table. Let’s say we add a new column to the table called
inauguration_date. Now when we run our insert statement, there are four
columns in the table and there are only three values being added. That causes
Postgres to reject the new row and display an error message.

It can be perplexing when an insert statement is part of an application that has


been running without a problem and now all of a sudden it is failing. It can
take some time to figure out that the insert statement hasn’t changed, but the
database table has.
For this reason, I recommend that you always list the column names that you
want to insert into.
1 -- Don't do this, even though it will work.
2 insert into us_president
3 values (33, 'Harry Truman', 'Democratic');

1-- Do this instead.


2insert into us_president (president_id, president_name,
president_party)
3values (33, 'Harry Truman', 'Democratic');

Listing the column names on line 2 allows this insert statement to work even
if we add new columns to the table later.

Inserting Data from Another Table


We can also insert data from one table into another table.

1 insert into us_republican_president (


2 president_id,
3 president_name
4 )
5 select president_id,
6 president_name
7 from us_president
8 where president_party = 'Republican';

This SQL statement will insert all 19 Republican presidents from the
us_president table into the us_republican_president table. If we were to run
just lines 5 – 8 in isolation, it would display the Republican presidents from
the us_president table. Lines 1 – 4 take that data and insert it into the
us_republican_president table. Notice that the word “values” is not part of
this syntax as it was in the previous example.

Updating Data
To change data that already exists in a table, we can use the “update”
statement and specify which data we want to change in our “where” clause.
Here we are replacing any null values in the president_party column with the
text “None”.
updateus_president
set president_party = 'None'
where president_party is null;

This changes George Washington’s political party from null to the text
“None”.

Now let’s change the row for Lyndon Johnson to include his middle name.

updateus_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';

This updates the row for Lyndon Johnson and sets his president_name to
Lyndon Baines Johnson. We can see that the update worked by querying the
table:

select*
from us_president
where president_id = 36;

president_id president_name president_party


36 Lyndon Baines Johnson Democratic

One thing to be aware of is that the update statement will update as many
rows as meet the criteria in the “where” clause. In this case, there was only
one row in the table with a president_id of 36, so only that row was updated.

But let’s imagine that we forgot to include the where clause in our update
statement. Instead of running this SQL statement:

updateus_president
set president_name = 'Lyndon Baines Johnson'
where president_name = 'Lyndon Johnson';

By mistake, we ran this one:

updateus_president
set president_name = 'Lyndon Baines Johnson';
Now instead of updating one row in the table, we would update all the rows
in the table! Let’s look at the table now:
select * from us_president;

president_id president_name president_party


1 Lyndon Baines Johnson None
2 Lyndon Baines Johnson Federalist
3 Lyndon Baines Johnson Democratic-Republican
4 Lyndon Baines Johnson Democratic-Republican
5 Lyndon Baines Johnson Democratic-Republican
6 Lyndon Baines Johnson Democratic-Republican
8 Lyndon Baines Johnson Democratic
9 Lyndon Baines Johnson Whig
… … …

We have mistakenly changed every president’s name to Lyndon Baines


Johnson!

So be aware that update statements can update more than one row. It will
update as many rows as the where clause specifies. If there is no “where”
clause at all, all rows in the table will be updated.

Deleting Data
To delete data from database tables, we usually use the “delete” statement.
There is also a “truncate” statement that we can use if we want to quickly
remove all rows in a table.

The Delete Statement


Let’s say we wanted to remove the row for supplier 500 from the supplier
table. We can use the delete keyword.
delete from supplier
where supplier_id = 500;

If there were no supplier 500 in the table to begin with, the delete statement
would not display an error message. It would simply delete zero rows.

To delete all rows in a table, you can use the “delete” syntax without a
“where” clause.
delete from supplier;

This will delete all the rows in the table, but it will not drop the table. The
table will still exist, but there will be no rows of data in it. If you want to
remove the entire table, you can use the “drop table” statement instead.

Like the “insert” statement, “delete” will remove as many rows that match
the criteria in the where clause. “Delete” can also delete many rows in a
single SQL statement. This one delete statement will delete every Republican
in the table:
delete from us_president
where president_party = 'Republican';

The Truncate Statement


A faster way to delete all rows from a large table is to use the “truncate”
statement.
truncate supplier;

Truncate has the same effect as deleting all rows from a table with the
“delete” statement. It will delete all the rows in the table, but will not drop the
table. The table will still exist, but there will be no rows in it. The difference
is that “truncate” is faster because of the way Postgres implements “truncate”
behind the scenes.
3
MORE DATABASE OBJECTS

Tables are by far the most-used database objects, but there are others we will
use as well, like views, materialized views, sequences, functions, and
triggers.

You can name your objects using the same rules that we used for tables.
Their names must be 63 characters or fewer, and they need to start with a
letter or an underscore. I like to name my database objects with a prefix that
indicates what type of objects they are. For example:

dog A table containing data about dogs


v_dog A view containing data about dogs
mv_dog A materialized view containing data about dogs
f_dog( ) A function that processes data about dogs
tr_dog A trigger that processes data about dogs

These aren’t industry-standard naming conventions, just something I like to


do. If you are maintaining an existing codebase, you should continue using
whatever naming conventions were established when the database was
created.

VIEWS
A “view” in Postgres is a saved SQL statement that you can query like it’s a
table. To create a view, use the “create view” statement.

create view v_democratic_president as


select president_id,
president_name
from us_president
where president_party = 'Democratic';

Now that you have defined the v_democratic_president view, you can query
it as if it were a table.
select * from v_democratic_president;

president_id president_name
7 Andrew Jackson
8 Martin Van Buren
11 James Polk
14 Franklin Pierce
… …

You can also use a “where” clause when selecting from a view.

select*
from v_democratic_president
where president_id = 7;

president_id president_name
7 Andrew Jackson

Each time we select from a view, Postgres runs the query that we defined in
our “create view” statement. That query typically selects rows from a
database table. That means that if we make a change to the data in our
underlying database table, the change will also appear when we select from
the view. For example, when we selected from the v_democratic_president
view for president_id 7, Postgres ran the query that we defined when we
created the v_democratic_president view.

create view v_democratic_president as


select president_id,
president_name
from us_president
where president_party = ‘Democratic’;

That caused Postgres to return a set of Democratic presidents including


president_id 7. But now let’s see what happens if we change the political
party of president_id 7 to a Republican in the underlying us_president table.
updateus_president -- This is the table that the view is based on
set president_party = 'Republican'
where president_id = 7;

Now if we run our select statement against the view again, we see that
president_id 7 is not returned. This query returns no rows:

select*
from v_democratic_president
where president_id = 7;

That makes sense because this view is supposed to return only Democratic
presidents. We can see that the changes to the underlying table took effect
when we queried the view.

Using Views to Hide Sensitive Data


Views can be also be handy for making sure sensitive data is seen only by the
right users. For example, let’s say we have a table called employee:
select * from employee;

employee_id employee_name title salary


1 Dave Nelson CEO $980,000.00
2 Rob Field Sales Manager $349,000.00
3 Cindy Rhodes Software Developer $574,000.00

We might want all users to be able to see the employee_id, employee_name,


and title columns, but we only want Human Resources to be able to see the
salaries.

We could create a view called v_employee that has all of the columns except
for salary.

create view v_employee as


select employee_id,
employee_name,
title
from employee;

When we query the new v_employee view, we see that it has been created
correctly and that it has the information we intended.
select * from v_employee;

employee_id employee_name title


1 Dave Nelson CEO
2 Rob Field Sales Manager
3 Cindy Rhodes Software Developer

Now we have a table called employee that includes salary data, and a view
called v_employee that does not include salary data. When data changes in
the employee table, the v_employee view will automatically reflect those
changes, so we’ll only have to make our changes in one place.

Now we can give users in the Human Resources department access to the
employee table, and give all other users access to the v_employee view.

MATERIALIZED VIEWS
A “materialized view” in Postgres is created by running a SQL statement, but
unlike a regular view, the results of the SQL statement are saved. To create a
materialized view, use the “create materialized view” syntax.

1create materialized view mv_democratic_president as


2select president_id,
3 president_name
4from us_president
5where president_party = 'Democratic';

The results of the query in lines 2 – 5 are saved in the


mv_democratic_president materialized view. You can query a materialized
view as if it’s a table.

select*
from mv_democratic_president
where president_id = 7;

president_id president_name
7 Andrew Jackson

Now let’s see what happens if we change the political party of president_id 7
to “Republican” in the underlying us_president table.

updateupdate us_president -- The underlying table


set president_party = 'Republican'
where president_id = 7;

Now if you run the select statement against the mv_democratic_president


materialized view again, you might be surprised to see that president_id 7 is
still being returned.

select*
from mv_democratic_president
where president_id = 7;

president_id president_name
7 Andrew Jackson

Even though we changed the data in the underlying us_president table, the
data in the materialized view did not get refreshed. The materialized view is
still reflecting the state of the us_president table before we changed it. Unlike
a regular view, we need to run the “refresh materialized view” SQL
command to refresh a materialized view.
refresh materialized view mv_democratic_president;

Now when we check the materialized view, president 7 does not appear. The
change we made to the underlying table is now reflected in the materialized
view.

select*
from mv_democratic_president
where president_id = 7;

Now that we have refreshed the materialized view, the query returns no rows,
as we would expect.

Materialized views are good for situations where you either want to save
point-in-time data or where the query that creates the materialized view is
complex and takes a long time to run. If the query for a view takes, say, 5
minutes to run then you may make the decision to use a materialized view
instead. You may decide to refresh a materialized view every hour rather than
using a regular view that will run the query every time somebody selects
from the view.

SEQUENCES
Sometimes we want a column in a table to contain consecutive numbers. In
the us_president table for example, the row for the first president has a
president_id of 1. The next row has a president_id of 2. Then 3, 4, 5, etc.

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
… … …

One way to accomplish that is to create a “sequence” that starts at 1 and


increments by 1 each time it is used.
create sequence president_sequence;

Each time a new row is added to the us_president table, we can use the
“president_sequence” sequence we just created to keep a running tally and
insert the right president_id into the table. Postgres provides a function called
nextval( ) to help us with that. Calling nextval( ) adds 1 to our sequence and
saves the new value.

Let’s assume that the us_president table is empty when we start. We can add
rows to it using the sequence and the nextval( ) function to create a value for
the president_id column.
insert into us_president (president_id, president_name,
president_party)
values (nextval(‘president_sequence’), ‘George Washington’, null);

insert into us_president (president_id, president_name,


president_party)
values (nextval(‘president_sequence’), ‘John Adams’, ‘Federalist’);
insert into us_president (president_id, president_name,
president_party)
values (nextval(‘president_sequence’), ‘Thomas Jefferson’,
‘Democratic-Republican’);

When we query the us_president table, we can see that the


“president_sequence” sequence inserted the correct values into the
president_id column in the table.
select * from us_president;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican

CREATING FUNCTIONS
We saw earlier how to call functions that have been provided to us by
Postgres, but now let’s take a look at creating our own functions.

Functions are helpful when you have a complex SQL statement, or a group of
SQL statements that have several steps, and you know that you will need to
run those same SQL statements again sometime in the future. It makes sense
to save that SQL as a function so you can call it by name later. This will
prevent you from having to recreate the SQL next time, and it will make it
easy to run the SQL the next time you need it. Functions are stored in the
Postgres database, so we can write the SQL once and simply call the function
whenever we need to.

Recall that Postgres comes with functions like the upper( ) function. We saw
that calling upper( ) with a value of “Say it loud” returned the text “SAY IT
LOUD”.
select upper('Say it loud');

upper
SAY IT LOUD

Notice that when you call the upper( ) function, there are left and right parens
after the word “upper”. You can pass values to the function by putting the
values between the parens. In this case, we wrapped the value “Say it loud”
in single quotes because it is text. If the value had been a number, we
wouldn’t have used quotes. You will call the functions that you write the
same way that you call functions provided by Postgres.

Parameters and Arguments


Many developers use the words “parameters” and “arguments”
interchangeably. Technically, there is a slight difference between the two
terms. Arguments are the values that are sent to functions, and parameters are
the variables that receive the values in those functions. The main point,
however, is that we can call functions and send values to them, and we can
write functions that receive those values and do something with them.

In our “select upper(‘Say it loud’)” example, the text “Say it loud” was the
only argument passed to the upper( ) function. The upper( ) function was
written to accept one value and to do something with it. In this case, the
function takes the value sent to it and returns the uppercase version of it.

Functions can be written to accept no parameters at all, or they can be written


to take any number of parameters – even thousands of parameters. Typically,
you wouldn’t write a function that accepts more than 5 or 10 parameters.

SQL vs. PL/pgSQL


We can write simple Postgres functions using the SQL language. For more
complex functions, we can use the PL/pgSQL language. PL/pgSQL stands
for “Procedural Language/PostgreSQL”.

We have been using plain SQL up until now and it has done everything we
needed it to do, but there are times when we need more functionality than the
SQL language can offer. SQL is a non-procedural language. It can query
database tables nicely, but you can’t define variables or create loops with it.
You can’t add if/then logic to your programs with plain SQL. If you need to
write a function using a procedural language to do those types of more
advanced things, use PL/pgSQL.

Writing Functions with SQL


Let’s use SQL to write our own function called f_get_president_and_party (
). The function will take a president_id value and return the president’s name
and political party wrapped in parens.

1create or replace function f_get_president_and_party(int) returns


text as $$
2 select president_name || ' (' || president_party || ')'
3 from us_president
4 where president_id = $1;
5$$ language sql;

On line 1, we create the function. We could drop and then recreate the
function in two steps, but the “create or replace function” syntax will do that
for us in one step. If the function doesn’t exist, it will be created. If the
function already exists, it will be recreated.

We named the function f_get_president_and_party( ) and we said it takes one


parameter that is an integer. This function will accept the argument being
passed into it and return a text value.

The body of the function goes within the “$$” symbols (two dollar signs).
Surrounding text with $$ in Postgres is known as “dollar quoting”. The
function’s body goes from the $$ at the end of line 1 until the $$ at the
beginning of line 5. Also on line 5, we are specifying that this function has
been written in SQL, not PL/PgSQL.

Lines 2 – 4 are the body of the function. They contain a SQL query that
selects the president’s name, concatenates it to a space and a left paren, adds
the president’s political party, and concatenates a right paren. The effect is to
return the president’s name followed by his political party in parens. Recall
that two pipe characters (“||”) is the concatenation operator; it appends strings
together.

On line 4, “$1” represents the first argument passed into this function. In this
case there is only one argument passed in, but if there were others, we could
have referenced them as $2, $3, $4 for the second argument, the third
argument, the fourth argument, etc.
We can call the function like this:
select f_get_president_and_party(7);

and it will return:

Andrew Jackson (Republican)

The function queried the us_president table for president_id 7 and returned
the president_name and president_party columns, nicely formatted. We could
also call this function embedded in a SQL query like this:

1selectf_get_president_and_party(president_id)
2from us_president
3where president_id > 40;

George Herbert Walker Bush (Republican)


Bill Clinton (Democratic)
George W. Bush (Republican)
Barack Obama (Democratic)
Donald Trump (Republican)

In line 1, we got the president_id from the us_president table and sent it as an
argument to the f_get_president_and_party( ) function. The function returned
a president name and the political party shown within parens. On line 3, in
our “where” clause, we asked for only presidents that have a president_id
greater than 40, so the function retrieved 5 presidents from the us_president
table. The f_get_president_and_party( ) function was called 5 times, with 5
different president IDs, returning 5 rows.

Writing Functions with PL/PgSQL


Writing Postgres functions using PL/PgSQL is similar to writing them with
SQL, but PL/PgSQL has more capabilities.

Here is an example of a PL/PgSQL function that returns not only the


president’s ID, name, and party, but if that president is currently in office it
also displays the text “**Current President**”.

1 create or replace function


f_get_president_note_current(p_president_id int)
2 returns text as $$
3 declare
4 v_max_president_id int;
5 v_president_name text;
6 v_president_party text;
7 v_current_text text = ";
8 begin
9 select max(president_id)
10 into v_max_president_id
11 from us_president;
12
13 if (v_max_president_id = p_president_id) then
14 v_current_text = '**Current President**';
15end if;
16
17 select president_name,
18 president_party
19 into v_president_name,
20 v_president_party
21 from us_president
22 where president_id = p_president_id;
23
24 return 'President ID: ' || p_president_id || ' Name: ' ||
v_president_name ||
25 ' Political Party: ' || v_president_party || ' ' ||
v_current_text;
26
27end;
28$$ language plpgsql;

In line 1, we defined the f_get_president_note_current( ) function using the


“create or replace” syntax. We are telling Postgres that if this function
doesn’t exist to create it, and if it does exist to replace it with this new code.

In line 1, we also defined the function to take one parameter called


p_president_id. I like to name parameters starting with “p_” for clarity, but
this isn’t required.

In line 2, we said that this function returns text. Then the function’s body
goes from the “$$” symbol at the end of line 2 until the other “$$” on line 28.

In lines 3 – 7, we are declaring variables to be used in the function. I like to


name variables starting with “v_” for clarity, but again, this isn’t required. On
line 7, we declared the v_current_text variable and we also set its value to an
empty string.

The “begin” on line 8 goes with the “end” on line 27. This groups those lines
together in a code block. Notice that the “end” in line 27 has a semi-colon
after it, but the “begin” on line 8 does not.

In lines 9 – 11, we are putting a value in the v_max_president_id variable


from the database. We select the maximum president_id from the
us_president table, which is 45, and put that value into the
v_max_president_id variable.

On line 13, we have an “if statement”. The “if” on line 13 goes with the “end-
if” on line 15. If the value of the v_max_president_id variable is the same as
the value of the p_president_id parameter then we set the v_current_text
variable to the text “**Current President**”. If the value of the
v_max_president_id variable is not the same as the value of the
p_president_id parameter then we don’t set v_max_president_id, so it
remains equal to the empty string we set it to in line 7. Notice that the “end-
if” on line 15 has a semi-colon after it, but the “if” on line 13 does not.

Lines 17 – 22, we get the president_name and president_party from the


Postgres database and write them to our v_president_name and
v_president_party variables.

In line 2, we promised that this function “returns text”. In line 24, we actually
return that text. We return a text value that is created by combining the
p_president_id parameter and the v_president_name, v_president_party, and
v_current_text variables formatted with punctuation. Line 24 is a long line so
we continued it on line 25. Postgres knows we are done with the line when it
sees the semi-colon at the end of line 25.

Let’s test the function by calling it with an argument of 33 and then again
with an argument of 45.
select f_get_president_note_current(33);
President ID: 33 Name: Harry Truman Political Party: Democratic

select f_get_president_note_current(45);

President ID: 45 Name: Donald Trump Political Party: Republican


**Current President**

We know the function is working because the data appears in the format we
expected and the current president shows the text “**Current President**”.

Return Types
Functions can return a value of any Postgres data type. The example we just
saw returned a text value, but functions can also return other data types like
int, date, or boolean.

Functions can also return nothing. To define a function that doesn’t return a
value, declare the function to return “void”.
create or replace function f_no_return_type(p_president_id int)
returns void as $$
begin
insert into some_table (president_id)
select p_president_id;
end;
$$ language plpgsql;

Dollar Quoting with Tags


We have seen how dollar quoting is used when creating Postgres functions.
The examples we have seen show how we can use the “$$” symbol to mark
the beginning and the end of the function body. You are also allowed to add a
“tag” between the dollar signs. A tag is a small piece of text. You may see
functions that use text between the dollar signs, for example the following
function uses the text “funct” between the dollar signs on lines 2 and 7.

1create or replace function f_no_return_type(p_president_id int)


2returns void as $funct$
3begin
4 insert into some_table (president_id)
5 select p_president_id;
6end;
7$funct$ language plpgsql;
Some developers like to use tags because they feel that it makes the code
more readable. If you plan to use a tag, the text within the dollar signs must
be the same on lines 2 and 7.

Parameters by Position vs Name


When we write a function, we can specify the parameters that the function
will accept, like this:

create or replace function f_get_president(


p_president_id int,
p_party text) ...

This function takes two parameters: an integer called p_president_id and a


text field called p_party. When we call the f_get_president( ) function, we
can pass arguments to it by putting the arguments between commas and
within parens.
select f_get_president(33, 'Republican');

Postgres knows that we want to send the value 33 into the p_president_id
parameter and the text “Republican” into the p_party parameter. It knows that
because of the order of the parameters and arguments. When we called the
function, we provided 33 first and when we created the function, we accepted
p_president_id first. So Postgres knows that we want 33 to be the
p_president_id value. This is called “positional notation”.

Alternatively, we can pass arguments to a function using “named notation”.


When we call the function, we can provide the name of the parameters with
the values that we want them to have.
select f_get_president(
p_president_id => 33,
p_party => 'Republican'
);

The equal sign followed by the greater than sign (“=>”) is intended to look
like an arrow, as if to say p_president_id points to 33 and p_party points to
“Republican”.
Using this named notation, we could switch the order that we provide the
arguments in, and the function would still work. Because we are specifying
the name of the parameter for each value, calling the function with the
arguments in a different order will work just as long as the correct parameter
name is still associated with the correct value.
select f_get_president(
p_party => 'Republican',
p_president_id => 33
);

Named notation is especially helpful when a function takes many arguments.


Using named notation helps to make it easier to see which parameter takes
which value.

Parameter Modes: In, Out, Inout, and Variadic


A function parameter can have one of four possible modes: in, out, inout, and
variadic. The default mode is “in”. All of the parameters we have seen so far
have been “in” parameters. Since we didn’t specify “in”, “out”, “inout”, or
“variadic” mode, Postgres assumed they were “in”.

We can use an “in” parameter to pass a value into the function. We can use
an “out” parameter to return a value from the function. An “inout” parameter
is used when we want to pass an argument into a function, change it, and then
return it. To send an array of values to a function, we can use “variadic”
mode.

For example, here is a function called f_multiply_by_5( ) that multiplies a


number by 5.

1create or replace function f_multiply_by_5(


2 in p_in int,
3 out p_out int
4) as $$
5begin
6 select p_in * 5
7 into ;p_out;
8end;
9$$ language plpgsql;
select f_multiply_by_5(3);

f_multiply_by_5
15

In line 2, we defined p_in as an “in” parameter. In line 3, we defined p_out as


an “out” parameter. Both parameters are integers.

On line 4, we didn’t need to say “returns int” because we defined p_out as an


integer with a mode of “out” in line 3. Postgres already knows to return an
integer from this function.

In lines 6 and 7, we multiplied the p_in parameter by 5 and put the resulting
value into p_out. The p_out parameter gets returned from the function
because we declared its mode as “out” in line 3.

We could rewrite that function using just one “inout” parameter, like this:

1create or replace function f_multiply_by_5(


2 inout p_inout int
4) as $$
5begin
6 select p_inout * 5
7 into p_inout;
8end;
9$$ language plpgsql;

select f_multiply_by_5(3);

f_multiply_by_5
15

Now the function accepts one parameter called p_inout on line 2. The value
of p_inout gets changed on lines 6 and 7 and is automatically returned to the
caller of the function because we declared its mode as “inout” on line 2.

Variadic mode is used less commonly than the in, out, and inout modes.
Variadic mode lets us pass an array into a function. An “array” is a group of
values that all have the same data type. Let’s write a function that accepts an
array of integers and multiplies them together.

1 create or replace function f_multiply_together(variadic p_nums


int[])
2 returns int as $$
3 declare
4 v_tot int = 1;
5 v_ind int;
6 begin
7 foreach v_ind in array p_nums loop
8 v_tot = v_tot * v_ind;
9 end loop;
10 return v_tot;
11end;
12$$ language plpgsql;

select f_multiply_together(3,5,2);

f_multiply_together
30

In line 1, we defined the f_multiply_together( ) function and declared one


parameter called “p_nums”, and set its mode to “variadic”. The “int[]” syntax
means that this parameter is an array of integers.

In line 2, we promised that this function will return an integer, which in this
case will be all the parameter values multiplied together.

In lines 4 and 5, we declared two variables. We will use the “v_tot” variable
to calculate the product of the parameter values. We will use the “v_ind”
variable for each individual value.

In lines 7 - 9, we are looping through each value in p_nums. The v_ind


variable is assigned to each value, one value at a time. In line 8, we take the
value of our v_tot variable and multiply it by the value in the v_ind variable.

When we call the function with arguments of 3, 5, and 2, the function loops
through lines 7 – 9 three times. The first time v_ind is 3. The second time
v_ind is 5. The third time v_ind is 2.
The v_tot variable started out as 1 because we defined it as 1 in line 4. The
first time we looped through lines 7 – 9, v_tot became 3 because 1 times 3 is
3. The second time we looped through, v_tot became 15 because 3 times 5 is
15. The third time we looped through, v_tot became 30 because 15 times 2 is
30. When the function was finished looping through the parameter values, it
returned the value of 30 in line 10.

We passed three values into the function, but because the function was
written to accept an array of integers, it can accept any number of integer
values.
select f_multiply_together(1,2,3,4,5);

f_multiply_together
120

select f_multiply_together(12,4,9,8,1,333,1,87,5,2);

f_multiply_together
1001237760

PROCEDURES
Procedures, sometimes called “stored procedures”, are similar to functions.
Procedures were added to Postgres fairly recently, in version 11. In the real
world, I see functions being used more frequently than procedures in
Postgres.

We saw that you can call a function by using the “select” keyword, like this:
select f_multiply_by_5(3);

To call a procedure, use the “call” keyword instead:


call proc_multiply_by_5(3);

Writing a procedure is similar to writing a function.

1 create or replace procedure proc_multiply_by_5(


2 inout p_inout int
3 ) as $$
4 begin
5 select p_inout * 5
6 into p_inout;
7 end;
8 $$ language plpgsql;

In line 1, we created the procedure using the “create or replace procedure”


statement.

In line 2, we defined an integer parameter called p_inout that has a mode of


“inout”. Parameters for procedures can have one of three modes: in, inout, or
variadic. There is no “out” mode for procedures.

TRIGGERS
You can use a trigger to automatically run a Postgres function when some
event occurs. For example, when a new row gets inserted into a particular
table, you can have a trigger “fire” and call a function to insert a row in
another table.

Anytime you think “wouldn’t it be nice if when X happened in the database


that Y automatically happened too”, think of triggers.

Audit Triggers
One common use of triggers is for keeping track of which users have made
changes to a table. In cases where we have several different database users
making changes to data in an important table, it can be useful to create a
second table where we can log which user made which changes.

When new rows get inserted into the us_president table, let’s track which
user inserted them. To do that, we will create three new database objects:

A new table called us_president_insert_audit that will store an audit


trail of the inserts made to the us_president table.
A new trigger called tr_us_president_insert_audit that will fire when a
new row is inserted into the us_president table. This trigger will call a
function.
A new function called tf_us_president_insert_audit( ) that the trigger
will call to insert the audit information into the
us_president_insert_audit table.

When a user inserts a row for a new president in the us_president table, the
trigger will automatically fire and call the function. The function will write a
row to the us_president_insert_audit table. The audit table will store the name
of the user who inserted the row into the us_president table as well as what
time that row was inserted.

Creating the Audit Table


1create table us_president_insert_audit (
2 insert_user text,
3 insert_date timestamp with time zone,
4 president_id int,
5 president_name text,
6 president_partytext
7);

In line 2, we are creating a column in the audit table called “insert_user” to


track the user who inserted the row into the us_president table.

In line 3, we create a column called “insert_date” to track the date that the
new row was inserted into the us_president table.

In lines 4 – 6, we created columns in the audit table that will track the
president_id, president_name, and president_party columns that were inserted
into the us_president table. The same values for the columns that the user
inserted into the us_president table will be written to these columns in the
us_president_insert_audit table.

Creating the Trigger Function


Now we need to create a function that will insert a row into the
us_president_insert_audit table when a row is inserted into the us_president
table. This type of function is called a “trigger function”. A trigger function is
a function that doesn’t take any parameters and returns “trigger”. Let’s name
the trigger function tf_us_president_insert_audit( ).
1 create or replace function tf_us_president_insert_audit( )
2 returns trigger as $$
3 begin
4 insert into us_president_insert_audit (
5 insert_user,
6 insert_date,
7 president_id,
8 president_name,
9 president_party
10)
11values
12(
13 current_user,
14 now(),
15 new.president_id,
16 new.president_name,
17 new.president_party
18);
19return new;
20end;
21$$ language plpgsql;

In line 1, I named the function starting with the text “tf_” for “trigger
function”, but this isn’t necessary. Postgres will allow you to pick your own
naming conventions. You can see there are no parameters defined between
the parens. For trigger functions, you don’t define any parameters.

You can see in line 2 that the return type is “trigger”.

In line 13, we got the user’s username by calling the “current_user” Postgres
function. You can test the current_user function by selecting your own
username from the database, like this:
select current_user;

current_user
Rick

In line 14, we got the date and time that the row was inserted by using the
Postgres now( ) function. By the way, you can test this function by selecting
the current date and time where you are.
select now();
now
2020-12-23 10:29:44.216481-04

We use the “new” syntax in lines 15 – 17, for example “new.president_id”.


The “new” tells us that this is the new president_id value that was inserted
into the us_president table.

Creating the Trigger


Now let’s create the trigger that will fire and call our
tf_us_president_insert_audit( ) function.
1 create trigger tr_us_president_insert_audit
2 after insert on us_president
3 for each row
4 execute function tf_us_president_insert_audit();

In line 1, I named the trigger tr_us_president_insert_audit, where “tr” stands


for “trigger”.

In line 2, we specified that the trigger should fire “after insert”. That is, after
the insert has been made to the us_president table. When a user inserts a row
into the us_president table, the row will be written to us_president first and
then the audit record will be written to the us_president_insert_audit table
afterwards. Our options are “before”, “after”, or “instead of”. “Instead of”
means don’t insert the record to the us_president table at all; instead run the
function.

On line 3, we said “for each row”. That means for every row we insert into
the us_president table, the tf_us_president_insert_audit( ) function will be
called once. Our other option is “for each statement”. Consider this insert
statement:

insertinto us_president (
president_id,
president_name,
president_party
)
selectpresident_id,
president_name,
president_party
from a_table_with_10_rows;

This one insert statement is inserting 10 rows into the us_president table. If
our trigger specified “for each row”, it would call the
tf_us_president_insert_audit ( ) function 10 times. On the other hand, if we
specified “for each statement” in the trigger, it would call the
tf_us_president_insert_audit( ) function just once. Depending upon what you
are trying to accomplish, there are times that you might want to choose “for
each statement” or “for each row”.

Watching it Work
Now that the trigger is in place, any new rows that get inserted to the
us_president table will get audited to the us_president_insert_audit table. For
example, a user might add a new row to the us_president table using this
insert statement:

insertinto us_president (
president_id,
president_name,
president_party
)
values
(
46,
'Hillary Clinton',
'Democratic'
);

We might see this row in the us_president table and wonder who added that
data. Because we have set up an audit trigger, we can query the
us_president_insert_audit table to find out.

insert_user insert_date president_id president_name president_party


HRC 2017-01-21 46 Hillary Democratic
14:39:44.184637- Clinton
04

The audit trail shows that a database user named “HRC” created the row in
the us_president table on 1/21/2017 at 2:39pm Eastern Daylight Time. That
row of data doesn’t appear to be correct, so we might want to pay a visit to
the HRC user and possibly revoke the user’s database privileges. We should
probably also delete that row from the us_president table.

Auditing Updates and Deletes


Triggers have a lot of flexibility, and there is more we can do with them. We
looked at an audit trigger that tracked when users inserted rows into a table,
but you can also audit when users update and delete rows as well. We saw
how the trigger was defined for inserts:
create trigger tr_us_president_insert_audit
after insert on us_president
for each row
execute function tf_us_president_insert_audit();

We could write another trigger for deletes:


create trigger tr_us_president_delete_audit
after delete on us_president
for each row
execute function tf_us_president_delete_audit();

And we could write another trigger for updates:


create trigger tr_us_president_update_audit
after update on us_president
for each row
execute function tf_us_president_update_audit();

We could even write one trigger that fires for inserts, updates, and deletes.
create trigger tr_us_president_audit
after insert or update or delete on us_president
for each row
execute function tf_us_president_audit();

Special Variables for Triggers


Trigger functions can make use of some special variables that regular
functions cannot. The most commonly-used ones are “new”, “old”, and
“tg_op”.

If you are writing a trigger to track rows that have been updated in a table, it
is helpful to know what the original values were and what the new values are.
For example, if a user runs this update statement:

updateus_president
set president_name = 'Harry S. Truman'
where president_name = 'Harry Truman';

The original president name, old.president_name, would be “Harry Truman”


and new.president_name would be “Harry S. Truman”. We could use those
variables in our trigger function to write a message like “Harry Truman was
changed to Harry S. Truman” to our audit table.

We already saw “new” used in our tf_us_president_insert_audit( ) function


trigger in lines 15 – 17:
new.president_id,
new.president_name,
new.president_party

Since tf_us_president_insert_audit( ) is used to track inserts, it will have


“new” values but not “old” values. That is because for inserts we are tracking
rows that are being inserted that never existed before.

The special variable “tg_op” tells us if a row was inserted, updated, or


deleted. If you want to write one trigger function that will be called for
inserts, updates, and deletes, it is helpful to know which command caused the
trigger to fire. You can add “if statements” to your trigger functions and take
different actions based on whether a row was inserted, deleted, or updated.
if (tg_op = 'INSERT') then
...
end if;
if (tg_op = 'DELETE') then
...
end if;
if (tg_op = 'UPDATE') then
...
end if;

Altering Triggers
If we wanted to rename a trigger, we can use the “alter trigger” command.
alter trigger tr_us_president_insert_audit on us_president
rename to tr_us_pres_ins_aud;

Here we renamed the tr_us_president_insert_audit to tr_us_pres_ins_aud.

Dropping Triggers
To remove a trigger, we can use the “drop trigger” syntax.
drop trigger tr_us_president_insert_audit on us_president;

Notice that we need to tell Postgres that the trigger was associated with the
us_president table. If we are sure we won’t need them anymore, we could
also drop the other database objects that we created for use with the trigger.
drop function tf_us_president_insert_audit();

drop table us_president_insert_audit;

Disabling Triggers
If you have a trigger that you know you will not need again, you can drop the
trigger. Dropping the trigger will remove it permanently. On the other hand,
if you want to keep the trigger but prevent it from firing temporarily, you can
disable the trigger and reenable it when you want it to start firing again.
alter table us_president
disable trigger tr_us_president_insert_audit;

Notice that to disable a trigger, you alter the table that it is associated with.

Enabling Triggers
After you have disabled a trigger temporarily, you can enable it again by
using the “alter table” command.
alter table us_president
enable trigger tr_us_president_insert_audit;

Disabling a trigger temporarily and reenabling it later could be useful in a


situation where you need to load some data into a table, but for this load you
don’t want the trigger to audit the inserts.
4
BUILT-IN FUNCTIONS

As mentioned, Postgres comes with lots of functions that you can call from
your SQL statements. There are functions to transform strings, do
mathematical calculations, get today’s date, and perform all manner of other
helpful operations.

Postgres comes with over 2,000 functions. You won’t need to use – or even
know about – most of them. The majority of the functions that Postgres
provides are not commonly-used and are not shown here. Most of them can
be found on the Postgres website if you do ever have occasion to use them.

Let’s take a look at some of the built-in Postgres functions that you are most
likely to use.

AGGREGATE FUNCTIONS
We have already talked about aggregate functions. You will use these
functions frequently:

count()Returns a count of rows


max() Returns the maximum, or largest, of a set
min() Returns the minimum, or smallest, of a set
avg() Returns the average of a set of numbers
sum() Returns the sum of a set of numbers

ABS()
The abs( ) function returns the absolute value of a number. “Absolute value”
measures how far away from zero a number is.
selectabs(8),
abs(-8);

abs abs
8 8

The results show that the absolute value of 8 is 8, and the absolute value of -8
is also 8.

Consider the following table that contains a list of meteorologists, their most
recent forecasted temperature, and the actual temperature:

meteorologist forecasted_high actual_high


Gail Winds 74 80
Sunny Cloudfoot 77 86
Storm Warning 59 38

Which meteorologist made the closest prediction? We can use the abs( )
function to help find out:

selectmeteorologist,
abs(forecasted_high – actual_high) degrees_off
from weather_forecast_vs_actual;

meteorologist degrees_off
Gail Winds 6
Sunny Cloudfoot 9
Storm Warning 21

We can see that Gail Winds was the most accurate with only 6 degrees
difference between her forecasted temperature and the actual temperature.
Sunny Cloudfoot was 9 degrees off, and Storm Warning’s forecast was
wrong by a whopping 21 degrees.

We subtracted the actual_high from the forecasted_high and then got the
absolute value of that number. If we had not used the abs( ) function, the
results would have been -6, -9, and 21. The abs( ) function had the effect of
removing the minus signs, which was useful for our purposes because we
don’t care whether the forecasted temperature was higher or lower than the
actual temperature.
We used a column alias of “degrees_off” so that the column header of the
result set would be more descriptive.

You can call Postgres functions in the “select list” of your SQL statements, as
shown above, and you can also call functions in the “where clause” of your
SQL statement, as shown below:

select*
from weather_forecast_vs_actual
where abs(forecasted_high – actual_high) < 10;

meteorologist forecasted_high actual_high


Gail Winds 74 80
Sunny Cloudfoot 77 86

UPPER()
The upper( ) function takes a string and returns an all-uppercase version of it.
select upper('MiXed');

upper
MIXED

The upper function can be helpful in all sorts of scenarios. For example, you
may have two tables that were loaded from different data sources that use
different formatting rules for cities.
select * from city;

city state
Boston Massachusetts
Raleigh North Carolina
Sacramento California

select * from yearly_precipitation;

city_name precipitation
BOSTON 47.1
RALEIGH 50.9
SACRAMENTO 21.4
If you were to join the tables, no rows would be returned because the cities
are all in uppercase in the yearly_precipitation table and therefore they
wouldn’t match the cities in the city table:

1selecta.city_name,
2 a.state,
3 b.precipitation
4from city a
5join yearly_precipitation b
6on a.city_name = b.city_name;

The query returns no rows, but if we change the query to join using the
upper( ) function on line 6, we get the expected results.

1selecta.city_name,
2 a.state,
3 b.precipitation
4from city a
5join yearly_precipitation b
6on upper(a.city_name) = upper(b.city_name);

city_name state precipitation


Boston Massachusetts 47.1
Raleigh North Carolina 50.9
Sacramento California 21.4

In this case, we actually didn’t need to use the upper( ) function on the cities
in the yearly_precipitation table because they were already in uppercase
letters, but I recommend using upper( ) for both tables just to make sure you
are comparing apples to apples. In the future there could be cities in the
yearly_precipitation table that are not in uppercase letters.

By the way, it’s not a great idea to have different tables in your database that
represent cities in different ways. In a real, production database, we would
pick some standard format for cities and eliminate the need to call upper( ) in
our where clause. We might use upper( ) to help build our real tables based
on raw data tables that were loaded from some external source.

LOWER()
The lower( ) function does the opposite of what the upper( ) function does.
The lower( ) function takes a string and returns the all-lowercase version of it.
select lower('MiXed');

lower
mixed

The lower( ) function can also be used to join values that are in different
formats, in the same way that we used the upper( ) function above.

1selecta.city_name,
2 a.state,
3 b.precipitation
4from city a
5join yearly_precipitation b
6on lower(a.city_name) = lower(b.city_name);

city_name state precipitation


Boston Massachusetts 47.1
Raleigh North Carolina 50.9
Sacramento California 21.4

In this case, we are matching “boston” to “boston” in line 6, instead of


matching “BOSTON” to “BOSTON” as we did when we used the upper( )
function. As long as we remember to convert the values on both sides of the
equal sign using upper( ) or lower( ), we will get our expected results.

INITCAP()
The initcap( ) function returns the first letter of each word as a capital letter
and all the other letters as lowercase. In other words, the initcap( ) function
takes the initial letter of each word and capitalizes it.
select initcap('george HERBERT wAlKeR BuSh');

initcap
George Herbert Walker Bush

Another example:
select initcap('e e cummings');
initcap
E E Cummings

ROUND()
The round( ) function takes a number with a decimal point and returns the
number rounded to its closest integer value.
select round(3.1415);

round
3

select round(3.5141);

round
4

In the examples above, we supplied one argument to the round( ) function.


3.1415 was rounded to 3, and 3.5141 was rounded to 4. You can also send a
second argument to the round( ) function if you want to round to a digit that
is after the decimal point. The second argument specifies how many places
after the decimal point you want to round to.
select round(3.49999, 2);

round
3.50

This is useful when creating reports or displaying values to users who want to
see numeric values in a consistent format.

TRUNC()
The trunc( ) function truncates a number to a certain number of decimal
places. By default, it truncates all of the digits after the decimal point.
select trunc(3.1415);

trunc
3
select trunc(3.5141);

trunc
3

We got different results using the trunc( ) function than we did with the
round( ) function. When we called “trunc(3.5141)” it returned 3 because it
truncated – or chopped off – the “.5141”. The round( ) function, on the other
hand, rounded 3.5141 up to 4.

You can send the trunc( ) function a second parameter to tell it how many
places after the decimal point to keep:
select trunc(3.5141, 2);

trunc
3.51

Here it kept the 2 digits after the decimal point and truncated everything to
the right of that.

CEIL()
The ceil( ) function selects the closest integer that is more than the argument
passed in.
select ceil(3.1415);

ceil
4

select ceil(-3.1415);

ceil
-3

FLOOR()
The floor( ) function selects the closest integer that is less than the argument
passed in.
select floor(3.1415);

floor
3

select floor(-3.1415);

floor
-4

LENGTH()
The length( ) function counts the number of characters in a string.
select length('A long string');

length
13

Let’s get a list of the lengths of the cities in our city table:

selectcity_name,
length(city_name)
from city;

city_name length
Boston 6
Raleigh 7
Sacramento 10

Those results make sense because “Boston” has 6 letters, “Raleigh” has 7
letters, and “Sacramento” has 10 letters.

One of the aspects of Postgres functions that makes them so useful is that you
can string them together. Earlier we saw that we can use the max( ) function
to get the maximum, or highest, value. Let’s use the max( ) function together
with the length( ) function to see what the longest length for our cities is:

selectmax(length(city_name))
from city;
max
10

We selected each city_name from the city table and wrapped it in the length(
) function. Then we wrapped that in the max( ) function. At the end of the
first line, there are two right parens. The first right paren ends the call to the
length( ) function. The second right paren ends the call to the max( ) function.

The results show that the longest city name has 10 characters. That makes
sense because our longest city name, Sacramento, has 10 characters.

In a similar way, we could use the min( ) function to get the shortest
character length for our cities:

selectmin(length(city_name))
from city;

min
6

The number 6 is returned because the shortest city name in our table, Boston,
has six characters.

SUBSTR()
The substr( ) function takes a string and returns some part of it. It returns a
substring.

The substr( ) function takes a string, the position of the character in the string
that you want to start at, and the number of characters that you want to select
from the string.

Let’s say that we have a table that stores barcode numbers.


select * from barcode;

code
811399100131
096244031005
789661001783
Barcodes are typically comprised of several parts. In our example, we want to
extract the manufacturing code, which starts on the 2nd character and is 5
characters long.

selectsubstr(code, 2, 5) mfg_code
from barcode;

mfg_code
11399
96244
89661

Calling “substr(code, 2, 5)” got us the values from the “code” column starting
at position 2 and going for 5 characters. We used a column alias of
“mfg_code” so that the results would display nicely under the heading
“mfg_code”.

If you want to get the string starting at a character position and going all the
way to the end of the string, you can leave off the 3rd argument to the substr(
) function:

selectcode,
substr(code, 9)
from barcode;

Here we are selecting the part of the “code” column that goes from the 9th
position all the way until the end of the value.

code substr
811399100131 0131
096244031005 1005
789661001783 1783

TRIM()
The trim( ) function returns a string with any extra characters at the beginning
and end of a string removed.

Sometimes when you load data from files into a database, the fields can have
extra spaces at the beginning or end of them. In the raw_data_load table
below, each of the column values have some number of spaces at both the
beginning and end.
select * from raw_data_load;

col1 col2 col3


" Apple " " Penny " " String "

I have included quotes here to highlight where the leading and trailing spaces
are.

If we use the trim( ) function, it will remove the spaces from both ends of
each string:

selecttrim(col1),
trim(col2),
trim(col3)
from raw_data_load;

col1 col2 col3


"Apple" "Penny" "String"

The trim function also allows you to remove characters other than the space
character. You can specify which character you want to remove. Let’s
remove the letter “k” from the beginning and end of the word “kayak”.
select trim('kayak', 'k');

The result is:

aya

LTRIM()
The ltrim( ) function returns a string with the extra characters at the
beginning of a string removed. It trims characters from the left of the string.
Like the trim( ) function, the default character to remove is the space
character.
select ltrim (' Penny ');
This function returns “Penny ” with the spaces removed from the left side,
but not from the right side.

You can also specify which character you want to remove from the left
instead of removing the space characters. Let’s remove the leading zeroes
from “0000738”:
select ltrim('0000738', '0');

ltrim
738

RTRIM()
The rtrim( ) function, as you probably guessed, returns a string with the extra
characters removed from the right side of a string. The default character to
remove is a space character, but you can specify other characters. Here we
trimmed the text “Massachusetts” from the right side of the text “Boston
Massachusetts”:
select rtrim('Boston Massachusetts', 'Massachusetts');

rtrim
Boston

As mentioned, we can string functions together, taking the results of one


function and using them as an argument for another function. We can, for
example, trim “Massachusetts” from the right side of “Boston
Massachusetts” and then take the results and run them through the upper( )
function.

selectupper(
rtrim('Boston Massachusetts', 'Massachusetts')
);

upper
BOSTON

LEFT()
The left( ) function returns the first – or leftmost – characters of a string. We
can get the first 3 characters of our customers’ last names using this query:

selectlast_name,
left(last_name, 3)
from customer;

last_name left
Smith Smi
Guy Guy
Jones Jon

If we call the left( ) function with a negative number of, say -1, it will return
everything except the last 1 character:

selectlast_name,
left(last_name, -1)
from customer;

last_name left
Smith Smit
Guy Gu
Jones Jone

RIGHT()
The right( ) function returns the last – or rightmost – characters of a string.
We can get the last 3 characters of our customers’ last names using this
query:

selectlast_name,
right(last_name, 3)
from customer;

last_name right
Smith ith
Guy Guy
Jones nes

If you call the right( ) function with a negative number, say -2, it will return
everything except the first 2 characters:

selectlast_name,
right(last_name, -2)
from customer;

last_name right
Smith ith
Guy y
Jones nes

LPAD()
The lpad( ) function adds characters to the left of a value. It pads characters to
the left. You can specify the characters you want added and what the total
length should be.

Our send_check table has a column called “amt” that represents the amount
of the checks that we need to send out.

selectamt
from send_check;

amt
123
854
24

We can add asterisks to the beginning of the amount to prevent people from
fraudulently writing in their own numbers before our amounts:

selectlpad(amt, 5, '*')
from send_check;

lpad
**123
**854
***24

REPLACE()
The replace( ) function replaces some value in a string with a new value.

Let’s say we have a table of airport runways that includes runways for
“Orange County Airport”. That airport was renamed to “John Wayne
Airport”, so we want to display “John Wayne Airport” in our results instead
of “Orange County Airport”.

selectrunway_name
from runway;

runway_name
Runway 172 – Orange County Airport
Runway 213 – Orange County Airport
Runway 985 – Washington National Airport

We can use the replace( ) function to replace “Orange County Airport” with
“John Wayne Airport”:
select replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
from runway;

Runway 172 – John Wayne Airport


Runway 213 – John Wayne Airport
Runway 985 – Washington National Airport

Notice that the replace( ) function didn’t change the entire string, just the part
that contained “Orange County Airport”. The text “Runway 172” and
“Runway 213” is still intact.

Also notice that we have not changed the values in the table with the replace(
) function, we are just selecting these values for display. If we wanted to
actually update the runway table, we could use the replace( ) function as part
of an update statement:

updaterunway
set runway_name =
replace(
runway_name,
'Orange County Airport',
'John Wayne Airport'
)
;
This update statement will change the runways that have “Orange County
Airport” in their name but won’t affect runways that do not.

FORMAT()
The format( ) function formats values based on a format string that you
provide.

selectformat(
'President %s was a %s',
president_name,
president_party
)
from us_president
where president_id between 31 and 34;

format
President Herbert Hoover was a Republican
President Franklin Roosevelt was a Democratic
President Dwight Eisenhower was a Republican
President Harry Truman was a Democratic

The format string used here is “President %s was a %s”. The “%s” is a
placeholder for a string value. The format string in this case has two
placeholders, the first one was replaced by the president_name value from the
us_president table, and the second one was replaced by the value of the
president_party column.

EXTRACT()
The extract( ) function selects parts from a timestamp or time value.

We saw earlier that the now( ) function returns the current date and time. The
data type that the now( ) function returns is a “timestamp with time zone”. It
includes several subfields that you can access by using the extract( ) function.
select now();

now
2020-10-28 12:13:00.439298-04
We can extract the century from the current date and time:
select extract(century from now());

date_part
21

We can extract the day from the current date and time:
select extract(day from now());

date_part
28

We can extract the month from the current date and time:
select extract(month from now());

date_part
10

Other subfields that we can extract include:

dow The day of the week. 0 (Sunday) to 6 (Saturday).


doy The day of the year. 1 – 366.
hour Hour. 0 – 23.
minute Minutes. 0 – 59.
quarter The quarter of the year. 1 – 4.
second Seconds. 0-59.
timezone The time zone offset from UTC.
week Week. 1-53.
year Year expressed in 4 digits, like 2020.

EXP()
The exp() function returns the exponential of the number supplied.

select exp(1);

exp
2.718281828459045

The results show that the exponential of 1 is 2.718281828459045.

PI()
The pi( ) function returns the value of pi.
select pi();

pi
3.141592653589793

POWER()
The power( ) function returns the value of a number raised to the power of
another number.
select power(3, 2);

power
9

RANDOM()
The random( ) function returns a random number between 0 and 1.
select random();

random
0.41278845070616654

POSITION()
The position( ) function finds the position of a substring within a string.
select position('pal' in 'principal');

position
7

The results tell us that the string “pal” is found at position 7 of the string
“principal”.

If we search for the position of a substring that doesn’t exist within a string,
Postgres will return a zero:
select position('i' in 'team');

position
0

The query returned zero because there is no “i” in “team”.

Strangely, Postgres has another function called strpos( ) that does the same
thing that the position( ) function does, using slightly different syntax. The
strpos( ) function also returns the position of a substring within a string, but
with the strpos( ) function, you supply the string first and the substring
second, separated by commas:
select strpos('principal', 'pal');

strpos
7

VERSION()
The version( ) function tells you which version of Postgres you are running.
select version();

version
PostgreSQL 13.0 on x86_64-apple-darwin, compiled by Apple clang
version 11.0.3 (clang-1103.0.32.59), 64-bit

The results show that we are using Postgres version 13.


5
TIPS AND TRICKS

PGADMIN
Using PgAdmin to interact with Postgres will make life easier for you.
PgAdmin is a free and open source tool designed for Postgres. PgAdmin is a
graphical tool that runs in your web browser. PgAdmin is the Swiss Army
Knife of database tools. It has a slew of features for seemingly anything you
could ever want to do with Postgres.

You can download PgAdmin from the PgAdmin website. As of this writing,
it can be found at https://www.pgadmin.org/download/

The first time you use PgAdmin, it will ask you to create a master password.
Each time you start PgAdmin it will ask you for this password.
Connecting PgAdmin to a Postgres Server
To use PgAdmin with your Postgres dstabase servers, you will create
connections to the Postgres servers that you want to use. Here I created two
connections to two different servers. I called the connections “Server 1 –
Development” and “Server 2 – Production”.

Production Servers and Development Servers


Many times, companies will set up Postgres database servers for development
or testing purposes. These servers store test data for testing SQL queries or
for testing a set of features before they “go live” to the production
environment.

The production environment is where the real, live data is stored. Production
Postgres servers are where the data that the users of our applications see is
located.

It is important that the data in the production environment is accurate and


error-free. It is not as important that the data on our development or test
servers is perfect. The data on test and development servers is usually seen
only by developers who are using the data to test or develop new
functionality. That data is usually not visible to our end users.

For this reason, some companies will give developers access to test or
development Postgres servers, but not to production servers. The thinking is
that there is no way for a developer to make a mistake in production if he or
she doesn’t have access to the production environment.
If you do have access to both production and development Postgres database
servers, however, it’s a good idea to mark each server connection clearly in
PgAdmin. That will help to prevent you from making a mistake in the
production environment. You want to make sure you don’t, say, delete data
from the production environment thinking that you are in the development
environment.

Creating the Server Connections


To create the connections to Postgres database servers in PgAdmin, click the
“Object” pull-down menu and choose “Create” and then “Server…”.

That will take you to the “General” tab of PgAdmin’s “Create – Server”
screen. The only value you need to enter on this tab is the server connection
name. You can call the connection whatever you want, but it’s a good idea to
name it something meaningful that describes what the server is used for.
As mentioned, if you have access to production Postgres servers as well as
non-production Postgres servers, I recommend that you clearly mark which
connection is which when you set up your server connections. Here I called
this connection “Postgres Essentials – Production”.

Another way to make it clear that this is a production server is to use color.
Some PgAdmin users will use the “Background” color picker to change a
connection’s background to red for a production database server.

The connections below show “Server 1 – Development” with the default


white background and “Server 2 – Production” with a bright red background.
After you have entered a server connection name in the “General” tab, you
can go to the “Connection” tab and supply PgAdmin with the connection
information for your Postgres database server. The “General” and
“Connection” tabs are the only tabs you will typically need for entering your
server connection information. Most times, the default values will suffice on
the other tabs.
The person who set up your Postgres server, probably your Database
Administrator, can provide you with the information to enter on this screen.
The “host name/address” is the IP address or hostname of the Postgres
database server. The “port” is usually 5432. The “maintenance database” is
usually “postgres”. You will also enter your database username and
password. If you want PgAdmin to save your password for next time, you can
click the “save password” checkbox. When you click the “save” button, your
new connection will be saved and you can use PgAdmin to quickly connect
to it from now on.

Now let’s take a look at two of the most useful parts of PgAdmin: The Tree
Control pane and the Query Tool.

The Tree Control Pane


When you open PgAdmin, you will see the Tree Control pane on the left.
PgAdmin’s Tree Control shows you the database objects on your Postgres
server organized as a hierarchical “tree”. You can click on the right arrow
(“>”) symbol to expand a node and click on the down arrow (“v”) symbol to
collapse a node. For example, if we wanted to take a look at the “bookstore”
table, we could navigate to that table by expanding “Servers”, “Server 1 –
Development”, “Databases”, “Database1”, “Schemas”, “Schema1”, “Tables”,
and then “bookstore”. If you wanted to see which columns are in the
bookstore table, you could expand “columns” and store_id and store_name
would appear.

In other words, the bookstore table is on the Postgres database server with the
connection I called “Server 1 – Development”. That server has a database
called “Database1”. That database has a schema called “Schema1”. That
schema has a table called “bookstore”. That table has columns called store_id
and store_name. In the Tree Control pane, we can navigate to the bookstore
table by expanding nodes until we reach it.

The numbers shown in the parens to the right of “Servers” and “Databases”
tell us how many objects of those types there are. It says “Servers (2)”
because I have 2 server connections set up in PgAdmin. It says “Databases
(3)” because there are 3 databases under the “Server1 - Development” server.
The Tree Control pane is powerful because it lets us quickly drill down and
examine the part of the database we are interested in. It gives us a visual
sense of how our database objects are organized in Postgres, as well as what
naming conventions were used to create those objects.
The Query Tool
The Query Tool lets you type in some SQL, run that SQL in Postgres, and see
the results. To start the Query Tool, choose “Query Tool” from the “Tools”
drop-down menu.

Below, I entered the SQL “select * from us_president;” and clicked the
“execute” icon (the one shown at the top right), and the results appeared in
the lower “data output” pane. You can also press F5 instead of clicking the
execute icon to execute a query.
The Query Tool can be used for all sorts of SQL statements. You can use it to
run SQL to create tables, create functions, run procedures, create triggers, and
more.

CHANGING A TABLE’S COLUMN ORDER


If you want to make changes to an existing table, you have two options: You
can drop and recreate the table, or you can “alter” the table.

Let’s say we have a table called customer that looks like this:

customer_id first_name last_name city state zip


1 Joe Smith San Jose CA 12121
2 Sum Guy Omaha NE 49494
3 Mary Jones Merrimack NH 78787

Now imagine that the boss tells us that we have to start tracking customer
middle names. We need to add a middle name column to the customer table.
We can do that by using the “alter table” command.
alter table customer add column middle_name text;
We added a new column to the customer table called middle_name and we
used the “text” data type. Now if we take another look at the table, we see
that the middle_name column is there, and all the middle name values were
set to null by default.
select * from customer;

customer_id first_name last_name city state zip middle_name


1 Joe Smith San Jose CA 12121 null
2 Sum Guy Omaha NE 49494 null
3 Mary Jones Merrimack NH 78787 null

Altering the table worked nicely. We didn’t lose any of our data and the new
column now exists. Now we could update the middle names for existing
customers and insert middle names for our new customers and we could
move on to our next task.

But there is something bugging me about this table. It is the order of the
columns. I would expect the middle_name column to come between the
first_name and last_name columns. But in our customer table, the “alter”
command added middle_name as the last column.

In relational databases like Postgres, the order of the columns doesn’t really
matter, so the “alter table / add column” statement just adds the column as the
last column in the table. Many Postgres developers would declare victory and
leave this table the way it is, with middle_name as the last column. After all,
when we select from the table, we can specify the column order we want, like
this:

selectcustomer_id,
first_name,
middle_name,
last_name
from customer;

And we will see the columns in the order we expect them:

customer_id first_name middle_name last_name


1 Joe null Smith
2 Sum null Guy
3 Mary null Jones

Still, some developers will be bothered by having middle_name as the last


column in the table, and will want to change the order of the columns in the
table. To reorder the columns, you could drop and recreate the customer table
with a new column order. The trouble is that when you drop a table, it not
only removes the table but it also removes all the data in the table!

To reorder the columns in the customer table without losing our data, we can
copy our customer data to another table temporarily, drop and recreate our
customer table with a new column order, and then copy the saved data from
the temporary table back to our new customer table.
-- Make a copy customer table copy that includes all data
create table customer_temp as
select * from customer;

-- Drop the original customer table


drop table customer;

-- Recreate a new customer table with the new column order


create table customer
(
customer_idint,
first_name text,
middle_nametext, -- Put middle_name here
last_name text,
city text,
state text,
zip text
);

-- Put the saved data into the new customer table


insert into customer (
customer_id,
first_name,
middle_name,
last_name,
city,
state,
zip
)
select customer_id,
first_name,
middle_name,
last_name,
city,
state,
zip
from customer_temp;

-- Remove the copied data table


drop table customer_temp;

Now if you select from the customer table the column order will be updated.
select * from customer;

customer_id first_name middle_name last_name city state zip


1 Joe null Smith San Jose CA 12121
2 Sum null Guy Omaha NE 49494
3 Mary null Jones Merrimack NH 78787

DEBUGGING WITH RAISE


The “raise” statement can be a big help when you want to debug your
PL/pgSQL code. The purpose of the raise command is to report messages and
errors. There are six levels of severity that you can raise: debug, log, info,
notice, warning, and exception. All of the levels raise messages of different
priorities. The “exception” level is the most severe level, and will not only
raise a message but will stop the execution of your SQL. You can raise a
message by providing the level and a message. I like to use the “notice” level
for debugging.
raise notice 'Test Message';

Let’s take a look at a function that isn’t working properly. Let’s see if we can
figure out what the problem is. The function is supposed to compare the
number of Republican presidents with the number of Democratic presidents
and return the ratio of Democrats to Republicans. The problem is that the
function always returns 1.0. We would expect that about half of the
presidents that were either Democrats or Republicans would have been
Democrats, so the ratio should be about .5 or 50%, not 1.0 or 100%.

Here is the function:


1 create or replace function f_dem_to_rep_ratio() returns numeric
2 as $$
3 declare
4 v_num_democrats numeric;
5 v_num_republicans numeric;
6 v_dem_to_rep_ratio numeric;
7 begin
8 select count(*)
9 into v_num_democrats
10 from us_president
11 where president_party = 'Democratic';
12
13 select count(*)
14 into v_num_republicans
15 from us_president
16 where president_party = 'Republicans';
17
18v_dem_to_rep_ratio = v_num_democrats / (v_num_republicans +
v_num_democrats);
19
20 return v_dem_to_rep_ratio;
21end;
22$$ language plpgsql;

Our f_dem_to_rep_ratio( ) function takes no arguments and returns one


numeric value that is the ratio of Democrats to Republicans.

In lines 8 - 11, we are selecting the count of Democrats from the us_president
table. We put that value in a variable called v_num_democrats.

In lines 13 - 16, we are selecting the count of Republicans from the


us_president table. We put that value in a variable called v_num_republicans.

In line 18 we calculate the ratio of Democrats to Republicans by dividing the


count of the Democrats by the total of the counts for Democrats and
Republicans. In line 20 we return that ratio.

We know the function is returning 1.0 and we know that is wrong, but it’s a
little hard to look at the function and determine what the problem is. We
aren’t sure what part of the function is causing the problem. So let’s add
some raise statements to the function and see if that will help us to debug it.

1 create or replace function f_dem_to_rep_ratio() returns numeric


2 as $$
3 declare
4 v_num_democrats numeric;
5 v_num_republicans numeric;
6 v_dem_to_rep_ratio numeric;
7 begin
8 select count(*)
9 into v_num_democrats
10from us_president
11where president_party = 'Democratic';
12
13raise notice 'The count of Democrats is %', v_num_democrats;
14
15select count(*)
16into v_num_republicans
17from us_president
18where president_party = 'Republicans';
19
20raise notice 'The count of Republicans is %', v_num_republicans;
21
22v_dem_to_rep_ratio = v_num_democrats / (v_num_republicans +
v_num_democrats);
23
24raise notice 'The ratio is %', v_dem_to_rep_ratio;
25
26return v_dem_to_rep_ratio;
27end;
28$$ language plpgsql;

We added three “raise notice” statements at lines 13, 20, and 24. These
statements will tell us the numbers of Democrats and Republicans that are
being returned from the database, as well as the ratio that was calculated in
line 22.

In the “raise notice” statements, we used a percentage sign as a place holder


for the variable and we supplied the variable name. In the line:

raise notice ‘The count of Democrats is %’, v_num_democrats;

The value of the v_num_democrats variable will be shown where the percent
sign is, like “The count of Democrats is 15”.

When we run the function now, we can see the numbers being displayed from
our “raise notice” statements:
select f_dem_to_rep_ratio( );
NOTICE: The count of Democrats is 15
NOTICE: The count of Republicans is 0
NOTICE: The ratio is 1.00000000000000000000

This gives us a lot of insight into why the function is returning the wrong
ratio. The function is showing that there are zero Republicans. That can’t be
right. Now we can take a look at the part of the function that gets the number
of Republicans and determine what the problem is. The part of the function
that gets the count of Republicans looks like this:

selectcount(*)
into v_num_republicans
from us_president
where president_party = 'Republicans';

If we look at the rows of the us_president table we see that there are no rows
for ‘Republicans’ in the table. The values are stored in the database as
‘Republican’, without the “s”.

If we update our f_dem_to_rep_ratio( ) function and change ‘Republicans’ to


‘Republican’ on line 18, we can then rerun the function and see the results we
had hoped for. The ratio being returned is now 0.42857142857142857143
and the “raise notice” messages look a lot more reasonable:
NOTICE: The count of Democrats is 15
NOTICE: The count of Republicans is 20
NOTICE: Ratio = 0.42857142857142857143

Adding the “raise notice” messages helped us to figure out which part of the
function was causing the problem. That made fixing the function much
easier.

Now that we have fixed the problem, we can remove the “raise notice”
statements from the function.

ALL BALLS
One of the quirkier features of Postgres is the “all balls” syntax. “All balls” is
slang for all zeroes, since zeroes look like balls. If you want to insert all
zeroes into a column that has a time data type, you can surround the text
“allballs” in single quotes.
insert into start_time (twotz)
select 'allballs';

select * from start_time;

twotz
00:00:00

CASTING DATA TYPES


We have seen how Postgres tables have columns and those columns have
data types like integer, text, and boolean. There are times when we need to
convert a value from one data type to another data type. This is known as
“casting”.

Postgres provides us with multiple ways to cast. There is the cast( ) function,
the cast operator (“::”), and there are other functions available to help us cast
like the to_char( ), to_date( ), and the to_number( ) functions.

Here is an example of a scenario where you might want to cast from one data
type to another: Let’s say we want to see the current year and month in
“YYYY-MM” format. We don’t need the day or the time. We know that
there is a now( ) function that can get the current date for us, and we know
there is a substr( ) function that we can use to get a part of a value that we
care about. Let’s use those functions:
select now();

now
2020-10-28 12:13:00.439298-04

We only need the “2020-10” part of the date, so let’s wrap the now( )
function in the substr( ) function to select just the first 7 characters.
select substr(now(), 1, 7);
Postgres returns an error message:
ERROR: function substr(timestamp with time zone, integer, integer)
does not exist
LINE 1: select substr(now(),1,7);
^ HINT: No function matches the given name and argument
types. You might need to add explicit type casts.
SQL state: 42883
Character: 8

The problem is that the substr( ) function expects the first argument it gets
sent to be a string, and we sent it the results of the now( ) function, which is a
datetime.

The error message that Postgres gave us tells us how to resolve the problem:
“You might need to add explicit type casts.” We need to send the substr( )
function a string instead of a datetime. Let’s try calling the now( ) function
and then casting the results to the text data type before we send it to the
substr( ) function.
select substr(now()::text, 1, 7);

substr
2020-09

It worked.

The “::” symbol (two colons) is the Postgres cast operator. The syntax “now(
)::text” calls the now( ) function and then casts the results to a text data type.
Another way to accomplish the same thing is to use the cast( ) function:
select substr(cast(now() as text), 1, 7);

substr
2020-09

We got the same results. The syntax “cast(now( ) as text)” also calls the now(
) function and then casts the results to the text data type, but this time we are
using the cast( ) function instead of “::”.
With the cast function, we specify the data type that we want to cast to after
the word “as”:
select cast('123' as int);

Functions for Casting


Postgres also provides us with some other functions that we can use to cast.
The functions are to_date( ), to_timestamp( ), to_char( ), and to_number( ).

These functions take two arguments: The value that needs to be converted,
and the format to use when converting the value. The format is made up of a
set of character patterns that identify the parts of the field. This group of
formatting characters is sometimes called a “format mask”.

to_date( )
The to_date( ) function converts a value to a date. For example, if we wanted
to convert a string to a date, we might use this SQL statement:
select to_date('28 Oct 2020', 'DD Mon YYYY');

We know that the original value “28 Oct 2020” is a string because it is
surrounded by single quotes. The query will return the string “28 Oct 2020”
converted to a “date” data type.

to_date
2020-10-28

The “DD Mon YYYY” format that we used when calling the function is a set
of characters that identify which part of the string “28 Oct 2020” is the day,
month, and year. By using the format “‘DD Mon YYYY” we let Postgres
know that the “28” in our string is a 2-digit day, the “Oct” in our string is a 3-
character month that is capitalized, and the “2020” in our string is a 4-digit
year. With this information, Postgres can convert the string to a date.

Postgres provides more character patterns than you will ever need, but the
most common patterns for dates are:
DD Day of month (01–31)
D Day of the week, Sunday (1) to Saturday (7)
MM Month number (01–12)
MONTHUppercase month name
Month Capitalized month name
MON 3-character uppercase month name
Mon 3-character capitalized month name
YYYY 4-digit year
YY 2-digit year

to_timestamp( )
The to_timestamp( ) function converts a value to a timestamp. Recall that a
“timestamp” data type in Postgres contains not only a date, but also a time
and a time zone. For that reason, the to_timestamp( ) function can make use
of some character patterns that weren’t useful for the to_date( ) function. The
most commonly-used of those character patterns are:

HH Hour of day (0-12)


HH24 Hour of day (0-23)
MI Minute (00–59)
SS Second (00–59)
MS Millisecond (000–999)
AM, am, PM or pmMeridiem indicator
TZH Time zone hours

If we wanted to convert a string to a timestamp, we might use this SQL


statement:
select to_timestamp(
'28 Oct 2020 09:36:47 am',
'DD Mon YYYY HH:MI:SS AM'
);

The query will return the string “28 Oct 2020 09:36:47 am” converted to a
“timestamp” data type. The format mask told Postgres how to parse the parts
of the string into the correct parts of the timestamp.

to_timestamp
2020-10-28 09:36:47-04

to_char( )
The to_char( ) function converts a value to a string.

selectto_char(clock_change_date, 'Mon DD, YYYY')


from clock_change;

to_char
Nov 01, 2020
Mar 14, 2021
Nov 07, 2021
Mar 13, 2022
Nov 06, 2022

The to_char( ) function converted the dates in our clock_change table to


strings. The format mask of “Mon DD, YYYY” tells Postgres to display the
string as a 3-character month with the first character in uppercase followed by
a space, then a 2-digit day followed by a comma and a space, then a 4-digit
year. By using the to_char( ) function to convert dates to strings, we can
control the format that the value is displayed in.

The to_char( ) function can use some character patterns that we haven’t seen
before. These patterns are for converting numbers to strings. The most
commonly-used are:

9 1-digit position that will not show for leading zeroes.


0 1-digit position that will show for leading zeroes.
. Decimal point
, Comma (thousands separator)
L Currency symbol that uses locale
D Decimal point that uses locale
G Group separator that uses locale
MIMinus sign if the number is negative
PL Plus sign if the number is positive
SGPlus or minus sign

selectto_char(123, '99999'),
to_char(123, '00000'),
to_char(1234.56, 'PL9G999.99');

to_char to_char to_char


123 00123 + 1,234.56

You can see that the “99999” format mask did not display any leading zeroes,
but the “00000” format mask did. The “PL9G999.99” format mask added a
plus sign and a “group separator” to our number, which in the United States
displays as a comma. It also added a decimal point.

One of the wackiest character formats is “RN”, which will return a Roman
numeral:
select to_char(2021, 'RN');

to_char
MMXXI

to_number( )
The to_number( ) function converts a value to a number.

selectto_number(
'$1,234.56',
'L9G999.99'
);

to_number
1234.56

The format mask of “L9G999.99” helped Postgres determine which character


in the “$1,234.56” string was a currency symbol and which character was a
group separator (a comma in the US) so that it could convert the string to the
number 1234.56.

USING PARENS FOR MATH


Let’s take a look at a table of loan information:
select * from loan_info;
property_tax insurance rate
50 40 3.25

The following query does a calculation using the numbers from the table:

selectproperty_tax + insurance * 12 / rate


from loan_info;

197.6923076923076923

That result doesn’t seem right to me. If the property tax was 50 and the
insurance was 40, those two values added together would be 90. If you
multiply 90 by 12 you get 1,080. If you divide 1,080 by 3.25 you get 332.30,
not 197.69. What’s going on here?

It turns out that Postgres didn’t solve the calculation in a left-to-right manner.
Instead, it did the multiplication and division first and then it added the
property tax. It multiplied the insurance of 40 by 12, which came to 480.
Then it divided 480 by the rate of 3.25, which came to 147.69, and then it
added the property tax of 50, for a total of 197.69.

That’s not what I had intended when I wrote the query, but the way Postgres
performed the calculation was actually correct. The problem was in the way I
wrote the query, but luckily there is an easy solution: Use parens.

select((property_tax + insurance) * 12) / rate


from loan_info;

332.3076923076923077

Here I added parens to group parts of the calculation. I put parens around
“property_tax + insurance” so that they would be calculated together. I also
put parens around “(property_tax + insurance) * 12”. Postgres honored my
groupings and produced the results that I wanted.

As a check, I like to count the number of left and right parens to make sure
they match. In “((property_tax + insurance) * 12) / rate” we have 2 left
parens and 2 right parens.

TRANSFERRING DATA TO AND FROM FILES


At some point you may need to load data from a data file into a Postgres
database, or you may need to take data from a Postgres database and copy it
to a file. While there is more than one way to accomplish that, my favorite
approach is to use the “copy” command.

In its simplest form, you can copy data from a Postgres table to a file by
using the copy command like this:
copy us_president to '/home/pres/us_president.txt';

We are copying the us_president table to a new file called us_president.txt in


the “/home/pres/” directory. Using the command line interface of your
computer (also called the console, the command prompt, or terminal), you
can go to the /home/pres/ directory and see that the file got created. We can
check the contents of the file by typing it out:
> cd /home/pres/
> cat us_president.txt

1, George Washington
2, John Adams, Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican

We can see that the copy command copied all 45 rows from the us_president
table to the /home/pres/us_president.txt file. Postgres didn’t remove the rows
from the us_president table, it just created a data file with a copy of the
values from the us_president table.

The copy command lets us create files in different formats. For example, we
can create a CSV (Comma Separated Values) file by specifying “with csv”:
copy us_president to '/home/pres/us_president.csv' with csv;
The file now gets created with commas between each field:
> cd /home/pres/
> cat us_president.csv

1, George Washington,
2, John Adams,Federalist
3, Thomas Jefferson, Democratic-Republican
4, James Madison, Democratic-Republican
5, James Monroe, Democratic-Republican

Using SQL, let’s create a new file and use a pipe character as a delimiter
between the fields in the file:
copy essential_postgres.us_president to
'/home/pres/us_president_pipe.txt' with delimiter '|';

In the last data file we created, we saw commas between the fields, but in this
file we see pipe characters instead.
> cd /home/pres/
> cat us_president_pipe.txt

1|George Washington|
2|John Adams|Federalist
3|Thomas Jefferson|Democratic-Republican
4|James Madison|Democratic-Republican
5|James Monroe|Democratic-Republican

Now let’s look at loading data from a file into a table. Let’s load the rows
from our us_president.txt file into a new table called us_president_staging.
To load data from a file into a table, you can specify that you want to copy
“from” the file instead of copying “to” the file.
copy us_president_staging from '/home/pres/us_president.txt';

When you load data from a file into a database, it’s a good idea to load the
data into a staging table first and move the data to its permanent table later. A
“staging table” is a table that you create just for the purposes of loading data.
Using this approach, you can decrease the risk of causing problems with the
data in your permanent table if there is a problem loading the data from the
file.

First, let’s create the us_president_staging table with the same column
definitions as the us_president table, but with no rows of data:

1create table us_president_staging as


2select *
3from us_president
4where 1 = 2;

Lines 1 – 3 tell Postgres to create a table named us_president_staging that is


the same as the us_president table. The “where clause” in line 4 controls how
many rows from the us_president table will be written to the
us_president_staging table. Since the number 1 can never equal 2, no rows
are written to us_president_staging, and we are left with a table in the image
of us_president but with no rows.

Now we can load data from the data file into our staging table using SQL:
copy us_president_staging from '/home/pres/us_president.txt';

Checking the us_president_staging table, we can see that it worked:


select * from us_president_staging;

president_id president_name president_party


1 George Washington null
2 John Adams Federalist
3 Thomas Jefferson Democratic-Republican
4 James Madison Democratic-Republican
5 James Monroe Democratic-Republican
… … …

Now we could load the contents of the staging table into a permanent table.

You might also like