PostgreSQL
2.021 – UAB – DACSO
Pere Pons
Virtual machine prepared
For this course there is a virtual machine already prepared.
• VirtualBox based
• Has installed:
• PostgreSQL
• Dbeaver
• Sublime text editor
• Has some databases for learning
• Install Oracle VirtualBox, and add the virtual machine to your installation.
• The user is user1 and password user1
What is PostgreSQL
• descripció I breu història
• capacitats (replicas, transaccions, bloqueig sofisticat, vistes, integritat
referencial, tipus definits per l’usuari, herencia, regles)
https://www.postgresql.org/docs/9.1/sql.html
What can PostgreSQL offer us?
It is said to be “the most advanced open-source relational database in the world”.
Key points:
• Highly standard compliant
• Supports high concurrency with clever locks
• Transactions: ensures atomicity, isolation, and durability.
• Support for a high set of datatypes (numeric, shapes, addresses, bit strings, texts, json, etc..)
• Replication capabilities
Comparison between databases: https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
Ranking of databases: https://db-engines.com/en/ranking
PostgresSQL architecture
One process per
connection
Dedicated to
replication
Stored data
https://en.wikibooks.org/wiki/PostgreSQL/Architecture#:~:text=PostgreSQL%20is%20a%20relational%20database,request%20read%20and%20write%20operations.
PostgreSQL connectivity
• Achitecture of PostgreSQL. C/C#/VB/P
ython etc..
Java
application
Application in
any platform
that has
application drivers
• ODBC I JDBC psql command
ODBC (Open
Database
JDBC (Java
Database
Native
line
Connectivity) Connectivity) drivers
• Drivers natius
• us amb php, python, c#??
postgres
processes
Files in the fs
Installing postgresql in Linux
The easiest way to install it:
$ sudo apt install postgresql
Then, indicate how users will be allowed to log in editing:
/etc/postgresql/12/main/pg_hba.conf
Users can be requested to authenticate by different methods
• trust: just trust, no check.
• peer: the user in the OS has to be the same than the one in the database. No password check is performed.
• password: sends the password not encrypted
• md5: sends the password encrypted
• And many others like pam, ldap, with certificate, etc..
Firsts interactions will be done with the user “postgres”
$ sudo su postgres
Creation and removal of databases
Use with postgres user:
create: createdb <db_name>
remove: dropdb <db_name>
list existing dabases: psql –l
Creation of users
to create a user:
$ createuser –d –P –r <username>
(createuser --help for more information)
To log in via de psql user interface:
psql –h <host to connect> -U <username> -W
<database_name>
Interacting with PostgreSQL
our user in this
installation is
To interact with PostgreSQL the system offers you a console based tool: dbuser1 and its
password is
psql dbuser1
To connect to a db use: psql –h <host> -U user –W <database>
with psql you can:
• Send queries interactively
• run “batch queries” from files
psql
with the command \? you will see the help
information about the commands you can
send to psql
With the \h command you will get the help
about the SQL syntax. Try “\h select”.
It is the man page of PostgreSQL.
It is a useful tool, but in development having
only a command line tool is not the best
option...
DBeaver
Dbeaver is a free multi-
platform tool for developers,
database administrators and
analysts. It supports all
popular databases:
PostgreSQL, MySQL, Oracle,
MS SQLServer, Sybase, SQLite,
etc..
It is available in Linux,
Windows and OSX.
A tool to know.
Connect to PostgreSQL with dbeaver
You an create a new connection to a
PostgreSQL database. Just select new
database connection and the PostgreSQL
type.
Then set the connection settings:
• host to connect (127.0.0.1)
• username and password
• database to connect
Then, just select the database and connect.
Let’s get started!
Type here your
commands
In our VM you will find a database of the Martian
Empire, let’s use it to learn SQL.
Press F3, and select “New script”. This will allow us to
write SQL commands interactively.
To run the command press Control+Enter and see the
results in the bottom area.
Get here your
results
Martian Empire’s schema
Queries
We have to prepare a set of reports for the government of Mars, and to do it we will take
advantage of the Planetary Database that we have.
We will prepare the following tasks that we have been assigned:
• to provide a list of Martian population.
• to provide a list of population of all bases. Name, surname and name of the base.
• to provide a list of population that is not assigned to any base.
• to provide a list with the first name, the surname and the base name for everyone. If
someone is not assigned to any base, the name of the base will be empty.
• Calculate the value stored of each product supply per resort
• Calculate the cost of the stock that Mars owns
• Prepare a list with all the know humans
Basics of queries
“to provide a list of Martian population”
The basic form of a query is
SELECT * | <field names>
FROM <table_names>|<subquery>
WHERE <conditions to meet>
[ORDER BY <fields to order> DESC|ASC]
SELECT name, surname
FROM martians
Joining two tables
“provide a list of population of all bases. Name, surname and
name of the base.”
SELECT martians.name, martians.surname, bases.base_name
FROM martians, bases
where martians.base_id = bases.base_id
SELECT m.name, m.surname, b.base_name
FROM martians m, bases b
WHERE m.base_id = b.base_id
SELECT m.name, m.surname, b.base_name
FROM martians m, bases b
WHERE m.base_id = b.base_id
ORDER BY b.base_name asc
We are joining the information of two tables to create a report.
Joining two tables
There is another option
SELECT m.name, m.surname, b.base_name
FROM martians m
INNER JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
The same result… mmm
Martians Bases
The INNER join is the intersection zone, is to say the rows that
have a base_id that is in the second table.
Unlocated people
“provide a list of population that is not assigned to any base”
We have a foreign key that makes sure that all base_id in the martian table exists in the bases table.
The base_id in martian table may be null
We want to know who has a base_id set to null.
SELECT m.name, m.surname
FROM martians m
WHERE m.base_id is null
Everyone in his base.. even the ones that do not
have.
“to provide a list with the first name, the surname and the base name for everyone. If someone is not assigned to any base,
the name of the base will be empty.”
In this case we want to know who is a martian and its base, even if he does not have a base. Martians Bases
Is important not to forget anyone!
This operation is a LEFT join (the group on the left in the schema)
SELECT m.name, m.surname, b.base_name
FROM martians m
LEFT JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
Everyone in his base.. even the ones that do not
have (cont).
Of course, RIGHT join will show all martians that have a base, but also include the bases that
do not have martians Martians Bases
SELECT m.name, m.surname, b.base_name
FROM martians m
RIGHT JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
Everyone in his base.. even the ones that do not
have (cont).
And, of course the last operation is the FULL join, is to say:
Martians Bases
All martians and all bases, the ones that do not have a correspondence will have the
fields set to NULL
SELECT m.name, m.surname, b.base_name
FROM martians m
FULL JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
Everyone in his base.. even the ones that do not
have (cont).
There is a fifth case… the ones that are not in the bases, but without looking at the NULL value
Martians Bases
in the base_id.
SELECT m.name, m.surname, b.base_name
FROM martians m
LEFT JOIN bases b
ON m.base_id = b.base_id
WHERE base_name is null
ORDER BY b.base_name
Cartesian product
If we set no restriction between two tables
SQL will make the cartesian product of the sets
Subqueries
Up to now, we have only made queries from
tables, but we can make queries on queries…
SELECT m.name, m.surname, b2.base_name
FROM martians m,
( select b.base_id, b.base_name
from bases b
where b.base_name > 'T') b2
WHERE b2.base_id = m.base_id
We will select the bases whose name is bigger than “T” and use that
selection as a table.
Calculations
“Calculate the value stored of each product supply per resort”
PostgreSQL not only can print values, also can perform multiple
calculations..
select r.name as Resort, s.name as Supply, rs.stock as Quantity,
rs.stock * s.value as Value
from resort_supplies rs, resorts r, supplies s
where rs.resort_id = r.resort_id
and rs.supply_id = s.supply_id
PostgreSQL has hundreds of functions implemented:
• Logical,
• Mathmatical,
• String manipulation,
• patter matching functions
• Date and time functions,
• etc… (up to 25 groups of functions)
https://www.postgresql.org/docs/9.1/functions.html
Conditions
What else can we set in search condition of the WHERE clause?
• Logical operations between clauses AND, OR, NOT
• Comparison clauses =, >, <, >=, <=, <>,
SELECT … WHERE (a > 8) and (b < 10) or (c = 12)
SELECT .. FROM people WHERE age BETWEEN 18 AND 65
• Operations with sets
SELECT … WHERE color IN (‘red’, ‘blue’, ‘green’)
SELECT … FROM users WHERE EXIST (SELECT id FROM allowed WHERE allowed.id=users.id)
• Operations with strings
SELECT … WHERE LENGTH(name) = 4
SELECT … WHERE name LIKE ‘Mi%’
• And many, many others.. refer to the online user manual for more information.
https://www.postgresql.org/docs/12/functions.html the list is huge!!!
Grouping data
“Calculate the total stock value that each resort has stored”
• Let’s calculate
What is the cost of each product that all bases have?
SELECT r."name" as Resort_name, s."name" as "Supply",
rs.stock * s.value as stock_value
FROM resort_supplies rs, supplies s, resorts r
WHERE rs.supply_id = s.supply_id
AND r.resort_id = rs.resort_id
SELECT r."name" as Resort_name, SUM(rs.stock * s.value) as stock_value_per_resort
FROMresort_supplies rs, supplies s, resorts r
WHERE rs.supply_id = s.supply_id
AND r.resort_id = rs.resort_id
GROUP BY by resort_name
We could add a HAVING clause after the group by. Will include an additional contition to be applied in the grouped information.
Paging data
Mars has a table of Earthers for marketing purposes.
It contains the Earther_Id, the name, surname and
email of the person. It has nearly 14 million records!
To list the earthers in groups of 20, to display them
in the application you should use LIMIT and OFFSET
SELECT name, surname , email
FROM earthers e
LIMIT 20 Page 1
OFFSET 0
SELECT name, surname , email
FROM earthers e
LIMIT 20
OFFSET 20 Page 2
UNION
“Prepare a list of all known humans”
We have the table martians and the table
earthers, and both contains humans.
SELECT name, surname, email Same number of
FROM martians m
UNION row and same
SELECT name, surname, email types
FROM earthers e
VIEWS
The concept “View” refers to present a
query like a table.
Can be used to:
• Restrict visibility (show just a list of fields)
• Avoid having to repeat a complex query
For example, to have available the value in
stock of all bases…
CREATE OR REPLACE VIEW value_stock_per_base AS
SELECT r.resort_id, r."name" as Resort_name, SUM(rs.stock * s.value) as
stock_value_per_resort
FROM resort_supplies rs, supplies s, resorts r
WHERE rs.supply_id = s.supply_id
AND r.resort_id = rs.resort_id
GROUP BY r.resort_id, r.name
Summarizing so far..
• We can explore all the areas of the schema of an
intersection of data stored in tables.
More information on queries:
• If we only set the tables and the restrictions, we make a https://www.postgresql.org/docs/9.1/queries.html
INNER join.
• We can use subqueries as tables
• There is a huge number of functions available that can
be used in the SELECT section, or the WHERE section.
• The information can be grouped and make operations
within the grouped values.
• We can page the information, concatenate two queries,
and generate views from the queries.
Inserting data
To insert data in a table you have to send an Insert command:
INSERT INTO <table> (<field1>, <field2>, .., <fieldn>)
VALUES {(<value1>, <value2>, .., <valuen>),}
INSERT INTO martians (name, surname, base_id, email)
VALUES ('Josh', 'Martins', 2, 'jmartins@google.mars’);
INSERT INTO martians (name, surname, base_id, email)
VALUES ('Kira', 'Loshten', 2, 'klosh@google.mars’),
('Sandra', 'Osburn', 2, 'sosburn@google.mars');
Updating information Martians table (original data)
The syntax of the update command is:
UPDATE <table_name>
SET <field1> = <value1>,
<field2> = <value2>
WHERE <condition> Bases table
Let’s set Josh Martins to the base “Orbit 1”:
UPDATE martians
SET base_id = 4
WHERE martian_id = 9
or
UPDATE martians
SET base_id = 4
WHERE name = ‘Josh’ and surname = ‘Martins’
Updating information Martians table (original data)
Sophie Leicester has been identified as the CIA
agent Theresa Loid, please update its data in
the martians table.
Bases table
UPDATE martians
SET name = ‘Theresa’, surname = ‘Loid’
WHERE martian_id = 5
Updating information Supplies table
Some supplies, the ones that com from Mars,
will have to increase its value a 7%. The
affected supplies are: Martian Chicken,
Magazines and Bread.
UPDATE supplies
SET value = value * 1.07
WHERE supply_id IN (1, 9, 11)
Indexes and optimization
“Earthers” is a table, for marketing purposes, of earthers that may be
interested on the visit to Mars. There are more than 13.5 Million
registers!!.
Its structure is:
create table earthers
(
earther_id serial,
name text not null,
surname text not null,
email text,
primary key (earther_id)
);
Indexes and optimization
Search Time (in
seconds)
The search times in my computer (yours may be different) are: Who has the email James_Ruhl@microsoft.com
Select * from Earthers where email=‘James_Ruhl@microsoft.com’ 5.77
Get the register from Leo White
Select * from Earthers where name=‘Leo’ and surname=‘White’ 5.17
Can we do it better? Sure!! Count how many people is named Noah
Select count(*) from Earthers where name=‘Noah’ 4.84
Count how many people is named Noah
Select count(*) from Earthers where surname=‘White’ 5,13
What will the DBMS do to resolve the queries? Use the EXPLAIN
clause..
… Will go one by one… not very good!
SEQUENTIAL SCAN
https://www.postgresql.org/docs/12/sql-createindex.html
Indexes and optimization
Search Time (in
seconds)
Let’s create an index for the email field: Who has the email James_Ruhl@microsoft.com
Select * from Earthers where email=‘James_Ruhl@microsoft.com’ 5.77
Get the register from Leo White
Select * from Earthers where name=‘Leo’ and surname=‘White’ 5.17
CREATE INDEX ndx_earthers_email
Count how many people is named Noah
ON Earthers (email) Select count(*) from Earthers where name=‘Noah’ 4.84
Count how many people is named Noah
Select count(*) from Earthers where surname=‘White’ 5,13
It will take a bit… 2m 42s
How will the DBMS resolve the query now?
INDEX SEARCH
… Will go one by one… not very good!
How long will it take…. 6ms!!!
From 5700 ms to 6ms.. 1000 times faster!!
https://www.postgresql.org/docs/12/sql-createindex.html
Indexes and optimization
So… why not to place index for every field in all tables?
Because the price to pay is:
• Memory usage.
• Update and insert performance
So you have to know which are the queries to perform, and create
indexes only for those queries.
https://www.postgresql.org/docs/12/sql-createindex.html
Materialized views
A Materialized View is a view whose resulting data is stored on the
DBMS.
When is the snapshot taken? When you create it or when you refresh it.
It may not be up to date always? Yes. correct.
You can request the DBMS to refresh its data whenever you want.
It will provide the information much faster.
https://www.postgresql.org/docs/12/rules-materializedviews.html
Backup and recovery
From the terminal command line you have the following tools:
• pg_dump: Generates a dump (backup) of the database.
$ pg_dump –h <host> -U <user> <database>
$ pg_dump –h 127.0.0.1 –U dbuser1 martian_database | gzip > martiandb.dmp.gz
• psql: the main CLI interface for the database
$ psql –l –h 127.0.0.1 –U dbuser1 => will list the databases
$ psql –h 127.0.0.1 –U dbuser1 => will log in as dbuser and allow to launch commands