
Unit 5 Tutorials: Administration

INSIDE UNIT 5

Reliability

Transactions
ACID Properties
Atomicity
Consistency
Isolation
Durability
COMMIT and ROLLBACK to Manage Changes

Security

CREATE USER/ROLE to Add Users


CREATE ROLE to Create Groups
GRANT to Assign Users
GRANT to Assign Privileges
Application Security
Superusers

Enhancement

Index Overview
B-Tree Index
Hash Index
Create and Drop Indexes

Management

Create a Backup
Restore from Backup
Backups: Command Line vs. GUI
Backup Rotation Strategies

Migrating Databases in an Organization

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC.

Transactions
by Sophia

 WHAT'S COVERED

This lesson explores how transactions in a database ensure that multiple statements execute in an
all-or-nothing fashion, in two parts. Specifically, this lesson will cover:
1. Introduction
2. Examples

1. Introduction
Transactions are a core feature of every database system. The purpose of a transaction is to combine multiple
SQL statements into a single unit that executes either all of the statements or none of them. The changes made
by statements within a transaction are not visible to other concurrent transactions in the database, and they are
not saved to the database unless all of the statements in the transaction execute successfully. If any statement in
the transaction fails to execute, the entire transaction is cancelled, and none of its changes take effect.

A database management system's ACID properties guarantee the integrity and reliability of transactions by
ensuring atomicity, consistency, isolation, and durability.

Atomicity ensures that transactions will be handled as indivisible units, either completed or rolled back if
they fail.
The consistency of the database ensures that data integrity and business rules are maintained when
transactions take the database from one consistent state to another.
The goal of isolation is to ensure that each transaction takes place independently from the other,
preventing interference and ensuring data accuracy.
Data persistence is ensured by durability, which guarantees that once a transaction is committed, its
modifications are permanently stored in the database, even if the system subsequently fails.

Together, these properties ensure that data transactions are reliable and correct, which is crucial for maintaining
database integrity. You will learn more about ACID properties in upcoming lessons.

 TERMS TO KNOW

Transaction
A sequence of statements that are performed as a group; if one fails to execute, they all fail.

ACID
An acronym for the four properties that ensure data validity: Atomicity, Consistency, Isolation, and
Durability. Transactions are database operations that satisfy the ACID properties.

2. Examples
Let us take a look at a scenario in which a transaction would be necessary. James has gone to an online
computer store and purchased a new computer for $500. In this transaction, there are two things to track. The
first is the $500 being transferred from James to the store, and the second is the computer being deducted
from the store inventory and transferred to James. The basic SQL statements would look something like the
following:

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';

INSERT INTO customer_order(account_id, product_name, store_id, quantity, cost)
VALUES (1000, 'Computer', 5, 1, 500);

UPDATE store_account
SET balance = balance + 500
WHERE store_id = 5;
As you can see, multiple SQL statements are needed to accomplish this operation. Both James and the store
would want to be assured that either all of these statements occur or none of them do. It would not be
acceptable if $500 was deducted from James’s account, the inventory for the store had the computer removed,
and then there was a system failure. This would mean that James would not get the order, and the store would
not get the $500 that James had paid. We need to ensure that if anything goes wrong at any point within the
entire operation, none of the statements that have been executed so far will take effect. This is where the use
of a transaction is valuable.

To set up a transaction, we need to start with the BEGIN command, have our list of commands, and then end
with COMMIT. Similar to our prior example, consider the following:

BEGIN;
UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;

UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';

INSERT INTO customer_order(account_id, product_name, store_id, quantity, cost)
VALUES (1000, 'Computer', 5, 1, 500);

UPDATE store_account
SET balance = balance + 500
WHERE store_id = 5;
COMMIT;
A transaction can contain a single statement or multiple statements; there is no limit. Note that each SQL
statement needs to end with a semicolon to separate it from the others.

PostgreSQL and other databases treat each executed SQL statement as if it were an individual transaction. If
we do not include a BEGIN command, then each statement is treated as a separate transaction; that is the way
the statements we have run in this course so far have operated.
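To see this all-or-nothing behavior run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a lightweight stand-in for PostgreSQL. The table and column names mirror the lesson's hypothetical schema, and a deliberately broken statement in the middle of the transaction causes the whole group to be rolled back:

```python
import sqlite3

# autocommit mode so that BEGIN/COMMIT/ROLLBACK are issued explicitly, as in the lesson
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer_account (account_id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO customer_account VALUES (1000, 2000)")

try:
    conn.execute("BEGIN")
    conn.execute("UPDATE customer_account SET balance = balance - 500 "
                 "WHERE account_id = 1000")
    # This statement fails (no such table), so the transaction cannot complete.
    conn.execute("UPDATE no_such_table SET x = 1")
    conn.execute("COMMIT")
except sqlite3.OperationalError:
    conn.execute("ROLLBACK")  # the earlier UPDATE is undone as well

balance = conn.execute(
    "SELECT balance FROM customer_account WHERE account_id = 1000"
).fetchone()[0]
print(balance)  # 2000: no partial changes were saved
```

Without the BEGIN, the first UPDATE would have been treated as its own transaction and the $500 deduction would have stuck even though the rest of the operation failed.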

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of
the examples above and see how it works. Next, try writing a transaction of your own.

 TERMS TO KNOW

BEGIN
A statement that marks the beginning of a transaction.

COMMIT
A statement that marks the end of a transaction and makes the transaction's results permanent.

 SUMMARY

In this lesson, in the introduction, you learned that database transactions are an essential part of
ensuring data integrity and consistency within a system. Transactions are sequences of database
operations that are treated as single, indivisible units of work. Database transactions are essential to
maintain data accuracy and reliability, especially in multiuser environments where concurrent access
can result in conflicts.

In the examples, you also learned that transactions are based on the ACID properties of atomicity,
consistency, isolation, and durability. Transactions are atomic in nature—they are either fully completed,
bringing the system to a consistent state, or fully aborted, leaving it unchanged. Consistency ensures
that the database transitions from one valid state to another according to defined rules and constraints.
By isolating concurrent transactions, anomalies such as the lost update problem or dirty reads are
prevented, while durability ensures that a transaction's effects will remain permanent and persistent
regardless of system failures once it is committed. A transaction is initiated, its statements execute, and it
either commits, making its effects persist, or is terminated and rolled back. This mechanism ensures data integrity in
the event of hardware failures, software crashes, or other unexpected disruptions. A transactional
system ensures database operations' reliability, consistency, and security by providing a structured
framework for managing simultaneous data manipulation.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

ACID
An acronym for the four properties that ensure data validity: Atomicity, Consistency, Isolation, and
Durability. Transactions are database operations that satisfy the ACID properties.

BEGIN
A statement that marks the beginning of a transaction.

COMMIT
A statement that marks the end of a transaction and makes the transaction's results permanent.

Transaction
A sequence of statements that are performed as a group; if one fails to execute, they all fail.

ACID Properties
by Sophia

 WHAT'S COVERED

This lesson explores the ACID (atomicity, consistency, isolation, and durability) properties and how they
affect database transactions, in two parts. Specifically, this lesson will cover:
1. Why ACID Properties Are Essential
2. ACID Properties

1. Why ACID Properties Are Essential


As we covered in the prior lesson, a transaction is a single unit of work that has to be fully executed or
completely aborted if there are any issues within the transaction. There are no states between the beginning
and end that are acceptable for a database to be in.

EXAMPLE James made a purchase of $500 for a computer from a store. All of the SQL statements that
we defined in that transaction must be executed entirely:

BEGIN;
UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;
UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';
INSERT INTO customer_order(account_id, product_name, store_id, quantity, cost)
VALUES (1000, 'Computer', 5, 1, 500);
UPDATE store_account
SET balance = balance + 500
WHERE store_id = 5;
COMMIT;
It is not acceptable to only deduct the purchase amount from James’s account or remove inventory from the
store. If any of the SQL statements in the transaction fail, the entire transaction is rolled back (i.e., not
committed) to the original state. If the transaction is successful, the changes to the database bring it from one
consistent state to another. A consistent state is a state in which all of the data integrity constraints on the
database are satisfied. ACID properties ensure that a consistent state always exists, even when system failures
or power outages occur.

 TERM TO KNOW

Consistent State
A state in which all of the data integrity constraints on the database are satisfied.

2. ACID Properties
To ensure that we have consistency in the database, every transaction has to begin with the database in a
known consistent state. If the database is not in a consistent state, transactions can result in an inconsistent
database due to violations of integrity or business rules. As such, all transactions in the database are controlled
and executed to ensure integrity according to the ACID properties: atomicity, consistency, isolation, and
durability.

Atomicity requires that all SQL statements of a transaction be completed. The transaction should be viewed as
an indivisible logical unit of work. The entire transaction should be aborted if any of the SQL statements are not
completed. For example, in our transaction above, imagine that the first two statements executed:

UPDATE customer_account
SET balance = balance - 500
WHERE account_id = 1000;
UPDATE store_inventory
SET quantity = quantity - 1
WHERE store_id = 5 AND product_name = 'Computer';
Then imagine that we ran into an error with the data that stopped the transaction. The entire transaction should
be reverted to its original state. However, if all four statements in the transaction are executed successfully, the
entire transaction will be committed to the database.

Consistency ensures that the database is in a consistent state. This means that a transaction takes a database
from one consistent state to another. When a transaction starts, the database must be in a consistent state;
when the transaction ends, the database must be in a consistent state. If any of the parts of the transaction
violate one of the integrity constraints, the entire transaction is aborted.

Isolation means that the data that is used during the first transaction cannot be used in another transaction until
the first transaction has finished executing. In looking at the example above, imagine that James and another
customer purchased a computer at similar times. If James's transaction has started, the second customer
cannot attempt to purchase the same computer until James’s transaction is completed. Otherwise, they may
have both tried to purchase the single available computer. This is especially important for multiuser databases,
where there will be many users accessing and updating the database at the same time.

Durability is the last ACID property. It ensures that when the transaction changes are finished and committed,
they cannot be undone or removed, even if there is a system failure.

We will discuss each of these properties in further detail in the upcoming lessons.

 TERMS TO KNOW

ACID
An acronym for the four properties that ensure data validity: Atomicity, Consistency, Isolation, and
Durability. Transactions are database operations that satisfy the ACID properties.

Atomicity
The quality of being indivisible.

Consistency
The quality of something being done the same way in all cases.

Isolation
The quality of being inaccessible until certain conditions are met.

Durability
The quality of continuing to exist without being undone or removed.

 SUMMARY

In this lesson, you learned why the ACID properties (Atomicity, Consistency, Isolation, Durability) are
essential for keeping database transactions consistent and reliable. Transaction consistency, an ACID
property, ensures that data integrity and defined rules are adhered to while transitioning from one state to
another. Each transaction leaves the database in a state consistent with the application's business logic,
constraints, and integrity requirements.

Data consistency prevents transactions from violating constraints or business rules. Databases maintain
consistency through mechanisms such as validation checks, triggers, and constraints. For example, a
bank's database might need to validate that an account has the funds to cover a withdrawal before
processing that withdrawal, and if it does not, the entire withdrawal transaction is canceled. You also
learned that the ACID properties, when implemented together, ensure consistent database operation
even when unexpected errors or system failures occur. Because consistency is an integral part of ACID
principles, the data stored within the system will be reliable and trustworthy.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

ACID
An acronym for the four properties that ensure data validity: Atomicity, Consistency, Isolation, and
Durability. Transactions are database operations that satisfy the ACID properties.

Atomicity
The quality of being indivisible.

Consistency
The quality of something being done the same way in all cases.

Consistent State
A state in which all of the data integrity constraints on the database are satisfied.

Durability
The quality of continuing to exist without being undone or removed.

Isolation
The quality of being inaccessible until certain conditions are met.

Atomicity
by Sophia

 WHAT'S COVERED

This lesson explores the atomicity property in a transaction and how it affects the database, in two
parts. Specifically, this lesson will cover:
1. Atomicity in Transactions
2. Transaction Example

1. Atomicity in Transactions
Transactions in a database consist of multiple SQL statements that are executed together. Atomicity is
important as it ensures that each transaction is treated as a single, indivisible unit. Atomicity ensures that if any of the
SQL statements in a transaction fail, the entire transaction fails, and the attempted changes within the
transaction are reverted. If all of the statements in a transaction are executed successfully, then the transaction
is successful and committed.

This approach prevents the database from making updates that may only be partially completed. The database
will do one of two operations to ensure atomicity. It will either:

1. Commit—If the transaction is successful, the changes are applied and saved to the database.
2. Abort—If a transaction has any issues, the transaction is aborted, and the changes are rolled back so that
they are not reflected in the database.

This includes all INSERT, UPDATE and DELETE statements in a transaction.

 TERM TO KNOW

Atomicity
The quality of indivisibility. In SQL database terms, atomicity means the transaction cannot be partially
executed: Either all statements are executed or none of them are.

2. Transaction Example

EXAMPLE Jennifer would like to make a payment to Randall for $100 through an account transfer. This
transaction is a balance transfer between two accounts at two different branches of the same bank. Let us
take a look at what the transaction would look like:
1. $100 would be deducted from Jennifer’s account (account_id 10).
2. The banking location where Jennifer has her account would have $100 deducted from that location’s
account.
3. The account of the banking location where Randall (account_id 50) has his account would be increased by $100.
4. Randall’s account would be increased by $100.

The transaction would look something like this in PostgreSQL:

BEGIN;
UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;
UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);
UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);
UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;
COMMIT;
With the atomicity property, if there were an error at any point in the four statements, then the entire transaction
would be rolled back. For example, imagine that Randall’s account had a freeze on it that prevented any
changes. The first three statements would execute, but an error would be returned on the fourth UPDATE
statement. Regardless of the error, the first three SQL statements would revert back to what they were before
the transaction started. Otherwise, $100 would be deducted from Jennifer’s account, the bank branch that holds
Jennifer’s account would have $100 deducted from their balance, Randall’s bank branch would have $100
added, but Randall’s account would have its original balance. That certainly would not be acceptable to Randall.

 SUMMARY

Atomicity in SQL transactions ensures that a sequence of database operations is treated as a single
indivisible unit of work. SQL atomicity ensures that all operations within a transaction are executed
successfully or none of them are executed if an error occurs. The purpose of this property is to prevent
partial or incomplete changes to the database. This ensures the integrity and consistency of the data.
Committing a transaction signifies that all its constituent operations have been successfully executed
and that the changes have been permanently saved. If an error occurs during the execution of a
transaction's operation, the entire transaction will be rolled back, restoring the database to its state
prior to its initiation. This prevents scenarios where some changes are applied but others are left
incomplete.

In the transaction example, you learned that the atomicity of SQL operations plays a crucial role in
ensuring their reliability. This ensures that even if a transaction fails, the database won't be left in an
uncertain or inconsistent state. It also facilitates error handling since developers can wrap multiple
operations within a single transaction and be assured that all the changes will be made successfully, or
none will be made at all. Atomicity enhances data integrity, protects it from corruption by unexpected
interruptions, and ensures the accuracy and reliability of SQL transactions.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Atomicity
The quality of indivisibility. In SQL database terms, atomicity means the transaction cannot be
partially executed: Either all statements are executed or none of them are.

Consistency
by Sophia

 WHAT'S COVERED

This lesson explores the consistency property in a transaction and how it affects the database, in two
parts. Specifically, this lesson will cover:
1. Ensuring Consistency
2. Consistency Example

1. Ensuring Consistency
Consistency within the ACID properties focuses on ensuring that the data in the database moves from one valid
state to another valid state. This ensures that any data that has been modified in the database is uncorrupted
and correct at the end of the transaction.

The consistency property follows these criteria:

If the transaction has been completed successfully, the changes will be applied to the database.
If there was an error in the transaction, all of the changes should be reverted/rolled back automatically. This
means that the database should restore the pre-transaction state.
If there was a system failure or external issue while the transaction was executing, all of the changes that
were made in the transaction up to that point should automatically be reverted/rolled back.

 TERM TO KNOW

Consistency
The quality of preserving a valid state. In database terms, consistency refers to a transaction starting
and ending with the database in a valid state.

2. Consistency Example
Let's look at our banking example again.

EXAMPLE Jennifer would like to make a $100 payment to Randall through an account transfer. This
transaction is a balance transfer between two accounts at two different branches of the same bank;
Jennifer's account is at branch 10, and Randall's account is at branch 50. Let us review the transaction:
1. $100 would be deducted from Jennifer’s account at branch 10.
2. Branch 10's balance would be decreased by $100.
3. Branch 50's account would be increased by $100.
4. Randall’s account would be increased by $100.

The consistency property ensures that the total amount of money is the same at the start and finish. One
account and branch are decreased, and another account and branch are increased by the same amount, so
the transactions balance one another. Let us look back at the transaction in SQL:

BEGIN;
UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;
UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);
UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);
UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;
COMMIT;
Imagine that during the second UPDATE statement, the system had a failure, and when it recovered, the
transaction had only partially executed. There would be an inconsistent state because the total balances would
not match up. In this situation, the system would roll back those UPDATE statements to the consistent state
before the transaction started.

If both Jennifer’s and Randall's account balances started at $1000, the end result should have the appropriate
expected balances. Jennifer’s account balance should be $900, and Randall’s balance should be $1100. If the
end values were not what was expected for any reason, the transaction would also be rolled back.

 SUMMARY

In this lesson, you learned how consistency ensures that a database maintains a valid and expected
state before and after a transaction. In the context of ACID, ensuring consistency requires all database
operations to abide by predefined rules, constraints, and business logic. As a result, the database's
states before the transaction and after it are consistent.

The consistency example showed a banking transaction that transferred money between two
accounts. The total amount of money was the same before and after the transaction, reflecting
consistency. If the transaction were to violate that constraint, it would be rolled back, with any changes
undone, to maintain data integrity.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Consistency
The quality of preserving a valid state. In database terms, consistency refers to a transaction
starting and ending with the database in a valid state.

Isolation
by Sophia

 WHAT'S COVERED

This lesson explores the isolation property in a transaction and how it affects the database, in two parts.
Specifically, this lesson will cover:
1. Introduction
2. Isolation Example

1. Introduction
The isolation property ensures that if multiple transactions run at the same time, they do not leave the
database in an inconsistent state. The transactions themselves should not
interfere with one another, and each transaction should be run independently. Any changes that statements in a
transaction are making will be visible only to that transaction, and any other concurrent transaction will not see
the results of the changes until the transaction is complete and the data has been committed. This also ensures
that transactions that run concurrently have the same results as if they were run sequentially.

In addition, the isolation property ensures that the data that is used in a transaction cannot be used in another
transaction until the original transaction has completed. Most databases, including PostgreSQL, use
locking to maintain transactional isolation.

 TERM TO KNOW

Isolation
The quality of ensuring that concurrent transactions are isolated from each other, preventing
interference with one another's operations and maintaining data integrity.

2. Isolation Example
Let us consider our banking example yet again.

EXAMPLE Jennifer would like to make a $100 payment to Randall through an account transfer. This
transaction is a balance transfer between two accounts at two different branches of the same bank. Here's
what the transaction looks like:

1. $100 would be deducted from Jennifer’s account at branch 10.
2. Branch 10's balance would be decreased by $100.
3. Branch 50's account would be increased by $100.
4. Randall’s account at branch 50 would be increased by $100.

Let us look at the transaction in SQL:

BEGIN;
UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;
UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);
UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);
UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;
COMMIT;

IN CONTEXT

Let us say that both Jennifer’s and Randall's account balances started at $1000. If Jennifer was
attempting to start this transaction and Randall concurrently was trying to check his account balance,
Randall should not see any updates to his account until the changes are made and the transaction has
been committed. If Randall queries for his customer_account balance, it would be at $1000 until the
entire transaction from Jennifer executes successfully and commits the data to the database. In certain
databases, if Randall tried to query his account balance, no result would be provided until Jennifer’s
transaction was completed.

Isolation is increasingly important as more concurrent transactions access the same data. For
example, imagine a situation where Randall is receiving two different account transfers from two
different individuals at the same time.

Imagine Randall is receiving $100 from Jennifer and $50 from Paul. If there were no isolation in place,
Jennifer’s transaction could check Randall’s balance at $1000. At the same time, Paul’s transaction
could check Randall’s balance at $1000. Jennifer’s transaction would add $100 to the $1000 and save
the results. Paul’s transaction would add $50 to the $1000 and save the results. The final result is that
Randall’s account balance is $1050 instead of $1150. This could be the case if Jennifer's and Paul's
transactions are both reading Randall’s balance at the same time. Jennifer’s transaction started and
updated the balance, but Paul’s transaction was also completed and saved over Jennifer’s transaction.

This is why isolation is important, as Jennifer’s transaction would prevent Paul's transaction from reading the
value. Even if the value is read, Paul’s transaction would not be able to complete due to the inconsistency in the
database state.
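The lost-update anomaly above can be reproduced concretely. The sketch below uses Python's sqlite3 module with a single connection, simulating the interleaving sequentially rather than with real concurrency: two deposits first read a stale balance (losing one update), and then are redone as atomic read-modify-write UPDATEs, which is one way a database avoids the anomaly:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer_account (account_id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO customer_account VALUES (50, 1000)")

def read_balance():
    return conn.execute(
        "SELECT balance FROM customer_account WHERE account_id = 50"
    ).fetchone()[0]

# Without isolation: both transfers read the same starting balance...
seen_by_jennifer = read_balance()  # 1000
seen_by_paul = read_balance()      # 1000
# ...and each writes back its own stale computation.
conn.execute("UPDATE customer_account SET balance = ? WHERE account_id = 50",
             (seen_by_jennifer + 100,))
conn.execute("UPDATE customer_account SET balance = ? WHERE account_id = 50",
             (seen_by_paul + 50,))
lost_result = read_balance()
print(lost_result)  # 1050 -- Jennifer's $100 deposit was overwritten and lost

# Reset, then perform each deposit as a single atomic read-modify-write
conn.execute("UPDATE customer_account SET balance = 1000 WHERE account_id = 50")
conn.execute("UPDATE customer_account SET balance = balance + 100 WHERE account_id = 50")
conn.execute("UPDATE customer_account SET balance = balance + 50 WHERE account_id = 50")
isolated_result = read_balance()
print(isolated_result)  # 1150 -- both deposits are applied
```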

 SUMMARY

In this lesson, in the introduction, you learned that concurrent transactions should be isolated from one
another in accordance with the ACID model (Atomicity, Consistency, Isolation, Durability) to ensure data
integrity. A multiuser database environment may lead to conflicts due to simultaneous transactions,
such as “lost updates” or “dirty reads,” when one transaction adversely affects another. The isolation
mechanism ensures that each transaction appears to run in isolation, oblivious to the presence of other
concurrent transactions. Different mechanisms enable isolation, such as locking, which prevents
multiple transactions from modifying the same data at the same time.

In the isolation example, you considered two transactions modifying the same record simultaneously.
The isolation process ensures that only one transaction can modify a record at a time, while all other
transactions are put on hold until the first transaction is complete. This prevents data corruption or
discrepancies from occurring due to simultaneous modifications.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Isolation
The quality of ensuring that concurrent transactions are isolated from each other, preventing
interference with one another's operations and maintaining data integrity.

Durability
by Sophia

 WHAT'S COVERED

This lesson explores the durability property in a transaction and how it affects the database, in two
parts. Specifically, this lesson will cover:
1. Introduction
2. Durability Example

1. Introduction
Durability is the last ACID property and one of the easiest to understand. This property focuses on ensuring that
once the data from a transaction has been saved/committed to the database, it will stay in place and will not be
affected by a system failure. This means that any completed transaction must be recorded in permanent,
non-volatile storage (such as disk), not merely held in memory.

 TERM TO KNOW

Durability
A quality that ensures that once a transaction is committed, its effects are permanent and persist even
in the face of system failures or crashes.

2. Durability Example
Once again, consider our banking example.

EXAMPLE Jennifer would like to make a $100 payment to Randall through an account transfer. This
transaction is a balance transfer between two accounts (Jennifer's account 10 and Randall's account 50) held
at two different branches of the same bank. Here's what the transaction looks like:
1. $100 would be deducted from Jennifer's account (account 10).
2. The balance of Jennifer's branch would be decreased by $100.
3. The balance of Randall's branch would be increased by $100.
4. Randall's account (account 50) would be increased by $100.

Let us look at the transaction in SQL:

BEGIN;
UPDATE customer_account
SET balance = balance - 100
WHERE account_id = 10;
UPDATE branch_account
SET balance = balance - 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 10);
UPDATE branch_account
SET balance = balance + 100
WHERE branch_id = (SELECT branch_id FROM customer_account WHERE account_id = 50);
UPDATE customer_account
SET balance = balance + 100
WHERE account_id = 50;
COMMIT;
Once the transaction has successfully completed execution, its updates are written to the database. These
changes will persist even if the system fails, as those updates are now permanent; the effects of the account
transfer will never be lost. Note that durability applies only after the transaction's COMMIT has been
successfully executed. Anything that occurs prior to that is the province of the other ACID properties.

 SUMMARY

In this lesson, in the introduction, you learned that durability ensures that once a transaction has been
committed, its effects persist even in the event of a system failure or crash. This property ensures that
the changes made by a transaction won't be lost and will remain intact in the database. Providing this
assurance ensures the safety and recoverability of users' data. Durability is ensured by transaction logs
or journals that keep track of each transaction's changes. In case of a system failure, database
management systems use this log to recover the database's state. This is accomplished by replaying
committed transactions from the log, and undoing any incomplete ones, to restore consistency.

In the durability example, you reviewed the scenario of a bank transfer where the system crashes
immediately after the transaction is committed. Because of durability, the committed transaction
remains intact, stored permanently and reliably.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Durability

A quality that ensures that once a transaction is committed, its effects are permanent and persist
even in the face of system failures or crashes.

COMMIT and ROLLBACK to Manage Changes
by Sophia

 WHAT'S COVERED

This lesson explores using COMMIT and ROLLBACK, in two parts. Specifically, this lesson will cover:
1. Introduction
2. Example Transaction

1. Introduction
Up to this point, we have looked at transactions as single units that start with a BEGIN and end with the
COMMIT statement, with multiple SQL commands in between that are executed at once. Remember that
without the BEGIN command, each SQL statement is viewed as a transaction with an implicit BEGIN and
COMMIT statement executed.

ROLLBACK is an operation that cancels all the uncommitted changes prior to it in the transaction. It does not
undo any already committed changes. It would typically be used in complex transactions involving multiple
steps or operations.

 REFLECT

Perhaps you are wondering, “Why would I need ROLLBACK, when a transaction is canceled automatically
when an error occurs?” This is a good question.

ROLLBACK is useful because it helps you take more explicit control over a transaction's behavior:
More precise control: Sometimes you might want to control logical flow through a transaction based on
conditions other than errors. For example, you might want to roll back a transaction based on certain
business logic criteria, which might not be strictly classified as errors from the database's perspective.

Clarity: ROLLBACK enhances the code's readability and maintainability because it clearly
communicates the developer's intentions.

Nested transactions: Some database systems allow you to nest transactions. In such cases, an error in
an inner transaction might not necessarily cause an automatic rollback of the outer transaction.
Explicitly issuing a ROLLBACK statement allows you to manage the rollback behavior at different levels
of nesting.

Partial rollbacks: In complex transactions involving many steps or operations, you might want to roll
back only part of the transaction while preserving the changes made by previous steps. Automatic
rollback on error typically rolls back the entire transaction, but using ROLLBACK, you can control which
parts of the transaction are undone.

Debugging and testing: Explicitly issuing a ROLLBACK statement during development and testing
phases enables developers to manually control the transaction's state and test various scenarios,
ensuring that the application behaves as expected under different conditions.
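The partial-rollback scenario described above is handled in PostgreSQL with savepoints. Here is a minimal sketch (the accounts table and IDs are hypothetical):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
-- Mark a point we can return to without abandoning the whole transaction.
SAVEPOINT after_debit;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
-- Undo only the second update; the first one is preserved.
ROLLBACK TO SAVEPOINT after_debit;
COMMIT;  -- commits the first update only
```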

To start a transaction in the PostgreSQL command line, you can start with either:

BEGIN;
or

BEGIN TRANSACTION;
The transaction then proceeds through its statements until it encounters either a COMMIT or ROLLBACK
statement, or until it encounters an error (which triggers an automatic rollback).

The COMMIT statement is used to save changes from a transaction to the database. This COMMIT statement
will save all the SQL statements to the database following the BEGIN command. The syntax for the COMMIT
statement looks like this:

COMMIT;
or

END TRANSACTION;
The ROLLBACK statement is used to undo or revert SQL statements that have not already been saved to the
database. The ROLLBACK statement can only be used to undo SQL statements that follow the BEGIN
statement. The syntax looks like this:

ROLLBACK;
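Putting the three statements together, a minimal transaction that discards its own change might look like this (assuming a hypothetical accounts table):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
-- A business rule fails, so discard the uncommitted update.
ROLLBACK;
-- Account 1's balance is exactly what it was before BEGIN.
```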

 TERMS TO KNOW

COMMIT
A statement that makes tentative changes permanent, marking the end of a transaction and providing
durability to ACID transactions.

ROLLBACK
A statement that returns the database to some previous state.

2. Example Transaction
Suppose you are writing SQL code for a banking database. Users can transfer funds between accounts, and
you need to make sure that the funds are deducted from one account and added to another account as an
atomic operation. In other words, if either of those operations fails, the whole transaction fails.

Let's assume there is an Accounts table with columns that include AccountID and Balance. Here's a transaction
that can safely transfer $100 from account 123 to account 456:

BEGIN;
UPDATE Accounts
SET Balance = Balance - 100
WHERE AccountID = 123;
UPDATE Accounts
SET Balance = Balance + 100
WHERE AccountID = 456;
So far, so good, right? But to verify that the updates actually succeeded, we could add a logical test before
ending the transaction. (Note: the conditional syntax below is SQL Server's T-SQL dialect, in which
@@ROWCOUNT returns the number of rows affected by the most recent statement; a production version would
capture and check the count after each UPDATE.)

IF @@ROWCOUNT = 1
BEGIN
COMMIT;
PRINT 'Transfer Successful';
END
ELSE
BEGIN
ROLLBACK;
PRINT 'Transfer failed. Transaction rolled back.';
END
In this case, a logical IF clause, and not the presence or absence of an error condition, determined whether the
transaction would be committed or rolled back.
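In PostgreSQL, a comparable guard can be sketched with a PL/pgSQL DO block, using GET DIAGNOSTICS to capture each statement's row count (the table and account IDs follow the example above; the error messages are illustrative):

```sql
DO $$
DECLARE
    rows_updated integer;
BEGIN
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 123;
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    IF rows_updated <> 1 THEN
        RAISE EXCEPTION 'Source account not found';
    END IF;

    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 456;
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    IF rows_updated <> 1 THEN
        RAISE EXCEPTION 'Destination account not found';
    END IF;
END $$;
```

Because the DO block runs as a single implicit transaction, an uncaught RAISE EXCEPTION aborts it and rolls back both updates; no explicit BEGIN or COMMIT is needed here.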

 SUMMARY

In this lesson's introduction, you learned that the ROLLBACK statement can be run within a transaction
to undo the transaction when certain conditions exist. The syntax of ROLLBACK, like that of COMMIT, is
very simple, consisting only of the single word and a semi-colon (ROLLBACK;).

Then, in the example transaction, you saw a practical example of a situation where ROLLBACK would

be useful. In this example, an IF clause is used to test for a certain condition, and then either
ROLLBACK or COMMIT is implemented depending on the outcome of that test.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

COMMIT
A statement that makes tentative changes permanent, marking the end of a transaction and
providing durability to ACID transactions.

ROLLBACK
A statement that returns the database to some previous state.

CREATE USER/ROLE to Add Users
by Sophia

 WHAT'S COVERED

This lesson explores creating users in a database, in three parts. Specifically, this lesson will cover:
1. Roles, Users, and Groups
2. CREATE ROLE
3. Other Attributes

1. Roles, Users, and Groups


Roles are entities used for authentication and access control. A role can represent individual users, groups, or
other entities such as services or applications. You can see what roles exist in your database by running this
statement:

SELECT rolname
FROM pg_roles;
Note that the column is rolname and not rolename.

Login roles are roles assigned to users that determine their ability to connect to the database. Login roles are
created using the CREATE ROLE statement with the LOGIN option. For our purposes in this course, a login role
can be considered roughly equivalent to a user account.

We can also create and use group roles. Group roles are roles that represent groups of users. In PostgreSQL, a
group role is simply a role created without the LOGIN attribute; users are then granted membership in it.

 KEY CONCEPT

PostgreSQL also has a CREATE GROUP command, but it is simply a legacy alias for CREATE ROLE; roles
have replaced the older concept of separate users and groups.
Privileges are permissions assigned to roles, such as the ability to view or edit certain tables or run certain SQL
commands. We assign privileges to group roles to efficiently grant the same privileges to multiple users. You
assign a group role to user accounts and then assign the permissions to the group role. Its privileges are
applied to each user account in the group.
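Using statements covered in the upcoming lessons, that workflow can be sketched like this (the role, user, and table names are hypothetical):

```sql
CREATE ROLE sales_team;                    -- group role (no LOGIN)
CREATE ROLE maria LOGIN PASSWORD 'pw123';  -- login role for one user
GRANT sales_team TO maria;                 -- put the user in the group
GRANT SELECT ON orders TO sales_team;      -- privilege reaches every member
```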

 HINT

It is a best practice to apply privileges only to groups, not individual user accounts, because it simplifies
administration.

 TERMS TO KNOW

Role
Entity used for authentication and access control.

Login Role
A role that determines the ability of a user to connect to the database.

Group Role
A role that represents a group of users and is used to assign privileges to them.

Privilege
A permission assigned to a role, such as the ability to add or update records.

2. CREATE ROLE
To create a user account and give them permission to login, you can use the CREATE ROLE statement:

CREATE ROLE username PASSWORD 'password' LOGIN;


Note that the username is not in quotation marks, but the password is.

An alternative method of creating a user account with login permission is to use the CREATE USER statement
with the LOGIN option:

CREATE USER username PASSWORD 'password' LOGIN;


Both methods have the same end result of creating a user account that can log in. (In fact, CREATE USER
assumes LOGIN by default, so the keyword is optional with it.) The LOGIN parameter can be placed anywhere
in the statement; it does not need to be at the end.

 TERM TO KNOW

CREATE ROLE
A statement that creates a new role in a PostgreSQL database.

3. Other Attributes
As you are creating new accounts with CREATE ROLE, you can optionally assign additional attributes. The
SUPERUSER attribute, for example, is one that can override all access restrictions in the database. It is a role
that should be used only when truly needed, as it can basically do anything in a database. It would look like this:

CREATE ROLE adminaccount
SUPERUSER
LOGIN
PASSWORD 'secretpassword';
Another type of role is one that can create other databases in PostgreSQL. For this, the CREATEDB attribute is
needed:

CREATE ROLE dbaccount
CREATEDB
LOGIN
PASSWORD 'securePass1';
The VALID UNTIL attribute can also be used. This enables you to enter a date and time after which the role’s
password is no longer valid, and the user can no longer log in. This can be useful for individuals who may only
work at the company for a short time, such as a contractor.

CREATE ROLE contractaccount
LOGIN
PASSWORD 'securePass1'
VALID UNTIL '2027-01-01';
After January 1st, 2027, the password for the role contractaccount is no longer valid.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the
examples above and see how it works. Next, try creating roles with your own choices of names and attributes.

 SUMMARY

In this lesson you learned about roles, users, and groups, and how they are used to control
authentication and access. Each individual user has their own account, which can be created with the
CREATE ROLE or CREATE USER statement. Rather than assigning access privileges to these individual
user accounts, the best practice is to create group roles and then assign privileges to the groups. You
can create groups with the CREATE ROLE or CREATE GROUP statement.

As you are creating user accounts, you can optionally assign certain other attributes to them. Some of
the common ones you saw in this lesson include PASSWORD, LOGIN, SUPERUSER, CREATEDB, and
VALID UNTIL.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

CREATE ROLE
A statement that creates a new role in a PostgreSQL database.

Group Role
A role that represents a group of users and is used to assign privileges to them.

Login Role
A role that determines the ability of a user to connect to the database.

Privilege
A permission assigned to a role, such as the ability to add or update records.

Role
Entity used for authentication and access control.

CREATE ROLE to Create Groups
by Sophia

 WHAT'S COVERED

This lesson explores using the CREATE ROLE to create groups in a database, in two parts. Specifically,
this lesson will cover:
1. Introduction
2. Exploring the CREATE ROLE Attributes

1. Introduction
As you learned in the previous lesson, the CREATE ROLE statement can be used to create a user or a group,
depending on the parameters you use. This is something unique to PostgreSQL, as other databases typically
have different statements for creating users and groups. The previous lesson explained how to create a login
role for a user, with the LOGIN attribute and a password. In contrast, group roles do not have a LOGIN attribute
or a password. That's because group roles are not for logging in (that is, they are not for authentication).
Instead, they are for accessing resources; users assigned to the group will inherit any privileges that the group
has.

2. Exploring the CREATE ROLE Attributes


You have already seen the CREATE ROLE statement at work and learned about a few attributes, but let's take a
deeper look at it now, including some additional attributes. The structure of the CREATE ROLE command looks
like this:

CREATE ROLE <rolename>
[WITH]
[SUPERUSER]
[CREATEDB]
[CREATEROLE]
[INHERIT]
[LOGIN]
[CONNECTION LIMIT]
[VALID UNTIL]
[IN ROLE];

As you learned in the previous lesson, a role created without the LOGIN attribute serves as a group role to which
you can assign users.

Notice the various attributes in square brackets. As you learned in the previous lesson, this statement offers a
variety of special-purpose options you can include. They are all optional.

 HINT

The angle brackets and square brackets in the syntax shown do not appear in actual usage.
For each of the attributes, there is also a NO version that negates that attribute's use. For example, SUPERUSER
gives a user account the ability to override all access restrictions, and NOSUPERUSER denies the user that
ability.
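For example, this sketch spells out the defaults explicitly for a routine account (the role name and password are hypothetical):

```sql
CREATE ROLE reporting_user
LOGIN
PASSWORD 'reportPass1'
NOSUPERUSER
NOCREATEDB
NOCREATEROLE;
```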

Perhaps you are wondering why you would need to deny an ability; why wouldn't you just not assign it in the
first place? One reason is explicitness: spelling out NOSUPERUSER or NOCREATEDB documents that the
account deliberately lacks the ability. Note also that in PostgreSQL, attributes such as SUPERUSER and
CREATEDB are properties of the individual role and, unlike object privileges, are not inherited through group
membership.

We looked at the SUPERUSER, CREATEDB, VALID UNTIL, and LOGIN options in the prior lesson. As a reminder,
a SUPERUSER role has the ability to override all access restrictions. It can be dangerous, as the role
can drop any object or access any data. Only an account that is itself a SUPERUSER can create a role with the
SUPERUSER attribute.

The CREATEDB attribute simply defines whether the role is able to create databases. Typically, a database administrator
role would benefit from this attribute.

The CREATEROLE attribute allows the user to create new roles and to ALTER and DROP other roles.

The INHERIT attribute enables the role to inherit the privileges of group roles that it is a member of. INHERIT is
the default, so you would not typically specify it. What you might use, however, is NOINHERIT, which prevents
inheritance.

The CONNECTION LIMIT determines how many concurrent connections the role can make. The default is set to
-1, meaning that there is no limit.

IN ROLE lists one or more existing roles to which the new role is immediately added as a member. This could
be a login role or another group role. For example, you could have an executive role that is set up to be part of
a management role so that the executive role gets all of the permissions that the management role has.
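That executive/management arrangement could be sketched as follows, assuming a role named management already exists (both role names are hypothetical):

```sql
CREATE ROLE executive
IN ROLE management;
```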

There are other attributes that can be used with the CREATE ROLE that are very specific to a given scenario
that we will not cover.

We can use the different attributes together in a single CREATE ROLE statement. For example, you could have
created an admin role that can create databases and roles:

CREATE ROLE adminrole
CREATEDB
CREATEROLE;
This would create a group role called adminrole.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of
the examples above and see how it works. Next, try creating group roles with your own choices of attributes.

 SUMMARY

In this lesson's introduction, you learned that the CREATE ROLE command is a powerful tool for
managing user access and permissions within the database system. This command allows
administrators to define roles with specific privileges and attributes, effectively grouping permissions
based on user roles or responsibilities. A role can represent an individual user, a group, or even an
application, enabling efficient and organized access control. With the CREATE ROLE command in
PostgreSQL, administrators can specify login options, password settings, and membership in other
roles. You explored the CREATE ROLE attributes that admins can use for different scenarios, whether
they want to create read-only roles, grant access to specific tables, or manage users with special
privileges. Once roles are defined, users can grant them, nest them within other roles, or use them to
enforce security policies in combination with other access control mechanisms. PostgreSQL's CREATE
ROLE command enables you to manage access rights versatilely and robustly. Streamlining permission
management and enhancing data security are both resulting benefits.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

GRANT to Assign Users
by Sophia

 WHAT'S COVERED

This lesson explores adding users to group roles in a database, in two parts. Specifically, this lesson will
cover:
1. Introduction
2. Examples

1. Introduction
As you learned in the previous lesson, you can create group roles to make it easier to assign privileges to user
roles (that is, user accounts). You have already learned how to create a group role, and in this lesson, you will
learn how to assign users to groups. A user can be a member of as many groups as needed; there is no
limitation.

 KEY CONCEPT

Groups can also be members of other groups. In other words, you can have one group "nested" within
another. For example, you might have a group called Employees, and one of the members of that group
might be another group called Managers. This is a form of multiple inheritance: roles can inherit privileges
from more than one group. PostgreSQL supports this through role membership: as long as the roles involved
have the INHERIT attribute (the default), people in the Managers group would have all the privileges assigned
to Employees plus all the privileges assigned to Managers.
The statement we use to assign users and privileges to a group is called GRANT. In the next section, you will
learn how to use GRANT to assign users to a group, and in the following lesson, you will learn how to use
GRANT to assign privileges.

 TERMS TO KNOW

Multiple Inheritance
In the context of database roles, the ability to inherit attributes or privileges from more than one role.

GRANT
A statement used to grant permission to use or access a role, or to assign privileges to a role.

2. Examples
Let's look at an example of creating a group role and then assigning users to it.

First, we'll create a user role for an account named myadmin:

CREATE ROLE myadmin
LOGIN
PASSWORD 'mypassword';
Next, we'll create a group role called adminrole. As we create it, we use attributes of the CREATE ROLE
statement to assign it two important privileges: CREATEDB and CREATEROLE.

CREATE ROLE adminrole
CREATEDB
CREATEROLE;
Then, we will grant the adminrole role to the myadmin user role using the GRANT command:

GRANT adminrole TO myadmin;


You can add multiple users at once by listing the users separated by commas:

GRANT adminrole TO myadmin1, myadmin2, myadmin3;


Now, suppose we wanted to separate out the admin role into one that could create roles and a separate one
that could create databases:

CREATE ROLE adminrole_cr
CREATEROLE;
CREATE ROLE adminrole_db
CREATEDB;
We can grant permissions separately to myadmin by doing the following:

GRANT adminrole_cr TO myadmin;
GRANT adminrole_db TO myadmin;
The opposite of GRANT is REVOKE. Suppose we now want to take away the permission to create databases
from myadmin. Here's what that would look like:

REVOKE adminrole_db FROM myadmin;


Since both of those roles are group roles, you could grant them to each other, but only in one direction because
the database will not allow you to set up circular membership loops:

GRANT adminrole_cr TO adminrole_db;
or

GRANT adminrole_db TO adminrole_cr;

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the
examples above and see how it works. Next, try granting and revoking group roles of your own.

 TERM TO KNOW

REVOKE
A statement used to revoke a privilege.

 SUMMARY

In this lesson's introduction, you learned how to assign users to a group by using the GRANT
statement. This statement is used to assign both users and privileges.

You explored some examples of creating a user role and a group role, and then assigning the user to
the group. You also learned how to remove a user from a group via the REVOKE statement.

 TERMS TO KNOW

GRANT
A statement used to grant permission to use or access a role, or to assign privileges to a role.

Multiple Inheritance
In the context of database roles, the ability to inherit attributes or privileges from more than one
role.

REVOKE
A statement used to revoke a privilege.

GRANT to Assign Privileges
by Sophia

 WHAT'S COVERED

This lesson explores the GRANT and REVOKE commands to assign privileges to roles, in three parts.
Specifically, this lesson will cover:
1. Introduction
2. Possible Privileges
3. Examples

1. Introduction
Each database object (such as a table, view, or index) has an owner, which is typically the user that created it.
For most object types, only its owner or a superuser can do anything with the object. If you need some other
user or group to be able to interact with the object, you must grant the privileges needed to make that possible.

2. Possible Privileges
There are many types of privileges that you can assign. Each privilege represents a permitted action. Here are
some examples:

SELECT—Allows the role to select from any column or from specific columns listed within a table, view, or
sequence. This privilege would also be required if there is a need to reference existing column values in an
UPDATE or DELETE statement.
INSERT—Allows the role to INSERT a new row into a table. If specific columns are listed, only those
columns may be inserted into, with the other columns automatically being set to their default values.
UPDATE—Allows the role to UPDATE a column or a list of columns in a table. If the UPDATE privilege is
granted, the SELECT privilege should also be granted since it has to reference the table columns to
determine which rows of data should be updated.
DELETE—Allows the role to DELETE a row from a table. If the DELETE privilege is granted, the SELECT
privilege should also be granted since it has to reference the table columns to determine which rows of
data should be removed.
ALL—Grants all available privileges on the specified object.

Other privileges beyond the scope of this course include TRUNCATE, REFERENCES, TRIGGER, CREATE,
CONNECT, TEMPORARY, EXECUTE, and USAGE.

3. Examples
Let us say we have a group role called employee_role. We may want to allow this role to query the employee
table (via SELECT) but not allow it to make any changes to the data. As such, we would run the following:

GRANT SELECT ON employee TO employee_role;


Alternatively, we could grant the employee_role editing and query access to the customer table by specifying
additional allowed statements:

GRANT SELECT, INSERT, UPDATE, DELETE ON customer TO employee_role;


For an admin_role, you might need to grant all privileges. This is done by adding ALL after GRANT:

GRANT ALL ON customer TO admin_role;


If we wanted to grant access to specific columns, we would add those columns in round brackets after the
privilege type. For example, if we wanted to grant UPDATE privileges to the employee_role on the product
table, but only on the unit_price, we would do the following:

GRANT UPDATE(unit_price) ON product TO employee_role;


This grants only the ability to update the price, but none of the other columns.

Now let's say we want to grant SELECT permission for the track table to all users. There is a special group role
in PostgreSQL called public. All roles belong to that group. So, you could use this statement:

GRANT SELECT ON track TO public;
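Privileges are withdrawn with REVOKE, which mirrors the structure of GRANT. For example, using the roles and tables from the examples above:

```sql
-- Remove the column-level UPDATE privilege granted earlier.
REVOKE UPDATE(unit_price) ON product FROM employee_role;

-- Remove every privilege on customer from admin_role.
REVOKE ALL ON customer FROM admin_role;
```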

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the
examples above and see how it works. Next, try granting different privileges on other tables to roles of your
own.

 SUMMARY

In this lesson, you learned how to use GRANT to grant the many types of possible privileges for various
types of access to database objects such as tables and views. The GRANT statement specifies the type
of privilege being granted and the database object to which the privilege is applied.

Instead of assigning privileges directly to users, administrators typically assign privileges to group roles
and then assign those group roles to users, streamlining permission management and ensuring a
consistent access control strategy.

In the examples, you learned how to structure a GRANT statement to assign specific user privileges to
access certain objects in certain ways, based on statement names such as SELECT and UPDATE.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Application Security
by Sophia

 WHAT'S COVERED

This lesson explores security concerns with applications that connect to databases, in three parts.
Specifically, this lesson will cover:
1. Confidentiality, Integrity, and Availability (CIA)
2. SQL Injections
3. Other Types of Security Attacks on Databases
3a. Cross-Site Scripting (XSS)
3b. Weak Authentication
3c. Weak Authorization
3d. Insecure Direct Object References (IDOR)
3e. Insecure Data Transmission
3f. Denial of Service (DoS) Attacks

1. Confidentiality, Integrity, and Availability (CIA)


When planning for robust database security, there are three main concerns to address: confidentiality, integrity,
and availability. These are often referred to as the CIA triad.

Confidentiality focuses on ensuring that the data is protected against unauthorized access or disclosure of
private information. Many organizations have to follow various laws and guidelines around data confidentiality,
like the Health Insurance Portability and Accountability Act (HIPAA) in medicine or the Sarbanes-Oxley Act (SOX)
in the business world. As such, the data stored within databases needs to be classified as highly restricted,
where very few individuals would have access (credit card information, as an example); confidential, where
only certain groups would have access (pay information for employees, as an example); or unrestricted, where
anyone can have access.

Integrity focuses on ensuring that the data is consistent and free from errors. The database itself plays a key
role in data integrity, as we have seen in prior lessons. Protecting sensitive data often involves using encryption.
Through encryption, the underlying data is scrambled, using a key, into a format that cannot be read as is.
There are many forms of encryption with various algorithms. With weaker encryption, you may see attackers
trying to decrypt the data by brute force, which means trying to guess the decryption key through iterative trial
and error.

Availability focuses on the accessibility of the data when authorized users want access to it.

With applications, security vulnerabilities can occur due to many different factors. Individuals can sometimes
take advantage of bugs within the application that connects to the database. Issues can occur when code is
poorly developed or focused purely on functionality and not security. One example is session hijacking, when
an individual takes over a web session from another individual. By doing so, the individual can access certain
personal information from another user that they may not have been able to access otherwise.

 TERMS TO KNOW

CIA Triad
The combination of confidentiality, integrity, and availability that ensures data and systems are fully
protected from security threats.

Confidentiality
The assurance that private data is not accessible to unauthorized persons.

Availability
The assurance that data will be available when it is needed.

Integrity
The assurance that data is consistent and free from errors.

2. SQL Injections
SQL injections are one of the most common web hacking techniques used with applications. They can damage
your database or provide complete unauthorized access to the database if things are not well protected. SQL
injections typically work by attempting to add malicious code into SQL statements through the use of web page
input.

This can occur if the page asks for input like a userid and concatenates the input to the SQL string. For
example, a program may contain something that looks like this:

myinput = readFromInput('userid');
mysql = 'SELECT * FROM users WHERE user_id = ' + myinput;
If the user enters 5 for the userid, this will create a SQL statement like this:

SELECT *
FROM users
WHERE user_id = 5;
That would be the expected result. However, if an attacker entered “5 or 1=1”, the resulting SQL statement
would look like this:

SELECT *
FROM users
WHERE user_id = 5 or 1=1;
If we look at the statement, the 1=1 will always return true, which means that the query would return every single
row within the table. Imagine if the table had usernames, passwords, and other user information. All of that
would now be compromised in this successful SQL injection. Or perhaps the input could look like “5; DROP
TABLE customer;”. The resulting query would look like this:

SELECT *
FROM users
WHERE user_id = 5; DROP TABLE customer;
If this SQL injection is successful, it could potentially drop the table customer, which is also quite problematic.

Different databases handle SQL injection issues slightly differently. To avoid these scenarios, the
application should validate input data before sending the query to the database. We also want to filter
input data to prevent the user from bypassing our validation; by filtering user input, we can check for any
special characters that should not be included. In many applications, the use of SQL parameters
(parameterized queries) can help.
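As a sketch of that last idea, the example below uses Python's built-in sqlite3 driver and a hypothetical users table; PostgreSQL drivers such as psycopg2 expose the same parameter mechanism with %s placeholders. The driver sends the user's input as a bound value, never as SQL text, so the injection attempts shown above cannot change the query's structure:

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(5, "Ana"), (6, "Raj")])

def find_user(user_input):
    # The ? placeholder binds user_input as a value, not as SQL text,
    # so "5 or 1=1" cannot alter the WHERE clause.
    return conn.execute(
        "SELECT * FROM users WHERE user_id = ?", (user_input,)
    ).fetchall()

print(find_user(5))           # [(5, 'Ana')]
print(find_user("5 or 1=1"))  # []  (treated as a literal value, not as SQL)
```

Because the malicious string is compared as a whole literal against user_id, it matches nothing, and the DROP TABLE variant never executes at all.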

 TERM TO KNOW

SQL Injection
A hacking technique that inserts malicious SQL statements into a data entry field in an attempt to
execute the statements on the database.

3. Other Types of Security Attacks on Databases


In addition to SQL injection, there are many other types of attacks that cybercriminals use to try to get
unauthorized access to a database's data. These include:

3a. Cross-Site Scripting (XSS)


This is an attack on a website that inserts dangerous JavaScript code into the page's HTML. It isn't specific to
pages that provide database interfaces, but the scripting can affect the security of the database that the
webpage accesses. Input validation can help minimize the risk of XSS. So can implementing a Content
Security Policy (CSP), which can restrict the sources from which scripts can be loaded.

3b. Weak Authentication


Database systems that allow weak, easy-to-guess passwords are at risk from hacking software that guesses
passwords.

3c. Weak Authorization

Database systems that grant unnecessary privileges to user accounts can be easier to exploit after gaining
entry via stolen or guessed user credentials. Following the principle of least privilege—that is, not allowing users
any more privileges than they need—can restrict the amount of damage an attacker can do if they gain access
through stolen credentials.

3d. Insecure Direct Object References (IDOR)


This occurs when a web-based application exposes internal implementation details, such as database keys or
filenames, in URLs or other parameters. Attackers can manipulate those references to access unauthorized
data.

3e. Insecure Data Transmission


If data is transmitted between the application and the database without encryption, it can be intercepted and
tampered with by attackers. For example, an attacker might capture the stream of data and glean usernames
and passwords from it.
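For example, Python's standard ssl module can supply a TLS context to drivers that support encrypted connections. This is a minimal sketch, not driver-specific code; the point is that the default context already enforces certificate verification, which is what prevents a captured stream from being silently read or impersonated:

```python
import ssl

# The default context verifies the server's certificate and hostname,
# so an attacker cannot simply sit in the middle of the connection.
ctx = ssl.create_default_context()

print(ctx.check_hostname)                    # True: server identity is verified
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: a valid certificate is required
```

Many database drivers accept a context like this one (or equivalent sslmode settings) when opening a connection, so encrypted transmission is usually a configuration choice rather than custom code.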

3f. Denial of Service (DoS) Attacks


Attackers overwhelm the application with a large volume of requests or queries, which degrades the database's
performance and can even render it unavailable.

To mitigate these risks, developers should:

Follow secure coding practices
Implement proper input validation
Enforce least privilege access controls
Enforce strong authentication and authorization mechanisms
Encrypt sensitive data
Regularly update the database application
Conduct security assessments and audits

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the
examples above and see how it works. Next, try your own choices for which columns you want the query to
provide.

 SUMMARY

In this lesson, you learned that CIA stands for confidentiality, integrity, and availability. It represents the
fundamental principles that guide secure systems design and implementation. Confidentiality ensures
that data is only accessible to authorized individuals or entities, preventing unauthorized disclosure.
Integrity guarantees data accuracy and consistency, safeguarding against unauthorized modifications
or tampering. Availability ensures that data and services are accessible when needed, preventing
downtime and ensuring uninterrupted operations. These three principles collectively form the
foundation for creating secure and resilient information systems.

SQL injection is a severe cybersecurity vulnerability in PostgreSQL and other database systems. It occurs
when malicious actors exploit poor input validation to execute arbitrary SQL code: an application fails to
properly sanitize user inputs before incorporating them into SQL queries, so attackers can manipulate
input fields, such as login forms or search bars, to inject malicious SQL statements that the database
server then executes. In PostgreSQL, this could lead to unauthorized access, data theft, data
manipulation, or even a complete database compromise. By understanding SQL injection risks and
implementing preventative measures, developers can help ensure the security and integrity of
PostgreSQL databases and the applications that interact with them.

In addition to SQL injection, you learned there are many other types of security attacks on databases,
involving cross-site scripting (XSS), weak authentication, weak authorization, insecure direct object
references (IDOR), insecure data transmission, and denial of service (DoS) attacks, and briefly covered
ways that developers can mitigate these risks.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Availability
The assurance that data will be available when it is needed.

CIA Triad
The combination of confidentiality, integrity, and availability that ensures data and systems are fully
protected from security threats.

Confidentiality
The assurance that private data is not accessible to unauthorized persons.

Integrity
The assurance that data is consistent and free from errors.

SQL Injection
A hacking technique that inserts malicious SQL statements into a data entry field in an attempt to
execute the statements on the database.

Superusers
by Sophia

 WHAT'S COVERED

This lesson explains why superuser accounts are useful but also a potential security risk, and provides
best practices for minimizing the security risks they pose, in three parts. Specifically, this lesson will
cover:
1. Understanding the Superuser Role
2. Security Risks Posed by Superusers
3. Protecting Against Superuser Misuse
3a. Limit Who Has Superuser Access
3b. Separate Duties
3c. Password Policy
3d. Multi-Factor Authentication
3e. Location-Based Access
3f. Audit Logging
3g. Encryption
3h. Enable Logging and Monitoring
3i. Review Privileges Regularly

1. Understanding the Superuser Role


As you learned in previous lessons, you can create a login user role for each user who needs to access your
database system. That user role can be granted certain privileges and can be included in a group from which it
can inherit privileges.

Whether it is an operating system or a database, every system needs at least one user account that has full
privileges to do everything with every command, function, and resource. That need is met by a special role
called superuser. A superuser can bypass all permission checks and access all types of powerful operations. It
has unrestricted access to everything in the database.

 KEY CONCEPT

A database superuser does not automatically have unlimited access to the operating system of the server
on which the database system runs. Superuser status applies only to activities within the database system.

 TERM TO KNOW

Superuser
An account role that has unlimited access and privilege to every object and activity in the database
system.

2. Security Risks Posed by Superusers


Because of the power of the superuser role, database administrators must take great care to make sure the role
is not misused, either intentionally or by accident. An attacker with access to a superuser account could do
unlimited damage, up to and including permanently deleting the entire database.

EXAMPLE Rather than deleting the database, which would be immediately noticed, an attacker might
act more stealthily.

For example, if an attacker had temporary access to the superuser role in a database, they could create
additional login roles that would enable them to sign in later and cause further damage or steal more
information. Even if the superuser role they were using to breach the system were removed, they could still
have a way into the system via those other roles they created.

They could even remove evidence of their activity within the system. For example, an intruder with a
superuser role could create orders within a system and have items that they did not purchase sent to them.
Once the order has been sent out, they could delete it and any related data so that the system has no
information about that order.

3. Protecting Against Superuser Misuse


Minimizing the security risks associated with a superuser account is essential to maintain the database's
confidentiality, integrity, and availability. Here are some practices to help mitigate these risks.

3a. Limit Who Has Superuser Access


Grant superuser privileges only when absolutely necessary, applying the principle of least privilege. In a large
organization, some IT and database professionals may never—or seldom—need superuser access.

 TERM TO KNOW

Principle of Least Privilege


A security best practice that dictates that each account should have only the privileges it needs for the
user to accomplish their assigned work.

3b. Separate Duties

Avoid using the superuser account for routine tasks. This helps in limiting the scope of potential damage if the
account is compromised.

3c. Password Policy


Enforce strong password policies for superuser accounts. Require complex passwords with a combination of
uppercase and lowercase letters, numbers, and special characters. Change passwords periodically to mitigate
the risk of unauthorized access.
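A minimal sketch of such a policy check is shown below. The 12-character minimum is an assumption for illustration; the exact rules are an organizational choice, and real systems should also reject reused and known-breached passwords:

```python
import re

def strong_enough(pw):
    # Requires length plus all four character classes described above:
    # lowercase, uppercase, digit, and special character.
    return bool(
        len(pw) >= 12
        and re.search(r"[a-z]", pw)
        and re.search(r"[A-Z]", pw)
        and re.search(r"\d", pw)
        and re.search(r"[^A-Za-z0-9]", pw)
    )

print(strong_enough("C0rrect-Horse-Battery"))  # True
print(strong_enough("password"))               # False: too short, no variety
```

A check like this would typically run in the application or identity layer before a superuser password is accepted or rotated.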

3d. Multi-Factor Authentication


Implement multi-factor authentication (MFA) for superuser accounts to add an extra layer of security. For
example, in addition to a username and password, a user might need to enter a code from an authenticator app
on a smartphone or key fob. This can prevent unauthorized access even if the password is compromised.

 TERM TO KNOW

Multi-Factor Authentication (MFA)


A security measure that requires multiple authentication methods for a user to access a system.

3e. Location-Based Access


Limit access to superuser accounts by IP address or network range. Utilize firewall rules or database settings to
restrict connections to computers on the internal local area network or from a certain network address.

3f. Audit Logging


Enable comprehensive audit logging for superuser activities. This allows tracking of all actions performed by
superusers, aiding in identifying any suspicious or unauthorized activities.

3g. Encryption
Encrypt data at rest and in transit to protect sensitive information from unauthorized access. Use SSL/TLS for
secure communication between clients and the database server.

3h. Enable Logging and Monitoring


Configure the database to log all superuser activity and monitor these logs regularly for any suspicious
behavior.

3i. Review Privileges Regularly


Periodically review and audit the privileges assigned to superuser accounts. Remove unnecessary privileges
and roles to minimize the potential impact of a security breach.

By implementing these best practices, you can significantly reduce the security risks associated with superuser
accounts.

 SUMMARY

In this lesson, you developed an understanding of a superuser role as an all-access pass to every part
of a database system. This powerful role can wreak havoc on a database in the wrong hands, so the
existence of superusers poses a security risk. You learned a number of ways to protect against
superuser misuse, including limiting who has superuser access, applying the principle of least
privilege when granting privileges to accounts; separating duties; implementing password policies and
multi-factor authentication to make it harder for an attacker to sign into a superuser account;
leveraging location-based access, limiting access to superuser accounts by IP address or network
range; enabling comprehensive audit logging for superuser activities; encrypting data at rest and in
transit; enabling the logging and monitoring of superuser activity; and reviewing privileges regularly,
removing unnecessary privileges and roles to minimize the potential impact of a security breach.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Multi-Factor Authentication (MFA)


A security measure that requires multiple authentication methods for a user to access a system.

Principle of Least Privilege


A security best practice that dictates that each account should have only the privileges it needs for
the user to accomplish their assigned work.

Superuser
An account role that has unlimited access and privilege to every object and activity in the database
system.

Index Overview
by Sophia

 WHAT'S COVERED

This lesson explores the different types of indexes in databases, in two parts. Specifically, this lesson
will cover:
1. Understanding Indexes
2. Types of Indexes

1. Understanding Indexes
By default, primary keys and columns with the UNIQUE constraint have indexes created for them. You can also
create indexes on any other columns where you think having one might be useful.

The purpose of a database index is to improve the efficiency of data retrieval operations. Based on the values
of one or more columns, it quickly retrieves specific rows from within a database table. The database system
organizes data by creating an index on a column or set of columns to speed up retrieval, search, and filtering.

An index allows the database engine to quickly pinpoint the relevant rows when a query is executed based on
indexed columns, reducing the need to search through the entire table. Large datasets are processed more
quickly as a result, especially when dealing with large and complex queries. Indexes can also be helpful when
running UPDATE and DELETE statements that include filtering criteria.

Even though indexes greatly improve read operations, they can sometimes negatively affect the performance of
data modification operations (such as inserts, updates, and deletes) since the indexes must be updated
whenever the underlying data changes. The right balance between query performance and data modification
efficiency can be achieved by strategically creating indexes for frequently searched and retrieved columns.

EXAMPLE Let us look at a query run on our track table:

SELECT * FROM track WHERE album_id = 5;

If no indexes on the album_id column are created for this table, the database would have to scan row by row
through the entire table to find all of the matching rows. If there were many rows in this table and we only
expected to have a few rows returned, this can be quite an inefficient search. Imagine if there were one million
rows in the table, and this album_id did not even exist. This means that the database would have to search
through all one million rows to identify no matching values. If the database has an index on the album_id
column, however, the database would be able to find the matching data much more efficiently.
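The difference can be sketched with Python's built-in SQLite driver, used here only because it is easy to run; the track table below is a hypothetical stand-in for the one above, and PostgreSQL's EXPLAIN command would show the same contrast:

```python
import sqlite3

# Hypothetical stand-in for the track table discussed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER, name TEXT)"
)

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite plans a full scan or an index search
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM track WHERE album_id = 5"
before = plan(query)
conn.execute("CREATE INDEX idx_track_album_id ON track (album_id)")
after = plan(query)

print(before)  # a SCAN step: every row must be examined
print(after)   # a SEARCH step using idx_track_album_id
```

The same query goes from a full-table SCAN to a targeted SEARCH the moment the index exists, with no change to the SQL itself.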

 KEY CONCEPT

The concept of database indexes is quite similar to the alphabetical index at the end of a book. With a
book, an individual can quickly scan through the index to find the topics they're interested in and what page
to turn to. This is a much faster approach than reading through the entire book to find the content you want.
The usefulness of a book index depends on the person who creates the topic list. Similarly, the database
developer must determine what indexes could be useful.
After an index is created, the database system uses it automatically to assist with data retrieval and updates it
whenever the table data is modified. Indexes can be useful not only in SELECT statements with joins on the
indexed columns but also in UPDATE and DELETE commands that have filtering criteria.

We do not want to index every single column in a database, for two reasons. One is that creating an index can
take a long time. Another is that maintaining the index uses system resources: every INSERT, DELETE, and
UPDATE statement that affects the index uses memory and CPU resources to keep the index in sync with the
table. For that reason, if an index is not frequently used, it should ideally be removed.

 TERM TO KNOW

Index
A database object that improves the speed of data retrieval operations on a database table at the
expense of additional writes and storage space.

2. Types of Indexes
Optimizing and managing databases effectively requires an understanding of different index types.
Understanding each index type's strengths and limitations helps database administrators and developers
make informed decisions when optimizing data access and query performance.

Each index type has its own benefits and drawbacks that make it ideal for certain kinds of searches and not so
ideal for others. Database professionals who understand the types can select the best index type for a situation
based on the nature of the data and the expected query workload. With this proactive approach, queries are
more efficient, data is retrieved more quickly, and database efficiency is improved.

Most databases will implement indexes using one of the following data structures:

Hash Index
Data Types: Suitable for exact match queries.
Queries: Efficient for comparisons using the = operator.
Best For: Columns with uniform distribution and where exact matches are frequent.
This type can handle only simple equality operators, but it does so very well.

B-Tree Index
Data Types: Suitable for almost all data types.
Queries: Ideal for equality and range queries.
Best For: Frequently accessed columns, such as primary keys or columns used in WHERE clauses for
equality comparisons. Columns where most records contain a unique value.
This is the most common type of index. It organizes data in a hierarchical format, like an upside-down
tree. It takes about the same amount of time to access any row, regardless of the index size. For example,
with one million rows, it will take fewer than 20 searches to find the data, compared to one million
searches if the search were done sequentially.

Bitmap Index
Data Types: Suitable for columns with a low cardinality (a small number of distinct values).
Queries: Great for querying Boolean or categorical data.
Best For: Columns with few distinct values (e.g., gender, Boolean flags).

Full-Text Index
Data Types: Designed for text data.
Queries: Ideal for searching within text fields using keywords or phrases.
Best For: Text-heavy columns like article content, product descriptions, etc.

Composite Index
Data Types: Any data types.
Queries: Useful for queries involving multiple columns, especially when those columns are often used
together in WHERE clauses.
Best For: Queries that involve multiple columns in the WHERE clause or involve multiple conditions.

Spatial Index
Data Types: Geospatial data types (points, lines, polygons).
Queries: Optimal for spatial queries, such as finding nearby locations or areas within a certain distance.
Best For: Geospatial data, like maps, GPS coordinates, etc.

 SUMMARY

In this lesson, you understood that an index within a database management system speeds up and
improves data retrieval operations. Indexed columns make it easier to locate specific data rows based
on their values. It minimizes the amount of time it takes to retrieve relevant data by reducing the need
for full table scans. Indexes are especially important for maintaining acceptable query response times
for large dataset databases. As well as maintaining data integrity, database indexes enforce unique
constraints. You learned about the different types of indexes and what data types and query types they
are optimized for. Some of the types you learned about include B-tree, hash, bitmap, spatial, composite,
and full-text.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Index
A database object that improves the speed of data retrieval operations on a database table at the
expense of additional writes and storage space.

B-Tree Index
by Sophia

 WHAT'S COVERED

This lesson explores using B-tree indexes in databases, in four parts. Specifically, this lesson will
cover:
1. Understanding B-Tree Indexes
2. The Benefit of a B-Tree Index
3. When Is a B-Tree Index Useful?
4. When to Use a B-Tree Index

1. Understanding B-Tree Indexes


A B-tree index (short for balanced tree) is a common index type in relational databases.

The B-tree is composed of nodes. A key is a representation of the data you are searching for, and a value
is the data associated with a key. Each node contains a different small subset of all the keys.

The topmost node is called the root node, and all the other nodes are child nodes.

The search algorithm begins at the root node and then uses the keys to navigate through the nodes. At each
node, it looks for the key it is seeking. If it doesn’t find it, it moves to one of that node’s child nodes.

Here’s a very simple example. In the following tree, the root node contains one key: 30. The search begins
there. Is the desired key 30? If yes, then the search is over. If not, it moves to one of the next-level child nodes.
But which one?

Look closely and you’ll see that the numbers in the left-hand node in the second level are less than 30, and the
numbers in the right-hand node are more than 30. So, the search continues by first evaluating whether the
number it is looking for is greater than or less than 30. Then it moves to either the left or right node accordingly.

At the next level, the algorithm evaluates another condition. Is the desired number less than the smaller
number, between the two numbers, or greater than the larger number? Depending on the answer, it selects one
of the nodes at the next level to move to.

The lines that connect one node level to the next are called pointers. The nodes that are between the root
node and the leaf nodes are called internal nodes.

 HINT

The number of keys within a node determines the number of keys at the next child level: It’s always the
number of keys plus 1. For example, in the above diagram, the second-level node on the left contains two
keys and has three child nodes; the one on the right contains three keys and has four child nodes.
And so the algorithm goes, moving downward through the levels, until it finds the desired value. The diagram
above shows only three levels, but an actual B-tree may have more. At the bottommost level, the nodes are
called leaf nodes. The leaf node contains pointers to the table location containing the data being looked up.

B-trees maintain a balanced structure, meaning that all leaf nodes are at the same level. The tree achieves
this balance by redistributing keys between nodes as needed when insertions or deletions occur.

 TERMS TO KNOW

B-Tree Index
A type of index that uses an upside-down tree to create a hierarchical system of nodes that reference
the data being searched.

Node
A logical unit within a B-tree that contains a set of keys and pointers.

Key
A representation of the data being searched for.

Value
The data associated with a key.

Root Node
The node at the top of a B-tree.

Child Node
A node that appears below another node (its parent node) in a B-tree.

Pointer
A connector between one node level and the next.

Internal Node
A node located between the root node and the leaf nodes in a B-tree.

Leaf Node
A node that contains pointers to the table location containing the data being looked up.

Balanced Structure
A B-tree index structure in which all leaf nodes are at the same level.

2. The Benefit of a B-Tree Index


In an indexless search, the database system would need to search the entire table to find the desired value. If
there were 10,000 records, it could take up to 10,000 search operations, going through the table row by row
and comparing the desired value to each record's value.

In contrast, a B-tree search can find any value in far fewer operations. In the above example, there are 20
keys, each one representing a unique value in the database, but finding any key will take a maximum of three
operations, one at each level of the tree. Some keys will take fewer than that to find.
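A back-of-the-envelope sketch of that claim, treating the tree as if each node held a single key (real B-tree nodes hold many keys, so the fan-out is much higher and the true counts are lower still):

```python
import math

# A balanced tree with a given fan-out needs about log_fanout(n) node
# visits to find any key, versus up to n visits for a sequential scan.
def max_node_visits(n, fanout=2):
    return math.ceil(math.log(n, fanout))

print(max_node_visits(10_000))     # 14 visits instead of up to 10,000
print(max_node_visits(1_000_000))  # 20 visits instead of up to 1,000,000
```

With a fan-out of 100 keys per node, which is closer to a real B-tree, a million-row table needs only about 3 node visits, consistent with the "fewer than 20 searches" figure quoted earlier.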

3. When Is a B-Tree Index Useful?


When you add indexes to tables, you generally do not have to worry about the best index type to use because
the database will handle it for you. However, as a database administrator, it is useful to understand the different
index types so you can choose the type that would provide the most benefit depending on the way the table is
used. Let's look at some examples of situations where a B-tree would or would not be very useful.

Let's say you have a query that uses a pattern-matching operator such as LIKE with a constant value that is
anchored at the start of the string. A B-tree index can be a great help with that. For example, we could look for
track names that start with “Wal”:

SELECT *
FROM track
WHERE name LIKE 'Wal%';
Or for customers that have an email address starting with “ro”:

SELECT *
FROM customer
WHERE email LIKE 'ro%';
However, the B-tree index would not be useful if we tried to find information in the middle or at the end of the
string, like tracks that have “at” in the middle of the name:

SELECT *
FROM track
WHERE name LIKE '%at%';
Or customers that have the email with “gmail.com” as the domain name:

SELECT *
FROM customer
WHERE email LIKE '%@gmail.com';
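The prefix-versus-infix contrast can be sketched with Python's built-in SQLite driver (an illustration only; PostgreSQL's planner makes the same distinction, visible via its EXPLAIN command). The track table here is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With case-sensitive LIKE, SQLite can rewrite a prefix pattern such as
# 'Wal%' into an index range search; an infix pattern like '%at%' cannot
# be rewritten that way.
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_track_name ON track (name)")

def plan(sql):
    # Join the human-readable EXPLAIN QUERY PLAN steps into one string
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

prefix_plan = plan("SELECT * FROM track WHERE name LIKE 'Wal%'")
infix_plan = plan("SELECT * FROM track WHERE name LIKE '%at%'")

print(prefix_plan)  # a SEARCH step using the index
print(infix_plan)   # a SCAN step: the index cannot help
```

The anchored pattern becomes a range search over the index, while the unanchored pattern forces the engine to examine every row.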
Other queries on data are based on ranges. For example, you could have open-ended ranges like this:

SELECT *
FROM track
WHERE album_id >= 5;
Or those that have specific ranges that contain values:

SELECT *
FROM track
WHERE album_id >= 5 AND album_id <= 10;
This is the same as if we had the following:

SELECT *
FROM track
WHERE album_id BETWEEN 5 AND 10;
Each of those could potentially benefit from a B-tree index. A B-tree index is well suited for range queries
because it allows for efficient traversal of the index to find all the entries within the range.

SELECT *
FROM track
WHERE album_id <= 5 AND album_id >= 10;

Here we are looking for items with album_id less than or equal to 5 while, at the same time, requiring
album_id to be greater than or equal to 10. Because no album_id can meet both conditions simultaneously,
no rows would be returned. More importantly, this query does not fit the B-tree index well. Even if we use
the OR operator, it will not be as efficient as a single contiguous range:

SELECT *
FROM track
WHERE album_id <= 5 OR album_id >= 10;

4. When to Use a B-Tree Index


A B-tree index is great for instances where you have values that repeat only a few times or are completely
unique. In our PostgreSQL database, for example, the track table's name column may have a few repeated
names, but for the most part, the names on the tracks are all different. As such, a B-tree index would be very
helpful.

A B-tree index is not so great in instances where you have just a few different values. For example, if you are
searching on a Boolean data type field that contains only Yes or No values, a B-tree index would not be of much
use at all because there are not a lot of different values that need to be waded through.

 KEY CONCEPT

When considering any type of index, it's important to weigh the costs and benefits. Because a B-tree index
must maintain a balanced structure, every time a record is inserted or deleted, the tree must be rebalanced,
and keys must move between nodes. This takes processing time. As such, a B-tree index may not be the
best choice for a table that is constantly or frequently gaining or losing records.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the
examples above and see how it works. Next, try your own choices for which columns you want the query to
provide.

 SUMMARY

In this lesson, you learned how B-tree indexes function and how they benefit certain types of queries.
You developed an understanding of B-tree indexes, learning that a B-tree index is composed of nodes
that contain keys and pointers. A key is a representation of data values you are searching for. The
topmost node is the root node. A search begins at the root node and progresses through internal
nodes until it finds the key it is searching for. The bottom layer contains leaf nodes. A B-tree index
remains balanced, such that all leaf nodes are at the same level; it shuffles keys between nodes as
needed to make that happen.

You learned that the benefit of a B-tree index is that it can dramatically decrease the number of search
operations needed to locate data in a table with a large number of records. Then, you learned when a
B-tree is useful by exploring several examples of situations where a B-tree index would or would not
be useful. B-tree indexes are best in situations where records are not frequently being added and
removed, and where the field being searched contains many different values, or even unique values for
each record. When using B-tree indexes with pattern-matching operations, they work best when the
pattern is a constant anchored at the start of the string. Lastly, you examined when to use a B-tree
index, noting that B-tree indexes do not work well when searching for a pattern that is in the
middle or end of a string, or when querying for a Boolean value.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

B-Tree Index
A type of index that uses an upside-down tree to create a hierarchical system of nodes that
reference the data being searched.

Balanced Structure
A B-tree index structure in which all leaf nodes are at the same level.

Child Node
A node that appears below another node (its parent node) in a B-tree.

Internal Node
A node located between the root node and the leaf nodes in a B-tree.

Key
A representation of the data being searched for.

Leaf Node
A node that contains pointers to the table location containing the data being looked up.

Node
A logical unit within a B-tree that contains a set of keys and pointers.

Pointer
A connector between one node level and the next.

Root Node
The node at the top of a B-tree.

Value
The data associated with a key.

Hash Index
by Sophia

 WHAT'S COVERED

This lesson explores the use of hash indexes in databases, in three parts. Specifically, this lesson will
cover:
1. Understanding Hash Indexes
2. Uses for a Hash Index
3. Examples

1. Understanding Hash Indexes


A hash index offers fast lookup times for exact match queries but is not suitable for other types of queries. That
means when you are choosing an index type when indexing a particular column, you must consider the type of
queries you will most often run when deciding if a hash index is the best choice.

A hash function is a math algorithm that takes an input (such as a data value stored in a specific row) and
produces a fixed-size string of characters called a hash value. The same input will always produce the same
hash value.

In hash indexing, every time you enter a value in the column being indexed, the database runs that value
through the hash algorithm for that index and stores the resulting hash value in the hash table, along with a
pointer to the corresponding original value in the table.

When you want to look up a value in the indexed column, the database uses your query input to compute the
hash value for it, using the same hash function. Then it looks up the hash value in the hash table and uses the
associated pointer to quickly jump to the matching row in the dataset.
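The lookup flow just described can be sketched as a toy example in Python; the table data is invented, and Python's built-in hash() stands in for the index's hash function:

```python
# Toy hash index: map hash(value) -> list of row positions in the table.
table = ["Walk On", "Restless and Wild", "Princess of the Dawn", "Walk On"]

hash_index = {}
for row_id, name in enumerate(table):
    # Every inserted value is run through the hash function, and the
    # resulting hash value is stored with a pointer back to the row.
    hash_index.setdefault(hash(name), []).append(row_id)

# Lookup: hash the query input with the same function, then follow the
# stored row pointers instead of scanning the whole table.
query = "Walk On"
matches = [row for row in hash_index.get(hash(query), []) if table[row] == query]
print(matches)  # row positions 0 and 3
```

The final equality check against the actual table value mirrors what a real index does to guard against hash collisions.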

 HINT

One of the nice things about a hash index is that it offers constant-time retrieval. That means the time it
takes to find an item doesn't increase when the size of the collection increases.

 TERMS TO KNOW

Hash Index

An index that stores the hashes of the values of the indexed fields and looks up data by hashing the
query's request and comparing it to the hash values.

Hash Function
An algorithm that computes the hash value of data by performing math operations on it.

Hash Value
The result of running data through a hash function.

2. Uses for a Hash Index


Hash indexing can greatly speed retrieval times for exact match searches because it directly maps the search
key to the data location.

However, hash indexes are valuable only when you are running an equality search—that is, a query with a
WHERE clause that uses an equality operator (an = sign).

The main drawback of a hash index is its specific focus. Hash indexes are not suitable for queries involving
ranges because the hash value represents a single value; it can't represent a range.

It's also not good in situations where multiple data values happen to produce the same hash value, and the
index returns erroneous results. This situation is known as a hash collision.

EXAMPLE Suppose the hash function did something as simple as counting the number of characters in
the value. For example, the name Bob would have a hash value of 3. The problem with that is that every
name with three letters would also have a hash value of 3, like Ann and May. There are internal techniques
for handling collisions, but multiple collisions can degrade the search speed and negate the value of using
the index.
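The character-counting function from that example can be written out directly; note how every three-letter name collides in the same bucket:

```python
def bad_hash(value):
    # Deliberately poor hash function from the example: count the characters.
    return len(value)

# Group some invented names into buckets by their hash value.
buckets = {}
for name in ["Bob", "Ann", "May", "Roberto"]:
    buckets.setdefault(bad_hash(name), []).append(name)

print(buckets[3])  # every three-letter name collides in this bucket
print(buckets[7])  # only 'Roberto' hashes to 7
```

With many collisions, a lookup in bucket 3 must still compare each name one by one, which is exactly the degradation the lesson describes.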

 TERM TO KNOW

Hash Collision
A situation in which multiple data values produce the same hash value, so the index returns erroneous
results.

3. Examples
Let us take a look at instances where a hash index would be ideal:

SELECT *
FROM track
WHERE name = 'Walk On';

SELECT *
FROM track
WHERE album_id = 5;

SELECT *
FROM customer
WHERE country = 'USA';
What do these queries have in common? They all use the equality operator. On the other hand, queries like the
following ones would not use the hash index, as they don’t use the = operator:

SELECT *
FROM customer
WHERE email LIKE 'ro%';

SELECT *
FROM track
WHERE album_id >=5;

SELECT *
FROM track
WHERE album_id >= 5 AND album_id <=10;

 SUMMARY

In this lesson, you developed an understanding of the basics of hash indexes, including how they work
and what they are good for. You learned that a hash index is created by running data values through a
hash function, which is an algorithm that generates a fixed length value and places it in a table cell.
That cell also contains a pointer to the actual data that the hash value represents. When a search is run,
the search request itself is hashed, and then that value is compared to the hash values in the hash
table to locate the matching entry and direct the query to the associated row in the table. When
multiple data values happen to produce the same hash value, it is known as a hash collision. Collisions
can be worked around, but they slow things down, partially or fully negating the value of the index.

You also learned to identify uses for a hash index. Hash indexes are specialized and beneficial only
when running equality queries—that is, queries that use the WHERE clause and an = sign. They are not
good for queries involving ranges. Lastly, you saw some examples of queries that are good or not so
good for hash indexes.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Hash Collision
A situation in which multiple data values produce the same hash value, so the index returns
erroneous results.

Hash Function
An algorithm that computes the hash value of data by performing math operations on it.

Hash Index
An index that stores the hashes of the values of the indexed fields and looks up data by hashing
the query's request and comparing it to the hash values.

Hash Value
The result of running data through a hash function.

Create and Drop Indexes
by Sophia

 WHAT'S COVERED

This lesson explores using the CREATE INDEX and DROP INDEX commands to create and remove
indexes, in two parts. Specifically, this lesson will cover:
1. Creating an Index
2. Dropping an Index

1. Creating an Index
So far in this challenge, you have learned the potential benefits of creating indexes, and about the different
types of indexes you can create. Now it's time to learn how to actually create an index as well as how to drop
(delete) one.

The syntax for creating an index—using the CREATE INDEX statement—is as follows:

CREATE [UNIQUE] INDEX <indexname>
ON <tablename> [USING method] (<columnname>);
Notice the optional UNIQUE clause in that syntax. When you create an index, you can optionally enforce having
unique values in that column. For example, the following statement creates an index on the customer table's
email column and also sets a constraint requiring the values in the email column to be unique for each
record.

CREATE UNIQUE INDEX idx_customer_email
ON customer(email);

If there are already duplicates in that column, the index creation will fail.

In the above example, note that the index name is written as “idx” plus the table name and then the column
name. This naming convention is common but not required; you can name the index anything you like.

The UNIQUE constraint would not be appropriate to use in a column where there are some duplicates—or
might be in the future. For example, if you tried to create an index on the country column in the customer table,
an error would appear if there were already multiple customers in the same country.

CREATE UNIQUE INDEX idx_customer_country
ON customer(country);

You could, however, create an index on the country in general:

CREATE INDEX idx_customer_country
ON customer(country);
By default, PostgreSQL creates a B-tree index. However, PostgreSQL versions 10 and later support
specifying an index type with the USING clause. For example, to use a hash index, you would specify it at
creation like this:

CREATE INDEX idx_customer_country
ON customer USING hash (country);

 TERM TO KNOW

CREATE INDEX
A statement that enables you to create a new index for a column in a table.

2. Dropping an Index
If an index isn't helping with query performance, we typically would want to remove it to ensure it does not
impair performance. The syntax for the DROP INDEX statement is as follows:

DROP INDEX [CONCURRENTLY] [IF EXISTS] <indexname> [CASCADE or RESTRICT or FORCE];


Let's examine the statement's options.

The CONCURRENTLY option allows you to drop the index without blocking concurrent operations on the table.
That's useful for large tables where blocking other operations for the duration of the index drop would not be
acceptable.

The IF EXISTS option prevents an error from occurring if the index does not exist. For example, the following
would execute whether or not the index exists:

DROP INDEX IF EXISTS idx_customer_country;


The CASCADE option automatically drops objects that depend on the index. For example, if there are foreign
key constraints or views dependent on the index, they will be dropped too. For example:

DROP INDEX idx_customer_country CASCADE;
The RESTRICT option prevents the drop operation if there are any dependent objects on the index. RESTRICT
is the default behavior if you do not specify CASCADE or FORCE. For example:

DROP INDEX idx_customer_country RESTRICT;


The FORCE option drops the index without checking if any objects depend on it. Use this option with caution,
as it can lead to data integrity issues.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the
examples above and see how it works. Next, try your own choices for which columns you want the query to
provide.

 TERM TO KNOW

DROP INDEX
A statement that enables you to drop (delete) an index.

 SUMMARY

In this lesson, you learned that the CREATE INDEX statement is used to create an index in a database
table using specific SQL columns. This command enhances data retrieval efficiency by creating a data
structure that improves query performance. With customized indexes, queries involving indexed
columns can be processed more quickly. Based on the data and query patterns, different types of
indexes, such as B-trees, hashes, and bitmaps, offer varying benefits.

A DROP INDEX command, on the other hand, removes an existing index from a database table.
Dropping an index can be necessary when the index becomes redundant due to changes in query
patterns or data characteristics. It can also be required when reclaiming storage space is necessary.
Before dropping indexes, however, it is important to consider their impact on query performance.
Dropping an index might cause queries to execute slower, especially if the dropped index is used in the
query. The database's requirements and performance implications should be thoroughly considered
when dropping an index.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

CREATE INDEX
A statement that enables you to create a new index for a column in a table.

DROP INDEX
A statement that enables you to drop (delete) an index.

Create a Backup
by Sophia

 WHAT'S COVERED

This lesson explores using the command line to back up a PostgreSQL database, in two parts.
Specifically, this lesson will cover:
1. Introduction
2. PostgreSQL Backup Tools

1. Introduction
Depending on the choice of database, backing up a database can take various forms. In most cases, there is a
command line option and a graphical user interface option that functions by using the command line behind the
scenes. These steps are important to save the data and state of the database so that it can be recovered if
there are any issues. There are different types of database backup options, such as full or partial data, or being
able to choose to back up structures.

The tools you will learn about in this lesson perform an operation known as a dump. Dumping is the process of
extracting data or information from a system or application and storing it in a file or memory space for backup,
analysis, or debugging. Dumping is roughly analogous to backing up, but dumps can have uses other than
backups (for example, debugging or analysis) and may not always be comprehensive or suitable for directly
restoring a system or dataset without some type of manipulation.

 TERM TO KNOW

Dumping
The process of extracting data or information from a system or application and storing it in a file or
memory space for backup, analysis, or debugging.

2. PostgreSQL Backup Tools


In PostgreSQL databases, we have the pg_dump and pg_dumpall tools. These will not work within our web
interface tool, as the tool is logged into PostgreSQL already. However, you can test the commands if you have
PostgreSQL installed locally.

 KEY CONCEPT

The pg_dump and pg_dumpall tools do not actually perform the backup by copying the database files.
Instead, they extract the structure and data from the database and represent them in a text format that can
be executed as SQL commands to rebuild the database. This method provides a flexible and efficient way
to back up and restore PostgreSQL databases.
There are a lot of different parameters that can be used. We will explore some of the more common options.

Let us first look at a complete command that will back up the mydb database to a mydb.sql file in the
c:\backup\ folder on a Windows system, using the user adminrole:

pg_dump -U adminrole -W -F p mydb > c:\backup\mydb.sql

Let's break this down:

pg_dump is the command line tool. The operation that pg_dump performs—while we might say it is a type
of backup—is more commonly referred to as a dump.
-U adminrole specifies the user role that will connect to the database. In this case, we are using
adminrole to log in to perform the backup.
-W causes pg_dump to prompt for the password on adminrole, adding some security when including
this command in a script.
-F specifies the output file format. In this case, the p stands for a plain-text SQL script file. Alternatives
include -F t to create a TAR file (an archive format similar to ZIP), -F c for PostgreSQL's custom
compressed format, and -F d for a directory-format archive.
mydb is the database that we want to back up.
> is like an arrow that “points” to the destination for the backup.
c:\backup\mydb.sql is the output backup file name and path that we are backing up to. We could optionally
omit the path and just provide the file name, to output the backup file to the location from which we are
running the command.

Here are some additional options:

-a    Backs up only the data, not the schema. This option is useful only if backing up the data in a
plain-text format.

-s    Backs up only the schema, not the data.

-c    Cleans the database objects using DROP statements first before creating them.

-t <tables>    Backs up only specific tables that match the table name that is passed. For example, -t
employee would only back up the employee table.

-T <tables>    Backs up all tables other than those listed.

-C    Starts the output with a command to create the database and reconnect to the created
database. Using this option, you don't have to create the database first.

 HINT

The pg_dumpall command backs up all of the databases on a server. This is not commonly done, as we
typically want to back up specific databases. It exports all of the databases into a single file, so restoring
from this can be unreliable. It uses the same options as the pg_dump command. The only difference is that
the -W option is not used, as you wouldn’t want to have to type the password for each individual database.
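As a sketch of how a backup command like this might be scripted, the following Python assembles (but does not execute) a pg_dump command line with a date-stamped output file. The role name, database name, and backup directory are hypothetical placeholders; a real script would also handle the password prompt or use a pgpass file.

```python
from datetime import date

def build_pg_dump_command(db, user, backup_dir):
    """Assemble a pg_dump argv list that writes a plain-text dump to a
    date-stamped file. All names here are hypothetical placeholders."""
    outfile = f"{backup_dir}/{db}_{date.today():%Y-%m-%d}.sql"
    # -f names the output file directly, an alternative to shell redirection.
    return ["pg_dump", "-U", user, "-W", "-F", "p", "-f", outfile, db]

cmd = build_pg_dump_command("mydb", "adminrole", "/var/backups")
print(" ".join(cmd))
```

Building the command as a list of arguments, rather than one shell string, avoids quoting problems if the script is later extended to run the command with a process library.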

 TERMS TO KNOW

pg_dump
A utility for backing up a single PostgreSQL database.

pg_dumpall
A utility for backing up all the databases on a server.

 SUMMARY

In this lesson's introduction, you learned that there are different options for backing up a database,
including both command line and graphical interface options.

Next, you learned about two PostgreSQL backup tools, pg_dump and pg_dumpall. pg_dump is a
command line utility that enables users to back up their databases. The program does not actually
perform the backup, but rather it generates a text file containing SQL commands that can be used as a
script to recreate the database's schema and data. Database migration, disaster recovery, and data
preservation are all made easier with the help of this utility.

Depending on the database connection details, users can tailor their backup process by specifying
various options, including the backup file name, the user role initiating the backup (and its password),
and the file format. You can also add options that back up only the data or only the schema, or back up
only certain tables.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Dumping
The process of extracting data or information from a system or application and storing it in a file or
memory space for backup, analysis, or debugging.

pg_dump
A utility for backing up a single PostgreSQL database.

pg_dumpall
A utility for backing up all the databases on a server.

Restore from Backup
by Sophia

 WHAT'S COVERED

This lesson explores using pg_restore to restore data from a dump in the command line, in three parts.
Specifically, this lesson will cover:
1. Getting Started
2. Using psql
3. Using pg_restore

1. Getting Started
There are two commands that can be used to restore database backups.

The psql command will restore plain SQL script files that have been created by pg_dump and pg_dumpall
tools.
The pg_restore command is a utility that restores a PostgreSQL database from an archive that the
pg_dump command created in one of the non-text formats, such as a TAR file. The pg_restore command
runs the commands needed to reconstruct the database as it existed when the dump was created.

 TERMS TO KNOW

psql
A command line tool for restoring a database from a plain-text SQL dump file made using pg_dump or
pg_dumpall.

pg_restore
A command line tool used for restoring databases that were created in non-text formatting using
pg_dump or pg_dumpall.

2. Using psql
Using the psql tool, you can execute the entire SQL script simultaneously. The command will look like the
following:

psql -U adminrole -f backupfile.sql
The above command will log in using the adminrole account (prompting you for its password) and run the
backupfile.sql script file to restore the data. The -f option is used to specify the file name.

Like with the pg_dump command, you can specify options to fine-tune its behavior. For example:

-a will output all of the input lines to the standard output so you can watch the progress of the restore.
-d enables you to specify the database name to which to connect, like -d mydb.
-W forces psql to prompt for a password if needed.

3. Using pg_restore
The pg_restore command restores databases that are in a non-text format created by the pg_dump or
pg_dumpall tools. Using this command, you can restore selected database objects from a backup file,
whether that file contains an entire server's databases or an individual database. This tool can also take a
database backed up from an older version of PostgreSQL and restore it to a newer version.

For example, say we had a backup.tar file that had been created in the same folder. We can restore the
database by passing the archive file name as the last argument:

pg_restore -d mydb backup.tar


We also have options with the pg_restore command:

The -a option will restore the data but not create the schema. This would assume that the schema has
already been created.
-c will clean/drop the database objects before recreating them.
-C (with an uppercase C) will create the entire database before restoring it.
-d <database> specifies the name of the database to connect to and restore into.
-f <filename> writes the generated SQL script to the named file instead of the standard output; note that
for pg_restore, -f names an output file, not the archive being restored.
-s will only create the schema but not restore the data into the database.
With the -t option, we can specify the table name to restore.

 SUMMARY

In this lesson's getting started section, you learned that there are two options for restoring a database
from a pg_dump or pg_dumpall file: psql and pg_restore. You learned how to use psql to restore a
dump that was saved in plain-text format, and you learned about the options available to fine-tune the

process. You also learned how to use pg_restore to restore a dump that was saved in a non-text format
such as a TAR archive, along with the available options.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

pg_restore
A command line tool used for restoring databases that were created in non-text formatting using
pg_dump or pg_dumpall.

psql
A command line tool for restoring a database from a plain-text SQL dump file made using pg_dump
or pg_dumpall.

Backups: Command Line vs. GUI
by Sophia

 WHAT'S COVERED

This lesson explores the differences between backing up and restoring from the command line and
graphical user interfaces (GUI), in three parts. Specifically, this lesson will cover:
1. Introduction
2. GUI Backup
3. Command Line Interface (CLI) Backup

1. Introduction
Both graphical user interface (GUI) and command line interface (CLI) backup tools share most of the same
capabilities. For example, both can back up to a variety of formats (such as a TAR archive or a plain-text file),
and both can specify whether to back up the data only, the schema only, or both. The main difference is the
process for interacting with the tool to execute those requests.

GUI database backups are more user-friendly and suitable for users who prefer visual guidance and simplicity.
On the other hand, command line backups offer greater flexibility, efficiency, and automation capabilities,
making them preferable for experienced users, system administrators, or those who require advanced
customization and scripting support. The choice between GUI and command line interfaces ultimately depends
on the user's preference, expertise, and specific requirements.

2. GUI Backup
Most commercial database software includes a GUI-based backup option to assist less technical users in
completing backup operations. Database administrators can choose whether to use this GUI, to acquire a
different GUI backup application, or to use a command line backup tool.

The following are benefits of a GUI backup system:

Ease of Use: GUIs are typically more user-friendly, especially for those who are not comfortable with
command line interfaces. They often provide buttons and menus that guide users through the backup
process step by step.

Visual Feedback: GUIs provide visual feedback during the backup process, such as progress bars, status
indicators, and confirmation dialogs that let the user know what is happening.
Less Error-Prone: With a GUI, there's little or no typing of commands, so there's less chance of making
syntax errors or mistakes.
Interface Customization: Most GUI backup tools enable the user to customize the user interface to make
the controls needed for a particular database's backup needs readily available.

Here are some drawbacks of a GUI backup system:

More System Resources: a GUI backup system uses more memory because it must maintain the GUI
interface throughout the backup process. The software required for creating the GUI interface also takes
up storage space.
Limited Customization: GUIs may offer limited options for customization compared to command line
interfaces. Users may not have full control over backup parameters and configurations.
Limited Automation: GUI backup tools may offer limited automation capabilities compared to command line
interfaces. While some GUI tools allow backup scheduling, they may not provide the same level of scripting
and automation options.
Dependency on Software: GUI backup solutions often require specific software to be installed and may not
be available across all platforms. Users might need to install additional software or plugins to perform
backups.

3. Command Line Interface (CLI) Backup


Most experienced database administrators prefer the command line interface. For someone who knows the
commands to type and their syntax, typing can be faster than navigating a point-and-click interface. It also offers
more control because the command line tools offer many options.

The following are benefits of a command line interface (CLI) backup system:

Flexibility: Command line interfaces offer greater flexibility and control over the backup process. Users can
specify backup parameters precisely and automate backup tasks using scripts.
Efficiency: CLI backups can be faster and more efficient, especially for experienced users who are familiar
with command syntax and can execute tasks quickly.
Scripting Support: CLI allows for scripting and automation, enabling users to schedule backups, perform
batch operations, and integrate backups into existing workflows or systems.
Platform Independence: Command line tools are often platform-independent, enabling users to perform
backups across different server operating systems without relying on specific GUI software.

Here are some drawbacks of a command line system:


Learning Curve: Command line interfaces typically require users to memorize commands and understand
their syntax, which might be intimidating for beginners.

Less Feedback: A command line backup tool typically provides some feedback to the user as the backup is
progressing, but it may not be as easy to understand, especially for a less experienced user.

 SUMMARY

In this lesson, you learned the benefits and drawbacks of GUI backup systems and command line
interface (CLI) backup systems.

GUI systems are ideal for less experienced users because they provide a point-and-click interface that
does not require memorizing and typing commands. They provide visual feedback and are less error-
prone. However, they may be limited in automation and customization features, and they may not be
available across all platforms.

CLI systems are ideal for experienced administrators. They are fast and efficient (if you know what to
type) and do not depend on installing additional software or plugins. They offer full scripting support,
making them ideal for automating processes on large, complex systems. Their main drawback is the
steep learning curve.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

Backup Rotation Strategies
by Sophia

 WHAT'S COVERED

This lesson explores various database backup strategies, in four parts. Specifically, this lesson will
cover:
1. Introduction
2. Full Backups
3. Differential Backup
4. Incremental Backup

1. Introduction
A backup strategy is a plan that specifies what will be backed up and how often. Choosing a backup strategy
involves weighing the costs and benefits because no strategy is without its pros and cons. A strategy that is
easy to restore from, for example, often requires backups that take a long time to make and consume a lot of
storage space. In contrast, a backup that is quick to perform may be more complicated to manage and may take
longer to restore.

The three main backup strategies are full, incremental, and differential. In this lesson, you will learn about each
one and compare their benefits and drawbacks.

 TERM TO KNOW

Backup Strategy
A plan that specifies what will be backed up and how often.

2. Full Backups
A full backup is the simplest type. It makes a copy of the entire database as of a single point in time.

 HINT

The backup can still be made if there are database transactions running, but the backup operation can slow
down some of those transactions.

A full backup strategy—that is, one where only full backups are made—is best for small databases that don't
take very long to back up. However, if the database is large, a full backup can take hours. Making a full backup
every day (or night) can be time consuming for the IT staff who must monitor and manage it and can use a lot of
processing and memory resources to create.

A company will usually want to keep several previous versions of the full backup, so they can roll back to an
earlier version if there is a serious problem with the database, or if it becomes infected with malware. Therefore,
if the database file is large, and you keep multiple backups of it (for example, one for each day of the week or
month), the storage required can grow very large.

The main advantage of a full backup is the ease of restoring from it because each backup contains everything,
and only one restore operation is required, from one backup file.
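
The retention practice described above—keeping several previous full backups and discarding the oldest—can be sketched as a simple pruning routine. This is a conceptual Python illustration, not a feature of any particular backup tool; the database name and file-naming scheme are hypothetical:

```python
def prune_full_backups(backup_files: list[str], keep: int = 7) -> list[str]:
    """Return the backup files to delete, keeping only the `keep` most recent.

    Assumes date-stamped names like 'sales_db_2025-01-15.dump', so a plain
    lexicographic sort puts the newest files last.
    """
    ordered = sorted(backup_files)
    return ordered[:-keep] if len(ordered) > keep else []

# Ten consecutive daily full backups; keep one week's worth
files = [f"sales_db_2025-01-{day:02d}.dump" for day in range(1, 11)]
print(prune_full_backups(files, keep=7))   # the three oldest dumps
```

The trade-off discussed above is visible here: every retained file is a complete copy of the database, so storage grows quickly as `keep` increases.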

 TERM TO KNOW

Full Backup
A backup strategy where every backup performed backs up all data.

3. Differential Backup
A differential backup strategy is one that makes a full backup only once a week, or perhaps less often. On the
in-between days, it backs up only the data that has changed since the last full backup. It backs up everything
that has changed since then, even if that data has been backed up on a previous day's differential backup.

 HINT

How does the backup utility know what has changed? It looks at the timestamps on the files or transactions
and compares them to the timestamp on the last full backup file.
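
The timestamp comparison described in the hint can be sketched as follows. This is a conceptual Python illustration, not actual backup-utility code; the file names are hypothetical and modification times are expressed as day numbers, with the last full backup at day 0 (Sunday):

```python
def differential_set(files: dict[str, int], last_full_time: int) -> list[str]:
    """Pick every file changed after the last FULL backup (cumulative)."""
    return sorted(name for name, mtime in files.items()
                  if mtime > last_full_time)

# orders.dat changed Monday (day 1), customers.dat changed Wednesday (day 3),
# archive.dat has not changed since the full backup (day 0)
files = {"orders.dat": 1, "customers.dat": 3, "archive.dat": 0}

# Wednesday's differential backup includes BOTH Monday's and Wednesday's
# changes, because each differential is measured against the last full backup.
print(differential_set(files, last_full_time=0))
```

Note how the selection is always relative to the full backup's timestamp, which is what makes each differential backup cumulative.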

 TERM TO KNOW

Differential Backup
A backup strategy that runs a full backup only on certain days and then backs up only the data that has
changed since the last full backup.

4. Incremental Backup
An incremental backup is similar to a differential one, except that each incremental backup contains the data
that has changed since the last backup of ANY kind, not just since the last full backup. It is not cumulative.

For example, suppose we run a full backup on Sunday and then run incremental backups on the other days of the
week. Monday's incremental backup contains only the changes made on Monday. Tuesday's incremental backup
contains only the changes made on Tuesday, and so on.

Now suppose the server's storage fails on Saturday. Restoring the database will take quite a while compared to
the other strategies because we must do a full restore from Sunday's data, and then incremental restores for
Monday through Friday.
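
The restore sequence in that scenario can be sketched in Python. This is a conceptual illustration, not a DBMS feature; the day labels and backup schedule are hypothetical:

```python
def restore_chain(backups: list[tuple[str, str]], failure_day: str) -> list[str]:
    """List the backups to restore, in order: the last full, then each
    incremental made after it.

    `backups` is a chronological list of (day, kind) pairs, where kind is
    'full' or 'incremental'.
    """
    days = [day for day, _ in backups]
    usable = backups[: days.index(failure_day)]   # backups made before the failure
    last_full = max(i for i, (_, kind) in enumerate(usable) if kind == "full")
    return [day for day, _ in usable[last_full:]]

week = [("Sun", "full")] + [(d, "incremental")
                            for d in ("Mon", "Tue", "Wed", "Thu", "Fri")]
print(restore_chain(week + [("Sat", "incremental")], failure_day="Sat"))
# → ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri']
```

Six separate restore operations are needed—one full plus five incrementals—which is exactly why this strategy is the slowest to recover from.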

The main advantage of an incremental backup is that it takes the least amount of time and the least storage
space. The main disadvantage is that it takes the longest to restore. Moreover, if any of the incremental backups
are damaged, the incremental backup files that follow them are not useful because we are missing some
changes.

 TERM TO KNOW

Incremental Backup
A backup strategy that runs a full backup only on certain days and then backs up only the data that has
changed since the last backup of any kind.

 SUMMARY

In this lesson's introduction, you learned that there are three primary backup methods in the context of
databases: full, differential, and incremental.

A full backup captures an entire database's content, including its schema, data, and any other
associated objects. This provides a comprehensive snapshot of the database at a particular point in
time. Although full backups ensure complete data recovery, they are resource intensive and time
consuming, especially for large databases.

Differential backups capture changes made since the last full backup. Each differential backup takes
less time and uses less storage space than a full backup would. The drawback is that if a restore
operation is required, you must restore from the most recent full backup and then from the most recent
differential backup.

Incremental backups capture changes made since the last backup of any type. Its advantage is that it
takes the least time and storage space to complete. Its disadvantage is that when a restore is needed,
it takes the most time to restore. It must restore from the last full backup and then every incremental
backup since then.

Each backup method offers different trade-offs between speed of backup, storage requirements, and
ease of restoration.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Backup Strategy
A plan that specifies what will be backed up and how often.

Differential Backup
A backup strategy that runs a full backup only on certain days and then backs up only the data that
has changed since the last full backup.

Full Backup
A backup strategy where every backup performed backs up all data.

Incremental Backup
A backup strategy that runs a full backup only on certain days and then backs up only the data that
has changed since the last backup of any kind.

Migrating Databases in an Organization
by Sophia

 WHAT'S COVERED

This lesson explores various issues with database migrations, in two parts. Specifically, this lesson will
cover:
1. Introduction
2. Planning and Executing a Data Migration

1. Introduction
As technology evolves, organizations often find themselves needing to migrate (move) their databases to newer
systems to keep pace with changing requirements, improve performance, and/or enhance scalability. Data
migration is a complex process that requires careful planning and execution to ensure a smooth transition with
minimal disruption to business operations.

 TERM TO KNOW

Data Migration
The process of moving data from one database system or platform to another.

2. Planning and Executing a Data Migration


The following steps outline the processes and considerations for a successful data migration. While every
migration situation is different, the basic concerns and precautions are always the same: We need to safeguard
the data's confidentiality, integrity, and availability, as you learned when studying the CIA triad earlier in the
course. In other words, you want to make sure that the data is protected against unauthorized access, complete
and uncorrupted, and available to the people and systems that need it.

Step 1: Define Objectives and Scope. Make sure that all the stakeholders in the organization understand what is
going to happen, and why. Clearly define the reasons for the migration, such as upgrading to a newer database
version, consolidating multiple databases, or moving to a cloud-based solution. Define the scope by specifying
which datasets, tables, or applications will be migrated and any dependencies or constraints that must be
considered.

Step 2: Assess Data Quality and Compatibility. If there is no data integrity to start with, you won't be able to fairly
assess its integrity post-migration. Therefore, pre-migration is a great time to fix any problems with the data.
Identify and correct any inconsistencies, duplicates, or other anomalies that may affect the migration process.
Ensure that the target database system is compatible with the existing data formats, structures, and schemas.

Step 3: Choose Migration Tools and Technologies. Some database management systems include their own
migration tools. You can also utilize third-party migration software and custom scripts or code. Select the
appropriate migration tool(s) based on the characteristics of your database environment, such as size,
complexity, and performance requirements. Consider factors such as data volume, how much downtime is
acceptable during the changeover, and how compatible the source and target systems are.

Step 4: Develop Migration Plan and Timeline. Create a detailed migration plan that outlines the sequence of
tasks, dependencies, and timelines for each phase of the process. Define the milestones and success criteria
you will use to track the progress. The plan should be detailed and practical, explaining how you will handle
data extraction, transformation, loading processes, testing, validation, and post-migration activities.

 HINT

The migration plan should also include a plan for keeping data confidential at every phase of the process—
the “C” in the CIA triad. Data being migrated should be encrypted both while in motion and while at rest,
and the storage media should be physically inaccessible to unauthorized persons. If the data must be sent
via an unsecure network such as the internet, plan to use a virtual private network (VPN) to create a secure
tunnel through which data will travel.

Step 5: Perform Pre-Migration Testing and Validation. Before performing the actual migration, you should test
the planned migration process so you can identify and correct any potential issues. Using a copy of the data (or
a subset of a copy), test the proposed data extraction, transformation, and loading procedures to ensure data
integrity. Validate the migrated test data against source data to verify consistency. You should also test the new
system's functionality and performance under different scenarios to ensure that not only is the data intact, but it
can be accessed in all the ways that are required.
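
One simple validation technique is to compare row counts and order-independent checksums between the source table and the migrated copy. The Python sketch below illustrates the idea on hypothetical in-memory rows; in a real migration you would pull the rows (or aggregate hashes) from each database system:

```python
import hashlib

def table_fingerprint(rows: list[tuple]) -> tuple[int, str]:
    """Return (row count, order-independent checksum) for a table's rows."""
    digest = hashlib.sha256()
    for row in sorted(map(repr, rows)):   # sort so row order doesn't matter
        digest.update(row.encode())
    return len(rows), digest.hexdigest()

source = [(1, "Ada"), (2, "Grace")]
migrated = [(2, "Grace"), (1, "Ada")]     # same rows, different physical order

# Matching fingerprints give confidence that the data survived intact
assert table_fingerprint(source) == table_fingerprint(migrated)
print("validation passed: counts and checksums match")
```

Sorting before hashing is the key design choice here: the target system may store or return rows in a different order, and that alone should not fail the validation.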

Step 6: Create a Backup. As the final pre-migration step, create a full backup of the database. That way, if
anything goes wrong, the data can be restored to its original state—even if there's a problem with one of the
backups. It is a best practice to wait until just before the actual migration to make this full backup so that the
backup will be as up to date as possible.

Step 7: Execute the Migration Plan. As the migration plan executes, monitor the process closely, tracking key
metrics such as data transfer rates, error rates, and resource utilization. Address any issues or bottlenecks
promptly, adjusting plans or procedures as needed. Provide status updates to stakeholders regularly to
maintain transparency and manage their expectations.

Step 8: Conduct Post-Migration Activities. Once the migration is complete, perform validation checks to ensure
that migrated data matches the source data accurately. Test application functionality to confirm that all business
processes are functioning as expected. Address any post-migration issues or discrepancies promptly to
minimize any disruption to business operations.

 HINT

Do not decommission the old system until you are quite confident in the new one—and it may take a month
or more for that confidence because some activities may only be performed once a month. In some cases,
an IT department may want to keep an old system active for even longer, in case of a disaster with the new
system.

Step 9: Document Lessons Learned and Best Practices. Document lessons learned, challenges encountered,
and best practices identified during the migration process. Record key insights and recommendations for future
migrations. Create a comprehensive migration report summarizing the migration process, its outcomes, and any
recommendations you have for improving the process next time.

Remember that each data migration project is unique, and careful planning, collaboration, and execution are
essential for achieving desired outcomes. With proper preparation and execution, organizations can leverage
data migration as an opportunity to modernize their database infrastructure and drive business innovation and
growth.

 TERM TO KNOW

Virtual Private Network (VPN)
A software-based tunnel within a larger network that data can pass through securely, even if the larger
network is not secure.

 SUMMARY

In this lesson's introduction, you learned that data migration is a complex process that requires
careful planning and execution. You then learned a nine-step process for planning and executing a data
migration.

The steps of this process are:


1. Define objectives and scope.
2. Assess data quality and compatibility.
3. Choose migration tools and technologies.
4. Develop a migration plan and timeline.
5. Perform testing and validation.
6. Create backups.
7. Execute the migration plan.
8. Conduct post-migration activities.
9. Document lessons learned and best practices.

If you follow these steps to prepare and execute a data migration plan, you can maximize the likelihood
of a successful, error-free migration process.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024)
FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Data Migration
The process of moving data from one database system or platform to another.

Virtual Private Network (VPN)
A software-based tunnel within a larger network that data can pass through securely, even if the
larger network is not secure.

Terms to Know
ACID
An acronym for the four properties that ensure data validity: Atomicity, Consistency, Isolation,
and Durability. Transactions are database operations that satisfy the ACID properties.

Atomicity
The quality of indivisibility. In SQL database terms, atomicity means the transaction cannot be
partially executed: Either all statements are executed or none of them are.

Availability
The assurance that data will be available when it is needed.

B-Tree Index
A type of index that uses an upside-down tree to create a hierarchical system of nodes that
reference the data being searched.

BEGIN
A statement that marks the beginning of a transaction.

Backup Strategy
A plan that specifies what will be backed up and how often.

Balanced Structure
A B-tree index structure in which all leaf nodes are at the same level.

CIA Triad
The combination of confidentiality, integrity, and availability that ensures data and systems
are fully protected from security threats.

COMMIT
A statement that marks the end of a transaction and makes the transaction's results
permanent.

CREATE INDEX
A statement that enables you to create a new index for a column in a table.

CREATE ROLE
A statement that creates a new role in a PostgreSQL database.

Child Node
A node that appears below another node (its parent node) in a B-tree.

Confidentiality
The assurance that private data is not accessible to unauthorized persons.

Consistency
The quality of something being done the same way in all cases.

Consistent State
A state in which all of the data integrity constraints on the database are satisfied.

DROP INDEX
A statement that enables you to drop (delete) an index.

Data Migration
The process of moving data from one database system or platform to another.

Differential Backup
A backup strategy that runs a full backup only on certain days and then backs up only the
data that has changed since the last full backup.

Dumping
The process of extracting data or information from a system or application and storing it in a
file or memory space for backup, analysis, or debugging.

Durability
The quality of continuing to exist without being undone or removed.

Full Backup
A backup strategy where every backup performed backs up all data.

GRANT
A statement used to grant permission to use or access a role, or to assign privileges to a role.

Group Role
A role that represents a group of users and is used to assign privileges to them.

Hash Collision
A situation in which multiple data values produce the same hash value, so the index returns
erroneous results.

Hash Function
An algorithm that computes the hash value of data by performing math operations on it.

Hash Index
An index that stores the hashes of the values of the indexed fields and looks up data by
hashing the query's request and comparing it to the hash values.

Hash Value
The result of running data through a hash function.

Incremental Backup
A backup strategy that runs a full backup only on certain days and then backs up only the
data that has changed since the last backup of any kind.

Index
A database object that improves the speed of data retrieval operations on a database table
at the expense of additional writes and storage space.

Integrity
The assurance that data is consistent and free from errors.

Internal Node
A node located between the root node and the leaf nodes in a B-tree.

Isolation
The quality of being inaccessible until certain conditions are met.

Key
A representation of the data being searched for.

Leaf Node
A node that contains pointers to the table location containing the data being looked up.

Login Role
A role that determines the ability of a user to connect to the database.

Multi-Factor Authentication (MFA)
A security measure that requires multiple authentication methods for a user to access a
system.

Multiple Inheritance
In the context of database roles, the ability to inherit attributes or privileges from more than
one role.

Node
A logical unit within a B-tree that contains a set of keys and pointers.

Pointer
A connector between one node level and the next.

Principle of Least Privilege
A security best practice that dictates that each account should have only the privileges it
needs for the user to accomplish their assigned work.

Privilege
A permission assigned to a role, such as the ability to add or update records.

REVOKE
A statement used to revoke a privilege.

ROLLBACK
An operation that returns the database to some previous state.

Role
Entity used for authentication and access control.

Root Node
The node at the top of a B-tree.

SQL Injection
A hacking technique that inserts malicious SQL statements into a data entry field in an
attempt to execute the statements on the database.

Superuser
An account role that has unlimited access and privilege to every object and activity in the
database system.

Transaction
A sequence of statements that are performed as a group; if one fails to execute, they all fail.

Value
The data associated with a key.

Virtual Private Network (VPN)
A software-based tunnel within a larger network that data can pass through securely, even if
the larger network is not secure.

pg_dump
A utility for backing up a single PostgreSQL database.

pg_dumpall
A utility for backing up all the databases on a server.

pg_restore
A command line tool used for restoring databases that were created in non-text formatting
using pg_dump or pg_dumpall.

psql
A command line tool for restoring a database from a plain-text SQL dump file made using
pg_dump or pg_dumpall.


You might also like