[go: up one dir, main page]

0% found this document useful (0 votes)
24 views14 pages

SQL Lec 01 03

Lecture note

Uploaded by

snsfaojia
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views14 pages

SQL Lec 01 03

Lecture note

Uploaded by

snsfaojia
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

SQL for Data Science

Lecture 2-3
Md. Abdul Basit

Lecturer of Applied Statistics


Institute of Statistical Research and Training
University of Dhaka

18 Sept 2023
Introduction to Databases

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 2


Databases
A database is nothing more than a set of related information. A telephone book, for example, is a database of the
names, phone numbers, and addresses of all people living in a particular region. While a telephone book is
certainly a ubiquitous and frequently used database, it suffers from the following:

Finding a person’s telephone number can be time consuming, especially if the telephone book contains a large
number of entries (slow data retrieval rate).

A telephone book is indexed only by last/first names, so finding the names of the people living at a particular
address, while possible in theory, is not a practical use for this database (limited indexing capabilities).

From the moment the telephone book is printed, the information becomes less and less accurate as people
move into or out of a region, change their telephone numbers, or move to another location within the same
region (outdated information).

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 3


Databases
The same drawbacks attributed to telephone books can also apply to any manual data storage system, such as
patient records stored in a filing cabinet.

Because of the cumbersome nature of paper databases, some of the first computer applications developed were
database systems, which are computerized data storage and retrieval mechanisms.

Because a database system stores data electronically rather than on paper, a database system is able to retrieve
data more quickly, index data in multiple ways, and deliver up-to-the-minute information to its user community.

Early database systems managed data stored on magnetic tapes. While these database systems were a significant
improvement over paper databases, they are a far cry from what is possible with today’s technology.

Modern database systems can manage petabytes of data, accessed by clusters of servers each caching tens of
gigabytes of that data in high-speed memory.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 4


Relational Databases
A relational database, or relational database management system (RDMS), stores information in tables. The term
relational database was first used in 1970 by E.F. Codd at IBM in his research paper “A Relational Model of Data for
Large Shared Data Banks.”

Often, the tables in a relational database have shared information between them, causing a relationship to form
between tables. This is where a relational database gets its name from.

In a relational database, a table uses columns to define the information being stored and rows for the actual data.
Each table will have a column that must have unique values—known as the primary key. This column can then be
used in other tables, if relationships are to be defined between them. When one table’s primary key is used in
another table, this column in the second table is known as the foreign key.

Historically, the most popular relational databases have been Microsoft SQL Server, Oracle Database, MySQL and IBM
DB2. Several free versions of these RDBMS platforms have gained popularity over the years, such as PostgreSQL,
SQLite, MariaDB, etc.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 5


Relational Databases
Imagine you run an online business. You have a variety of information that you store, like customer information,
order information, and products. In a relational database, this would be stored in different tables with a key to join
the tables when needed.

Here, the customer table stores the basic


customer information, order id and address
id.
If someone needs more information on the
address of the customers, they can join the
customer table with the address table using the
key ORDER ID.

The order table in turn has product ids of the


product items in the order. The details of the
product are in a separate product table. This
makes information organized and more
structured.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 6


Non-relational Databases
A non-relational database, sometimes called NoSQL (Not Only SQL), is any kind of database that doesn’t use the
tables, fields, and columns structured data concept from relational databases.

Instead, the non-relational database uses a storage model optimized for specific requirements of the type of data
being stored.

Some of the more popular NoSQL databases are MongoDB, Apache Cassandra, Redis, Apache HBase, etc.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 7


SQL
Structured query language (SQL) is a programming language for storing and processing information mostly in a
relational database.

Along with Codd’s definition of the relational model in 1970, he proposed a language called DSL/Alpha for
manipulating the data in relational tables.

Shortly after Codd’s paper was released, IBM commissioned a group to build a prototype based on Codd’s ideas.
This group created a simplified version of DSL/Alpha that they called SQUARE. Refinements to SQUARE led to a
language called SEQUEL, which was, finally, shortened to SQL.

While SQL began as a language used to manipulate data in relational databases, it has evolved to be a language for
manipulating data across various database technologies.

SQL goes hand in hand with the relational model because the result of an SQL query is a table (also called, in this
context, a result set). Thus, a new permanent table can be created in a relational database simply by storing the
result set of a query.

Similarly, a query can use both permanent tables and the result sets from other queries as inputs (to be explored
later in this course).

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 8


Some Terminology
The following table shows the terms we use for the remainder of the course along with their definitions.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 9


SQL Statement Classes
The SQL language is divided into several distinct parts: the parts that we explore in this course include--

SQL schema Statements: used to define the data structures stored in the database

SQL data Statements: used to manipulate the data structures previously defined using SQL schema
statements

For example, to create a new table in your database, you would use the SQL schema statement create table,
whereas the process of populating your new table with data would require the SQL data statement insert.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 10


SQL Statement Classes
To give you a taste of what these statements look like, here’s an SQL schema statement that creates a table called
corporation in the testdb database:

CREATE TABLE testdb.corporation


(corp_id SMALLINT,
name VARCHAR(30),
CONSTRAINT pk_corporation PRIMARY KEY (corp_id)
);

This statement creates a table with two columns, corp_id and name, with the corp_id column identified as the
primary key for the table.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 11


SQL Statement Classes
Next, here's an SQL data statement, that inserts a row into the corporation table for Acme Paper Corporation:

INSERT INTO testdb.corporation (corp_id, name)


VALUES (27, 'Acme Paper Corporation');

This statement adds a row to the corporation table with a value of 27 for the corp_id column and a value of
'Acme Paper Corporation' for the name column.

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 12


SQL Statement Classes
Finally, here’s a simple select statement to retrieve the data that was just created:

SELECT
name, id
FROM
testdb.corporation;

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 13


Thank You

Md. Abdul Basit | SQL for Data Science | Lecture 2-3 14

You might also like