[go: up one dir, main page]

0% found this document useful (0 votes)
58 views32 pages

My SQLDay 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
58 views32 pages

My SQLDay 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 32

MySQL/PHP Workshop

http://www.trii.org/courses/mysql/
• 2 MySQL lectures
• 2 PHP lectures
• Each lecture builds on concepts taught and learned in the previous
lectures.
• The first two lectures discuss the concept of a relational database
such as MySQL and show you how to manipulate the data stored in
the database from the command line. It is essential to learn this
first because PHP makes use of the language of the database.
• The third and fourth lectures will introduce you to PHP, a server-
side scripting language that allows you to interact with the MySQL
database from a web browser and create fancy web pages to
display the data. PHP is the go-between that fetches the data
from the MySQL database and then spits it out dynamically as the
nicely formatted HTML page that the browser expects.
Class #1: Introduction to MySQL
Manda Wilson, BIC, cBio, MSKCC
Trii Training: Bioinformatics and Computational Methods

• Relational databases
• Database design
• SQL
o Creating databases
o Creating tables
o Selecting from, deleting, and updating tables
• Exercises

• You should have


o Class notes
o Exercise handout
o MySQL Pocket Reference

• Useful resource: http://dev.mysql.com/doc/


Relational Databases

• A database is a collection of tables


• Columns define attributes of the data
o All data in a column must have the same data type
• A record is stored in a row
table name
Employees
First Name Last Name Phone
Nadia Li 2687
Madhu Charu 7856

row Ajuma
Wade
Kinsaka
Randal
4489
5257
Helen Clark 2147

column
Use a Relational Database When…

• You have a very large dataset


• There is redundant data
o Wastes disk space
o Increases errors
• Information must be updated in multiple
locations
• Security is important
o Different users can be granted different
permissions
• Strict enforcement of data types is important
Spreadsheet Example
Title Author Borrower Phone
A House for Mr. Biswas VS Naipaul Sarah 646.555.1234
Midnight's Children Salman Rushdie
On the Road Jack Kerouac
Spreadsheet Example
Title Author Borrower Phone
A House for Mr. Biswas VS Naipaul Sarah 646.555.1234
Midnight's Children Salman Rushdie
On the Road Jack Kerouac

One Flew Over the Cuckoo's Nest Ken Kesey Sarah 646.555.1244
Sula Toni Morrison
Villette Charlotte Bronte Jim 646.555.4586

Data is inconsistent!

Now imagine you are designing the New York Public


Library database which has tens of million books and
well over a million cardholders.
Database Design
Entity Relationship Design

Entity (“thing”, “object”) Table

Attributes (describe entity) Columns

Entity Instance Row

Relationships between entities preserved in relationships


between tables.

If you are interested in learning more formal design


methods look up “normalization” and/or “third normal
form”.
Our data
Ensembl Gene ID Symbol / Chromo Start End Position LocusLink Taxonomy Common Species
Name some Position (bp) ID ID Name
(bp)
ENSG00000186891.3 TNFRSF18 1 1044947 1048147 8784 9606 human Homo sapiens
ENSG00000078808.4 CAB45 1 1058370 1073469 51150 9606 human Homo sapiens
ENSG00000176022.1 B3GALT6 1 1073703 1076476 126792 9606 human Homo sapiens
ENSG00000160087.5 UBE2J2 1 1095352 1115292 118424 9606 human Homo sapiens
ENSG00000162572.4 SCNN1D 1 1123634 1133467 6339 9606 human Homo sapiens
ENSG00000162576.4 MGC3047 1 1194130 1199973 84308 9606 human Homo sapiens
ENSG00000175756.3 AKIP 1 1215168 1216641 54998 9606 human Homo sapiens
ENSG00000131586.2 MRPL20 1 1288703 1294063 55052 9606 human Homo sapiens
ENSG00000179403.2 WARP 1 1322311 1327547 64856 9606 human Homo sapiens
ENSG00000160072.5 ATAD3B 1 1358611 1396091 83858 9606 human Homo sapiens
ENSG00000008128.5 CDC2L2 1 1582617 1604060 985 9606 human Homo sapiens
ENSG00000169911.4 SLC35E2 1 1611978 1625728 9906 9606 human Homo sapiens
ENSG00000008130.3 FLJ13052 1 1630975 1659805 65220 9606 human Homo sapiens
ENSG00000078369.3 GNB1 1 1665027 1770792 2782 9606 human Homo sapiens
ENSMUSG00000041954.1 TNFRSF18 4 154139702 154142251 21936 10090 mouse Mus musculus

What entities or “objects”


ENSMUSG00000023286.1 UBE2J2 are 4defined here?
154057210 1540722964 140499 10090 mouse Mus musculus

Is there any redundant data?


What happens if we want to add another species attribute (e.g. genus)?
Our tables

QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.

How do we know which organism a gene belongs to?

Do we have unique identifiers?

Can the organism have more than one gene?

Can the gene have more than one organism?


Our tables

QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.

QuickTime™
TIFF (LZW) and Means
a
decompressor
each gene has “one and only one”
are needed to see this picture.
organism, but that each organism can
have more than one gene.

This is an example of an entity relationship diagram.


Our tables

QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.

• A primary key is a unique identifier for a record in a


table.
• A foreign key in one table refers to the primary key of
another.

• The gene table has a foreign key, Organism_ID. Each


record in the gene table will have an organism id to link it
to the correct species record in the organism table.
Database Design Caveat

• Sometimes design is “sacrificed” for


speed.
Data Types
• float
• integer
• tinyint
• varchar(size)
o stores strings
o size can be between 0 - 255, inclusive

• datetime
• + more What data types should our attributes (columns) be?

QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
Complete Design
Gene
Column Data Type
Database name: ensmartdb gene_id integer
ensembl_gene_id varchar(50)
organism_id integer
name varchar(35)
locuslink varchar(10)
chromosome tinyint
chromo_start integer
chromo_end integer
description varchar(255)
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture. Organism
Column Data Type
organism_id integer
taxonomy_id integer
common_name varchar(35)
species varchar(35)
Connecting to MySQL from the Command
Line

mysql -uusername -p

Example:
>mysql -uroot

To EXIT MySQL:
EXIT;
Basic SQL Commands

• SQL statements end with a semicolon


• View databases
SHOW DATABASES;
Importing a Database

• Creating a database
CREATE DATABASE trii;
• From the command line:
mysql -uusername -ppassword databasename <
filename.sql
• Example:
o mysql -uroot trii < trii.sql
Basic SQL Commands

• Use database databasename


USE databasename;

• Display all tables in a database


SHOW TABLES;
Create Table
database name

CREATE TABLE organism (


organism_id INTEGER NOT NULL AUTO_INCREMENT,
column taxonomy_id INTEGER NOT NULL,
names common_name VARCHAR(35) NOT NULL,
species VARCHAR(35) NOT NULL,
PRIMARY KEY (organism_id),
UNIQUE (taxonomy_id)
);
View column details for a table

DESC tablename;
Selecting all data

SELECT * FROM tablename;


Select only the columns you need (it will
be faster)

SELECT common_name, species


FROM organism;
Limiting your data

• Get the species name for a specific


organism (you have the id)
SELECT species
FROM organism
WHERE organism_id=1;

How do we select all the gene names for chromosome 1?


Insert

• Inserting a gene
INSERT INTO gene
(ensembl_gene_id,
organism_id,
name,
chromosome,
chromo_start,
chromo_end) VALUES (‘MY_NEW_GENE’,
1, ‘MY GENE’, 1, 12345, 67890);
• Get the id of the gene:
SELECT gene_id FROM gene WHERE name='MY GENE';
Delete/Update

• Deleting a gene
DELETE FROM gene WHERE gene_id=19;
• Updating a gene
UPDATE gene SET name=‘NEW NAME’
WHERE name=‘AKIP’;
Table Joins

• Sometimes you want data from more than


one table. To do this we join the tables.
The result is a new (temporary) table.
Cross Join

• SELECT gene.name, organism.species


FROM gene, organism;
Note: There are only
two records in the
gene table with the
name “TNFRSF18”.
One is a mouse gene
and the other is a
human gene. What do
you think happened?
Cross Join (Continued)

Each row of the gene table was joined with


every row in the organism table.
Cross Join (Continued)

• We want to use the organism id to match


a gene record with the correct organism
record so that we get:

Remember there
are two gene
records with the
name “TNFRSF18”.
Cross Join (Continued)

SELECT gene.name, organism.species


FROM gene, organism
WHERE
gene.organism_id=organism.organism_id;
Notice that we have 18
rows and that there are
18 rows in the gene
table.
Class #2: Introduction to MySQL
(Continued)
• Table Joins
• Where clause (continued)
• Aggregate Functions
• Backing up your database
• Indexes (if time)
• Importing large datasets (if time)

You might also like