Basics of Deep Learning
Author: Chris Lindeman
Types of Deep Learning…

Supervised
● Inputs and Outputs: Both a source (X) and a truth (Y) are supplied. Output is a real number, being either a predicted value or P(y|x).
● Computes: A real number that may be a predicted value or a probability, such as the value of a home given its characteristics or the probability that an image is of a class.
● Output Error: Given the output of the forward pass, an error function computes the difference between the expected outcome and the current output.

Unsupervised
● Inputs and Outputs: Takes an input vector X and outputs a probability of belonging to some group.
● Measures: Alikeness. Used in grouping inputs, determining grammar rules, and estimating densities.

Reinforcement
● Inputs and Outputs: Given a position in an environment and a set of constraints, the output is the best choice to achieve the goal.
● Solves: Best decision given a set of options, such as in game playing, exploration, robotics, and self-driving.
Start Shallow…
Simple Example
● Neural Network
○ Based on human brain neurons
○ Can recreate any linear model (e.g. regression)
● Real Estate Example:
○ Estimating price
○ Houses never sell for less than b
○ X1 = number of bedrooms
○ X2 = number of bathrooms
○ w1, w2 get adjusted so we have a model to estimate prices in the market
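The shallow model above can be sketched as a single weighted sum plus the floor b; the specific weight and floor values below are made-up numbers for illustration, not values from the slides.

```python
# A minimal sketch of the shallow real-estate model: a weighted sum of the
# inputs plus a floor b (houses never sell for less than b).
# b, w1, w2 here are illustrative placeholders, not fitted values.
def estimate_price(x1, x2, b=50, w1=60, w2=40):
    # x1 = number of bedrooms, x2 = number of bathrooms; prices in $1000s
    return b + w1 * x1 + w2 * x2

estimate_price(3, 2)  # 50 + 60*3 + 40*2 = 310
```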
Get Adjusted – How we arrive at our estimate
Using our home price example
● X = [(3,2),(2,3),(3,2.5)], Y = [325, 330, 300]
○ Initialize [b, w1, w2] as [0, 1, 1]
○ 0 + 1*3 + 1*2 = 5, not 325
● Clearly our starting guess was wrong
○ Calculus maps a multidimensional error landscape
○ On each pass of the algorithm, we step in the direction of the steepest downward slope
○ Adjustment ends when the steps stop improving the error
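The adjustment loop above is gradient descent on a mean-squared-error landscape. Here is a minimal sketch using the slide's data and starting guess; the learning rate and step count are illustrative choices, not values from the slides.

```python
# Gradient descent on the home-price example.
X = [(3, 2), (2, 3), (3, 2.5)]   # (bedrooms, bathrooms)
Y = [325, 330, 300]              # sale prices in $1000s

b, w1, w2 = 0.0, 1.0, 1.0        # the initial guess from the slide
lr = 0.01                        # learning rate: how big each downhill step is

def mse(b, w1, w2):
    # Mean squared error: how wrong the current model is
    return sum(((b + w1 * x1 + w2 * x2) - y) ** 2
               for (x1, x2), y in zip(X, Y)) / len(X)

start_error = mse(b, w1, w2)
for _ in range(2000):
    # Average gradient of the error with respect to each parameter
    gb = gw1 = gw2 = 0.0
    for (x1, x2), y in zip(X, Y):
        err = (b + w1 * x1 + w2 * x2) - y
        gb += 2 * err / len(X)
        gw1 += 2 * err * x1 / len(X)
        gw2 += 2 * err * x2 / len(X)
    # Step against the gradient, i.e. down the steepest slope
    b, w1, w2 = b - lr * gb, w1 - lr * gw1, w2 - lr * gw2

end_error = mse(b, w1, w2)
```

After enough passes the error stops shrinking meaningfully, which is where the adjustment ends.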
Supervised Example: Image Classification
● Bigger Dimensions
● More math
● Linear algebra reduces the input from each pixel to the number of classes
● Activation functions adjust feature importance

Input (pixels): (0,22,31) … (256,0,1)
LAYERS: linear, pooling, convolution
Output:

Class        | P(class) | Output
Bird         | .02      | 0
Fish         | .06      | 0
Unicorn      | .12      | 0
Liger        | .6       | 1
Pegasus      | .11      | 0
Giraffe      | .08      | 0
Garden Gnome | .01      | 0
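A sketch of the final step: assuming the network's last layer produces one score ("logit") per class, a softmax turns those scores into the P(class) column, and the highest probability becomes the 1 in the output column. The class names come from the table above; the logit values are made up.

```python
import math

# Softmax: convert per-class scores into probabilities that sum to 1.
classes = ["Bird", "Fish", "Unicorn", "Liger", "Pegasus", "Giraffe", "Garden Gnome"]
logits = [0.1, 1.2, 1.9, 3.5, 1.8, 1.4, -0.5]   # illustrative scores

exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]               # the P(class) column

predicted = classes[probs.index(max(probs))]     # gets the "1" in the output column
```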
Why all the layers?
● Layer 1: edges, color contrast
○ Incredibly vague
● Layer 2: shapes, gradients
● Layer 3: patterns, textures, outlines
○ This will be important later
● Layer 4: muzzles, legs
● Layer 5: keys/holes, words, petals, wheels,
faces, ears, eyes
○ Highly specific
Image from Zeiler & Fergus. Visualizing and Understanding
Convolutional Networks. 2014
Unsupervised Learning: Language
Language is complicated

Structured, tabular data:
● Relational databases
Semi- or unstructured data:
● Key-value Databases
○ Document Databases (subset of key-value databases)
● Graph Databases
● In-memory Databases
MATH!?!?
Given a B/W image of 32x32 pixels, the input X is
1024 items long.
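That flattening can be sketched in a few lines, using a placeholder all-zero image:

```python
# A 32x32 black-and-white image becomes an input vector of 32 * 32 = 1024
# numbers by reading the pixels row by row.
rows, cols = 32, 32
image = [[0] * cols for _ in range(rows)]        # placeholder pixel values

x = [pixel for row in image for pixel in row]    # flatten, row-major
vector_length = len(x)                            # 1024 items
```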
Shallow Learning - the classic
Odds are, when you think of machine learning, you envision supervised learning.
● Classification is a supervised learning problem
● Given some knowledge about the world, we make inferences
● Estimates are output based on experience
● Real estate example:
○ Want a good asking price for a 3br 3.5bth brick home, .25 acres
○ Know recent sales:
○ 3br 2bth, .2 acres ~ $250,000
○ 4br 3.5bth, .33 acres ~ $350,000
○ 3br 3bth, .25 acres ~ $300,000
Image Classification – Going deep
We have seen how a simple regression takes some known inputs and gives an estimate.

Keeping addresses in a separate table:
● Allows other tables to reference this same information without duplication
● Updates are easier and faster
Consider the following tables. If the address at 'addresses.id=2' needs to be changed, we only need to update one row, rather than every row in the 'customers' table that has that address.
customers
id | name            | height_in | age | birthdate     | address_id
0  | Jim Smith       | 70        | 23  | Mar 2, 2000   | 0
1  | Sarah Thompson  | 73        | 45  | Dec 27, 1978  | 2
2  | Dave Beck       | 65        | 34  | Jun 5, 1986   | 1
3  | Martin Thompson | 60        | 46  | Jul 27, 1977  | 2
4  | Ester Barnes    | 66        | 51  | Sept 13, 1972 | 0

addresses
id | street        | city      | state | postal_code
0  | 987 3rd St.   | New York  | NY    | 11635
1  | 567 James St. | Buffalo   | NY    | 14268
2  | 123 4th Ave   | St. Louis | MO    | 63160

Primary key: 'id' in each table. Foreign key: 'customers.address_id' references 'addresses.id'.
SQL: Getting Data From a Relational Database
SQL* is the most popular language used for communicating with relational databases.
Example: Consider the tables from the previous slide. If we want to get a list of our customers who live in New York state with their addresses, we can run a query that JOINs the data in the two tables together.

SELECT customers.name, addresses.*
FROM customers
JOIN addresses
ON customers.address_id = addresses.id
WHERE addresses.state = 'NY';

name         | id | street        | city     | state | postal_code
Jim Smith    | 0  | 987 3rd St.   | New York | NY    | 11635
Dave Beck    | 1  | 567 James St. | Buffalo  | NY    | 14268
Ester Barnes | 0  | 987 3rd St.   | New York | NY    | 11635

Example: We can also generate statistical information by grouping the data. For instance, we can compute the average age of residents in each state using the following query:

SELECT addresses.state, avg(customers.age) AS avg_age
FROM customers
JOIN addresses
ON customers.address_id = addresses.id
GROUP BY addresses.state;

state | avg_age
NY    | 36
MO    | 45.5
* Much debate is had on the correct pronunciation of “SQL”. Most industry members pronounce it like ‘sequel’, but pronouncing the individual letters as “S-Q-L” is also used (though slightly less
common). Additionally, different database systems may use slightly different SQL syntax.
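Both example queries can be reproduced end to end with Python's built-in sqlite3 module; this sketch loads the sample tables from the earlier slide into an in-memory database and runs the two queries.

```python
import sqlite3

# Build the two sample tables from the slides in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT, height_in INTEGER, "
            "age INTEGER, birthdate TEXT, address_id INTEGER)")
cur.execute("CREATE TABLE addresses (id INTEGER, street TEXT, city TEXT, "
            "state TEXT, postal_code TEXT)")
cur.executemany("INSERT INTO customers VALUES (?,?,?,?,?,?)", [
    (0, "Jim Smith", 70, 23, "Mar 2, 2000", 0),
    (1, "Sarah Thompson", 73, 45, "Dec 27, 1978", 2),
    (2, "Dave Beck", 65, 34, "Jun 5, 1986", 1),
    (3, "Martin Thompson", 60, 46, "Jul 27, 1977", 2),
    (4, "Ester Barnes", 66, 51, "Sept 13, 1972", 0),
])
cur.executemany("INSERT INTO addresses VALUES (?,?,?,?,?)", [
    (0, "987 3rd St.", "New York", "NY", "11635"),
    (1, "567 James St.", "Buffalo", "NY", "14268"),
    (2, "123 4th Ave", "St. Louis", "MO", "63160"),
])

# Query 1: customers living in New York state
ny = cur.execute(
    "SELECT customers.name FROM customers "
    "JOIN addresses ON customers.address_id = addresses.id "
    "WHERE addresses.state = 'NY'").fetchall()

# Query 2: average customer age per state
avg_age = cur.execute(
    "SELECT addresses.state, avg(customers.age) AS avg_age FROM customers "
    "JOIN addresses ON customers.address_id = addresses.id "
    "GROUP BY addresses.state").fetchall()
```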
Relational Database Management Systems (RDBMS)
RDBMSs are designed to help in the creation and management of relational databases. They provide functionality to:
● Query (extract data from) the database
● Create new tables, schemas, and catalogs in the database
● Modify existing tables (for example: add, modify, delete rows/columns)
● Delete tables from the database
● Manage database user lists, roles, and permissions

Ranking of most popular relational database management systems (Jan 2022)
Image: © Statista, 2023

Popular RDBMS
Key information for non-technical persons:
● Why go deep (e.g. instead of regression, DT, k-means, etc.)?
○ Data requirements
○ Computational complexity (time/cost)
● How do we link this to the work being done by non-technical team members?
○ How do we connect to "improving the work they're doing"?
○ Enable them to ask/answer questions
○ Put the hype in perspective
See also…
● Basics of data
● Basics of cloud
● Basics of data processing
Appendix: Other Database Types
● Key-value Databases
● Document Databases
● In-memory Databases
● Graph Databases
Key-value Databases
Key-value databases store data in a single table where values are associated with a specific key.

Principles:
● Keys are unique
● Values are not guaranteed to follow a predefined structure

Strengths:
● Flexible (no requirement that values follow a fixed structure)
● Can handle a variety of different data types
● Keys are linked directly to values for faster lookup

Weaknesses:
● Analytical queries are difficult to perform due to lack of joins
● How the values will be used needs to be known in advance to ensure optimal performance

key | value
0   | "{'name': 'John Smith', 'userid': 12473}"
1   | "{'name': 'Sarah Conor', 'last_login': 'Mar 2, 2021'}"
2   | "<html><body>Hello World!</body></html>"
3   | "Just some fun text!"
4   | 16828238457

Popular systems:
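A plain Python dict captures the key-value idea in miniature (this is an illustration, not a real key-value database); the rows mirror the example table above.

```python
# Key-value store sketch: unique keys, values with no fixed structure.
store = {
    0: "{'name': 'John Smith', 'userid': 12473}",
    1: "{'name': 'Sarah Conor', 'last_login': 'Mar 2, 2021'}",
    2: "<html><body>Hello World!</body></html>",
    3: "Just some fun text!",
    4: 16828238457,
}

# Lookup goes straight from key to value -- no joins, no schema check.
value = store[3]
```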
Document Databases
Keeps "documents" containing data as a series of elements. Documents can be navigated using languages like Python or Node.js.

Principles:
● Documents assumed to have the same encoding (XML, YAML, JSON, etc…)
● Documents are referenced via a unique key

Strengths:
● Flexible (files can contain any data)
● No need to plan for a specific type of data when creating the database
● Easy to scale

Weaknesses:
● Sacrificing ACID compliance for flexibility
● Databases cannot query across files natively

key | document
0   | {'name': 'John Smith', 'user_id': 1234, 'order_ids': [542,47,125]}
1   | {'name': 'Sarah Conor', 'user_id': 1267, 'order_ids': [4765,1845,306], 'last_order_date': 'Sept 4, 2020'}
2   | {'user_id': 465, 'order_ids': [294,4563]}

Popular systems:
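A sketch of the same idea using JSON strings referenced by a unique key (an illustration, not a real document database); the contents mirror two of the example documents above.

```python
import json

# Document store sketch: each key points at a self-describing JSON document.
docs = {
    0: json.dumps({"name": "John Smith", "user_id": 1234, "order_ids": [542, 47, 125]}),
    2: json.dumps({"user_id": 465, "order_ids": [294, 4563]}),
}

# Fields may differ between documents: document 2 has no 'name' field.
orders = json.loads(docs[2])["order_ids"]
```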
Graph Databases
Storage for objects that have varying relationships to other objects.
Principles:
● Nodes are the objects in the database (circles in the diagram)
● Edges are the relationships between nodes (connecting lines)
● Querying languages are typically specific to the graph database
system
Strengths:
● Allows simple, fast retrieval of complex hierarchical structures
● Great for real-time big data mining
● Can rapidly identify common data points between nodes
● Great for making relevant recommendations and allowing for
rapid querying of those relationships
Weaknesses:
● Inefficient at storing transactional data
● Requires learning a new query language (not SQL)
● Analytics on data may be less efficient than with other DBs
Popular systems:
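The node/edge idea can be sketched with plain Python dicts (a stand-in for a real graph database and its query language; the people and relationships below are made up).

```python
# Graph sketch: nodes are keys, edges are (relationship, destination) pairs.
edges = {
    "Alice": [("FRIENDS_WITH", "Bob"), ("BOUGHT", "Laptop")],
    "Bob":   [("FRIENDS_WITH", "Alice")],
}

# "What did friends of Bob buy?" -- answered by following edges hop by hop.
friends = [dst for rel, dst in edges["Bob"] if rel == "FRIENDS_WITH"]
purchases = [dst for f in friends
             for rel, dst in edges.get(f, []) if rel == "BOUGHT"]
```

A real graph database indexes these relationships so such multi-hop questions stay fast at scale.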
In-memory Databases
For applications that require real-time access to data.
Principles:
● Similar to other database systems, but data exists in computer memory rather than on disk drives
● Data in-memory is accessible much faster than data on hard drives
Strengths:
● Supports the most demanding applications requiring sub-millisecond response times
● Adapts to changes in demand by scaling out and in without downtime
● Provides ultrafast and inexpensive access to copies of data

Weaknesses:
● A poor fit for data that is rapidly changing or is seldom accessed
● A poor fit when the application using the in-memory store has a low tolerance for stale data
Popular systems:
Common Terms
Common Term: CRUD
CRUD references four basic ways to interact with a database:
● Create: Create new tables and add data to existing tables
● Read: Query data from the database (without changing anything)
● Update: Modify existing data
● Delete: Remove tables or data in the database
These typically map to the following SQL commands:
● Create: CREATE or INSERT
● Read: SELECT
● Update: UPDATE
● Delete: DELETE
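The CRUD-to-SQL mapping above can be demonstrated with Python's built-in sqlite3 module and an in-memory database (the 'pets' table is made up for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE pets (id INTEGER, name TEXT)")    # Create (table)
cur.execute("INSERT INTO pets VALUES (1, 'Rex')")           # Create (add data)
rows = cur.execute("SELECT * FROM pets").fetchall()         # Read
cur.execute("UPDATE pets SET name = 'Max' WHERE id = 1")    # Update
cur.execute("DELETE FROM pets WHERE id = 1")                # Delete
remaining = cur.execute("SELECT count(*) FROM pets").fetchone()[0]
```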
Common Term: ACID
ACID references a set of properties associated with database transactions:
● Atomicity: Transactions with the database cannot be partially successful. For example, a banking system requests a transfer of money from account A to account B. If the system fails to add the transfer amount to account B, it should not deduct the amount from account A, even if that deduction was successful.
● Consistency: Any transaction with the database can only modify it in allowed ways. For example, I
cannot add a string variable to a column of integers.
● Isolation: Guarantees that multiple transactions happening on a database at the same time will
give the same result as if the transactions happened one after the other.
● Durability: Guarantees that once a change is applied to the database, it will remain that way until a
new change is made.
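Atomicity can be sketched with a SQLite transaction: a simulated failure partway through the transfer triggers a rollback, so account A keeps its money (the accounts and amounts are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    # Step 1 of the transfer: debit account A
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("crash before crediting account B")   # simulated failure
    # Step 2 (never reached): credit account B, then commit
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()
except RuntimeError:
    conn.rollback()   # undo the partial transfer -- all or nothing

balance_a = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'A'").fetchone()[0]
```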
Common Terms: SQL vs. NoSQL
SQL databases are structured (columns & rows)
● Typically Relational databases
NoSQL databases use alternative storage models
● Non-relational databases
● Even if a database system is labelled as "NoSQL", it may still support SQL operations.
Common Terms: Data Warehouse, Data Lake, Data Mesh
Data Warehouses:
● Typically processed, structured data
● Data formats are determined when data is written to the underlying databases. This makes
accessing data faster since the code doesn’t have to check/convert data types.
Data Lakes:
● Typically raw data (text, files, photos, videos, etc…)
● May be structured, unstructured, or semi-structured
● Data formats are determined when data is read out of the underlying databases. This makes writing
data faster since the code doesn’t have to check/convert data types.
Data Mesh:
● Term used to describe a system where data is distributed
● Teams can use the tools they know/want
● The ‘mesh’ connects the data in a meaningful way so it can be shared across an organization
Common Terms: ETL vs. ELT (Data Processing Pipelines)
ETL (Extract → Transform → Load)
● Extract: First, data is extracted from external sources
● Transform: Then data is processed and aggregated
● Load: Finally, processed data is loaded into a database
● Typical in business intelligence and API backends that need quick access to summarized
data statistics
ELT (Extract → Load → Transform)
● Extract: First, data is extracted from external sources
● Load: The raw data is loaded into the target system
● Transform: Transformations are applied to the data at read-time
○ May be in the form of a ‘view’ table
○ May be applied by a separate system that sits between the user and the database
● Typical in heavy data analytics work, as the data aggregate steps may not be known
beforehand.
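The three ETL steps can be shown with a toy pipeline in Python (the source rows, the average-age aggregation, and the dict standing in for a warehouse are all made up for illustration).

```python
from collections import defaultdict

# Extract: pull rows from an external source (hard-coded here)
raw = [{"state": "NY", "age": 23},
       {"state": "NY", "age": 34},
       {"state": "MO", "age": 45}]

# Transform: process and aggregate (average age per state)
totals, counts = defaultdict(int), defaultdict(int)
for row in raw:
    totals[row["state"]] += row["age"]
    counts[row["state"]] += 1
summary = {s: totals[s] / counts[s] for s in totals}

# Load: write the processed summary into the target store
warehouse = {}
warehouse["avg_age_by_state"] = summary
```

In ELT the raw rows would be loaded first, and the aggregation applied later at read time.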
Not-so-basic Topics
● Database normalization (advanced)
● Database tools
● Entity Relation Diagrams (ERDs)