Unit 1 Neural Network Basics
What is a Neural Network?
Understanding Neural Networks: A Simple Explanation
Deep learning often involves training neural networks, which are computational models inspired by the
human brain. But what exactly is a neural network? Let’s break it down with an example.
Housing Price Prediction: A Starting Point
Imagine you have data about six houses:
• The size of each house in square feet or square meters.
• The price of each house.
Your goal is to predict the price of a house based on its size.
1. Linear Regression Approach
You might fit a straight line to the data, where the line represents the relationship between size
and price. But prices can’t be negative, so a more realistic approach would be to adjust the curve
so it flattens at zero for very small sizes.
This curve becomes the function for predicting the price of a house based on its size.
2. The Neural Network Perspective
This simple function can be seen as a very basic neural network with:
o Input: The size of the house (x).
o Output: The predicted price (y).
The Simplest Neural Network
A neural network is made up of nodes, also called neurons. For the housing example:
• A single neuron takes the size (x) as input.
• It applies a mathematical operation to compute the price (y).
• To ensure prices are realistic (e.g., non-negative), the neuron uses a specific function called ReLU
(Rectified Linear Unit).
o The ReLU function outputs 0 if the input is negative and outputs the input value
otherwise. This ensures the curve remains realistic, resembling the function we fitted
earlier.
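Below is a minimal sketch of the ReLU function in Python with NumPy (the library choice is an assumption; the notes themselves do not prescribe one):

import numpy as np

def relu(z):
    # ReLU: 0 for negative inputs, the input itself otherwise
    return np.maximum(0, z)

# Negative values are clipped to zero, positive values pass through unchanged
print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]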
Building Larger Neural Networks
Now, what if the price of a house depends on more features, like:
• Number of bedrooms (#bedrooms).
• Family size (can the house fit your family?).
• Zip code or postal code (which might indicate walkability or school quality).
Here’s how we handle this:
1. Adding More Inputs
Instead of just size, we now include other features as inputs:
x = [size, bedrooms, zip code, wealth of the neighborhood].
2. Hidden Layers
o Between the input (x) and output (y), we add hidden units (neurons).
o Each hidden unit processes all input features and computes something useful, like family
size, walkability, or school quality.
3. Connections
o Every input is connected to every neuron in the hidden layer. This is called a dense
connection.
o The neural network doesn’t need you to define specific roles (e.g., "this neuron
calculates family size"). It learns what’s important from the data.
4. Output
o The network combines all the information from the hidden layer to predict the price of a
house.
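As a rough illustration only (the parameter values below are made up, not learned), this sketch runs a forward pass through one dense hidden layer of three ReLU units and a single output neuron:

import numpy as np

def relu(z):
    return np.maximum(0, z)

# 4 input features: [size, #bedrooms, zip code, neighborhood wealth]
x = np.array([2100.0, 3.0, 94301.0, 7.0])

# Hypothetical parameters: 3 hidden units, each connected to all 4 inputs (dense)
W1 = np.random.randn(3, 4) * 0.01   # hidden-layer weights
b1 = np.zeros(3)                    # hidden-layer biases
W2 = np.random.randn(1, 3) * 0.01   # output-layer weights
b2 = np.zeros(1)                    # output-layer bias

a1 = relu(W1 @ x + b1)   # every input feeds every hidden neuron
y_hat = W2 @ a1 + b2     # hidden activations are combined into the predicted price
print(y_hat)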
Why Neural Networks Are Powerful
• Neural networks can find complex patterns in data.
• They are most useful in supervised learning, where you map input (x) to output (y), as in our
housing price example.
• With enough data, a neural network can learn very accurate functions for predictions.
Key Takeaways for Students
1. Basic Unit: A single neuron takes inputs, applies a function (like ReLU), and produces an output.
2. Bigger Networks: Combine multiple neurons to handle more complex data with multiple
features.
3. Learning: The network figures out the best way to combine inputs to predict outputs using
training data.
By stacking these neurons together, you can solve increasingly complex problems—just like stacking
LEGO bricks to build something amazing!
Supervised Learning with Neural Networks
The Hype Around Neural Networks
Neural networks have garnered a lot of attention recently, and much of this hype is justified given their
impressive performance across various domains. However, the majority of the economic value created
by neural networks to date stems from one specific type of machine learning: supervised learning.
What is Supervised Learning?
In supervised learning, the goal is to map an input x to an output y. For instance, in a housing price
prediction task, the input might include features like the size of a house, the number of bedrooms, and
its location. The output, y, is the estimated price of the house.
Here are some practical examples of where neural networks excel:
1. Online Advertising:
Perhaps the most lucrative application of deep learning today is online advertising. Neural
networks predict whether a user will click on an ad based on details about the ad and user
behavior. This application has significantly improved the revenue of major advertising companies
by personalizing ads for users.
2. Computer Vision:
Neural networks, particularly deep learning models, have revolutionized computer vision. For
example, an input image can be processed to output a label or index representing one of many
possible objects (e.g., identifying objects in a photo for tagging).
3. Speech Recognition:
Deep learning has enabled neural networks to convert audio clips into accurate text transcripts.
4. Machine Translation:
Neural networks can translate sentences from one language (e.g., English) to another (e.g.,
Chinese) with remarkable accuracy.
5. Autonomous Driving:
Neural networks process images from cameras and radar data to identify the positions of
vehicles and obstacles, serving as a key component in autonomous driving systems.
Selecting x and y for Applications
The effectiveness of supervised learning often depends on selecting appropriate inputs x and outputs y
for the problem at hand. Once identified, these components can fit into larger systems, such as
autonomous vehicles.
Different types of neural networks are suited for specific applications:
• Standard Neural Networks:
Suitable for structured data tasks, such as predicting housing prices or online ad performance.
• Convolutional Neural Networks (CNNs):
Ideal for image-related tasks like photo tagging or object recognition.
• Recurrent Neural Networks (RNNs):
Effective for sequence data, such as audio, language, or temporal sequences. For example,
processing speech or translating text often requires advanced RNN variants.
In more complex scenarios like autonomous driving, a hybrid network architecture combining CNNs and
other components might be necessary.
Structured vs. Unstructured Data
Neural networks handle two broad categories of data:
1. Structured Data:
Examples include databases containing well-defined features like the size of a house, the number
of bedrooms, or a user’s age. Supervised learning models use these features to make
predictions.
2. Unstructured Data:
This includes raw audio, images, or text. Historically, analyzing unstructured data was challenging
for computers, but deep learning has changed this. Neural networks now excel at recognizing
patterns in audio, identifying objects in images, and processing natural language.
While the media often highlights neural networks’ success with unstructured data (e.g., recognizing a cat
in a picture), their economic value in structured data applications—such as improving advertising
systems and processing large databases—cannot be overstated.
Why Are Neural Networks Thriving Now?
The core technical concepts behind neural networks have existed for decades. However, only recently
have they become powerful tools, thanks to advances in computational power, data availability, and
algorithmic innovations.
Why is Deep Learning taking off?
There are several key reasons behind the recent rise of deep learning and its ongoing progress. Here is a
summary of the main points:
1. Availability of Data:
• Historical limitation: Traditional algorithms like SVMs and logistic regression plateaued in
performance with limited data.
• Modern abundance: The digitization of society has resulted in massive amounts of data from
digital activities, mobile apps, IoT sensors, cameras, etc.
• Deep learning thrives in this "big data" regime, where performance scales with data volume.
2. Scale of Neural Networks:
• Performance improves significantly with larger neural networks (more parameters and hidden
units).
• However, this requires a substantial amount of data and computational resources.
3. Computation Advances:
• Specialized hardware: GPUs and other hardware innovations have accelerated the training of
large networks.
• Faster computation enables rapid experimentation, shortening the feedback loop for developing
and refining neural network architectures.
4. Algorithmic Innovations:
• Techniques like replacing sigmoid activation functions with ReLU (Rectified Linear Unit) have
sped up gradient descent and training (the sketch after this list illustrates why).
• Such innovations make training faster and more efficient, allowing researchers to build larger
and better-performing networks.
5. Iterative Development:
• Faster training cycles empower researchers to test and refine ideas quickly, fostering rapid
innovation in deep learning.
6. Optimism for the Future:
• Data growth: Society continues to generate more digital data.
• Improved hardware: Faster and more specialized computational resources are being developed.
• Ongoing research: The deep learning research community consistently delivers new algorithms,
ensuring continued progress.
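To see why swapping sigmoid for ReLU speeds up gradient descent, here is a small sketch (an added illustration, not part of the original lecture) comparing their gradients: the sigmoid's gradient shrinks toward zero for large inputs, while ReLU's gradient stays at 1 wherever the unit is active.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)              # nearly vanishes for large |z|, slowing learning

def relu_grad(z):
    return (z > 0).astype(float)    # constant gradient of 1 for positive inputs

z = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_grad(z))  # ≈ [0.25, 0.105, 0.0066, 0.000045]
print(relu_grad(z))     # [0. 1. 1. 1.]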
These factors—data scale, computational power, and algorithmic advancements—are synergistically
driving the rise of deep learning and will likely sustain its growth in the foreseeable future.
1. Data Processing in Neural Networks
When working with training datasets, you might think of using a loop to process each example
individually. However, this is computationally expensive for large datasets. Neural networks handle this
by performing operations on the entire dataset at once using matrix operations. This approach leverages
the efficiency of linear algebra libraries, making it faster and more scalable.
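A small sketch of the difference (the shapes and data below are placeholders): computing z = w^T x + b for all m examples with one matrix product instead of a Python loop.

import numpy as np

n_x, m = 12288, 1000
X = np.random.randn(n_x, m)    # each column is one training example
w = np.random.randn(n_x, 1)
b = 0.0

# Loop version: one example at a time
z_loop = np.zeros((1, m))
for i in range(m):
    z_loop[0, i] = np.dot(w[:, 0], X[:, i]) + b

# Vectorized version: the whole dataset in one matrix operation
z_vec = w.T @ X + b

print(np.allclose(z_loop, z_vec))  # True: same result, computed far faster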
2. Forward and Backward Propagation
• Forward Propagation: This step calculates predictions based on the current parameters of the
model.
• Backward Propagation: This step updates the model's parameters by minimizing the error in
predictions using techniques like gradient descent.
These steps are the building blocks of how a neural network learns from data.
3. Logistic Regression
Logistic regression is used as an introduction to neural networks because it shares similarities in
structure but is simpler. It predicts binary outcomes, such as:
• 1: Cat (True)
• 0: Not-cat (False)
4. How Images Are Represented
Images are stored as matrices corresponding to three color channels: Red, Green, and Blue (RGB). For a
64×64 image:
• Each channel is a 64×64 matrix.
• Flatten these into a single feature vector: x = [All red values, All green values, All blue values]
• The length of x is 64×64×3=12,288.
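A minimal sketch of this flattening step (a random array stands in for a real image, with the channels stored red, green, blue as in the notes):

import numpy as np

# Three 64x64 channels, stored channels-first: red, green, blue
image = np.random.randint(0, 256, size=(3, 64, 64))

# Flatten into a single feature vector: all red values, then green, then blue
x = image.reshape(-1, 1)
print(x.shape)  # (12288, 1), i.e. 64 * 64 * 3 = 12,288 values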
5. Notation and Representation
• Single Training Example: (x, y)
o x: Input feature vector (e.g., 12,288 pixel values).
o y: Output label (binary: 1 or 0).
• Training Set:
o Contains m examples: (x^{(1)}, y^{(1)}), …, (x^{(m)}, y^{(m)}).
o m is the number of training samples.
• Matrix Representation:
o X: Input feature matrix of shape n_x × m, where n_x is the length of x.
▪ Each column is one training example.
o Y: Output label matrix of shape 1 × m.
▪ Each column is the label for the corresponding training example in X.
This stacking in columns simplifies operations like forward and backward propagation.
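A short sketch of this column-wise stacking (random placeholder data, and a small m so the shapes are easy to read):

import numpy as np

n_x, m = 12288, 5                     # feature size and number of training examples

X = np.random.randn(n_x, m)           # shape (n_x, m): column i is x^{(i)}
Y = np.random.randint(0, 2, (1, m))   # shape (1, m): column i is y^{(i)}

print(X.shape, Y.shape)   # (12288, 5) (1, 5)
print(X[:, 0].shape)      # (12288,) -> the first training example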
Logistic Regression
Logistic regression is a supervised learning algorithm used for binary classification problems. The goal is
to predict an output Y (0 or 1) given an input X (e.g., an image) and determine the probability that Y is 1,
denoted as 𝑌̂.
Steps and Key Concepts
1. Input Representation:
o X: Feature vector representing input data.
o Y: Output label (0 or 1).
2. Parameters:
o W: Weight vector (same dimension as X).
o b: Bias term (a single scalar).
3. Linear Combination:
o Z = W^T X + b: A linear combination of input features and parameters.
o This step is similar to linear regression but isn't sufficient for probabilities because Z can
take any value, including values outside [0,1].
4. Sigmoid Function:
o Used to map Z to a probability value between 0 and 1:
σ(Z) = 1 / (1 + e^{−Z})
o Properties:
▪ σ(Z)→1 as Z→∞.
▪ σ(Z)→0 as Z→−∞.
▪ σ(0)=0.5.
5. Output:
o 𝑌̂ = 𝜎(𝑍) : The predicted probability that Y = 1.
6. Learning Parameters:
o Adjust W and b to minimize the error in predictions.
o The method to achieve this (gradient descent) and the cost function will be discussed
later.
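Putting steps 1–5 together, a minimal sketch of the forward computation (W, b, and x are placeholders; with all-zero parameters the prediction is exactly 0.5):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x = 12288
x = np.random.randn(n_x, 1)   # one input feature vector
W = np.zeros((n_x, 1))        # weight vector, same dimension as x
b = 0.0                       # bias term, a single scalar

Z = W.T @ x + b               # linear combination: can be any real number
Y_hat = sigmoid(Z)            # squashed into [0, 1]: the predicted P(Y = 1 | x)
print(Y_hat)                  # [[0.5]]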
Logistic Regression Loss and Cost Functions
This section explains the process of defining the loss function and cost function for logistic regression.
These are essential for training the parameters W and b of the logistic regression model.
Key Points:
1. Prediction Setup:
o Logistic regression outputs ŷ = σ(W^T x + b), where:
▪ σ(z) = 1 / (1 + e^{−z}) is the sigmoid function.
▪ x^{(i)} and y^{(i)} refer to the input features and label for the i-th training example.
▪ z^{(i)} = W^T x^{(i)} + b is the linear combination of weights and inputs.
2. Loss Function:
o Measures how well the model's prediction 𝑦̂ matches the actual label 𝑦 for a single
training example.
o Defined as: ℒ(𝑦̂, 𝑦) = −𝑦 log(𝑦̂) − (1 − 𝑦) log(1 − 𝑦̂)
o Interpretation:
▪ If 𝑦 = 1: The function minimizes − log(𝑦̂), pushing 𝑦̂ towards 1.
▪ If 𝑦 = 0: The function minimizes − log(1 − 𝑦̂), pushing 𝑦̂ towards 0.
3. Cost Function:
o Measures the average performance across the entire training set:
J(W, b) = (1/m) ∑_{i=1}^{m} ℒ(ŷ^{(i)}, y^{(i)})
o Expanded:
J(W, b) = −(1/m) ∑_{i=1}^{m} [ y^{(i)} log(ŷ^{(i)}) + (1 − y^{(i)}) log(1 − ŷ^{(i)}) ]
4. Optimization Goal:
o Train the logistic regression model by finding 𝑊 and 𝑏 that minimize the cost function
𝐽(𝑊, 𝑏).
Why Use This Loss Function?
• The chosen loss function ensures convex optimization, making it easier to find the global
minimum.
• Using alternatives, such as squared error, can lead to non-convex optimization problems with
multiple local minima.
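A minimal sketch of computing this cost over a small training set (placeholder data and parameters; the small epsilon inside the logs is an implementation detail to avoid log(0), not something from the notes):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, m = 4, 6
X = np.random.randn(n_x, m)             # columns are training examples
Y = np.random.randint(0, 2, (1, m))     # binary labels
W = np.zeros((n_x, 1))
b = 0.0

Y_hat = sigmoid(W.T @ X + b)            # predictions for all m examples, shape (1, m)

eps = 1e-12                             # numerical safety: avoid log(0)
losses = -(Y * np.log(Y_hat + eps) + (1 - Y) * np.log(1 - Y_hat + eps))
J = losses.mean()                       # cost: average loss over the m examples
print(J)                                # log(2) ≈ 0.693 when all parameters are zero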