
PyTorch Tutorial for two-layer NN for classification


Here, we import and check the version of torch .

import torch
import torchvision

import numpy as np
from tqdm.notebook import tqdm
print(torch.__version__)

2.6.0+cu124

import matplotlib.pyplot as plt


import math
%matplotlib inline

from torchvision import datasets, transforms  # load the dataset

mnist_train = datasets.MNIST('data', train=True, download=True,
                             transform=transforms.ToTensor())

mnist_test = datasets.MNIST('../data', train=False, download=True,
                            transform=transforms.ToTensor())

100%|██████████| 9.91M/9.91M [00:00<00:00, 56.0MB/s]


100%|██████████| 28.9k/28.9k [00:00<00:00, 1.66MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 12.3MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 7.08MB/s]
100%|██████████| 9.91M/9.91M [00:00<00:00, 57.0MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 1.70MB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 12.8MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 4.20MB/s]

print(mnist_train) # print info of the data

Dataset MNIST
Number of datapoints: 60000
Root location: data
Split: Train
StandardTransform
Transform: ToTensor()

We can easily visualize the images and their corresponding label as below. See how index 0 for a given sample corresponds to the image,
and index 1 is the label.

indices = [1, 12000, 344]

fig = plt.figure(figsize=(len(indices) * 4, 4))

for i, index in enumerate(indices):
    ax = fig.add_subplot(1, len(indices), i + 1)
    example = mnist_train[index]
    ax.imshow(example[0].reshape(28, 28), cmap=plt.cm.gray)
    ax.set_title("Label: {}".format(example[1]))


PyTorch's DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset. A DataLoader makes it easier to iterate over batches (it can shuffle the data and give you the next mini-batch).

A drawback of wrapping a Dataset in a DataLoader is that the DataLoader does not allow indexing. That is why, if we want to get a batch from a DataLoader without iterating over it in a loop, we have to convert it to an Iterator and then call next() on it.

from torch.utils.data import DataLoader


train_dl = DataLoader(mnist_train, batch_size=100, shuffle=False)
# we set shuffle=False here for reproducible visualization; in practice, when training via SGD, set shuffle=True

dataiter = iter(train_dl)
images, labels = next(dataiter)
viz = torchvision.utils.make_grid(images, nrow=10, padding = 2).numpy()
fig, ax = plt.subplots(figsize= (8,8))
ax.imshow(np.transpose(viz, (1,2,0)))
ax.set_xticks([])
ax.set_yticks([])
plt.show()


Thanks to PyTorch's ability to calculate gradients automatically, we can define the model and let torch do all the gradient updates!
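As a quick illustration, here is a minimal autograd sketch (the tensors are made up for illustration): calling .backward() on a scalar loss fills in the .grad attribute of every tensor created with requires_grad=True, which is exactly what the optimizer will read later.

w = torch.tensor([1.0, 2.0], requires_grad=True)   # parameters we want gradients for
x = torch.tensor([3.0, 4.0])                       # fixed input
loss = ((w * x).sum() - 1.0) ** 2                  # a scalar loss built from w
loss.backward()                                    # autograd computes d(loss)/dw
print(w.grad)                                      # tensor([60., 80.]), since d(loss)/dw = 2*(w·x - 1)*x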

Some helper functions:


def accuracy(out, yb): # the accuracy evaluation
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

def get_test_stat(model, dl, device): # return the test loss and test accuracy
    model.eval() # set model to eval mode; only matters if we have dropout / normalization layers
    cum_loss, cum_acc = 0.0, 0.0
    total_samples = 0

    for i, (xb, yb) in enumerate(dl):
        xb = xb.to(device)
        yb = yb.to(device)

        xb = xb.view(xb.size(0), -1)
        f_pred = model(xb)
        loss = loss_fn(f_pred, yb)
        acc = accuracy(f_pred, yb)
        cum_loss += loss.item() * len(yb)
        cum_acc += acc.item() * len(yb)
        total_samples += len(yb)

    cum_loss /= total_samples
    cum_acc /= total_samples
    model.train() # set model back to train mode
    return cum_loss, cum_acc

Then, we build a neural network with one hidden layer by extending the torch.nn.Module class. This keeps the code modularized, and it is how larger and more complicated models (e.g. ConvNets, self-attention in LLMs) are also built in PyTorch.

The torch.nn.Module is the base class for all neural network models in PyTorch. It provides the infrastructure for:

• Defining layers (e.g., nn.Linear, nn.Conv2d)

• Registering parameters so optimizers can update them

• Saving/loading model state (state_dict)

To build a custom network, subclass nn.Module and: 1. define the layers in __init__(); 2. implement the forward pass in forward().

class Parent:
def __init__(self):
print("Parent init")

class Child(Parent):
def __init__(self):
print("Child init")
print("--------")

class Child2(Parent):
def __init__(self):
super().__init__() # will also call Parent.__init__() by using super().
print("Child init")
print("--------")

eg1 = Child()

eg2 = Child2()

Child init
--------
Parent init
Child init
--------

import torch.nn.functional as F

class LR(torch.nn.Module): # defines a simple one-layer NN for classification; it is equivalent to logistic regression
    def __init__(self, input_dim, output_dim):
        super(LR, self).__init__()
        # define the parameters here
        self.fc = torch.nn.Linear(input_dim, output_dim) # in linear regression output_dim = 1; in logistic regression, output_dim = K

    def forward(self, x): # defines the forward pass (overriding the default method)
        out = self.fc(x) # pass input through the first layer
        return out

class Net(torch.nn.Module): # defines a simple two-layer NN for classification

    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Net, self).__init__()
        # define the parameters here
        self.fc = torch.nn.Linear(input_dim, hidden_dim) # first layer; FC means fully connected. You can also try more layers
        self.out_layer = torch.nn.Linear(hidden_dim, output_dim) # output layer, i.e. the last layer

    def forward(self, x): # defines the forward pass (overriding the default method)
        out = self.fc(x) # pass input through the first layer
        out = F.relu(out) # apply ReLU activation; you can change it to other activations such as tanh
        out = self.out_layer(out) # pass through the output layer
        return out
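Because LR and Net subclass nn.Module, their parameters are registered automatically and the model state can be saved and reloaded through state_dict, as mentioned above. A minimal sketch (the file name net.pt is just a placeholder):

net = Net(784, 32, 10)
print(sum(p.numel() for p in net.parameters()))    # registered parameters: 784*32 + 32 + 32*10 + 10 = 25450
torch.save(net.state_dict(), "net.pt")             # save the weights only (not the class definition)
net2 = Net(784, 32, 10)
net2.load_state_dict(torch.load("net.pt"))         # reload into a model with the same architecture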

We can now train the network. Note that instead of manually updating the weights ourselves, we use a built-in PyTorch optimizer, torch.optim.SGD. Many other optimizers are available too (https://pytorch.org/docs/stable/optim.html).
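Swapping in a different optimizer only changes one line; for example, a sketch with illustrative hyperparameters (the Adam learning rate 1e-3 is a common default, not tuned for this problem):

tmp_model = Net(784, 32, 10)
opt_sgd_momentum = torch.optim.SGD(tmp_model.parameters(), lr=1e-2, momentum=0.9)  # SGD with momentum
opt_adam = torch.optim.Adam(tmp_model.parameters(), lr=1e-3)                       # adaptive per-parameter learning rates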

The output of our defined neural net is $f_{\text{pred}}$, and the predicted probability is

$$\hat{P}(y = k) = \frac{e^{f_{\text{pred},k}}}{\sum_{j=0}^{9} e^{f_{\text{pred},j}}}.$$

The loss on a single data point is given by

$$\ell(y, f_{\text{pred}}) = -\sum_{k=0}^{9} \mathbb{I}\{y = k\} \cdot f_{\text{pred},k} + \log\Big(\sum_{j=0}^{9} e^{f_{\text{pred},j}}\Big).$$

This loss is nothing but the negative log-likelihood, usually called the cross-entropy loss in the ML convention.
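To see that torch.nn.CrossEntropyLoss computes exactly this quantity from the raw scores, here is a small numerical check (a sketch on a made-up batch of 3 samples; the tensors are illustrative only):

logits = torch.randn(3, 10)                        # f_pred for a batch of 3 samples, 10 classes
y = torch.tensor([0, 3, 7])                        # true labels
loss_builtin = torch.nn.CrossEntropyLoss()(logits, y)
# manual version of the formula above: -f_pred,y + log(sum_j exp(f_pred,j)), averaged over the batch
loss_manual = (-logits[torch.arange(3), y] + torch.logsumexp(logits, dim=1)).mean()
print(loss_builtin.item(), loss_manual.item())     # the two numbers agree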

learning_rate = 1e-2
epochs = 10
bs = 128 # mini-batch size
dim_x = 784 # dimension of the input features 784= 28 * 28
dim_out = 10 # dim_out here is set to be 10, as we output 10 scores of 10 classes

# instantiate the model


model_LR = LR(dim_x, dim_out)

optimizer = torch.optim.SGD(model_LR.parameters(), lr=learning_rate)

# create datasets and data loaders

mnist_train = datasets.MNIST('data', train=True, download=True,
                             transform=transforms.ToTensor())

mnist_test = datasets.MNIST('../data', train=False, download=True,
                            transform=transforms.ToTensor())

train_dl = DataLoader(mnist_train, batch_size=bs, shuffle=True)
# DataLoader handles the random shuffling and splitting into mini-batches needed for stochastic mini-batch gradient descent

test_dl = DataLoader(mnist_test, batch_size=100)

# Using GPUs in PyTorch is pretty straightforward


if torch.cuda.is_available():
print("Using cuda")
use_cuda = True
device = torch.device("cuda")
else:
device = "cpu"

model_LR.to(device)
loss_fn = torch.nn.CrossEntropyLoss()


# set the model to training mode


model_LR.train()

train_stats_LR = {
'epoch': [],
'loss': [],
'acc': []
}
test_stats_LR = {
'epoch': [],
'loss': [],
'acc': []
}

pbar = tqdm(range(epochs))
for epoch in pbar: # during one epoch, the model processes every sample in the training set exactly once
    pbar.set_description(f"Epoch {epoch + 1} / 10") # print the training progress
    train_loss = 0.0
    train_acc = 0.0
    for i, (xb, yb) in enumerate(train_dl): # each iteration here is one mini-batch gradient descent update
        xb = xb.to(device) # move the data from storage to the device used for computing gradients
        yb = yb.to(device)
        xb = xb.view(xb.size(0), -1) # the batch is a tensor of shape [batch_size, 1, 28, 28]; reshape it to [batch_size, 784]

        # Forward pass
        f_pred = model_LR(xb) # f_pred holds the scores/logits for each class, not the predicted label
        loss = loss_fn(f_pred, yb) # the loss defined above
        acc = accuracy(f_pred, yb) # accuracy was defined in the helper-function cell
        # Backward pass
        model_LR.zero_grad() # zero out the previous gradient computation, see Tut 3 for why
        loss.backward() # compute the gradient on the current mini-batch
        optimizer.step() # use the gradient information to update the parameters
        train_stats_LR['epoch'].append(epoch + i / len(train_dl)) # the current mini-batch index (in fractional epochs)
        train_stats_LR['loss'].append(loss.item()) # the current mini-batch training loss
        train_stats_LR['acc'].append(acc.item()) # the current mini-batch training accuracy

    test_loss_LR, test_acc_LR = get_test_stat(model_LR, test_dl, device) # the test loss and test accuracy
    test_stats_LR['epoch'].append(epoch + 1)
    test_stats_LR['loss'].append(test_loss_LR)
    test_stats_LR['acc'].append(test_acc_LR)

Epoch 10 / 10: 100% 10/10 [01:43<00:00, 10.35s/it]

learning_rate = 1e-2
epochs = 10
bs = 128 # mini-batch size
dim_x = 784 # dimension of the input features 784=28 * 28
dim_h = 32 # hidden layer dimension 32
dim_out = 10 # dim_out here is set to be 10, as we output 10 scores of 10 classes

# instantiate the model


model = Net(dim_x, dim_h, dim_out)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# create datasets and data loaders

mnist_train = datasets.MNIST('data', train=True, download=True,
                             transform=transforms.ToTensor())

mnist_test = datasets.MNIST('../data', train=False, download=True,
                            transform=transforms.ToTensor())

train_dl = DataLoader(mnist_train, batch_size=bs, shuffle=True)
# DataLoader handles the random shuffling and splitting into mini-batches needed for stochastic mini-batch gradient descent

test_dl = DataLoader(mnist_test, batch_size=100)

# Using GPUs in PyTorch is pretty straightforward


if torch.cuda.is_available():
print("Using cuda")
use_cuda = True
device = torch.device("cuda")
else:
device = "cpu"

model.to(device)
loss_fn = torch.nn.CrossEntropyLoss()

# set the model to training mode


model.train()

train_stats = {
'epoch': [],
'loss': [],
'acc': []
}
test_stats = {
'epoch': [],
'loss': [],
'acc': []
}

pbar = tqdm(range(epochs))
for epoch in pbar: # during one epoch, the model processes every sample in the training set exactly once
    pbar.set_description(f"Epoch {epoch + 1} / 10") # print the training progress
    train_loss = 0.0
    train_acc = 0.0
    for i, (xb, yb) in enumerate(train_dl): # each iteration here is one mini-batch gradient descent update
        xb = xb.to(device) # move the data from storage to the device used for computing gradients
        yb = yb.to(device)
        xb = xb.view(xb.size(0), -1) # the batch is a tensor of shape [batch_size, 1, 28, 28]; reshape it to [batch_size, 784]

        # Forward pass
        f_pred = model(xb) # f_pred holds the scores/logits for each class, not the predicted label
        loss = loss_fn(f_pred, yb) # the loss defined above
        acc = accuracy(f_pred, yb) # accuracy was defined in the helper-function cell
        # Backward pass
        model.zero_grad() # zero out the previous gradient computation, see Tut 3 for why
        loss.backward() # compute the gradient on the current mini-batch
        optimizer.step() # use the gradient information to update the parameters
        train_stats['epoch'].append(epoch + i / len(train_dl)) # the current mini-batch index (in fractional epochs)
        train_stats['loss'].append(loss.item()) # the current mini-batch training loss
        train_stats['acc'].append(acc.item()) # the current mini-batch training accuracy

    test_loss, test_acc = get_test_stat(model, test_dl, device) # the test loss and test accuracy
    test_stats['epoch'].append(epoch + 1)
    test_stats['loss'].append(test_loss)
    test_stats['acc'].append(test_acc)

Epoch 10 / 10: 100% 10/10 [02:01<00:00, 12.21s/it]

Plot training and test loss & accuracy curves.

fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(6, 6))

axes[0].plot(train_stats_LR['epoch'], train_stats_LR['loss'], label='train_LR')
axes[0].plot(test_stats_LR['epoch'], test_stats_LR['loss'], label='test_LR')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')

axes[1].plot(train_stats_LR['epoch'], train_stats_LR['acc'], label='train_LR')
axes[1].plot(test_stats_LR['epoch'], test_stats_LR['acc'], label='test_LR')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')

plt.legend()
plt.show()


fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(6, 6))

axes[0].plot(train_stats['epoch'], train_stats['loss'], label='train')


axes[0].plot(test_stats['epoch'], test_stats['loss'], label='test')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')

axes[1].plot(train_stats['epoch'], train_stats['acc'], label='train')


axes[1].plot(test_stats['epoch'], test_stats['acc'], label='test')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')

plt.legend()
plt.show()

(train_stats['loss'][-1]), (train_stats_LR['loss'][-1])

(0.37917956709861755, 0.4087868630886078)


Weight visualization

We visualize the learned weights in the first layer of the network as images. Compared to the linear model before, this model has 32 hidden units with ReLU activation, enabling it to make use of a more diverse set of features.

nrows = 4
ncols = 8
first_layer_weights = model.fc.weight.detach().cpu().numpy()
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(6, 6))

for i in range(nrows):
for j in range(ncols):
axes[i, j].imshow(first_layer_weights[i * ncols + j].reshape((28, 28)), cmap='gray')
axes[i, j].set_xticks([])
axes[i, j].set_yticks([])

plt.tight_layout(pad=0.1)
plt.show()

nrows = 1
ncols = 10
first_layer_weights_LR = model_LR.fc.weight.detach().cpu().numpy()
fig, axes = plt.subplots(nrows=1, ncols=ncols, figsize=(12, 2))

for i in range(ncols):
axes[i].imshow(first_layer_weights_LR[i].reshape((28, 28)), cmap='gray')
axes[i].set_xticks([])
axes[i].set_yticks([])

plt.tight_layout(pad=0.1)
plt.show()


More advanced techniques in DL


Residual Connection:
Denote by $h^{(i)}$ the input to layer $i+1$ from the previous layer. The function defined in layer $i+1$ is $F^{(i+1)}(\cdot)$. Then the residual connection is defined as

$$\mathrm{Res}^{(i+1)} = F^{(i+1)}(h^{(i)}) + h^{(i)}.$$

When using a residual connection, the input to layer $i+2$ is $\mathrm{Res}^{(i+1)}$ instead of $h^{(i+1)} = F^{(i+1)}(h^{(i)})$.

AdamW:
An advanced mini-batch gradient method with momentum and weight decay.

Momentum means that at each mini-batch, the update $\Delta w$ is not purely the gradient on the current mini-batch, but a weighted average of the gradient on the current mini-batch and the gradients from previous mini-batches.

The weight decay implicitly introduces $\ell_2$-style regularization on the parameters to prevent overfitting.
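The resulting update rule can be sketched in a few lines (a simplified version with a single momentum buffer; full AdamW also keeps a second-moment estimate, and beta, lr, wd below are illustrative values):

beta, lr, wd = 0.9, 1e-2, 1e-3
w = torch.randn(5, requires_grad=True)             # a toy parameter vector
m = torch.zeros_like(w)                            # momentum buffer
for _ in range(3):                                 # a few toy update steps
    loss = (w ** 2).sum()
    loss.backward()
    with torch.no_grad():
        m = beta * m + (1 - beta) * w.grad         # weighted average of current and past gradients
        w -= lr * m                                # momentum step
        w -= lr * wd * w                           # decoupled weight decay (the "W" in AdamW)
    w.grad.zero_()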

Dropout:
Dropout is a regularization technique that randomly sets a fraction of neurons’ outputs to zero during training to prevent overfitting and
enhance robustness.
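In PyTorch, dropout is available as torch.nn.Dropout. A minimal sketch of how it could be added to the hidden layer of our Net (NetDropout is a hypothetical variant; p = 0.5 is an illustrative rate, and model.eval() disables dropout at test time):

class NetDropout(torch.nn.Module): # hypothetical variant of Net with dropout after the hidden layer
    def __init__(self, input_dim, hidden_dim, output_dim, p=0.5):
        super().__init__()
        self.fc = torch.nn.Linear(input_dim, hidden_dim)
        self.drop = torch.nn.Dropout(p)            # zero out each hidden unit with probability p during training
        self.out_layer = torch.nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out = F.relu(self.fc(x))
        out = self.drop(out)                       # active in .train() mode, identity in .eval() mode
        return self.out_layer(out)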

class ResNet(torch.nn.Module): # defines a simple 3-layer NN with a residual connection for classification

    def __init__(self, input_dim, hidden_dim, output_dim):
        super(ResNet, self).__init__()
        # define the parameters here
        self.fc1 = torch.nn.Linear(input_dim, hidden_dim) # first layer; FC means fully connected. You can also try more layers
        self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim)
        self.out_layer = torch.nn.Linear(hidden_dim, output_dim) # output layer, i.e. the last layer

    def forward(self, x): # defines the forward pass (overriding the default method)
        out = self.fc1(x) # pass input through the first layer
        out = F.relu(out)
        out = self.fc2(out) + out # residual connection: add the layer input to its output
        out = self.out_layer(out) # pass through the output layer
        return out

learning_rate = 1e-2
epochs = 10
bs = 128 # mini-batch size
dim_x = 784 # dimension of the input features 784=28 * 28
dim_h = 32 # hidden layer dimension 32
dim_out = 10 # dim_out here is set to be 10, as we output 10 scores of 10 classes

# instantiate the model


model_RN = ResNet(dim_x, dim_h, dim_out)

optimizer = torch.optim.AdamW(
model_RN.parameters(),
lr=learning_rate, # same learning rate variable
weight_decay=1e-3 # AdamW usually uses some weight decay (equiv to L2 regularization)
)

# create datasets and data loaders

mnist_train = datasets.MNIST('data', train=True, download=True,
                             transform=transforms.ToTensor())

mnist_test = datasets.MNIST('../data', train=False, download=True,
                            transform=transforms.ToTensor())

train_dl = DataLoader(mnist_train, batch_size=bs, shuffle=True)
# DataLoader handles the random shuffling and splitting into mini-batches needed for stochastic mini-batch gradient descent

test_dl = DataLoader(mnist_test, batch_size=100)

# Using GPUs in PyTorch is pretty straightforward


if torch.cuda.is_available():
print("Using cuda")
use_cuda = True
device = torch.device("cuda")
else:
device = "cpu"

model_RN.to(device)
loss_fn = torch.nn.CrossEntropyLoss()

# set the model to training mode


model_RN.train()

train_stats_RN = {
'epoch': [],
'loss': [],
'acc': []
}
test_stats_RN = {
'epoch': [],
'loss': [],
'acc': []
}

pbar = tqdm(range(epochs))
for epoch in pbar: # during one epoch, the model processes every sample in the training set exactly once
    pbar.set_description(f"Epoch {epoch + 1} / 10") # print the training progress
    train_loss_RN = 0.0
    train_acc_RN = 0.0
    for i, (xb, yb) in enumerate(train_dl): # each iteration here is one mini-batch gradient descent update
        xb = xb.to(device) # move the data from storage to the device used for computing gradients
        yb = yb.to(device)
        xb = xb.view(xb.size(0), -1) # the batch is a tensor of shape [batch_size, 1, 28, 28]; reshape it to [batch_size, 784]

        # Forward pass
        f_pred = model_RN(xb) # f_pred holds the scores/logits for each class, not the predicted label
        loss = loss_fn(f_pred, yb) # the loss defined above
        acc = accuracy(f_pred, yb) # accuracy was defined in the helper-function cell
        # Backward pass
        model_RN.zero_grad() # zero out the previous gradient computation, see Tut 3 for why
        loss.backward() # compute the gradient on the current mini-batch
        optimizer.step() # use the gradient information to update the parameters
        train_stats_RN['epoch'].append(epoch + i / len(train_dl)) # the current mini-batch index (in fractional epochs)
        train_stats_RN['loss'].append(loss.item()) # the current mini-batch training loss
        train_stats_RN['acc'].append(acc.item()) # the current mini-batch training accuracy

    test_loss, test_acc = get_test_stat(model_RN, test_dl, device) # the test loss and test accuracy
    test_stats_RN['epoch'].append(epoch + 1)
    test_stats_RN['loss'].append(test_loss)
    test_stats_RN['acc'].append(test_acc)

Epoch 10 / 10: 100% 10/10 [01:54<00:00, 11.32s/it]

fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(6, 6))

axes[0].plot(train_stats_RN['epoch'], train_stats_RN['loss'], label='train')


axes[0].plot(test_stats_RN['epoch'], test_stats_RN['loss'], label='test')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')

axes[1].plot(train_stats_RN['epoch'], train_stats_RN['acc'], label='train')


axes[1].plot(test_stats_RN['epoch'], test_stats_RN['acc'], label='test')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')

plt.legend()
plt.show()

