A Quick Peek into PyTorch
7/21/2024|Tech|5 min read
Google Colab vs Kaggle
There are several hosted Jupyter notebook services out there, and they all come with different perks. I chose Google Colab to start with since it ships a newer version of Python, so I could give PyTorch 2.0 a shot (though there might not be much of a difference for a beginner).
Replace Pre-installed PyTorch with Your Flavor
Simply pin the versions of your choice and install them with pip; life is that easy:
pip install torch==2.0.0 torchvision==0.15.1
Prepare Data
Luckily, there is no need for me to hustle to collect a dataset of different dog breeds. The Stanford Dogs Dataset saved me the time, and I also spotted a big name behind it, Fei-Fei Li, which makes it more promising. The dataset consists of 120 dog breeds with 20,580 images in total.
Then it is only a simple download and extraction:
curl -fsSL -o images.tar http://vision.stanford.edu/aditya86/ImageNetDogs/images.tar
tar xf images.tar
# The annotations (bounding boxes) are not needed in most cases
# curl -fsSL -o annotations.tar http://vision.stanford.edu/aditya86/ImageNetDogs/annotation.tar
# tar xf annotations.tar
Before Coding
Since I am learning a new framework, I have to write every single line of code with the documentation open, so I had better think through how I would train this model first.
We have 20k+ images over 120 classes, which is fewer than 200 images per class on average. That is not a lot of data for a multiclass classification problem, so I decided to do transfer learning with a relatively simple model: ResNet50 with weights pre-trained on ImageNet.
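Here is the per-class arithmetic behind that judgment as a tiny sketch. The totals come from the dataset description above. The 0.8 training fraction is the 80/20 split used later in this post. It computes averages only, so individual breeds can have more or fewer images.

```python
# Numbers from the Stanford Dogs Dataset as described above
TOTAL_IMAGES = 20580
NUM_CLASSES = 120
TRAIN_FRACTION = 0.8  # the 80/20 train/validation split used later in this post

def images_per_class(total: int, classes: int) -> float:
    """Average number of images per class (actual per-breed counts vary)."""
    return total / classes

avg_all = images_per_class(TOTAL_IMAGES, NUM_CLASSES)
avg_train = images_per_class(int(TRAIN_FRACTION * TOTAL_IMAGES), NUM_CLASSES)
print(f'{avg_all:.1f} images per class overall')        # 171.5 images per class overall
print(f'{avg_train:.1f} images per class for training')  # 137.2 images per class for training
```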
Based on my humble experience, I chose the following components/hyperparameters for this model:
- Optimizer: Adam (is it still SOTA?)
- Loss: Cross Entropy Loss
- Batch size: 256 (since the input is 224 × 224, the free GPU on Google Colab should handle that comfortably)
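nn.CrossEntropyLoss applies log-softmax to raw logits and then takes the negative log-likelihood of the target class. By default it averages over the batch. The plain-Python sketch below computes the same quantity for lists of logits. The function names are hypothetical and this is not PyTorch's implementation.

It also gives a handy sanity check. When all 120 logits are equal, the loss is ln(120) ≈ 4.79. A freshly initialized head should therefore start roughly in that neighborhood.

```python
import math

def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-softmax probability of the target class for one sample."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits: list[list[float]], targets: list[int]) -> float:
    """Average over the batch, like the default reduction='mean'."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Equal logits over 120 classes: the loss equals ln(120)
print(mean_cross_entropy([[0.0] * 120], [3]))
```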
The Actual Code
Isn't learning a new framework delightful? It is fascinating to feel how much a framework lets you do in a given field. Long story short, I divided my code into the following parts:
- Constants initialization
- Loading data
- Batch training iteration declaration
- Model, optimizer and loss initialization
- Training
I will implement these step by step.
As listed in the hyperparameters above, I have to define the BATCH_SIZE for this model. Additionally, PyTorch uses a device to control where computation on tensors takes place, so I chose my default device for training here:
import torch
BATCH_SIZE = 256
EPOCHS = 100
# Use the first GPU if one is available, otherwise fall back to the CPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f'Running on {DEVICE}')
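Here is a quick back-of-the-envelope check on the batch size. A float32 input batch of shape (256, 3, 224, 224) takes about 147 MiB. This sketch counts only the input tensor. The model weights and the intermediate activations inside ResNet50 come on top of it, so treat the result as a lower bound rather than a full memory estimate. The helper name batch_bytes is hypothetical.

```python
BATCH_SIZE = 256
CHANNELS, HEIGHT, WIDTH = 3, 224, 224
BYTES_PER_FLOAT32 = 4

def batch_bytes(batch_size: int, c: int, h: int, w: int, bytes_per_elem: int = 4) -> int:
    """Memory occupied by one dense input batch of shape (N, C, H, W)."""
    return batch_size * c * h * w * bytes_per_elem

size = batch_bytes(BATCH_SIZE, CHANNELS, HEIGHT, WIDTH, BYTES_PER_FLOAT32)
print(f'{size} bytes = {size / 2**20:.0f} MiB')  # 154140672 bytes = 147 MiB
```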
Loading Data
There are usually two steps in PyTorch to read from a dataset:
- Create a Dataset object; this object knows how to access the individual samples of the dataset
- Create a DataLoader object; this object knows how to batch the dataset
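In PyTorch terms, a map-style Dataset implements __len__ and __getitem__. A DataLoader draws indices (shuffled or not), fetches the samples, and groups them into batches. The plain-Python sketch below mimics that contract. ToyDataset and toy_loader are hypothetical names. The real DataLoader also collates samples into tensors and can load data in worker processes, which this sketch skips.

```python
import random
from typing import Iterator

class ToyDataset:
    """Hypothetical map-style dataset: knows its length and how to fetch one sample."""
    def __init__(self, n: int):
        # (fake image name, label) pairs spread over 3 labels
        self.samples = [(f'image_{i}', i % 3) for i in range(n)]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> tuple[str, int]:
        return self.samples[idx]

def toy_loader(dataset, batch_size: int, shuffle: bool = False,
               seed: int = 0) -> Iterator[tuple[list, list]]:
    """Hypothetical stand-in for DataLoader: yields (inputs, labels) batches."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        inputs, labels = zip(*batch)
        yield list(inputs), list(labels)

batches = list(toy_loader(ToyDataset(10), batch_size=4))
print([len(x) for x, _ in batches])  # [4, 4, 2]
```

The last batch is smaller because 10 samples do not divide evenly into batches of 4.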
There is a nice article about how to import data of any shape, but the dataset I am using is well structured (images are organized into folders named after their class), so I can take advantage of the torchvision library and use ImageFolder to import all the data:
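import matplotlib.pyplot as plt
import torch
from torchvision.datasets import ImageFolder
from torchvision.transforms import Compose, Resize, RandomCrop, ToTensor
# Resize the shorter side to 256 px, take a random 224x224 crop, convert to a CHW float tensor in [0, 1]
dogs_dataset = ImageFolder('./Images', transform=Compose([
    Resize(256, antialias=True),
    RandomCrop(224),
    ToTensor(),
]))
# Folder names look like 'n02085620-Chihuahua'; maxsplit=1 keeps hyphenated breed names intact
idx_to_class = {v: k.split('-', 1)[1] for k, v in dogs_dataset.class_to_idx.items()}
# Show three random samples with their breed names
for i in range(3):
    idx = torch.randint(len(dogs_dataset), (1,)).item()
    image, label = dogs_dataset[idx]
    ax = plt.subplot(1, 3, i + 1)
    ax.set_title(idx_to_class[label])
    ax.axis('off')
    plt.imshow(image.numpy().transpose(1, 2, 0))  # CHW -> HWC for matplotlib
plt.show()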
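One detail worth checking in the labelling: ImageFolder assigns class indices by sorting the sub-folder names. A few breed folders also contain a hyphen in the breed name itself, so splitting on every hyphen would truncate them.

The sketch below mimics ImageFolder's sorted-folder labelling with a temporary directory. find_classes here is my own re-implementation for illustration. The three folder names follow the dataset's wnid-Breed_name pattern.

```python
import os
import tempfile

def find_classes(root: str) -> dict[str, int]:
    """Mimics ImageFolder's labelling: sorted sub-folder names -> 0, 1, 2, ..."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: i for i, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    # Sample folder names in the 'wnid-Breed_name' style of the dataset
    for name in ['n02099601-golden_retriever', 'n02085620-Chihuahua',
                 'n02100236-German_short-haired_pointer']:
        os.mkdir(os.path.join(root, name))
    class_to_idx = find_classes(root)

# maxsplit=1 splits only at the first hyphen, after the wnid
idx_to_class = {v: k.split('-', 1)[1] for k, v in class_to_idx.items()}
print(idx_to_class)
# {0: 'Chihuahua', 1: 'golden_retriever', 2: 'German_short-haired_pointer'}
```

With a plain split('-')[1], the last entry would come out as 'German_short'.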
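Since ResNet50 takes input tensors of size 3 × 224 × 224, I used the transforms provided by the torchvision library to accomplish this easily. Resize scales the shorter side to 256 pixels, RandomCrop cuts out a random 224 × 224 patch, and ToTensor converts the image into a tensor.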
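To make the geometry of Resize(256) followed by RandomCrop(224) concrete, here is a plain-Python sketch that tracks only image sizes and crop boxes, not pixels. It assumes Resize with a single int matches the shorter side and truncates the longer side to an integer. I believe that matches torchvision, but treat the rounding as an assumption. The helper names and the 500 × 375 photo are hypothetical.

```python
import random

def resize_shorter_side(width: int, height: int, size: int) -> tuple[int, int]:
    """Scale so the shorter side equals `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def random_crop_box(width: int, height: int, crop: int,
                    rng: random.Random) -> tuple[int, int, int, int]:
    """Random crop x crop window fully inside the image: (left, top, right, bottom)."""
    left = rng.randint(0, width - crop)  # randint includes both end points
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop

w, h = resize_shorter_side(500, 375, 256)  # a hypothetical 500x375 photo
print(w, h)                                # 341 256
box = random_crop_box(w, h, 224, random.Random(0))
print(box[2] - box[0], box[3] - box[1])    # 224 224
```

Every sample therefore ends up 224 × 224 regardless of its original aspect ratio. The crop position changes each time the sample is loaded.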
Then DataLoader batches the dataset automatically for you:
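from torch.utils.data import random_split, DataLoader
# 80/20 split into training and validation sets
train_size = int(0.8 * len(dogs_dataset))
test_size = len(dogs_dataset) - train_size
train_dataset, test_dataset = random_split(dogs_dataset,
                                           [train_size, test_size])
# Shuffle the training data every epoch; keep the validation order fixed
trainloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
testloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
dataloaders_dict = {
    'train': trainloader,
    'val': testloader
}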
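To get a feel for what one epoch looks like, this sketch reproduces the split arithmetic for the 20,580 images. It then counts batches per epoch with BATCH_SIZE = 256. num_batches is a hypothetical helper. It mirrors DataLoader's drop_last option, which defaults to False and keeps the final partial batch.

```python
TOTAL_IMAGES = 20580
BATCH_SIZE = 256

train_size = int(0.8 * TOTAL_IMAGES)  # same formula as the split above
test_size = TOTAL_IMAGES - train_size

def num_batches(n: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a loader yields for n samples."""
    return n // batch_size if drop_last else (n + batch_size - 1) // batch_size

print(train_size, test_size)                # 16464 4116
print(num_batches(train_size, BATCH_SIZE))  # 65 (the last batch holds 80 images)
print(num_batches(test_size, BATCH_SIZE))   # 17
```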
Batch Training
With the data prepared, it is time to define the training loop:
import time

import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def train_model(model: nn.Module,
                dataloaders: dict[str, DataLoader],
                criterion: nn.Module,
                optimizer: optim.Optimizer,
                num_epochs: int,
                device: torch.device):
    since = time.time()
    val_acc_history = []
    model = model.to(device)  # the weights must live on the same device as the inputs

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            if phase == 'val':
                val_acc_history.append(epoch_acc.item())

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    return model, val_acc_history
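The code above has a few more features than strictly necessary (timing, logging, and the validation accuracy history), and these can be dropped. The essential parts are the loop itself and the operations on optimizer and loss: optimizer.zero_grad(), loss.backward(), and optimizer.step().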
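To make the optimizer side of that loop less of a black box, here is a plain-Python sketch of the Adam update rule on a one-parameter toy problem. It uses torch.optim.Adam's default hyperparameters (lr=1e-3, betas=(0.9, 0.999), eps=1e-8). These defaults are also what this post ends up using, since optim.Adam is called without an lr.

ToyAdam and the quadratic loss are hypothetical illustrations, not PyTorch code. The sketch also omits weight decay and AMSGrad. The hand-computed gradient stands in for loss.backward(), and ToyAdam.step stands in for optimizer.step(). PyTorch adds new gradients onto .grad, so it needs zero_grad(). This toy recomputes the gradient from scratch every iteration and needs no equivalent.

```python
import math

class ToyAdam:
    """Plain-Python sketch of the Adam update rule for a list of scalar parameters."""
    def __init__(self, params: list[float], lr: float = 1e-3,
                 betas: tuple[float, float] = (0.9, 0.999), eps: float = 1e-8):
        self.params, self.lr, self.eps = params, lr, eps
        self.b1, self.b2 = betas
        self.m = [0.0] * len(params)  # running mean of gradients
        self.v = [0.0] * len(params)  # running mean of squared gradients
        self.t = 0                    # step counter for bias correction

    def step(self, grads: list[float]) -> None:
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# Minimize loss(w) = (w - 3)^2 starting from w = 0
params = [0.0]
opt = ToyAdam(params, lr=0.1)
for _ in range(500):
    grad = 2 * (params[0] - 3)  # "backward": gradient computed by hand
    opt.step([grad])            # "step": update the parameter in place
print(f'{params[0]:.3f}')
```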
Model, Optimizer, Loss and Training
After defining some essential blocks for the entire workflow, I instantiated the network with pre-trained weights on ImageNet:
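from torch import nn
from torchvision.models import resnet
model = resnet.resnet50(weights=resnet.ResNet50_Weights.DEFAULT)
# Freeze every pre-trained weight
for param in model.parameters():
    param.requires_grad = False
# Swap the 1000-class ImageNet head for a fresh 120-class layer (trainable by default)
model.fc = nn.Linear(2048, 120)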
As the code above shows, there is not much data, so I froze all the pre-trained layers and replaced only the last (fully connected) layer. That layer is the only part of the network that gets fine-tuned.
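With the backbone frozen, the new head is the only trainable part, and its size is quick to compute. A Linear layer holds in_features × out_features weights plus out_features biases. Here 2048 is the feature size passed to nn.Linear in the code above. linear_param_count is a hypothetical helper. For comparison, torchvision lists the full ResNet50 at roughly 25.6M parameters.

```python
IN_FEATURES = 2048  # feature size fed into the final layer
NUM_CLASSES = 120

def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Learnable values in a fully connected layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)

print(linear_param_count(IN_FEATURES, NUM_CLASSES))  # 245880
print(linear_param_count(IN_FEATURES, 1000))         # 2049000 (the original ImageNet head)
```

So roughly a quarter of a million values are trained, about 1% of the full network.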
Initialize a loss criterion and an optimizer:
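# Collect only the parameters that are still trainable (the new fc layer)
params_to_update = []
for name, param in model.named_parameters():
    if param.requires_grad:
        params_to_update.append(param)
optimizer = optim.Adam(params_to_update)
criterion = nn.CrossEntropyLoss()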
Note that here I only passed in the parameters of the last layer, since it is the only one that still has requires_grad enabled.
After initializing everything, the loss criterion and optimizer will handle the rest:
model, hist = train_model(model, dataloaders_dict, criterion, optimizer,
                          num_epochs=EPOCHS, device=DEVICE)
Get some coffee and wait; things will be done sooner or later.