Getting Started with PyTorch

Quick Start

We start by training a model end to end to get a quick feel for PyTorch. The following modules need to be imported first:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

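Data Preparation

Data is the foundation of model training. PyTorch has two data-related abstractions: torch.utils.data.Dataset and torch.utils.data.DataLoader. A Dataset holds the training samples and their labels and defines how to fetch each sample together with its label. A DataLoader wraps a Dataset, organizes the raw samples into batches, and iterates over them; how it iterates is configurable.

torchvision is PyTorch's library for computer vision. PyTorch also offers libraries for other domains, including torchtext and torchaudio. Each of these libraries ships with some ready-made datasets.

Here we use the FashionMNIST dataset provided by torchvision. The code below downloads FashionMNIST into the ./data directory and returns the corresponding Dataset objects.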

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Next, wrap each Dataset in a DataLoader for use during training.

batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
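
To see what a DataLoader yields without downloading anything, the following is a minimal sketch on synthetic data. The names fake_images, fake_labels, and fake_dataset are made up for illustration, and the random tensors only mimic the shapes of FashionMNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for FashionMNIST: 256 grayscale 28x28 "images", 10 classes.
fake_images = torch.rand(256, 1, 28, 28)
fake_labels = torch.randint(0, 10, (256,))
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_dataset, batch_size=64)

X, y = next(iter(loader))
print(X.shape)           # torch.Size([64, 1, 28, 28])
print(y.shape, y.dtype)  # torch.Size([64]) torch.int64
print(len(loader))       # 4 batches of 64 cover all 256 samples
```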

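Building the Model

With the data ready, the next step is to build the model. The key class for models in PyTorch is nn.Module. By subclassing it, defining the layers of the network in __init__, and defining the forward pass in forward, we can build different kinds of models.

PyTorch supports several device types, including cpu, cuda (GPU), and mps. Both the model and the data must be moved to the chosen device with the to() method.

Note that mps targets Macs with Apple M-series chips. mps may also be reported as available on an Intel Mac, but using the mps device there can cause problems.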

# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
# device = "cpu"  # uncomment to force the CPU, e.g. on a Mac with an Intel chip
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

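Training the Model

Training is an iterative process. In each iteration we use a loss function and an optimizer to update the model parameters by gradient descent, until the model reaches an optimal or at least a reasonably good state.

In this example we use the cross-entropy loss and the SGD optimizer.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)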
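
Plain SGD (no momentum, no weight decay) updates each parameter as w ← w − lr · grad. The sketch below checks this on a tiny, hypothetical nn.Linear layer with random data; the layer size, learning rate, and the MSE loss are arbitrary choices for illustration and differ from the tutorial's setup.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)  # hypothetical tiny model
lr = 0.1
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

x = torch.rand(4, 3)
target = torch.rand(4, 1)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()

# What plain SGD should produce: w - lr * grad
expected_weight = layer.weight.detach() - lr * layer.weight.grad
optimizer.step()

print(torch.allclose(layer.weight.detach(), expected_weight))  # True
```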

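The train function below performs one epoch of training; one epoch means one pass over every sample in the training set. Within an epoch, the DataLoader feeds the training data batch by batch. For each batch we run the model, compute the loss with the loss function, call backward to compute the gradients, and let the optimizer update the model parameters. During training we can also log the progress, the loss, and so on.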

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

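The test function below likewise runs one epoch of evaluation. Unlike train, it does not update the model parameters and only needs to compute the loss, so the loop runs under torch.no_grad() to turn off gradient tracking. Evaluating during training lets us check whether the model is still learning.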

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
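
The accuracy line in test can be checked by hand on a small example. The logits and labels below are invented values chosen so that exactly one of four predictions is wrong.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
pred = torch.tensor([
    [2.0, 0.1, 0.3],  # argmax -> class 0
    [0.2, 1.5, 0.1],  # argmax -> class 1
    [0.3, 0.2, 0.9],  # argmax -> class 2
    [1.1, 0.4, 0.2],  # argmax -> class 0
])
y = torch.tensor([0, 1, 1, 0])  # true labels; the third prediction is wrong

correct = (pred.argmax(1) == y).type(torch.float).sum().item()
accuracy = correct / len(y)
print(correct, accuracy)  # 3.0 0.75
```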

With single-epoch training and test functions in place, the full run simply repeats them for several epochs:

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

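Saving and Loading the Model

Once training is finished, the model can be saved to a file. You can save either the whole model or only its parameters. Here we save the parameters of the trained model.

torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

PyTorch can also load a model back from such a file.

model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))

Once loaded, the model can be used directly: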

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

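Summary

From this quick-start example we can outline a complete training and testing workflow:

Prepare the dataset and load it with a DataLoader
Build the model
Construct the loss function and define the optimizer
Set the training hyperparameters, such as the number of epochs, batch size, and learning rate
Start training, recording test results along the way
Evaluate the result once training is complete
Save the trained model
Load the saved model and use it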

Tensor

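Tensor is the most basic data structure in PyTorch. A model's inputs, outputs, and parameters are all represented as Tensors. Tensors are very similar to NumPy's ndarray, and the two have closely matching APIs. However, a Tensor can run on a GPU or other hardware accelerators, whereas an ndarray only runs on the CPU. Tensors are also optimized for automatic differentiation, which makes them better suited to machine learning.

Many Tensor operations have direct analogues in NumPy's ndarray; for NumPy itself, see my NumPy basics notes. Tensors also support vectorization and broadcasting, with the same meaning as in NumPy.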
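
As a small illustration of broadcasting behaving the same way in both libraries, here is a sketch that builds a (3, 4) table from a column and a row. The variable names are arbitrary.

```python
import numpy as np
import torch

col = torch.arange(3).reshape(3, 1)  # shape (3, 1)
row = torch.arange(4).reshape(1, 4)  # shape (1, 4)

# Broadcasting stretches both operands to shape (3, 4)
table = col * 10 + row
print(table.shape)  # torch.Size([3, 4])

# The same expression in NumPy gives the same values
np_table = np.arange(3).reshape(3, 1) * 10 + np.arange(4).reshape(1, 4)
print(np.array_equal(table.numpy(), np_table))  # True
```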

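The following sections give brief examples of common Tensor operations, mainly to connect them with the NumPy API. For more operations, see the official documentation: Torch.Tensors | PyTorch Documentation.

Creating Tensors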

import numpy as np

data = [[1, 2], [3, 4]]

# Create a tensor directly from data; the dtype is inferred automatically
x_data = torch.tensor(data)

# Create from a numpy.ndarray
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

# Create from another tensor, keeping its shape
x_ones = torch.ones_like(x_data)
x_rand = torch.rand_like(x_data, dtype=torch.float)

# Create with a given shape; the default dtype is torch.float32
shape = (2, 3)
rand_tensor = torch.rand(shape)  # uniform on [0, 1)
randn_tensor = torch.randn(shape)  # standard normal distribution
ones_tensor = torch.ones(shape)  # all ones
zeros_tensor = torch.zeros(shape)  # all zeros

# Other common constructors
range_tensor = torch.arange(10)

In fact, a Tensor on the CPU and a NumPy ndarray can share the same memory, so a change to one is visible in the other. This link is established with the Tensor's .numpy() method and with torch.from_numpy(ndarray):

# tensor to ndarray
t = torch.ones(5)
n = t.numpy()
 
# ndarray to tensor
n = np.ones(5)
t = torch.from_numpy(n)
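
The shared memory can be observed directly. The sketch below mutates one side in place and prints the other; it also contrasts this with torch.tensor, which copies its input.

```python
import numpy as np
import torch

t = torch.ones(5)
n = t.numpy()        # n is a view on t's memory, not a copy
t.add_(1)            # in-place change on the tensor
print(n)             # [2. 2. 2. 2. 2.]

a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)  # in-place change on the array
print(b)             # tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

c = torch.tensor(a)  # torch.tensor copies, so c is independent of a
a[0] = 100
print(c[0].item())   # 2.0
```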

Basic Tensor Attributes

tensor = torch.rand(3,4)

# shape
tensor.shape  
# or tensor.size()
# dtype
tensor.dtype
# device(cpu/gpu/mps/...)
tensor.device

# num of element
tensor.numel()

The .to() method moves a Tensor to a given device so that it can use that hardware's acceleration, for example:

# move tensor to the GPU if available
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
# move tensor to the MPS if available
elif torch.backends.mps.is_available():
    tensor = tensor.to("mps")

Indexing and Slicing Tensors

# Similar to NumPy
tensor = torch.ones(4, 4)
tensor[0]  # first row
tensor[:, 0]  # first column
tensor[..., -1]  # last column

tensor = torch.arange(10).reshape(2, 5)
# Boolean masks are supported
tensor[tensor % 2 == 0]  # tensor([0, 2, 4, 6, 8])

# Index tensors are supported
i = torch.tensor([0, 0, 1, 1, 1])
j = torch.tensor([0, 1, 2, 3, 4])
tensor[i, j]  # tensor([0, 1, 7, 8, 9])

Tensor Math Operations

tensor = torch.arange(6).reshape(2, 3)
# + - * / ** @

# Three ways to do matrix multiplication; y1, y2, y3 hold the same values
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.empty_like(y1)  # rand_like would fail here because tensor has an integer dtype
torch.matmul(tensor, tensor.T, out=y3)

# Three ways to do element-wise multiplication; z1, z2, z3 hold the same values
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.empty_like(tensor)
torch.mul(tensor, tensor, out=z3)

# Convert a single-element Tensor to a Python number: item()
sum_tensor = tensor.sum()
sum_value = sum_tensor.item()
# or use a Python built-in
sum_value = int(sum_tensor)

# Vector dot product (multiply element-wise, then sum)
tensor = torch.arange(6)
a = torch.dot(tensor, tensor)

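PyTorch methods usually return a new object, but there are also methods that modify the object itself, that is, in-place operations. Most of them are named by appending an underscore _ to the regular method name. In-place operations are generally discouraged in PyTorch: they overwrite values that autograd may need for the backward pass, so gradient computation can fail or lose its history.

tensor = torch.arange(6).reshape(2, 3)
tensor.add_(5)  # tensor += 5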
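
A short sketch contrasting the two styles, and showing one concrete way autograd objects to in-place edits: modifying a leaf tensor that requires grad raises a RuntimeError. The variable names are illustrative only.

```python
import torch

t = torch.arange(6).reshape(2, 3)
out_of_place = t.add(5)  # returns a new tensor, t is unchanged
print(t[0, 0].item(), out_of_place[0, 0].item())  # 0 5

t.add_(5)                # modifies t itself
print(t[0, 0].item())    # 5

# Autograd refuses in-place edits on a leaf tensor that requires grad
w = torch.ones(3, requires_grad=True)
try:
    w.add_(1)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
print(raised)  # True
```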

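Aggregation functions can likewise take the axis to reduce along; that axis disappears from the result. Passing keepdims=True keeps the dimension instead, so the reduced axis stays in the result with size 1. (PyTorch's native argument names are dim and keepdim; axis and keepdims are accepted as NumPy-style aliases.)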

tensor = torch.arange(8).reshape(2,4)
print(tensor.shape)  # torch.Size([2, 4])

print(tensor.sum(axis=0).shape)  # torch.Size([4])
print(tensor.sum(axis=1).shape)  # torch.Size([2])


# keepdims=True
print(tensor.sum(axis=0, keepdims=True).shape)  # torch.Size([1, 4])
print(tensor.sum(axis=1, keepdims=True).shape)  # torch.Size([2, 1])

Tensor Shape Transformations

# Analogous to numpy.concatenate: join tensors along an existing dimension
tensor = torch.arange(8).reshape(2,4)
print(tensor.shape)  # torch.Size([2, 4])
t0 = torch.cat([tensor, tensor, tensor], dim=0)
print(t0.shape)  # torch.Size([6, 4])
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1.shape)  # torch.Size([2, 12])

# Stack tensors along a new dimension; all tensors must have the same shape
# The new dimension is inserted at the position given by dim
tensor = torch.arange(9).reshape(3, 3)
print(tensor.shape)  # torch.Size([3, 3])
t0 = torch.stack([tensor, tensor], dim=0)
print(t0.shape)  # torch.Size([2, 3, 3])
t1 = torch.stack([tensor, tensor], dim=1)
print(t1.shape)  # torch.Size([3, 2, 3])
t2 = torch.stack([tensor, tensor], dim=2)
print(t2.shape)  # torch.Size([3, 3, 2])
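
One way to relate the two functions: stack behaves like cat applied after giving each tensor a new leading axis. The sketch below checks this and also shows that cat only needs matching sizes on the dimensions that are not being joined.

```python
import torch

a = torch.arange(6).reshape(2, 3)
b = a + 100

stacked = torch.stack([a, b], dim=0)                          # shape (2, 2, 3)
via_cat = torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0)  # same result

print(stacked.shape)                  # torch.Size([2, 2, 3])
print(torch.equal(stacked, via_cat))  # True

# cat needs matching sizes only on the non-concatenated dimensions
c = torch.zeros(5, 3, dtype=a.dtype)
joined = torch.cat([a, c], dim=0)
print(joined.shape)                   # torch.Size([7, 3])
```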

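Dataset and DataLoader

Loose coupling is a hallmark of good architecture, and PyTorch likewise aims to separate data-handling code from model-training code. Dataset and DataLoader exist for this purpose. A Dataset stores the samples (features) and their labels; a DataLoader wraps a Dataset in an iterable so that samples are easier to access.

PyTorch also provides many predefined example datasets, including image, text, and audio datasets. They are exposed as subclasses of Dataset, so the corresponding Dataset object can be obtained directly through the API. In the Quick Start, for example, we fetched the Fashion-MNIST dataset through the built-in API.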

Dataset

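PyTorch lets users define their own Dataset classes. A custom Dataset subclasses torch.utils.data.Dataset and implements __init__, __len__, and __getitem__. Together these three methods define how the data is read, how large the dataset is, and how a sample is fetched with [index].

__init__ sets up every variable and helper that is needed later;
__len__ returns the size of the dataset;
__getitem__ returns the record at position index, including its feature and label.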

import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

The sample code handles the following scenario: the training images are stored in img_dir and the labels are stored in annotations_file, where each line is one record in the format image_file_name, label.
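
The same three-method pattern works without any files on disk. The following is a minimal in-memory sketch; SquaresDataset and its transform are hypothetical and exist only to show __init__, __len__, and __getitem__ in action.

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical dataset: feature x (a 1-element tensor), label x squared."""

    def __init__(self, n, transform=None):
        self.xs = torch.arange(n, dtype=torch.float32)
        self.transform = transform

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        feature = self.xs[idx].reshape(1)
        label = self.xs[idx] ** 2
        if self.transform:
            feature = self.transform(feature)
        return feature, label

ds = SquaresDataset(5, transform=lambda f: f / 10)
print(len(ds))         # 5
feature, label = ds[3]
print(feature, label)  # tensor([0.3000]) tensor(9.)
```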

DataLoader

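With a Dataset alone, data can only be accessed by index, one feature and label at a time. During training we usually have requirements on how samples are delivered: for example, we want them in minibatches, and we want the data reshuffled at every epoch to reduce overfitting.

A DataLoader wraps a Dataset. It is an iterable that hides this data-handling complexity behind a simple API, so batching, shuffling, and similar options are easy to specify.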

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

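A DataLoader iterates over the dataset on demand, and each iteration returns one batch of train_features and train_labels. With shuffle=True, the data is reshuffled every time a new pass over the DataLoader begins, so the sample order differs from epoch to epoch.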

# Iterate over the dataloader
# train_features and train_labels each hold one batch of data
train_features, train_labels = next(iter(train_dataloader))
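
A few DataLoader behaviors are easy to verify on a toy dataset: the last batch may be smaller, drop_last discards it, and shuffling changes only the order of samples. The ten-sample TensorDataset below is made up for the purpose.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.arange(10).float().reshape(10, 1), torch.arange(10))

# 10 samples with batch_size=4 -> batches of 4, 4 and 2
loader = DataLoader(data, batch_size=4, shuffle=True)
sizes = [len(labels) for _, labels in loader]
print(sizes)  # [4, 4, 2]

# drop_last=True discards the final incomplete batch
loader_drop = DataLoader(data, batch_size=4, shuffle=True, drop_last=True)
print(len(loader_drop))  # 2

# Shuffling changes the order, not the content: every label appears once per epoch
seen = sorted(int(label) for _, labels in loader for label in labels)
print(seen == list(range(10)))  # True
```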

Transforms

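As mentioned earlier, a model's inputs, outputs, and parameters are all Tensors, but raw training data is often not in Tensor form; in the Quick Start, for instance, the training data came as images. Transforms convert the various raw data formats into Tensors suitable for training.

For example, torchvision.transforms provides the conversions commonly used for image data.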
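
To make the idea concrete without depending on torchvision, here is a sketch of two hand-written transforms. to_float_chw and one_hot are hypothetical helpers that only imitate the kind of conversion a ready-made image transform and a target transform perform; they are not torchvision APIs.

```python
import torch

def to_float_chw(image_hwc):
    """Stand-in feature transform: uint8 (H, W, C) in [0, 255] -> float32 (C, H, W) in [0, 1]."""
    return image_hwc.permute(2, 0, 1).to(torch.float32) / 255.0

def one_hot(label, num_classes=10):
    """Stand-in target transform: integer class index -> one-hot float vector."""
    vec = torch.zeros(num_classes)
    vec[label] = 1.0
    return vec

raw_image = torch.randint(0, 256, (28, 28, 1), dtype=torch.uint8)  # fake grayscale image
x = to_float_chw(raw_image)
y = one_hot(3)
print(x.shape, x.dtype)  # torch.Size([1, 28, 28]) torch.float32
print(y)                 # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
```

Functions like these can be passed as the transform and target_transform arguments of a Dataset such as CustomImageDataset.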

Model

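A Model is built from multiple modules, each of which performs some computation on the data. The torch.nn namespace provides the building blocks needed to construct a Model. In PyTorch every module is a subclass of nn.Module. We can combine existing basic modules to form a custom Model.

A custom Model is a class that subclasses nn.Module and implements the __init__ and forward() methods.

__init__ defines the modules the Model will use
forward(self, x) defines the forward pass of data through the Model. The parameter x is the input feature tensor, typically a batch of features drawn from the Dataset

Recall the Model defined in the Quick Start, which has the following structure.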

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

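We can understand the code by following a single forward pass. In forward, x is a batch of 28*28 images, that is, a Tensor of shape (N, 1, 28, 28). The Flatten layer keeps the batch dimension and flattens each sample to 784 values, giving (N, 784). A Linear layer then maps this to (N, 512), and the data passes through ReLU, Linear, ReLU, and Linear in turn, by which point the shape has been reduced to (N, 10). This Tensor is returned as the logits.

The basic modules above are not covered in detail here, but nn.Sequential deserves a mention: it is a container that chains several modules in order and is itself a Module.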
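
The shape changes can be traced layer by layer. The sketch below rebuilds the same stack of layers inside one nn.Sequential (with Flatten moved into it for convenience, which differs slightly from the class above) and feeds it a made-up batch of three random images.

```python
import torch
from torch import nn

stack = nn.Sequential(
    nn.Flatten(),  # keeps dim 0 (batch), flattens the rest
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.rand(3, 1, 28, 28)  # a made-up batch of 3 images
shapes = []
out = x
for layer in stack:           # a Sequential can be iterated layer by layer
    out = layer(out)
    shapes.append(tuple(out.shape))
print(shapes)
# [(3, 784), (3, 512), (3, 512), (3, 512), (3, 512), (3, 10)]

# Weights plus biases: 784*512+512 + 512*512+512 + 512*10+10
n_params = sum(p.numel() for p in stack.parameters())
print(n_params)  # 669706
```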

AutoGrad

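The most common algorithm for training neural networks is gradient descent, which updates the parameters using the gradient of the loss function with respect to each parameter. The hardest step in this process is computing the gradients. PyTorch computes them with its built-in differentiation engine torch.autograd, which supports automatic gradient computation for any computational graph. (Backpropagation is one way of computing gradients.)

Every Tensor has a requires_grad attribute that indicates whether its gradient should be computed. It can be set when the Tensor is created or later with x.requires_grad_(True). During training we call backward on the model's loss (a scalar), and PyTorch computes the gradients automatically; they can then be read from the .grad attribute.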

loss.backward()
print(w.grad)
print(b.grad)
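
The snippet above assumes that loss, w, and b already exist. A self-contained sketch with scalar values, chosen so that the gradients can be checked by hand, might look like this (the numbers and the squared-error loss are illustrative choices):

```python
import torch

x = torch.tensor(3.0)
y = torch.tensor(5.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

loss = (w * x + b - y) ** 2  # (2*3 + 0 - 5)^2 = 1
loss.backward()

# d(loss)/dw = 2 * (w*x + b - y) * x = 2 * 1 * 3 = 6
# d(loss)/db = 2 * (w*x + b - y)     = 2 * 1     = 2
print(w.grad, b.grad)  # tensor(6.) tensor(2.)
print(x.grad)          # None, because x does not require grad
```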

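Sometimes we only want a forward pass and do not want PyTorch to track gradients. In that case, wrap the computation in a with torch.no_grad() block, and PyTorch will not record it for gradient computation. Alternatively, use the .detach() version of a Tensor, which is likewise excluded from gradient computation.

Reasons for disabling gradient computation include freezing parameters and speeding up computation.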

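# Option 1: run the computation inside a no_grad block
with torch.no_grad():
    z = torch.matmul(x, w) + b  # z.requires_grad is False

# Option 2: detach an existing result from the computational graph
z = torch.matmul(x, w) + b
z_det = z.detach()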
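
The effect of both approaches can be confirmed by inspecting requires_grad. In this sketch x, w, and b are small random tensors invented for the check.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
print(z.requires_grad)          # True

with torch.no_grad():
    z_no_grad = torch.matmul(x, w) + b
print(z_no_grad.requires_grad)  # False

z_det = z.detach()
print(z_det.requires_grad)      # False
print(torch.equal(z, z_det))    # True: same values, no graph attached
```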

Model Save and Load

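There are two ways to save and load a model. The first saves only the model's parameters (weights); when loading, you first create a Model instance with the same structure and then load the parameters into it. The second saves the entire model, both structure and parameters, and loading returns it directly.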

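# Option 1: save only the model parameters
# save
torch.save(model_save.state_dict(), "xxx.pth")
# load
model_load = xxx  # define the model first; keep its device consistent with the one used when saving
model_load.load_state_dict(torch.load("xxx.pth"))

# Option 2: save the whole model
# save
torch.save(model_save, "xxx.pth")
# load (on PyTorch versions where torch.load defaults to weights_only=True, pass weights_only=False)
model_load = torch.load("xxx.pth")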
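
A runnable round trip for the first approach is sketched below. make_model and the file name weights.pth are hypothetical; map_location="cpu" is one way to sidestep the device-consistency concern, since it remaps the stored tensors to the CPU while loading.

```python
import os
import tempfile

import torch
from torch import nn

def make_model():
    # Hypothetical small model; the same factory is used for saving and loading
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

model_save = make_model()
x = torch.rand(1, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.pth")
    torch.save(model_save.state_dict(), path)

    model_load = make_model()                     # fresh random weights
    state = torch.load(path, map_location="cpu")  # remap tensors to the CPU on load
    model_load.load_state_dict(state)

model_load.eval()
with torch.no_grad():
    same_output = torch.equal(model_save(x), model_load(x))
print(same_output)  # True
```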

Using Device

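When using PyTorch we usually need to specify which device or devices to use. If the device is specified explicitly throughout the code, it appears in many places and is hard to manage consistently; missing one spot leads to tensors on mismatched devices. A more convenient approach is to write .cuda() wherever the GPU is needed, that is, to default to torch.device("cuda"), and then use the CUDA_VISIBLE_DEVICES environment variable at launch time to choose which GPUs the process can see. It works like this: with CUDA_VISIBLE_DEVICES=3,4,5, the process sees the physical GPUs 3, 4, and 5 as GPUs 0, 1, and 2. This lets us control which GPU or GPUs are used. For example:

CUDA_VISIBLE_DEVICES=1,2,3 python xxx_using_multi_gpu.py
CUDA_VISIBLE_DEVICES=7 python xxx_using_single_gpu.py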
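
The variable can also be set from inside a script, as long as this happens before CUDA is initialized. The sketch below is one possible arrangement; the value "0" is only an example, and the fallback to the CPU is an addition for machines without a GPU.

```python
import os

# Must be set before CUDA is initialized; the safest place is before importing torch.
# "0" is only an example value; setdefault keeps a value passed on the command line.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch

visible = torch.cuda.device_count()  # counts only the GPUs left visible
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"visible GPUs: {visible}, using {device}")
```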

References

Introduction to Pytorch|Pytorch Documentation


https://evernorif.github.io/2023/08/16/Pytorch快速入门/

Author: EverNorif

Published: August 16, 2023
