In this section we will look at how to persist model state with saving, loading and running model predictions.
```python
import torch
import torchvision.models as models
```
# 1、Saving and Loading Model Weights
PyTorch models store the learned parameters in an internal state dictionary, called `state_dict`. These can be persisted via the `torch.save` method. A common way to save a model is to serialize the internal state dictionary (containing the model parameters).
```python
model = models.vgg16(pretrained=True)
torch.save(model.state_dict(), 'model_weights.pth')
```
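To make the contents of a `state_dict` concrete, here is a minimal sketch that lists the entries of one. The tiny `toy_model` is an assumption for illustration only. It stands in for VGG16 so that no weights have to be downloaded.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, standing in for vgg16 to avoid downloading weights.
toy_model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# state_dict() maps each parameter (and buffer) name to its tensor.
state = toy_model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```

Only layers that hold parameters or buffers show up, so the `ReLU` at index 1 contributes no entries.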
To load model weights, you need to create an instance of the same model first, and then load the parameters using the `load_state_dict()` method.
```python
# we do not specify pretrained=True, i.e. do not load default weights
model = models.vgg16()
model.load_state_dict(torch.load('model_weights.pth'))
# be sure to call model.eval() before running inference to set the dropout and batch
# normalization layers to evaluation mode. Failing to do this will yield inconsistent
# inference results.
model.eval()
```
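The save, re-instantiate, load, and `eval()` sequence can be checked end to end with a small sketch. The helper `make_toy_model` and the file name `toy_weights.pth` are hypothetical. A small network with a dropout layer is assumed in place of VGG16.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_toy_model():
    # Hypothetical small network with a dropout layer, standing in for vgg16.
    return nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))


source = make_toy_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_weights.pth")
    torch.save(source.state_dict(), path)

    # A fresh instance has its own random weights until the state is loaded.
    restored = make_toy_model()
    restored.load_state_dict(torch.load(path))

# Switch both models to evaluation mode so that dropout is disabled.
source.eval()
restored.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.allclose(source(x), restored(x))
print(same_output)
```

Without the two `eval()` calls, dropout would randomly zero activations on each forward pass, and the two outputs would generally differ even though the weights match.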
<a name="IIr2h"></a>
# 2、Saving and Loading Models with Shapes
- When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network.
- We might want to save the structure of this class together with the model, in which case we can pass `model` (and not `model.state_dict()`) to the saving function:
```python
torch.save(model, 'model.pth')
```
- We can then load the model like this:
```python
model = torch.load('model.pth')
```
# 3、Saving and Loading (resuming training) a General Checkpoint
- Saving and loading a general checkpoint model for inference or resuming training can be helpful for picking up where you last left off.
- When saving a general checkpoint, you must save more than just the model’s state_dict. It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains. Other items that you may want to save are the epoch you left off on, the latest recorded training loss, external `torch.nn.Embedding` layers, and more, based on your own algorithm.
- To save multiple checkpoints, you must organize them in a dictionary and use `torch.save()` to serialize the dictionary. A common PyTorch convention is to save these checkpoints using the `.tar` file extension.
- To load the items, first initialize the model and optimizer, then load the dictionary locally using `torch.load()`. From here, you can easily access the saved items by simply querying the dictionary as you would expect.
- The steps to save and load multiple checkpoints are shown below.
## 1)Import all necessary libraries for loading our data
```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu in the network below
import torch.optim as optim
```
## 2)Define and initialize the neural network
- For the sake of example, we will create a neural network for training images.
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
```
<a name="lEpFy"></a>
## 3)Initialize the optimizer
- We will use SGD with momentum.
```python
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
## 4)Save the general checkpoint
- Collect all relevant information and build your dictionary.
```python
# Additional information
EPOCH = 5
PATH = "model.pt"
LOSS = 0.4

torch.save({
    'epoch': EPOCH,
    'model_state_dict': net.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': LOSS,
}, PATH)
```
<a name="et8mK"></a>
## 5)Load the general checkpoint
- Remember to first initialize the model and optimizer, then load the dictionary locally.
- You must call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
- If you wish to resume training, call `model.train()` to ensure these layers are in training mode.
```python
model = Net()
# build the optimizer over the parameters of the newly created model
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()
```
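To show how the restored `epoch` and optimizer state are typically used, here is a minimal sketch of resuming a training loop. The toy regression model, the synthetic data, and the helpers `build` and `train_epochs` are all assumptions made for illustration. They are not part of the recipe above.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def build():
    # Hypothetical toy regression model and optimizer, for illustration only.
    model = nn.Linear(3, 1)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    return model, optimizer


def train_epochs(model, optimizer, start_epoch, end_epoch):
    # Runs epochs start_epoch..end_epoch-1 on fixed synthetic data and
    # returns the last loss value as a Python float.
    generator = torch.Generator().manual_seed(0)
    inputs = torch.randn(16, 3, generator=generator)
    targets = inputs.sum(dim=1, keepdim=True)
    loss_fn = nn.MSELoss()
    model.train()
    last_loss = None
    for _ in range(start_epoch, end_epoch):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        last_loss = loss.item()
    return last_loss


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, "checkpoint.pt")

    # First run: train for 3 epochs, then write a checkpoint.
    model, optimizer = build()
    loss = train_epochs(model, optimizer, 0, 3)
    torch.save({
        'epoch': 3,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }, ckpt_path)

    # Later run: rebuild the objects, restore their state, read the epoch.
    resumed_model, resumed_optimizer = build()
    checkpoint = torch.load(ckpt_path)
    resumed_model.load_state_dict(checkpoint['model_state_dict'])
    resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    start_epoch = checkpoint['epoch']

# Continue training from the saved epoch up to epoch 6.
final_loss = train_epochs(resumed_model, resumed_optimizer, start_epoch, 6)
print(start_epoch, final_loss)
```

Restoring the optimizer matters here because SGD with momentum keeps a momentum buffer per parameter. Without it, the resumed run would restart from zero momentum.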
# 4、Saving and Loading Models across Devices
- There may be instances where you want to save and load your neural networks across different devices. Saving and loading models across devices is relatively straightforward using PyTorch.
- In this recipe, we will experiment with saving and loading models across CPUs and GPUs. Steps are shown below.
## 1)Import all necessary libraries for loading our data
```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu in the network below
import torch.optim as optim
```
## 2)Define and initialize the neural network
- For the sake of example, we will create a neural network for training images.
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
```
<a name="zWBNE"></a>
## 3)Save and Load on CPU/GPU
<a name="lF5J8"></a>
### (1)Save on a GPU, load on a CPU
- When loading a model on a CPU that was trained with a GPU, pass `torch.device('cpu')` to the `map_location` argument in the `torch.load()` function.
```python
# Specify a path to save to
PATH = "model.pt"

# Save
torch.save(net.state_dict(), PATH)

# Load
device = torch.device('cpu')
model = Net()
# the storages underlying the tensors are dynamically remapped to the CPU device
# using the ``map_location`` argument.
model.load_state_dict(torch.load(PATH, map_location=device))
```
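The effect of `map_location` can be observed directly by inspecting the device of each loaded tensor. The sketch below makes several assumptions: a hypothetical toy layer replaces `Net`, an in-memory buffer stands in for the checkpoint file, and the weights are saved from the GPU only when one is available, so that the sketch also runs on a CPU-only machine.

```python
import io

import torch
import torch.nn as nn

# Hypothetical toy model; placed on the GPU before saving when one exists.
save_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
toy = nn.Linear(4, 2).to(save_device)

buffer = io.BytesIO()  # in-memory stand-in for a checkpoint file
torch.save(toy.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor to the CPU while loading.
state = torch.load(buffer, map_location=torch.device('cpu'))
loaded_devices = {name: tensor.device.type for name, tensor in state.items()}
print(loaded_devices)
```

Whichever device the weights were saved from, every tensor in the loaded dictionary reports `cpu`.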
### (2)Save on a GPU, load on a GPU
- When loading a model on a GPU that was trained and saved on GPU, simply convert the initialized model to a CUDA optimized model using `model.to(torch.device('cuda'))`.
- Be sure to use the `.to(torch.device('cuda'))` function on all model inputs to prepare the data for the model.
- Note that calling `my_tensor.to(device)` returns a new copy of `my_tensor` on GPU. It does NOT overwrite `my_tensor`. Therefore, remember to manually overwrite tensors: `my_tensor = my_tensor.to(torch.device('cuda'))`.
```python
# Save
torch.save(net.state_dict(), PATH)

# Load
device = torch.device("cuda")
model = Net()
model.load_state_dict(torch.load(PATH))
model.to(device)
```
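The difference between moving a module and moving a tensor is easy to overlook, so here is a minimal device-agnostic sketch of the pattern. The toy layer and the random `batch` are hypothetical. The fallback to the CPU is an assumption added so that the code also runs on machines without a GPU.

```python
import torch
import torch.nn as nn

# Falls back to the CPU so that the sketch also runs without a GPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2)
model.to(device)  # modules are moved in place

batch = torch.randn(8, 4)
batch = batch.to(device)  # tensors are not: rebind the name to the result

with torch.no_grad():
    output = model(batch)
print(output.shape, output.device.type)
```

If the rebinding on `batch` is omitted on a GPU machine, the forward pass fails because the input and the model's parameters live on different devices.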
<a name="nRet2"></a>
### (3)Save on a CPU, load on a GPU
- When loading a model on a GPU that was trained and saved on CPU, set the `map_location` argument in the `torch.load()` function to `cuda:device_id`. This loads the model to a given GPU device.
- Be sure to call `model.to(torch.device('cuda'))` to convert the model’s parameter tensors to CUDA tensors.
- Finally, also be sure to use the `.to(torch.device('cuda'))` function on all model inputs to prepare the data for the CUDA optimized model.
```python
# Save
torch.save(net.state_dict(), PATH)

# Load
device = torch.device("cuda")
model = Net()
# Choose whatever GPU device number you want
model.load_state_dict(torch.load(PATH, map_location="cuda:0"))
# Make sure to call input = input.to(device) on any input tensors that you feed to the model
model.to(device)
```
## 4)Saving and loading `DataParallel` models
- `torch.nn.DataParallel` is a model wrapper that enables parallel GPU utilization.
- To save a `DataParallel` model generically, save the `model.module.state_dict()`. This way, you have the flexibility to load the model any way you want to any device you want.
```python
# Save
torch.save(net.module.state_dict(), PATH)

# Load to whatever device you want
```
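The reason for saving `model.module.state_dict()` rather than the wrapper's own `state_dict()` becomes visible when the key names are compared. The following is a minimal sketch under stated assumptions: a hypothetical toy layer is wrapped instead of `Net`, and an in-memory buffer replaces the file at `PATH`.

```python
import io

import torch
import torch.nn as nn

# Hypothetical toy model wrapped in DataParallel.
wrapped = nn.DataParallel(nn.Linear(4, 2))

# The wrapper registers the real model under the attribute `module`,
# so the wrapper's own state_dict keys carry a "module." prefix.
wrapped_keys = list(wrapped.state_dict().keys())
plain_keys = list(wrapped.module.state_dict().keys())
print(wrapped_keys, plain_keys)

# Saving the inner state_dict lets a plain, unwrapped model load it.
buffer = io.BytesIO()
torch.save(wrapped.module.state_dict(), buffer)
buffer.seek(0)

plain = nn.Linear(4, 2)
plain.load_state_dict(torch.load(buffer, map_location=torch.device('cpu')))
```

Because the saved keys have no `module.` prefix, the same file can be loaded into either a plain model or, via `wrapped.module.load_state_dict(...)`, a wrapped one.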
<a name="RDHX3"></a>
# 5、Saving and Loading Multiple Models in one File
- Saving and loading multiple models can be helpful for reusing models that you have previously trained.
- When saving a model comprised of multiple `torch.nn.Modules`, such as a GAN, a sequence-to-sequence model, or an ensemble of models, you must save a dictionary of each model’s state_dict and corresponding optimizer. You can also save any other items that may aid you in resuming training by simply appending them to the dictionary.
- To load the models, first initialize the models and optimizers, then load the dictionary locally using `torch.load()`. From here, you can easily access the saved items by simply querying the dictionary as you would expect.
- In this recipe, we will demonstrate how to save multiple models to one file. Steps are shown below.
<a name="zFWbO"></a>
## 1)Import all necessary libraries for loading our data
```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu in the network below
import torch.optim as optim
```
## 2)Define and initialize the neural network
- For the sake of example, we will create a neural network for training images.
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

netA = Net()
netB = Net()
```
<a name="cHNb2"></a>
## 3)Initialize the optimizer
- We will use SGD with momentum to build an optimizer for each model we created.
```python
optimizerA = optim.SGD(netA.parameters(), lr=0.001, momentum=0.9)
optimizerB = optim.SGD(netB.parameters(), lr=0.001, momentum=0.9)
```
## 4)Save multiple models
- Collect all relevant information and build your dictionary.
```python
# Specify a path to save to
PATH = "model.pt"

torch.save({
    'modelA_state_dict': netA.state_dict(),
    'modelB_state_dict': netB.state_dict(),
    'optimizerA_state_dict': optimizerA.state_dict(),
    'optimizerB_state_dict': optimizerB.state_dict(),
}, PATH)
```
<a name="rnAIh"></a>
## 5)Load multiple models
- Remember to first initialize the models and optimizers, then load the dictionary locally.
- You must call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
- If you wish to resume training, call `model.train()` to ensure these layers are in training mode.
```python
modelA = Net()
modelB = Net()
optimizerA = optim.SGD(modelA.parameters(), lr=0.001, momentum=0.9)
optimizerB = optim.SGD(modelB.parameters(), lr=0.001, momentum=0.9)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()
```