- In this section, we will look at how to persist model state by saving and loading models, and how to run model predictions.

```python
import torch
import torchvision.models as models
```

# 1、Saving and Loading Model Weights

- PyTorch models store the learned parameters in an internal state dictionary, called `state_dict`.
- A common way to save a model is to serialize this internal state dictionary (containing the model parameters) via the `torch.save` method:

```python
model = models.vgg16(pretrained=True)
torch.save(model.state_dict(), 'model_weights.pth')
```
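- Since `state_dict` is just a Python dictionary mapping parameter (and buffer) names to tensors, you can inspect what was saved directly; a quick sketch:
```python
# Print the name and shape of every tensor stored in the state_dict.
for name, tensor in model.state_dict().items():
    print(name, tensor.shape)
```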
- To load model weights, you need to create an instance of the same model first, and then load the parameters using the `load_state_dict()` method.
```python
# we do not specify pretrained=True, i.e. we do not load default weights
model = models.vgg16()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()
```
- Be sure to call the `model.eval()` method before inferencing to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.
# 2、Saving and Loading Models with Shapes

- When loading model weights, we needed to instantiate the model class first, because the class defines the structure of a network.
- We might want to save the structure of this class together with the model, in which case we can pass `model` (and not `model.state_dict()`) to the saving function:
```python
torch.save(model, 'model.pth')
```
- We can then load the model like this:
```python
model = torch.load('model.pth')
```
- This approach uses the Python pickle module when serializing the model, thus it relies on the actual class definition being available when loading the model.
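- Concretely, loading the pickled model only works if the class definition is importable in the session doing the loading; a minimal sketch, reusing the torchvision VGG saved above:
```python
import torch
import torchvision.models as models  # makes the VGG class definition available for unpickling

model = torch.load('model.pth')  # restores both the network structure and its weights
model.eval()                     # remember to switch to evaluation mode before inference
```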

# 3、Saving and Loading a General Checkpoint (for Inference or Resuming Training)

- Saving and loading a general checkpoint model for inference or resuming training can be helpful for picking up where you last left off.
- When saving a general checkpoint, you must save more than just the model’s state_dict. It is important to also save the optimizer’s state_dict, as this contains buffers and parameters that are updated as the model trains. Other items that you may want to save are the epoch you left off on, the latest recorded training loss, external `torch.nn.Embedding` layers, and more, based on your own algorithm.
  - To save multiple checkpoints, you must organize them in a dictionary and use `torch.save()` to serialize the dictionary. A common PyTorch convention is to save these checkpoints using the `.tar` file extension.
  - To load the items, first initialize the model and optimizer, then load the dictionary locally using `torch.load()`. From here, you can easily access the saved items by simply querying the dictionary as you would expect.
- The steps to save and load a general checkpoint are shown below.

## 1)Import all necessary libraries for loading our data

```python
import torch
import torch.nn as nn
import torch.optim as optim
```

## 2)Define and initialize the neural network

- For the sake of example, we will create a neural network for training images.
```python
import torch.nn.functional as F  # needed for F.relu in forward()

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
```

## 3)Initialize the optimizer

- We will use SGD with momentum.
```python
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
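- The optimizer has its own `state_dict` too; a quick look at what it contains (and hence what the checkpoint below preserves):
```python
# 'param_groups' holds hyperparameters (lr, momentum, ...); 'state' holds the
# per-parameter buffers (e.g. momentum terms) accumulated during training.
print(optimizer.state_dict().keys())
print(optimizer.state_dict()['param_groups'])
```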

## 4)Save the general checkpoint

- Collect all relevant information and build your dictionary.
```python
# Additional information
EPOCH = 5
PATH = "model.pt"
LOSS = 0.4

torch.save({
    'epoch': EPOCH,
    'model_state_dict': net.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': LOSS,
}, PATH)
```
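- As a usage illustration, a minimal sketch of saving such a checkpoint at the end of every epoch; `train_one_epoch` is a hypothetical helper that runs one epoch and returns its loss:
```python
for epoch in range(EPOCH):
    loss = train_one_epoch(net, optimizer)  # hypothetical training step
    torch.save({
        'epoch': epoch,
        'model_state_dict': net.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }, f"checkpoint_{epoch}.tar")  # .tar is the common convention for checkpoints
```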

## 5)Load the general checkpoint

- Remember to first initialize the model and optimizer, then load the dictionary locally.
- You must call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
- If you wish to resume training, call `model.train()` to ensure these layers are in training mode.
```python
model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()
```
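- To actually pick up where you left off, a minimal sketch (again with the hypothetical `train_one_epoch` helper) of resuming from the epoch recorded in the checkpoint:
```python
start_epoch = checkpoint['epoch'] + 1  # continue from the next epoch
model.train()
for epoch in range(start_epoch, start_epoch + 5):  # e.g. train 5 more epochs
    loss = train_one_epoch(model, optimizer)       # hypothetical training step
```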

# 4、Saving and Loading Models across Devices

- There may be instances where you want to save and load your neural networks across different devices. Saving and loading models across devices is relatively straightforward using PyTorch.
- In this recipe, we will experiment with saving and loading models across CPUs and GPUs. The steps are shown below.

## 1)Import all necessary libraries for loading our data

```python
import torch
import torch.nn as nn
import torch.optim as optim
```

## 2)Define and initialize the neural network

- For the sake of example, we will create a neural network for training images.
```python
import torch.nn.functional as F  # needed for F.relu in forward()

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
```

## 3)Save and Load on CPU/GPU
### (1)Save on a GPU, load on a CPU

- When loading a model on a CPU that was trained with a GPU, pass `torch.device('cpu')` to the `map_location` argument in the `torch.load()` function.
```python
# Specify a path to save to
PATH = "model.pt"

# Save
torch.save(net.state_dict(), PATH)

# Load
device = torch.device('cpu')
model = Net()
# the storages underlying the tensors are dynamically remapped to the CPU
# device using the map_location argument
model.load_state_dict(torch.load(PATH, map_location=device))
```
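- A common device-agnostic variant of the same idea: pick the device at runtime and let `map_location` follow it (a sketch, assuming the checkpoint at `PATH` was saved as above):
```python
# Load onto the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net()
model.load_state_dict(torch.load(PATH, map_location=device))
model.to(device)
```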

### (2)Save on a GPU, load on a GPU

- When loading a model on a GPU that was trained and saved on a GPU, simply convert the initialized model to a CUDA optimized model using `model.to(torch.device('cuda'))`.
- Be sure to use the `.to(torch.device('cuda'))` function on all model inputs to prepare the data for the model.
- Note that calling `my_tensor.to(device)` returns a new copy of `my_tensor` on GPU. It does NOT overwrite `my_tensor`. Therefore, remember to manually overwrite tensors: `my_tensor = my_tensor.to(torch.device('cuda'))`.
```python
# Save
torch.save(net.state_dict(), PATH)

# Load
device = torch.device("cuda")
model = Net()
model.load_state_dict(torch.load(PATH))
model.to(device)
```
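- To make the input-handling point concrete, a minimal inference sketch with a hypothetical dummy batch standing in for real images:
```python
x = torch.randn(1, 3, 32, 32)  # dummy input matching Net's expected shape
x = x.to(device)               # .to() returns a copy, so reassign the result
with torch.no_grad():
    output = model(x)
```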

### (3)Save on a CPU, load on a GPU

- When loading a model on a GPU that was trained and saved on CPU, set the `map_location` argument in the `torch.load()` function to `cuda:device_id`. This loads the model to a given GPU device.
- Be sure to call `model.to(torch.device('cuda'))` to convert the model’s parameter tensors to CUDA tensors.
- Finally, also be sure to use the `.to(torch.device('cuda'))` function on all model inputs to prepare the data for the CUDA optimized model.
```python
# Save
torch.save(net.state_dict(), PATH)

# Load
device = torch.device("cuda")
model = Net()
# Choose whatever GPU device number you want
model.load_state_dict(torch.load(PATH, map_location="cuda:0"))
# Make sure to call input = input.to(device) on any input tensors that you feed to the model
model.to(device)
```

## 4)Saving and loading DataParallel models

- `torch.nn.DataParallel` is a model wrapper that enables parallel GPU utilization.
- To save a `DataParallel` model generically, save the `model.module.state_dict()`. This way, you have the flexibility to load the model any way you want to any device you want.
```python
# Save the underlying module's parameters, not the DataParallel wrapper's.
torch.save(net.module.state_dict(), PATH)
```
- Load to whatever device you want.
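- For instance, a sketch of loading those weights into a plain, unwrapped model on whichever device is available; because we saved `net.module.state_dict()`, the keys carry no `module.` prefix:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net()
model.load_state_dict(torch.load(PATH, map_location=device))
model.to(device)
```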

# 5、Saving and Loading Multiple Models in one File

- Saving and loading multiple models can be helpful for reusing models that you have previously trained.
- When saving a model comprised of multiple `torch.nn.Modules`, such as a GAN, a sequence-to-sequence model, or an ensemble of models, you must save a dictionary of each model’s state_dict and corresponding optimizer. You can also save any other items that may aid you in resuming training by simply appending them to the dictionary. 
- To load the models, first initialize the models and optimizers, then load the dictionary locally using `torch.load()`. From here, you can easily access the saved items by simply querying the dictionary as you would expect. 
- In this recipe, we will demonstrate how to save multiple models to one file. The steps are shown below.
## 1)Import all necessary libraries for loading our data
```python
import torch
import torch.nn as nn
import torch.optim as optim

```

## 2)Define and initialize the neural network

- For the sake of example, we will create a neural network for training images.
```python
import torch.nn.functional as F  # needed for F.relu in forward()

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

netA = Net()
netB = Net()
```

## 3)Initialize the optimizer

- We will use SGD with momentum to build an optimizer for each model we created.
```python
optimizerA = optim.SGD(netA.parameters(), lr=0.001, momentum=0.9)
optimizerB = optim.SGD(netB.parameters(), lr=0.001, momentum=0.9)
```

## 4)Save multiple models

- Collect all relevant information and build your dictionary.
```python
# Specify a path to save to
PATH = "model.pt"

torch.save({
    'modelA_state_dict': netA.state_dict(),
    'modelB_state_dict': netB.state_dict(),
    'optimizerA_state_dict': optimizerA.state_dict(),
    'optimizerB_state_dict': optimizerB.state_dict(),
}, PATH)
```

## 5)Load multiple models

- Remember to first initialize the models and optimizers, then load the dictionary locally.
- You must call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
- If you wish to resume training, call `model.train()` to ensure these layers are in training mode.
```python
modelA = Net()
modelB = Net()
optimizerA = optim.SGD(modelA.parameters(), lr=0.001, momentum=0.9)
optimizerB = optim.SGD(modelB.parameters(), lr=0.001, momentum=0.9)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()
modelB.eval()
# - or -
modelA.train()
modelB.train()
```
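- As a quick usage sketch (with a hypothetical dummy batch), the two restored models can then be combined, e.g. averaged as a small ensemble:
```python
x = torch.randn(1, 3, 32, 32)  # dummy input matching Net's expected shape
with torch.no_grad():          # assumes the models were set to eval mode above
    logits = (modelA(x) + modelB(x)) / 2
pred = logits.argmax(dim=1)
```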