18.train

The training process can be broken into the following steps:

Download the dataset (Dataset)

Load the dataset (DataLoader)

Build the neural network

Define the loss function and the optimizer (with its parameters)

Create a TensorBoard writer

Set the training parameters

Start training: [train], [validate], save the trained model

Each training step consists of: [iterate over batches], [forward pass], [compute the loss], [zero the gradients], [backpropagation], [update the parameters].
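
The order of these operations can be sketched on a tiny stand-in model (an `nn.Linear` with made-up data, not the real network from this chapter):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)                     # stand-in for the real network
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

imgs = torch.randn(4, 10)                    # fake batch of 4 samples
targets = torch.tensor([0, 1, 0, 1])         # fake labels

output = model(imgs)                         # forward pass
loss = loss_fn(output, targets)              # compute the loss
optimizer.zero_grad()                        # zero the gradients
loss.backward()                              # backpropagation
optimizer.step()                             # update the parameters
print(loss.item())
```

Zeroing the gradients before `backward()` matters because PyTorch accumulates gradients across calls by default.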

During validation, we also need to compute the accuracy.

```python
for data in test_dataloader:
    imgs, targets = data
    output = demo(imgs)
    loss = loss_fn(output, targets)
    total_test_loss = total_test_loss + loss.item()
    pre = output.argmax(1)
    acc = (pre == targets).sum()
    total_acc = total_acc + acc
print("total_test_loss = {}".format(total_test_loss))
print("total_acc = {}".format(total_acc))
writer.add_scalar("test_loss", total_test_loss, i)
writer.add_scalar("total_acc", total_acc, i)
```

Each iteration of the for loop processes one batch of 64 images.
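
The batch size of 64 comes from the DataLoader. A quick way to see the resulting tensor shapes (using a random fake dataset of CIFAR10-sized images as a stand-in, to avoid the download):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# fake dataset standing in for CIFAR10: 200 images of shape (3, 32, 32)
data = TensorDataset(torch.randn(200, 3, 32, 32),
                     torch.randint(0, 10, (200,)))
loader = DataLoader(data, batch_size=64)

imgs, targets = next(iter(loader))
print(imgs.shape)     # torch.Size([64, 3, 32, 32])
print(targets.shape)  # torch.Size([64])
```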

`output` has shape (64, 10): one row of 10 class scores per image, where position k corresponds to class k (classes 0-9):

```
output = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.1],
          ...]   # 64 rows in total, one per image
```

`targets` has shape (64,): one class label per image:

```
targets = [1, ..., 3]   # 64 labels in total
```

Use argmax to turn `output` into the same shape as `targets` so the two can be compared element-wise:

```
output.argmax(1) = [9, ..., 9]   # 64 predicted classes in total
```

(output.argmax(1) == targets) = (F, F, T, …, F, T) = (0, 0, 1, …, 0, 1): True where the prediction equals the label, False where it does not; 64 elements in total.

(output.argmax(1) == targets).sum() adds up the elements of (0, 0, 1, …, 0, 1), i.e. the number of correct predictions in the current batch. This count is accumulated on each iteration of the for loop; once the whole test set has been processed, we can compute the accuracy for the current epoch.
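
The accumulation can be seen on a toy example with two made-up batches of (predicted class, true label):

```python
import torch

# two fake batches: (predictions, labels)
batches = [
    (torch.tensor([1, 0, 2]), torch.tensor([1, 1, 2])),  # 2 correct
    (torch.tensor([0, 0, 3]), torch.tensor([0, 2, 3])),  # 2 correct
]

total_acc = 0
total = 0
for pre, targets in batches:
    total_acc = total_acc + (pre == targets).sum()  # correct in this batch
    total += len(targets)

accuracy = total_acc / total
print(total_acc)  # tensor(4)
print(accuracy)   # tensor(0.6667)
```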

For the 2x2 tensor in the example below, `argmax(1)` takes the maximum along each row and `argmax(0)` along each column; here both happen to give the same result:

argmax(1) >> tensor([1, 1])

argmax(0) >> tensor([1, 1])

```python
# argmax and accuracy
import torch

outputs = torch.tensor([[0.1, 0.2],
                        [0.3, 0.4]])
print(outputs.argmax(1))       # tensor([1, 1])
pre = outputs.argmax(1)
targets = torch.tensor([0, 1])
print(pre == targets)          # tensor([False,  True])
print((pre == targets).sum())  # tensor(1)
# when the batch contains 2 samples:
# accuracy = (pre == targets).sum() / 2
```

code:

```python
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from models import *

# Prepare the datasets
train_data = torchvision.datasets.CIFAR10(root="./dataset", train=True,
                                          transform=torchvision.transforms.ToTensor(),
                                          download=True)
test_data = torchvision.datasets.CIFAR10(root="./dataset", train=False,
                                         transform=torchvision.transforms.ToTensor(),
                                         download=True)
train_data_size = len(train_data)
test_data_size = len(test_data)
print("Training set size: {}".format(train_data_size))
print("Test set size: {}".format(test_data_size))

# Load the datasets
train_dataloader = DataLoader(train_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

# Build the neural network
demo = DEMO()

# Define the loss function
loss_fn = nn.CrossEntropyLoss()

# Define the optimizer
# learning_rate = 1e-2 = 1 * 10^(-2)
learning_rate = 0.01
optimizer = torch.optim.SGD(demo.parameters(), lr=learning_rate)

# TensorBoard
writer = SummaryWriter("./train_logs")

# Training
# Set some parameters
total_train_step = 0
total_test_step = 0
epoch = 5

for i in range(epoch):
    print("--------------epoch:{}----------------".format(i + 1))
    # Train
    demo.train()  # required when the model contains layers such as Dropout/BatchNorm;
                  # harmless otherwise, so calling it (and demo.eval()) is common practice
    for data in train_dataloader:
        imgs, targets = data
        output = demo(imgs)
        loss = loss_fn(output, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_train_step += 1
        if total_train_step % 100 == 0:
            print("train step: {}, train_loss = {}".format(total_train_step, loss.item()))
            writer.add_scalar("train_loss", loss.item(), total_train_step)

    # Validate
    demo.eval()
    total_test_loss = 0
    total_acc = 0
    # no gradients are tracked here; parameters are not updated
    with torch.no_grad():
        for data in test_dataloader:
            imgs, targets = data
            output = demo(imgs)
            loss = loss_fn(output, targets)
            total_test_loss = total_test_loss + loss.item()
            pre = output.argmax(1)
            acc = (pre == targets).sum()
            total_acc = total_acc + acc
    accuracy = total_acc / test_data_size
    print("total_test_loss = {}".format(total_test_loss))
    print("accuracy = {}".format(accuracy))
    writer.add_scalar("test_loss", total_test_loss, i + 1)
    writer.add_scalar("accuracy", accuracy, i + 1)
    torch.save(demo, "pretrained_demo_{}.pth".format(i + 1))
    print("pretrained_demo saved")

writer.close()
```
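
`torch.save(demo, ...)` above pickles the whole model object, which only loads back if the model class is importable. A sketch of both loading styles, using an `nn.Linear` as a stand-in for `DEMO` (the state_dict variant is often preferred for portability):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # stand-in for DEMO()

# Option 1: save/load the whole model object
torch.save(model, "demo_full.pth")
# weights_only=False is needed on recent PyTorch versions to unpickle full objects
loaded = torch.load("demo_full.pth", weights_only=False)

# Option 2: save only the parameters, then load them into a fresh model
torch.save(model.state_dict(), "demo_state.pth")
model2 = nn.Linear(10, 2)
model2.load_state_dict(torch.load("demo_state.pth"))
```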