Torch

Data processing

linspace()

torch.linspace(start, end, steps=100, dtype=None)
Returns a one-dimensional tensor of `steps` points evenly spaced between `start` and `end`; `dtype` sets the data type of the result.

```python
import torch

print(torch.linspace(-1, 1, 10))
# tensor([-1.0000, -0.7778, -0.5556, -0.3333, -0.1111,
#          0.1111,  0.3333,  0.5556,  0.7778,  1.0000])
```
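
The example above uses the default dtype; a minimal sketch of passing `dtype` explicitly (the values here are illustrative):

```python
import torch

# Request float64 instead of the default floating-point dtype.
print(torch.linspace(0, 1, 5, dtype=torch.float64))
# tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000], dtype=torch.float64)
```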

unsqueeze()

Inserts a dimension of size one at the specified position.

```python
import torch

a = torch.arange(0, 6)  # a is a 1-D tensor
b = a.reshape(2, 3)     # b is a 2-D tensor
c = b.unsqueeze(1)      # c is a 3-D tensor: a new dimension is inserted at position 1 of b
print(a, b, c, c.size())
# tensor([0, 1, 2, 3, 4, 5])
# tensor([[0, 1, 2], [3, 4, 5]])
# tensor([[[0, 1, 2]], [[3, 4, 5]]])
# torch.Size([2, 1, 3])
```
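
The position may also be negative, counting from the end; a quick sketch:

```python
import torch

b = torch.arange(6).reshape(2, 3)
print(b.unsqueeze(-1).size())  # torch.Size([2, 3, 1]): dimension appended at the end
print(b.unsqueeze(0).size())   # torch.Size([1, 2, 3]): dimension prepended
```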

squeeze()

Removes dimensions of size 1.

```python
import torch

a = torch.arange(0, 6)  # a is a 1-D tensor
b = a.reshape(2, 3)
c = b.unsqueeze(1)
c = c.unsqueeze(1)      # c now has size [2, 1, 1, 3]
print(c, c.size())
d = c.squeeze(1)        # removes the size-1 dimension at position 1
print(d, d.size())
# tensor([[[[0, 1, 2]]],
#         [[[3, 4, 5]]]]) torch.Size([2, 1, 1, 3])
# tensor([[[0, 1, 2]],
#         [[3, 4, 5]]]) torch.Size([2, 1, 3])
```
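
Two related behaviors worth noting, in a minimal sketch (the shape is illustrative): squeeze() with no argument removes every size-1 dimension, and squeezing a dimension whose size is not 1 leaves the tensor unchanged.

```python
import torch

c = torch.zeros(2, 1, 1, 3)
print(c.squeeze().size())   # torch.Size([2, 3]): all size-1 dimensions removed
print(c.squeeze(0).size())  # torch.Size([2, 1, 1, 3]): dim 0 has size 2, so it is a no-op
```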

torch.utils.data

torch.utils.data.TensorDataset(*tensors)

Wraps tensors as a dataset; each sample is retrieved by indexing the tensors along the first dimension.
*tensors (Tensor) - tensors that all have the same size in their first dimension.

```python
import torch
from torch.utils.data import TensorDataset

inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
tgts = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
dataset = TensorDataset(inps, tgts)
print(dataset.tensors)
```
```
(tensor([[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.],
         [10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.],
         [20., 21., 22., 23., 24.],
         [25., 26., 27., 28., 29.],
         [30., 31., 32., 33., 34.],
         [35., 36., 37., 38., 39.],
         [40., 41., 42., 43., 44.],
         [45., 46., 47., 48., 49.]]),
 tensor([[ 0.,  1.,  2.,  3.,  4.],
         [ 5.,  6.,  7.,  8.,  9.],
         [10., 11., 12., 13., 14.],
         [15., 16., 17., 18., 19.],
         [20., 21., 22., 23., 24.],
         [25., 26., 27., 28., 29.],
         [30., 31., 32., 33., 34.],
         [35., 36., 37., 38., 39.],
         [40., 41., 42., 43., 44.],
         [45., 46., 47., 48., 49.]]))
```
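
Since retrieval indexes along the first dimension, `dataset[i]` returns a tuple holding the i-th row of each wrapped tensor; a quick check, rebuilding the same dataset:

```python
import torch
from torch.utils.data import TensorDataset

dataset = TensorDataset(torch.arange(50.).view(10, 5),
                        torch.arange(50.).view(10, 5))
print(dataset[2])
# (tensor([10., 11., 12., 13., 14.]), tensor([10., 11., 12., 13., 14.]))
print(len(dataset))  # 10, one sample per row
```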

torch.utils.data.DataLoader

https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
Combines a dataset and a sampler, and provides an iterable over the given dataset.
The DataLoader signature is:

```python
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2,
           persistent_workers=False)
```

Main parameters:

  • dataset: the dataset to load from;
  • batch_size: how many samples per batch;
  • shuffle: whether to shuffle the data;
  • sampler: the strategy used to draw samples;
  • num_workers: number of subprocesses used for loading; 0 means everything is loaded in the main process;
  • collate_fn: how to merge a list of samples into one batch; the default collation is usually sufficient;
  • pin_memory: whether to keep the data in pinned (page-locked) memory; transfers from pinned memory to the GPU are faster;
  • drop_last: the number of samples in dataset may not be an integer multiple of batch_size; with drop_last=True the leftover samples that do not fill a full batch are dropped (see the sketch after the output below).

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
tgts = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
dataset = TensorDataset(inps, tgts)

loader = DataLoader(dataset, batch_size=3, pin_memory=True)

for batch_ndx, sample in enumerate(loader):
    print(batch_ndx, sample)
```
```
0 [tensor([[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.]]), tensor([[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.]])]
1 [tensor([[15., 16., 17., 18., 19.],
        [20., 21., 22., 23., 24.],
        [25., 26., 27., 28., 29.]]), tensor([[15., 16., 17., 18., 19.],
        [20., 21., 22., 23., 24.],
        [25., 26., 27., 28., 29.]])]
2 [tensor([[30., 31., 32., 33., 34.],
        [35., 36., 37., 38., 39.],
        [40., 41., 42., 43., 44.]]), tensor([[30., 31., 32., 33., 34.],
        [35., 36., 37., 38., 39.],
        [40., 41., 42., 43., 44.]])]
3 [tensor([[45., 46., 47., 48., 49.]]), tensor([[45., 46., 47., 48., 49.]])]
```
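
To illustrate shuffle and drop_last from the parameter list above, a minimal sketch (the seed is arbitrary and only makes the shuffle reproducible):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5)
dataset = TensorDataset(inps, inps)

# 10 samples with batch_size=3 leave one leftover sample;
# drop_last=True discards that incomplete final batch.
loader = DataLoader(dataset, batch_size=3, shuffle=True, drop_last=True)
print(sum(1 for _ in loader))  # 3 batches instead of 4
```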

torch.nn

```python
class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None,
                         max_norm=None, norm_type=2, scale_grad_by_freq=False,
                         sparse=False)
```

A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them by index. The input to the module is a list of indices, and the output is the corresponding word embeddings.

  • num_embeddings (int) - size of the embedding dictionary
  • embedding_dim (int) - size of each embedding vector
  • padding_idx (int, optional) - if given, the output is padded with zeros whenever this index is encountered
  • max_norm (float, optional) - if given, embeddings are re-normalized so that their norm is less than this value
  • norm_type (float, optional) - the p of the p-norm used when computing norms for the max_norm option
  • scale_grad_by_freq (boolean, optional) - if given, gradients are scaled by the inverse of the frequency of the words in the mini-batch

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)  # dictionary of 10 entries, 3-dimensional vectors
input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
x = embedding(input)
```
```
input =
tensor([[1, 2, 4, 5],
        [4, 3, 2, 9]])
x =
tensor([[[-0.3706, -0.4984, -1.4760],
         [ 3.1967,  0.2012, -0.2333],
         [-1.1364, -0.9656, -0.7985],
         [ 0.0352, -0.5436,  0.9799]],

        [[-1.1364, -0.9656, -0.7985],
         [ 1.1055,  1.1854, -1.0513],
         [ 3.1967,  0.2012, -0.2333],
         [-1.4650,  0.7708,  0.7526]]], grad_fn=<EmbeddingBackward>)
x.shape =
torch.Size([2, 4, 3])
```
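
padding_idx is listed above but not demonstrated; a minimal sketch (using index 0 as the padding index is an arbitrary choice):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3, padding_idx=0)
out = embedding(torch.LongTensor([[0, 2, 0, 5]]))
print(out[0, 0])  # tensor([0., 0., 0.], ...): the padding index always maps to zeros
```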

Training

nn.BCELoss

torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
Creates a criterion that measures the Binary Cross Entropy between the target and the input probabilities.
The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n) \right]$$

where N is the batch size. If reduction is not 'none' (default 'mean'), then

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$

This is used for measuring the error of a reconstruction, for example in an auto-encoder. Note that the targets y should be numbers between 0 and 1.
```python
import torch
import torch.nn as nn

m = nn.Sigmoid()                            # BCELoss expects probabilities in [0, 1]
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)          # random 0/1 targets
output = loss(m(input), target)
output.backward()
```
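
To tie the code back to the formula above, a minimal sketch that recomputes the mean-reduced loss by hand (the probabilities and targets are illustrative; weight is None, so w_n = 1):

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.8, 0.3, 0.6])  # predicted probabilities x_n
y = torch.tensor([1.0, 0.0, 1.0])  # binary targets y_n

# l_n = -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)], then the mean over the batch
manual = -(y * p.log() + (1 - y) * (1 - p).log()).mean()
print(torch.allclose(manual, F.binary_cross_entropy(p, y)))  # True
```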