## Behavior comparison

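- `torch.Tensor.detach()` / `torch.Tensor.detach_()`
  - `detach()` returns a new tensor; `detach_()` detaches the tensor itself in place.
  - The result has the same `shape`, `dtype`, and `device` as the input.
  - It shares memory with the input. An in-place modification of either the original or the returned tensor changes both, and may trigger an error during backward if the overwritten values are needed to compute gradients.
  - The result carries no gradient; it is detached from the current computation graph.
  - `requires_grad` is automatically set to `False`. The result is no longer associated with the input's graph and needs no gradient of its own.
  - If `requires_grad` is manually set to `True`, the tensor accumulates its own `.grad`, but no gradient flows back to the input. The new tensor acts as a leaf node, and backpropagation stops when it reaches that node.
- `torch.Tensor.data`
  - Returns a new tensor.
  - The result has the same `shape`, `dtype`, and `device` as the input.
  - It shares memory with the input. An in-place modification of either the original or the returned tensor changes both. No error is raised, but the gradients of related nodes may silently be wrong.
  - The result carries no gradient; it is detached from the current computation graph.
  - `requires_grad` is automatically set to `False`. The result is no longer associated with the input's graph and needs no gradient of its own.
  - If `requires_grad` is manually set to `True`, the tensor accumulates its own `.grad`, but no gradient flows back to the input. The new tensor acts as a leaf node, and backpropagation stops when it reaches that node.
- `torch.Tensor.clone()` / `torch.clone()`
  - Returns a new tensor.
  - The result has the same `shape`, `dtype`, and `device` as the input.
  - It does not share memory with the input.
  - The function is differentiable: gradients flow back to the input, much like an identity operation.
- `torch.Tensor.copy_()`
  - Copies the elements of the source tensor into the tensor the method is called on.
  - The two tensors may differ in shape (the source must be broadcastable to the destination), `dtype`, and `device`.
  - The destination does not share memory with the source.
  - The method is differentiable: gradients flow back to the source, much like an identity operation.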
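
The difference between `detach()` and `.data` under in-place modification can be seen in a small sketch. Using `sigmoid()` here is an assumption made for illustration: its backward pass reuses the forward output, so overwriting that output matters. All variable names are hypothetical.

```python
import torch

# Case 1: detach(). The in-place change is tracked by autograd,
# so backward refuses to use the overwritten values.
a = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.sigmoid()            # sigmoid's backward needs `out`
out.detach().zero_()         # also zeroes `out` (shared memory)
detach_raised = False
try:
    out.sum().backward()
except RuntimeError:
    detach_raised = True
print("detach(): backward raised RuntimeError ->", detach_raised)

# Case 2: .data. The same change goes unnoticed by autograd,
# so backward runs and computes gradients from the zeroed output.
a2 = torch.tensor([1., 2., 3.], requires_grad=True)
out2 = a2.sigmoid()
out2.data.zero_()
out2.sum().backward()
print(".data: a2.grad =", a2.grad)  # all zeros instead of sigmoid'(a2)
```

In the second case the gradient is `out2 * (1 - out2)` evaluated on the zeroed output, which is why it comes out as zeros without any warning.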

## Use cases

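`clone()` and `copy_()` copy the source tensor's data into new memory, but the copy stays attached to the source's computation graph. `detach()` and `.data` cut the result off from the graph, but it still shares memory with the source. Combining the two therefore produces a new tensor with the same data that is fully independent. The usual idioms are `b = a.clone().detach()` and `b = a.detach().clone()`. The latter is slightly more efficient, because the graph is dropped first instead of being recorded for the copy and then discarded. For a comparison, see https://stackoverflow.com/a/62496418.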
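
As a minimal sketch (variable names are hypothetical), both idioms give a tensor that has its own memory, no graph history, and can be modified in place without touching the source:

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)

b1 = a.clone().detach()   # copy first, then drop the graph
b2 = a.detach().clone()   # drop the graph first, then copy

for b in (b1, b2):
    assert b.data_ptr() != a.data_ptr()   # separate memory
    assert not b.requires_grad            # no gradient tracking
    assert b.grad_fn is None and b.is_leaf

b2.add_(10)               # in-place edit of the copy
print(a)                  # the source still holds 1., 2., 3.
print(b2)
```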
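All trainable parameters of a deep learning model are leaf nodes, and PyTorch does not allow in-place operations on a leaf tensor with `requires_grad=True` while autograd is recording. To initialize such parameters, operate on `.data`, which shares their memory. For example, `.data.zero_()` zeroes a parameter in place.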
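
The sketch below illustrates this with a plain tensor `w` standing in for a trainable parameter (a hypothetical name). The `torch.no_grad()` variant at the end is not part of the note above; it is shown only as another way to perform the same kind of in-place update.

```python
import torch

w = torch.ones(3, requires_grad=True)   # stands in for a trainable parameter

# A direct in-place op on a leaf that requires grad is rejected.
direct_failed = False
try:
    w.zero_()
except RuntimeError:
    direct_failed = True
print("direct zero_() failed:", direct_failed)

# Going through .data works, and w stays a leaf that requires grad.
w.data.zero_()
zeroed = w.detach().clone()             # snapshot of the zeroed values
assert w.is_leaf and w.requires_grad

# Assumption beyond the note: the same kind of update under torch.no_grad().
with torch.no_grad():
    w.fill_(0.5)
print(w)
```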
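MoCo needs to initialize the parameters of one network from those of another. Because `copy_()` is also an in-place operation, the copy goes through `.data`, as `A.data.copy_(B.data)`.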
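
The following is only a sketch of that pattern, not MoCo's actual code: two small `nn.Linear` layers stand in for the two networks, and the names `net_a` and `net_b` are hypothetical.

```python
import torch
import torch.nn as nn

net_b = nn.Linear(4, 2)   # source network (stand-in)
net_a = nn.Linear(4, 2)   # network to be initialized (stand-in)

# Copy every parameter of net_b into the matching parameter of net_a.
# .data bypasses the in-place restriction on leaves that require grad.
for param_a, param_b in zip(net_a.parameters(), net_b.parameters()):
    param_a.data.copy_(param_b.data)

same_values = all(
    torch.equal(pa, pb)
    for pa, pb in zip(net_a.parameters(), net_b.parameters())
)
separate_memory = all(
    pa.data_ptr() != pb.data_ptr()
    for pa, pb in zip(net_a.parameters(), net_b.parameters())
)
print(same_values, separate_memory)
```

After the loop the two networks hold equal values in separate memory, so later updates to one do not affect the other.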
## Code examples

### detach
```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.detach()

assert a.data_ptr() == b.data_ptr()  # same memory location

# Both the user-created tensor and the detached tensor are leaf nodes
print("a", a, a.requires_grad, a.grad, a.grad_fn, a.is_leaf)
print("b", b, b.requires_grad, b.grad, b.grad_fn, b.is_leaf)
# a tensor([1., 2., 3.], requires_grad=True) True None None True
# b tensor([1., 2., 3.]) False None None True

c = a * 2
c.sum().backward()
print("a.grad", a.grad)  # tensor([2., 2., 2.])
print("b.grad", b.grad)  # None

d = b * 4 + a
print("d = b * 4 + a")
print("d.requires_grad", d.requires_grad)
d.sum().backward()
print("a.grad", a.grad)  # tensor([3., 3., 3.]) a's gradient is accumulated, 2 -> 3
print("b.grad", b.grad)  # None

b.requires_grad = True
d = b * 4
print("d = b * 4")
print("d.requires_grad", d.requires_grad)
d.sum().backward()
print("a.grad", a.grad)  # tensor([3., 3., 3.]) unchanged: b's gradient does not flow back to a
print("b.grad", b.grad)  # tensor([4., 4., 4.])

b.requires_grad = False
d = b * 4
print("d.requires_grad", d.requires_grad)
try:
    # Fails: b is not tracked in any graph, so there is nothing to differentiate
    d.sum().backward()
except RuntimeError as err:
    # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
    print("RuntimeError:", err)
```
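
The example above uses only `detach()`. As a complementary sketch with hypothetical variable names, `detach_()` performs the same separation in place on an existing non-leaf tensor instead of returning a new one:

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a * 2                    # non-leaf tensor with a grad_fn
assert b.grad_fn is not None and not b.is_leaf

b.detach_()                  # detaches b itself; no new tensor is created
assert b.grad_fn is None and b.is_leaf and not b.requires_grad

c = b * 3 + a                # only the direct `+ a` path still reaches a
c.sum().backward()
print(a.grad)                # tensor([1., 1., 1.])
```

The gradient of `a` is 1 rather than 7 because the `b * 3` branch no longer leads back to `a`.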

<a name="yeM85"></a>
### clone
```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.clone()

assert a.data_ptr() != b.data_ptr()  # different memory locations
print("a", a, a.requires_grad, a.grad, a.grad_fn, a.is_leaf)  # a user-created tensor is a leaf node
print("b", b, b.requires_grad, b.grad, b.grad_fn, b.is_leaf)  # b is an intermediate node computed from a user-created tensor, so it is not a leaf
# a tensor([1., 2., 3.], requires_grad=True) True None None True
# UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.
# Its .grad attribute won't be populated during autograd.backward().
# If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor.
# If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.
# b tensor([1., 2., 3.], grad_fn=<CloneBackward0>) True None <CloneBackward0 object at 0x000001916E8BA730> False

c = a * 2
c.sum().backward()
print(a.grad)  # tensor([2., 2., 2.])
print(b.grad)  # None

d = b * 3
d.sum().backward()
print(a.grad)  # tensor([5., 5., 5.]) the original input's gradient is accumulated
print(b.grad)  # None: the clone is not a leaf, so its gradient is not kept after backward

b.retain_grad()
d = b * 3
d.sum().backward()
print(a.grad)  # tensor([8., 8., 8.]) the original input's gradient is accumulated
print(b.grad)  # tensor([3., 3., 3.]) retain_grad() keeps the gradient of the non-leaf node
```

### copy_

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = torch.empty_like(a).copy_(a)

assert a.data_ptr() != b.data_ptr()  # different memory locations
print("a", a, a.requires_grad, a.grad, a.grad_fn, a.is_leaf)  # a user-created tensor is a leaf node
print("b", b, b.requires_grad, b.grad, b.grad_fn, b.is_leaf)  # b is an intermediate node computed from a user-created tensor, so it is not a leaf
# a tensor([1., 2., 3.], requires_grad=True) True None None True
# UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.
# Its .grad attribute won't be populated during autograd.backward().
# If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor.
# If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.
# b tensor([1., 2., 3.], grad_fn=<CopyBackwards>) True None <CopyBackwards object at 0x000001F3AA51B730> False

c = a * 2
c.sum().backward()
print(a.grad)  # tensor([2., 2., 2.])
print(b.grad)  # None

d = b * 3
d.sum().backward()
print(a.grad)  # tensor([5., 5., 5.]) the original input's gradient is accumulated
print(b.grad)  # None: the copy is not a leaf, so its gradient is not kept after backward

b.retain_grad()
d = b * 3
d.sum().backward()
print(a.grad)  # tensor([8., 8., 8.]) the original input's gradient is accumulated
print(b.grad)  # tensor([3., 3., 3.]) retain_grad() keeps the gradient of the non-leaf node
```
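
The example above copies between tensors of identical shape and dtype. A further sketch, with hypothetical names `src` and `dst`, shows the case where the source differs in shape and dtype: `copy_()` broadcasts the source to the destination's shape and converts it to the destination's dtype.

```python
import torch

src = torch.tensor([1, 2, 3])   # int64, shape (3,)
dst = torch.zeros(2, 3)         # float32, shape (2, 3)

dst.copy_(src)                  # src is broadcast to (2, 3) and cast to float32

print(dst)
print(dst.dtype)                # torch.float32: the destination keeps its own dtype
assert dst.data_ptr() != src.data_ptr()  # still separate memory
```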
