This article draws on the following references:
- pytorch源码解析:Tensor的生成过程
- PyTorch Internals 5:Autograd的实现
- PyTorch学习笔记(5)——论一个torch.Tensor是如何构建完成的?
- PyTorch Internals 4:一个tensor新建的过程
- PyTorch的Tensor(上)
This article is based on PyTorch 1.7.0 (https://github.com/pytorch/pytorch/tree/v1.7.0). If anything here is unclear or incorrect, please point it out in the comments.
The previous post introduced the Tensor class hierarchy, so how is a Tensor actually constructed?
In this section we will see how a Tensor is created.
This article walks through two ways of initializing a tensor: torch.empty() and torch.tensor().
Note: throughout this series, Tensor denotes the class and tensor denotes an instance. Two kinds of tensor appear in this article: the C++-level tensor, and the tensor that can be used from Python, which corresponds to THPVariable at the C++ level. Whenever THPVariable is mentioned below, read it as the Python-level tensor.
1 torch.empty()
import torch
a = torch.empty(2, 3, requires_grad=True)
print(a)
"""
tensor([[1.2383e+02, 3.0771e-41, 1.2383e+02],
        [3.0771e-41, 1.2383e+02, 3.0771e-41]], requires_grad=True)
"""
Stepping through line 2 of the snippet above with a debugger takes us into THPVariable_empty():
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  ...
}
This function does three things: it parses the arguments with PythonArgParser, creates the C++-level tensor, and wraps that tensor into a THPVariable (the tensor usable from Python).
1.1 Argument parsing
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
static PythonArgParser parser({
"empty(IntArrayRef size, *, DimnameList? names, MemoryFormat? memory_format=None, ScalarType dtype=None, Layout layout=torch.strided, Device device=None, bool pin_memory=False, bool requires_grad=False)",
"empty(IntArrayRef size, *, MemoryFormat? memory_format=None, Tensor out=None, ScalarType dtype=None, Layout layout=torch.strided, Device device=None, bool pin_memory=False, bool requires_grad=False)",
}, /*traceable=*/true);
ParsedArgs<8> parsed_args;
auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
...
}
The arguments passed in (size, dtype, device, and so on) are parsed here. The two signature strings handed to PythonArgParser describe the two accepted calling conventions, parser.parse() returns the parsed result, and _r.idx is a numeric index telling us which signature matched.
The PythonArgParser class can be found in torch/csrc/utils/python_arg_parser.h. Which signature a given call selects can also be observed from Python, as sketched below.
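A minimal Python illustration of the two signatures (the names= variant is the experimental named-tensor overload; which overload a call selects follows from which keyword arguments appear in which signature):

import torch

# A call without names= matches the second signature here
# (consistent with _r.idx == 1 observed in the walkthrough below).
a = torch.empty(2, 3)

# out= only exists in the second signature, so this call also goes through it.
out = torch.empty(2, 3)
b = torch.empty(2, 3, out=out)

# names= only exists in the first signature, so this call selects that overload.
c = torch.empty(2, 3, names=('N', 'C'))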
1.2 Creating the C++-level tensor
In this example _r.idx equals 1, so execution enters case 1:
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
...
auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
...
switch (_r.idx) {
...
case 1: {
if (_r.isNone(2)) {
// aten::empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
const auto options = TensorOptions()
.dtype(_r.scalartype(3))
.device(_r.device(5))
.layout(_r.layoutOptional(4))
.requires_grad(_r.toBool(7))
.pinned_memory(_r.toBool(6));
torch::utils::maybe_initialize_cuda(options);
auto dispatch_empty = [](IntArrayRef size, const TensorOptions & options, c10::optional<MemoryFormat> memory_format) -> Tensor {
pybind11::gil_scoped_release no_gil;
return torch::empty(size, options, memory_format);
};
return wrap(dispatch_empty(_r.intlist(0), options, _r.memoryformatOptional(1)));
}
...
}
}
}
- _r.isNone(2) checks whether memory_format (see torch.memory_format) is None; in this example it is.
- The parsed arguments (dtype, device, layout, requires_grad, pinned_memory) are packed into a TensorOptions object; the same options can be driven from Python as keyword arguments, as shown below.
- dispatch_empty() releases the GIL and ends up calling torch::empty(); the returned C++-level tensor is then wrapped with wrap(), which in turn calls THPVariable_Wrap().
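For reference, here is how those TensorOptions fields look when driven from the Python side (an illustration of the keyword arguments only, not part of the C++ code path):

import torch

# Each keyword argument below ends up in one field of the TensorOptions object:
# dtype -> .dtype(), device -> .device(), requires_grad -> .requires_grad(),
# pin_memory -> .pinned_memory().
a = torch.empty(2, 3,
                dtype=torch.float64,
                device='cpu',
                requires_grad=True,
                pin_memory=False)
print(a.dtype, a.device, a.requires_grad)   # torch.float64 cpu True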
1.2.1 torch::empty()
torch::empty() does the following:
- it enters an at::AutoNonVariableTypeMode non_var_type_mode(true); guard, which keeps the call from being dispatched back into the variable (autograd) layer;
- it calls at::empty(), which returns a C++-level tensor; this tensor is not fully initialized yet, because at::empty() only takes care of memory allocation, leaving the autograd-related work to make_variable();
- it calls make_variable() to finish the autograd-related initialization.
[torch/csrc/autograd/generated/variable_factories.h]
inline at::Tensor empty(at::IntArrayRef size, const at::TensorOptions & options = {}, c10::optional<at::MemoryFormat> memory_format = c10::nullopt) {
  at::Tensor tensor = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    return at::empty(size, at::TensorOptions(options).requires_grad(c10::nullopt), memory_format);
  })();
  at::Tensor result = autograd::make_variable(std::move(tensor), /*requires_grad=*/options.requires_grad());
  return result;
}
1.2.1.1 at::empty()
The dispatch mechanism involved here will be covered in the next article; for now it is enough to know that PyTorch's dispatcher selects the concrete implementation based on information such as dtype and device.
[build/aten/src/ATen/Functions.cpp]
// aten::empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
Tensor empty(IntArrayRef size, const TensorOptions & options, c10::optional<MemoryFormat> memory_format) {
  static auto op = c10::Dispatcher::singleton()
    .findSchemaOrThrow("aten::empty", "memory_format")
    .typed<Tensor (IntArrayRef, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>, c10::optional<MemoryFormat>)>();
  return op.call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt(), c10::impl::check_tensor_options_and_extract_memory_format(options, memory_format));
}
Here it is enough to know that the dispatcher ultimately calls empty_cpu(), which handles the memory-level initialization of the tensor; the detailed call chain is covered in the post 一个empty()方法的分发流程. empty_cpu() looks like this:
[aten/src/ATen/Utils.cpp]
Tensor empty_cpu(
    IntArrayRef size,
    c10::optional<ScalarType> dtype_opt,
    c10::optional<Layout> layout_opt,
    c10::optional<Device> device_opt,
    c10::optional<bool> pin_memory_opt,
    c10::optional<c10::MemoryFormat> memory_format_opt) {
  Device device = device_or_default(device_opt);
  TORCH_CHECK(device.type() == DeviceType::CPU);
  check_size_nonnegative(size);

  bool pin_memory = pinned_memory_or_default(pin_memory_opt);
  c10::Allocator* allocator;
  if (pin_memory) {
    allocator = detail::getCUDAHooks().getPinnedMemoryAllocator();
  } else {
    allocator = at::getCPUAllocator();
  }

  int64_t nelements = prod_intlist(size);
  caffe2::TypeMeta dtype = scalarTypeToTypeMeta(dtype_or_default(dtype_opt));
  int64_t size_bytes = nelements * dtype.itemsize();
  auto storage_impl = c10::make_intrusive<StorageImpl>(
      c10::StorageImpl::use_byte_size_t(),
      size_bytes,
      allocator->allocate(size_bytes),
      allocator,
      /*resizeable=*/true);

  auto tensor = detail::make_tensor<TensorImpl>(
      std::move(storage_impl), at::DispatchKey::CPU, dtype);
  // Default TensorImpl has size [0]
  if (size.size() != 1 || size[0] != 0) {
    tensor.unsafeGetTensorImpl()->set_sizes_contiguous(size);
  }

  if (memory_format_opt.has_value()) {
    // Restriding a just-created empty contiguous tensor does nothing.
    if (*memory_format_opt != MemoryFormat::Contiguous) {
      tensor.unsafeGetTensorImpl()->empty_tensor_restride(*memory_format_opt);
    }
  }

  return tensor;
}
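As a small sanity check from the Python side, the size_bytes computed above (nelements * dtype.itemsize()) is observable through numel() and element_size() (illustration only):

import torch

t = torch.empty(2, 3, dtype=torch.float32)
print(t.numel())                      # 6  -> prod_intlist(size)
print(t.element_size())               # 4  -> dtype.itemsize() for float32
print(t.numel() * t.element_size())   # 24 -> size_bytes handed to the allocator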
1.2.1.2 autograd::make_variable()
[torch/csrc/autograd/variable.h]
/// Creates a `Variable` from the given `Tensor`, copying its underlying `TensorImpl`.
/// `requires_grad` should be
/// set only for leaves, and determines whether the `Variable` will accumulate
/// gradients. NOTE: `data` must *not* be a `Variable` already. Its dynamic
/// type *must* be `Tensor`.
///
/// TODO: Eliminate this function as much as possible, as it can be expressed
/// more clearly as detach() or a no-op in most call sites (especially when
/// there is only one use of the variable).
inline Variable make_variable(
    at::Tensor data,
    bool requires_grad = false,
    bool allow_tensor_metadata_change = true) {
  if (data.defined()) {
    if (data.getIntrusivePtr().use_count() == 1 &&
        data.getIntrusivePtr()->unique_version()) {
      auto data_impl = data.getIntrusivePtr();
      data_impl->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
      if (requires_grad) {
        data_impl->set_autograd_meta(std::make_unique<AutogradMeta>(data_impl.get(), requires_grad));
      } else {
        data_impl->set_autograd_meta(nullptr);
      }
      return Variable(std::move(data_impl));
    } else {
      auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach(
        /*version_counter=*/0,
        /*allow_tensor_metadata_change=*/allow_tensor_metadata_change);
      if (requires_grad) {
        data_impl_copy->set_autograd_meta(std::make_unique<AutogradMeta>(
          data_impl_copy.get(), requires_grad));
      } else {
        data_impl_copy->set_autograd_meta(nullptr);
      }
      return Variable(data_impl_copy);
    }
  }
  return Variable();
}
- set_allow_tensor_metadata_change() sets allow_tensor_metadata_change on the tensor's TensorImpl to true.
- In this example requires_grad is true, so data_impl->set_autograd_meta() is executed, setting the TensorImpl's autograd_meta_ to an AutogradMeta instance. The AutogradMeta instance is created via std::make_unique<AutogradMeta>(data_impl.get(), requires_grad) (see make_unique for the meaning and usage of std::make_unique).
While AutogradMeta is being instantiated, the AutogradMeta() constructor runs:
[torch/csrc/autograd/variable.h]
AutogradMeta(at::TensorImpl* self_impl = nullptr, bool requires_grad = false, Edge gradient_edge = Edge()) {
  grad_fn_ = std::move(gradient_edge.function);
  requires_grad_ = false;
  retains_grad_ = false;
  is_view_ = false;
  output_nr_ = gradient_edge.input_nr;

  // set_requires_grad also checks error conditions.
  if (requires_grad) {
    TORCH_INTERNAL_ASSERT(self_impl);
    set_requires_grad(requires_grad, self_impl);
  }
  TORCH_CHECK(
      !grad_fn_ || !requires_grad_,
      "requires_grad should be false if grad_fn is set");
}
- gradient_edge is default-constructed in the parameter list; Edge is one of the key classes for building the computation graph and will be covered in detail in the autograd chapter.
- In this example grad_fn_ is empty: the tensor is a leaf node, so there is no grad_fn. The effect is visible from Python, as shown below.
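A quick Python check of what make_variable() and the AutogradMeta constructor leave us with for a user-created (leaf) tensor (illustration only):

import torch

a = torch.empty(2, 3, requires_grad=True)
print(a.is_leaf)        # True: created by the user, not produced by an operation
print(a.requires_grad)  # True: set via AutogradMeta::set_requires_grad
print(a.grad_fn)        # None: grad_fn_ stays empty for leaf tensors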
At this point the C++-level tensor is fully created; all that remains is to wrap it into a THPVariable that can be used from Python.
1.3 Wrapping the tensor into a THPVariable
wrap() is called to wrap the C++-level tensor.
[torch/csrc/autograd/utils/wrap_outputs.h]
inline PyObject* wrap(at::Tensor tensor) {
return THPVariable_Wrap(Variable(std::move(tensor)));
}
wrap() calls THPVariable_Wrap():
[torch/csrc/autograd/python_variable.cpp]
PyObject * THPVariable_Wrap(Variable var)
{
if (!var.defined()) {
Py_RETURN_NONE;
}
if (auto obj = torch::autograd::impl::pyobj(var)) {
Py_INCREF(obj);
return obj;
}
return THPVariable_NewWithVar((PyTypeObject *)THPVariableClass, std::move(var));
}
THPVariable_NewWithVar() is what actually wraps the C++-level tensor into a THPVariable usable from Python.
[torch/csrc/autograd/python_variable.cpp]
// Creates a new Python object for a Variable. The Variable must not already
// have a PyObject* associated with it.
static PyObject* THPVariable_NewWithVar(PyTypeObject* type, Variable var)
{
PyObject* obj = type->tp_alloc(type, 0);
if (obj) {
auto v = (THPVariable*) obj;
new (&v->cdata) Variable(std::move(var));
torch::autograd::impl::set_pyobj(v->cdata, obj);
}
return obj;
}
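Note the two branches of THPVariable_Wrap(): if the Variable already has an associated PyObject (recorded by set_pyobj() above), the cached object is returned. Assuming the .grad accessor also goes through THPVariable_Wrap(), this caching is observable from Python (a small, hedged illustration):

import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
(a * 2).sum().backward()

# Both accesses wrap the same C++ Variable; the second one finds the PyObject
# cached by set_pyobj() and returns it, so the identity check holds.
print(a.grad is a.grad)   # True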
2 torch.tensor()
import torch
a = torch.tensor([2.0, 3.0], requires_grad=True)
print(a)
# tensor([2., 3.], requires_grad=True)
Stepping through line 2 with a debugger takes us into THPVariable_tensor():
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_tensor(PyObject* self, PyObject* args, PyObject* kwargs)
{
...
return THPVariable_Wrap(torch::utils::tensor_ctor(torch::tensors::get_default_dispatch_key(), torch::tensors::get_default_scalar_type(), args, kwargs));
...
}
THPVariable_tensor() consists of four parts:
- torch::tensors::get_default_scalar_type(): gets the default data type
- torch::tensors::get_default_dispatch_key(): gets the default dispatch key
- torch::utils::tensor_ctor(): returns a C++-level tensor
- THPVariable_Wrap(): wraps that tensor into a `THPVariable` usable from Python
2.1 torch::tensors::get_default_scalar_type()
This returns the default data type.
[torch/csrc/tensor/python_tensor.cpp]
ScalarType get_default_scalar_type() {
  return typeMetaToScalarType(get_default_dtype());
}
get_default_dtype() returns caffe2::TypeMeta::Make<float>(), i.e. the float type. typeMetaToScalarType() converts a TypeMeta handle to the corresponding ScalarType enum value:
[c10/core/ScalarTypeToTypeMeta.h]
/**
 * convert TypeMeta handles to ScalarType enum values
 */
static inline ScalarType typeMetaToScalarType(caffe2::TypeMeta dtype) {
  return dtype.toScalarType();
}
The definition of ScalarType is at https://github.com/pytorch/pytorch/blob/v1.7.0/c10/core/ScalarType.h#L62.
get_default_scalar_type() ultimately returns the constant 6, which corresponds to Float.
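The default scalar type is exposed to Python through torch.get_default_dtype() / torch.set_default_dtype(); a small illustration:

import torch

print(torch.get_default_dtype())         # torch.float32 (ScalarType Float, value 6)
print(torch.tensor([2.0, 3.0]).dtype)    # torch.float32

torch.set_default_dtype(torch.float64)
print(torch.tensor([2.0, 3.0]).dtype)    # torch.float64
torch.set_default_dtype(torch.float32)   # restore the default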
2.2 torch::tensors::get_default_dispatch_key()
This returns the default dispatch key.
[torch/csrc/tensor/python_tensor.cpp]
c10::DispatchKey get_default_dispatch_key() {
  AT_ASSERT(default_tensor_type);
  return default_tensor_type->get_dispatch_key();
}
[torch/csrc/tensor/python_tensor.cpp]
DispatchKey get_dispatch_key() const {
  return backendToDispatchKey(static_cast<Backend>(backend));
}
get_default_dispatch_key() ultimately returns DispatchKey::CPU, as sketched from the Python side below.
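The default tensor type (and with it the dispatch key) can be changed from Python via torch.set_default_tensor_type(); a hedged illustration of how the default backend shows up in the device of a new tensor:

import torch

print(torch.tensor([1.0]).device)   # cpu: default_tensor_type maps to DispatchKey::CPU

if torch.cuda.is_available():
    # Switching the default tensor type switches the default dispatch key as well.
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    print(torch.tensor([1.0]).device)                    # cuda:0
    torch.set_default_tensor_type('torch.FloatTensor')   # restore the default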
2.3 torch::utils::tensor_ctor()
tensor_ctor() consists of three main parts:
- parsing the arguments (see 1.1 Argument parsing)
- calling internal_new_from_data() to create the C++-level tensor; this is what initializes storage_
- calling new_tensor.detach_() to ensure new_tensor is a leaf node; this is where autograd_meta_ gets initialized
If storage_ and autograd_meta_ are unfamiliar, see the earlier post on the Tensor class hierarchy (Tensor的继承体系).
[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L694]
Tensor tensor_ctor(c10::DispatchKey dispatch_key, at::ScalarType scalar_type, PyObject* args, PyObject* kwargs) {
static PythonArgParser parser({
"tensor(PyObject* data, *, ScalarType dtype=None, Device? device=None, bool pin_memory=False, bool requires_grad=False, DimnameList? names=None)",
});
constexpr int ctor_num_args = 6;
ParsedArgs<ctor_num_args> parsed_args;
auto r = parser.parse(args, kwargs, parsed_args);
if (r.idx == 0) {
PyObject* data = r.pyobject(0);
...
auto new_tensor = internal_new_from_data(
typeIdWithDefault(r, 2, dispatch_key),
r.scalartypeWithDefault(1, scalar_type),
r.deviceOptional(2),
data,
/*copy_variables=*/true,
/*copy_numpy=*/true,
/*type_inference=*/type_inference,
pin_memory);
...
}
throw std::runtime_error("tensor(): invalid arguments");
}
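One detail worth noting in tensor_ctor() is the type_inference flag: when no dtype is passed, the scalar type is inferred from the Python data rather than taken from the default. For example:

import torch

print(torch.tensor([1, 2]).dtype)                       # torch.int64: inferred from Python ints
print(torch.tensor([1.0, 2.0]).dtype)                   # torch.float32: inferred from Python floats
print(torch.tensor([1, 2], dtype=torch.float32).dtype)  # torch.float32: explicit dtype, no inference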
2.3.1 internal_new_from_data()
[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L231]
Tensor internal_new_from_data(
c10::DispatchKey dispatch_key,
at::ScalarType scalar_type,
c10::optional<Device> device_opt,
PyObject* data,
bool copy_variables,
bool copy_numpy,
bool type_inference,
bool pin_memory = false) {
...
#ifdef USE_NUMPY
...
if (PyArray_Check(data)) {
TORCH_CHECK(!pin_memory, "Can't pin tensor constructed from numpy");
auto tensor = tensor_from_numpy(data, /*warn_if_not_writeable=*/!copy_numpy);
const auto& inferred_scalar_type = type_inference ? tensor.scalar_type() : scalar_type;
auto device = device_opt.has_value() ? *device_opt : at::Device(computeDeviceType(dispatch_key));
pybind11::gil_scoped_release no_gil;
maybe_initialize_cuda(device);
return tensor.to(device, inferred_scalar_type, /*non_blocking=*/false, /*copy=*/copy_numpy);
}
#endif
auto sizes = compute_sizes(data);
ScalarType inferred_scalar_type = type_inference ? infer_scalar_type(data) : scalar_type;
// This exists to prevent us from tracing the call to empty(). The actual
// autograd code doesn't really matter, because requires_grad is always false
// here.
Tensor tensor;
{
at::AutoNonVariableTypeMode guard; // TODO: remove
at::tracer::impl::NoTracerDispatchMode tracer_guard;
tensor = at::empty(sizes, at::initialTensorOptions().dtype(inferred_scalar_type).pinned_memory(pin_memory));
recursive_store(
(char*)tensor.data_ptr(), tensor.sizes(), tensor.strides(), 0,
inferred_scalar_type, tensor.dtype().itemsize(), data);
}
auto device = device_opt.has_value() ? *device_opt : at::Device(computeDeviceType(dispatch_key));
pybind11::gil_scoped_release no_gil;
maybe_initialize_cuda(device);
// However, it is VERY important that we trace the to() call here (even
// though the reason this is important is a hack). Without *some* factory
// function call that is traced at construction time, we will consider
// a tensor constant as originating from "outside" the trace, and if you
// try to return it directly we will fail with the error saying no
// "no observable data dependence". In an ideal world, we wouldn't trace
// a to() call but I need to think harder about what exactly we should trace
// in this case.
return tensor.to(device, inferred_scalar_type, /*non_blocking=*/false, /*copy=*/false);
}
internal_new_from_data() consists of two main parts:
- calling at::empty() to create the tensor (see 1.2.1.1 at::empty())
- calling recursive_store() to fill the tensor with the Python data
[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L207]
void recursive_store(char* data, IntArrayRef sizes, IntArrayRef strides, int64_t dim,
                     ScalarType scalarType, int elementSize, PyObject* obj) {
  int64_t ndim = sizes.size();
  if (dim == ndim) {
    torch::utils::store_scalar(data, scalarType, obj);
    return;
  }

  auto n = sizes[dim];
  auto seq = THPObjectPtr(PySequence_Fast(obj, "not a sequence"));
  if (!seq) throw python_error();
  auto seq_size = PySequence_Fast_GET_SIZE(seq.get());
  if (seq_size != n) {
    throw ValueError("expected sequence of length %lld at dim %lld (got %lld)",
      (long long)n, (long long)dim, (long long)seq_size);
  }

  PyObject** items = PySequence_Fast_ITEMS(seq.get());
  for (int64_t i = 0; i < n; i++) {
    recursive_store(data, sizes, strides, dim + 1, scalarType, elementSize, items[i]);
    data += strides[dim] * elementSize;
  }
}
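The seq_size != n check in recursive_store() is what produces the familiar error for ragged nested lists. For example:

import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # each row is written by recursive_store()
print(t)

try:
    torch.tensor([[1.0, 2.0], [3.0]])        # ragged: seq_size != n at dim 1
except ValueError as e:
    print(e)   # expected sequence of length 2 at dim 1 (got 1)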
2.3.2 new_tensor.detach_()
detach_() ensures that new_tensor is a leaf node. For a discussion of detach, see "What the difference between detach(), detach_(), and with torch.no_grad() in our training phase?".
[build/aten/src/ATen/core/TensorMethods.cpp]
// aten::detach_(Tensor(a!) self) -> Tensor(a!)
Tensor & Tensor::detach_() const {
static auto op = c10::Dispatcher::singleton()
.findSchemaOrThrow("aten::detach_", "")
.typed<Tensor & (Tensor &)>();
return op.call(const_cast<Tensor&>(*this));
}
Through the dispatch mechanism, aten::detach_() ultimately calls detach_():
[torch/csrc/autograd/VariableTypeManual.cpp]
Tensor & detach_(Tensor & self) {
...
// I think the choice here is conservative. In principle, doing
// an in-place detach should give us the ability to just clear
// the autograd meta. But this function ONLY resets requires_grad,
// grad_fn and output_nr; there's other metadata like debug name
// and hooks which aren't cleared. Is this function supposed to
// clear those too? I'm not too sure, so I'm leaving it be for now.
auto autograd_meta = impl::materialize_autograd_meta(self);
autograd_meta->set_requires_grad(false, self.unsafeGetTensorImpl());
autograd_meta->grad_fn_.reset();
autograd_meta->output_nr_ = 0;
// detach only backward gradients for both primal and tangent
if (self.fw_grad(/* level */ 0).defined()) {
self.fw_grad(/* level */ 0).detach_();
}
return self;
}
materialize_autograd_meta() mainly initializes autograd_meta_, again creating the instance with std::make_unique<AutogradMeta>() (see the earlier note on std::make_unique).
[torch/csrc/autograd/variable.cpp]
AutogradMeta* materialize_autograd_meta(const Variable& self) {
TORCH_CHECK(self.defined(), "cannot call materialize_autograd_meta() on undefined tensor");
auto p = self.unsafeGetTensorImpl();
if (!p->autograd_meta()) {
p->set_autograd_meta(std::make_unique<AutogradMeta>());
}
return get_autograd_meta(self);
}
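Seen from Python, the net effect of this detach_() call is that the tensor returned by torch.tensor() is a leaf with no history, even though it was produced internally through a to() call (illustration only):

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
print(a.is_leaf)        # True: detach_() reset grad_fn_ and output_nr_
print(a.grad_fn)        # None
print(a.requires_grad)  # True: set on the leaf after construction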
2.4 THPVariable_Wrap()
This is essentially the same as section 1.3, so it is skipped here.
That's all for this article.
This chapter only covered how user-created tensors are built step by step. How intermediate nodes (non-leaf tensors not created by the user) come into being, and how the computation graph is constructed to serve automatic differentiation, will be unfolded in the autograd part of the next chapter.
