References

This article is based on PyTorch 1.7.0: https://github.com/pytorch/pytorch/tree/v1.7.0. If anything here is unclear or incorrect, please point it out in the comments.

The previous article introduced the Tensor class hierarchy; so how is a Tensor actually constructed?
This article walks through how a tensor gets created.
It covers two ways of initializing a tensor: one uses a function from the torch module, the other uses the Tensor class.

Note: throughout this series, Tensor denotes the class and tensor denotes an instance. This article deals with tensors at two levels: the C++-level tensor, and the tensor that can be used from Python, which corresponds to THPVariable at the C++ level. So wherever THPVariable appears below, it can be read as the Python-level tensor.

1 torch.empty()

import torch
a = torch.empty(2, 3, requires_grad=True)
print(a)
"""
tensor([[1.2383e+02, 3.0771e-41, 1.2383e+02],
        [3.0771e-41, 1.2383e+02, 3.0771e-41]], requires_grad=True)
"""

Stepping statement by statement through `a = torch.empty(2, 3, requires_grad=True)` takes us into THPVariable_empty()

[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
...
}

The function does three things: first it parses the arguments with PythonArgParser, second it creates the C++-level tensor, and third it wraps that tensor into a THPVariable (i.e. the tensor usable from Python).
[Figure: create_empty_tensor_1.svg]

1.1 Argument parsing

[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  static PythonArgParser parser({
    "empty(IntArrayRef size, *, DimnameList? names, MemoryFormat? memory_format=None, ScalarType dtype=None, Layout layout=torch.strided, Device device=None, bool pin_memory=False, bool requires_grad=False)",
    "empty(IntArrayRef size, *, MemoryFormat? memory_format=None, Tensor out=None, ScalarType dtype=None, Layout layout=torch.strided, Device device=None, bool pin_memory=False, bool requires_grad=False)",
  }, /*traceable=*/true);

  ParsedArgs<8> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  ...
}

This parses the incoming arguments, which include size, dtype, device, and so on. The two format strings passed to PythonArgParser describe the two accepted calling signatures; parser.parse() returns the parse result, and _r.idx is a numeric index indicating which signature was matched.

The PythonArgParser class can be found in torch/csrc/utils/python_arg_parser.h.
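
To see which of the two signatures a call matches, here is a small Python check of my own (not from the original article): passing names selects the first signature, while a plain call, optionally with out=, selects the second, which is why _r.idx is 1 in the walkthrough below.

import torch

# First signature: the variant that accepts dimension names.
a = torch.empty(2, 3, names=('N', 'C'))   # matches signature 0 -> _r.idx == 0

# Second signature: the common variant, optionally writing into `out`.
b = torch.empty(2, 3)                     # matches signature 1 -> _r.idx == 1
out = torch.empty(0)
c = torch.empty(2, 3, out=out)            # also signature 1, but takes the `out` branch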

1.2 Creating the C++-level tensor

In this example _r.idx equals 1, so execution enters case 1:

[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  ...
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  ...
  switch (_r.idx) {
    ...
    case 1: {
      if (_r.isNone(2)) {
        // aten::empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
        const auto options = TensorOptions()
            .dtype(_r.scalartype(3))
            .device(_r.device(5))
            .layout(_r.layoutOptional(4))
            .requires_grad(_r.toBool(7))
            .pinned_memory(_r.toBool(6));
        torch::utils::maybe_initialize_cuda(options);

        auto dispatch_empty = [](IntArrayRef size, const TensorOptions & options, c10::optional<MemoryFormat> memory_format) -> Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::empty(size, options, memory_format);
        };
        return wrap(dispatch_empty(_r.intlist(0), options, _r.memoryformatOptional(1)));
      }
      ...
    }
  }
}
  • _r.isNone(2) checks whether the out argument (index 2 of this signature) was passed; in our example it is None, so the branch above, which handles memory_format (see torch-memory-format), is taken.
  • The parsed dtype, layout, device, pin_memory and requires_grad values are packed into a TensorOptions object.
  • dispatch_empty() is then called, which releases the GIL and ends up calling torch::empty(); the returned C++-level tensor is wrapped with wrap(), which actually calls THPVariable_Wrap(). A short Python illustration of memory_format follows below.
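
As a rough Python-level illustration (my own example, not from the source): leaving memory_format at its default keeps the tensor contiguous, while torch.channels_last produces the NHWC stride layout that empty_cpu() sets up later via empty_tensor_restride().

import torch

a = torch.empty(2, 3, 4, 5)                                     # default: contiguous
b = torch.empty(2, 3, 4, 5, memory_format=torch.channels_last)  # NHWC strides
print(a.stride())   # (60, 20, 5, 1)
print(b.stride())   # (60, 1, 15, 3)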

    1.2.1 torch::empty()

    The execution of torch::empty():

  • It first runs at::AutoNonVariableTypeMode non_var_type_mode(true); to prevent a recursive call back into this layer.

  • It calls at::empty(), which returns a C++-level tensor. That tensor is not fully initialized yet: at::empty() only does the memory-allocation-related work, and the autograd-related work is left to make_variable().
  • It calls make_variable() to finish the autograd-related initialization.

    [torch/csrc/autograd/generated/variable_factories.h]
    inline at::Tensor empty(at::IntArrayRef size, const at::TensorOptions & options = {}, c10::optional<at::MemoryFormat> memory_format = c10::nullopt) {
      at::Tensor tensor = ([&]() {
        at::AutoNonVariableTypeMode non_var_type_mode(true);
        return at::empty(size, at::TensorOptions(options).requires_grad(c10::nullopt), memory_format);
      })();
      at::Tensor result =
        autograd::make_variable(std::move(tensor), /*requires_grad=*/options.requires_grad());
      return result;
    }
    

    1.2.1.1 at::empty()

    The dispatch mechanism involved here will be introduced in the next article. For now it is enough to know that PyTorch's dispatcher uses information such as dtype and device to route the call to the corresponding implementation.

    [build/aten/src/ATen/Functions.cpp]
    // aten::empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
    Tensor empty(IntArrayRef size, const TensorOptions & options, c10::optional<MemoryFormat> memory_format) {
      static auto op = c10::Dispatcher::singleton()
          .findSchemaOrThrow("aten::empty", "memory_format")
          .typed<Tensor (IntArrayRef, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>, c10::optional<MemoryFormat>)>();
      return op.call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt(), c10::impl::check_tensor_options_and_extract_memory_format(options, memory_format));
    }
    

    For now, just know that the dispatch mechanism eventually routes the call to empty_cpu(), which completes the memory-level initialization of the tensor; the detailed routing is covered in "the dispatch flow of an empty() call".
    The execution of empty_cpu() is shown below.
    [Figure: create_empty_tensor_3.svg]

    [aten/src/ATen/Utils.cpp]
    Tensor empty_cpu(
      IntArrayRef size,
      c10::optional<ScalarType> dtype_opt,
      c10::optional<Layout> layout_opt,
      c10::optional<Device> device_opt,
      c10::optional<bool> pin_memory_opt,
      c10::optional<c10::MemoryFormat> memory_format_opt) {
    Device device = device_or_default(device_opt);
    
    TORCH_CHECK(device.type() == DeviceType::CPU);
    check_size_nonnegative(size);
    
    bool pin_memory = pinned_memory_or_default(pin_memory_opt);
    c10::Allocator* allocator;
    if (pin_memory) {
      allocator = detail::getCUDAHooks().getPinnedMemoryAllocator();
    } else {
      allocator = at::getCPUAllocator();
    }
    
    int64_t nelements = prod_intlist(size);
    caffe2::TypeMeta dtype = scalarTypeToTypeMeta(dtype_or_default(dtype_opt));
    int64_t size_bytes = nelements * dtype.itemsize();
    auto storage_impl = c10::make_intrusive<StorageImpl>(
        c10::StorageImpl::use_byte_size_t(),
        size_bytes,
        allocator->allocate(size_bytes),
        allocator,
        /*resizeable=*/true);
    
    auto tensor = detail::make_tensor<TensorImpl>(
        std::move(storage_impl), at::DispatchKey::CPU, dtype);
    // Default TensorImpl has size [0]
    if (size.size() != 1 || size[0] != 0) {
      tensor.unsafeGetTensorImpl()->set_sizes_contiguous(size);
    }
    
    if (memory_format_opt.has_value()) {
      // Restriding a just-created empty contiguous tensor does nothing.
      if (*memory_format_opt != MemoryFormat::Contiguous) {
        tensor.unsafeGetTensorImpl()->empty_tensor_restride(*memory_format_opt);
      }
    }
    
    return tensor;
    }
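
The size_bytes = nelements * dtype.itemsize() computation above is easy to check from Python (a quick sanity check of my own, assuming the default float32 dtype):

import torch

t = torch.empty(2, 3)                 # float32 by default
nelements = t.numel()                 # prod_intlist(size) -> 6
itemsize = t.element_size()           # 4 bytes per float32 element
print(nelements * itemsize)           # 24, the size_bytes handed to the allocator
print(t.storage().size())             # 6, elements in the freshly allocated storage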
    

    1.2.1.2 autograd::make_variable()

    [torch/csrc/autograd/variable.h]
    /// Creates a `Variable` from the given `Tensor`, copying its underlying `TensorImpl`.
    /// `requires_grad` should be
    /// set only for leaves, and determines whether the `Variable` will accumulate
    /// gradients. NOTE: `data` must *not* be a `Variable` already. Its dynamic
    /// type *must* be `Tensor`.
    ///
    /// TODO: Eliminate this function as much as possible, as it can be expressed
    /// more clearly as detach() or a no-op in most call sites (especially when
    /// there is only one use of the variable).
    inline Variable make_variable(
      at::Tensor data,
      bool requires_grad = false,
      bool allow_tensor_metadata_change = true) {
    if (data.defined()) {
      if (data.getIntrusivePtr().use_count() == 1 && data.getIntrusivePtr()->unique_version()) {
        auto data_impl = data.getIntrusivePtr();
        data_impl->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
        if (requires_grad) {
          data_impl->set_autograd_meta(std::make_unique<AutogradMeta>(data_impl.get(), requires_grad));
        } else {
          data_impl->set_autograd_meta(nullptr);
        }
        return Variable(std::move(data_impl));
      } else {
        auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach(
          /*version_counter=*/0,
          /*allow_tensor_metadata_change=*/allow_tensor_metadata_change);
        if (requires_grad) {
          data_impl_copy->set_autograd_meta(std::make_unique<AutogradMeta>(
            data_impl_copy.get(), requires_grad));
        } else {
          data_impl_copy->set_autograd_meta(nullptr);
        }
        return Variable(data_impl_copy);
      }
    }
    return Variable();
    }
    
  • data_impl->set_allow_tensor_metadata_change() sets allow_tensor_metadata_change on the tensor's TensorImpl to true.

  • In our example requires_grad is true, so data_impl->set_autograd_meta() is executed, setting the TensorImpl's autograd_meta_ to an AutogradMeta instance.
  • The AutogradMeta instance is created via std::make_unique<AutogradMeta>(data_impl.get(), requires_grad) (for the meaning and usage of std::make_unique, see make_unique).

Instantiating AutogradMeta invokes the AutogradMeta() constructor:

[torch/csrc/autograd/variable.h]
AutogradMeta(at::TensorImpl* self_impl = nullptr, bool requires_grad = false, Edge gradient_edge = Edge() ) {
    grad_fn_ = std::move(gradient_edge.function);
    requires_grad_ = false;
    retains_grad_ = false;
    is_view_ = false;
    output_nr_ = gradient_edge.input_nr;

    // set_requires_grad also checks error conditions.
    if (requires_grad) {
      TORCH_INTERNAL_ASSERT(self_impl);
      set_requires_grad(requires_grad, self_impl);
    }
    TORCH_CHECK(
        !grad_fn_ || !requires_grad_,
        "requires_grad should be false if grad_fn is set");
}
  • gradient_edge is default-constructed in the parameter list. Edge is one of the key classes for building the computation graph and will be covered in detail in the autograd chapter.
  • In this example grad_fn_ is empty: the tensor is a leaf node, so it has no gradient function (see the quick Python check below).
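
The leaf behaviour described above can be confirmed directly from Python (my own quick check):

import torch

a = torch.empty(2, 3, requires_grad=True)
print(a.is_leaf)     # True: created directly by a factory function
print(a.grad_fn)     # None: grad_fn_ stays empty for leaf tensors

b = a * 2            # produced by an operation: no longer a leaf
print(b.is_leaf, b.grad_fn)   # False <MulBackward0 object at ...>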

At this point the C++-level tensor is fully created; all that remains is to wrap it into a THPVariable usable from Python.

1.3 Wrapping the tensor into a THPVariable

wrap() is called to wrap the C++-level Tensor.

[torch/csrc/autograd/utils/wrap_outputs.h]
inline PyObject* wrap(at::Tensor tensor) {
  return THPVariable_Wrap(Variable(std::move(tensor)));
}

wrap() calls THPVariable_Wrap():

[torch/csrc/autograd/python_variable.cpp]
PyObject * THPVariable_Wrap(Variable var)
{
  if (!var.defined()) {
    Py_RETURN_NONE;
  }

  if (auto obj = torch::autograd::impl::pyobj(var)) {
    Py_INCREF(obj);
    return obj;
  }

  return THPVariable_NewWithVar((PyTypeObject *)THPVariableClass, std::move(var));
}
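
The branch that returns the cached obj means that wrapping the same Variable twice yields the same Python object. One way to observe this (my own example; behaviour as observed on PyTorch 1.7) is through repeated .grad accesses, each of which goes through THPVariable_Wrap():

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
x.sum().backward()

# Both accesses wrap the same underlying grad Variable, so the second call
# finds the cached PyObject* and returns it instead of allocating a new one.
print(x.grad is x.grad)   # True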

THPVariable_NewWithVar() is what actually wraps the C++-level tensor into a THPVariable usable from Python:

[torch/csrc/autograd/python_variable.cpp]
// Creates a new Python object for a Variable. The Variable must not already
// have a PyObject* associated with it.
static PyObject* THPVariable_NewWithVar(PyTypeObject* type, Variable var)
{
  PyObject* obj = type->tp_alloc(type, 0);
  if (obj) {
    auto v = (THPVariable*) obj;
    new (&v->cdata) Variable(std::move(var));
    torch::autograd::impl::set_pyobj(v->cdata, obj);
  }
  return obj;
}
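
Since the object is allocated from THPVariableClass, the Python type of a tensor returned this way is torch.Tensor, which is easy to verify (my own quick check):

import torch

a = torch.empty(2, 3)
print(type(a))                      # <class 'torch.Tensor'>
print(isinstance(a, torch.Tensor))  # True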

With that, a tensor has been created.

2 torch.tensor()

import torch
a = torch.tensor([2.0, 3.0], requires_grad=True)
print(a)
# tensor([2., 3.], requires_grad=True)

Stepping statement by statement through `a = torch.tensor([2.0, 3.0], requires_grad=True)` takes us into THPVariable_tensor()

[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_tensor(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  return THPVariable_Wrap(torch::utils::tensor_ctor(torch::tensors::get_default_dispatch_key(), torch::tensors::get_default_scalar_type(), args, kwargs));
  ...
}

THPVariable_tensor() involves four parts:

  • torch::tensors::get_default_scalar_type()

    gets the default data type

  • torch::tensors::get_default_dispatch_key()

    gets the default dispatch key

  • torch::utils::tensor_ctor()

    returns a C++-level tensor

  • THPVariable_Wrap()

    wraps the tensor into a THPVariable usable from Python

[Figure: create_tensor (2).svg]
    

    2.1 torch::tensors::get_default_scalar_type()

    Gets the data type.

    [torch/csrc/tensor/python_tensor.cpp]
    ScalarType get_default_scalar_type() {
    return typeMetaToScalarType(get_default_dtype());
    }
    
  • get_default_dtype() returns caffe2::TypeMeta::Make<float>(), i.e. the float type.

  • typeMetaToScalarType() converts a TypeMeta handle to the corresponding ScalarType enum value:
    [c10/core/ScalarTypeToTypeMeta.h]
    /**
    * convert TypeMeta handles to ScalarType enum values
    */
    static inline ScalarType typeMetaToScalarType(caffe2::TypeMeta dtype) {
    return dtype.toScalarType();
    }
    

    The definition of ScalarType can be found at https://github.com/pytorch/pytorch/blob/v1.7.0/c10/core/ScalarType.h#L62

get_default_scalar_type() ultimately returns the constant 6, which corresponds to the Float type.
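
From Python, the default scalar type is what torch.get_default_dtype() reports, and it is what torch.tensor() falls back to for floating-point data (my own illustration):

import torch

print(torch.get_default_dtype())          # torch.float32, i.e. ScalarType::Float (6)
print(torch.tensor([2.0, 3.0]).dtype)     # torch.float32

torch.set_default_dtype(torch.float64)
print(torch.tensor([2.0, 3.0]).dtype)     # torch.float64
torch.set_default_dtype(torch.float32)    # restore the default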

2.2 torch::tensors::get_default_dispatch_key()

Gets the default dispatch key.

[torch/csrc/tensor/python_tensor.cpp]
c10::DispatchKey get_default_dispatch_key() {
  AT_ASSERT(default_tensor_type);
  return default_tensor_type->get_dispatch_key();
}
[torch/csrc/tensor/python_tensor.cpp]
DispatchKey get_dispatch_key() const {
  return backendToDispatchKey(static_cast<Backend>(backend));
}

get_default_dispatch_key() ultimately returns DispatchKey::CPU.

2.3 torch::utils::tensor_ctor()

tensor_ctor() mainly consists of the following three parts:

  • parse the arguments (see Argument parsing)
  • call internal_new_from_data() to create the C++-level tensor; this is what initializes storage_
  • call new_tensor.detach_() to make sure new_tensor is a leaf node; this is what initializes autograd_meta_ here

    If storage_ and autograd_meta_ are unfamiliar, see the earlier article on the Tensor class hierarchy.

[Figure: create_tensor_1.svg]

[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L694]
Tensor tensor_ctor(c10::DispatchKey dispatch_key, at::ScalarType scalar_type, PyObject* args, PyObject* kwargs) {
  static PythonArgParser parser({
    "tensor(PyObject* data, *, ScalarType dtype=None, Device? device=None, bool pin_memory=False, bool requires_grad=False, DimnameList? names=None)",
  });

  constexpr int ctor_num_args = 6;
  ParsedArgs<ctor_num_args> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  if (r.idx == 0) {
    PyObject* data = r.pyobject(0);
    ...
    auto new_tensor = internal_new_from_data(
               typeIdWithDefault(r, 2, dispatch_key),
               r.scalartypeWithDefault(1, scalar_type),
               r.deviceOptional(2),
               data,
               /*copy_variables=*/true,
               /*copy_numpy=*/true,
               /*type_inference=*/type_inference,
               pin_memory);
    ...
  }
  throw std::runtime_error("tensor(): invalid arguments");
}
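
The type_inference flag (its computation is elided in the listing above, but it is tied to whether a dtype was passed) controls whether the dtype is inferred from the data. A quick Python illustration of the observable effect (my own example):

import torch

# No dtype given: the dtype is inferred from the data.
print(torch.tensor([1, 2]).dtype)         # torch.int64
print(torch.tensor([1.0, 2.0]).dtype)     # torch.float32 (the default scalar type)
print(torch.tensor([True, False]).dtype)  # torch.bool

# dtype given explicitly: inference is skipped and the requested dtype is used.
print(torch.tensor([1, 2], dtype=torch.float16).dtype)   # torch.float16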

2.3.1 internal_new_from_data()

[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L231]
Tensor internal_new_from_data(
    c10::DispatchKey dispatch_key,
    at::ScalarType scalar_type,
    c10::optional<Device> device_opt,
    PyObject* data,
    bool copy_variables,
    bool copy_numpy,
    bool type_inference,
    bool pin_memory = false) {

  ...

#ifdef USE_NUMPY
  ...
  if (PyArray_Check(data)) {
    TORCH_CHECK(!pin_memory, "Can't pin tensor constructed from numpy");
    auto tensor = tensor_from_numpy(data, /*warn_if_not_writeable=*/!copy_numpy);
    const auto& inferred_scalar_type = type_inference ? tensor.scalar_type() : scalar_type;
    auto device = device_opt.has_value() ? *device_opt : at::Device(computeDeviceType(dispatch_key));
    pybind11::gil_scoped_release no_gil;
    maybe_initialize_cuda(device);
    return tensor.to(device, inferred_scalar_type, /*non_blocking=*/false, /*copy=*/copy_numpy);
  }
#endif

  auto sizes = compute_sizes(data);
  ScalarType inferred_scalar_type = type_inference ? infer_scalar_type(data) : scalar_type;
  // This exists to prevent us from tracing the call to empty().  The actual
  // autograd code doesn't really matter, because requires_grad is always false
  // here.
  Tensor tensor;
  {
    at::AutoNonVariableTypeMode guard;  // TODO: remove
    at::tracer::impl::NoTracerDispatchMode tracer_guard;
    tensor = at::empty(sizes, at::initialTensorOptions().dtype(inferred_scalar_type).pinned_memory(pin_memory));
    recursive_store(
        (char*)tensor.data_ptr(), tensor.sizes(), tensor.strides(), 0,
        inferred_scalar_type, tensor.dtype().itemsize(), data);
  }
  auto device = device_opt.has_value() ? *device_opt : at::Device(computeDeviceType(dispatch_key));
  pybind11::gil_scoped_release no_gil;
  maybe_initialize_cuda(device);
  // However, it is VERY important that we trace the to() call here (even
  // though the reason this is important is a hack).  Without *some* factory
  // function call that is traced at construction time, we will consider
  // a tensor constant as originating from "outside" the trace, and if you
  // try to return it directly we will fail with the error saying no
  // "no observable data dependence".  In an ideal world, we wouldn't trace
  // a to() call but I need to think harder about what exactly we should trace
  // in this case.
  return tensor.to(device, inferred_scalar_type, /*non_blocking=*/false, /*copy=*/false);
}

internal_new_from_data() mainly does two things:

  • call at::empty() to create the tensor (see 1.2.1.1 at::empty())
  • call recursive_store() to fill the tensor with the given data; a Python sketch of this recursion follows the listing below

    [torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L207]
    void recursive_store(char* data, IntArrayRef sizes, IntArrayRef strides, int64_t dim,
                              ScalarType scalarType, int elementSize, PyObject* obj) {
    int64_t ndim = sizes.size();
    if (dim == ndim) {
      torch::utils::store_scalar(data, scalarType, obj);
      return;
    }
    
    auto n = sizes[dim];
    auto seq = THPObjectPtr(PySequence_Fast(obj, "not a sequence"));
    if (!seq) throw python_error();
    auto seq_size = PySequence_Fast_GET_SIZE(seq.get());
    if (seq_size != n) {
      throw ValueError("expected sequence of length %lld at dim %lld (got %lld)",
        (long long)n, (long long)dim, (long long)seq_size);
    }
    
    PyObject** items = PySequence_Fast_ITEMS(seq.get());
    for (int64_t i = 0; i < n; i++) {
      recursive_store(data, sizes, strides, dim + 1, scalarType, elementSize, items[i]);
      data += strides[dim] * elementSize;
    }
    }
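
recursive_store() walks the nested Python sequence depth-first and writes one scalar at every strided offset. The following is a minimal Python sketch of the same idea (my own re-implementation over a flat list instead of raw memory; the helper names are made up for illustration):

def recursive_store(buf, sizes, strides, dim, offset, obj):
    """Write the nested sequence `obj` into the flat buffer `buf`."""
    if dim == len(sizes):                 # reached a scalar: store it at the current offset
        buf[offset] = float(obj)
        return
    seq = list(obj)
    if len(seq) != sizes[dim]:
        raise ValueError(f"expected sequence of length {sizes[dim]} at dim {dim} (got {len(seq)})")
    for i, item in enumerate(seq):        # recurse one dimension deeper, stepping by the stride
        recursive_store(buf, sizes, strides, dim + 1, offset + i * strides[dim], item)

data = [[2.0, 3.0], [4.0, 5.0]]
sizes, strides = (2, 2), (2, 1)           # contiguous strides of a 2x2 tensor
buf = [0.0] * 4
recursive_store(buf, sizes, strides, 0, 0, data)
print(buf)                                # [2.0, 3.0, 4.0, 5.0]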
    

    2.3.2 new_tensor.detach_()

    detach_() ensures that new_tensor is a leaf node.

    For a discussion of detach, see "What is the difference between detach(), detach_(), and with torch.no_grad() in our training phase?"

[build/aten/src/ATen/core/TensorMethods.cpp]
// aten::detach_(Tensor(a!) self) -> Tensor(a!)
Tensor & Tensor::detach_() const {
    static auto op = c10::Dispatcher::singleton()
        .findSchemaOrThrow("aten::detach_", "")
        .typed<Tensor & (Tensor &)>();
    return op.call(const_cast<Tensor&>(*this));
}

Through the dispatch mechanism, aten::detach_() ultimately calls detach_():

[torch/csrc/autograd/VariableTypeManual.cpp]
Tensor & detach_(Tensor & self) {
  ...
  // I think the choice here is conservative.  In principle, doing
  // an in-place detach should give us the ability to just clear
  // the autograd meta.  But this function ONLY resets requires_grad,
  // grad_fn and output_nr; there's other metadata like debug name
  // and hooks which aren't cleared.  Is this function supposed to
  // clear those too? I'm not too sure, so I'm leaving it be for now.
  auto autograd_meta = impl::materialize_autograd_meta(self);
  autograd_meta->set_requires_grad(false, self.unsafeGetTensorImpl());
  autograd_meta->grad_fn_.reset();
  autograd_meta->output_nr_ = 0;

  // detach only backward gradients for both primal and tangent
  if (self.fw_grad(/* level */ 0).defined()) {
    self.fw_grad(/* level */ 0).detach_();
  }

  return self;
}
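
The effect of detach_() is visible from Python (my own check): it clears requires_grad and grad_fn in place, turning the tensor into a leaf.

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = a * 2
print(b.requires_grad, b.is_leaf)             # True False

b.detach_()                                   # resets requires_grad, grad_fn and output_nr
print(b.requires_grad, b.is_leaf, b.grad_fn)  # False True None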

materialize_autograd_meta() mainly initializes autograd_meta_, again creating the instance with std::make_unique<AutogradMeta>() (for an introduction to std::make_unique(), see 1.2.1.2 autograd::make_variable()).

[torch/csrc/autograd/variable.cpp]
AutogradMeta* materialize_autograd_meta(const Variable& self) {
    TORCH_CHECK(self.defined(), "cannot call materialize_autograd_meta() on undefined tensor");
    auto p = self.unsafeGetTensorImpl();
    if (!p->autograd_meta()) {
      p->set_autograd_meta(std::make_unique<AutogradMeta>());
    }
    return get_autograd_meta(self);
}

2.4 THPVariable_Wrap()

This is essentially the same as section 1.3, so it is not repeated here.

That's all for this article.


This chapter only covered how a tensor created directly by the user is built, step by step. How intermediate nodes (non-leaf tensors that are not created by the user) come into being, and how the computation graph is constructed to serve automatic differentiation, will be unpacked in the next chapter on autograd.