This article draws on the following references:
- pytorch源码解析:Tensor的生成过程
- PyTorch Internals 5:Autograd的实现
- PyTorch学习笔记(5)——论一个torch.Tensor是如何构建完成的?
- PyTorch Internals 4:一个tensor新建的过程
- PyTorch的Tensor(上)
This article is based on PyTorch 1.7.0 (https://github.com/pytorch/pytorch/tree/v1.7.0). If anything here is unclear or incorrect, please point it out in the comments.
The previous post introduced the Tensor class hierarchy, so how is a Tensor actually constructed?
In this section we will see how a Tensor is created.
This article walks through two ways of initializing a tensor: torch.empty() and torch.tensor().
Note: throughout this series, Tensor denotes the class and tensor denotes an instance. Two kinds of tensor appear in this article: the C++-level tensor, and the tensor that can be used from Python, which corresponds to THPVariable at the C++ level. Whenever THPVariable is mentioned below, read it as the Python-level tensor.
1 torch.empty()
import torch
a = torch.empty(2, 3, requires_grad=True)
print(a)
"""
tensor([[1.2383e+02, 3.0771e-41, 1.2383e+02],
        [3.0771e-41, 1.2383e+02, 3.0771e-41]], requires_grad=True)
"""
Stepping through line 2 of the snippet above with a debugger takes us into THPVariable_empty():
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  ...
}
This function does three things: it parses the arguments with PythonArgParser, creates the C++-level tensor, and wraps that tensor into a THPVariable (the tensor usable from Python).
1.1 Argument parsing
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
static PythonArgParser parser({
"empty(IntArrayRef size, *, DimnameList? names, MemoryFormat? memory_format=None, ScalarType dtype=None, Layout layout=torch.strided, Device device=None, bool pin_memory=False, bool requires_grad=False)",
"empty(IntArrayRef size, *, MemoryFormat? memory_format=None, Tensor out=None, ScalarType dtype=None, Layout layout=torch.strided, Device device=None, bool pin_memory=False, bool requires_grad=False)",
}, /*traceable=*/true);
ParsedArgs<8> parsed_args;
auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
...
}
The arguments passed in (size, dtype, device, and so on) are parsed here. The two signature strings handed to PythonArgParser describe the two accepted calling conventions, parser.parse() returns the parsed result, and _r.idx is a numeric index telling us which signature matched.
The PythonArgParser class can be found in torch/csrc/utils/python_arg_parser.h. Which signature a given call selects can also be observed from Python, as sketched below.
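A minimal Python illustration of the two signatures (the names= variant is the experimental named-tensor overload; which overload a call selects follows from which keyword arguments appear in which signature):

import torch

# A call without names= matches the second signature here
# (consistent with _r.idx == 1 observed in the walkthrough below).
a = torch.empty(2, 3)

# out= only exists in the second signature, so this call also goes through it.
out = torch.empty(2, 3)
b = torch.empty(2, 3, out=out)

# names= only exists in the first signature, so this call selects that overload.
c = torch.empty(2, 3, names=('N', 'C'))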
1.2 Creating the C++-level tensor
In this example _r.idx equals 1, so execution enters case 1:
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_empty(PyObject* self_, PyObject* args, PyObject* kwargs)
{
...
auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
...
switch (_r.idx) {
...
case 1: {
if (_r.isNone(2)) {
// aten::empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
const auto options = TensorOptions()
.dtype(_r.scalartype(3))
.device(_r.device(5))
.layout(_r.layoutOptional(4))
.requires_grad(_r.toBool(7))
.pinned_memory(_r.toBool(6));
torch::utils::maybe_initialize_cuda(options);
auto dispatch_empty = [](IntArrayRef size, const TensorOptions & options, c10::optional<MemoryFormat> memory_format) -> Tensor {
pybind11::gil_scoped_release no_gil;
return torch::empty(size, options, memory_format);
};
return wrap(dispatch_empty(_r.intlist(0), options, _r.memoryformatOptional(1)));
}
...
}
}
}
- _r.isNone(2) checks whether memory_format (see torch.memory_format) is None; in this example it is.
- The parsed arguments (dtype, device, layout, requires_grad, pinned_memory) are packed into a TensorOptions object; the same options can be driven from Python as keyword arguments, as shown below.
- dispatch_empty() releases the GIL and ends up calling torch::empty(); the returned C++-level tensor is then wrapped with wrap(), which in turn calls THPVariable_Wrap().
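For reference, here is how those TensorOptions fields look when driven from the Python side (an illustration of the keyword arguments only, not part of the C++ code path):

import torch

# Each keyword argument below ends up in one field of the TensorOptions object:
# dtype -> .dtype(), device -> .device(), requires_grad -> .requires_grad(),
# pin_memory -> .pinned_memory().
a = torch.empty(2, 3,
                dtype=torch.float64,
                device='cpu',
                requires_grad=True,
                pin_memory=False)
print(a.dtype, a.device, a.requires_grad)   # torch.float64 cpu True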
1.2.1 torch::empty()
torch::empty() does the following:
- it enters an at::AutoNonVariableTypeMode non_var_type_mode(true); guard, which keeps the call from being dispatched back into the variable (autograd) layer;
- it calls at::empty(), which returns a C++-level tensor; this tensor is not fully initialized yet, because at::empty() only takes care of memory allocation, leaving the autograd-related work to make_variable();
- it calls make_variable() to finish the autograd-related initialization.
[torch/csrc/autograd/generated/variable_factories.h]
inline at::Tensor empty(at::IntArrayRef size, const at::TensorOptions & options = {}, c10::optional<at::MemoryFormat> memory_format = c10::nullopt) {
  at::Tensor tensor = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    return at::empty(size, at::TensorOptions(options).requires_grad(c10::nullopt), memory_format);
  })();
  at::Tensor result = autograd::make_variable(std::move(tensor), /*requires_grad=*/options.requires_grad());
  return result;
}
1.2.1.1 at::empty()
The dispatch mechanism involved here will be covered in the next article; for now it is enough to know that PyTorch's dispatcher selects the concrete implementation based on information such as dtype and device.
[build/aten/src/ATen/Functions.cpp]
// aten::empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
Tensor empty(IntArrayRef size, const TensorOptions & options, c10::optional<MemoryFormat> memory_format) {
  static auto op = c10::Dispatcher::singleton()
    .findSchemaOrThrow("aten::empty", "memory_format")
    .typed<Tensor (IntArrayRef, c10::optional<ScalarType>, c10::optional<Layout>, c10::optional<Device>, c10::optional<bool>, c10::optional<MemoryFormat>)>();
  return op.call(size, optTypeMetaToScalarType(options.dtype_opt()), options.layout_opt(), options.device_opt(), options.pinned_memory_opt(), c10::impl::check_tensor_options_and_extract_memory_format(options, memory_format));
}
Here it is enough to know that the dispatcher ultimately calls empty_cpu(), which handles the memory-level initialization of the tensor; the detailed call chain is covered in the post 一个empty()方法的分发流程. empty_cpu() looks like this:
[aten/src/ATen/Utils.cpp]
Tensor empty_cpu(
    IntArrayRef size,
    c10::optional<ScalarType> dtype_opt,
    c10::optional<Layout> layout_opt,
    c10::optional<Device> device_opt,
    c10::optional<bool> pin_memory_opt,
    c10::optional<c10::MemoryFormat> memory_format_opt) {
  Device device = device_or_default(device_opt);
  TORCH_CHECK(device.type() == DeviceType::CPU);
  check_size_nonnegative(size);

  bool pin_memory = pinned_memory_or_default(pin_memory_opt);
  c10::Allocator* allocator;
  if (pin_memory) {
    allocator = detail::getCUDAHooks().getPinnedMemoryAllocator();
  } else {
    allocator = at::getCPUAllocator();
  }

  int64_t nelements = prod_intlist(size);
  caffe2::TypeMeta dtype = scalarTypeToTypeMeta(dtype_or_default(dtype_opt));
  int64_t size_bytes = nelements * dtype.itemsize();
  auto storage_impl = c10::make_intrusive<StorageImpl>(
      c10::StorageImpl::use_byte_size_t(),
      size_bytes,
      allocator->allocate(size_bytes),
      allocator,
      /*resizeable=*/true);

  auto tensor = detail::make_tensor<TensorImpl>(
      std::move(storage_impl), at::DispatchKey::CPU, dtype);
  // Default TensorImpl has size [0]
  if (size.size() != 1 || size[0] != 0) {
    tensor.unsafeGetTensorImpl()->set_sizes_contiguous(size);
  }

  if (memory_format_opt.has_value()) {
    // Restriding a just-created empty contiguous tensor does nothing.
    if (*memory_format_opt != MemoryFormat::Contiguous) {
      tensor.unsafeGetTensorImpl()->empty_tensor_restride(*memory_format_opt);
    }
  }

  return tensor;
}
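As a small sanity check from the Python side, the size_bytes computed above (nelements * dtype.itemsize()) is observable through numel() and element_size() (illustration only):

import torch

t = torch.empty(2, 3, dtype=torch.float32)
print(t.numel())                      # 6  -> prod_intlist(size)
print(t.element_size())               # 4  -> dtype.itemsize() for float32
print(t.numel() * t.element_size())   # 24 -> size_bytes handed to the allocator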
1.2.1.2 autograd::make_variable()
[torch/csrc/autograd/variable.h]
/// Creates a `Variable` from the given `Tensor`, copying its underlying `TensorImpl`.
/// `requires_grad` should be
/// set only for leaves, and determines whether the `Variable` will accumulate
/// gradients. NOTE: `data` must *not* be a `Variable` already. Its dynamic
/// type *must* be `Tensor`.
///
/// TODO: Eliminate this function as much as possible, as it can be expressed
/// more clearly as detach() or a no-op in most call sites (especially when
/// there is only one use of the variable).
inline Variable make_variable(
    at::Tensor data,
    bool requires_grad = false,
    bool allow_tensor_metadata_change = true) {
  if (data.defined()) {
    if (data.getIntrusivePtr().use_count() == 1 &&
        data.getIntrusivePtr()->unique_version()) {
      auto data_impl = data.getIntrusivePtr();
      data_impl->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
      if (requires_grad) {
        data_impl->set_autograd_meta(std::make_unique<AutogradMeta>(data_impl.get(), requires_grad));
      } else {
        data_impl->set_autograd_meta(nullptr);
      }
      return Variable(std::move(data_impl));
    } else {
      auto data_impl_copy = data.getIntrusivePtr()->shallow_copy_and_detach(
        /*version_counter=*/0,
        /*allow_tensor_metadata_change=*/allow_tensor_metadata_change);
      if (requires_grad) {
        data_impl_copy->set_autograd_meta(std::make_unique<AutogradMeta>(
          data_impl_copy.get(), requires_grad));
      } else {
        data_impl_copy->set_autograd_meta(nullptr);
      }
      return Variable(data_impl_copy);
    }
  }
  return Variable();
}
- set_allow_tensor_metadata_change() sets allow_tensor_metadata_change on the tensor's TensorImpl to true.
- In this example requires_grad is true, so data_impl->set_autograd_meta() is executed, setting the TensorImpl's autograd_meta_ to an AutogradMeta instance. The AutogradMeta instance is created via std::make_unique<AutogradMeta>(data_impl.get(), requires_grad) (see make_unique for the meaning and usage of std::make_unique).
While AutogradMeta is being instantiated, the AutogradMeta() constructor runs:
[torch/csrc/autograd/variable.h]
AutogradMeta(at::TensorImpl* self_impl = nullptr, bool requires_grad = false, Edge gradient_edge = Edge()) {
  grad_fn_ = std::move(gradient_edge.function);
  requires_grad_ = false;
  retains_grad_ = false;
  is_view_ = false;
  output_nr_ = gradient_edge.input_nr;

  // set_requires_grad also checks error conditions.
  if (requires_grad) {
    TORCH_INTERNAL_ASSERT(self_impl);
    set_requires_grad(requires_grad, self_impl);
  }
  TORCH_CHECK(
      !grad_fn_ || !requires_grad_,
      "requires_grad should be false if grad_fn is set");
}
- gradient_edge is default-constructed in the parameter list; Edge is one of the key classes for building the computation graph and will be covered in detail in the autograd chapter.
- In this example grad_fn_ is empty: the tensor is a leaf node, so there is no grad_fn. The effect is visible from Python, as shown below.
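A quick Python check of what make_variable() and the AutogradMeta constructor leave us with for a user-created (leaf) tensor (illustration only):

import torch

a = torch.empty(2, 3, requires_grad=True)
print(a.is_leaf)        # True: created by the user, not produced by an operation
print(a.requires_grad)  # True: set via AutogradMeta::set_requires_grad
print(a.grad_fn)        # None: grad_fn_ stays empty for leaf tensors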
At this point the C++-level tensor is fully created; all that remains is to wrap it into a THPVariable that can be used from Python.
1.3 Wrapping the tensor into a THPVariable
wrap() is called to wrap the C++-level tensor.
[torch/csrc/autograd/utils/wrap_outputs.h]
inline PyObject* wrap(at::Tensor tensor) {
return THPVariable_Wrap(Variable(std::move(tensor)));
}
wrap() calls THPVariable_Wrap():
[torch/csrc/autograd/python_variable.cpp]
PyObject * THPVariable_Wrap(Variable var)
{
if (!var.defined()) {
Py_RETURN_NONE;
}
if (auto obj = torch::autograd::impl::pyobj(var)) {
Py_INCREF(obj);
return obj;
}
return THPVariable_NewWithVar((PyTypeObject *)THPVariableClass, std::move(var));
}
THPVariable_NewWithVar() is what actually wraps the C++-level tensor into a THPVariable usable from Python.
[torch/csrc/autograd/python_variable.cpp]
// Creates a new Python object for a Variable. The Variable must not already
// have a PyObject* associated with it.
static PyObject* THPVariable_NewWithVar(PyTypeObject* type, Variable var)
{
PyObject* obj = type->tp_alloc(type, 0);
if (obj) {
auto v = (THPVariable*) obj;
new (&v->cdata) Variable(std::move(var));
torch::autograd::impl::set_pyobj(v->cdata, obj);
}
return obj;
}
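Note the two branches of THPVariable_Wrap(): if the Variable already has an associated PyObject (recorded by set_pyobj() above), the cached object is returned. Assuming the .grad accessor also goes through THPVariable_Wrap(), this caching is observable from Python (a small, hedged illustration):

import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
(a * 2).sum().backward()

# Both accesses wrap the same C++ Variable; the second one finds the PyObject
# cached by set_pyobj() and returns it, so the identity check holds.
print(a.grad is a.grad)   # True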
2 torch.tensor()
import torch
a = torch.tensor([2.0, 3.0], requires_grad=True)
print(a)
# tensor([2., 3.], requires_grad=True)
Stepping through line 2 with a debugger takes us into THPVariable_tensor():
[torch/csrc/autograd/generated/python_torch_functions.cpp]
static PyObject * THPVariable_tensor(PyObject* self, PyObject* args, PyObject* kwargs)
{
...
return THPVariable_Wrap(torch::utils::tensor_ctor(torch::tensors::get_default_dispatch_key(), torch::tensors::get_default_scalar_type(), args, kwargs));
...
}
THPVariable_tensor() consists of four parts:
- torch::tensors::get_default_scalar_type(): gets the default data type
- torch::tensors::get_default_dispatch_key(): gets the default dispatch key
- torch::utils::tensor_ctor(): returns a C++-level tensor
- THPVariable_Wrap(): wraps that tensor into a `THPVariable` usable from Python
2.1 torch::tensors::get_default_scalar_type()
This returns the default data type.
[torch/csrc/tensor/python_tensor.cpp]
ScalarType get_default_scalar_type() {
  return typeMetaToScalarType(get_default_dtype());
}
get_default_dtype() returns caffe2::TypeMeta::Make<float>(), i.e. the float type. typeMetaToScalarType() converts a TypeMeta handle to the corresponding ScalarType enum value:
[c10/core/ScalarTypeToTypeMeta.h]
/**
 * convert TypeMeta handles to ScalarType enum values
 */
static inline ScalarType typeMetaToScalarType(caffe2::TypeMeta dtype) {
  return dtype.toScalarType();
}
The definition of ScalarType is at https://github.com/pytorch/pytorch/blob/v1.7.0/c10/core/ScalarType.h#L62.
get_default_scalar_type() ultimately returns the constant 6, which corresponds to Float.
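The default scalar type is exposed to Python through torch.get_default_dtype() / torch.set_default_dtype(); a small illustration:

import torch

print(torch.get_default_dtype())         # torch.float32 (ScalarType Float, value 6)
print(torch.tensor([2.0, 3.0]).dtype)    # torch.float32

torch.set_default_dtype(torch.float64)
print(torch.tensor([2.0, 3.0]).dtype)    # torch.float64
torch.set_default_dtype(torch.float32)   # restore the default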
2.2 torch::tensors::get_default_dispatch_key()
This returns the default dispatch key.
[torch/csrc/tensor/python_tensor.cpp]
c10::DispatchKey get_default_dispatch_key() {
  AT_ASSERT(default_tensor_type);
  return default_tensor_type->get_dispatch_key();
}
[torch/csrc/tensor/python_tensor.cpp]
DispatchKey get_dispatch_key() const {
  return backendToDispatchKey(static_cast<Backend>(backend));
}
get_default_dispatch_key() ultimately returns DispatchKey::CPU, as sketched from the Python side below.
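The default tensor type (and with it the dispatch key) can be changed from Python via torch.set_default_tensor_type(); a hedged illustration of how the default backend shows up in the device of a new tensor:

import torch

print(torch.tensor([1.0]).device)   # cpu: default_tensor_type maps to DispatchKey::CPU

if torch.cuda.is_available():
    # Switching the default tensor type switches the default dispatch key as well.
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    print(torch.tensor([1.0]).device)                    # cuda:0
    torch.set_default_tensor_type('torch.FloatTensor')   # restore the default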
2.3 torch::utils::tensor_ctor()
tensor_ctor() consists of three main parts:
- parsing the arguments (see 1.1 Argument parsing)
- calling internal_new_from_data() to create the C++-level tensor; this is what initializes storage_
- calling new_tensor.detach_() to ensure new_tensor is a leaf node; this is where autograd_meta_ gets initialized
If storage_ and autograd_meta_ are unfamiliar, see the earlier post on the Tensor class hierarchy (Tensor的继承体系).
[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L694]
Tensor tensor_ctor(c10::DispatchKey dispatch_key, at::ScalarType scalar_type, PyObject* args, PyObject* kwargs) {
static PythonArgParser parser({
"tensor(PyObject* data, *, ScalarType dtype=None, Device? device=None, bool pin_memory=False, bool requires_grad=False, DimnameList? names=None)",
});
constexpr int ctor_num_args = 6;
ParsedArgs<ctor_num_args> parsed_args;
auto r = parser.parse(args, kwargs, parsed_args);
if (r.idx == 0) {
PyObject* data = r.pyobject(0);
...
auto new_tensor = internal_new_from_data(
typeIdWithDefault(r, 2, dispatch_key),
r.scalartypeWithDefault(1, scalar_type),
r.deviceOptional(2),
data,
/*copy_variables=*/true,
/*copy_numpy=*/true,
/*type_inference=*/type_inference,
pin_memory);
...
}
throw std::runtime_error("tensor(): invalid arguments");
}
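One detail worth noting in tensor_ctor() is the type_inference flag: when no dtype is passed, the scalar type is inferred from the Python data rather than taken from the default. For example:

import torch

print(torch.tensor([1, 2]).dtype)                       # torch.int64: inferred from Python ints
print(torch.tensor([1.0, 2.0]).dtype)                   # torch.float32: inferred from Python floats
print(torch.tensor([1, 2], dtype=torch.float32).dtype)  # torch.float32: explicit dtype, no inference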
2.3.1 internal_new_from_data()
[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L231]
Tensor internal_new_from_data(
c10::DispatchKey dispatch_key,
at::ScalarType scalar_type,
c10::optional<Device> device_opt,
PyObject* data,
bool copy_variables,
bool copy_numpy,
bool type_inference,
bool pin_memory = false) {
...
#ifdef USE_NUMPY
...
if (PyArray_Check(data)) {
TORCH_CHECK(!pin_memory, "Can't pin tensor constructed from numpy");
auto tensor = tensor_from_numpy(data, /*warn_if_not_writeable=*/!copy_numpy);
const auto& inferred_scalar_type = type_inference ? tensor.scalar_type() : scalar_type;
auto device = device_opt.has_value() ? *device_opt : at::Device(computeDeviceType(dispatch_key));
pybind11::gil_scoped_release no_gil;
maybe_initialize_cuda(device);
return tensor.to(device, inferred_scalar_type, /*non_blocking=*/false, /*copy=*/copy_numpy);
}
#endif
auto sizes = compute_sizes(data);
ScalarType inferred_scalar_type = type_inference ? infer_scalar_type(data) : scalar_type;
// This exists to prevent us from tracing the call to empty(). The actual
// autograd code doesn't really matter, because requires_grad is always false
// here.
Tensor tensor;
{
at::AutoNonVariableTypeMode guard; // TODO: remove
at::tracer::impl::NoTracerDispatchMode tracer_guard;
tensor = at::empty(sizes, at::initialTensorOptions().dtype(inferred_scalar_type).pinned_memory(pin_memory));
recursive_store(
(char*)tensor.data_ptr(), tensor.sizes(), tensor.strides(), 0,
inferred_scalar_type, tensor.dtype().itemsize(), data);
}
auto device = device_opt.has_value() ? *device_opt : at::Device(computeDeviceType(dispatch_key));
pybind11::gil_scoped_release no_gil;
maybe_initialize_cuda(device);
// However, it is VERY important that we trace the to() call here (even
// though the reason this is important is a hack). Without *some* factory
// function call that is traced at construction time, we will consider
// a tensor constant as originating from "outside" the trace, and if you
// try to return it directly we will fail with the error saying no
// "no observable data dependence". In an ideal world, we wouldn't trace
// a to() call but I need to think harder about what exactly we should trace
// in this case.
return tensor.to(device, inferred_scalar_type, /*non_blocking=*/false, /*copy=*/false);
}
internal_new_from_data() consists of two main parts:
- calling at::empty() to create the tensor (see 1.2.1.1 at::empty())
- calling recursive_store() to fill the tensor with the Python data
[torch/csrc/utils/tensor_new.cpp, https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/utils/tensor_new.cpp#L207]
void recursive_store(char* data, IntArrayRef sizes, IntArrayRef strides, int64_t dim,
                     ScalarType scalarType, int elementSize, PyObject* obj) {
  int64_t ndim = sizes.size();
  if (dim == ndim) {
    torch::utils::store_scalar(data, scalarType, obj);
    return;
  }

  auto n = sizes[dim];
  auto seq = THPObjectPtr(PySequence_Fast(obj, "not a sequence"));
  if (!seq) throw python_error();
  auto seq_size = PySequence_Fast_GET_SIZE(seq.get());
  if (seq_size != n) {
    throw ValueError("expected sequence of length %lld at dim %lld (got %lld)",
      (long long)n, (long long)dim, (long long)seq_size);
  }

  PyObject** items = PySequence_Fast_ITEMS(seq.get());
  for (int64_t i = 0; i < n; i++) {
    recursive_store(data, sizes, strides, dim + 1, scalarType, elementSize, items[i]);
    data += strides[dim] * elementSize;
  }
}
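The seq_size != n check in recursive_store() is what produces the familiar error for ragged nested lists. For example:

import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # each row is written by recursive_store()
print(t)

try:
    torch.tensor([[1.0, 2.0], [3.0]])        # ragged: seq_size != n at dim 1
except ValueError as e:
    print(e)   # expected sequence of length 2 at dim 1 (got 1)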
2.3.2 new_tensor.detach_()
detach_() ensures that new_tensor is a leaf node. For a discussion of detach, see "What the difference between detach(), detach_(), and with torch.no_grad() in our training phase?".
[build/aten/src/ATen/core/TensorMethods.cpp]
// aten::detach_(Tensor(a!) self) -> Tensor(a!)
Tensor & Tensor::detach_() const {
static auto op = c10::Dispatcher::singleton()
.findSchemaOrThrow("aten::detach_", "")
.typed<Tensor & (Tensor &)>();
return op.call(const_cast<Tensor&>(*this));
}
Through the dispatch mechanism, aten::detach_() ultimately calls detach_():
[torch/csrc/autograd/VariableTypeManual.cpp]
Tensor & detach_(Tensor & self) {
...
// I think the choice here is conservative. In principle, doing
// an in-place detach should give us the ability to just clear
// the autograd meta. But this function ONLY resets requires_grad,
// grad_fn and output_nr; there's other metadata like debug name
// and hooks which aren't cleared. Is this function supposed to
// clear those too? I'm not too sure, so I'm leaving it be for now.
auto autograd_meta = impl::materialize_autograd_meta(self);
autograd_meta->set_requires_grad(false, self.unsafeGetTensorImpl());
autograd_meta->grad_fn_.reset();
autograd_meta->output_nr_ = 0;
// detach only backward gradients for both primal and tangent
if (self.fw_grad(/* level */ 0).defined()) {
self.fw_grad(/* level */ 0).detach_();
}
return self;
}
materialize_autograd_meta() mainly initializes autograd_meta_, again creating the instance with std::make_unique<AutogradMeta>() (see the earlier note on std::make_unique).
[torch/csrc/autograd/variable.cpp]
AutogradMeta* materialize_autograd_meta(const Variable& self) {
TORCH_CHECK(self.defined(), "cannot call materialize_autograd_meta() on undefined tensor");
auto p = self.unsafeGetTensorImpl();
if (!p->autograd_meta()) {
p->set_autograd_meta(std::make_unique<AutogradMeta>());
}
return get_autograd_meta(self);
}
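Seen from Python, the net effect of this detach_() call is that the tensor returned by torch.tensor() is a leaf with no history, even though it was produced internally through a to() call (illustration only):

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
print(a.is_leaf)        # True: detach_() reset grad_fn_ and output_nr_
print(a.grad_fn)        # None
print(a.requires_grad)  # True: set on the leaf after construction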
2.4 THPVariable_Wrap()
This is essentially the same as section 1.3, so it is skipped here.
That's all for this article.
This chapter only covered how user-created tensors are built step by step. How intermediate nodes (non-leaf tensors not created by the user) come into being, and how the computation graph is constructed to serve automatic differentiation, will be unfolded in the autograd part of the next chapter.
