References
The code discussed in this article comes from PyTorch 1.7.0: https://github.com/pytorch/pytorch/tree/v1.7.0/torch
Background
When using the PyTorch deep learning framework, whether for training or inference, the first line of code that pulls in PyTorch is always:
import torch
Following Python's import rules, this resolves to __init__.py under the torch package directory, which in turn executes:
from torch._C import *
For PyTorch, the module name here is _C. Since _C is a C extension module, CPython expects it to export an initialization function named PyInit__C, so we can guess that torch/csrc/stub.cpp must implement PyInit__C. That is exactly what PyTorch does; the code in torch/csrc/stub.cpp looks like this:
#include <Python.h>
extern PyObject* initModule(void);
PyMODINIT_FUNC PyInit__C(void)
{
return initModule();
}
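From the Python side we can confirm that _C is indeed a compiled extension module rather than a .py file (a quick check of my own, not part of the PyTorch sources); CPython locates its PyInit__C entry point through the standard extension-module naming convention:

import importlib.machinery
import torch._C

# _C is a compiled extension (.so / .pyd), so its __file__ points at a binary, not a .py file
print(torch._C.__file__)
# the suffixes CPython probes when looking for extension modules such as _C
print(importlib.machinery.EXTENSION_SUFFIXES)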
This article starts from the initModule function and walks through PyTorch's initialization in full. initModule is the first frame on PyTorch's initialization call stack, since all of the initialization work happens inside this function. Because there is a lot of it, the work is divided into seven parts:
1. The birth of torch._C
This step creates the torch._C Python module and registers a large number of functions on it:
[torch/csrc/Module.cpp]
PyObject* initModule() {
  ...
  THPUtils_addPyMethodDefs(methods, TorchMethods);
  THPUtils_addPyMethodDefs(methods, DataLoaderMethods);
  THPUtils_addPyMethodDefs(methods, torch::autograd::python_functions());
  THPUtils_addPyMethodDefs(methods, torch::multiprocessing::python_functions());
#ifdef USE_CUDA
  THPUtils_addPyMethodDefs(methods, THCPModule_methods());
#endif
#if defined(USE_DISTRIBUTED) && defined(USE_C10D)
  THPUtils_addPyMethodDefs(methods, torch::distributed::c10d::python_functions());
#ifndef _WIN32
  THPUtils_addPyMethodDefs(methods, torch::distributed::rpc::python_functions());
  THPUtils_addPyMethodDefs(
      methods, torch::distributed::autograd::python_functions());
  THPUtils_addPyMethodDefs(methods, torch::distributed::rpc::testing::python_functions());
#endif
#endif
  static struct PyModuleDef torchmodule = {
    PyModuleDef_HEAD_INIT,
    "torch._C",
    nullptr,
    -1,
    methods.data()
  };
  ASSERT_TRUE(module = PyModule_Create(&torchmodule));
  ...
}
- TorchMethods registers 48 methods; see 【torch/csrc/Module.cpp#L574】
- DataLoaderMethods registers 4 methods; see 【torch/csrc/DataLoader.cpp#L217】
- torch::autograd::python_functions() registers 9 methods; see 【torch/csrc/autograd/init.cpp#L192】
- torch::multiprocessing::python_functions() registers 1 method; see 【torch/csrc/multiprocessing/init.cpp#L53】
- THCPModule_methods(); see 【torch/csrc/cuda/Module.cpp#L527】
- torch::distributed::c10d::python_functions()
In short, this step reaches a milestone: the torch._C symbol now exists, and over a hundred functions have been registered on it, covering torch, dataloader, autograd, multiprocessing, cuda, distributed, and c10d.
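We can spot-check the result from Python (a quick sanity check of my own; _initExtension and _show_config are two entries from the TorchMethods table mentioned above):

import torch

# methods registered through PyMethodDef tables show up as builtin functions on torch._C
print(type(torch._C._initExtension))  # <class 'builtin_function_or_method'>
print(type(torch._C._show_config))    # <class 'builtin_function_or_method'>
# a rough count of everything that ends up on torch._C once initialization is complete
print(len(dir(torch._C)))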
2. Some key types
[torch/csrc/Module.cpp,https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/Module.cpp#L681]
PyObject* initModule() {
  ...
  ASSERT_TRUE(THPGenerator_init(module));
  ASSERT_TRUE(THPException_init(module));
  THPSize_init(module);
  THPDtype_init(module);
  THPDTypeInfo_init(module);
  THPLayout_init(module);
  THPMemoryFormat_init(module);
  THPQScheme_init(module);
  THPDevice_init(module);
  THPStream_init(module);
  ASSERT_TRUE(THPVariable_initModule(module));
  ASSERT_TRUE(THPFunction_initModule(module));
  ASSERT_TRUE(THPEngine_initModule(module));
  ...
}
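The Python-visible results of these calls are types that every PyTorch user touches daily. A few quick checks of my own (torch re-exports the names registered on torch._C through from torch._C import *):

import torch

# THPSize_init / THPDtype_init / THPLayout_init / THPDevice_init / THPMemoryFormat_init
# are what put these familiar classes in place
print(type(torch.Size([2, 3])))                                   # <class 'torch.Size'>
print(isinstance(torch.float32, torch.dtype))                     # True
print(isinstance(torch.strided, torch.layout))                    # True
print(isinstance(torch.device("cpu"), torch.device))              # True
print(isinstance(torch.contiguous_format, torch.memory_format))   # True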
3. The birth of torch._C._TensorBase
The following three initialization calls are grouped into this section:
[torch/csrc/Module.cpp,https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/Module.cpp#L681]
PyObject* initModule() {
  ...
  ASSERT_TRUE(THPVariable_initModule(module));
  ASSERT_TRUE(THPFunction_initModule(module));
  ASSERT_TRUE(THPEngine_initModule(module));
  ...
}
Why group them? Because of how prominent they are. THPVariable_initModule(module) creates torch._C._TensorBase, the base class of every Tensor. THPFunction_initModule(module) creates torch._C._FunctionBase; in torch/autograd/function.py, the following two classes use torch._C._FunctionBase as a base class:
[torch/autograd/function.py]
class BackwardCFunction(_C._FunctionBase, _ContextMethodMixin, _HookMixin):
...
class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixin, _HookMixin)):
...
This Function class hierarchy is what the autograd DAG is built on.
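As a concrete illustration (a minimal example of my own, not taken from the PyTorch sources), every custom autograd op written against torch.autograd.Function ultimately sits on top of _C._FunctionBase:

import torch

class Square(torch.autograd.Function):
    # toy autograd op: forward computes x**2, backward applies the chain rule
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return 2 * x * grad_output

x = torch.tensor([3.0], requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)                                      # tensor([6.])
print(issubclass(Square, torch._C._FunctionBase))  # True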
THPEngine_initModule(module) creates torch._C._EngineBase. _EngineBase is responsible for the preprocessing that happens before dynamic-graph execution: it takes requests such as torch.autograd's backward, preprocesses them, and then hands them to the real Engine for execution.
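Two Python-side checks (my own sketch, assuming the 1.7 layout in which torch.Tensor subclasses _TensorBase and Variable._execution_engine holds the engine instance) make these roles visible:

import torch
from torch.autograd import Variable

# torch.Tensor, defined in Python, inherits directly from the C type _TensorBase
print(torch.Tensor.__mro__)
# (<class 'torch.Tensor'>, <class 'torch._C._TensorBase'>, <class 'object'>)

# the engine object that torch.autograd.backward() eventually calls into;
# its type is the one created by THPEngine_initModule
engine = Variable._execution_engine
print(type(engine))                     # <class 'torch._C._EngineBase'>
print(hasattr(engine, "run_backward"))  # True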
4. pybind11 bindings
All of the initialization in this section is pybind11-related:
[torch/csrc/Module.cpp,https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/Module.cpp#L681]
PyObject* initModule() {
  ...
  // NOTE: We need to be able to access OperatorExportTypes from ONNX for use in
  // the export side of JIT, so this ONNX init needs to appear before the JIT
  // init.
  torch::onnx::initONNXBindings(module);
  torch::jit::initJITBindings(module);
  torch::impl::dispatch::initDispatchBindings(module);
  torch::throughput_benchmark::initThroughputBenchmarkBindings(module);
  torch::autograd::initNNFunctions(module);
  torch::autograd::initFFTFunctions(module);
  torch::autograd::initLinalgFunctions(module);
  torch::autograd::init_legacy_variable(module);
  torch::python::init_bindings(module);
#ifdef USE_CUDA
  torch::cuda::initModule(module);
#endif
  ...
}
initONNXBindings provides the ONNX Python bindings, torch._C._onnx.TensorProtoDataType and torch._C._onnx.OperatorExportTypes:
>>> dir(torch._C._onnx.TensorProtoDataType)
['BOOL', 'COMPLEX128', 'COMPLEX64', 'DOUBLE', 'FLOAT', 'FLOAT16', 'INT16', 'INT32', 'INT64', 'INT8', 'STRING', 'UINT16', 'UINT32', 'UINT64', 'UINT8', 'UNDEFINED', '__class__', '__delattr__', '__dir__', '__doc__', '__entries', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__int__', '__le__', '__lt__', '__members__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'name']
>>> dir(torch._C._onnx.OperatorExportTypes)
['ONNX', 'ONNX_ATEN', 'ONNX_ATEN_FALLBACK', 'ONNX_FALLTHROUGH', 'RAW', '__class__', '__delattr__', '__dir__', '__doc__', '__entries', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__int__', '__le__', '__lt__', '__members__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'name']
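These are ordinary pybind11 enums, so they behave like Python enums and convert to integers. A small check of my own (operator_export_type is, to my knowledge, the torch.onnx.export argument that accepts the second enum):

import torch

ExportTypes = torch._C._onnx.OperatorExportTypes
print(ExportTypes.ONNX)                       # OperatorExportTypes.ONNX
print(int(ExportTypes.ONNX_ATEN_FALLBACK))    # the underlying integer value
print(list(ExportTypes.__members__))          # the same variants as in the dir() output above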
initJITBindings, in turn, registers a batch of JIT-related C++ functions and objects on torch._C through pybind11.
initNNFunctions creates a torch._C._nn object and registers a number of nn-related functions on it:
>>> dir(torch._C._nn)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_parse_to', '_test_optional_filled_intlist', '_test_optional_floatlist', '_test_optional_intlist', 'adaptive_avg_pool2d', 'adaptive_avg_pool3d', 'adaptive_max_pool2d', 'adaptive_max_pool3d', 'avg_pool2d', 'avg_pool3d', 'binary_cross_entropy', 'col2im', 'elu', 'elu_', 'fractional_max_pool2d', 'fractional_max_pool3d', 'gelu', 'glu', 'hardsigmoid', 'hardsigmoid_', 'hardswish', 'hardswish_', 'hardtanh', 'hardtanh_', 'im2col', 'l1_loss', 'leaky_relu', 'leaky_relu_', 'linear', 'log_sigmoid', 'max_pool2d_with_indices', 'max_pool3d_with_indices', 'max_unpool2d', 'max_unpool3d', 'mkldnn_linear', 'mkldnn_reorder_conv2d_weight', 'mkldnn_reorder_conv3d_weight', 'mse_loss', 'multi_margin_loss', 'multilabel_margin_loss', 'nll_loss', 'nll_loss2d', 'one_hot', 'reflection_pad1d', 'reflection_pad2d', 'replication_pad1d', 'replication_pad2d', 'replication_pad3d', 'rrelu_with_noise', 'rrelu_with_noise_', 'silu', 'silu_', 'slow_conv3d', 'slow_conv_dilated2d', 'slow_conv_dilated3d', 'slow_conv_transpose2d', 'slow_conv_transpose3d', 'smooth_l1_loss', 'soft_margin_loss', 'softplus', 'softshrink', 'thnn_conv2d', 'thnn_conv_depthwise2d', 'upsample_bicubic2d', 'upsample_bilinear2d', 'upsample_linear1d', 'upsample_nearest1d', 'upsample_nearest2d', 'upsample_nearest3d', 'upsample_trilinear3d']
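Many of these are the C entry points that torch.nn.functional dispatches to. For instance (a quick check of my own), F.gelu and the binding registered here agree:

import torch
import torch.nn.functional as F

x = torch.randn(4)
# torch.nn.functional.gelu is a thin wrapper around the torch._C._nn.gelu binding
print(torch.allclose(F.gelu(x), torch._C._nn.gelu(x)))  # True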
init_legacy_variable registers torch._C._LegacyVariableBase:
>>> dir(torch._C._LegacyVariableBase)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
_LegacyVariableBase is the base of the Variable class (whose _execution_engine attribute is initialized to an instance of the torch._C._EngineBase type):
[torch/autograd/variable.py]
# mypy doesn't understand torch._six.with_metaclass
class Variable(with_metaclass(VariableMeta, torch._C._LegacyVariableBase)):  # type: ignore
    pass
init_bindings registers some functions on torch._C via pybind11. torch::cuda::initModule is similar: it also registers functions on torch._C via pybind11, except that its content is CUDA-related.
5. Registering the StorageBase classes on torch._C
[torch/csrc/Module.cpp,https://github.com/pytorch/pytorch/blob/v1.7.0/torch/csrc/Module.cpp#L681]
PyObject* initModule() {
  ...
  ASSERT_TRUE(THPDoubleStorage_init(module));
  ASSERT_TRUE(THPFloatStorage_init(module));
  ASSERT_TRUE(THPHalfStorage_init(module));
  ASSERT_TRUE(THPLongStorage_init(module));
  ASSERT_TRUE(THPIntStorage_init(module));
  ASSERT_TRUE(THPShortStorage_init(module));
  ASSERT_TRUE(THPCharStorage_init(module));
  ASSERT_TRUE(THPByteStorage_init(module));
  ASSERT_TRUE(THPBoolStorage_init(module));
  ASSERT_TRUE(THPQUInt8Storage_init(module));
  ASSERT_TRUE(THPQInt8Storage_init(module));
  ASSERT_TRUE(THPQInt32Storage_init(module));
  ASSERT_TRUE(THPQUInt4x2Storage_init(module));
  ASSERT_TRUE(THPBFloat16Storage_init(module));
  ASSERT_TRUE(THPComplexDoubleStorage_init(module));
  ASSERT_TRUE(THPComplexFloatStorage_init(module));
#ifdef USE_CUDA
  // This will only initialise base classes and attach them to library namespace
  // They won't be ready for real usage until importing cuda module, that will
  // complete the process (but it defines Python classes before calling back into
  // C, so these lines have to execute first)..
  ASSERT_TRUE(THCPDoubleStorage_init(module));
  ASSERT_TRUE(THCPFloatStorage_init(module));
  ASSERT_TRUE(THCPHalfStorage_init(module));
  ASSERT_TRUE(THCPLongStorage_init(module));
  ASSERT_TRUE(THCPIntStorage_init(module));
  ASSERT_TRUE(THCPShortStorage_init(module));
  ASSERT_TRUE(THCPCharStorage_init(module));
  ASSERT_TRUE(THCPByteStorage_init(module));
  ASSERT_TRUE(THCPBoolStorage_init(module));
  ASSERT_TRUE(THCPBFloat16Storage_init(module));
  ASSERT_TRUE(THCPComplexDoubleStorage_init(module));
  ASSERT_TRUE(THCPComplexFloatStorage_init(module));
  ...
}
This initialization work mainly registers the Storage classes on torch._C, for example:
- DoubleStorageBase
- FloatStorageBase
- CudaFloatStorageBase
Taking FloatStorageBase as an example, we can inspect its registered methods like this:
>>> dir(torch._C.FloatStorageBase)
['__class__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '_cdata', '_expired', '_free_weak_ref', '_get_shared_fd', '_new_shared_fd', '_new_shared_filename', '_new_using_fd', '_new_using_filename', '_new_with_file', '_new_with_weak_ptr', '_set_cdata', '_set_from_file', '_share_fd_', '_share_filename_', '_shared_decref', '_shared_incref', '_weak_ref', '_write_file', 'copy_', 'data_ptr', 'device', 'dtype', 'element_size', 'fill_', 'from_buffer', 'from_file', 'is_pinned', 'is_shared', 'new', 'resize_', 'size']
These classes are then subclassed on the Python side:
[torch/__init__.py]
class FloatStorage(_C.FloatStorageBase, _StorageBase):
...
_C.FloatStorageBase and its siblings are generated with C macros; for the details of how that is done, see this column's article on supporting generics with C macros.
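To see the inheritance in action (a small check of my own against PyTorch 1.7; the storage API changed in later releases), torch.FloatStorage really is a subclass of the registered base class and can be constructed directly:

import torch

s = torch.FloatStorage(4)                        # a float storage holding 4 elements
s.fill_(1.0)
print(isinstance(s, torch._C.FloatStorageBase))  # True
print(torch.FloatStorage.__bases__)              # (torch._C.FloatStorageBase, torch.storage._StorageBase)
print(s.size(), s.element_size())                # 4 4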