0. Overview
Before adding an Op, please refer to the Op Manual to avoid unnecessary duplication.
In MNN, adding an Op consists of the following steps:
- Add model description
- Add model conversion
- Add shape calculation
- Add implementation
1. Add model description
After modifying the model description, you need to run the generate script to regenerate the model description header file.
Add Op Type
Append the operator name to the OpType list in schema/default/MNN.fbs, such as:
```
enum OpType : int {
    AbsVal,
    QuantizedAdd,
    ...
    MyCustomOp
}
```
Add Op Parameter
If the operator does not contain parameters, you can skip this step.
First, append the operator parameter name to the OpParameter list in schema/default/MNN.fbs, such as:
```
union OpParameter {
    QuantizedAdd,
    ArgMax,
    AsString,
    ...
    MyCustomOpParam
}
```
Then add the parameter description. If the operator is from Caffe, choose CaffeOps.fbs; if it is from TensorFlow, use TensorflowOp.fbs.
```
table MyCustomOpParam {
    padX:int;
    padY:int;
    kernelX:int;
    kernelY:int;
    strideX:int;
    strideY:int;
    dataType:DataType=DT_FLOAT;
}
```
2. Add model conversion
After adding the model conversion code, you need to re-run cmake.
Currently, MNN supports converting TensorFlow, TensorFlow Lite, Caffe, and ONNX models.
TensorFlow Model Conversion
- Add conversion class
Add MyCustomOpTf.cpp under tools/converter/source/tensorflow. You can declare the conversion class directly, or use a macro definition to simplify the code.
Direct declaration example:
Direct declaration example:
```cpp
class MyCustomOpTf : public tfOpConverter {
public:
    virtual void run(MNN::OpT *dstOp, TmpNode *srcNode, TmpGraph *tempGraph);
    MyCustomOpTf() {}
    virtual ~MyCustomOpTf() {}
    virtual MNN::OpType opType();
    virtual MNN::OpParameter type();
};
```
Equivalent macro definition example:
```cpp
DECLARE_OP_CONVERTER(MyCustomOpTf);
```
You need to implement the `run` function, the destructor, and the `opType` and `type` functions. The `run` function parses the model's proto to obtain the parameters and assigns them to the flatbuffer custom parameters. The `srcNode` parameter holds the input and output node information; the corresponding `TmpNode` can be looked up in `tempGraph` by those input and output nodes. Call `find_attr_value(const tensorflow::NodeDef&, const char*, tensorflow::AttrValue&)` to get the value of the corresponding attribute.
Register the conversion class:
```cpp
REGISTER_CONVERTER(MyCustomOpTf, MyCustomOp);
```
Add mapping
Add the mapping from the TensorFlow Op name to the MNN Op name in `OpMapper.hpp`:
```cpp
{"OpName1", MNN::OpType_MyCustomOp},
{"OpName2", MNN::OpType_MyCustomOp},
```
Handling Op with Const
If Const is not treated as a parameter of the Op but as a separate Op, you can skip this step. If Const is treated as a parameter of the Op, modify the function `_genMinGraph()` in `TmpGraph.cpp` and set the `isCovered` property of the corresponding Const node to true.
TensorFlow Lite Model Conversion
- Add conversion class
Add MyCustomOpTflite.cpp under tools/converter/source/tflite.
Macro definition example:
```cpp
DECLARE_OP_COVERTER(MyCustomOpTflite);
```
Functions to implement:
```cpp
MyCustomOpTflite::opType(bool quantizedModel);
MyCustomOpTflite::type(bool quantizedModel);
MyCustomOpTflite::run(MNN::OpT *dstOp,
                      const std::unique_ptr<tflite::OperatorT> &tfliteOp,
                      const std::vector<std::unique_ptr<tflite::TensorT>> &tfliteTensors,
                      const std::vector<std::unique_ptr<tflite::BufferT>> &tfliteModelBuffer,
                      const std::vector<std::unique_ptr<tflite::OperatorCodeT>> &tfliteOpSet,
                      bool quantizedModel)
```
Compared with the TensorFlow version, the `run` function takes an extra `quantizedModel` parameter. If `quantizedModel` is true, the model is quantized and must be converted to the corresponding quantized Op; if it is false, it is converted to a floating-point Op. In the `run` function, you also need to set the indices of the input and output tensors:
```cpp
// set input/output index
dstOp->inputIndexes.resize(1);
dstOp->outputIndexes.resize(1);
dstOp->inputIndexes[0]  = tfliteOp->inputs[0];
dstOp->outputIndexes[0] = tfliteOp->outputs[0];
```
Register the conversion class:
```cpp
using namespace tflite;
REGISTER_CONVERTER(MyCustomOpTflite, BuiltinOperator_OPName);
```
Caffe Model Conversion
- Add conversion class
Add MyCustomOp.cpp under tools/converter/source/caffe.
Class declaration example:
```cpp
class MyCustomOp : public OpConverter {
public:
    virtual void run(MNN::OpT* dstOp,
                     const caffe::LayerParameter& parameters,
                     const caffe::LayerParameter& weight);
    MyCustomOp() {}
    virtual ~MyCustomOp() {}
    virtual MNN::OpType opType();
    virtual MNN::OpParameter type();
};
```
Implement the `run`, `opType`, and `type` functions, and parse the Caffe parameters in the `run` function to obtain the specific values. The `parameters` argument stores the Op's parameter information, while `weight` stores data parameters such as convolution weights and BN statistics.
Register the conversion class:
```cpp
static OpConverterRegister<MyCustomOp> a("MyCustomOp");
```
ONNX Model Conversion
- Add conversion class
Add MyCustomOpOnnx.cpp under tools/converter/source/onnx.
Macro definition example:
```cpp
DECLARE_OP_CONVERTER(MyCustomOpOnnx);
```
Functions to implement:
```cpp
MNN::OpType MyCustomOpOnnx::opType();
MNN::OpParameter MyCustomOpOnnx::type();
void MyCustomOpOnnx::run(MNN::OpT* dstOp,
                         const onnx::NodeProto* onnxNode,
                         std::vector<const onnx::TensorProto*> initializers);
```
In the `run` function, `onnxNode` contains the original ONNX node information; weights and other data must be taken from `initializers`.
Register the conversion class:
```cpp
REGISTER_CONVERTER(MyCustomOpOnnx, MyCustomOp);
```
3. Add shape calculation
After adding the shape calculation code, you need to re-run cmake.
- Add calculation class
Add ShapeMyCustomOp.cpp to source/shape.
```cpp
class MyCustomOpSizeComputer : public SizeComputer {
public:
    virtual bool onComputeSize(const MNN::Op* op, const std::vector<Tensor*>& inputs,
                               const std::vector<Tensor*>& outputs) const override {
        // set tensor->buffer.type
        //            .dimensions
        //            .dim[x].extent
        //            .dim[x].stride
        //            .dim[x].flag
        return true;
    }
    virtual float onComputeFlops(const MNN::Op* op,
                                 const std::vector<Tensor*>& inputs,
                                 const std::vector<Tensor*>& outputs) const {
        return flops_for_calc_output_from_input;
    }
};
```
In the `onComputeSize` function, compute the dimensions of the output tensors from the dimensions of the input tensors, and set the data type of the output tensors. Return true if the computation succeeds; return false if the input dimensions are unknown.
In the `onComputeFlops` function, return the total amount of computation based on the dimensions of the input and output tensors.
Register the calculation class:
```cpp
REGISTER_SHAPE(MyCustomOpSizeComputer, OpType_MyCustomOp);
```
4. Add Implementation
Add CPU Implementation
Add CPUMyCustomOp.hpp and CPUMyCustomOp.cpp to source/backend/CPU.
Implementation class declaration:
```cpp
class CPUMyCustomOp : public Execution {
public:
    virtual ErrorCode onResize(const std::vector<Tensor *> &inputs,
                               const std::vector<Tensor *> &outputs) override;
    virtual ErrorCode onExecute(const std::vector<Tensor *> &inputs,
                                const std::vector<Tensor *> &outputs) override;
};
```
Implement `onResize` and `onExecute`
In `onResize`, call `backend()->onAcquireBuffer(&mCache, Backend::DYNAMIC)` to allocate cache and `backend()->onReleaseBuffer(&mCache, Backend::DYNAMIC)` to reclaim it; the released memory can be reused.
In `onExecute`, performing the necessary input checks helps you find problems early. Return NO_ERROR when execution completes successfully.
- Register implementation class
```cpp
class CPUMyCustomOpCreator : public CPUBackend::Creator {
public:
    virtual Execution *onCreate(const std::vector<Tensor *> &inputs,
                                const std::vector<Tensor *> &outputs,
                                const MNN::Op *op,
                                Backend *backend) const override {
        return new CPUMyCustomOp(backend);
    }
};
REGISTER_CPU_OP_CREATOR(CPUMyCustomOpCreator, OpType_MyCustomOp);
```
Add Metal Implementation
Add Shader
Add MetalMyCustomOp.metal in the source/backend/Metal directory and add it to the Xcode project. You can refer to the existing Metal shader implementations.
Implementation class declaration
Add MetalMyCustomOp.hpp and MetalMyCustomOp.cpp in the source/backend/Metal directory and add them to the Xcode project:
```cpp
class MetalMyCustomOp : public Execution {
public:
    virtual ErrorCode onResize(const std::vector<Tensor *> &inputs,
                               const std::vector<Tensor *> &outputs) override;
    virtual ErrorCode onExecute(const std::vector<Tensor *> &inputs,
                                const std::vector<Tensor *> &outputs) override;
};
```
Implement `onResize` and `onExecute`
Unlike CPU tensors, which store data in the host pointer, a Metal tensor stores its data in `deviceId`, which holds an `id<MTLBuffer>`:
```cpp
auto buffer = (__bridge id<MTLBuffer>)(void *)tensor->deviceId();
```
Specific parameters of a Metal Op can be stored in an `id<MTLBuffer>`. Unlike a tensor, the buffer can mix multiple data types; just make sure the correct length is specified when creating it. For example:
```cpp
auto buffer = [context newDeviceBuffer:2 * sizeof(int) + 2 * sizeof(__fp16) access:CPUWriteOnly];
((__fp16 *)buffer.contents)[0] = mAlpha / mLocalSize;  // alpha
((__fp16 *)buffer.contents)[1] = mBeta;                // beta
((int *)buffer.contents)[1]    = mLocalSize;           // local size
((int *)buffer.contents)[2]    = inputs[0]->channel(); // channel
```
When creating a buffer, you need to specify one of three access control permissions:
- CPUReadWrite: data is shared between CPU and GPU; generally used for device buffers.
- CPUWriteOnly: data is not read back after being written by the CPU; generally used for parameter buffers.
- CPUTransparent: data lives only on the GPU; generally used for heap buffers.
MNNMetalContext has two similar sets of interfaces for creating buffers; they differ only in the lifetime of the data:
- memory occupied by a device buffer is not reused within a single inference;
- memory occupied by a heap buffer is reused by other Ops after `-[MNNMetalContext releaseHeapBuffer:]` is called.
In general, heap buffers are only used together with CPUTransparent. Heap buffers are only available on iOS 10+; on iOS 9 they fall back to device buffers.
When using Metal, do not create your own device or library unless there is a special reason. Loading a library and compiling its functions are time-consuming, and MNNMetalContext already performs the necessary caching. An example of dispatching a Metal kernel via the context:
```cpp
auto context = (__bridge MNNMetalContext *)backend->context();
auto kernel = /* metal kernel name NSString */;
auto encoder = [context encoder];
auto bandwidth = [context load:kernel encoder:encoder];
/* encoder set buffer(s)/sampler(s) */
[context dispatchEncoder:encoder
                 threads:{x, y, z}
      maxThreadsPerGroup:maxThreadsPerThreadgroup]; // recommended way to dispatch
[encoder endEncoding];
```
- Register implementation class
```cpp
class MetalMyCustomOpCreator : public MetalBackend::Creator {
public:
    virtual Execution *onCreate(const std::vector<Tensor *> &inputs,
                                const MNN::Op *op, Backend *backend) const {
        return new MetalMyCustomOp(backend);
    }
};
REGISTER_METAL_OP_CREATOR(MetalMyCustomOpCreator, OpType_MyCustomOp);
```
Add Vulkan Implementation
Add Shader
Add a shader (*.comp) in the source/backend/vulkan/execution/glsl directory. Use image as the data container if the input memory layout is NC4HW4; use buffer otherwise. You can refer to the existing implementations. Then execute the makeshader.py script to compile the shaders.
Implementation class declaration
Add VulkanMyCustomOp.hpp and VulkanMyCustomOp.cpp to source/backend/vulkan/execution/:
```cpp
class VulkanMyCustomOp : public VulkanBasicExecution {
public:
    VulkanMyCustomOp(const Op* op, Backend* bn);
    virtual ~VulkanMyCustomOp();
    ErrorCode onEncode(const std::vector<Tensor*>& inputs,
                       const std::vector<Tensor*>& outputs,
                       const VulkanCommandPool::Buffer* cmdBuffer) override;

private:
    // GPU shader parameters
    std::shared_ptr<VulkanBuffer> mConstBuffer;
    // pipeline
    const VulkanPipeline* mPipeline;
    // layout descriptor set
    std::shared_ptr<VulkanPipeline::DescriptorSet> mDescriptorSet;
};
```
Implement `onEncode`
In `onEncode`, first check the memory layout: if it is NC4HW4, use image as the data container; otherwise use buffer. Return NO_ERROR after execution.
Register implementation class
```cpp
class VulkanMyCustomOpCreator : public VulkanBackend::Creator {
public:
    virtual Execution* onCreate(const std::vector<Tensor*>& inputs,
                                const MNN::Op* op,
                                Backend* backend) const override {
        return new VulkanMyCustomOp(op, backend);
    }
};
static bool gResistor = []() {
    VulkanBackend::addCreator(OpType_MyCustomOp, new VulkanMyCustomOpCreator);
    return true;
}();
```
Add OpenCL Implementation
- Add Kernel
Add a specific kernel (*.cl) to source/backend/opencl/execution/cl. Currently, feature maps are implemented using image2d. You can refer to the existing implementations. Then execute opencl_codegen.py to generate the kernel map.
- Implementation class declaration
Add MyCustomOp.h and MyCustomOp.cpp to source/backend/opencl/execution/:
```cpp
template <typename T>
class MyCustomOp : public Execution {
public:
    virtual ErrorCode onResize(const std::vector<Tensor *> &inputs,
                               const std::vector<Tensor *> &outputs) override;
    virtual ErrorCode onExecute(const std::vector<Tensor *> &inputs,
                                const std::vector<Tensor *> &outputs) override;
};
```
Implement `onResize` and `onExecute`
Implement `onResize` (optional) and `onExecute`. Return NO_ERROR after execution.
- Register implementation class
```cpp
OpenCLCreatorRegister<TypedCreator<MyCustomOp<cl_data_t>>> __my_custom_op(OpType_MyCustomOp);
```
Add OpenGL Implementation
Add Shader
Add a shader (*.glsl) under source/backend/opengl/glsl; no header file is needed. The feature map is represented by image3d. You can refer to the existing implementations. Then execute makeshader.py under source/backend/opengl.
Add Executor
Add GLMyCustomOp.h and GLMyCustomOp.cpp to source/backend/opengl/execution/:
```cpp
class GLMyCustomOp : public Execution {
public:
    GLMyCustomOp(const std::vector<Tensor *> &inputs, const Op *op, Backend *bn);
    virtual ~GLMyCustomOp();
    virtual ErrorCode onExecute(const std::vector<Tensor *> &inputs,
                                const std::vector<Tensor *> &outputs) override;
    virtual ErrorCode onResize(const std::vector<Tensor *> &inputs,
                               const std::vector<Tensor *> &outputs) override;

private:
    std::shared_ptr<GLProgram> mProgram;
};
```
Implement `onResize` and `onExecute`
Implement `onResize` (optional) and `onExecute`. Return NO_ERROR after execution.
- Register implementation class
```cpp
GLCreatorRegister<TypedCreator<GLMyCustomOp>> __my_custom_op(OpType_MyCustomOp);
```