- Compiling FAQ
- Runtime FAQ
- What is NC4HW4 Format?
- How to Convert Between Formats?
- Can’t see error log on Android Device
- Why is copying output tensor data so slow on GPU backends?
- How to add your own OpenCL path?
- MNN OpenGL backend requires OpenGL ES 3.1 or later
- TensorArray ops are not supported
- How to get intermediate results in a net
- Can’t use OpenCL / Vulkan on PC
- Why is iOS Metal or Android OpenCL performance worse than CPU performance?
- What happens when the GPU is occupied by other tasks during MNN inference?
- Why is quantized model performance worse than float model performance?
Compiling FAQ
Environment Requirement
cmake 3.10+
gcc 4.9+
protobuf 3.0+
Remember to run cmake again after upgrading gcc.
schema/generate.sh Related Errors
*** building flatc ***
CMake Error: Could not find CMAKE_ROOT !!!
If the script fails with the error above, your CMake was not installed correctly. Try
sudo apt install extra-cmake-modules
or
export CMAKE_ROOT=/path/to/where_cmake_installed
to fix it.
Remember to run schema/generate.sh after editing schema (*.proto).
tools/script/get_model.sh Related Errors
Could NOT find Protobuf (missing: Protobuf_INCLUDE_DIR)
Unrecognized syntax identifier "proto3". This parser only recognizes "proto2".
If the script fails with the errors above, your protobuf was not installed correctly. Follow Protobuf’s Installation Instructions to install it.
If multiple protobuf versions are installed and conflict with each other, you can try the solutions below:
which protoc
# comment out the path in .bashrc if it does NOT point to the correct protoc.
source .bashrc
sudo ldconfig
or
# uninstall
sudo apt-get remove libprotobuf-dev
sudo apt-get remove protobuf-compiler
sudo apt-get remove python-protobuf
sudo rm -rf /usr/local/bin/protoc
sudo rm -rf /usr/bin/protoc
sudo rm -rf /usr/local/include/google
sudo rm -rf /usr/local/include/protobuf*
sudo rm -rf /usr/include/google
sudo rm -rf /usr/include/protobuf*
# install
sudo apt-get update
sudo ldconfig
sudo apt-get install libprotobuf* protobuf-compiler python-protobuf
Quantized Models
We support TensorFlow quantized models for now, and we plan to provide a training-free quantization tool based on the MNN model format.
Usage of static library
If you build MNN as a static library and link it into your own library or executable, you should add linker flags like this:
-Wl,--whole-archive MNN -Wl,--no-whole-archive
For other platforms:
GCC:
-Wl,--whole-archive MNN -Wl,--no-whole-archive
OSX (Xcode):
-Wl,-force_load MNN
Windows (Visual Studio):
/WHOLEARCHIVE:MNN
Don’t support type [Convolution]
This occurs because the static library was not linked correctly; see “Usage of static library” above.
Unsupported Operations
opConverter ==> MNN Converter NOT_SUPPORTED_OP: [ ANY_OP_NAME ]
If the MNNConverter fails with the error above, one or more operations are not supported by MNN. You could submit an issue or leave a comment at the pinned issue. If you want to implement them yourself, you can follow our guide. Pull requests are always welcome.
The TensorFlow SSD model is not supported: the TensorFlow Object Detection API produces some unsupported control-flow operations in the post-processing part. The TensorFlow SSD model is also not as efficient as the Caffe SSD model, so it is recommended to use the Caffe version of the SSD model.
Runtime FAQ
What is NC4HW4 Format?
The difference between NCHW and NC4HW4 is just like the difference between the planar and chunky (interleaved) color representations. Imagine a 2x2 RGBA image: in the planar representation (NCHW), its storage would be RRRRGGGGBBBBAAAA; in the chunky representation (NC4HW4), its storage would be RGBARGBARGBARGBA. In MNN, we pack every 4 channels for floats (or 8 channels for int8) to gain better performance with SIMD.
You can obtain a tensor’s format through TensorUtils::getDescribe(tensor)->dimensionFormat. If it returns MNN_DATA_FORMAT_NC4HW4, the channel dimension is packed, which may cause the tensor’s elementSize to be greater than the product of its dimensions.
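As an illustration, here is a minimal sketch (not MNN source code; it assumes float data with a pack size of 4) of how the linear offset of element (n, c, h, w) in an NC4HW4 buffer can be computed:

```cpp
#include <cstddef>

// Sketch: channels are split into ceil(C / 4) groups of 4; the 4 channel
// values of one spatial position within a group are stored contiguously.
size_t nc4hw4Offset(int n, int c, int h, int w, int C, int H, int W) {
    const int pack   = 4;
    const int groups = (C + pack - 1) / pack; // padded channel groups
    const int cGroup = c / pack;              // which group this channel is in
    const int cInner = c % pack;              // position inside the group
    return (((static_cast<size_t>(n) * groups + cGroup) * H + h) * W + w) * pack + cInner;
}
```

When C is not a multiple of 4, the last group is padded, which is exactly why elementSize can exceed the product of the dimensions.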
How to Convert Between Formats?
You can convert tensor formats using the code below:
auto srcTensor = Tensor::create({1, 224, 224, 3}, Tensor::TENSORFLOW);
// ... set srcTensor data
auto dstTensor = net->getSessionInput(session, NULL);
dstTensor->copyFromHostTensor(srcTensor);
Only tensors obtained from a session can call copyFromHostTensor / copyToHostTensor.
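For the output direction, a common pattern looks like the sketch below (assuming net and session are set up as in the example above; the chosen dimension type only affects the host-side layout):

```cpp
// Sketch: copy an output tensor back to host memory.
auto outputTensor = net->getSessionOutput(session, NULL);
// Host tensor with the same shape as the device tensor, in NHWC layout.
auto hostTensor = new Tensor(outputTensor, Tensor::TENSORFLOW);
outputTensor->copyToHostTensor(hostTensor);
// ... read hostTensor->host<float>() ...
delete hostTensor;
```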
Can’t see error log on Android Device
Android has two ways to print logs: printf and logcat. By default, MNN’s build scripts use printf, which is convenient for debugging on the command line. If you want to see logs in an app, build with cmake .. -DMNN_USE_LOGCAT=ON
Why is copying output tensor data so slow on GPU backends?
If you do not wait for GPU inference to finish (for example through runSessionWithCallBack with sync set to true), copyToHostTensor has to wait for it before copying the data.
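A sketch of one way to do this, assuming the runSessionWithCallBack overload that takes a sync flag (as used by tools/cpp/MNNV2Basic.cpp); net and session are set up as usual:

```cpp
// Sketch: run with pass-through callbacks and sync = true so the GPU finishes
// inside runSessionWithCallBack instead of inside copyToHostTensor.
MNN::TensorCallBack before = [](const std::vector<MNN::Tensor*>&, const std::string&) {
    return true; // keep executing every op
};
MNN::TensorCallBack after = [](const std::vector<MNN::Tensor*>&, const std::string&) {
    return true;
};
net->runSessionWithCallBack(session, before, after, true /* sync */);

// The copy now only transfers data, without a hidden wait.
auto outputTensor = net->getSessionOutput(session, NULL);
```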
How to add your own OpenCL path?
MNN’s OpenCL backend only searches common library paths for now. If your device keeps its OpenCL library at a specific path, you can add that path in OpenCLWrapper.cpp.
MNN OpenGL backend requires OpenGL ES 3.1 or later
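To check whether a device meets this requirement, an application can query the driver at runtime. The sketch below is plain OpenGL ES code (not an MNN API) and assumes a current EGL/GLES context already exists:

```cpp
#include <GLES3/gl3.h>
#include <cstdio>

// Sketch: parse the GL_VERSION string, e.g. "OpenGL ES 3.1 ...".
bool supportsGLES31() {
    const char* version = reinterpret_cast<const char*>(glGetString(GL_VERSION));
    if (version == nullptr) {
        return false; // no current context, or the query failed
    }
    int major = 0, minor = 0;
    if (std::sscanf(version, "OpenGL ES %d.%d", &major, &minor) != 2) {
        return false;
    }
    return major > 3 || (major == 3 && minor >= 1);
}
```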
TensorArray ops are not supported
Log: These Op Not Support: Tensorflow::TensorArrayGatherV3 | Tensorflow::TensorArrayReadV3 | Tensorflow::TensorArrayScatterV3 | Tensorflow::TensorArraySizeV3 | Tensorflow::TensorArrayV3 | Tensorflow::TensorArrayWriteV3 |
The original TensorFlow SSD pb uses TensorArray ops to implement DetectionOutput in the post-processing part.
You can:
- Remove the detection output part and implement it in your own code (see the sketch after this list).
- Use the TFLite version of the model.
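If you take the first option, the core of DetectionOutput is non-maximum suppression over the decoded boxes. The sketch below is plain C++ (not MNN code); decoding the boxes from anchors is model specific and omitted here:

```cpp
#include <algorithm>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

// Intersection-over-union of two boxes in (x1, y1, x2, y2) form.
static float iou(const Box& a, const Box& b) {
    const float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    const float iw = std::max(0.f, ix2 - ix1), ih = std::max(0.f, iy2 - iy1);
    const float inter = iw * ih;
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter + 1e-9f);
}

// Greedy non-maximum suppression: keep the highest-scoring boxes and drop
// boxes whose IoU with an already kept box exceeds the threshold.
std::vector<Box> nms(std::vector<Box> boxes, float iouThreshold) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const auto& candidate : boxes) {
        bool suppressed = false;
        for (const auto& k : kept) {
            if (iou(candidate, k) > iouThreshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```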
How to get intermediate results in a net
Normally, MNN only provides the output tensors. If you want to get intermediate results, you can:
- Add the intermediate tensors to config.saveTensors when creating the session (see the sketch after this list).
- Use runSessionWithCallBack; see tools/cpp/MNNV2Basic.cpp for an example.
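A sketch of the first option, assuming net is an MNN::Interpreter created from your model; "hypothetical_layer_name" is a placeholder for the real tensor name in your model:

```cpp
// Sketch: keep an intermediate tensor alive so it can be read after inference.
MNN::ScheduleConfig config;
config.type        = MNN_FORWARD_CPU;
config.saveTensors = {"hypothetical_layer_name"}; // placeholder name
auto session = net->createSession(config);

net->runSession(session);

// Saved tensors can be fetched like ordinary outputs.
auto intermediate = net->getSessionOutput(session, "hypothetical_layer_name");
```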
Can’t use OpenCL / Vulkan on PC
The fastest way to solve it:
cmake .. -DMNN_USE_SYSTEM_LIB=true -DMNN_SEP_BUILD=false
Can’t find system lib
You can set MNN_USE_SYSTEM_LIB=true; MNN will then use the system driver instead of searching for the library itself.
Extra problem on Linux systems
The OpenCL / Vulkan backends register themselves through static variables. On Linux, the separately built backend libraries are not linked directly, so this registration never runs. You can solve this problem by:
- Setting MNN_SEP_BUILD=false, which builds the OpenCL / Vulkan backends into the main MNN shared library.
- Calling dlopen("libMNN_CL.so") in your main program (see the sketch at the end of this section). See https://github.com/alibaba/MNN/issues/105 .
If you have installed an OpenCL / Vulkan driver, set MNN_USE_SYSTEM_LIB=true to use your driver instead of letting MNN search for the library.
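A sketch of the dlopen approach on Linux; the library name follows the MNN_SEP_BUILD output, and the analogous name (e.g. libMNN_Vulkan.so) would be used for the Vulkan backend:

```cpp
#include <dlfcn.h>
#include <cstdio>

// Sketch: load the separately built OpenCL backend before creating any
// session, so that its backend registration actually runs.
void loadMNNOpenCLBackend() {
    void* handle = dlopen("libMNN_CL.so", RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen libMNN_CL.so failed: %s\n", dlerror());
    }
}
```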
Why is iOS Metal or Android OpenCL performance worse than CPU performance?
There are many possible reasons:
- On mid-to-low-tier phones (e.g. pre-iPhone 8), GPU performance is indeed worse than CPU performance. Also, Apple switched from Imagination to its own in-house GPU with the iPhone 8, which made Metal performance worse than on the iPhone 7.
- There could be some ops unimplemented in Metal or OpenCL. These ops fall back to the CPU implementation, incurring some data transfer cost between CPU and GPU memory.
- The model is too small to fully utilize the parallelism offered by GPU.
- The GPU is occupied by other tasks while doing the inference.
What happens when the GPU is occupied by other tasks during MNN inference?
- When GPU memory is completely full, the inference will fail. However, there is an API to query the utilization ratio of GPU resources.
- On Qualcomm GPUs, we can set priorities for the GPU kernels. MNN sets its priority to low so that the inference does not preempt UI rendering, which would visibly degrade the user experience.
Why is quantized model performance worse than float model performance?
There are three possible reasons:
- The model’s FLOPs are small (less than 100 MFLOPs), so convolutions do not take most of the time. You can use timeProfile.out to check this.
- The deployment machine’s quantized (int8) performance is worse than its float performance:
  - On SSE / AVX2 machines, the computation goes int8 -> int16 -> int32, which is a little slower than float.
  - On ARMv8.2 machines without the dot-product instructions, float performance is about twice the quantized performance. Please compile MNN with -DMNN_ARM82=true; MNN will then use sdot for better performance.
- For 2x2 to 7x7 float convolutions, MNN can use Winograd for better performance; quantized convolutions currently do not support Winograd.