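Overview

MNN inference involves two levels of abstraction: the interpreter (Interpreter) and the session (Session). An Interpreter holds the model data. A Session is created from an Interpreter and holds the inference data. Multiple inference tasks can share one model; that is, multiple Sessions can share a single Interpreter.

Once all Sessions have been created, and provided that no further Sessions will be created and the training model data will not be updated, the Interpreter can call releaseModel to free the model data and save memory.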

Creating an Interpreter

There are two ways to create an Interpreter:

From a file on disk

/**
* @brief create net from file.
* @param file  given file.
* @return created net if success, NULL otherwise.
*/
static Interpreter* createFromFile(const char* file);

From an in-memory buffer

/**
* @brief create net from buffer.
* @param buffer    given data buffer.
* @param size      size of data buffer.
* @return created net if success, NULL otherwise.
*/
static Interpreter* createFromBuffer(const void* buffer, size_t size);

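The Interpreter instance returned by these functions is allocated with **new**. Be sure to release it with **delete** once it is no longer needed, to avoid a memory leak.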
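One way to make sure the matching delete always runs is to hand the raw pointer to a smart pointer as soon as the factory returns. The sketch below illustrates the pattern with a stand-in type, FakeNet, which is hypothetical and only mimics the factory style of createFromFile (a raw pointer from new, or a null pointer on failure); loadNet and liveCount are likewise names invented for this illustration. With the real class, the same idea would amount to constructing a std::unique_ptr directly from the pointer that createFromFile returns.

```cpp
#include <memory>

// Hypothetical stand-in for MNN::Interpreter, used only to show ownership.
struct FakeNet {
    static int liveCount;  // number of FakeNet instances currently alive
    FakeNet() { ++liveCount; }
    ~FakeNet() { --liveCount; }

    // Mimics the factory style described above: returns a pointer obtained
    // from new, or nullptr when the (pretend) load fails.
    static FakeNet* createFromFile(const char* file) {
        if (file == nullptr || file[0] == '\0') {
            return nullptr;
        }
        return new FakeNet();
    }
};
int FakeNet::liveCount = 0;

// Takes ownership immediately, so delete runs on every exit path.
inline std::unique_ptr<FakeNet> loadNet(const char* file) {
    return std::unique_ptr<FakeNet>(FakeNet::createFromFile(file));
}
```

The caller still has to check the returned pointer for null before using it, since a failed load yields an empty smart pointer rather than an exception.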

Creating a Session

A Session is normally created with **Interpreter::createSession**:

/**
 * @brief create session with schedule config. created session will be managed in net.
 * @param config session schedule config.
 * @return created session if success, NULL otherwise.
 */
Session* createSession(const ScheduleConfig& config);

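The returned Session instance is managed by the Interpreter and is released when the Interpreter is destroyed, so you usually do not need to manage it yourself. To reduce memory usage, you can also release it earlier by calling Interpreter::releaseSession once it is no longer needed.

Creating a Session is generally time-consuming, but a Session can be reused across many inference runs, so it is best to create it once and use it many times.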

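Simple mode

In most cases no extra schedule configuration is needed; the function infers the schedule path and the inputs and outputs from the model structure. For example: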

ScheduleConfig conf;
Session* session = interpreter->createSession(conf);

Schedule configuration

The schedule configuration is defined as follows:

/** session schedule config */
struct ScheduleConfig {
    /** which tensor should be kept */
    std::vector<std::string> saveTensors;
    /** forward type */
    MNNForwardType type = MNN_FORWARD_CPU;
    /** CPU:number of threads in parallel , Or GPU: mode setting*/
    union {
        int numThread = 4;
        int mode;
    };
    /** subpath to run */
    struct Path {
        std::vector<std::string> inputs;
        std::vector<std::string> outputs;
        enum Mode {
            /**
             * Op Mode
             * - inputs means the source op, can NOT be empty.
             * - outputs means the sink op, can be empty.
             * The path will start from source op, then flow when encounter the sink op.
             * The sink op will not be compute in this path.
             */
            Op = 0,
            /**
             * Tensor Mode
             * - inputs means the inputs tensors, can NOT be empty.
             * - outputs means the outputs tensors, can NOT be empty.
             * It will find the pipeline that compute outputs from inputs.
             */
            Tensor = 1
        };
        /** running mode */
        Mode mode = Op;
    };
    Path path;
    /** backup backend used to create execution when designated backend do NOT support any op */
    MNNForwardType backupType = MNN_FORWARD_CPU;
    /** extra backend config */
    BackendConfig* backendConfig = nullptr;
};

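At inference time, the primary backend is specified by **type** and defaults to CPU. When the primary backend does not support an operator in the model, the fallback backend specified by **backupType** is used.

The inference path consists of all operators on the way from the **inputs** to the **outputs** of **path**; when it is not specified, it is identified automatically from the model structure. To save memory, MNN reuses the memory of every tensor other than the **outputs**. If you need to keep the result of an intermediate tensor, list it in **saveTensors** so that its memory is not reused.

For CPU inference, the degree of parallelism and the thread count can be changed through **numThread**. **numThread** determines the degree of parallelism, but the actual thread count and parallel efficiency do not depend on **numThread** alone:

On iOS, the thread count is decided by the system's GCD;
When **MNN_USE_THREAD_POOL** is enabled, the thread count is determined by the first configured **numThread** value greater than 1;
With OpenMP, the thread count is a global setting, and the actual thread count is determined by the most recently configured **numThread**.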
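The thread pool and OpenMP rules above can be modelled in a few lines. This is only a sketch of the rules as stated in the text, not MNN code: ThreadRuntime and effectiveThreads are hypothetical names, and returning 1 when no configured value applies is an assumption of this sketch. The iOS case is left out because the count is chosen by GCD.

```cpp
#include <vector>

// Hypothetical tag for the threading runtime MNN was built with.
enum class ThreadRuntime { ThreadPool, OpenMP };

// Given the numThread values configured so far, in order, returns the thread
// count that the rules above predict. Falls back to 1 (an assumption of this
// sketch) when no configured value applies.
inline int effectiveThreads(ThreadRuntime rt, const std::vector<int>& requested) {
    if (rt == ThreadRuntime::ThreadPool) {
        // Thread pool: the first configured value greater than 1 wins.
        for (int n : requested) {
            if (n > 1) {
                return n;
            }
        }
        return 1;
    }
    // OpenMP: the setting is global, so the most recent value wins.
    return requested.empty() ? 1 : requested.back();
}
```

Under this model, configuring 4 threads for one session and then 2 for another leaves a thread pool build at 4 threads, while an OpenMP build ends up at 2.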

For GPU inference, mode can be used to select some GPU runtime options (currently supported for OpenCL only). The GPU mode values are:

typedef enum {
    // choose one tuning mode Only
    MNN_GPU_TUNING_NONE    = 1 << 0,/* Forbidden tuning, performance not good */
    MNN_GPU_TUNING_HEAVY  = 1 << 1,/* heavily tuning, usually not suggested */
    MNN_GPU_TUNING_WIDE   = 1 << 2,/* widely tuning, performance good. Default */
    MNN_GPU_TUNING_NORMAL = 1 << 3,/* normal tuning, performance may be ok */
    MNN_GPU_TUNING_FAST   = 1 << 4,/* fast tuning, performance may not good */
    // choose one opencl memory mode Only
    /* User can try OpenCL_MEMORY_BUFFER and OpenCL_MEMORY_IMAGE both, then choose the better one according to performance*/
    MNN_GPU_MEMORY_BUFFER = 1 << 6,/* User assign mode */
    MNN_GPU_MEMORY_IMAGE  = 1 << 7,/* User assign mode */
} MNNGpuMode;

At present, the tuning level and the GPU memory type can be set freely by the user. For example:

MNN::ScheduleConfig config;
config.mode = MNN_GPU_TUNING_NORMAL | MNN_GPU_MEMORY_IMAGE;

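The higher the tuning level, the longer the first initialization takes and the better the inference performance. If a long initialization time is a concern, choose MNN_GPU_TUNING_FAST or MNN_GPU_TUNING_NONE; you can also combine this with the cache mechanism described below, so that initialization is not slow from the second run onward. For GPU memory, you can specify MNN_GPU_MEMORY_BUFFER or MNN_GPU_MEMORY_IMAGE, whichever performs better. If neither is set, the framework picks one by default (not guaranteed to give the best performance).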
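Because the mode value is a bitmask, it is easy to combine two tuning levels or both memory types by accident. The following sketch copies the flag values from the MNNGpuMode listing into a local namespace (gpu_sketch, a name made up here so the snippet compiles without MNN headers) and checks the "choose one only" rule from the enum comments. The isValidGpuMode helper is hypothetical and not part of MNN; it treats an unset group as acceptable, since the framework picks a default in that case.

```cpp
namespace gpu_sketch {

// Flag values copied from the MNNGpuMode listing above.
enum GpuMode {
    TUNING_NONE   = 1 << 0,
    TUNING_HEAVY  = 1 << 1,
    TUNING_WIDE   = 1 << 2,
    TUNING_NORMAL = 1 << 3,
    TUNING_FAST   = 1 << 4,
    MEMORY_BUFFER = 1 << 6,
    MEMORY_IMAGE  = 1 << 7
};

constexpr unsigned kTuningMask =
    TUNING_NONE | TUNING_HEAVY | TUNING_WIDE | TUNING_NORMAL | TUNING_FAST;
constexpr unsigned kMemoryMask = MEMORY_BUFFER | MEMORY_IMAGE;

// Counts the bits that are set in v.
inline int countBits(unsigned v) {
    int n = 0;
    for (; v != 0; v >>= 1) {
        n += static_cast<int>(v & 1u);
    }
    return n;
}

// True when mode uses only known flags, with at most one tuning flag and
// at most one memory flag.
inline bool isValidGpuMode(int mode) {
    if (mode < 0) {
        return false;
    }
    const unsigned m = static_cast<unsigned>(mode);
    if ((m & ~(kTuningMask | kMemoryMask)) != 0) {
        return false;  // contains a bit that is not a known flag
    }
    return countBits(m & kTuningMask) <= 1 && countBits(m & kMemoryMask) <= 1;
}

}  // namespace gpu_sketch
```

A check like this could run once before the value is assigned to config.mode, so that a malformed combination is caught in application code rather than silently interpreted by the backend.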

The CPU numThread and the GPU mode are members of a union and share the same memory. Set only one of numThread and mode; do not set both.
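To see why setting both fields is a mistake, consider a reduced copy of the struct. The sketch below is not the MNN definition: ConfigSketch keeps only the union from ScheduleConfig, and Backend, sharesStorage, and makeConfig are hypothetical names introduced for illustration. It shows that the two members occupy the same address, and one way to write only the member that matches the chosen backend.

```cpp
// Hypothetical stand-in for MNNForwardType, reduced to two cases.
enum class Backend { CPU, OpenCL };

// Reduced copy of ScheduleConfig that keeps only the anonymous union.
struct ConfigSketch {
    Backend type = Backend::CPU;
    union {
        int numThread = 4;
        int mode;
    };
};

// True when numThread and mode live at the same address, which is what
// being members of one union means.
inline bool sharesStorage(const ConfigSketch& c) {
    return static_cast<const void*>(&c.numThread) ==
           static_cast<const void*>(&c.mode);
}

// Writes exactly one of the two members, depending on the backend.
inline ConfigSketch makeConfig(Backend type, int value) {
    ConfigSketch c;
    c.type = type;
    if (type == Backend::CPU) {
        c.numThread = value;  // interpreted as a thread count
    } else {
        c.mode = value;       // interpreted as GPU mode flags
    }
    return c;
}
```

Since both names refer to the same integer, assigning numThread after mode would simply overwrite the GPU flags with a thread count, and the reverse holds as well.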

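To address slow GPU initialization, a cache mechanism is provided. Later runs can load the cache directly to speed up initialization.

For usage, see how setCacheFile is used to set up the cache in tools/cpp/MNNV2Basic.cpp.
When the model's input takes a limited number of different sizes, call updateCacheFile after each resizeSession to update the cache file.
When the input size varies arbitrarily, it is recommended to set config.mode to 1 (MNN_GPU_TUNING_NONE), which turns tuning off.

In addition, extra backend parameters can be set through **backendConfig**, as described in the next section.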

Backend configuration

The backend configuration is defined as follows:

struct BackendConfig {
    enum MemoryMode {
        Memory_Normal = 0,
        Memory_High,
        Memory_Low
    };
    MemoryMode memory = Memory_Normal;
    enum PowerMode {
        Power_Normal = 0,
        Power_High,
        Power_Low
    };
    PowerMode power = Power_Normal;
    enum PrecisionMode {
        Precision_Normal = 0,
        Precision_High,
        Precision_Low
    };
    PrecisionMode precision = Precision_Normal;
    /** user defined context */
    void* sharedContext = nullptr;
};

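**memory**, **power**, and **precision** express the memory, power, and precision preferences, respectively. Backends that support these options adjust their execution accordingly; backends that do not support them ignore the options.

Examples:
OpenCL backend
When precision is Low, fp16 is used for both storage and computation; results differ slightly from the CPU results, and real-time performance is best. When precision is Normal, fp16 is used for storage and values are converted to fp32 for computation; results are close to the CPU results, and real-time performance is still good. When precision is High, fp32 is used for both storage and computation; real-time performance drops, but results match the CPU results.

CPU backend
When precision is Low, FP16 or BF16 computation is enabled, depending on the device.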
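The OpenCL rules above amount to a small lookup from precision preference to storage and compute formats. The sketch below restates them as code for quick reference. It is an illustration of the text rather than MNN's implementation: Precision mirrors the order of BackendConfig::PrecisionMode, while Float, OpenCLNumerics, and openclNumerics are hypothetical names. The CPU case is omitted because the choice between FP16 and BF16 depends on the device.

```cpp
// Same order as BackendConfig::PrecisionMode in the listing above.
enum class Precision { Normal = 0, High, Low };

// Hypothetical tags for the floating-point formats mentioned in the text.
enum class Float { FP16, FP32 };

struct OpenCLNumerics {
    Float storage;  // format used to store tensors
    Float compute;  // format used for arithmetic
};

// Maps a precision preference to the formats the text describes for OpenCL.
inline OpenCLNumerics openclNumerics(Precision p) {
    switch (p) {
        case Precision::Low:
            return {Float::FP16, Float::FP16};  // fastest, small error
        case Precision::High:
            return {Float::FP32, Float::FP32};  // slowest, matches CPU
        case Precision::Normal:
        default:
            return {Float::FP16, Float::FP32};  // fp16 storage, fp32 math
    }
}
```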

**sharedContext** is intended for custom backends; users can assign it as needed.

Creating a multi-path Session

When the inference path needs a more complex configuration, use a group of schedule configurations:

/**
 * @brief create multi-path session with schedule configs. created session will be managed in net.
 * @param configs session schedule configs.
 * @return created session if success, NULL otherwise.
 */
Session* createMultiPathSession(const std::vector<ScheduleConfig>& configs);

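Each schedule configuration can set its own path and options independently.

Sharing runtime resources

By default, each createSession call creates its own Runtime. For a series of models that run serially, you can create a Runtime separately first and pass it in when creating each Session, so that the models share the same runtime resources (for the CPU, the thread pool and memory pool; for the GPU, the kernel pool, and so on).

Example: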

ScheduleConfig config;
config.numThread = 4;
auto runtimeInfo = Interpreter::createRuntime({config});
/* Create the first model */
std::shared_ptr<Interpreter> net1(Interpreter::createFromFile("1.mnn"));
auto session1 = net1->createSession(config, runtimeInfo);
/* Create the second model */
std::shared_ptr<Interpreter> net2(Interpreter::createFromFile("2.mnn"));
auto session2 = net2->createSession(config, runtimeInfo);
/* Create the third model */
std::shared_ptr<Interpreter> net3(Interpreter::createFromFile("3.mnn"));
auto session3 = net3->createSession(config, runtimeInfo);
// session1, session2, and session3 now share the same Runtime
/* Usage */
/* Fill input 1 ... */
net1->runSession(session1);
/* Read output 1, fill input 2 ... */
net2->runSession(session2);
/* Read output 2, fill input 3 ... */
net3->runSession(session3);
