Overview

MNN's inference API has two levels of abstraction: Interpreter and Session. The Interpreter holds the model data; a Session is created by an Interpreter and holds the inference data. Multiple inferences can share the same model data; that is, multiple Sessions can share one Interpreter.

Once no more Sessions need to be created and the model data no longer needs to be updated through training, the Interpreter can release the model data via the releaseModel function to save memory.
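
As a minimal sketch of this lifecycle (the include path MNN/Interpreter.hpp and the model path "model.mnn" are assumptions that may differ in your setup), two Sessions share one Interpreter, and releaseModel is called once all Sessions have been created:

    #include <MNN/Interpreter.hpp>

    int main() {
        // "model.mnn" is a placeholder path; the Interpreter holds the model data.
        auto interpreter = MNN::Interpreter::createFromFile("model.mnn");
        if (nullptr == interpreter) {
            return -1;
        }

        // Two Sessions share the model data held by the same Interpreter.
        MNN::ScheduleConfig config;
        auto sessionA = interpreter->createSession(config);
        auto sessionB = interpreter->createSession(config);

        // No further Sessions will be created and the model will not be updated
        // by training, so the model buffer can be released to save memory.
        interpreter->releaseModel();

        // ... run inference with sessionA / sessionB ...

        interpreter->releaseSession(sessionA);
        interpreter->releaseSession(sessionB);
        delete interpreter;
        return 0;
    }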

Create Interpreter

There are two ways to create an Interpreter:

  • Create from disk file

    /**
     * @brief create net from file.
     * @param file given file.
     * @return created net if success, NULL otherwise.
     */
    static Interpreter* createFromFile(const char* file);
  • Create from memory buffer

    /**
     * @brief create net from buffer.
     * @param buffer given data buffer.
     * @param size size of data buffer.
     * @return created net if success, NULL otherwise.
     */
    static Interpreter* createFromBuffer(const void* buffer, size_t size);

The Interpreter instance returned by these functions is created with new and must be released with delete when it is no longer needed, to avoid memory leaks.
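
For the buffer-based variant, one possible sketch is to read the model file into memory yourself (the path "model.mnn" and the exact include path are assumptions and may differ in your setup):

    #include <MNN/Interpreter.hpp>

    #include <fstream>
    #include <vector>

    int main() {
        // Read the model into a memory buffer; "model.mnn" is a placeholder path.
        std::ifstream file("model.mnn", std::ios::binary | std::ios::ate);
        if (!file) {
            return -1;
        }
        const auto size = static_cast<size_t>(file.tellg());
        std::vector<char> buffer(size);
        file.seekg(0);
        file.read(buffer.data(), size);

        auto interpreter = MNN::Interpreter::createFromBuffer(buffer.data(), buffer.size());
        if (nullptr == interpreter) {
            return -1;
        }

        // ... create Sessions and run inference ...

        delete interpreter;  // created with new, released with delete
        return 0;
    }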

Create Session

Sessions are typically created via Interpreter::createSession:

    /**
     * @brief create session with schedule config. created session will be managed in net.
     * @param config session schedule config.
     * @return created session if success, NULL otherwise.
     */
    Session* createSession(const ScheduleConfig& config);

The Session instance returned by the function is managed by the Interpreter and must be released by calling Interpreter::releaseSession when it is no longer needed, to avoid memory leaks.

Simple Usage

In general, there is no need to set any additional scheduling configuration: the scheduling path, inputs, and outputs are recognized automatically from the model structure. For example:

    ScheduleConfig conf;
    Session* session = interpreter->createSession(conf);

Scheduling configuration

The scheduling configuration is defined as follows:

    /** session schedule config */
    struct ScheduleConfig {
        /** which tensor should be kept */
        std::vector<std::string> saveTensors;
        /** forward type */
        MNNForwardType type = MNN_FORWARD_CPU;
        /** number of threads in parallel */
        int numThread = 4;

        /** subpath to run */
        struct Path {
            std::vector<std::string> inputs;
            std::vector<std::string> outputs;
            enum Mode {
                /**
                 * Op Mode
                 * - inputs means the source op, can NOT be empty.
                 * - outputs means the sink op, can be empty.
                 * The path will start from the source op and stop when it encounters the sink op.
                 * The sink op will not be computed in this path.
                 */
                Op = 0,
                /**
                 * Tensor Mode (NOT supported yet)
                 * - inputs means the input tensors, can NOT be empty.
                 * - outputs means the output tensors, can NOT be empty.
                 * It will find the pipeline that computes outputs from inputs.
                 */
                Tensor = 1
            };
            /** running mode */
            Mode mode = Op;
        };
        Path path;

        /** backup backend used to create execution when the designated backend does NOT support some op */
        MNNForwardType backupType = MNN_FORWARD_CPU;

        /** extra backend config */
        BackendConfig* backendConfig = nullptr;
    };

During inference, the primary backend is specified by type and defaults to CPU. The fallback backend is specified by backupType and is used when the primary backend does not support some operators in the model.

The inference path includes all operators from the path's inputs to its outputs; when it is not specified, it is identified automatically from the model structure. To save memory, MNN reuses tensor memory for all tensors except the outputs. If you need to keep the results of intermediate tensors, list them in saveTensors to exclude them from memory reuse.

The number of inference threads can be set via numThread, but the actual number of threads depends on the deployment environment:

  • on iOS, GCD is used and the configuration is ignored;
  • when MNN_USE_THREAD_POOL is enabled, the number of threads is determined by the first configuration;
  • with OpenMP, the thread count is set globally, so the actual number of threads depends on the most recent configuration.

Additional backend parameters can be set via backendConfig; see below for details.
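
Putting these options together, the sketch below requests OpenCL with a CPU fallback, limits the thread count, and keeps one intermediate tensor out of memory reuse. It is only an illustration; the tensor name "conv1_output" and the function name are placeholders, not part of the MNN API.

    #include <MNN/Interpreter.hpp>

    MNN::Session* createConfiguredSession(MNN::Interpreter* interpreter) {
        MNN::ScheduleConfig config;
        config.type        = MNN_FORWARD_OPENCL;   // preferred backend
        config.backupType  = MNN_FORWARD_CPU;      // used when an op is unsupported by OpenCL
        config.numThread   = 2;                    // actual thread count depends on the environment
        config.saveTensors = {"conv1_output"};     // keep this intermediate tensor out of memory reuse
        return interpreter->createSession(config);
    }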

Backend Configuration

The backend configuration is defined as follows:

    struct BackendConfig {
        enum MemoryMode {
            Memory_Normal = 0,
            Memory_High,
            Memory_Low
        };
        MemoryMode memory = Memory_Normal;

        enum PowerMode {
            Power_Normal = 0,
            Power_High,
            Power_Low
        };
        PowerMode power = Power_Normal;

        enum PrecisionMode {
            Precision_Normal = 0,
            Precision_High,
            Precision_Low
        };
        PrecisionMode precision = Precision_Normal;

        /** user defined context */
        void* sharedContext = nullptr;
    };

memory, power, and precision are the memory, power, and precision preferences, respectively. Backends that support these options adjust their execution accordingly; any option a backend does not support is ignored.

sharedContext is used when implementing a custom backend; users can assign it as needed.
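
A sketch of attaching backend preferences to a scheduling configuration follows; the chosen modes are only examples, and the function name is a placeholder.

    #include <MNN/Interpreter.hpp>

    MNN::Session* createLowPrecisionSession(MNN::Interpreter* interpreter) {
        MNN::BackendConfig backendConfig;
        backendConfig.precision = MNN::BackendConfig::Precision_Low;  // prefer lower precision, e.g. FP16 where supported
        backendConfig.memory    = MNN::BackendConfig::Memory_Low;
        backendConfig.power     = MNN::BackendConfig::Power_Normal;

        MNN::ScheduleConfig config;
        config.type          = MNN_FORWARD_CPU;
        config.backendConfig = &backendConfig;  // ScheduleConfig stores only a pointer, so keep
                                                // backendConfig alive until createSession returns
        return interpreter->createSession(config);
    }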

Create Session with Multiple Paths

When you need more complex configuration of the inference paths, you can create the session with a group of scheduling configurations:

    /**
     * @brief create multi-path session with schedule configs. created session will be managed in net.
     * @param configs session schedule configs.
     * @return created session if success, NULL otherwise.
     */
    Session* createMultiPathSession(const std::vector<ScheduleConfig>& configs);

Each scheduling configuration can specify its path and options independently.
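
For example, a sketch of a session with two paths, each with its own backend and sink op; the op names "input", "branch_a_end", and "branch_b_end" are placeholders for ops from your own model.

    #include <MNN/Interpreter.hpp>

    #include <vector>

    MNN::Session* createTwoPathSession(MNN::Interpreter* interpreter) {
        MNN::ScheduleConfig pathA;
        pathA.type         = MNN_FORWARD_CPU;
        pathA.path.mode    = MNN::ScheduleConfig::Path::Op;
        pathA.path.inputs  = {"input"};
        pathA.path.outputs = {"branch_a_end"};

        MNN::ScheduleConfig pathB;
        pathB.type         = MNN_FORWARD_OPENCL;
        pathB.backupType   = MNN_FORWARD_CPU;
        pathB.path.mode    = MNN::ScheduleConfig::Path::Op;
        pathB.path.inputs  = {"input"};
        pathB.path.outputs = {"branch_b_end"};

        std::vector<MNN::ScheduleConfig> configs{pathA, pathB};
        return interpreter->createMultiPathSession(configs);
    }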