Using the FFmpeg SDK to Demux a Container-Format Video into Audio and Video Streams - FFmpeg

Structures used by the FFmpeg demuxer-decoder
The FFmpeg demuxing and decoding process

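The audio and video files we use every day are usually not separate audio and video signals but a single file. That file contains an audio stream and a video stream, which are played back in sync. Typically the audio and video are multiplexed according to a standard format to produce a container format; the container is usually indicated by the file extension, such as mp4, avi, flv, or mkv.
At the lowest level, all we have to work with are video decoders and audio decoders, plus perhaps some extras such as subtitle decoders; there is no such thing as an "mp4 decoder" or an "avi decoder". To play a video file correctly, the container must therefore be split into its video and audio data, which are then decoded and played separately.
In fact, mp4, avi, and the other file formats each follow their own standard, and there is no single parsing method that works for all of them. FFmpeg therefore provides a dedicated library, libavformat, for everything related to container formats. Both muxing and demuxing can be done through the libavformat API. Here we build a demo that demuxes and decodes an audio/video file.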
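
The idea of demultiplexing can be sketched without any FFmpeg calls. The following is a minimal illustration, assuming a toy container in which every packet carries a stream index and a payload; `ToyPacket` and `demux` are hypothetical names for this sketch and are not part of libavformat.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a container packet: a stream index plus payload bytes.
struct ToyPacket {
    int stream_index;
    std::vector<std::uint8_t> payload;
};

// Route interleaved packets into one byte buffer per stream, preserving packet order.
std::map<int, std::vector<std::uint8_t>> demux(const std::vector<ToyPacket> &packets)
{
    std::map<int, std::vector<std::uint8_t>> streams;
    for (const ToyPacket &pkt : packets) {
        std::vector<std::uint8_t> &dst = streams[pkt.stream_index];
        dst.insert(dst.end(), pkt.payload.begin(), pkt.payload.end());
    }
    return streams;
}
```

A real demuxer additionally parses the container's headers and timestamps; this sketch only shows the routing step.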

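Structures used by the FFmpeg demuxer-decoder

This process actually consists of two steps, demuxing the container file and decoding the audio/video, so the structures we need fall roughly into a demuxing part and a decoding part. We define the following struct to hold them: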

/*************************************************
Struct:         DemuxingVideoAudioContex
Description:    Holds the context objects of the demuxer and the decoders
*************************************************/
typedef struct
{
    AVFormatContext *fmt_ctx;
    AVCodecContext *video_dec_ctx, *audio_dec_ctx;
    AVStream *video_stream, *audio_stream;
    AVFrame *frame;
    AVPacket pkt;
    int video_stream_idx, audio_stream_idx;
    int width, height;
    uint8_t *video_dst_data[4];
    int video_dst_linesize[4];
    int video_dst_bufsize;
    enum AVPixelFormat pix_fmt;
} DemuxingVideoAudioContex;

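Most of the data types in this struct have already appeared in the earlier encoding and decoding examples. The remaining ones are related to muxing and demuxing the video file:

AVFormatContext: the context that holds information about the container format of the audio/video file.
AVStream: the structure that represents an audio or video stream.
AVPixelFormat: an enum describing the pixel format of an image; the most common value is AV_PIX_FMT_YUV420P.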
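
To make the pixel format concrete: in planar YUV 4:2:0, the layout behind AV_PIX_FMT_YUV420P, a full-resolution luma plane is followed by two chroma planes subsampled by two in each direction. The sketch below computes the tightly packed size of such a frame. `Yuv420pLayout` and `yuv420p_layout` are hypothetical helpers for illustration, and rounding odd dimensions up is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Byte sizes of the planes of a tightly packed (unpadded) YUV 4:2:0 planar frame.
struct Yuv420pLayout {
    std::size_t y_size;   // bytes in the luma (Y) plane
    std::size_t uv_size;  // bytes in each chroma plane (U or V)
    std::size_t total;    // bytes in the whole frame
};

Yuv420pLayout yuv420p_layout(int width, int height)
{
    assert(width > 0 && height > 0);
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    // Chroma is subsampled by 2 horizontally and vertically; round odd sizes up.
    const std::size_t cw = (w + 1) / 2;
    const std::size_t ch = (h + 1) / 2;

    Yuv420pLayout layout;
    layout.y_size = w * h;
    layout.uv_size = cw * ch;
    layout.total = layout.y_size + 2 * layout.uv_size;
    return layout;
}
```

For a 640x480 frame this gives 307200 bytes of luma and 76800 bytes per chroma plane, 460800 bytes in total, i.e. 1.5 bytes per pixel.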

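The FFmpeg demuxing and decoding process

1). Initializing the relevant structures
As with other FFmpeg operations, the FFmpeg components must be registered first (since FFmpeg 4.0 this call is deprecated and no longer required):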

av_register_all();

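Next we need to open the audio/video file to be processed. Instead of fopen, we use avformat_open_input, which not only opens the input file but also reads the format information from it. The function is declared as follows:

int avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options);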

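The parameters are:

ps: a pointer to the AVFormatContext handle that receives the format information of the input file; it may point to NULL, in which case the function allocates the AVFormatContext instance.
url: the URL or file path of the video;
fmt: forces the input format; pass NULL to detect it automatically;
options: a dictionary of options for the AVFormatContext and the demuxer; on return it holds the options that were not recognized, and it may be NULL;
Return value: 0 on success, a negative error code on failure;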
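
The return-value convention above (zero or a non-negative value on success, a negative code on failure) can be illustrated in isolation. This sketch assumes the common case in which FFmpeg's AVERROR(e) is simply the negated POSIX errno value; `toy_averror`, `toy_open`, and `open_or_report` are hypothetical stand-ins, not libavformat functions.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>

// Assumption: mirrors AVERROR(e) on platforms where errno values are positive.
constexpr int toy_averror(int e) { return -e; }

// Hypothetical opener: succeeds only for a non-empty path.
int toy_open(const char *path)
{
    if (path == nullptr || path[0] == '\0')
        return toy_averror(ENOENT);
    return 0;
}

// Caller pattern: treat any negative return value as failure and report it.
bool open_or_report(const char *path)
{
    const int ret = toy_open(path);
    if (ret < 0) {
        std::fprintf(stderr, "Could not open '%s': %s\n",
                     path ? path : "(null)", std::strerror(-ret));
        return false;
    }
    return true;
}
```

Checking `ret < 0` rather than `ret != 0` matters because several FFmpeg functions return a meaningful non-negative value on success.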

The function is called like this:

if (avformat_open_input(&(va_ctx.fmt_ctx), files.src_filename, NULL, NULL) < 0){
    fprintf(stderr, "Could not open source file %s\n", files.src_filename);
    return -1;
}

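After the file is opened, call avformat_find_stream_info to read the stream information from the file. The function is declared as:

int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);

The first parameter is the file handle from above. The second passes per-stream codec options as AVDictionary structures, which on return hold the options that were not recognized; it can usually be set to NULL. It is called like this:

/* retrieve stream information */
if (avformat_find_stream_info(va_ctx.fmt_ctx, NULL) < 0){
    fprintf(stderr, "Could not find stream information\n");
    return -1;
}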

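With the stream information available, the next step is to locate the audio and video streams in the file and prepare to decode them. Streams are located with av_find_best_stream, declared as: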

int av_find_best_stream(AVFormatContext *ic,
                   enum AVMediaType type,
                   int wanted_stream_nb,
                   int related_stream,
                   AVCodec **decoder_ret,
                   int flags);

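The parameters mean:

ic: the handle of the video file;
type: the kind of data, e.g. AVMEDIA_TYPE_VIDEO for video or AVMEDIA_TYPE_AUDIO for audio;
wanted_stream_nb: the index of the stream we want, or -1 to let the function choose automatically;
related_stream: a stream that the result should be related to, or -1 if there is none;
decoder_ret: if not NULL, receives the decoder for the selected stream;
flags: none are currently defined;
Return value: the non-negative index of the selected stream on success, a negative error code on failure;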
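
The role of wanted_stream_nb and of the return value can be sketched with a toy selector. It is deliberately simplified: it returns the first stream of the requested type, whereas the real av_find_best_stream applies its own heuristics to choose among several candidates. `ToyMediaType`, `ToyStream`, and `find_stream` are hypothetical names, and -1 is a generic stand-in for FFmpeg's specific negative error codes.

```cpp
#include <vector>

enum class ToyMediaType { Video, Audio, Subtitle };

// Hypothetical stand-in for AVStream: only the media type matters here.
struct ToyStream {
    ToyMediaType type;
};

// Returns the index of the selected stream, or a negative value if none matches.
// wanted < 0 means "choose automatically"; otherwise only that index is considered.
int find_stream(const std::vector<ToyStream> &streams, ToyMediaType type, int wanted)
{
    const int count = static_cast<int>(streams.size());
    if (wanted >= 0) {
        const bool ok = wanted < count && streams[wanted].type == type;
        return ok ? wanted : -1;
    }
    for (int i = 0; i < count; ++i) {
        if (streams[i].type == type)
            return i;  // simplification: first match wins
    }
    return -1;
}
```

The caller then uses the returned value directly as an index into the stream array, which is what open_codec_context does with `fmt_ctx->streams[stream_index]`.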

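Once the function succeeds, we can call avcodec_find_decoder and avcodec_open2 to open the decoder and get ready to decode the audio and video streams. This part is implemented as follows: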

static int open_codec_context(IOFileName &files, DemuxingVideoAudioContex &va_ctx, enum AVMediaType type)
{
    int ret, stream_index;
    AVStream *st;
    AVCodecContext *dec_ctx = NULL;
    AVCodec *dec = NULL;
    AVDictionary *opts = NULL;
    ret = av_find_best_stream(va_ctx.fmt_ctx, type, -1, -1, NULL, 0);
    if (ret < 0){
        fprintf(stderr, "Could not find %s stream in input file '%s'\n", av_get_media_type_string(type), files.src_filename);
        return ret;
    }
    else{
        stream_index = ret;
        st = va_ctx.fmt_ctx->streams[stream_index];
        /* find decoder for the stream */
        dec_ctx = st->codec;
        dec = avcodec_find_decoder(dec_ctx->codec_id);
        if (!dec){
            fprintf(stderr, "Failed to find %s codec\n", av_get_media_type_string(type));
            return AVERROR(EINVAL);
        }
        /* Init the decoders, with or without reference counting */
        av_dict_set(&opts, "refcounted_frames", files.refcount ? "1" : "0", 0);
        if ((ret = avcodec_open2(dec_ctx, dec, &opts)) < 0) {
            fprintf(stderr, "Failed to open %s codec\n", av_get_media_type_string(type));
            return ret;
        }
        switch (type){
        case AVMEDIA_TYPE_VIDEO:
            va_ctx.video_stream_idx = stream_index;
            va_ctx.video_stream = va_ctx.fmt_ctx->streams[stream_index];
            va_ctx.video_dec_ctx = va_ctx.video_stream->codec;
            break;
        case AVMEDIA_TYPE_AUDIO:
            va_ctx.audio_stream_idx = stream_index;
            va_ctx.audio_stream = va_ctx.fmt_ctx->streams[stream_index];
            va_ctx.audio_dec_ctx = va_ctx.audio_stream->codec;
            break;
        default:
            fprintf(stderr, "Error: unsupported MediaType: %s\n", av_get_media_type_string(type));
            return -1;
        }
    }
    return 0;
}

The complete initialization function is:

int InitDemuxContext(IOFileName &files, DemuxingVideoAudioContex &va_ctx)
{
    int ret = 0, width, height;
    /* register all formats and codecs */
    av_register_all();
    /* open input file, and allocate format context */
    if (avformat_open_input(&(va_ctx.fmt_ctx), files.src_filename, NULL, NULL) < 0) {
        fprintf(stderr, "Could not open source file %s\n", files.src_filename);
        return -1;
    }
    /* retrieve stream information */
    if (avformat_find_stream_info(va_ctx.fmt_ctx, NULL) < 0){
        fprintf(stderr, "Could not find stream information\n");
        return -1;
    }
    if (open_codec_context(files, va_ctx, AVMEDIA_TYPE_VIDEO) >= 0){
        files.video_dst_file = fopen(files.video_dst_filename, "wb");
        if (!files.video_dst_file){
            fprintf(stderr, "Could not open destination file %s\n", files.video_dst_filename);
            return -1;
        }
        /* allocate image where the decoded image will be put */
        va_ctx.width = va_ctx.video_dec_ctx->width;
        va_ctx.height = va_ctx.video_dec_ctx->height;
        va_ctx.pix_fmt = va_ctx.video_dec_ctx->pix_fmt;
        ret = av_image_alloc(va_ctx.video_dst_data, va_ctx.video_dst_linesize, va_ctx.width, va_ctx.height, va_ctx.pix_fmt, 1);
        if (ret < 0) {
            fprintf(stderr, "Could not allocate raw video buffer\n");
            return -1;
        }
        va_ctx.video_dst_bufsize = ret;
    }
    if (open_codec_context(files, va_ctx, AVMEDIA_TYPE_AUDIO) >= 0) {
        files.audio_dst_file = fopen(files.audio_dst_filename, "wb");
        if (!files.audio_dst_file){
            fprintf(stderr, "Could not open destination file %s\n", files.audio_dst_filename);
            return -1;
        }
    }
    if (va_ctx.video_stream){
        printf("Demuxing video from file '%s' into '%s'\n", files.src_filename, files.video_dst_filename);
    }
    if (va_ctx.audio_stream){
        printf("Demuxing audio from file '%s' into '%s'\n", files.src_filename, files.audio_dst_filename);
    }
    /* dump input information to stderr */
    av_dump_format(va_ctx.fmt_ctx, 0, files.src_filename, 0);
    if (!va_ctx.audio_stream && !va_ctx.video_stream){
        fprintf(stderr, "Could not find audio or video stream in the input, aborting\n");
        return -1;
    }
    return 0;
}
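
The destination buffer above is allocated with an alignment of 1 so that its rows are tightly packed, while a decoded frame may pad each row up to its linesize. Copying between the two therefore has to go row by row, which is the job av_image_copy performs for every plane. A minimal single-plane sketch follows; `copy_plane` is a hypothetical helper written for this illustration, not an FFmpeg function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `height` rows of `row_bytes` visible bytes each from `src` to `dst`.
// Each buffer may have its own stride (linesize) that is >= row_bytes.
void copy_plane(std::uint8_t *dst, int dst_linesize,
                const std::uint8_t *src, int src_linesize,
                int row_bytes, int height)
{
    assert(row_bytes >= 0 && height >= 0);
    assert(row_bytes <= dst_linesize && row_bytes <= src_linesize);
    for (int y = 0; y < height; ++y) {
        const std::size_t row = static_cast<std::size_t>(y);
        std::memcpy(dst + row * static_cast<std::size_t>(dst_linesize),
                    src + row * static_cast<std::size_t>(src_linesize),
                    static_cast<std::size_t>(row_bytes));
    }
}
```

With a source stride of 4 and a destination stride of 3, two rows of three bytes end up contiguous in the destination and the padding byte of each source row is dropped.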

Next, allocate the AVFrame and initialize the AVPacket object:

va_ctx.frame = av_frame_alloc();            // allocate the AVFrame object
if (!va_ctx.frame){
    fprintf(stderr, "Could not allocate frame\n");
    ret = AVERROR(ENOMEM);
    goto end;
}
/* initialize packet, set data to NULL, let the demuxer fill it */
av_init_packet(&va_ctx.pkt);                // initialize the AVPacket object
va_ctx.pkt.data = NULL;
va_ctx.pkt.size = 0;

2). Looping over the packets in the video file

The loop that parses the video file is:

/* read frames from the file */
while (av_read_frame(va_ctx.fmt_ctx, &va_ctx.pkt) >= 0)     // read one packet from the input
{
    AVPacket orig_pkt = va_ctx.pkt;
    do{
        ret = Decode_packet(files, va_ctx, &got_frame, 0);  // decode this packet
        if (ret < 0)
            break;
        va_ctx.pkt.data += ret;
        va_ctx.pkt.size -= ret;
    } while (va_ctx.pkt.size > 0);
    av_packet_unref(&orig_pkt);
}
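
The inner do/while loop exists because a decoder may consume only part of a packet per call, so the caller advances the data pointer by the number of bytes consumed until nothing is left. The sketch below reproduces that pattern with a toy decoder. `ToyPacketView`, `toy_decode` (which consumes at most 4 bytes per call), and `drain_packet` are hypothetical stand-ins for AVPacket and Decode_packet.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical view of the not-yet-consumed part of a packet.
struct ToyPacketView {
    const std::uint8_t *data;
    int size;
};

// Toy decoder: consumes up to 4 bytes per call.
// Returns the number of bytes consumed, or a negative value on error.
int toy_decode(const ToyPacketView &pkt)
{
    if (pkt.data == nullptr || pkt.size <= 0)
        return -1;
    return std::min(pkt.size, 4);
}

// Calls the decoder until the packet is fully consumed.
// Returns the number of decode calls made, or a negative value on error.
int drain_packet(ToyPacketView pkt)
{
    int calls = 0;
    do {
        const int ret = toy_decode(pkt);
        if (ret < 0)
            return ret;
        pkt.data += ret;  // skip the bytes the decoder consumed
        pkt.size -= ret;
        ++calls;
    } while (pkt.size > 0);
    return calls;
}
```

Note that the packet is passed by value, so the caller's copy keeps the original pointer. The loop above achieves the same thing by saving orig_pkt before the data pointer is advanced and unreferencing that copy afterwards.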

The read loop is logically simple: av_read_frame reads one packet from the file, and Decode_packet, a function we implement ourselves, decodes that packet. Decode_packet is implemented as follows:

int Decode_packet(IOFileName &files, DemuxingVideoAudioContex &va_ctx, int *got_frame, int cached)
{
    int ret = 0;
    int decoded = va_ctx.pkt.size;
    static int video_frame_count = 0;
    static int audio_frame_count = 0;
    *got_frame = 0;
    if (va_ctx.pkt.stream_index == va_ctx.video_stream_idx){
        /* decode video frame */
        ret = avcodec_decode_video2(va_ctx.video_dec_ctx, va_ctx.frame, got_frame, &va_ctx.pkt);
        if (ret < 0) {
            printf("Error decoding video frame (%d)\n", ret);
            return ret;
        }
        if (*got_frame){
            if (va_ctx.frame->width != va_ctx.width || va_ctx.frame->height != va_ctx.height || va_ctx.frame->format != va_ctx.pix_fmt){
                /* To handle this change, one could call av_image_alloc again and
                 * decode the following frames into another rawvideo file. */
                printf("Error: Width, height and pixel format have to be "
                    "constant in a rawvideo file, but the width, height or "
                    "pixel format of the input video changed:\n"
                    "old: width = %d, height = %d, format = %s\n"
                    "new: width = %d, height = %d, format = %s\n",
                    va_ctx.width, va_ctx.height, av_get_pix_fmt_name((AVPixelFormat)(va_ctx.pix_fmt)),
                    va_ctx.frame->width, va_ctx.frame->height,
                    av_get_pix_fmt_name((AVPixelFormat)va_ctx.frame->format));
                return -1;
            }
            /* pts is an int64_t, so it is printed as a number rather than with %s */
            printf("video_frame%s n:%d coded_n:%d pts:%lld\n", cached ? "(cached)" : "", video_frame_count++, va_ctx.frame->coded_picture_number, (long long)va_ctx.frame->pts);
            /* copy decoded frame to destination buffer:
             * this is required since rawvideo expects non aligned data */
            av_image_copy(va_ctx.video_dst_data, va_ctx.video_dst_linesize,
                (const uint8_t **)(va_ctx.frame->data), va_ctx.frame->linesize,
                va_ctx.pix_fmt, va_ctx.width, va_ctx.height);
            /* write to rawvideo file (fwrite takes an element size and a count) */
            fwrite(va_ctx.video_dst_data[0], 1, va_ctx.video_dst_bufsize, files.video_dst_file);
        }
    }
    else if (va_ctx.pkt.stream_index == va_ctx.audio_stream_idx){
        /* decode audio frame */
        ret = avcodec_decode_audio4(va_ctx.audio_dec_ctx, va_ctx.frame, got_frame, &va_ctx.pkt);
        if (ret < 0) {
            printf("Error decoding audio frame (%d)\n", ret);
            return ret;
        }
        /* Some audio decoders decode only part of the packet, and have to be
         * called again with the remainder of the packet data.
         * Sample: fate-suite/lossless-audio/luckynight-partial.shn
         * Also, some decoders might over-read the packet. */
        decoded = FFMIN(ret, va_ctx.pkt.size);
        if (*got_frame){
            size_t unpadded_linesize = va_ctx.frame->nb_samples * av_get_bytes_per_sample((AVSampleFormat)va_ctx.frame->format);
            printf("audio_frame%s n:%d nb_samples:%d pts:%lld\n",
                cached ? "(cached)" : "",
                audio_frame_count++, va_ctx.frame->nb_samples,
                (long long)va_ctx.frame->pts);
            /* Write the raw audio data samples of the first plane. This works
             * fine for packed formats (e.g. AV_SAMPLE_FMT_S16). However,
             * most audio decoders output planar audio, which uses a separate
             * plane of audio samples for each channel (e.g. AV_SAMPLE_FMT_S16P).
             * In other words, this code will write only the first audio channel
             * in these cases.
             * You should use libswresample or libavfilter to convert the frame
             * to packed data. */
            fwrite(va_ctx.frame->extended_data[0], 1, unpadded_linesize, files.audio_dst_file);
        }
    }
    /* If we use frame reference counting, we own the data and need
     * to de-reference it when we don't use it anymore */
    if (*got_frame && files.refcount)
        av_frame_unref(va_ctx.frame);
    return decoded;
}

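In this function, the stream_index of the packet just read is first compared with the audio and video stream indices obtained earlier to determine whether the packet belongs to the audio or the video stream. The corresponding decoding function is then called. Taking video as an example: once the packet is identified as video, avcodec_decode_video2 decodes the stream data into pixel data, and after a complete frame has been obtained it is written to the output file.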
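
As the comment in the audio branch notes, writing only extended_data[0] stores a single channel when the decoder outputs planar audio. The source recommends libswresample or libavfilter for the conversion. Purely to illustrate what "planar" and "packed" mean, here is a hand-written interleave for 16-bit samples; `interleave_s16` is a hypothetical helper and is not a replacement for those libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// planes[c][i] is sample i of channel c (planar layout).
// Returns the packed layout: sample 0 of every channel, then sample 1, and so on.
std::vector<std::int16_t> interleave_s16(const std::vector<std::vector<std::int16_t>> &planes)
{
    if (planes.empty())
        return {};
    const std::size_t nb_samples = planes[0].size();
    std::vector<std::int16_t> packed;
    packed.reserve(nb_samples * planes.size());
    for (std::size_t i = 0; i < nb_samples; ++i) {
        for (const std::vector<std::int16_t> &plane : planes) {
            assert(plane.size() == nb_samples);  // all channels must have equal length
            packed.push_back(plane[i]);
        }
    }
    return packed;
}
```

For two channels holding {1, 2, 3} and {10, 20, 30}, the packed result is {1, 10, 2, 20, 3, 30}, which is the sample order a packed format such as AV_SAMPLE_FMT_S16 uses.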
