DOC-VQA SER in Practice

This section introduces how to train and run the DOC-VQA SER algorithm with PaddleOCR, covering:

  1. Understanding the principles of the DOC-VQA SER algorithm
  2. Mastering the training workflow of the DOC-VQA SER code in PaddleOCR

1. Quick Start
Prepare the code and environment:

```python
# clone the PaddleOCR code
! git clone https://github.com/PaddlePaddle/PaddleOCR

# install dependencies
! pip install -U pip
! pip install -r /home/aistudio/PaddleOCR/requirements.txt
! pip install paddleocr

# install additional dependencies
! pip install yacs gnureadline paddlenlp==2.2.1
```


```python
# switch to the vqa directory
import os
os.chdir('/home/aistudio/PaddleOCR/ppstructure/vqa')
```

```python
# download the model
! mkdir inference
# download the pretrained SER model and extract it
! wget -P ./inference/ https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && cd inference && tar xf PP-Layout_v1.0_ser_pretrained.tar && cd ..
```
```python
# run SER prediction
# https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppstructure/vqa/infer_ser_e2e.py
! python infer_ser_e2e.py \
  --model_name_or_path "./inference/PP-Layout_v1.0_ser_pretrained/" \
  --max_seq_length 512 \
  --output_dir "output/res_e2e/" \
  --infer_imgs "images/input/zh_val_42.jpg"
```

```python
import cv2
from matplotlib import pyplot as plt
# required for displaying matplotlib.pyplot figures inside a notebook
%matplotlib inline
img = cv2.imread('output/res_e2e/zh_val_42_ser.jpg')
plt.figure(figsize=(48, 24))
plt.imshow(img)
```
```
process: [0/1], save result to output/res_e2e/zh_val_42_ser.jpg
```

Figure: visualized SER prediction result (output_4_2.png)

2. Principles

The DOC-VQA series of algorithms in PaddleOCR is currently implemented based on the LayoutXLM paper and provides two tasks: SER and RE.
LayoutXLM is the multilingual version of LayoutLMv2, whose architecture is shown below:

Figure 1: The LayoutLMv2 architecture

Compared with BERT in NLP, LayoutXLM adds the image and the layout information of the text inside the image to the model input. Since LayoutXLM is already implemented in PaddleNLP, we introduce the data and the network here from the perspective of the model's forward pass.

2.1 Input Data Processing

First, run OCR recognition (or PDF parsing) on the image to obtain the text and bbox information, then build the model's three inputs on top of it:

  1. Text Embedding
     The OCR-recognized text is first split with WordPiece, then the [CLS] and [SEP] tokens are added and the sequence is padded to length with [PAD], which gives the text input sequence.
     The token embedding, one-dimensional position embedding, and segment embedding are then summed to obtain the text embedding:

     t_i = TokEmb(w_i) + PosEmb1D(i) + SegEmb(s_i), 0 ≤ i < L

     One-dimensional position embedding: the index of the token
     Segment embedding: A
```python
# Text Embedding demo
from paddlenlp.transformers import LayoutXLMTokenizer
tokenizer = LayoutXLMTokenizer.from_pretrained('inference/PP-Layout_v1.0_ser_pretrained')
# tokenize
print('Tokenization result:', tokenizer.tokenize('我的中国心'))
# convert to vocabulary indices
print('Encoding result:', tokenizer.encode('我的中国心'))
```

```
Tokenization result: ['▁我', '的中国', '心']
Encoding result: {'input_ids': [0, 13129, 84072, 1801, 2], 'token_type_ids': [0, 0, 0, 0, 0]}
```
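Putting the text-embedding sum described above into code, the following is a minimal NumPy sketch of t_i = TokEmb(w_i) + PosEmb1D(i) + SegEmb(s_i). The table sizes and hidden dimension here are illustrative, not LayoutXLM's real configuration, and the lookup tables are random instead of learned:

```python
import numpy as np

# illustrative sizes, not LayoutXLM's real config
rng = np.random.default_rng(0)
vocab_size, max_pos, num_segments, hidden = 100, 512, 2, 8

tok_emb = rng.normal(size=(vocab_size, hidden))    # TokEmb lookup table
pos_emb = rng.normal(size=(max_pos, hidden))       # PosEmb1D lookup table
seg_emb = rng.normal(size=(num_segments, hidden))  # SegEmb lookup table

# token indices from the tokenizer demo, folded into our toy vocabulary
input_ids = np.array([0, 13129, 84072, 1801, 2]) % vocab_size
segment_ids = np.zeros_like(input_ids)   # segment A -> id 0
positions = np.arange(len(input_ids))    # one-dimensional positions: the token index

# t_i = TokEmb(w_i) + PosEmb1D(i) + SegEmb(s_i)
text_embedding = tok_emb[input_ids] + pos_emb[positions] + seg_emb[segment_ids]
print(text_embedding.shape)  # (5, 8): one hidden vector per token
```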
  2. Image Embedding
     A ResNeXt-FPN network serves as the image encoder. It first extracts the feature map of the original document image and average-pools it to a fixed size (B x 256 x 7 x 7), then flattens the pooled feature map row by row (B x 256 x 49) and applies a linear projection (B x 49 x 256) to obtain the image feature sequence. Mirroring the text embedding, the image embedding is also augmented with one-dimensional position and segment information. The final image embedding is the sum of the feature embedding, the one-dimensional position embedding, and the segment embedding:

     v_i = Proj(VisTokEmb(I)_i) + PosEmb1D(i) + SegEmb([C]), 0 ≤ i < WH

     Segment embedding: C
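The pool-flatten-project pipeline of the image branch can be sketched shape-by-shape with NumPy. This is only an illustration: the projection weight is random rather than learned, and the target hidden size of 768 is an assumption for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W, hidden = 2, 256, 7, 7, 768

feat = rng.normal(size=(B, C, H, W))                 # average-pooled backbone feature map
seq = feat.reshape(B, C, H * W).transpose(0, 2, 1)   # (B, 49, 256): one token per spatial cell
proj_w = rng.normal(size=(C, hidden)) * 0.01         # Proj(...): linear projection (random here)
image_tokens = seq @ proj_w                          # (B, 49, hidden) image feature sequence
print(image_tokens.shape)  # (2, 49, 768)
```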
  3. Layout Embedding
     For the coordinate range that each token or image region covers on the page, the layout information is represented with axis-aligned bounding boxes; each box is described by its four boundary coordinates plus its width and height. The final layout embedding is the concatenation of the embeddings of these six features:

     l_i = Concat(PosEmb2D_x(x_0, x_1, w), PosEmb2D_y(y_0, y_1, h)), 0 ≤ i < WH + L
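A minimal sketch of the layout embedding: the three x-related features (x0, x1, w) are looked up in an x-coordinate table and the three y-related features (y0, y1, h) in a y-coordinate table, then all six vectors are concatenated. Table sizes and the per-coordinate dimension are illustrative, and the tables are random instead of learned:

```python
import numpy as np

rng = np.random.default_rng(0)
coord_dim = 4  # per-coordinate embedding size (illustrative)
# coordinates are normalized to 0-1000, so the tables have 1001 entries
x_table = rng.normal(size=(1001, coord_dim))  # PosEmb2D_x
y_table = rng.normal(size=(1001, coord_dim))  # PosEmb2D_y

x0, y0, x1, y1 = 120, 80, 400, 130  # a normalized bounding box
w, h = x1 - x0, y1 - y0

# Concat(PosEmb2D_x(x0, x1, w), PosEmb2D_y(y0, y1, h))
layout_vec = np.concatenate([x_table[x0], x_table[x1], x_table[w],
                             y_table[y0], y_table[y1], y_table[h]])
print(layout_vec.shape)  # (24,): 6 features * coord_dim
```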

Below we demonstrate how the network input is built from a single image during prediction. The process consists of the following steps:

  1. Run OCR recognition on the image
  2. Preprocess the image, including resizing it to a fixed size and normalizing it
  3. Tokenize the recognized text and convert it to vocabulary indices
  4. Normalize the text boxes so their values fall in the range 0-1000
  5. Pad the results of steps 3 and 4 to make batching easier
```python
# building the prediction input
# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/vqa/vqa_utils.py
import cv2
import numpy as np
import paddle
from copy import deepcopy
from paddleocr import PaddleOCR
from paddlenlp.transformers import LayoutXLMTokenizer
from infer_ser_e2e import trans_poly_to_bbox, pad_sentences, split_page

def parse_ocr_info_for_ser(ocr_result):
    # convert the OCR result to dicts; text polygons become axis-aligned boxes
    ocr_info = []
    for res in ocr_result:
        ocr_info.append({
            "text": res[1][0],
            "bbox": trans_poly_to_bbox(res[0]),
            "poly": res[0],
        })
    return ocr_info

def preprocess(
        tokenizer,
        ori_img,
        ocr_info,
        img_size=(224, 224),
        pad_token_label_id=-100,
        max_seq_len=512,
        add_special_ids=False,
        return_attention_mask=True):
    ocr_info = deepcopy(ocr_info)
    height = ori_img.shape[0]
    width = ori_img.shape[1]
    # resize the image to the target shape
    img = cv2.resize(ori_img, img_size).transpose([2, 0, 1]).astype(np.float32)
    segment_offset_id = []    # end position of each text segment within input_ids
    bbox_list = []            # boxes normalized to 0-1000
    input_ids_list = []       # vocabulary indices of the tokenized text segments
    token_type_ids_list = []  # type information of the text segments
    for info in ocr_info:
        # normalize the box (x1, y1, x2, y2) to 0-1000
        bbox = info["bbox"]
        bbox[0] = int(bbox[0] * 1000.0 / width)
        bbox[2] = int(bbox[2] * 1000.0 / width)
        bbox[1] = int(bbox[1] * 1000.0 / height)
        bbox[3] = int(bbox[3] * 1000.0 / height)
        # tokenize the text: split it and convert it to vocabulary indices
        text = info["text"]
        encode_res = tokenizer.encode(
            text, pad_to_max_seq_len=False, return_attention_mask=True)
        # optionally strip the special tokens
        if not add_special_ids:
            # TODO: use tok.all_special_ids to remove
            encode_res["input_ids"] = encode_res["input_ids"][1:-1]
            encode_res["token_type_ids"] = encode_res["token_type_ids"][1:-1]
            encode_res["attention_mask"] = encode_res["attention_mask"][1:-1]
        input_ids_list.extend(encode_res["input_ids"])
        token_type_ids_list.extend(encode_res["token_type_ids"])
        bbox_list.extend([bbox] * len(encode_res["input_ids"]))
        segment_offset_id.append(len(input_ids_list))
    encoded_inputs = {
        "input_ids": input_ids_list,
        "token_type_ids": token_type_ids_list,
        "bbox": bbox_list,
        "attention_mask": [1] * len(input_ids_list),
    }
    # pad each value to the target length, filling with 0
    encoded_inputs = pad_sentences(
        tokenizer,
        encoded_inputs,
        max_seq_len=max_seq_len,
        return_attention_mask=return_attention_mask)
    # when input_ids is longer than 512, split it into 2 batches
    encoded_inputs = split_page(encoded_inputs)
    fake_bs = encoded_inputs["input_ids"].shape[0]
    encoded_inputs["image"] = paddle.to_tensor(img).unsqueeze(0).expand(
        [fake_bs] + list(img.shape))
    encoded_inputs["segment_offset_id"] = segment_offset_id
    return encoded_inputs

img = cv2.imread('images/input/zh_val_42.jpg')
ocr_engine = PaddleOCR(use_angle_cls=False, show_log=False)
# run OCR recognition
ocr_result = ocr_engine.ocr(img, cls=False)
# convert the OCR result to dicts; text polygons become axis-aligned boxes
ocr_info = parse_ocr_info_for_ser(ocr_result)
tokenizer = LayoutXLMTokenizer.from_pretrained('inference/PP-Layout_v1.0_ser_pretrained')
# resize the image, tokenize the text and convert it to vocabulary indices,
# and normalize the boxes
max_seq_length = 512
inputs = preprocess(tokenizer=tokenizer, ori_img=img, ocr_info=ocr_info,
                    max_seq_len=max_seq_length, img_size=(224, 224))
print(inputs.keys())
print(inputs['image'].shape)
```
```
dict_keys(['input_ids', 'token_type_ids', 'bbox', 'attention_mask', 'image', 'segment_offset_id'])
[2, 3, 224, 224]
```

The processed data is a dictionary containing the following fields:

| Field | Meaning |
| --- | --- |
| image | the image, resized to 224*224 |
| bbox | boxes normalized to the range 0-1000 |
| input_ids | vocabulary indices of the tokenized text segments |
| token_type_ids | type information of the text segments |
| attention_mask | mask over the text segments: positions of special characters are marked 0, text positions are marked 1 |
| segment_offset_id | the end position of each text segment within input_ids |
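As a minimal illustration of how these fields line up after padding, the sketch below pads an encoding to a fixed length and builds the matching attention mask. This is a simplified stand-in for demonstration, not PaddleOCR's actual pad_sentences:

```python
# simplified padding sketch (not PaddleOCR's pad_sentences)
def pad_encoded(encoded, max_seq_len, pad_id=0):
    n = len(encoded["input_ids"])
    pad = max_seq_len - n
    return {
        "input_ids": encoded["input_ids"] + [pad_id] * pad,
        "token_type_ids": encoded["token_type_ids"] + [0] * pad,
        "bbox": encoded["bbox"] + [[0, 0, 0, 0]] * pad,
        "attention_mask": [1] * n + [0] * pad,  # 1 = real token, 0 = padding
    }

enc = {"input_ids": [13129, 84072, 1801],
       "token_type_ids": [0, 0, 0],
       "bbox": [[120, 80, 400, 130]] * 3}
padded = pad_encoded(enc, max_seq_len=6)
print(padded["attention_mask"])  # [1, 1, 1, 0, 0, 0]
```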

2.2 The SER Network

SER (Semantic Entity Recognition) recognizes and classifies the text in an image.
The SER network adds a fully connected classification head on top of the LayoutXLMModel output; its code is as follows:

```python
# https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/layoutxlm/modeling.py#L846
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMPretrainedModel
from paddle import nn

class LayoutXLMForTokenClassification(LayoutXLMPretrainedModel):
    def __init__(self, layoutxlm, num_classes=2, dropout=None):
        super(LayoutXLMForTokenClassification, self).__init__()
        self.num_classes = num_classes
        if isinstance(layoutxlm, dict):
            self.layoutxlm = LayoutXLMModel(**layoutxlm)
        else:
            self.layoutxlm = layoutxlm
        self.dropout = nn.Dropout(dropout if dropout is not None else
                                  self.layoutxlm.config["hidden_dropout_prob"])
        self.classifier = nn.Linear(self.layoutxlm.config["hidden_size"], num_classes)
        self.classifier.apply(self.init_weights)

    def get_input_embeddings(self):
        return self.layoutxlm.embeddings.word_embeddings

    def forward(self, input_ids=None, bbox=None, image=None, attention_mask=None,
                token_type_ids=None, position_ids=None, head_mask=None, labels=None):
        # backbone forward
        outputs = self.layoutxlm(
            input_ids=input_ids, bbox=bbox, image=image,
            attention_mask=attention_mask, token_type_ids=token_type_ids,
            position_ids=position_ids, head_mask=head_mask)
        seq_length = input_ids.shape[1]
        # head forward: text tokens come first, image tokens follow
        sequence_output, image_output = outputs[0][:, :seq_length], outputs[0][:, seq_length:]
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        outputs = (logits, )
        # loss computation
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            if attention_mask is not None:
                active_loss = attention_mask.reshape([-1, ]) == 1
                active_logits = logits.reshape([-1, self.num_classes])[active_loss]
                active_labels = labels.reshape([-1, ])[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.reshape([-1, self.num_classes]),
                                labels.reshape([-1, ]))
            outputs = (loss, ) + outputs
        return outputs
```
```python
# initialize the network
net = LayoutXLMForTokenClassification.from_pretrained('inference/PP-Layout_v1.0_ser_pretrained')
net.eval()
# run a forward pass
outputs = net(input_ids=inputs["input_ids"],
              bbox=inputs["bbox"],
              image=inputs["image"],
              token_type_ids=inputs["token_type_ids"],
              attention_mask=inputs["attention_mask"])
print(outputs[0].shape)
```

```
[2, 512, 7]
```

2.3 Post-processing

Post-processing maps the per-segment predictions from the model output back onto the texts and merges the result with the OCR output. It mainly involves the following steps:

  1. For each text, collect the labels predicted for all of its text segments.
  2. Take the label predicted for the most segments as the label of that text.
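The two steps above amount to a majority vote over segment labels. A minimal sketch with made-up predictions (the helper name `vote_text_label` is hypothetical; the full implementation follows):

```python
import numpy as np

def vote_text_label(segment_labels):
    # pick the label id predicted for the most segments of one text
    counts = np.bincount(segment_labels)
    return int(np.argmax(counts))

# made-up segment predictions for one text, using ids 1=QUESTION, 3=ANSWER
print(vote_text_label([1, 1, 3, 1]))  # → 1, most segments say QUESTION
```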
```python
# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/vqa/vqa_utils.py
import paddle
import numpy as np
from infer_ser_e2e import get_bio_label_maps

label2id_map, id2label_map = get_bio_label_maps('labels/labels_ser.txt')

def postprocess(attention_mask, preds, id2label_map):
    if isinstance(preds, paddle.Tensor):
        preds = preds.numpy()
    preds = np.argmax(preds, axis=2)
    preds_list = [[] for _ in range(preds.shape[0])]
    # keep batch info
    for i in range(preds.shape[0]):
        for j in range(preds.shape[1]):
            if attention_mask[i][j] == 1:
                preds_list[i].append(id2label_map[preds[i][j]])
    return preds_list

def merge_preds_list_with_ocr_info(ocr_info, segment_offset_id, preds_list,
                                   label2id_map_for_draw):
    # flatten the per-sample prediction lists
    preds = [p for pred in preds_list for p in pred]
    # invert the label2id dict into an id2label dict, stripping the B-/I- prefixes
    id2label_map = dict()
    for key in label2id_map_for_draw:
        val = label2id_map_for_draw[key]
        if key == "O":
            id2label_map[val] = key
        elif key.startswith("B-") or key.startswith("I-"):
            id2label_map[val] = key[2:]
        else:
            id2label_map[val] = key
    print("id2label_map:", id2label_map)
    # for each text, count the predicted labels
    for idx in range(len(segment_offset_id)):
        if idx == 0:
            start_id = 0
        else:
            start_id = segment_offset_id[idx - 1]
        end_id = segment_offset_id[idx]
        # take the slice of the output belonging to this text
        curr_pred = preds[start_id:end_id]
        # map the predictions of all its segments to label ids
        curr_pred = [label2id_map_for_draw[p] for p in curr_pred]
        if len(curr_pred) <= 0:
            pred_id = 0
        else:
            # count each label and keep the most frequent one
            counts = np.bincount(curr_pred)
            pred_id = np.argmax(counts)
        ocr_info[idx]["pred_id"] = int(pred_id)
        ocr_info[idx]["pred"] = id2label_map[int(pred_id)]
    return ocr_info

preds = postprocess(inputs["attention_mask"], outputs[0], id2label_map)
# map each I- label to the id of the corresponding B- label
label2id_map_for_draw = dict()
for key in label2id_map:
    if key.startswith("I-"):
        label2id_map_for_draw[key] = label2id_map["B" + key[1:]]
    else:
        label2id_map_for_draw[key] = label2id_map[key]
print("label2id_map:", label2id_map)
print("label2id_map_for_draw:", label2id_map_for_draw)
# merge the prediction info with the OCR info
ocr_info_with_ser = merge_preds_list_with_ocr_info(ocr_info, inputs["segment_offset_id"], preds, label2id_map_for_draw)
print(ocr_info_with_ser)
```
  1. label2id_map: {'O': 0, 'B-QUESTION': 1, 'I-QUESTION': 2, 'B-ANSWER': 3, 'I-ANSWER': 4, 'B-HEADER': 5, 'I-HEADER': 6}
  2. label2id_map_for_draw: {'O': 0, 'B-QUESTION': 1, 'I-QUESTION': 1, 'B-ANSWER': 3, 'I-ANSWER': 3, 'B-HEADER': 5, 'I-HEADER': 5}
  3. id2label_map: {0: 'O', 1: 'QUESTION', 3: 'ANSWER', 5: 'HEADER'}
  4. [{'text': '个人信息登记表', 'bbox': [1026.0, 292.0, 1495.0, 377.0], 'poly': [[1027.0, 292.0], [1495.0, 300.0], [1494.0, 377.0], [1026.0, 369.0]], 'pred_id': 5, 'pred': 'HEADER'}, {'text': '申报学院(部门):', 'bbox': [207.0, 424.0, 587.0, 475.0], 'poly': [[207.0, 424.0], [587.0, 424.0], [587.0, 475.0], [207.0, 475.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '出生', 'bbox': [1144.0, 526.0, 1218.0, 566.0], 'poly': [[1144.0, 526.0], [1218.0, 526.0], [1218.0, 566.0], [1144.0, 566.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '政治', 'bbox': [1616.0, 530.0, 1709.0, 570.0], 'poly': [[1616.0, 530.0], [1709.0, 530.0], [1709.0, 570.0], [1616.0, 570.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '1997年12月17日「面貌', 'bbox': [1298.0, 558.0, 1713.0, 644.0], 'poly': [[1301.0, 558.0], [1713.0, 571.0], [1711.0, 644.0], [1298.0, 631.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '岳欣欣', 'bbox': [491.0, 559.0, 653.0, 614.0], 'poly': [[491.0, 559.0], [653.0, 559.0], [653.0, 614.0], [491.0, 614.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '性别', 'bbox': [805.0, 559.0, 908.0, 618.0], 'poly': [[805.0, 559.0], [908.0, 559.0], [908.0, 618.0], [805.0, 618.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '群众', 'bbox': [1801.0, 552.0, 1886.0, 614.0], 'poly': [[1801.0, 552.0], [1886.0, 552.0], [1886.0, 614.0], [1801.0, 614.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '女', 'bbox': [1008.0, 563.0, 1070.0, 610.0], 'poly': [[1008.0, 563.0], [1070.0, 563.0], [1070.0, 610.0], [1008.0, 610.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '(拼音)', 'bbox': [207.0, 577.0, 354.0, 629.0], 'poly': [[207.0, 577.0], [354.0, 577.0], [354.0, 629.0], [207.0, 629.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '日期', 'bbox': [1126.0, 577.0, 1222.0, 632.0], 'poly': [[1126.0, 577.0], [1222.0, 577.0], [1222.0, 632.0], [1126.0, 632.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '婚育「', 'bbox': [1120.0, 634.0, 1272.0, 714.0], 'poly': [[1127.0, 634.0], [1272.0, 649.0], [1265.0, 714.0], [1120.0, 699.0]], 'pred_id': 1, 'pred': 
'QUESTION'}, {'text': '生源地/培养形式', 'bbox': [1506.0, 647.0, 1790.0, 698.0], 'poly': [[1506.0, 647.0], [1790.0, 647.0], [1790.0, 698.0], [1506.0, 698.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '未婚', 'bbox': [1270.0, 654.0, 1369.0, 709.0], 'poly': [[1270.0, 654.0], [1369.0, 654.0], [1369.0, 709.0], [1270.0, 709.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '寸报名照', 'bbox': [2140.0, 650.0, 2273.0, 702.0], 'poly': [[2140.0, 650.0], [2273.0, 650.0], [2273.0, 702.0], [2140.0, 702.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '中国', 'bbox': [443.0, 661.0, 528.0, 720.0], 'poly': [[443.0, 661.0], [528.0, 661.0], [528.0, 720.0], [443.0, 720.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '国籍', 'bbox': [244.0, 672.0, 336.0, 731.0], 'poly': [[244.0, 672.0], [336.0, 672.0], [336.0, 731.0], [244.0, 731.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '民族', 'bbox': [812.0, 672.0, 904.0, 734.0], 'poly': [[812.0, 672.0], [904.0, 672.0], [904.0, 734.0], [812.0, 734.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省西丰县', 'bbox': [1841.0, 668.0, 2064.0, 731.0], 'poly': [[1843.0, 668.0], [2064.0, 677.0], [2062.0, 731.0], [1841.0, 723.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '状况', 'bbox': [1137.0, 713.0, 1218.0, 756.0], 'poly': [[1137.0, 713.0], [1218.0, 713.0], [1218.0, 756.0], [1137.0, 756.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '(应届毕业生填写)', 'bbox': [1528.0, 720.0, 1782.0, 760.0], 'poly': [[1528.0, 720.0], [1782.0, 720.0], [1782.0, 760.0], [1528.0, 760.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '粘贴处', 'bbox': [2140.0, 716.0, 2251.0, 767.0], 'poly': [[2140.0, 716.0], [2251.0, 716.0], [2251.0, 767.0], [2140.0, 767.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '现工作(学习)', 'bbox': [192.0, 767.0, 484.0, 822.0], 'poly': [[192.0, 767.0], [484.0, 771.0], [483.0, 822.0], [192.0, 818.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '大连海事大学', 'bbox': [542.0, 781.0, 794.0, 841.0], 'poly': [[544.0, 781.0], [794.0, 790.0], [792.0, 841.0], [542.0, 833.0]], 'pred_id': 3, 'pred': 'ANSWER'}, 
{'text': '单位', 'bbox': [967.0, 775.0, 1074.0, 829.0], 'poly': [[967.0, 775.0], [1074.0, 775.0], [1074.0, 829.0], [967.0, 829.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市甘井子区凌海路3号', 'bbox': [1118.0, 789.0, 1701.0, 844.0], 'poly': [[1118.0, 789.0], [1701.0, 793.0], [1701.0, 844.0], [1118.0, 840.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '邮编', 'bbox': [1727.0, 819.0, 1808.0, 862.0], 'poly': [[1727.0, 819.0], [1808.0, 819.0], [1808.0, 862.0], [1727.0, 862.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '100085', 'bbox': [1893.0, 815.0, 2004.0, 866.0], 'poly': [[1893.0, 815.0], [2004.0, 815.0], [2004.0, 866.0], [1893.0, 866.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '单位', 'bbox': [295.0, 833.0, 391.0, 888.0], 'poly': [[295.0, 833.0], [391.0, 833.0], [391.0, 888.0], [295.0, 888.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '地址', 'bbox': [967.0, 844.0, 1055.0, 888.0], 'poly': [[967.0, 844.0], [1055.0, 844.0], [1055.0, 888.0], [967.0, 888.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '现专业技术职', 'bbox': [207.0, 906.0, 454.0, 957.0], 'poly': [[207.0, 906.0], [454.0, 906.0], [454.0, 957.0], [207.0, 957.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '专技', 'bbox': [974.0, 910.0, 1052.0, 961.0], 'poly': [[974.0, 910.0], [1052.0, 910.0], [1052.0, 961.0], [974.0, 961.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '现任', 'bbox': [1524.0, 917.0, 1620.0, 961.0], 'poly': [[1524.0, 917.0], [1620.0, 917.0], [1620.0, 961.0], [1524.0, 961.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '务及评定时间', 'bbox': [207.0, 968.0, 447.0, 1020.0], 'poly': [[207.0, 968.0], [447.0, 968.0], [447.0, 1020.0], [207.0, 1020.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '等级', 'bbox': [967.0, 968.0, 1052.0, 1023.0], 'poly': [[967.0, 968.0], [1052.0, 968.0], [1052.0, 1023.0], [967.0, 1023.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '行政职务', 'bbox': [1484.0, 976.0, 1653.0, 1027.0], 'poly': [[1484.0, 976.0], [1653.0, 976.0], [1653.0, 1027.0], [1484.0, 1027.0]], 'pred_id': 1, 'pred': 'QUESTION'}, 
{'text': '学科专业', 'bbox': [244.0, 1045.0, 421.0, 1096.0], 'poly': [[244.0, 1045.0], [421.0, 1045.0], [421.0, 1096.0], [244.0, 1096.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '船只制造', 'bbox': [576.0, 1038.0, 749.0, 1093.0], 'poly': [[576.0, 1038.0], [749.0, 1038.0], [749.0, 1093.0], [576.0, 1093.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '主要学术兼职', 'bbox': [1136.0, 1048.0, 1385.0, 1111.0], 'poly': [[1138.0, 1048.0], [1385.0, 1057.0], [1383.0, 1111.0], [1136.0, 1103.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '无', 'bbox': [1546.0, 1052.0, 1624.0, 1104.0], 'poly': [[1546.0, 1052.0], [1624.0, 1052.0], [1624.0, 1104.0], [1546.0, 1104.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '研究生', 'bbox': [450.0, 1125.0, 576.0, 1177.0], 'poly': [[450.0, 1125.0], [576.0, 1125.0], [576.0, 1177.0], [450.0, 1177.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '最高学历', 'bbox': [202.0, 1128.0, 366.0, 1192.0], 'poly': [[205.0, 1128.0], [366.0, 1137.0], [363.0, 1192.0], [202.0, 1183.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '毕业单位', 'bbox': [870.0, 1124.0, 1049.0, 1200.0], 'poly': [[873.0, 1124.0], [1049.0, 1134.0], [1046.0, 1200.0], [870.0, 1190.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '大连海事大学', 'bbox': [1192.0, 1136.0, 1439.0, 1188.0], 'poly': [[1192.0, 1136.0], [1439.0, 1136.0], [1439.0, 1188.0], [1192.0, 1188.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '毕业时间', 'bbox': [1890.0, 1144.0, 2056.0, 1195.0], 'poly': [[1890.0, 1144.0], [2056.0, 1144.0], [2056.0, 1195.0], [1890.0, 1195.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '2019年6月', 'bbox': [2148.0, 1147.0, 2314.0, 1199.0], 'poly': [[2148.0, 1147.0], [2314.0, 1147.0], [2314.0, 1199.0], [2148.0, 1199.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '获得单位', 'bbox': [869.0, 1204.0, 1050.0, 1284.0], 'poly': [[875.0, 1204.0], [1050.0, 1218.0], [1044.0, 1284.0], [869.0, 1270.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '最高学位', 'bbox': [207.0, 1224.0, 369.0, 1275.0], 'poly': [[207.0, 1224.0], [369.0, 1224.0], 
[369.0, 1275.0], [207.0, 1275.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '硕士学位', 'bbox': [465.0, 1217.0, 624.0, 1272.0], 'poly': [[465.0, 1217.0], [624.0, 1217.0], [624.0, 1272.0], [465.0, 1272.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '大连海事大学', 'bbox': [1188.0, 1231.0, 1428.0, 1283.0], 'poly': [[1188.0, 1231.0], [1428.0, 1231.0], [1428.0, 1283.0], [1188.0, 1283.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '获得时间', 'bbox': [1890.0, 1235.0, 2059.0, 1286.0], 'poly': [[1890.0, 1235.0], [2059.0, 1235.0], [2059.0, 1286.0], [1890.0, 1286.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '2019年6月', 'bbox': [2152.0, 1250.0, 2325.0, 1290.0], 'poly': [[2152.0, 1250.0], [2325.0, 1250.0], [2325.0, 1290.0], [2152.0, 1290.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '「证件号码', 'bbox': [619.0, 1304.0, 813.0, 1375.0], 'poly': [[622.0, 1304.0], [813.0, 1313.0], [810.0, 1375.0], [619.0, 1366.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '证件类型', 'bbox': [202.0, 1307.0, 385.0, 1371.0], 'poly': [[205.0, 1307.0], [385.0, 1316.0], [382.0, 1371.0], [202.0, 1362.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '学位证书', 'bbox': [454.0, 1308.0, 661.0, 1370.0], 'poly': [[454.0, 1308.0], [661.0, 1308.0], [661.0, 1370.0], [454.0, 1370.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '047854648486893', 'bbox': [885.0, 1300.0, 1163.0, 1356.0], 'poly': [[887.0, 1300.0], [1163.0, 1309.0], [1162.0, 1356.0], [885.0, 1348.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '户籍详址', 'bbox': [1395.0, 1319.0, 1572.0, 1370.0], 'poly': [[1395.0, 1319.0], [1572.0, 1319.0], [1572.0, 1370.0], [1395.0, 1370.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省西丰县安民镇永淳村二组62号', 'bbox': [1635.0, 1319.0, 2270.0, 1374.0], 'poly': [[1635.0, 1319.0], [2270.0, 1323.0], [2269.0, 1374.0], [1635.0, 1370.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '档案所在', 'bbox': [207.0, 1389.0, 395.0, 1447.0], 'poly': [[207.0, 1389.0], [395.0, 1389.0], [395.0, 1447.0], [207.0, 1447.0]], 'pred_id': 1, 'pred': 'QUESTION'}, 
{'text': '大连海事大学', 'bbox': [483.0, 1388.0, 720.0, 1459.0], 'poly': [[485.0, 1388.0], [720.0, 1396.0], [718.0, 1459.0], [483.0, 1450.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '13585662395', 'bbox': [1634.0, 1409.0, 1843.0, 1474.0], 'poly': [[1637.0, 1409.0], [1843.0, 1422.0], [1839.0, 1474.0], [1634.0, 1461.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '联系电话', 'bbox': [1402.0, 1429.0, 1568.0, 1484.0], 'poly': [[1402.0, 1429.0], [1568.0, 1429.0], [1568.0, 1484.0], [1402.0, 1484.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '100085', 'bbox': [2096.0, 1421.0, 2218.0, 1484.0], 'poly': [[2096.0, 1421.0], [2218.0, 1421.0], [2218.0, 1484.0], [2096.0, 1484.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '邮编', 'bbox': [1971.0, 1432.0, 2063.0, 1487.0], 'poly': [[1971.0, 1432.0], [2063.0, 1432.0], [2063.0, 1487.0], [1971.0, 1487.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市廿井子区凌海路3号', 'bbox': [480.0, 1443.0, 1059.0, 1506.0], 'poly': [[480.0, 1443.0], [1059.0, 1443.0], [1059.0, 1506.0], [480.0, 1506.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '单位及地址', 'bbox': [203.0, 1454.0, 413.0, 1506.0], 'poly': [[203.0, 1454.0], [413.0, 1454.0], [413.0, 1506.0], [203.0, 1506.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '有否境外永居', 'bbox': [196.0, 1531.0, 439.0, 1582.0], 'poly': [[196.0, 1531.0], [439.0, 1531.0], [439.0, 1582.0], [196.0, 1582.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '有无上海市居住证', 'bbox': [971.0, 1535.0, 1295.0, 1586.0], 'poly': [[971.0, 1535.0], [1295.0, 1535.0], [1295.0, 1586.0], [971.0, 1586.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '无', 'bbox': [1458.0, 1532.0, 1524.0, 1593.0], 'poly': [[1471.0, 1532.0], [1524.0, 1545.0], [1511.0, 1593.0], [1458.0, 1580.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '居住证有效期', 'bbox': [1646.0, 1542.0, 1901.0, 1593.0], 'poly': [[1646.0, 1542.0], [1901.0, 1542.0], [1901.0, 1593.0], [1646.0, 1593.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '本人', 'bbox': [198.0, 1603.0, 322.0, 1668.0], 'poly': [[202.0, 
1603.0], [322.0, 1613.0], [318.0, 1668.0], [198.0, 1658.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '13585662395', 'bbox': [646.0, 1608.0, 856.0, 1659.0], 'poly': [[646.0, 1608.0], [856.0, 1608.0], [856.0, 1659.0], [646.0, 1659.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '联系电话', 'bbox': [358.0, 1619.0, 542.0, 1670.0], 'poly': [[358.0, 1619.0], [542.0, 1619.0], [542.0, 1670.0], [358.0, 1670.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '肖电手命箱', 'bbox': [1373.0, 1626.0, 1572.0, 1677.0], 'poly': [[1373.0, 1626.0], [1572.0, 1626.0], [1572.0, 1677.0], [1373.0, 1677.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '联系电', 'bbox': [66.0, 1659.0, 173.0, 1714.0], 'poly': [[66.0, 1659.0], [173.0, 1659.0], [173.0, 1714.0], [66.0, 1714.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '联关亲', 'bbox': [168.0, 1658.0, 308.0, 1733.0], 'poly': [[173.0, 1658.0], [308.0, 1668.0], [303.0, 1733.0], [168.0, 1723.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市甘井子区凌海路3号', 'bbox': [631.0, 1695.0, 1229.0, 1769.0], 'poly': [[632.0, 1695.0], [1229.0, 1707.0], [1228.0, 1769.0], [631.0, 1757.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '通讯地址', 'bbox': [364.0, 1709.0, 532.0, 1773.0], 'poly': [[367.0, 1709.0], [532.0, 1718.0], [529.0, 1773.0], [364.0, 1764.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '方式', 'bbox': [196.0, 1728.0, 303.0, 1787.0], 'poly': [[196.0, 1728.0], [303.0, 1728.0], [303.0, 1787.0], [196.0, 1787.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '邮编', 'bbox': [1982.0, 1732.0, 2048.0, 1776.0], 'poly': [[1982.0, 1732.0], [2048.0, 1732.0], [2048.0, 1776.0], [1982.0, 1776.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '110008市层住证', 'bbox': [2135.0, 1727.0, 2445.0, 1818.0], 'poly': [[2141.0, 1727.0], [2445.0, 1756.0], [2439.0, 1818.0], [2135.0, 1789.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '有香境外永归', 'bbox': [1351.0, 1758.0, 1554.0, 1794.0], 'poly': [[1351.0, 1758.0], [1554.0, 1758.0], [1554.0, 1794.0], [1351.0, 1794.0]], 'pred_id': 0, 'pred': 'O'}, {'text': 
'起始年月', 'bbox': [362.0, 1812.0, 542.0, 1864.0], 'poly': [[362.0, 1812.0], [542.0, 1812.0], [542.0, 1864.0], [362.0, 1864.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '终止年月', 'bbox': [594.0, 1820.0, 775.0, 1860.0], 'poly': [[594.0, 1820.0], [775.0, 1820.0], [775.0, 1860.0], [594.0, 1860.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '学校或工作单位', 'bbox': [993.0, 1823.0, 1270.0, 1864.0], 'poly': [[993.0, 1823.0], [1270.0, 1823.0], [1270.0, 1864.0], [993.0, 1864.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '学习或任职情况', 'bbox': [1767.0, 1820.0, 2063.0, 1875.0], 'poly': [[1768.0, 1820.0], [2063.0, 1824.0], [2063.0, 1875.0], [1767.0, 1871.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '备注', 'bbox': [2181.0, 1823.0, 2273.0, 1878.0], 'poly': [[2181.0, 1823.0], [2273.0, 1823.0], [2273.0, 1878.0], [2181.0, 1878.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '联系', 'bbox': [1362.0, 1904.0, 1432.0, 1944.0], 'poly': [[1362.0, 1904.0], [1432.0, 1904.0], [1432.0, 1944.0], [1362.0, 1944.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市甘', 'bbox': [1808.0, 1933.0, 2085.0, 1973.0], 'poly': [[1808.0, 1933.0], [2085.0, 1933.0], [2085.0, 1973.0], [1808.0, 1973.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '凌海', 'bbox': [2181.0, 1937.0, 2288.0, 1977.0], 'poly': [[2181.0, 1937.0], [2288.0, 1937.0], [2288.0, 1977.0], [2181.0, 1977.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '通讯地士', 'bbox': [1520.0, 1948.0, 1646.0, 1984.0], 'poly': [[1520.0, 1948.0], [1646.0, 1948.0], [1646.0, 1984.0], [1520.0, 1984.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '简历', 'bbox': [225.0, 2094.0, 269.0, 2167.0], 'poly': [[225.0, 2094.0], [269.0, 2094.0], [269.0, 2167.0], [225.0, 2167.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '顺序填记', 'bbox': [225.0, 2361.0, 262.0, 2529.0], 'poly': [[225.0, 2361.0], [262.0, 2361.0], [262.0, 2529.0], [225.0, 2529.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '平火臂防醉间上要连续:', 'bbox': [465.0, 2594.0, 816.0, 2650.0], 'poly': [[466.0, 2594.0], [816.0, 2602.0], [815.0, 
2650.0], [465.0, 2642.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '大专家项博气治》任职岗位、访问学者等实际情况', 'bbox': [1107.0, 2601.0, 1945.0, 2664.0], 'poly': [[1108.0, 2601.0], [1945.0, 2617.0], [1944.0, 2664.0], [1107.0, 2649.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '直博', 'bbox': [210.0, 2642.0, 306.0, 2686.0], 'poly': [[210.0, 2642.0], [306.0, 2642.0], [306.0, 2686.0], [210.0, 2686.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '茶职、张超等经房理,无就迎或学习器明的价段需填写“待业”或注明原', 'bbox': [406.0, 2711.0, 1845.0, 2785.0], 'poly': [[406.0, 2711.0], [1845.0, 2726.0], [1845.0, 2785.0], [406.0, 2770.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '看水书名称', 'bbox': [517.0, 2799.0, 775.0, 2850.0], 'poly': [[517.0, 2799.0], [775.0, 2799.0], [775.0, 2850.0], [517.0, 2850.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '写作菜', 'bbox': [1140.0, 2806.0, 1244.0, 2854.0], 'poly': [[1140.0, 2806.0], [1244.0, 2806.0], [1244.0, 2854.0], [1140.0, 2854.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '等级', 'bbox': [1941.0, 2814.0, 2030.0, 2858.0], 'poly': [[1941.0, 2814.0], [2030.0, 2814.0], [2030.0, 2858.0], [1941.0, 2858.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '语种', 'bbox': [369.0, 2825.0, 465.0, 2879.0], 'poly': [[369.0, 2825.0], [465.0, 2825.0], [465.0, 2879.0], [369.0, 2879.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '口语水平', 'bbox': [900.0, 2828.0, 1070.0, 2879.0], 'poly': [[900.0, 2828.0], [1070.0, 2828.0], [1070.0, 2879.0], [900.0, 2879.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '女名积得时间', 'bbox': [1581.0, 2808.0, 1847.0, 2896.0], 'poly': [[1581.0, 2829.0], [1842.0, 2808.0], [1847.0, 2875.0], [1587.0, 2896.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '取得时间。', 'bbox': [2117.0, 2827.0, 2304.0, 2906.0], 'poly': [[2122.0, 2827.0], [2304.0, 2841.0], [2299.0, 2906.0], [2117.0, 2893.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '及或货', 'bbox': [203.0, 2843.0, 280.0, 2883.0], 'poly': [[203.0, 2843.0], [280.0, 2843.0], [280.0, 2883.0], [203.0, 2883.0]], 'pred_id': 0, 'pred': 'O'}, {'text': 
'(等级城分敷)', 'bbox': [546.0, 2861.0, 830.0, 2912.0], 'poly': [[546.0, 2861.0], [830.0, 2861.0], [830.0, 2912.0], [546.0, 2912.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '水平', 'bbox': [1140.0, 2869.0, 1236.0, 2912.0], 'poly': [[1140.0, 2869.0], [1236.0, 2869.0], [1236.0, 2912.0], [1140.0, 2912.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '或分数', 'bbox': [1919.0, 2872.0, 2045.0, 2923.0], 'poly': [[1919.0, 2872.0], [2045.0, 2872.0], [2045.0, 2923.0], [1919.0, 2923.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '语', 'bbox': [218.0, 2898.0, 288.0, 2960.0], 'poly': [[218.0, 2898.0], [288.0, 2898.0], [288.0, 2960.0], [218.0, 2960.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '其他', 'bbox': [1310.0, 2901.0, 1399.0, 2960.0], 'poly': [[1310.0, 2901.0], [1399.0, 2901.0], [1399.0, 2960.0], [1310.0, 2960.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '英语', 'bbox': [365.0, 2934.0, 480.0, 2989.0], 'poly': [[365.0, 2934.0], [480.0, 2934.0], [480.0, 2989.0], [365.0, 2989.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '六级', 'bbox': [661.0, 2931.0, 760.0, 2985.0], 'poly': [[661.0, 2931.0], [760.0, 2931.0], [760.0, 2985.0], [661.0, 2985.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '育能', 'bbox': [218.0, 2949.0, 258.0, 3033.0], 'poly': [[218.0, 2949.0], [258.0, 2949.0], [258.0, 3033.0], [218.0, 3033.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '资格', 'bbox': [1306.0, 2964.0, 1395.0, 3026.0], 'poly': [[1306.0, 2964.0], [1395.0, 2964.0], [1395.0, 3026.0], [1306.0, 3026.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '证书', 'bbox': [1306.0, 3026.0, 1402.0, 3084.0], 'poly': [[1306.0, 3026.0], [1402.0, 3026.0], [1402.0, 3084.0], [1306.0, 3084.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '专主,错/号作水平分为:精通、熟练、一般。', 'bbox': [346.0, 3102.0, 1015.0, 3169.0], 'poly': [[348.0, 3102.0], [1015.0, 3117.0], [1014.0, 3169.0], [346.0, 3153.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '1-3', 'bbox': [1210.0, 3270.0, 1262.0, 3318.0], 'poly': [[1210.0, 3270.0], [1262.0, 3270.0], [1262.0, 3318.0], [1210.0, 3318.0]], 
'pred_id': 0, 'pred': 'O'}]

3. Training

This section uses the XFUN Chinese dataset as an example to show how to train, evaluate, and test the SER model.

3.1 Data Preparation

We use the XFUND dataset as the experimental dataset.
XFUND is a multilingual dataset for KIE tasks released by Microsoft. It contains seven subsets, one per language, each with 149 training images and 50 validation images:

  • ZH (Chinese)
  • JA (Japanese)
  • ES (Spanish)
  • FR (French)
  • IT (Italian)
  • DE (German)
  • PT (Portuguese)

This experiment uses the Chinese subset as the demo dataset; the French subset serves as the dataset for the hands-on course. Samples of both are shown below.
Figure 2: dataset samples, Chinese (left) and French (right)
Run the following commands to download and extract the Chinese dataset, or download it yourself from https://github.com/doc-analysis/XFUND.

```python
! wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
! tar -xf XFUND.tar
# the other XFUN language subsets can be converted with this script:
# https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppstructure/vqa/helper/trans_xfun_data.py
```

```
File XFUND.tar already there; not retrieving.
```

After running the commands above, the /home/aistudio/PaddleOCR/ppstructure/vqa/XFUND directory contains two folders, structured as follows:

```
/home/aistudio/PaddleOCR/ppstructure/vqa/XFUND
├── zh_train/                      # training set
│   ├── image/                     # image folder
│   └── xfun_normalize_train.json  # annotations
└── zh_val/                        # validation set
    ├── image/                     # image folder
    └── xfun_normalize_val.json    # annotations
```

The annotation format of this dataset is:

```python
{
    "height": 3508,  # image height
    "width": 2480,   # image width
    "ocr_info": [
        {
            "text": "邮政地址:",            # content of a single text
            "label": "question",           # category of the text
            "bbox": [261, 802, 483, 859],  # bounding box of the text
            "id": 54,                      # text index
            "linking": [[54, 60]],         # links between this text and others, as [question, answer]
            "words": []
        },
        {
            "text": "湖南省怀化市市辖区",
            "label": "answer",
            "bbox": [487, 810, 862, 859],
            "id": 60,
            "linking": [[54, 60]],
            "words": []
        }
    ]
}
```
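Given an annotation dict in the format above, the `linking` field pairs question texts with answer texts. A minimal sketch (the helper name `extract_qa_pairs` is hypothetical, not part of PaddleOCR):

```python
def extract_qa_pairs(ann):
    # index texts by id, then resolve [question_id, answer_id] links
    by_id = {info["id"]: info for info in ann["ocr_info"]}
    pairs = []
    for info in ann["ocr_info"]:
        if info["label"] != "question":
            continue
        for q_id, a_id in info["linking"]:
            if q_id == info["id"] and a_id in by_id:
                pairs.append((info["text"], by_id[a_id]["text"]))
    return pairs

# the sample annotation from above
ann = {
    "height": 3508, "width": 2480,
    "ocr_info": [
        {"text": "邮政地址:", "label": "question", "bbox": [261, 802, 483, 859],
         "id": 54, "linking": [[54, 60]], "words": []},
        {"text": "湖南省怀化市市辖区", "label": "answer", "bbox": [487, 810, 862, 859],
         "id": 60, "linking": [[54, 60]], "words": []},
    ],
}
print(extract_qa_pairs(ann))  # → [('邮政地址:', '湖南省怀化市市辖区')]
```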

3.2 Loss Function

Since SER is a multi-class token classification task, CrossEntropyLoss is used as the loss.
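The masked token-level cross entropy from the model code can be sketched in plain numpy. This is an illustrative re-implementation with toy values, not the actual training loss (which uses `paddle.nn.CrossEntropyLoss` over positions where `attention_mask == 1`):

```python
import numpy as np

def masked_cross_entropy(logits, labels, attention_mask):
    # flatten, keep only real-token positions, then average -log softmax prob
    logits = logits.reshape(-1, logits.shape[-1])
    labels = labels.reshape(-1)
    keep = attention_mask.reshape(-1) == 1
    logits, labels = logits[keep], labels[keep]
    # log-softmax with max subtraction for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# toy batch: 1 sample, 3 tokens (the last one is padding), 2 classes
logits = np.array([[[2.0, 0.0], [0.0, 2.0], [5.0, 5.0]]])
labels = np.array([[0, 1, 0]])
mask = np.array([[1, 1, 0]])
print(round(masked_cross_entropy(logits, labels, mask), 4))  # → 0.1269
```

Note that the padded third token contributes nothing to the loss, mirroring the `active_loss` filtering in `LayoutXLMForTokenClassification.forward`.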

3.3 Model Training

With data processing and the loss function in place, the model can be trained.
The training command is:

```python
! python train_ser.py \
    --model_name_or_path "layoutxlm-base-uncased" \
    --ser_model_type "LayoutXLM" \
    --train_data_dir "XFUND/zh_train/image" \
    --train_label_path "XFUND/zh_train/xfun_normalize_train.json" \
    --eval_data_dir "XFUND/zh_val/image" \
    --eval_label_path "XFUND/zh_val/xfun_normalize_val.json" \
    --per_gpu_train_batch_size 1 \
    --per_gpu_eval_batch_size 1 \
    --num_train_epochs 200 \
    --eval_steps 10 \
    --output_dir "./output/ser/" \
    --learning_rate 5e-5 \
    --warmup_steps 50 \
    --evaluate_during_training \
    --num_workers 0 \
    --seed 2048
```
```
[2021/12/25 21:12:05] root INFO: ----------- Configuration Arguments -----------
[2021/12/25 21:12:05] root INFO: adam_epsilon: 1e-08
[2021/12/25 21:12:05] root INFO: det_model_dir: None
[2021/12/25 21:12:05] root INFO: eval_data_dir: XFUND/zh_val/image
[2021/12/25 21:12:05] root INFO: eval_label_path: XFUND/zh_val/xfun_normalize_val.json
[2021/12/25 21:12:05] root INFO: eval_steps: 10
[2021/12/25 21:12:05] root INFO: evaluate_during_training: True
[2021/12/25 21:12:05] root INFO: infer_imgs: None
[2021/12/25 21:12:05] root INFO: label_map_path: ./labels/labels_ser.txt
[2021/12/25 21:12:05] root INFO: learning_rate: 5e-05
[2021/12/25 21:12:05] root INFO: max_grad_norm: 1.0
[2021/12/25 21:12:05] root INFO: max_seq_length: 512
[2021/12/25 21:12:05] root INFO: model_name_or_path: layoutxlm-base-uncased
[2021/12/25 21:12:05] root INFO: num_train_epochs: 200
[2021/12/25 21:12:05] root INFO: num_workers: 0
[2021/12/25 21:12:05] root INFO: ocr_json_path: None
[2021/12/25 21:12:05] root INFO: output_dir: ./output/ser/
[2021/12/25 21:12:05] root INFO: per_gpu_eval_batch_size: 1
[2021/12/25 21:12:05] root INFO: per_gpu_train_batch_size: 1
[2021/12/25 21:12:05] root INFO: re_model_name_or_path: None
[2021/12/25 21:12:05] root INFO: rec_model_dir: None
[2021/12/25 21:12:05] root INFO: resume: False
[2021/12/25 21:12:05] root INFO: seed: 2048
[2021/12/25 21:12:05] root INFO: ser_model_type: LayoutXLM
[2021/12/25 21:12:05] root INFO: train_data_dir: XFUND/zh_train/image
[2021/12/25 21:12:05] root INFO: train_label_path: XFUND/zh_train/xfun_normalize_train.json
[2021/12/25 21:12:05] root INFO: warmup_steps: 50
[2021/12/25 21:12:05] root INFO: weight_decay: 0.0
[2021/12/25 21:12:05] root INFO: ------------------------------------------------
[2021-12-25 21:12:05,398] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2021-12-25 21:12:05,398] [ INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 50.2MB/s]
[2021-12-25 21:12:06,284] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/model_state.pdparams and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2021-12-25 21:12:06,285] [ INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/model_state.pdparams
100%|██████████████████████████████████████| 1.37G/1.37G [00:22<00:00, 67.1MB/s]
W1225 21:12:28.457448 1236 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1225 21:12:28.461452 1236 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021/12/25 21:12:39] root INFO: train from scratch
[2021/12/25 21:12:40] root INFO: ***** Running training *****
[2021/12/25 21:12:40] root INFO: Num examples = 149
[2021/12/25 21:12:40] root INFO: Num Epochs = 200
[2021/12/25 21:12:40] root INFO: Instantaneous batch size per GPU = 1
[2021/12/25 21:12:40] root INFO: Total train batch size (w. parallel, distributed) = 1
[2021/12/25 21:12:40] root INFO: Total optimization steps = 29800
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [0/149], global_step:1, train loss: 1.922090, lr: 0.000001, avg_reader_cost: 0.60275 sec, avg_batch_cost: 0.71423 sec, avg_samples: 1.00000, ips: 1.40012 images/sec
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [1/149], global_step:2, train loss: 1.916964, lr: 0.000002, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.11029 sec, avg_samples: 1.00000, ips: 9.06733 images/sec
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [2/149], global_step:3, train loss: 1.950876, lr: 0.000003, avg_reader_cost: 0.00013 sec, avg_batch_cost: 0.08194 sec, avg_samples: 1.00000, ips: 12.20463 images/sec
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [3/149], global_step:4, train loss: 1.901511, lr: 0.000004, avg_reader_cost: 0.00011 sec, avg_batch_cost: 0.08840 sec, avg_samples: 1.00000, ips: 11.31248 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [4/149], global_step:5, train loss: 1.868079, lr: 0.000005, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.08239 sec, avg_samples: 1.00000, ips: 12.13703 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [5/149], global_step:6, train loss: 1.754278, lr: 0.000006, avg_reader_cost: 0.00013 sec, avg_batch_cost: 0.08332 sec, avg_samples: 1.00000, ips: 12.00248 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [6/149], global_step:7, train loss: 1.869870, lr: 0.000007, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.08287 sec, avg_samples: 1.00000, ips: 12.06661 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [7/149], global_step:8, train loss: 1.756458, lr: 0.000008, avg_reader_cost: 0.00013 sec, avg_batch_cost: 0.09483 sec, avg_samples: 1.00000, ips: 10.54489 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [8/149], global_step:9, train loss: 1.645109, lr: 0.000009, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.08488 sec, avg_samples: 1.00000, ips: 11.78179 images/sec
[2021/12/25 21:12:43] root INFO: epoch: [0/200], iter: [9/149], global_step:10, train loss: 1.599218, lr: 0.000010, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.09367 sec, avg_samples: 1.00000, ips: 10.67612 images/sec
[2021/12/25 21:12:43] root INFO: [Eval]process: 0/50, loss: 1.62544
[2021/12/25 21:12:43] root INFO: [Eval]process: 1/50, loss: 1.43953
[2021/12/25 21:12:43] root INFO: [Eval]process: 2/50, loss: 1.52960
[2021/12/25 21:12:43] root INFO: [Eval]process: 3/50, loss: 1.77400
[2021/12/25 21:12:44] root INFO: [Eval]process: 4/50, loss: 1.52048
[2021/12/25 21:12:44] root INFO: [Eval]process: 5/50, loss: 1.62590
[2021/12/25 21:12:44] root INFO: [Eval]process: 6/50, loss: 1.39941
[2021/12/25 21:12:44] root INFO: [Eval]process: 7/50, loss: 1.60583
[2021/12/25 21:12:44] root INFO: [Eval]process: 8/50, loss: 1.60298
[2021/12/25 21:12:44] root INFO: [Eval]process: 9/50, loss: 1.64268
[2021/12/25 21:12:45] root INFO: [Eval]process: 10/50, loss: 1.70920
[2021/12/25 21:12:45] root INFO: [Eval]process: 11/50, loss: 1.49873
```
  67. [2021/12/25 21:12:45] root INFO: [Eval]process: 12/50, loss: 1.69277
  68. [2021/12/25 21:12:45] root INFO: [Eval]process: 13/50, loss: 1.63639
  69. [2021/12/25 21:12:45] root INFO: [Eval]process: 14/50, loss: 1.78201
  70. [2021/12/25 21:12:45] root INFO: [Eval]process: 15/50, loss: 1.48458
  71. [2021/12/25 21:12:45] root INFO: [Eval]process: 16/50, loss: 1.65950
  72. [2021/12/25 21:12:46] root INFO: [Eval]process: 17/50, loss: 1.67226
  73. [2021/12/25 21:12:46] root INFO: [Eval]process: 18/50, loss: 1.65881
  74. [2021/12/25 21:12:46] root INFO: [Eval]process: 19/50, loss: 1.67092
  75. [2021/12/25 21:12:46] root INFO: [Eval]process: 20/50, loss: 1.58707
  76. [2021/12/25 21:12:46] root INFO: [Eval]process: 21/50, loss: 1.77305
  77. [2021/12/25 21:12:46] root INFO: [Eval]process: 22/50, loss: 1.54853
  78. [2021/12/25 21:12:47] root INFO: [Eval]process: 23/50, loss: 1.74551
  79. [2021/12/25 21:12:47] root INFO: [Eval]process: 24/50, loss: 1.63271
  80. [2021/12/25 21:12:47] root INFO: [Eval]process: 25/50, loss: 1.41209
  81. [2021/12/25 21:12:47] root INFO: [Eval]process: 26/50, loss: 1.79890
  82. [2021/12/25 21:12:47] root INFO: [Eval]process: 27/50, loss: 1.59780
  83. ^C
  84. Traceback (most recent call last):
  85. File "train_ser.py", line 248, in <module>
  86. train(args)
  87. File "train_ser.py", line 210, in train
  88. id2label_map, pad_token_label_id, logger)
  89. File "/home/aistudio/PaddleOCR/ppstructure/vqa/eval_ser.py", line 103, in evaluate
  90. for idx, batch in enumerate(eval_dataloader):
  91. File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 255, in __next__
  92. data = self._reader.read_next_var_list()
  93. KeyboardInterrupt
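The per-iteration lines in the training log share a fixed format, so loss and learning-rate curves can be recovered from `train.log` with a little regex parsing. The sketch below is our own helper (not part of PaddleOCR); the field names are taken from the log output above:

```python
import re

# Matches the per-iteration fields in train_ser.py's log lines, e.g.
# "... global_step:1, train loss: 1.922090, lr: 0.000001, ..."
LOG_RE = re.compile(
    r"global_step:(?P<step>\d+), train loss: (?P<loss>[\d.]+), lr: (?P<lr>[\d.]+)"
)

def parse_train_log(lines):
    """Extract (global_step, loss, lr) tuples from training log lines."""
    records = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            records.append(
                (int(m.group("step")), float(m.group("loss")), float(m.group("lr")))
            )
    return records
```

Feeding it the file (`parse_train_log(open("output/ser/train.log"))`) yields a list ready for plotting.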

3.4 Model Evaluation

During training, two models are saved by default: the most recently trained model, named latest, and the highest-accuracy model, named best. The directory structure of the saved models is as follows:

```
output/ser/
├── best_model
│   ├── model_config.json        # model configuration
│   ├── model_state.pdparams     # model weights
│   ├── sentencepiece.bpe.model  # tokenizer model
│   ├── tokenizer_config.json    # tokenizer configuration
│   └── training_args.bin        # arguments used to launch training
├── infer_results.txt
├── latest_model
│   ├── model_config.json
│   ├── model_state.pdparams
│   ├── sentencepiece.bpe.model
│   ├── tokenizer_config.json
│   └── training_args.bin
├── test_gt.txt
├── test_pred.txt
└── train.log                    # training log
```
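Before pointing a downstream script at a checkpoint directory, it can be useful to verify that all of the files listed above are present. The helper below is our own illustration (not part of PaddleOCR), using only the standard library:

```python
from pathlib import Path

# Files each checkpoint directory (best_model / latest_model) is expected
# to contain, per the directory tree above.
REQUIRED_FILES = [
    "model_config.json",
    "model_state.pdparams",
    "sentencepiece.bpe.model",
    "tokenizer_config.json",
    "training_args.bin",
]

def missing_checkpoint_files(ckpt_dir):
    """Return the expected files that are absent from ckpt_dir."""
    ckpt = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (ckpt / name).exists()]
```

An empty return value (e.g. for `"output/ser/best_model"`) means the checkpoint is complete.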

Next, use the saved model weights to evaluate accuracy on the test set:

```python
! python eval_ser.py \
    --model_name_or_path "output/ser/best_model" \
    --ser_model_type "LayoutXLM" \
    --eval_data_dir "XFUND/zh_val/image" \
    --eval_label_path "XFUND/zh_val/xfun_normalize_val.json" \
    --per_gpu_eval_batch_size 8 \
    --num_workers 8 \
    --output_dir "output/ser/" \
    --seed 2048
```
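Evaluation compares the predicted label of each text region against the ground truth (the `test_gt.txt` / `test_pred.txt` files in the output directory hold the two sides). As a rough illustration of how a precision/recall/hmean metric over such label sequences can be computed, here is a simplified token-level sketch of our own, not the exact metric implemented in `eval_ser.py`:

```python
def ser_hmean(gt_labels, pred_labels):
    """Micro precision / recall / hmean over token labels, ignoring 'O'.

    gt_labels and pred_labels are parallel lists of label sequences,
    e.g. [["QUESTION", "ANSWER", "O"], ...].
    """
    tp = fp = fn = 0
    for gt_seq, pred_seq in zip(gt_labels, pred_labels):
        for g, p in zip(gt_seq, pred_seq):
            if p != "O":          # model predicted an entity label
                if p == g:
                    tp += 1
                else:
                    fp += 1
            if g != "O" and p != g:  # a true entity label was missed
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    hmean = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, hmean
```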

3.5 Model Prediction

After training, you can also run inference with the saved model on a single image or a folder of images to inspect the model's predictions.

```python
! python3 infer_ser_e2e.py \
    --model_name_or_path "./inference/PP-Layout_v1.0_ser_pretrained/" \
    --ser_model_type "LayoutXLM" \
    --max_seq_length 512 \
    --output_dir "output/ser_e2e/" \
    --infer_imgs "images/input/zh_val_42.jpg"
```
```
W1225 21:12:57.135272 1310 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1225 21:12:57.138957 1310 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021/12/25 21:13:06] root WARNING: version PP-OCRv2 not support cls models, auto switch to version PP-OCR
Namespace(benchmark=False, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/aistudio/.paddleocr/2.3.0.2/ocr/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/aistudio/.paddleocr/2.3.0.2/ocr/det/ch/ch_PP-OCRv2_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='box', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_polygon=True, e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, ocr_version='PP-OCRv2', output='./output/table', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/home/aistudio/.paddleocr/2.3.0.2/ocr/rec/ch/ch_PP-OCRv2_rec_infer', save_log_path='./log_output/', show_log=False, structure_version='STRUCTURE', table_char_dict_path=None, table_char_type='en', table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=True)
process: [0/1], save result to output/ser_e2e/zh_val_42_ser.jpg
Corrupt JPEG data: premature end of data segment
```
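Here `--infer_imgs` points at a single image. If you prefer to drive inference from Python, a small wrapper that assembles the same command line as above (our own helper, not part of PaddleOCR) might look like this:

```python
import subprocess

def build_infer_cmd(
    infer_path,
    model_dir="./inference/PP-Layout_v1.0_ser_pretrained/",
    output_dir="output/ser_e2e/",
):
    """Assemble the infer_ser_e2e.py command line used above.

    infer_path may be a single image or an image folder.
    """
    return [
        "python3", "infer_ser_e2e.py",
        "--model_name_or_path", model_dir,
        "--ser_model_type", "LayoutXLM",
        "--max_seq_length", "512",
        "--output_dir", output_dir,
        "--infer_imgs", infer_path,
    ]

def run_infer(infer_path):
    """Launch inference and raise if the script exits with an error."""
    subprocess.run(build_infer_cmd(infer_path), check=True)
```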

4 Assignment

Exercise
[https://aistudio.baidu.com/aistudio/projectdetail/3281385](