DOC-VQA SER in Practice

This section introduces how to train and run the DOC-VQA SER algorithm with PaddleOCR, covering:

  1. Understanding the principles of the DOC-VQA SER algorithm
  2. Mastering the training workflow of the DOC-VQA SER code in PaddleOCR

1. Quick Start
Prepare the code and environment:

```python
# clone the PaddleOCR code
! git clone https://github.com/PaddlePaddle/PaddleOCR

# install dependencies
! pip install -U pip
! pip install -r /home/aistudio/PaddleOCR/requirements.txt
! pip install paddleocr

# install additional dependencies
! pip install yacs gnureadline paddlenlp==2.2.1
```


```python
# switch to the vqa directory
import os
os.chdir('/home/aistudio/PaddleOCR/ppstructure/vqa')
```

```python
# download the model
! mkdir inference
# download the pretrained SER model and extract it
! wget -P ./inference/ https://paddleocr.bj.bcebos.com/pplayout/PP-Layout_v1.0_ser_pretrained.tar && cd inference && tar xf PP-Layout_v1.0_ser_pretrained.tar && cd ..
```
```python
# run SER prediction
# https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppstructure/vqa/infer_ser_e2e.py
! python infer_ser_e2e.py \
  --model_name_or_path "./inference/PP-Layout_v1.0_ser_pretrained/" \
  --max_seq_length 512 \
  --output_dir "output/res_e2e/" \
  --infer_imgs "images/input/zh_val_42.jpg"
```

```python
import cv2
from matplotlib import pyplot as plt
# required for displaying matplotlib.pyplot figures inside a notebook
%matplotlib inline
img = cv2.imread('output/res_e2e/zh_val_42_ser.jpg')
plt.figure(figsize=(48, 24))
plt.imshow(img)
```
```
process: [0/1], save result to output/res_e2e/zh_val_42_ser.jpg
```

Figure: visualized SER prediction result (output_4_2.png)

2. Principles

The DOC-VQA series of algorithms in PaddleOCR is currently implemented based on the LayoutXLM paper and provides two tasks: SER and RE.
LayoutXLM is the multilingual version of LayoutLMv2, whose architecture is shown below:

Figure 1: The LayoutLMv2 architecture

Compared with BERT in NLP, LayoutXLM adds the image and the layout information of the text inside the image to the model input. Since LayoutXLM is already implemented in PaddleNLP, we introduce the data and the network here from the perspective of the model's forward pass.

2.1 Input Data Processing

First, run OCR recognition (or PDF parsing) on the image to obtain the text and bbox information, then build the model's three inputs on top of it:

  1. Text Embedding
     The OCR-recognized text is first split with WordPiece, then the [CLS] and [SEP] tokens are added and the sequence is padded to length with [PAD], which gives the text input sequence.
     The token embedding, one-dimensional position embedding, and segment embedding are then summed to obtain the text embedding:

     t_i = TokEmb(w_i) + PosEmb1D(i) + SegEmb(s_i), 0 ≤ i < L

     One-dimensional position embedding: the index of the token
     Segment embedding: A
```python
# Text Embedding demo
from paddlenlp.transformers import LayoutXLMTokenizer
tokenizer = LayoutXLMTokenizer.from_pretrained('inference/PP-Layout_v1.0_ser_pretrained')
# tokenize
print('Tokenization result:', tokenizer.tokenize('我的中国心'))
# convert to vocabulary indices
print('Encoding result:', tokenizer.encode('我的中国心'))
```

```
Tokenization result: ['▁我', '的中国', '心']
Encoding result: {'input_ids': [0, 13129, 84072, 1801, 2], 'token_type_ids': [0, 0, 0, 0, 0]}
```
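Putting the text-embedding sum described above into code, the following is a minimal NumPy sketch of t_i = TokEmb(w_i) + PosEmb1D(i) + SegEmb(s_i). The table sizes and hidden dimension here are illustrative, not LayoutXLM's real configuration, and the lookup tables are random instead of learned:

```python
import numpy as np

# illustrative sizes, not LayoutXLM's real config
rng = np.random.default_rng(0)
vocab_size, max_pos, num_segments, hidden = 100, 512, 2, 8

tok_emb = rng.normal(size=(vocab_size, hidden))    # TokEmb lookup table
pos_emb = rng.normal(size=(max_pos, hidden))       # PosEmb1D lookup table
seg_emb = rng.normal(size=(num_segments, hidden))  # SegEmb lookup table

# token indices from the tokenizer demo, folded into our toy vocabulary
input_ids = np.array([0, 13129, 84072, 1801, 2]) % vocab_size
segment_ids = np.zeros_like(input_ids)   # segment A -> id 0
positions = np.arange(len(input_ids))    # one-dimensional positions: the token index

# t_i = TokEmb(w_i) + PosEmb1D(i) + SegEmb(s_i)
text_embedding = tok_emb[input_ids] + pos_emb[positions] + seg_emb[segment_ids]
print(text_embedding.shape)  # (5, 8): one hidden vector per token
```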
  2. Image Embedding
     A ResNeXt-FPN network serves as the image encoder. It first extracts the feature map of the original document image and average-pools it to a fixed size (B x 256 x 7 x 7), then flattens the pooled feature map row by row (B x 256 x 49) and applies a linear projection (B x 49 x 256) to obtain the image feature sequence. Mirroring the text embedding, the image embedding is also augmented with one-dimensional position and segment information. The final image embedding is the sum of the feature embedding, the one-dimensional position embedding, and the segment embedding:

     v_i = Proj(VisTokEmb(I)_i) + PosEmb1D(i) + SegEmb([C]), 0 ≤ i < WH

     Segment embedding: C
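The pool-flatten-project pipeline of the image branch can be sketched shape-by-shape with NumPy. This is only an illustration: the projection weight is random rather than learned, and the target hidden size of 768 is an assumption for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W, hidden = 2, 256, 7, 7, 768

feat = rng.normal(size=(B, C, H, W))                 # average-pooled backbone feature map
seq = feat.reshape(B, C, H * W).transpose(0, 2, 1)   # (B, 49, 256): one token per spatial cell
proj_w = rng.normal(size=(C, hidden)) * 0.01         # Proj(...): linear projection (random here)
image_tokens = seq @ proj_w                          # (B, 49, hidden) image feature sequence
print(image_tokens.shape)  # (2, 49, 768)
```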
  3. Layout Embedding
     For the coordinate range that each token or image region covers on the page, the layout information is represented with axis-aligned bounding boxes; each box is described by its four boundary coordinates plus its width and height. The final layout embedding is the concatenation of the embeddings of these six features:

     l_i = Concat(PosEmb2D_x(x_0, x_1, w), PosEmb2D_y(y_0, y_1, h)), 0 ≤ i < WH + L
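A minimal sketch of the layout embedding: the three x-related features (x0, x1, w) are looked up in an x-coordinate table and the three y-related features (y0, y1, h) in a y-coordinate table, then all six vectors are concatenated. Table sizes and the per-coordinate dimension are illustrative, and the tables are random instead of learned:

```python
import numpy as np

rng = np.random.default_rng(0)
coord_dim = 4  # per-coordinate embedding size (illustrative)
# coordinates are normalized to 0-1000, so the tables have 1001 entries
x_table = rng.normal(size=(1001, coord_dim))  # PosEmb2D_x
y_table = rng.normal(size=(1001, coord_dim))  # PosEmb2D_y

x0, y0, x1, y1 = 120, 80, 400, 130  # a normalized bounding box
w, h = x1 - x0, y1 - y0

# Concat(PosEmb2D_x(x0, x1, w), PosEmb2D_y(y0, y1, h))
layout_vec = np.concatenate([x_table[x0], x_table[x1], x_table[w],
                             y_table[y0], y_table[y1], y_table[h]])
print(layout_vec.shape)  # (24,): 6 features * coord_dim
```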

Below we demonstrate how the network input is built from a single image during prediction. The process consists of the following steps:

  1. Run OCR recognition on the image
  2. Preprocess the image, including resizing it to a fixed size and normalizing it
  3. Tokenize the recognized text and convert it to vocabulary indices
  4. Normalize the text boxes so their values fall in the range 0-1000
  5. Pad the results of steps 3 and 4 to make batching easier
```python
# building the prediction input
# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/vqa/vqa_utils.py
import cv2
import numpy as np
import paddle
from copy import deepcopy
from paddleocr import PaddleOCR
from paddlenlp.transformers import LayoutXLMTokenizer
from infer_ser_e2e import trans_poly_to_bbox, pad_sentences, split_page

def parse_ocr_info_for_ser(ocr_result):
    # convert the OCR result to dicts; text polygons become axis-aligned boxes
    ocr_info = []
    for res in ocr_result:
        ocr_info.append({
            "text": res[1][0],
            "bbox": trans_poly_to_bbox(res[0]),
            "poly": res[0],
        })
    return ocr_info

def preprocess(
        tokenizer,
        ori_img,
        ocr_info,
        img_size=(224, 224),
        pad_token_label_id=-100,
        max_seq_len=512,
        add_special_ids=False,
        return_attention_mask=True):
    ocr_info = deepcopy(ocr_info)
    height = ori_img.shape[0]
    width = ori_img.shape[1]
    # resize the image to the target shape
    img = cv2.resize(ori_img, img_size).transpose([2, 0, 1]).astype(np.float32)
    segment_offset_id = []    # end position of each text segment within input_ids
    bbox_list = []            # boxes normalized to 0-1000
    input_ids_list = []       # vocabulary indices of the tokenized text segments
    token_type_ids_list = []  # type information of the text segments
    for info in ocr_info:
        # normalize the box (x1, y1, x2, y2) to 0-1000
        bbox = info["bbox"]
        bbox[0] = int(bbox[0] * 1000.0 / width)
        bbox[2] = int(bbox[2] * 1000.0 / width)
        bbox[1] = int(bbox[1] * 1000.0 / height)
        bbox[3] = int(bbox[3] * 1000.0 / height)
        # tokenize the text: split it and convert it to vocabulary indices
        text = info["text"]
        encode_res = tokenizer.encode(
            text, pad_to_max_seq_len=False, return_attention_mask=True)
        # optionally strip the special tokens
        if not add_special_ids:
            # TODO: use tok.all_special_ids to remove
            encode_res["input_ids"] = encode_res["input_ids"][1:-1]
            encode_res["token_type_ids"] = encode_res["token_type_ids"][1:-1]
            encode_res["attention_mask"] = encode_res["attention_mask"][1:-1]
        input_ids_list.extend(encode_res["input_ids"])
        token_type_ids_list.extend(encode_res["token_type_ids"])
        bbox_list.extend([bbox] * len(encode_res["input_ids"]))
        segment_offset_id.append(len(input_ids_list))
    encoded_inputs = {
        "input_ids": input_ids_list,
        "token_type_ids": token_type_ids_list,
        "bbox": bbox_list,
        "attention_mask": [1] * len(input_ids_list),
    }
    # pad each value to the target length, filling with 0
    encoded_inputs = pad_sentences(
        tokenizer,
        encoded_inputs,
        max_seq_len=max_seq_len,
        return_attention_mask=return_attention_mask)
    # when input_ids is longer than 512, split it into 2 batches
    encoded_inputs = split_page(encoded_inputs)
    fake_bs = encoded_inputs["input_ids"].shape[0]
    encoded_inputs["image"] = paddle.to_tensor(img).unsqueeze(0).expand(
        [fake_bs] + list(img.shape))
    encoded_inputs["segment_offset_id"] = segment_offset_id
    return encoded_inputs

img = cv2.imread('images/input/zh_val_42.jpg')
ocr_engine = PaddleOCR(use_angle_cls=False, show_log=False)
# run OCR recognition
ocr_result = ocr_engine.ocr(img, cls=False)
# convert the OCR result to dicts; text polygons become axis-aligned boxes
ocr_info = parse_ocr_info_for_ser(ocr_result)
tokenizer = LayoutXLMTokenizer.from_pretrained('inference/PP-Layout_v1.0_ser_pretrained')
# resize the image, tokenize the text and convert it to vocabulary indices,
# and normalize the boxes
max_seq_length = 512
inputs = preprocess(tokenizer=tokenizer, ori_img=img, ocr_info=ocr_info,
                    max_seq_len=max_seq_length, img_size=(224, 224))
print(inputs.keys())
print(inputs['image'].shape)
```
```
dict_keys(['input_ids', 'token_type_ids', 'bbox', 'attention_mask', 'image', 'segment_offset_id'])
[2, 3, 224, 224]
```

The processed data is a dictionary containing the following fields:

| Field | Meaning |
| --- | --- |
| image | the image, resized to 224*224 |
| bbox | boxes normalized to the range 0-1000 |
| input_ids | vocabulary indices of the tokenized text segments |
| token_type_ids | type information of the text segments |
| attention_mask | mask over the text segments: positions of special characters are marked 0, text positions are marked 1 |
| segment_offset_id | the end position of each text segment within input_ids |
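As a minimal illustration of how these fields line up after padding, the sketch below pads an encoding to a fixed length and builds the matching attention mask. This is a simplified stand-in for demonstration, not PaddleOCR's actual pad_sentences:

```python
# simplified padding sketch (not PaddleOCR's pad_sentences)
def pad_encoded(encoded, max_seq_len, pad_id=0):
    n = len(encoded["input_ids"])
    pad = max_seq_len - n
    return {
        "input_ids": encoded["input_ids"] + [pad_id] * pad,
        "token_type_ids": encoded["token_type_ids"] + [0] * pad,
        "bbox": encoded["bbox"] + [[0, 0, 0, 0]] * pad,
        "attention_mask": [1] * n + [0] * pad,  # 1 = real token, 0 = padding
    }

enc = {"input_ids": [13129, 84072, 1801],
       "token_type_ids": [0, 0, 0],
       "bbox": [[120, 80, 400, 130]] * 3}
padded = pad_encoded(enc, max_seq_len=6)
print(padded["attention_mask"])  # [1, 1, 1, 0, 0, 0]
```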

2.2 The SER Network

SER (Semantic Entity Recognition) recognizes and classifies the text in an image.
The SER network adds a fully connected classification head on top of the LayoutXLMModel output; its code is as follows:

```python
# https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/layoutxlm/modeling.py#L846
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMPretrainedModel
from paddle import nn

class LayoutXLMForTokenClassification(LayoutXLMPretrainedModel):
    def __init__(self, layoutxlm, num_classes=2, dropout=None):
        super(LayoutXLMForTokenClassification, self).__init__()
        self.num_classes = num_classes
        if isinstance(layoutxlm, dict):
            self.layoutxlm = LayoutXLMModel(**layoutxlm)
        else:
            self.layoutxlm = layoutxlm
        self.dropout = nn.Dropout(dropout if dropout is not None else
                                  self.layoutxlm.config["hidden_dropout_prob"])
        self.classifier = nn.Linear(self.layoutxlm.config["hidden_size"], num_classes)
        self.classifier.apply(self.init_weights)

    def get_input_embeddings(self):
        return self.layoutxlm.embeddings.word_embeddings

    def forward(self, input_ids=None, bbox=None, image=None, attention_mask=None,
                token_type_ids=None, position_ids=None, head_mask=None, labels=None):
        # backbone forward
        outputs = self.layoutxlm(
            input_ids=input_ids, bbox=bbox, image=image,
            attention_mask=attention_mask, token_type_ids=token_type_ids,
            position_ids=position_ids, head_mask=head_mask)
        seq_length = input_ids.shape[1]
        # head forward: text tokens come first, image tokens follow
        sequence_output, image_output = outputs[0][:, :seq_length], outputs[0][:, seq_length:]
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        outputs = (logits, )
        # loss computation
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            if attention_mask is not None:
                active_loss = attention_mask.reshape([-1, ]) == 1
                active_logits = logits.reshape([-1, self.num_classes])[active_loss]
                active_labels = labels.reshape([-1, ])[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.reshape([-1, self.num_classes]),
                                labels.reshape([-1, ]))
            outputs = (loss, ) + outputs
        return outputs
```
```python
# initialize the network
net = LayoutXLMForTokenClassification.from_pretrained('inference/PP-Layout_v1.0_ser_pretrained')
net.eval()
# run a forward pass
outputs = net(input_ids=inputs["input_ids"],
              bbox=inputs["bbox"],
              image=inputs["image"],
              token_type_ids=inputs["token_type_ids"],
              attention_mask=inputs["attention_mask"])
print(outputs[0].shape)
```

```
[2, 512, 7]
```

2.3 Post-processing

Post-processing maps the per-segment predictions from the model output back onto the texts and merges the result with the OCR output. It mainly involves the following steps:

  1. For each text, collect the labels predicted for all of its text segments.
  2. Take the label predicted for the most segments as the label of that text.
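The two steps above amount to a majority vote over segment labels. A minimal sketch with made-up predictions (the helper name `vote_text_label` is hypothetical; the full implementation follows):

```python
import numpy as np

def vote_text_label(segment_labels):
    # pick the label id predicted for the most segments of one text
    counts = np.bincount(segment_labels)
    return int(np.argmax(counts))

# made-up segment predictions for one text, using ids 1=QUESTION, 3=ANSWER
print(vote_text_label([1, 1, 3, 1]))  # → 1, most segments say QUESTION
```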
```python
# https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.4/ppstructure/vqa/vqa_utils.py
import paddle
import numpy as np
from infer_ser_e2e import get_bio_label_maps

label2id_map, id2label_map = get_bio_label_maps('labels/labels_ser.txt')

def postprocess(attention_mask, preds, id2label_map):
    if isinstance(preds, paddle.Tensor):
        preds = preds.numpy()
    preds = np.argmax(preds, axis=2)
    preds_list = [[] for _ in range(preds.shape[0])]
    # keep batch info
    for i in range(preds.shape[0]):
        for j in range(preds.shape[1]):
            if attention_mask[i][j] == 1:
                preds_list[i].append(id2label_map[preds[i][j]])
    return preds_list

def merge_preds_list_with_ocr_info(ocr_info, segment_offset_id, preds_list,
                                   label2id_map_for_draw):
    # flatten the per-sample prediction lists
    preds = [p for pred in preds_list for p in pred]
    # invert the label2id dict into an id2label dict, stripping the B-/I- prefixes
    id2label_map = dict()
    for key in label2id_map_for_draw:
        val = label2id_map_for_draw[key]
        if key == "O":
            id2label_map[val] = key
        elif key.startswith("B-") or key.startswith("I-"):
            id2label_map[val] = key[2:]
        else:
            id2label_map[val] = key
    print("id2label_map:", id2label_map)
    # for each text, count the predicted labels
    for idx in range(len(segment_offset_id)):
        if idx == 0:
            start_id = 0
        else:
            start_id = segment_offset_id[idx - 1]
        end_id = segment_offset_id[idx]
        # take the slice of the output belonging to this text
        curr_pred = preds[start_id:end_id]
        # map the predictions of all its segments to label ids
        curr_pred = [label2id_map_for_draw[p] for p in curr_pred]
        if len(curr_pred) <= 0:
            pred_id = 0
        else:
            # count each label and keep the most frequent one
            counts = np.bincount(curr_pred)
            pred_id = np.argmax(counts)
        ocr_info[idx]["pred_id"] = int(pred_id)
        ocr_info[idx]["pred"] = id2label_map[int(pred_id)]
    return ocr_info

preds = postprocess(inputs["attention_mask"], outputs[0], id2label_map)
# map each I- label to the id of the corresponding B- label
label2id_map_for_draw = dict()
for key in label2id_map:
    if key.startswith("I-"):
        label2id_map_for_draw[key] = label2id_map["B" + key[1:]]
    else:
        label2id_map_for_draw[key] = label2id_map[key]
print("label2id_map:", label2id_map)
print("label2id_map_for_draw:", label2id_map_for_draw)
# merge the prediction info with the OCR info
ocr_info_with_ser = merge_preds_list_with_ocr_info(ocr_info, inputs["segment_offset_id"], preds, label2id_map_for_draw)
print(ocr_info_with_ser)
```
  1. label2id_map: {'O': 0, 'B-QUESTION': 1, 'I-QUESTION': 2, 'B-ANSWER': 3, 'I-ANSWER': 4, 'B-HEADER': 5, 'I-HEADER': 6}
  2. label2id_map_for_draw: {'O': 0, 'B-QUESTION': 1, 'I-QUESTION': 1, 'B-ANSWER': 3, 'I-ANSWER': 3, 'B-HEADER': 5, 'I-HEADER': 5}
  3. id2label_map: {0: 'O', 1: 'QUESTION', 3: 'ANSWER', 5: 'HEADER'}
  4. [{'text': '个人信息登记表', 'bbox': [1026.0, 292.0, 1495.0, 377.0], 'poly': [[1027.0, 292.0], [1495.0, 300.0], [1494.0, 377.0], [1026.0, 369.0]], 'pred_id': 5, 'pred': 'HEADER'}, {'text': '申报学院(部门):', 'bbox': [207.0, 424.0, 587.0, 475.0], 'poly': [[207.0, 424.0], [587.0, 424.0], [587.0, 475.0], [207.0, 475.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '出生', 'bbox': [1144.0, 526.0, 1218.0, 566.0], 'poly': [[1144.0, 526.0], [1218.0, 526.0], [1218.0, 566.0], [1144.0, 566.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '政治', 'bbox': [1616.0, 530.0, 1709.0, 570.0], 'poly': [[1616.0, 530.0], [1709.0, 530.0], [1709.0, 570.0], [1616.0, 570.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '1997年12月17日「面貌', 'bbox': [1298.0, 558.0, 1713.0, 644.0], 'poly': [[1301.0, 558.0], [1713.0, 571.0], [1711.0, 644.0], [1298.0, 631.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '岳欣欣', 'bbox': [491.0, 559.0, 653.0, 614.0], 'poly': [[491.0, 559.0], [653.0, 559.0], [653.0, 614.0], [491.0, 614.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '性别', 'bbox': [805.0, 559.0, 908.0, 618.0], 'poly': [[805.0, 559.0], [908.0, 559.0], [908.0, 618.0], [805.0, 618.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '群众', 'bbox': [1801.0, 552.0, 1886.0, 614.0], 'poly': [[1801.0, 552.0], [1886.0, 552.0], [1886.0, 614.0], [1801.0, 614.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '女', 'bbox': [1008.0, 563.0, 1070.0, 610.0], 'poly': [[1008.0, 563.0], [1070.0, 563.0], [1070.0, 610.0], [1008.0, 610.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '(拼音)', 'bbox': [207.0, 577.0, 354.0, 629.0], 'poly': [[207.0, 577.0], [354.0, 577.0], [354.0, 629.0], [207.0, 629.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '日期', 'bbox': [1126.0, 577.0, 1222.0, 632.0], 'poly': [[1126.0, 577.0], [1222.0, 577.0], [1222.0, 632.0], [1126.0, 632.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '婚育「', 'bbox': [1120.0, 634.0, 1272.0, 714.0], 'poly': [[1127.0, 634.0], [1272.0, 649.0], [1265.0, 714.0], [1120.0, 699.0]], 'pred_id': 1, 'pred': 
'QUESTION'}, {'text': '生源地/培养形式', 'bbox': [1506.0, 647.0, 1790.0, 698.0], 'poly': [[1506.0, 647.0], [1790.0, 647.0], [1790.0, 698.0], [1506.0, 698.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '未婚', 'bbox': [1270.0, 654.0, 1369.0, 709.0], 'poly': [[1270.0, 654.0], [1369.0, 654.0], [1369.0, 709.0], [1270.0, 709.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '寸报名照', 'bbox': [2140.0, 650.0, 2273.0, 702.0], 'poly': [[2140.0, 650.0], [2273.0, 650.0], [2273.0, 702.0], [2140.0, 702.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '中国', 'bbox': [443.0, 661.0, 528.0, 720.0], 'poly': [[443.0, 661.0], [528.0, 661.0], [528.0, 720.0], [443.0, 720.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '国籍', 'bbox': [244.0, 672.0, 336.0, 731.0], 'poly': [[244.0, 672.0], [336.0, 672.0], [336.0, 731.0], [244.0, 731.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '民族', 'bbox': [812.0, 672.0, 904.0, 734.0], 'poly': [[812.0, 672.0], [904.0, 672.0], [904.0, 734.0], [812.0, 734.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省西丰县', 'bbox': [1841.0, 668.0, 2064.0, 731.0], 'poly': [[1843.0, 668.0], [2064.0, 677.0], [2062.0, 731.0], [1841.0, 723.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '状况', 'bbox': [1137.0, 713.0, 1218.0, 756.0], 'poly': [[1137.0, 713.0], [1218.0, 713.0], [1218.0, 756.0], [1137.0, 756.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '(应届毕业生填写)', 'bbox': [1528.0, 720.0, 1782.0, 760.0], 'poly': [[1528.0, 720.0], [1782.0, 720.0], [1782.0, 760.0], [1528.0, 760.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '粘贴处', 'bbox': [2140.0, 716.0, 2251.0, 767.0], 'poly': [[2140.0, 716.0], [2251.0, 716.0], [2251.0, 767.0], [2140.0, 767.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '现工作(学习)', 'bbox': [192.0, 767.0, 484.0, 822.0], 'poly': [[192.0, 767.0], [484.0, 771.0], [483.0, 822.0], [192.0, 818.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '大连海事大学', 'bbox': [542.0, 781.0, 794.0, 841.0], 'poly': [[544.0, 781.0], [794.0, 790.0], [792.0, 841.0], [542.0, 833.0]], 'pred_id': 3, 'pred': 'ANSWER'}, 
{'text': '单位', 'bbox': [967.0, 775.0, 1074.0, 829.0], 'poly': [[967.0, 775.0], [1074.0, 775.0], [1074.0, 829.0], [967.0, 829.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市甘井子区凌海路3号', 'bbox': [1118.0, 789.0, 1701.0, 844.0], 'poly': [[1118.0, 789.0], [1701.0, 793.0], [1701.0, 844.0], [1118.0, 840.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '邮编', 'bbox': [1727.0, 819.0, 1808.0, 862.0], 'poly': [[1727.0, 819.0], [1808.0, 819.0], [1808.0, 862.0], [1727.0, 862.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '100085', 'bbox': [1893.0, 815.0, 2004.0, 866.0], 'poly': [[1893.0, 815.0], [2004.0, 815.0], [2004.0, 866.0], [1893.0, 866.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '单位', 'bbox': [295.0, 833.0, 391.0, 888.0], 'poly': [[295.0, 833.0], [391.0, 833.0], [391.0, 888.0], [295.0, 888.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '地址', 'bbox': [967.0, 844.0, 1055.0, 888.0], 'poly': [[967.0, 844.0], [1055.0, 844.0], [1055.0, 888.0], [967.0, 888.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '现专业技术职', 'bbox': [207.0, 906.0, 454.0, 957.0], 'poly': [[207.0, 906.0], [454.0, 906.0], [454.0, 957.0], [207.0, 957.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '专技', 'bbox': [974.0, 910.0, 1052.0, 961.0], 'poly': [[974.0, 910.0], [1052.0, 910.0], [1052.0, 961.0], [974.0, 961.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '现任', 'bbox': [1524.0, 917.0, 1620.0, 961.0], 'poly': [[1524.0, 917.0], [1620.0, 917.0], [1620.0, 961.0], [1524.0, 961.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '务及评定时间', 'bbox': [207.0, 968.0, 447.0, 1020.0], 'poly': [[207.0, 968.0], [447.0, 968.0], [447.0, 1020.0], [207.0, 1020.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '等级', 'bbox': [967.0, 968.0, 1052.0, 1023.0], 'poly': [[967.0, 968.0], [1052.0, 968.0], [1052.0, 1023.0], [967.0, 1023.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '行政职务', 'bbox': [1484.0, 976.0, 1653.0, 1027.0], 'poly': [[1484.0, 976.0], [1653.0, 976.0], [1653.0, 1027.0], [1484.0, 1027.0]], 'pred_id': 1, 'pred': 'QUESTION'}, 
{'text': '学科专业', 'bbox': [244.0, 1045.0, 421.0, 1096.0], 'poly': [[244.0, 1045.0], [421.0, 1045.0], [421.0, 1096.0], [244.0, 1096.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '船只制造', 'bbox': [576.0, 1038.0, 749.0, 1093.0], 'poly': [[576.0, 1038.0], [749.0, 1038.0], [749.0, 1093.0], [576.0, 1093.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '主要学术兼职', 'bbox': [1136.0, 1048.0, 1385.0, 1111.0], 'poly': [[1138.0, 1048.0], [1385.0, 1057.0], [1383.0, 1111.0], [1136.0, 1103.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '无', 'bbox': [1546.0, 1052.0, 1624.0, 1104.0], 'poly': [[1546.0, 1052.0], [1624.0, 1052.0], [1624.0, 1104.0], [1546.0, 1104.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '研究生', 'bbox': [450.0, 1125.0, 576.0, 1177.0], 'poly': [[450.0, 1125.0], [576.0, 1125.0], [576.0, 1177.0], [450.0, 1177.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '最高学历', 'bbox': [202.0, 1128.0, 366.0, 1192.0], 'poly': [[205.0, 1128.0], [366.0, 1137.0], [363.0, 1192.0], [202.0, 1183.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '毕业单位', 'bbox': [870.0, 1124.0, 1049.0, 1200.0], 'poly': [[873.0, 1124.0], [1049.0, 1134.0], [1046.0, 1200.0], [870.0, 1190.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '大连海事大学', 'bbox': [1192.0, 1136.0, 1439.0, 1188.0], 'poly': [[1192.0, 1136.0], [1439.0, 1136.0], [1439.0, 1188.0], [1192.0, 1188.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '毕业时间', 'bbox': [1890.0, 1144.0, 2056.0, 1195.0], 'poly': [[1890.0, 1144.0], [2056.0, 1144.0], [2056.0, 1195.0], [1890.0, 1195.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '2019年6月', 'bbox': [2148.0, 1147.0, 2314.0, 1199.0], 'poly': [[2148.0, 1147.0], [2314.0, 1147.0], [2314.0, 1199.0], [2148.0, 1199.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '获得单位', 'bbox': [869.0, 1204.0, 1050.0, 1284.0], 'poly': [[875.0, 1204.0], [1050.0, 1218.0], [1044.0, 1284.0], [869.0, 1270.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '最高学位', 'bbox': [207.0, 1224.0, 369.0, 1275.0], 'poly': [[207.0, 1224.0], [369.0, 1224.0], 
[369.0, 1275.0], [207.0, 1275.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '硕士学位', 'bbox': [465.0, 1217.0, 624.0, 1272.0], 'poly': [[465.0, 1217.0], [624.0, 1217.0], [624.0, 1272.0], [465.0, 1272.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '大连海事大学', 'bbox': [1188.0, 1231.0, 1428.0, 1283.0], 'poly': [[1188.0, 1231.0], [1428.0, 1231.0], [1428.0, 1283.0], [1188.0, 1283.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '获得时间', 'bbox': [1890.0, 1235.0, 2059.0, 1286.0], 'poly': [[1890.0, 1235.0], [2059.0, 1235.0], [2059.0, 1286.0], [1890.0, 1286.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '2019年6月', 'bbox': [2152.0, 1250.0, 2325.0, 1290.0], 'poly': [[2152.0, 1250.0], [2325.0, 1250.0], [2325.0, 1290.0], [2152.0, 1290.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '「证件号码', 'bbox': [619.0, 1304.0, 813.0, 1375.0], 'poly': [[622.0, 1304.0], [813.0, 1313.0], [810.0, 1375.0], [619.0, 1366.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '证件类型', 'bbox': [202.0, 1307.0, 385.0, 1371.0], 'poly': [[205.0, 1307.0], [385.0, 1316.0], [382.0, 1371.0], [202.0, 1362.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '学位证书', 'bbox': [454.0, 1308.0, 661.0, 1370.0], 'poly': [[454.0, 1308.0], [661.0, 1308.0], [661.0, 1370.0], [454.0, 1370.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '047854648486893', 'bbox': [885.0, 1300.0, 1163.0, 1356.0], 'poly': [[887.0, 1300.0], [1163.0, 1309.0], [1162.0, 1356.0], [885.0, 1348.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '户籍详址', 'bbox': [1395.0, 1319.0, 1572.0, 1370.0], 'poly': [[1395.0, 1319.0], [1572.0, 1319.0], [1572.0, 1370.0], [1395.0, 1370.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省西丰县安民镇永淳村二组62号', 'bbox': [1635.0, 1319.0, 2270.0, 1374.0], 'poly': [[1635.0, 1319.0], [2270.0, 1323.0], [2269.0, 1374.0], [1635.0, 1370.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '档案所在', 'bbox': [207.0, 1389.0, 395.0, 1447.0], 'poly': [[207.0, 1389.0], [395.0, 1389.0], [395.0, 1447.0], [207.0, 1447.0]], 'pred_id': 1, 'pred': 'QUESTION'}, 
{'text': '大连海事大学', 'bbox': [483.0, 1388.0, 720.0, 1459.0], 'poly': [[485.0, 1388.0], [720.0, 1396.0], [718.0, 1459.0], [483.0, 1450.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '13585662395', 'bbox': [1634.0, 1409.0, 1843.0, 1474.0], 'poly': [[1637.0, 1409.0], [1843.0, 1422.0], [1839.0, 1474.0], [1634.0, 1461.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '联系电话', 'bbox': [1402.0, 1429.0, 1568.0, 1484.0], 'poly': [[1402.0, 1429.0], [1568.0, 1429.0], [1568.0, 1484.0], [1402.0, 1484.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '100085', 'bbox': [2096.0, 1421.0, 2218.0, 1484.0], 'poly': [[2096.0, 1421.0], [2218.0, 1421.0], [2218.0, 1484.0], [2096.0, 1484.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '邮编', 'bbox': [1971.0, 1432.0, 2063.0, 1487.0], 'poly': [[1971.0, 1432.0], [2063.0, 1432.0], [2063.0, 1487.0], [1971.0, 1487.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市廿井子区凌海路3号', 'bbox': [480.0, 1443.0, 1059.0, 1506.0], 'poly': [[480.0, 1443.0], [1059.0, 1443.0], [1059.0, 1506.0], [480.0, 1506.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '单位及地址', 'bbox': [203.0, 1454.0, 413.0, 1506.0], 'poly': [[203.0, 1454.0], [413.0, 1454.0], [413.0, 1506.0], [203.0, 1506.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '有否境外永居', 'bbox': [196.0, 1531.0, 439.0, 1582.0], 'poly': [[196.0, 1531.0], [439.0, 1531.0], [439.0, 1582.0], [196.0, 1582.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '有无上海市居住证', 'bbox': [971.0, 1535.0, 1295.0, 1586.0], 'poly': [[971.0, 1535.0], [1295.0, 1535.0], [1295.0, 1586.0], [971.0, 1586.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '无', 'bbox': [1458.0, 1532.0, 1524.0, 1593.0], 'poly': [[1471.0, 1532.0], [1524.0, 1545.0], [1511.0, 1593.0], [1458.0, 1580.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '居住证有效期', 'bbox': [1646.0, 1542.0, 1901.0, 1593.0], 'poly': [[1646.0, 1542.0], [1901.0, 1542.0], [1901.0, 1593.0], [1646.0, 1593.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '本人', 'bbox': [198.0, 1603.0, 322.0, 1668.0], 'poly': [[202.0, 
1603.0], [322.0, 1613.0], [318.0, 1668.0], [198.0, 1658.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '13585662395', 'bbox': [646.0, 1608.0, 856.0, 1659.0], 'poly': [[646.0, 1608.0], [856.0, 1608.0], [856.0, 1659.0], [646.0, 1659.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '联系电话', 'bbox': [358.0, 1619.0, 542.0, 1670.0], 'poly': [[358.0, 1619.0], [542.0, 1619.0], [542.0, 1670.0], [358.0, 1670.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '肖电手命箱', 'bbox': [1373.0, 1626.0, 1572.0, 1677.0], 'poly': [[1373.0, 1626.0], [1572.0, 1626.0], [1572.0, 1677.0], [1373.0, 1677.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '联系电', 'bbox': [66.0, 1659.0, 173.0, 1714.0], 'poly': [[66.0, 1659.0], [173.0, 1659.0], [173.0, 1714.0], [66.0, 1714.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '联关亲', 'bbox': [168.0, 1658.0, 308.0, 1733.0], 'poly': [[173.0, 1658.0], [308.0, 1668.0], [303.0, 1733.0], [168.0, 1723.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市甘井子区凌海路3号', 'bbox': [631.0, 1695.0, 1229.0, 1769.0], 'poly': [[632.0, 1695.0], [1229.0, 1707.0], [1228.0, 1769.0], [631.0, 1757.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '通讯地址', 'bbox': [364.0, 1709.0, 532.0, 1773.0], 'poly': [[367.0, 1709.0], [532.0, 1718.0], [529.0, 1773.0], [364.0, 1764.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '方式', 'bbox': [196.0, 1728.0, 303.0, 1787.0], 'poly': [[196.0, 1728.0], [303.0, 1728.0], [303.0, 1787.0], [196.0, 1787.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '邮编', 'bbox': [1982.0, 1732.0, 2048.0, 1776.0], 'poly': [[1982.0, 1732.0], [2048.0, 1732.0], [2048.0, 1776.0], [1982.0, 1776.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '110008市层住证', 'bbox': [2135.0, 1727.0, 2445.0, 1818.0], 'poly': [[2141.0, 1727.0], [2445.0, 1756.0], [2439.0, 1818.0], [2135.0, 1789.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '有香境外永归', 'bbox': [1351.0, 1758.0, 1554.0, 1794.0], 'poly': [[1351.0, 1758.0], [1554.0, 1758.0], [1554.0, 1794.0], [1351.0, 1794.0]], 'pred_id': 0, 'pred': 'O'}, {'text': 
'起始年月', 'bbox': [362.0, 1812.0, 542.0, 1864.0], 'poly': [[362.0, 1812.0], [542.0, 1812.0], [542.0, 1864.0], [362.0, 1864.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '终止年月', 'bbox': [594.0, 1820.0, 775.0, 1860.0], 'poly': [[594.0, 1820.0], [775.0, 1820.0], [775.0, 1860.0], [594.0, 1860.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '学校或工作单位', 'bbox': [993.0, 1823.0, 1270.0, 1864.0], 'poly': [[993.0, 1823.0], [1270.0, 1823.0], [1270.0, 1864.0], [993.0, 1864.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '学习或任职情况', 'bbox': [1767.0, 1820.0, 2063.0, 1875.0], 'poly': [[1768.0, 1820.0], [2063.0, 1824.0], [2063.0, 1875.0], [1767.0, 1871.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '备注', 'bbox': [2181.0, 1823.0, 2273.0, 1878.0], 'poly': [[2181.0, 1823.0], [2273.0, 1823.0], [2273.0, 1878.0], [2181.0, 1878.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '联系', 'bbox': [1362.0, 1904.0, 1432.0, 1944.0], 'poly': [[1362.0, 1904.0], [1432.0, 1904.0], [1432.0, 1944.0], [1362.0, 1944.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '辽宁省大连市甘', 'bbox': [1808.0, 1933.0, 2085.0, 1973.0], 'poly': [[1808.0, 1933.0], [2085.0, 1933.0], [2085.0, 1973.0], [1808.0, 1973.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '凌海', 'bbox': [2181.0, 1937.0, 2288.0, 1977.0], 'poly': [[2181.0, 1937.0], [2288.0, 1937.0], [2288.0, 1977.0], [2181.0, 1977.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '通讯地士', 'bbox': [1520.0, 1948.0, 1646.0, 1984.0], 'poly': [[1520.0, 1948.0], [1646.0, 1948.0], [1646.0, 1984.0], [1520.0, 1984.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '简历', 'bbox': [225.0, 2094.0, 269.0, 2167.0], 'poly': [[225.0, 2094.0], [269.0, 2094.0], [269.0, 2167.0], [225.0, 2167.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '顺序填记', 'bbox': [225.0, 2361.0, 262.0, 2529.0], 'poly': [[225.0, 2361.0], [262.0, 2361.0], [262.0, 2529.0], [225.0, 2529.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '平火臂防醉间上要连续:', 'bbox': [465.0, 2594.0, 816.0, 2650.0], 'poly': [[466.0, 2594.0], [816.0, 2602.0], [815.0, 
2650.0], [465.0, 2642.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '大专家项博气治》任职岗位、访问学者等实际情况', 'bbox': [1107.0, 2601.0, 1945.0, 2664.0], 'poly': [[1108.0, 2601.0], [1945.0, 2617.0], [1944.0, 2664.0], [1107.0, 2649.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '直博', 'bbox': [210.0, 2642.0, 306.0, 2686.0], 'poly': [[210.0, 2642.0], [306.0, 2642.0], [306.0, 2686.0], [210.0, 2686.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '茶职、张超等经房理,无就迎或学习器明的价段需填写“待业”或注明原', 'bbox': [406.0, 2711.0, 1845.0, 2785.0], 'poly': [[406.0, 2711.0], [1845.0, 2726.0], [1845.0, 2785.0], [406.0, 2770.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '看水书名称', 'bbox': [517.0, 2799.0, 775.0, 2850.0], 'poly': [[517.0, 2799.0], [775.0, 2799.0], [775.0, 2850.0], [517.0, 2850.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '写作菜', 'bbox': [1140.0, 2806.0, 1244.0, 2854.0], 'poly': [[1140.0, 2806.0], [1244.0, 2806.0], [1244.0, 2854.0], [1140.0, 2854.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '等级', 'bbox': [1941.0, 2814.0, 2030.0, 2858.0], 'poly': [[1941.0, 2814.0], [2030.0, 2814.0], [2030.0, 2858.0], [1941.0, 2858.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '语种', 'bbox': [369.0, 2825.0, 465.0, 2879.0], 'poly': [[369.0, 2825.0], [465.0, 2825.0], [465.0, 2879.0], [369.0, 2879.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '口语水平', 'bbox': [900.0, 2828.0, 1070.0, 2879.0], 'poly': [[900.0, 2828.0], [1070.0, 2828.0], [1070.0, 2879.0], [900.0, 2879.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '女名积得时间', 'bbox': [1581.0, 2808.0, 1847.0, 2896.0], 'poly': [[1581.0, 2829.0], [1842.0, 2808.0], [1847.0, 2875.0], [1587.0, 2896.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '取得时间。', 'bbox': [2117.0, 2827.0, 2304.0, 2906.0], 'poly': [[2122.0, 2827.0], [2304.0, 2841.0], [2299.0, 2906.0], [2117.0, 2893.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '及或货', 'bbox': [203.0, 2843.0, 280.0, 2883.0], 'poly': [[203.0, 2843.0], [280.0, 2843.0], [280.0, 2883.0], [203.0, 2883.0]], 'pred_id': 0, 'pred': 'O'}, {'text': 
'(等级城分敷)', 'bbox': [546.0, 2861.0, 830.0, 2912.0], 'poly': [[546.0, 2861.0], [830.0, 2861.0], [830.0, 2912.0], [546.0, 2912.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '水平', 'bbox': [1140.0, 2869.0, 1236.0, 2912.0], 'poly': [[1140.0, 2869.0], [1236.0, 2869.0], [1236.0, 2912.0], [1140.0, 2912.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '或分数', 'bbox': [1919.0, 2872.0, 2045.0, 2923.0], 'poly': [[1919.0, 2872.0], [2045.0, 2872.0], [2045.0, 2923.0], [1919.0, 2923.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '语', 'bbox': [218.0, 2898.0, 288.0, 2960.0], 'poly': [[218.0, 2898.0], [288.0, 2898.0], [288.0, 2960.0], [218.0, 2960.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '其他', 'bbox': [1310.0, 2901.0, 1399.0, 2960.0], 'poly': [[1310.0, 2901.0], [1399.0, 2901.0], [1399.0, 2960.0], [1310.0, 2960.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '英语', 'bbox': [365.0, 2934.0, 480.0, 2989.0], 'poly': [[365.0, 2934.0], [480.0, 2934.0], [480.0, 2989.0], [365.0, 2989.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '六级', 'bbox': [661.0, 2931.0, 760.0, 2985.0], 'poly': [[661.0, 2931.0], [760.0, 2931.0], [760.0, 2985.0], [661.0, 2985.0]], 'pred_id': 3, 'pred': 'ANSWER'}, {'text': '育能', 'bbox': [218.0, 2949.0, 258.0, 3033.0], 'poly': [[218.0, 2949.0], [258.0, 2949.0], [258.0, 3033.0], [218.0, 3033.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '资格', 'bbox': [1306.0, 2964.0, 1395.0, 3026.0], 'poly': [[1306.0, 2964.0], [1395.0, 2964.0], [1395.0, 3026.0], [1306.0, 3026.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '证书', 'bbox': [1306.0, 3026.0, 1402.0, 3084.0], 'poly': [[1306.0, 3026.0], [1402.0, 3026.0], [1402.0, 3084.0], [1306.0, 3084.0]], 'pred_id': 0, 'pred': 'O'}, {'text': '专主,错/号作水平分为:精通、熟练、一般。', 'bbox': [346.0, 3102.0, 1015.0, 3169.0], 'poly': [[348.0, 3102.0], [1015.0, 3117.0], [1014.0, 3169.0], [346.0, 3153.0]], 'pred_id': 1, 'pred': 'QUESTION'}, {'text': '1-3', 'bbox': [1210.0, 3270.0, 1262.0, 3318.0], 'poly': [[1210.0, 3270.0], [1262.0, 3270.0], [1262.0, 3318.0], [1210.0, 3318.0]], 
'pred_id': 0, 'pred': 'O'}]

3. Training

This section uses the XFUN Chinese dataset as an example to show how to train, evaluate, and test the SER model.

3.1 Data Preparation

We use the XFUND dataset as the experimental dataset.
XFUND is a multilingual dataset for KIE tasks released by Microsoft. It contains seven subsets, one per language, each with 149 training images and 50 validation images:

  • ZH (Chinese)
  • JA (Japanese)
  • ES (Spanish)
  • FR (French)
  • IT (Italian)
  • DE (German)
  • PT (Portuguese)

This experiment uses the Chinese subset as the demo dataset; the French subset serves as the dataset for the hands-on course. Samples of both are shown below.
Figure 2: dataset samples, Chinese (left) and French (right)
Run the following commands to download and extract the Chinese dataset, or download it yourself from https://github.com/doc-analysis/XFUND.

```python
! wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
! tar -xf XFUND.tar
# the other XFUN language subsets can be converted with this script:
# https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppstructure/vqa/helper/trans_xfun_data.py
```

```
File XFUND.tar already there; not retrieving.
```

After running the commands above, the /home/aistudio/PaddleOCR/ppstructure/vqa/XFUND directory contains two folders, structured as follows:

```
/home/aistudio/PaddleOCR/ppstructure/vqa/XFUND
├── zh_train/                      # training set
│   ├── image/                     # image folder
│   └── xfun_normalize_train.json  # annotations
└── zh_val/                        # validation set
    ├── image/                     # image folder
    └── xfun_normalize_val.json    # annotations
```

The annotation format of this dataset is:

```python
{
    "height": 3508,  # image height
    "width": 2480,   # image width
    "ocr_info": [
        {
            "text": "邮政地址:",            # content of a single text
            "label": "question",           # category of the text
            "bbox": [261, 802, 483, 859],  # bounding box of the text
            "id": 54,                      # text index
            "linking": [[54, 60]],         # links between this text and others, as [question, answer]
            "words": []
        },
        {
            "text": "湖南省怀化市市辖区",
            "label": "answer",
            "bbox": [487, 810, 862, 859],
            "id": 60,
            "linking": [[54, 60]],
            "words": []
        }
    ]
}
```
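Given an annotation dict in the format above, the `linking` field pairs question texts with answer texts. A minimal sketch (the helper name `extract_qa_pairs` is hypothetical, not part of PaddleOCR):

```python
def extract_qa_pairs(ann):
    # index texts by id, then resolve [question_id, answer_id] links
    by_id = {info["id"]: info for info in ann["ocr_info"]}
    pairs = []
    for info in ann["ocr_info"]:
        if info["label"] != "question":
            continue
        for q_id, a_id in info["linking"]:
            if q_id == info["id"] and a_id in by_id:
                pairs.append((info["text"], by_id[a_id]["text"]))
    return pairs

# the sample annotation from above
ann = {
    "height": 3508, "width": 2480,
    "ocr_info": [
        {"text": "邮政地址:", "label": "question", "bbox": [261, 802, 483, 859],
         "id": 54, "linking": [[54, 60]], "words": []},
        {"text": "湖南省怀化市市辖区", "label": "answer", "bbox": [487, 810, 862, 859],
         "id": 60, "linking": [[54, 60]], "words": []},
    ],
}
print(extract_qa_pairs(ann))  # → [('邮政地址:', '湖南省怀化市市辖区')]
```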

3.2 Loss Function

Since SER is a multi-class token classification task, CrossEntropyLoss is used as the loss.
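The masked token-level cross entropy from the model code can be sketched in plain numpy. This is an illustrative re-implementation with toy values, not the actual training loss (which uses `paddle.nn.CrossEntropyLoss` over positions where `attention_mask == 1`):

```python
import numpy as np

def masked_cross_entropy(logits, labels, attention_mask):
    # flatten, keep only real-token positions, then average -log softmax prob
    logits = logits.reshape(-1, logits.shape[-1])
    labels = labels.reshape(-1)
    keep = attention_mask.reshape(-1) == 1
    logits, labels = logits[keep], labels[keep]
    # log-softmax with max subtraction for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# toy batch: 1 sample, 3 tokens (the last one is padding), 2 classes
logits = np.array([[[2.0, 0.0], [0.0, 2.0], [5.0, 5.0]]])
labels = np.array([[0, 1, 0]])
mask = np.array([[1, 1, 0]])
print(round(masked_cross_entropy(logits, labels, mask), 4))  # → 0.1269
```

Note that the padded third token contributes nothing to the loss, mirroring the `active_loss` filtering in `LayoutXLMForTokenClassification.forward`.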

3.3 Model Training

With data processing and the loss function in place, the model can be trained.
The training command is:

```python
! python train_ser.py \
    --model_name_or_path "layoutxlm-base-uncased" \
    --ser_model_type "LayoutXLM" \
    --train_data_dir "XFUND/zh_train/image" \
    --train_label_path "XFUND/zh_train/xfun_normalize_train.json" \
    --eval_data_dir "XFUND/zh_val/image" \
    --eval_label_path "XFUND/zh_val/xfun_normalize_val.json" \
    --per_gpu_train_batch_size 1 \
    --per_gpu_eval_batch_size 1 \
    --num_train_epochs 200 \
    --eval_steps 10 \
    --output_dir "./output/ser/" \
    --learning_rate 5e-5 \
    --warmup_steps 50 \
    --evaluate_during_training \
    --num_workers 0 \
    --seed 2048
```
```
[2021/12/25 21:12:05] root INFO: ----------- Configuration Arguments -----------
[2021/12/25 21:12:05] root INFO: adam_epsilon: 1e-08
[2021/12/25 21:12:05] root INFO: det_model_dir: None
[2021/12/25 21:12:05] root INFO: eval_data_dir: XFUND/zh_val/image
[2021/12/25 21:12:05] root INFO: eval_label_path: XFUND/zh_val/xfun_normalize_val.json
[2021/12/25 21:12:05] root INFO: eval_steps: 10
[2021/12/25 21:12:05] root INFO: evaluate_during_training: True
[2021/12/25 21:12:05] root INFO: infer_imgs: None
[2021/12/25 21:12:05] root INFO: label_map_path: ./labels/labels_ser.txt
[2021/12/25 21:12:05] root INFO: learning_rate: 5e-05
[2021/12/25 21:12:05] root INFO: max_grad_norm: 1.0
[2021/12/25 21:12:05] root INFO: max_seq_length: 512
[2021/12/25 21:12:05] root INFO: model_name_or_path: layoutxlm-base-uncased
[2021/12/25 21:12:05] root INFO: num_train_epochs: 200
[2021/12/25 21:12:05] root INFO: num_workers: 0
[2021/12/25 21:12:05] root INFO: ocr_json_path: None
[2021/12/25 21:12:05] root INFO: output_dir: ./output/ser/
[2021/12/25 21:12:05] root INFO: per_gpu_eval_batch_size: 1
[2021/12/25 21:12:05] root INFO: per_gpu_train_batch_size: 1
[2021/12/25 21:12:05] root INFO: re_model_name_or_path: None
[2021/12/25 21:12:05] root INFO: rec_model_dir: None
[2021/12/25 21:12:05] root INFO: resume: False
[2021/12/25 21:12:05] root INFO: seed: 2048
[2021/12/25 21:12:05] root INFO: ser_model_type: LayoutXLM
[2021/12/25 21:12:05] root INFO: train_data_dir: XFUND/zh_train/image
[2021/12/25 21:12:05] root INFO: train_label_path: XFUND/zh_train/xfun_normalize_train.json
[2021/12/25 21:12:05] root INFO: warmup_steps: 50
[2021/12/25 21:12:05] root INFO: weight_decay: 0.0
[2021/12/25 21:12:05] root INFO: ------------------------------------------------
[2021-12-25 21:12:05,398] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2021-12-25 21:12:05,398] [ INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 50.2MB/s]
[2021-12-25 21:12:06,284] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/model_state.pdparams and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2021-12-25 21:12:06,285] [ INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/model_state.pdparams
100%|██████████████████████████████████████| 1.37G/1.37G [00:22<00:00, 67.1MB/s]
W1225 21:12:28.457448 1236 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1225 21:12:28.461452 1236 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021/12/25 21:12:39] root INFO: train from scratch
[2021/12/25 21:12:40] root INFO: ***** Running training *****
[2021/12/25 21:12:40] root INFO: Num examples = 149
[2021/12/25 21:12:40] root INFO: Num Epochs = 200
[2021/12/25 21:12:40] root INFO: Instantaneous batch size per GPU = 1
[2021/12/25 21:12:40] root INFO: Total train batch size (w. parallel, distributed) = 1
[2021/12/25 21:12:40] root INFO: Total optimization steps = 29800
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [0/149], global_step:1, train loss: 1.922090, lr: 0.000001, avg_reader_cost: 0.60275 sec, avg_batch_cost: 0.71423 sec, avg_samples: 1.00000, ips: 1.40012 images/sec
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [1/149], global_step:2, train loss: 1.916964, lr: 0.000002, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.11029 sec, avg_samples: 1.00000, ips: 9.06733 images/sec
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [2/149], global_step:3, train loss: 1.950876, lr: 0.000003, avg_reader_cost: 0.00013 sec, avg_batch_cost: 0.08194 sec, avg_samples: 1.00000, ips: 12.20463 images/sec
[2021/12/25 21:12:41] root INFO: epoch: [0/200], iter: [3/149], global_step:4, train loss: 1.901511, lr: 0.000004, avg_reader_cost: 0.00011 sec, avg_batch_cost: 0.08840 sec, avg_samples: 1.00000, ips: 11.31248 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [4/149], global_step:5, train loss: 1.868079, lr: 0.000005, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.08239 sec, avg_samples: 1.00000, ips: 12.13703 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [5/149], global_step:6, train loss: 1.754278, lr: 0.000006, avg_reader_cost: 0.00013 sec, avg_batch_cost: 0.08332 sec, avg_samples: 1.00000, ips: 12.00248 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [6/149], global_step:7, train loss: 1.869870, lr: 0.000007, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.08287 sec, avg_samples: 1.00000, ips: 12.06661 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [7/149], global_step:8, train loss: 1.756458, lr: 0.000008, avg_reader_cost: 0.00013 sec, avg_batch_cost: 0.09483 sec, avg_samples: 1.00000, ips: 10.54489 images/sec
[2021/12/25 21:12:42] root INFO: epoch: [0/200], iter: [8/149], global_step:9, train loss: 1.645109, lr: 0.000009, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.08488 sec, avg_samples: 1.00000, ips: 11.78179 images/sec
[2021/12/25 21:12:43] root INFO: epoch: [0/200], iter: [9/149], global_step:10, train loss: 1.599218, lr: 0.000010, avg_reader_cost: 0.00012 sec, avg_batch_cost: 0.09367 sec, avg_samples: 1.00000, ips: 10.67612 images/sec
[2021/12/25 21:12:43] root INFO: [Eval]process: 0/50, loss: 1.62544
[2021/12/25 21:12:43] root INFO: [Eval]process: 1/50, loss: 1.43953
[2021/12/25 21:12:43] root INFO: [Eval]process: 2/50, loss: 1.52960
[2021/12/25 21:12:43] root INFO: [Eval]process: 3/50, loss: 1.77400
[2021/12/25 21:12:44] root INFO: [Eval]process: 4/50, loss: 1.52048
[2021/12/25 21:12:44] root INFO: [Eval]process: 5/50, loss: 1.62590
[2021/12/25 21:12:44] root INFO: [Eval]process: 6/50, loss: 1.39941
[2021/12/25 21:12:44] root INFO: [Eval]process: 7/50, loss: 1.60583
[2021/12/25 21:12:44] root INFO: [Eval]process: 8/50, loss: 1.60298
[2021/12/25 21:12:44] root INFO: [Eval]process: 9/50, loss: 1.64268
[2021/12/25 21:12:45] root INFO: [Eval]process: 10/50, loss: 1.70920
[2021/12/25 21:12:45] root INFO: [Eval]process: 11/50, loss: 1.49873
```
  67. [2021/12/25 21:12:45] root INFO: [Eval]process: 12/50, loss: 1.69277
  68. [2021/12/25 21:12:45] root INFO: [Eval]process: 13/50, loss: 1.63639
  69. [2021/12/25 21:12:45] root INFO: [Eval]process: 14/50, loss: 1.78201
  70. [2021/12/25 21:12:45] root INFO: [Eval]process: 15/50, loss: 1.48458
  71. [2021/12/25 21:12:45] root INFO: [Eval]process: 16/50, loss: 1.65950
  72. [2021/12/25 21:12:46] root INFO: [Eval]process: 17/50, loss: 1.67226
  73. [2021/12/25 21:12:46] root INFO: [Eval]process: 18/50, loss: 1.65881
  74. [2021/12/25 21:12:46] root INFO: [Eval]process: 19/50, loss: 1.67092
  75. [2021/12/25 21:12:46] root INFO: [Eval]process: 20/50, loss: 1.58707
  76. [2021/12/25 21:12:46] root INFO: [Eval]process: 21/50, loss: 1.77305
  77. [2021/12/25 21:12:46] root INFO: [Eval]process: 22/50, loss: 1.54853
  78. [2021/12/25 21:12:47] root INFO: [Eval]process: 23/50, loss: 1.74551
  79. [2021/12/25 21:12:47] root INFO: [Eval]process: 24/50, loss: 1.63271
  80. [2021/12/25 21:12:47] root INFO: [Eval]process: 25/50, loss: 1.41209
  81. [2021/12/25 21:12:47] root INFO: [Eval]process: 26/50, loss: 1.79890
  82. [2021/12/25 21:12:47] root INFO: [Eval]process: 27/50, loss: 1.59780
  83. ^C
  84. Traceback (most recent call last):
  85. File "train_ser.py", line 248, in <module>
  86. train(args)
  87. File "train_ser.py", line 210, in train
  88. id2label_map, pad_token_label_id, logger)
  89. File "/home/aistudio/PaddleOCR/ppstructure/vqa/eval_ser.py", line 103, in evaluate
  90. for idx, batch in enumerate(eval_dataloader):
  91. File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 255, in __next__
  92. data = self._reader.read_next_var_list()
  93. KeyboardInterrupt
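The per-iteration lines in the training log share a fixed format, so loss and learning-rate curves can be recovered from `train.log` with a little regex parsing. The sketch below is our own helper (not part of PaddleOCR); the field names are taken from the log output above:

```python
import re

# Matches the per-iteration fields in train_ser.py's log lines, e.g.
# "... global_step:1, train loss: 1.922090, lr: 0.000001, ..."
LOG_RE = re.compile(
    r"global_step:(?P<step>\d+), train loss: (?P<loss>[\d.]+), lr: (?P<lr>[\d.]+)"
)

def parse_train_log(lines):
    """Extract (global_step, loss, lr) tuples from training log lines."""
    records = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            records.append(
                (int(m.group("step")), float(m.group("loss")), float(m.group("lr")))
            )
    return records
```

Feeding it the file (`parse_train_log(open("output/ser/train.log"))`) yields a list ready for plotting.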

3.4 Model Evaluation

During training, two models are saved by default: the most recently trained model, named latest, and the highest-accuracy model, named best. The directory structure of the saved models is as follows:

```
output/ser/
├── best_model
│   ├── model_config.json        # model configuration
│   ├── model_state.pdparams     # model weights
│   ├── sentencepiece.bpe.model  # tokenizer model
│   ├── tokenizer_config.json    # tokenizer configuration
│   └── training_args.bin        # arguments used to launch training
├── infer_results.txt
├── latest_model
│   ├── model_config.json
│   ├── model_state.pdparams
│   ├── sentencepiece.bpe.model
│   ├── tokenizer_config.json
│   └── training_args.bin
├── test_gt.txt
├── test_pred.txt
└── train.log                    # training log
```
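Before pointing a downstream script at a checkpoint directory, it can be useful to verify that all of the files listed above are present. The helper below is our own illustration (not part of PaddleOCR), using only the standard library:

```python
from pathlib import Path

# Files each checkpoint directory (best_model / latest_model) is expected
# to contain, per the directory tree above.
REQUIRED_FILES = [
    "model_config.json",
    "model_state.pdparams",
    "sentencepiece.bpe.model",
    "tokenizer_config.json",
    "training_args.bin",
]

def missing_checkpoint_files(ckpt_dir):
    """Return the expected files that are absent from ckpt_dir."""
    ckpt = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (ckpt / name).exists()]
```

An empty return value (e.g. for `"output/ser/best_model"`) means the checkpoint is complete.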

Next, use the saved model weights to evaluate accuracy on the test set:

```python
! python eval_ser.py \
    --model_name_or_path "output/ser/best_model" \
    --ser_model_type "LayoutXLM" \
    --eval_data_dir "XFUND/zh_val/image" \
    --eval_label_path "XFUND/zh_val/xfun_normalize_val.json" \
    --per_gpu_eval_batch_size 8 \
    --num_workers 8 \
    --output_dir "output/ser/" \
    --seed 2048
```
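Evaluation compares the predicted label of each text region against the ground truth (the `test_gt.txt` / `test_pred.txt` files in the output directory hold the two sides). As a rough illustration of how a precision/recall/hmean metric over such label sequences can be computed, here is a simplified token-level sketch of our own, not the exact metric implemented in `eval_ser.py`:

```python
def ser_hmean(gt_labels, pred_labels):
    """Micro precision / recall / hmean over token labels, ignoring 'O'.

    gt_labels and pred_labels are parallel lists of label sequences,
    e.g. [["QUESTION", "ANSWER", "O"], ...].
    """
    tp = fp = fn = 0
    for gt_seq, pred_seq in zip(gt_labels, pred_labels):
        for g, p in zip(gt_seq, pred_seq):
            if p != "O":          # model predicted an entity label
                if p == g:
                    tp += 1
                else:
                    fp += 1
            if g != "O" and p != g:  # a true entity label was missed
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    hmean = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, hmean
```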

3.5 Model Prediction

After training, you can also run inference with the saved model on a single image or a folder of images to inspect the model's predictions.

```python
! python3 infer_ser_e2e.py \
    --model_name_or_path "./inference/PP-Layout_v1.0_ser_pretrained/" \
    --ser_model_type "LayoutXLM" \
    --max_seq_length 512 \
    --output_dir "output/ser_e2e/" \
    --infer_imgs "images/input/zh_val_42.jpg"
```
```
W1225 21:12:57.135272 1310 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W1225 21:12:57.138957 1310 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2021/12/25 21:13:06] root WARNING: version PP-OCRv2 not support cls models, auto switch to version PP-OCR
Namespace(benchmark=False, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/home/aistudio/.paddleocr/2.3.0.2/ocr/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/home/aistudio/.paddleocr/2.3.0.2/ocr/det/ch/ch_PP-OCRv2_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='box', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_polygon=True, e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, ocr_version='PP-OCRv2', output='./output/table', precision='fp32', process_id=0, rec=True, rec_algorithm='CRNN', rec_batch_num=6, rec_char_dict_path='/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', rec_image_shape='3, 32, 320', rec_model_dir='/home/aistudio/.paddleocr/2.3.0.2/ocr/rec/ch/ch_PP-OCRv2_rec_infer', save_log_path='./log_output/', show_log=False, structure_version='STRUCTURE', table_char_dict_path=None, table_char_type='en', table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=True)
process: [0/1], save result to output/ser_e2e/zh_val_42_ser.jpg
Corrupt JPEG data: premature end of data segment
```
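Here `--infer_imgs` points at a single image. If you prefer to drive inference from Python, a small wrapper that assembles the same command line as above (our own helper, not part of PaddleOCR) might look like this:

```python
import subprocess

def build_infer_cmd(
    infer_path,
    model_dir="./inference/PP-Layout_v1.0_ser_pretrained/",
    output_dir="output/ser_e2e/",
):
    """Assemble the infer_ser_e2e.py command line used above.

    infer_path may be a single image or an image folder.
    """
    return [
        "python3", "infer_ser_e2e.py",
        "--model_name_or_path", model_dir,
        "--ser_model_type", "LayoutXLM",
        "--max_seq_length", "512",
        "--output_dir", output_dir,
        "--infer_imgs", infer_path,
    ]

def run_infer(infer_path):
    """Launch inference and raise if the script exits with an error."""
    subprocess.run(build_infer_cmd(infer_path), check=True)
```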

4 Assignment

Exercise
[https://aistudio.baidu.com/aistudio/projectdetail/3281385](