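OCR Table Recognition in Practice
This section shows how to train and run a table recognition algorithm with PaddleOCR. After working through it, you should:
- Understand how the table recognition algorithm works
- Know the training and prediction workflow of the PaddleOCR table recognition code

1. Quick Start
To try PP-Structure prediction quickly, first download the PaddleOCR code and install the dependencies: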
# Clone the PaddleOCR code
# ! git clone https://github.com/PaddlePaddle/PaddleOCR
# Install the dependencies
! pip install -U pip
! pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
! pip install -r PaddleOCR/requirements.txt
! pip install pandas
Once installation is complete, the following commands run table recognition:
# Switch to the working directory
import os
os.chdir('/home/aistudio/PaddleOCR/ppstructure')
# Download the models
! mkdir -p inference
# Download and extract the ultra-lightweight Chinese text detection model (PP-OCRv2)
! wget -P ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar && cd inference && tar xf ch_PP-OCRv2_det_infer.tar && cd ..
# Download and extract the ultra-lightweight Chinese text recognition model (PP-OCRv2)
! wget -P ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar && cd inference && tar xf ch_PP-OCRv2_rec_infer.tar && cd ..
# Download and extract the ultra-lightweight English table structure model
! wget -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && cd inference && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar && cd ..
# Start with the input image
import cv2
from matplotlib import pyplot as plt
%matplotlib inline
# Read the table image and display it (OpenCV loads BGR, matplotlib expects RGB)
img = cv2.imread('/home/aistudio/1.jpg')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/predict_table.py#L55
from table.predict_table import TableSystem, to_excel
from utility import init_args
# Initialize the arguments
args = init_args().parse_args(args=[])
args.det_model_dir = 'inference/ch_PP-OCRv2_det_infer'
args.rec_model_dir = 'inference/ch_PP-OCRv2_rec_infer'
args.table_model_dir = 'inference/en_ppocr_mobile_v2.0_table_structure_infer'
args.image_dir = '/home/aistudio/1.jpg'
args.rec_char_dict_path = '../ppocr/utils/ppocr_keys_v1.txt'
args.table_char_dict_path = '../ppocr/utils/dict/table_structure_dict.txt'
args.det_limit_side_len = 736
args.det_limit_type = 'min'
args.output = '../output/table'
args.use_gpu = False
# Initialize the table recognition system
table_sys = TableSystem(args)
img = cv2.imread('/home/aistudio/1.jpg')
# Run table recognition
pred_html = table_sys(img)
# Save the result to an Excel file
to_excel(pred_html, '1.xlsx')
print(pred_html)
[2021/12/26 19:55:37] root DEBUG: dt_boxes num : 69, elapse : 2.900609254837036
[2021/12/26 19:55:43] root DEBUG: rec_res num : 69, elapse : 5.992196321487427
<html><body><table><thead><tr><td>代号</td><td>项目</td><td>结果</td><td>参考值</td><td>单位</td></tr></thead><tbody><tr><td>ALT</td><td>谷丙转氨酶</td><td>25.6</td><td>0--40</td><td>U/L</td></tr><tr><td>TBIL</td><td>总胆红素</td><td>11.2</td><td><20</td><td>UMOL/L</td></tr><tr><td>DBIL</td><td>直接胆红素</td><td>3.3</td><td>0--7</td><td>UMOL/L</td></tr><tr><td>IBIL</td><td>间接胆红素</td><td>7.9</td><td>1.5--15</td><td>UMOL/L</td></tr><tr><td>TP</td><td>总蛋白</td><td>58.9J</td><td>60--80</td><td>g/L</td></tr><tr><td>ALB</td><td>白蛋白</td><td>35.1</td><td>33--55</td><td>g/L</td></tr><tr><td>GLO</td><td>球蛋白</td><td>23.8</td><td>20--30</td><td>8/L</td></tr><tr><td>A/G</td><td>白球比</td><td>1.5</td><td>1.5--2.5</td><td></td></tr><tr><td>ALP</td><td>碱性磷酸酶</td><td>93</td><td>15--112</td><td>HUSL</td></tr><tr><td>GGT</td><td>谷氨酰转肽酶</td><td>14.3</td><td><50</td><td>U/L</td></tr><tr><td>AST</td><td>谷草转氨酶</td><td>16.3</td><td>8--40</td><td>W/L</td></tr><tr><td>LDH</td><td>乳酸脱氢酶</td><td>167</td><td>114--240</td><td>U/L</td></tr><tr><td>ADA</td><td>腺甘脱氨酶</td><td>12.6</td><td>4--24</td><td>U/L</td></tr></table></body></html>
# Read the Excel file and display it
import pandas as pd
df = pd.read_excel('1.xlsx').fillna('')
print(df)
代号 项目 结果 参考值 单位
0 ALT 谷丙转氨酶 25.6 0--40 U/L
1 TBIL 总胆红素 11.2 <20 UMOL/L
2 DBIL 直接胆红素 3.3 0--7 UMOL/L
3 IBIL 间接胆红素 7.9 1.5--15 UMOL/L
4 TP 总蛋白 58.9J 60--80 g/L
5 ALB 白蛋白 35.1 33--55 g/L
6 GLO 球蛋白 23.8 20--30 8/L
7 A/G 白球比 1.5 1.5--2.5
8 ALP 碱性磷酸酶 93 15--112 HUSL
9 GGT 谷氨酰转肽酶 14.3 <50 U/L
10 AST 谷草转氨酶 16.3 8--40 W/L
11 LDH 乳酸脱氢酶 167 114--240 U/L
12 ADA 腺甘脱氨酶 12.6 4--24 U/L
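If an Excel file is not needed, the predicted HTML string can also be converted to rows directly. The following is a minimal sketch that uses only the Python standard library. The names `TableHTMLToRows` and `html_to_rows` and the `sample` string are hypothetical and introduced for illustration only; they are not part of PaddleOCR. The sketch ignores `rowspan` and `colspan` attributes.

```python
from html.parser import HTMLParser


class TableHTMLToRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Cell text may arrive in several fragments, so accumulate it
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_rows(html):
    """Return the table as a list of rows, each row a list of cell strings."""
    parser = TableHTMLToRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample in the same shape as the predicted HTML
sample = ('<html><body><table><thead><tr><td>Code</td><td>Result</td></tr></thead>'
          '<tbody><tr><td>ALT</td><td>25.6</td></tr>'
          '<tr><td>A/G</td><td></td></tr></tbody></table></body></html>')
for row in html_to_rows(sample):
    print(row)
```

In the notebook, `html_to_rows(pred_html)` could be called on the string returned by `table_sys(img)`.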
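2. How Prediction Works
2.1 Overall Pipeline
The table recognition algorithm in PP-Structure follows the end-to-end approach.
It consists of three models:
- Text detection model: detects the text in the table
- Text recognition model: recognizes the detected text
- Table structure and cell prediction model: predicts the HTML description of the table structure and the cell coordinates

The figure below shows how the three models are chained together: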

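Figure 1: Table recognition pipeline
The detailed procedure is:
1. Run the text detection model to detect the text in the table.
2. Run the text recognition model on the detected text. At this point we have the text boxes and their text content.
3. Run the table structure and cell prediction model to predict the cell coordinates and the HTML description of the table structure.
4. Aggregate the text boxes from step 2 with the cell coordinates from step 3. As shown in the figure below, the IoU between the red text detection boxes and the blue cell boxes decides whether a text box is merged into a cell.
5. After aggregation, sort the text boxes in each cell from top to bottom and from left to right, look up the recognized text by the sorted box indices, and concatenate the strings to obtain the final text of the cell.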
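The aggregation and sorting steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the PaddleOCR implementation: boxes are axis-aligned `(x0, y0, x1, y1)` tuples, each text box goes to the single cell with the highest IoU, and the helper names `iou` and `fill_cells` as well as the sample boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fill_cells(text_boxes, texts, cell_boxes):
    """Assign each text box to its best-matching cell and join the text per cell."""
    grouped = {i: [] for i in range(len(cell_boxes))}
    for box, text in zip(text_boxes, texts):
        scores = [iou(box, cell) for cell in cell_boxes]
        best = max(range(len(cell_boxes)), key=lambda i: scores[i])
        if scores[best] > 0:  # ignore text that overlaps no cell
            grouped[best].append((box, text))
    result = []
    for i in range(len(cell_boxes)):
        # Sort top to bottom first, then left to right
        items = sorted(grouped[i], key=lambda bt: (bt[0][1], bt[0][0]))
        result.append(''.join(text for _, text in items))
    return result


# Hypothetical data: two cells, the first one holding two lines of text
cells = [(0, 0, 100, 40), (100, 0, 200, 40)]
boxes = [(5, 22, 90, 38), (5, 2, 90, 18), (110, 10, 190, 30)]
texts = ['world', 'hello ', '25.6']
print(fill_cells(boxes, texts, cells))  # ['hello world', '25.6']
```

Sorting by the exact `y0` value is fragile when boxes on the same text line differ by a pixel or two, so a real implementation would probably group boxes into lines with a tolerance before sorting by `x0`.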
 
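2.2 The Table Structure Prediction Model
Table recognition needs three models: text detection, text recognition, and table structure recognition. The text detection and recognition models were covered in earlier lessons, so this section focuses on the table structure prediction model.
The table structure prediction model predicts both the table structure and the cell coordinates. It is adapted from the RARE algorithm, with changes mainly in the following areas.
2.2.1 Input Data
In a text recognition model, every character in the dataset annotation is an independent class. In the table structure prediction model, however, the classes to predict are not single characters. The table below compares the dictionaries of RARE and the table structure prediction model:
| Model | Dictionary | 
|---|---|
| RARE | '<', 's', 'u', 'p', '>', '<', '/', 's', 'u', 'b', '>', '<', 'b', '>', '<', '/', 'b', '>', '<', 'i', '>', '<', '/', 'i', '>' | 
| Table structure prediction model | 'sos', '<thead>', '<tr>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '</tbody>', '<td', ' colspan="5"', '>', ' colspan="2"', ' colspan="3"', ' rowspan="2"', ' colspan="4"', ' colspan="6"', ' rowspan="3"', ' colspan="9"', ' colspan="10"', ' colspan="7"', ' rowspan="4"', ' rowspan="5"', ' rowspan="9"', ' colspan="8"', ' rowspan="8"', ' rowspan="6"', ' rowspan="7"', ' rowspan="10"', 'eos' | 
In the table structure prediction model, a string such as <thead> is treated as a single character (one token) during recognition.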
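To make the idea of one token per tag concrete, the sketch below splits an HTML structure string into dictionary tokens and maps them to indices. It is an illustration only: `STRUCTURE_TOKENS` is a shortened subset of the dictionary in the table, so the indices differ from the real ones, and `encode_structure` is a hypothetical helper rather than a PaddleOCR function.

```python
import re

# Shortened subset of the structure dictionary (indices are illustrative only)
STRUCTURE_TOKENS = ['sos', '<thead>', '<tr>', '<td>', '</td>', '</tr>', '</thead>',
                    '<tbody>', '</tbody>', '<td', ' colspan="2"', ' rowspan="2"',
                    '>', 'eos']
TOKEN_TO_ID = {tok: i for i, tok in enumerate(STRUCTURE_TOKENS)}

# Try longer tokens first so that '<td>' is preferred over '<td'
_PATTERN = re.compile('|'.join(
    re.escape(t) for t in sorted(STRUCTURE_TOKENS[1:-1], key=len, reverse=True)))


def encode_structure(html):
    """Turn a structure string into token ids, wrapped in sos/eos."""
    tokens = _PATTERN.findall(html)
    if ''.join(tokens) != html:
        raise ValueError('structure contains text outside the dictionary')
    ids = [TOKEN_TO_ID[t] for t in tokens]
    return [TOKEN_TO_ID['sos']] + ids + [TOKEN_TO_ID['eos']]


print(encode_structure('<tr><td colspan="2"></td></tr>'))
# [0, 2, 9, 10, 12, 4, 5, 13]
```

A merged cell is spelled with three tokens (`'<td'`, the span attribute, `'>'`), while a plain cell uses the single token `'<td>'`.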
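2.2.2 Model
The figure below compares the table structure recognition model with RARE:
Figure 3: Schematic of the table structure recognition model
The RARE model consists of TPS + CNN + RNN + AttentionHead. The role of each part is:
- TPS: rectifies curved text so that the image becomes horizontal
- CNN: extracts features from the image
- RNN: further enhances the extracted features and captures semantic features
- AttentionHead: produces the output

In the table structure recognition model the input is the whole image, so the TPS module is removed. Experiments also showed that the RNN has little effect on the result, so the RNN module is removed as well. The final structure of the table structure recognition model is CNN + AttentionHead.
To output the cell coordinates, detecting the cells inside the detection model was tried first. Building on the DB model, schemes 2 and 3 below were tried:
| Scheme | Result | 
|---|---|
| 1. Single-line text detection | ![]()  | 
| 2. Text and cells detected by one model | ![]()  | 
| 3. Text and cells detected by two models | ![]()  | 
As can be seen, detecting both text and cells in a segmentation model makes the ground truth (GT) ambiguous: should the background between the text lines inside a cell be labeled as text or as background?
Of the three models in the table recognition pipeline, only the text detection model and the table structure recognition model see the whole image. Therefore, an extra regression branch is added to the AttentionHead of the table structure recognition model to predict the cell coordinates (x0, y0, x1, y1).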
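As a rough illustration of what a regression branch does, the sketch below maps one decoder hidden state to four box coordinates with a single linear layer. Everything here is an assumption made for illustration: the helper names `linear` and `predict_cell_box`, the sigmoid that squashes outputs to [0, 1], and the interpretation of those values as fractions of the image size are not taken from the source.

```python
import math


def linear(x, weight, bias):
    """y = x @ weight + bias for one vector x; weight has shape (len(x), len(bias))."""
    return [sum(xi * row[j] for xi, row in zip(x, weight)) + bias[j]
            for j in range(len(bias))]


def predict_cell_box(hidden, weight, bias, img_w, img_h):
    """Map one hidden state to a box (x0, y0, x1, y1) in pixels."""
    raw = linear(hidden, weight, bias)
    # Assumption: outputs are squashed to [0, 1] and read as fractions of the image size
    nx0, ny0, nx1, ny1 = [1.0 / (1.0 + math.exp(-v)) for v in raw]
    return (nx0 * img_w, ny0 * img_h, nx1 * img_w, ny1 * img_h)


# Hypothetical 2-dimensional hidden state with all-zero weights
hidden = [0.0, 0.0]
weight = [[0.0] * 4, [0.0] * 4]
bias = [0.0] * 4
print(predict_cell_box(hidden, weight, bias, img_w=488, img_h=488))
# (244.0, 244.0, 244.0, 244.0)
```

Because the branch runs once per decoding step, every predicted structure token gets its own box in this scheme.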
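2.3 Forward Pass of the Table Structure Prediction Model
The forward-pass analysis follows the input image from preprocessing to the network output and tracks how the output shape changes in each module, to give a better understanding of the table structure and cell prediction model. The modules involved are:
| Type | Module | 
|---|---|
| Data processing | ResizeTableImage | 
| Data processing | PaddingTableImage | 
| Backbone | MobileNetV3 | 
| Head | TableAttentionHead | 
2.3.1 Input Data Processing
For this example, the input image and the outputs of the data processing modules are visualized below: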
# Switch to the PaddleOCR directory
os.chdir('/home/aistudio/PaddleOCR')
from ppocr.data import create_operators, transform
plt.figure(figsize=(24, 8))
# Read the input image
img = cv2.imread('/home/aistudio/1.jpg')
# Show the input image
plt.subplot(1, 3, 1)
plt.title('src, shape:{}'.format(img.shape))
plt.imshow(img)
# Run ResizeTableImage
# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/gen_table_mask.py#L182
# Scale the long side of the image to the given length; the short side is scaled proportionally
pre_process_list = [{'ResizeTableImage': {'max_len': args.table_max_len}}]
preprocess_op = create_operators(pre_process_list)
data = {'image': img}
data = transform(data, preprocess_op)
# Show the image after ResizeTableImage
plt.subplot(1, 3, 2)
plt.title('ResizeTableImage, shape:{}'.format(data['image'].shape))
plt.imshow(data['image'])
# Run PaddingTableImage
# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/gen_table_mask.py#L232
pre_process_list = [{'PaddingTableImage': None}]
preprocess_op = create_operators(pre_process_list)
data = transform(data, preprocess_op)
# Show the image after PaddingTableImage
plt.subplot(1, 3, 3)
plt.title('PaddingTableImage, shape:{}'.format(data['image'].shape))
plt.imshow(data['image'] / 255)
plt.show()
# Define the complete list of preprocessing ops
pre_process_list = [
    {'ResizeTableImage': {'max_len': args.table_max_len}},
    {'NormalizeImage': {'scale': 1. / 255., 'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225], 'order': 'hwc'}},
    {'PaddingTableImage': None},
    {'ToCHWImage': None},
]
# Create the ops
preprocess_op = create_operators(pre_process_list)
# Run the ops
data = {'image': img}
data = transform(data, preprocess_op)
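The effect of the two data processing steps on the image shape can be worked out with simple arithmetic. The sketch below is not the PaddleOCR code. The function names are hypothetical, the 500 x 976 input size is made up, `max_len=488` is assumed from the default `in_max_len` of the head, and placing the image in the top-left corner of the padded square is also an assumption.

```python
def resize_table_shape(height, width, max_len):
    """Scale the image so that its longer side equals max_len (aspect ratio kept)."""
    ratio = max_len / max(height, width)
    return int(height * ratio), int(width * ratio)


def padding_amounts(height, width, max_len):
    """Rows and columns of padding needed to reach a max_len x max_len square.

    Assumption: the resized image sits in the top-left corner and the padding
    is added at the bottom and on the right.
    """
    return max_len - height, max_len - width


h, w = resize_table_shape(500, 976, max_len=488)
print('resized:', (h, w))                      # resized: (250, 488)
print('padding:', padding_amounts(h, w, 488))  # padding: (238, 0)
```

Padding every image to the same square gives the network a fixed input size.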

# Download the pretrained model
! wget -P ./pre_train/ https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar && cd pre_train && tar xf en_ppocr_mobile_v2.0_table_structure_train.tar && cd ..
import paddle
# Load the pretrained parameters and split them into backbone and head parameters
pretrain_params = paddle.load('/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy.pdparams')

def filter_params(pretrain_params, prefix):
    # Keep the parameters under `prefix` and strip the leading 'prefix.' from their names
    new_dict = {}
    for k, v in pretrain_params.items():
        if k.startswith(prefix + '.'):
            new_dict[k[len(prefix) + 1:]] = v
    return new_dict

# Extract the parameters
backbone_dict = filter_params(pretrain_params, 'backbone')
head_dict = filter_params(pretrain_params, 'head')
2.3.2 Backbone
The backbone is the same as the one used for text detection. It outputs four feature maps whose sizes are 1/4, 1/8, 1/16, and 1/32 of the input image. The backbone was introduced in the text detection chapter, so it is not described again here.
# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/modeling/backbones/det_mobilenet_v3.py
from ppocr.modeling.backbones import build_backbone
# Initialize the backbone
backbone = build_backbone(dict(name='MobileNetV3', scale=1.0, model_name='large'), model_type='table')
backbone.eval()
# Load the backbone parameters
backbone.set_state_dict(backbone_dict)
import numpy as np
x = np.expand_dims(data['image'], axis=0)
x = paddle.to_tensor(x)
backbone_out = backbone(x)
for item in backbone_out:
    print(item.shape)
[1, 24, 122, 122]
[1, 40, 61, 61]
[1, 112, 31, 31]
[1, 960, 16, 16]
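The spatial sizes in the printed shapes are consistent with a 488 x 488 input downsampled by 4, 8, 16, and 32, with fractional sizes rounded up. The helper below is a small sketch for checking this; `feature_map_sizes` is a hypothetical name, and the rounding-up rule is inferred from the printed shapes rather than stated in the source.

```python
import math


def feature_map_sizes(input_size, strides=(4, 8, 16, 32)):
    """Side length of each backbone output for a square input, rounding up."""
    return [math.ceil(input_size / s) for s in strides]


print(feature_map_sizes(488))  # [122, 61, 31, 16]
```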
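2.3.3 Head
The head takes the four feature maps output by the backbone as input and outputs the predicted table structure and cell coordinates.
Its constructor parameters are:
| Parameter | Meaning | 
|---|---|
| in_channels | Number of channels of the input feature maps | 
| hidden_size | Number of hidden units of the RNN module inside the attention | 
| max_elem_length | Maximum number of predicted tokens | 
| in_max_len | Size of the input image | 
| loc_type | Input to the cell-coordinate branch. 1: only the hidden state after attention; 2: the CNN features fused with the attention output | 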
其代码如下
# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/modeling/heads/table_att_head.py
import numpy as np
import paddle
from paddle import nn
import paddle.nn.functional as F
from ppocr.modeling.heads.table_att_head import AttentionGRUCell

class TableAttentionHead(nn.Layer):
    def __init__(self,
                 in_channels,
                 hidden_size,
                 loc_type=2,
                 in_max_len=488,       # size of the input image
                 max_elem_length=800,  # maximum number of output labels
                 **kwargs):
        super(TableAttentionHead, self).__init__()
        self.input_size = in_channels[-1]
        self.hidden_size = hidden_size
        self.elem_num = 30
        self.max_elem_length = max_elem_length

        self.structure_attention_cell = AttentionGRUCell(
            self.input_size, hidden_size, self.elem_num, use_gru=False)
        self.structure_generator = nn.Linear(hidden_size, self.elem_num)
        self.loc_type = loc_type
        self.in_max_len = in_max_len

        # Box regression branch
        if self.loc_type == 1:
            self.loc_generator = nn.Linear(hidden_size, 4)
        else:
            if self.in_max_len == 640:
                # For a 640 input the last backbone feature map is 20*20, so the flattened size is 400
                self.loc_fea_trans = nn.Linear(400, self.max_elem_length + 1)
            elif self.in_max_len == 800:
                # For an 800 input the last backbone feature map is 25*25, so the flattened size is 625
                self.loc_fea_trans = nn.Linear(625, self.max_elem_length + 1)
            elif self.in_max_len == 488:
                # For a 488 input the last backbone feature map is 16*16, so the flattened size is 256
                self.loc_fea_trans = nn.Linear(256, self.max_elem_length + 1)
            self.loc_generator = nn.Linear(self.input_size + hidden_size, 4)

    def _char_to_onehot(self, input_char, onehot_dim):
        input_ont_hot = F.one_hot(input_char, onehot_dim)
        return input_ont_hot

    def forward(self, inputs, targets=None):
        # Take the smallest feature map produced by the backbone
        fea = inputs[-1]
        if len(fea.shape) == 3:
            pass
        else:
            # Reshape B,C,H,W to B,C,H*W
            last_shape = int(np.prod(fea.shape[2:]))
            fea = paddle.reshape(fea, [fea.shape[0], fea.shape[1], last_shape])
            # Change B,C,W to B,W,C
            fea = fea.transpose([0, 2, 1])
        batch_size = fea.shape[0]

        hidden = paddle.zeros((batch_size, self.hidden_size))
        output_hiddens = []
        if self.training and targets is not None:
            # Training: feed the ground-truth token of each step into the attention cell
            structure = targets[0]
            for i in range(self.max_elem_length + 1):
                elem_onehots = self._char_to_onehot(
                    structure[:, i], onehot_dim=self.elem_num)
                (outputs, hidden), alpha = self.structure_attention_cell(
                    hidden, fea, elem_onehots)
                output_hiddens.append(paddle.unsqueeze(outputs, axis=1))
            output = paddle.concat(output_hiddens, axis=1)
            structure_probs = self.structure_generator(output)
            if self.loc_type == 1:
                loc_preds = self.loc_generator(output)
                loc_preds = F.sigmoid(loc_preds)
            else:
                loc_fea = fea.transpose([0, 2, 1])
                loc_fea = self.loc_fea_trans(loc_fea)
                loc_fea = loc_fea.transpose([0, 2, 1])
                loc_concat = paddle.concat([output, loc_fea], axis=2)
                loc_preds = self.loc_generator(loc_concat)
                loc_preds = F.sigmoid(loc_preds)
        else:
            # Inference: feed the token predicted at the previous step back in
            temp_elem = paddle.zeros(shape=[batch_size], dtype="int32")
            structure_probs = None
            loc_preds = None
            elem_onehots = None
            outputs = None
            alpha = None
            max_elem_length = paddle.to_tensor(self.max_elem_length)
            i = 0
            # Attention forward
            while i < max_elem_length + 1:
                elem_onehots = self._char_to_onehot(
                    temp_elem, onehot_dim=self.elem_num)
                (outputs, hidden), alpha = self.structure_attention_cell(
                    hidden, fea, elem_onehots)
                output_hiddens.append(paddle.unsqueeze(outputs, axis=1))
                structure_probs_step = self.structure_generator(outputs)
                temp_elem = structure_probs_step.argmax(axis=1, dtype="int32")
                i += 1

            output = paddle.concat(output_hiddens, axis=1)
            print('Attention output shape', output.shape)
            # Table structure branch
            structure_probs = self.structure_generator(output)
            structure_probs = F.softmax(structure_probs)
            # Cell coordinate branch
            if self.loc_type == 1:
                loc_preds = self.loc_generator(output)
                loc_preds = F.sigmoid(loc_preds)
            else:
                # Change B,W,C to B,C,W
                loc_fea = fea.transpose([0, 2, 1])
                loc_fea = self.loc_fea_trans(loc_fea)
                loc_fea = loc_fea.transpose([0, 2, 1])
                loc_concat = paddle.concat([output, loc_fea], axis=2)
                loc_preds = self.loc_generator(loc_concat)
                loc_preds = F.sigmoid(loc_preds)
        return {'structure_probs': structure_probs, 'loc_preds': loc_preds}
# Initialize the head
head = TableAttentionHead(in_channels=backbone.out_channels, hidden_size=256, loc_type=2)
head.eval()
# Load the head parameters
head.set_state_dict(head_dict)
# Run the head
print('*' * 10, 'head forward shape', '*' * 10)
head_out = head(backbone_out)
print('*' * 10, 'head out shape', '*' * 10)
# Print the head outputs and their shapes
for key in head_out:
    print(key, head_out[key].shape)

********** head forward shape **********
Attention output shape [1, 801, 256]
********** head out shape **********
structure_probs [1, 801, 30]
loc_preds [1, 801, 4]
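To see why `loc_preds` ends up with shape [1, 801, 4] when `loc_type=2`, the shapes in the coordinate branch can be traced without running the model. This is a minimal sketch that only tracks tuples; `transpose`, `linear` and `concat` are hypothetical stand-ins for the Paddle operations, and the sizes assume a 488-pixel input.

```python
def transpose(shape, perm):
    """Shape after permuting the axes."""
    return tuple(shape[i] for i in perm)

def linear(shape, in_features, out_features):
    """Shape after a linear layer, which acts on the last axis only."""
    assert shape[-1] == in_features, "last axis must match in_features"
    return shape[:-1] + (out_features,)

def concat(a, b, axis):
    """Shape after concatenating two tensors along one axis."""
    assert all(x == y for i, (x, y) in enumerate(zip(a, b)) if i != axis)
    out = list(a)
    out[axis] = a[axis] + b[axis]
    return tuple(out)

batch, channels, positions = 1, 960, 16 * 16
hidden_size, max_elem_length = 256, 800
steps = max_elem_length + 1

fea = (batch, positions, channels)      # B, H*W, C after reshape and transpose
output = (batch, steps, hidden_size)    # stacked attention hidden states

loc_fea = transpose(fea, (0, 2, 1))           # B, C, H*W
loc_fea = linear(loc_fea, positions, steps)   # map H*W positions to decoding steps
loc_fea = transpose(loc_fea, (0, 2, 1))       # B, steps, C
loc_concat = concat(output, loc_fea, axis=2)  # fuse attention and CNN features
loc_preds = linear(loc_concat, channels + hidden_size, 4)
print(loc_concat, loc_preds)
```

The first linear layer is what ties the head to a fixed input size: its input width equals the number of positions in the last feature map.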
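Post-processing
The dictionary file used for post-processing is ppocr/utils/dict/table_structure_dict.txt
Decoding proceeds as follows:
- Apply CTC-style decoding to structure_probs: discard the background tokens sos and eos, and keep only one of any run of consecutive repeated tokens
- The output coordinates are normalized to the range 0-1; multiply them by the image width and height to decode them into image space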
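The coordinate step can be illustrated on toy values. The sketch below mirrors the scale-and-clamp arithmetic described above; `denormalize_boxes` is a hypothetical helper, and the second box uses an out-of-range value only to show the clamp.

```python
def denormalize_boxes(norm_boxes, img_w, img_h):
    """Scale normalized [x0, y0, x1, y1] boxes to pixels and clamp them to the image."""
    boxes = []
    for x0, y0, x1, y1 in norm_boxes:
        left = max(int(img_w * x0), 0)
        top = max(int(img_h * y0), 0)
        right = min(int(img_w * x1), img_w - 1)
        bottom = min(int(img_h * y1), img_h - 1)
        boxes.append([left, top, right, bottom])
    return boxes

toy_boxes = [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.5, 0.125]]
pixel_boxes = denormalize_boxes(toy_boxes, img_w=200, img_h=100)
print(pixel_boxes)
```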
```python
# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/postprocess/rec_postprocess.py#L441

from ppocr.postprocess.rec_postprocess import TableLabelDecode

def post_process(out):
    character_dict_path = '/home/aistudio/PaddleOCR/ppocr/utils/dict/table_structure_dict.txt'
    # Initialize the post-processing op
    post_op = TableLabelDecode(character_dict_path)
    post_result = post_op(out)
    structure_str_list = post_result['structure_str_list']
    # Map the normalized coordinates back to the original image size
    # (img, cv2 and plt come from earlier cells)
    res_loc = post_result['res_loc']
    imgh, imgw = img.shape[0:2]
    res_loc_final = []
    for rno in range(len(res_loc[0])):
        x0, y0, x1, y1 = res_loc[0][rno]
        left = max(int(imgw * x0), 0)
        top = max(int(imgh * y0), 0)
        right = min(int(imgw * x1), imgw - 1)
        bottom = min(int(imgh * y1), imgh - 1)
        res_loc_final.append([left, top, right, bottom])
    # Wrap the structure tokens in the outer HTML tags
    structure_str_list = structure_str_list[0]
    structure_str_list = ['<html>', '<body>', '<table>'] + structure_str_list + ['</table>', '</body>', '</html>']
    return structure_str_list, res_loc_final

structure_str_list, res_loc_final = post_process(head_out)
print(structure_str_list)
print(res_loc_final)

# Visualize the predicted boxes
plt.figure(figsize=(24, 8))
img_show = img.copy()
for box in res_loc_final:
    cv2.rectangle(img_show, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)
plt.imshow(img_show)
```['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>'][[32, 9, 104, 40], [232, 8, 307, 41], [429, 7, 500, 44], [559, 8, 656, 44], [715, 7, 780, 44], [37, 45, 99, 73], [190, 44, 342, 74], [432, 45, 502, 74], [565, 44, 655, 73], [712, 46, 777, 74], [30, 81, 101, 109], [202, 80, 337, 110], [433, 81, 503, 111], [578, 83, 638, 110], [698, 82, 790, 110], [31, 119, 104, 148], [197, 116, 347, 147], [443, 117, 492, 148], [572, 118, 643, 147], [698, 118, 797, 147], [35, 154, 101, 183], [199, 152, 342, 184], [436, 
154, 501, 184], [558, 155, 670, 184], [701, 153, 801, 183], [40, 188, 93, 217], [217, 187, 314, 219], [417, 187, 516, 218], [556, 187, 667, 217], [716, 188, 772, 216], [48, 227, 98, 255], [223, 224, 313, 256], [429, 226, 500, 256], [558, 226, 667, 256], [722, 225, 772, 254], [47, 262, 99, 291], [217, 260, 313, 293], [439, 261, 506, 293], [557, 260, 678, 292], [722, 261, 777, 290], [36, 295, 95, 324], [210, 296, 317, 326], [443, 296, 499, 326], [547, 296, 681, 326], [701, 300, 767, 328], [42, 332, 99, 361], [191, 330, 350, 360], [451, 331, 493, 361], [557, 331, 683, 361], [717, 335, 785, 362], [45, 369, 97, 396], [186, 367, 355, 400], [444, 369, 504, 398], [581, 369, 640, 397], [723, 368, 773, 396], [37, 404, 95, 431], [192, 404, 351, 433], [438, 404, 511, 432], [560, 405, 658, 432], [723, 404, 775, 431], [46, 444, 104, 469], [188, 441, 346, 470], [444, 444, 496, 471], [544, 445, 681, 471], [721, 444, 773, 470], [35, 478, 104, 503], [190, 475, 345, 505], [436, 477, 504, 504], [559, 476, 662, 505], [712, 477, 778, 504]]<matplotlib.image.AxesImage at 0x7faad993b2d0>
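3. Training
Training table recognition requires three models: text detection, text recognition, and the table structure model. Training for text detection and recognition was covered in earlier lessons, so only the training of the table structure model is described here.
This section uses the PubTabNet dataset and a table structure model with a MobileNetV3 backbone to show how to train, evaluate and test the table structure model.
3.1 Data preparation
This experiment uses the PubTabNet dataset for demonstration. A sample image from PubTabNet is shown in the figure below:
A subset of PubTabNet has already been downloaded in the project and is stored in /home/aistudio/data/data119702. Run the commands below to extract it, or download the dataset yourself from https://github.com/ibm-aur-nlp/PubTabNet.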
# Extract the dataset
! cd /home/aistudio/data/data119702 && tar -xf pubtabnet_val.tar && cd -
! ls /home/aistudio/data/data119702

/home/aistudio/PaddleOCR
PubTabNet_2.0.0_val.jsonl pubtabnet_val.tar val

After running the commands above, /home/aistudio/data/data119702 contains, besides the archive, one folder and one file:
/home/aistudio/data/data119702
└─ val/ folder containing the images
└─ PubTabNet_2.0.0_val.jsonl annotation file
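The annotation format of the dataset is:
{
    'filename': 'PMC5755158_010_01.png',  # image name
    'split': 'train',  # whether the image belongs to the training or validation set
    'imgid': 0,  # index of the image
    'html': {
        'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]},  # HTML structure tokens of the table
        'cells': [
            {
                'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'],  # text of a single cell
                'bbox': [x0, y0, x1, y1]  # coordinates of the text in that cell
            }
        ]
    }
}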
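A record in this format can be read with the standard `json` module. The sketch below builds a shortened, made-up record (the file name, tokens and box values are assumptions for illustration) and pulls out the structure tokens, the cell texts and the boxes.

```python
import json

# A shortened, made-up record in the annotation format described above
line = json.dumps({
    "filename": "toy.png",
    "split": "val",
    "imgid": 0,
    "html": {
        "structure": {"tokens": ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]},
        "cells": [
            {"tokens": ["O", "C", "R"], "bbox": [1, 2, 30, 12]},
            {"tokens": []},  # an empty cell carries no bbox field
        ],
    },
})

info = json.loads(line)
structure_tokens = info["html"]["structure"]["tokens"]
cells = info["html"]["cells"]

num_td = sum(tok == "<td>" for tok in structure_tokens)
texts = ["".join(cell["tokens"]) for cell in cells]
boxes = [cell.get("bbox") for cell in cells]
print(num_td, texts, boxes)
```

Using `cell.get("bbox")` rather than `cell["bbox"]` avoids a `KeyError` on empty cells, which have tokens but no coordinates.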
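3.2 Data preprocessing
Training places certain requirements on the format and size of the input images. Before the data is fed to the model, it therefore has to be preprocessed so that images and labels meet the needs of training and prediction.
Data preprocessing for the table structure model mainly consists of the following steps:
- DecodeImage: converts the image to NumPy format
- ResizeTableImage: resizes the image so that the long side matches the specified size and the short side is scaled proportionally
- TableLabelEncode: parses the label information in the annotation file and stores it in a unified format
- NormalizeImage: normalizes the image by scaling pixel values and then subtracting the per-channel mean and dividing by the per-channel standard deviation, so that the inputs have roughly zero mean and unit variance; this smooths the optimization landscape and makes training converge more easily
- PaddingTableImage: pads the short side of the image to the same size as the long side
- ToCHWImage: images are stored as [H, W, C] (height, width, channels), while the network expects training data as [C, H, W], so the image data has to be rearranged, for example from [224, 224, 3] to [3, 224, 224]
- KeepKeys: filters the data dict, keeping only the listed keys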
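The size and value arithmetic behind these steps can be checked on plain numbers. This is a minimal sketch, not PaddleOCR's implementation: the helper names are hypothetical, the padding is assumed to go on the bottom and right, and the mean and standard deviation are the first-channel values from the training configuration (0.485 and 0.229).

```python
def resize_long_side(height, width, max_len=488):
    """Scale so that the long side equals max_len, keeping the aspect ratio."""
    ratio = max_len / max(height, width)
    return int(height * ratio), int(width * ratio)

def padding_needed(height, width, max_len=488):
    """Rows and columns to add so the image becomes max_len x max_len."""
    return max_len - height, max_len - width

def normalize_pixel(value, mean, std):
    """Scale an 8-bit value to [0, 1], then standardize it."""
    return (value / 255.0 - mean) / std

def hwc_to_chw(shape):
    """Shape after moving the channel axis to the front."""
    h, w, c = shape
    return (c, h, w)

new_h, new_w = resize_long_side(61, 244)
pad_h, pad_w = padding_needed(new_h, new_w)
final_shape = hwc_to_chw((new_h + pad_h, new_w + pad_w, 3))
white = normalize_pixel(255, mean=0.485, std=0.229)
print((new_h, new_w), (pad_h, pad_w), final_shape, round(white, 4))
```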
 
TableLabelEncode
This step parses the label information in the annotation file. First load the annotation data and take one record:
# Load one record from the dataset
import json
from pprint import pprint

with open('/home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl', "rb") as f:
    data_lines = f.readlines()

for line in data_lines:
    data_line = line.decode('utf-8').strip("\n")
    info = json.loads(data_line)
    break

Run the code below to compare the labels before and after encoding by the TableLabelEncode class.
import os
import cv2
from ppocr.data.imaug import TableLabelEncode

# Initialize the label encoder
label_encoder_op = TableLabelEncode(
    max_text_length=100,  # unused
    max_elem_length=50,   # maximum number of elements predicted per image
    max_cell_num=500,     # unused
    character_dict_path='ppocr/utils/dict/table_structure_dict.txt')

# Build the input data
cells = info['html']['cells']
structure = info['html']['structure']

# Print the labels before encoding
print("The cells and structure before decode")
print("cells: ", cells)
print("structure: ", structure)

image = cv2.imread(os.path.join('/home/aistudio/data/data119702/val', info['filename']))
data = {'image': image, 'cells': cells, 'structure': structure}

# Run the label encoder
data = label_encoder_op(data)

# Print the encoded information
print("The bbox_list and structure after decode")
print("bbox_list:", data['bbox_list'].tolist())
print("structure:", data['structure'].tolist())
The cells and structure before decodecells: [{'tokens': []}, {'tokens': ['<b>', 'W', 'e', 'a', 'n', 'i', 'n', 'g', '</b>'], 'bbox': [66, 4, 96, 13]}, {'tokens': ['<b>', 'W', 'e', 'e', 'k', ' ', '1', '5', '</b>'], 'bbox': [131, 4, 160, 13]}, {'tokens': ['<b>', 'O', 'f', 'f', '-', 't', 'e', 's', 't', '</b>'], 'bbox': [201, 4, 226, 13]}, {'tokens': ['W', 'e', 'a', 'n', 'i', 'n', 'g'], 'bbox': [1, 17, 31, 26]}, {'tokens': ['–'], 'bbox': [66, 21, 72, 25]}, {'tokens': ['–'], 'bbox': [131, 21, 137, 25]}, {'tokens': ['–'], 'bbox': [201, 21, 207, 25]}, {'tokens': ['W', 'e', 'e', 'k', ' ', '1', '5'], 'bbox': [1, 31, 30, 40]}, {'tokens': ['–'], 'bbox': [66, 35, 72, 39]}, {'tokens': ['0', '.', '1', '7', ' ', '±', ' ', '0', '.', '0', '8'], 'bbox': [131, 31, 166, 40]}, {'tokens': ['0', '.', '1', '6', ' ', '±', ' ', '0', '.', '0', '3'], 'bbox': [201, 31, 236, 40]}, {'tokens': ['O', 'f', 'f', '-', 't', 'e', 's', 't'], 'bbox': [1, 45, 26, 54]}, {'tokens': ['–'], 'bbox': [66, 49, 72, 53]}, {'tokens': ['0', '.', '8', '0', ' ', '±', ' ', '0', '.', '2', '4'], 'bbox': [131, 45, 166, 54]}, {'tokens': ['0', '.', '1', '9', ' ', '±', ' ', '0', '.', '0', '9'], 'bbox': [201, 45, 236, 54]}]structure: {'tokens': ['<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>']}The bbox_list and structure after decodebbox_list: [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.06779661029577255, 0.40336135029792786, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.06779661029577255, 0.6722689270973206, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 
0.06779661029577255, 0.9495798349380493, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.2881355881690979, 0.13025210797786713, 0.4406779706478119], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.35593220591545105, 0.3025210201740265, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.35593220591545105, 0.575630247592926, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.35593220591545105, 0.8697478771209717, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.5254237055778503, 0.1260504275560379, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.5932203531265259, 0.3025210201740265, 0.6610169410705566], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.5254237055778503, 0.6974790096282959, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.5254237055778503, 0.9915966391563416, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.7627118825912476, 0.10924369841814041, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.8305084705352783, 0.3025210201740265, 0.8983050584793091], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.7627118825912476, 0.6974790096282959, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.7627118825912476, 0.9915966391563416, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]structure: [0, 1, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 6, 7, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 8, 29, 0, 0, 0, 0, 0, 0]
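The encoded output above suggests the following scheme: the token sequence is wrapped in a start and an end index and padded with zeros to `max_elem_length + 2`, and each cell box is normalized by the image size and stored at the position of its `<td>` token. The sketch below reproduces that reading on toy data. It is an assumption drawn from the printed output, not the library code; the vocabulary indices are read off that output, and the real dictionary file has more entries.

```python
# Index values read off the printed output; the real dictionary is larger
VOCAB = {"sos": 0, "<thead>": 1, "<tr>": 2, "<td>": 3, "</td>": 4,
         "</tr>": 5, "</thead>": 6, "<tbody>": 7, "</tbody>": 8, "eos": 29}

def encode_structure(tokens, max_elem_length):
    """Wrap tokens in sos/eos indices and pad with zeros to a fixed length."""
    ids = [VOCAB["sos"]] + [VOCAB[t] for t in tokens] + [VOCAB["eos"]]
    if len(ids) > max_elem_length + 2:
        raise ValueError("structure is longer than max_elem_length")
    return ids + [0] * (max_elem_length + 2 - len(ids))

def encode_boxes(tokens, cells, img_w, img_h, max_elem_length):
    """Store each normalized cell box at the position of its <td> token."""
    boxes = [[0.0, 0.0, 0.0, 0.0] for _ in range(max_elem_length + 2)]
    td_positions = [i + 1 for i, t in enumerate(tokens) if t == "<td>"]  # +1 for sos
    for pos, cell in zip(td_positions, cells):
        if "bbox" in cell:  # empty cells have no box and stay all-zero
            x0, y0, x1, y1 = cell["bbox"]
            boxes[pos] = [x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h]
    return boxes

toy_tokens = ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]
toy_cells = [{"tokens": []}, {"tokens": ["a"], "bbox": [50, 10, 100, 20]}]

ids = encode_structure(toy_tokens, max_elem_length=8)
bbox_list = encode_boxes(toy_tokens, toy_cells, img_w=200, img_h=40, max_elem_length=8)
print(ids)
print(bbox_list[4])
```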
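3.3 Loss function definition
The model loss has two parts:
- structure loss: uses CrossEntropyLoss, the usual choice for classification
- loc loss: uses MSELoss

The two losses are combined as a weighted sum. In the code, structure_weight and loc_weight are 100 and 10000 respectively.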
total_loss = structure_loss * structure_weight + loc_loss * loc_weight
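The weighted sum can be worked through on toy numbers in plain Python. This is a minimal sketch of the formula only: it averages cross-entropy over decoding steps and uses a plain mean squared error, and it omits any masking or batching that the real loss may apply. All inputs are made up.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def total_loss(step_logits, step_targets, box_pred, box_target,
               structure_weight=100.0, loc_weight=10000.0):
    structure_loss = sum(
        cross_entropy(l, t) for l, t in zip(step_logits, step_targets)
    ) / len(step_targets)
    loc_loss = mse(box_pred, box_target)
    return structure_loss * structure_weight + loc_loss * loc_weight

# One decoding step with two equally likely classes, and one box off by 0.1 in one coordinate
loss = total_loss(
    step_logits=[[0.0, 0.0]], step_targets=[0],
    box_pred=[0.5, 0.5, 0.5, 0.5], box_target=[0.5, 0.5, 0.5, 0.6],
)
print(round(loss, 4))
```

Because box coordinates are normalized to 0-1, squared errors are small, which is one plausible reason for the much larger weight on the loc term.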
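3.4 Model training
Once data processing and the loss function are defined, the model can be trained.
Training is based on PaddleOCR's training script and is configured through a parameter file; see https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/table/table_mv3.yml for reference. The network architecture parameters are as follows:
Architecture:
  model_type: table
  algorithm: TableAttn
  Backbone:
    name: MobileNetV3
    scale: 1.0
    model_name: large
  Head:
    name: TableAttentionHead
    hidden_size: 256
    loc_type: 2
    max_text_length: 100
    max_elem_length: 800
    max_cell_num: 500
The loss function parameters are as follows:
Loss:
  name: TableAttentionLoss
  structure_weight: 100.0
  loc_weight: 10000.0
After the configuration is done, training can be started with the commands below. Note that the validation data is linked as both the training set and the validation set here, for demonstration purposes only.
# Set up the dataset
# !mkdir -p train_data/table/pubtabnet
!cd train_data/table/pubtabnet && ln -s /home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl PubTabNet_2.0.0_train.jsonl \
    && ln -s /home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl PubTabNet_2.0.0_val.jsonl \
    && ln -s /home/aistudio/data/data119702/val train \
    && ln -s /home/aistudio/data/data119702/val val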
ln: failed to create symbolic link 'PubTabNet_2.0.0_train.jsonl': File exists
! python tools/train.py -c configs/table/table_mv3.yml -o Global.use_gpu=False Global.print_batch_step=1 Train.loader.batch_size_per_card=1 Eval.loader.batch_size_per_card=1
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative usesimport imp[2021/12/26 19:57:29] root INFO: Architecture :[2021/12/26 19:57:29] root INFO: Backbone :[2021/12/26 19:57:29] root INFO: model_name : large[2021/12/26 19:57:29] root INFO: name : MobileNetV3[2021/12/26 19:57:29] root INFO: scale : 1.0[2021/12/26 19:57:29] root INFO: Head :[2021/12/26 19:57:29] root INFO: hidden_size : 256[2021/12/26 19:57:29] root INFO: l2_decay : 1e-05[2021/12/26 19:57:29] root INFO: loc_type : 2[2021/12/26 19:57:29] root INFO: max_cell_num : 500[2021/12/26 19:57:29] root INFO: max_elem_length : 800[2021/12/26 19:57:29] root INFO: max_text_length : 100[2021/12/26 19:57:29] root INFO: name : TableAttentionHead[2021/12/26 19:57:29] root INFO: algorithm : TableAttn[2021/12/26 19:57:29] root INFO: model_type : table[2021/12/26 19:57:29] root INFO: Eval :[2021/12/26 19:57:29] root INFO: dataset :[2021/12/26 19:57:29] root INFO: data_dir : train_data/table/pubtabnet/val/[2021/12/26 19:57:29] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl[2021/12/26 19:57:29] root INFO: name : PubTabDataSet[2021/12/26 19:57:29] root INFO: transforms :[2021/12/26 19:57:29] root INFO: DecodeImage :[2021/12/26 19:57:29] root INFO: channel_first : False[2021/12/26 19:57:29] root INFO: img_mode : BGR[2021/12/26 19:57:29] root INFO: ResizeTableImage :[2021/12/26 19:57:29] root INFO: max_len : 488[2021/12/26 19:57:29] root INFO: TableLabelEncode : None[2021/12/26 19:57:29] root INFO: NormalizeImage :[2021/12/26 19:57:29] root INFO: mean : [0.485, 0.456, 0.406][2021/12/26 19:57:29] root INFO: order : hwc[2021/12/26 19:57:29] root INFO: scale : 1./255.[2021/12/26 19:57:29] root INFO: std : [0.229, 0.224, 0.225][2021/12/26 19:57:29] root INFO: PaddingTableImage : None[2021/12/26 19:57:29] root INFO: ToCHWImage : 
None[2021/12/26 19:57:29] root INFO: KeepKeys :[2021/12/26 19:57:29] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask'][2021/12/26 19:57:29] root INFO: loader :[2021/12/26 19:57:29] root INFO: batch_size_per_card : 1[2021/12/26 19:57:29] root INFO: drop_last : False[2021/12/26 19:57:29] root INFO: num_workers : 1[2021/12/26 19:57:29] root INFO: shuffle : False[2021/12/26 19:57:29] root INFO: Global :[2021/12/26 19:57:29] root INFO: cal_metric_during_train : True[2021/12/26 19:57:29] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt[2021/12/26 19:57:29] root INFO: character_type : en[2021/12/26 19:57:29] root INFO: checkpoints : None[2021/12/26 19:57:29] root INFO: debug : False[2021/12/26 19:57:29] root INFO: distributed : False[2021/12/26 19:57:29] root INFO: epoch_num : 400[2021/12/26 19:57:29] root INFO: eval_batch_step : [0, 400][2021/12/26 19:57:29] root INFO: infer_img : doc/table/table.jpg[2021/12/26 19:57:29] root INFO: infer_mode : False[2021/12/26 19:57:29] root INFO: log_smooth_window : 20[2021/12/26 19:57:29] root INFO: max_cell_num : 500[2021/12/26 19:57:29] root INFO: max_elem_length : 800[2021/12/26 19:57:29] root INFO: max_text_length : 100[2021/12/26 19:57:29] root INFO: pretrained_model : None[2021/12/26 19:57:29] root INFO: print_batch_step : 1[2021/12/26 19:57:29] root INFO: process_cut_num : 0[2021/12/26 19:57:29] root INFO: process_total_num : 0[2021/12/26 19:57:29] root INFO: save_epoch_step : 3[2021/12/26 19:57:29] root INFO: save_inference_dir : None[2021/12/26 19:57:29] root INFO: save_model_dir : ./output/table_mv3/[2021/12/26 19:57:29] root INFO: use_gpu : False[2021/12/26 19:57:29] root INFO: use_visualdl : False[2021/12/26 19:57:29] root INFO: Loss :[2021/12/26 19:57:29] root INFO: loc_weight : 10000.0[2021/12/26 19:57:29] root INFO: name : TableAttentionLoss[2021/12/26 19:57:29] root INFO: structure_weight : 100.0[2021/12/26 19:57:29] root INFO: Metric :[2021/12/26 
19:57:29] root INFO: main_indicator : acc[2021/12/26 19:57:29] root INFO: name : TableMetric[2021/12/26 19:57:29] root INFO: Optimizer :[2021/12/26 19:57:29] root INFO: beta1 : 0.9[2021/12/26 19:57:29] root INFO: beta2 : 0.999[2021/12/26 19:57:29] root INFO: clip_norm : 5.0[2021/12/26 19:57:29] root INFO: lr :[2021/12/26 19:57:29] root INFO: learning_rate : 0.001[2021/12/26 19:57:29] root INFO: name : Adam[2021/12/26 19:57:29] root INFO: regularizer :[2021/12/26 19:57:29] root INFO: factor : 0.0[2021/12/26 19:57:29] root INFO: name : L2[2021/12/26 19:57:29] root INFO: PostProcess :[2021/12/26 19:57:29] root INFO: name : TableLabelDecode[2021/12/26 19:57:29] root INFO: Train :[2021/12/26 19:57:29] root INFO: dataset :[2021/12/26 19:57:29] root INFO: data_dir : train_data/table/pubtabnet/train/[2021/12/26 19:57:29] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl[2021/12/26 19:57:29] root INFO: name : PubTabDataSet[2021/12/26 19:57:29] root INFO: transforms :[2021/12/26 19:57:29] root INFO: DecodeImage :[2021/12/26 19:57:29] root INFO: channel_first : False[2021/12/26 19:57:29] root INFO: img_mode : BGR[2021/12/26 19:57:29] root INFO: ResizeTableImage :[2021/12/26 19:57:29] root INFO: max_len : 488[2021/12/26 19:57:29] root INFO: TableLabelEncode : None[2021/12/26 19:57:29] root INFO: NormalizeImage :[2021/12/26 19:57:29] root INFO: mean : [0.485, 0.456, 0.406][2021/12/26 19:57:29] root INFO: order : hwc[2021/12/26 19:57:29] root INFO: scale : 1./255.[2021/12/26 19:57:29] root INFO: std : [0.229, 0.224, 0.225][2021/12/26 19:57:29] root INFO: PaddingTableImage : None[2021/12/26 19:57:29] root INFO: ToCHWImage : None[2021/12/26 19:57:29] root INFO: KeepKeys :[2021/12/26 19:57:29] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask'][2021/12/26 19:57:29] root INFO: loader :[2021/12/26 19:57:29] root INFO: batch_size_per_card : 1[2021/12/26 19:57:29] root INFO: drop_last : True[2021/12/26 19:57:29] 
root INFO: num_workers : 1[2021/12/26 19:57:29] root INFO: shuffle : True[2021/12/26 19:57:29] root INFO: profiler_options : None[2021/12/26 19:57:29] root INFO: train with paddle 2.2.1 and device CPUPlace[2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl[2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl[2021/12/26 19:57:29] root INFO: train from scratch[2021/12/26 19:57:29] root INFO: train dataloader has 9115 iters[2021/12/26 19:57:29] root INFO: valid dataloader has 9115 iters[2021/12/26 19:57:29] root INFO: During the training process, after the 0th iteration, an evaluation is run every 400 iterations[2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl[2021/12/26 19:57:47] root INFO: epoch: [1/400], iter: 1, lr: 0.001000, loss: 358.711182, structure_loss: 277.904785, loc_loss: 80.806374, acc: 0.000000, reader_cost: 0.05254 s, batch_cost: 17.39120 s, samples: 2, ips: 0.11500[2021/12/26 19:57:55] root INFO: epoch: [1/400], iter: 2, lr: 0.001000, loss: 353.381165, structure_loss: 208.200623, loc_loss: 137.825607, acc: 0.000000, reader_cost: 0.00041 s, batch_cost: 8.65134 s, samples: 1, ips: 0.11559^Cmain proc 431 exit, kill process group 417main proc 417 exit, kill process group 417
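The training command overrides values from the YAML file with `-o key.subkey=value` arguments, and the log above shows the overridden values. The sketch below illustrates how such dotted overrides can be merged into a nested dictionary. `apply_overrides` and `parse_value` are hypothetical helpers written for this illustration and are not PaddleOCR's actual option parser.

```python
def parse_value(raw):
    """Convert a command-line string to bool, int or float when possible."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw

def apply_overrides(config, overrides):
    """Merge 'A.b.c=value' strings into a nested dict, creating levels as needed."""
    for item in overrides:
        dotted_key, raw = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config

# A small stand-in for the loaded YAML file
config = {
    "Global": {"use_gpu": True, "print_batch_step": 5},
    "Train": {"loader": {"batch_size_per_card": 32}},
}
apply_overrides(config, [
    "Global.use_gpu=False",
    "Global.print_batch_step=1",
    "Train.loader.batch_size_per_card=1",
])
print(config)
```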
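3.5 Model evaluation
During training, two models are saved by default: the most recently trained model, named latest, and the most accurate model, named best_accuracy. Next, the saved model parameters are used to evaluate accuracy on the test set:
The accuracy evaluation code for the table structure model is located in PaddleOCR/ppocr/metrics/table_metric.py. Calling tools/eval.py evaluates the accuracy of a trained model.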
!python tools/eval.py -c configs/table/table_mv3.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy Global.use_gpu=False Eval.loader.batch_size_per_card=1
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative usesimport imp[2021/12/26 20:00:08] root INFO: Architecture :[2021/12/26 20:00:08] root INFO: Backbone :[2021/12/26 20:00:08] root INFO: model_name : large[2021/12/26 20:00:08] root INFO: name : MobileNetV3[2021/12/26 20:00:08] root INFO: scale : 1.0[2021/12/26 20:00:08] root INFO: Head :[2021/12/26 20:00:08] root INFO: hidden_size : 256[2021/12/26 20:00:08] root INFO: l2_decay : 1e-05[2021/12/26 20:00:08] root INFO: loc_type : 2[2021/12/26 20:00:08] root INFO: max_cell_num : 500[2021/12/26 20:00:08] root INFO: max_elem_length : 800[2021/12/26 20:00:08] root INFO: max_text_length : 100[2021/12/26 20:00:08] root INFO: name : TableAttentionHead[2021/12/26 20:00:08] root INFO: algorithm : TableAttn[2021/12/26 20:00:08] root INFO: model_type : table[2021/12/26 20:00:08] root INFO: Eval :[2021/12/26 20:00:08] root INFO: dataset :[2021/12/26 20:00:08] root INFO: data_dir : train_data/table/pubtabnet/val/[2021/12/26 20:00:08] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl[2021/12/26 20:00:08] root INFO: name : PubTabDataSet[2021/12/26 20:00:08] root INFO: transforms :[2021/12/26 20:00:08] root INFO: DecodeImage :[2021/12/26 20:00:08] root INFO: channel_first : False[2021/12/26 20:00:08] root INFO: img_mode : BGR[2021/12/26 20:00:08] root INFO: ResizeTableImage :[2021/12/26 20:00:08] root INFO: max_len : 488[2021/12/26 20:00:08] root INFO: TableLabelEncode : None[2021/12/26 20:00:08] root INFO: NormalizeImage :[2021/12/26 20:00:08] root INFO: mean : [0.485, 0.456, 0.406][2021/12/26 20:00:08] root INFO: order : hwc[2021/12/26 20:00:08] root INFO: scale : 1./255.[2021/12/26 20:00:08] root INFO: std : [0.229, 0.224, 0.225][2021/12/26 20:00:08] root INFO: PaddingTableImage : None[2021/12/26 20:00:08] root INFO: ToCHWImage : 
None[2021/12/26 20:00:08] root INFO: KeepKeys :[2021/12/26 20:00:08] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask'][2021/12/26 20:00:08] root INFO: loader :[2021/12/26 20:00:08] root INFO: batch_size_per_card : 1[2021/12/26 20:00:08] root INFO: drop_last : False[2021/12/26 20:00:08] root INFO: num_workers : 1[2021/12/26 20:00:08] root INFO: shuffle : False[2021/12/26 20:00:08] root INFO: Global :[2021/12/26 20:00:08] root INFO: cal_metric_during_train : True[2021/12/26 20:00:08] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt[2021/12/26 20:00:08] root INFO: character_type : en[2021/12/26 20:00:08] root INFO: checkpoints : /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy[2021/12/26 20:00:08] root INFO: debug : False[2021/12/26 20:00:08] root INFO: distributed : False[2021/12/26 20:00:08] root INFO: epoch_num : 400[2021/12/26 20:00:08] root INFO: eval_batch_step : [0, 400][2021/12/26 20:00:08] root INFO: infer_img : doc/table/table.jpg[2021/12/26 20:00:08] root INFO: infer_mode : False[2021/12/26 20:00:08] root INFO: log_smooth_window : 20[2021/12/26 20:00:08] root INFO: max_cell_num : 500[2021/12/26 20:00:08] root INFO: max_elem_length : 800[2021/12/26 20:00:08] root INFO: max_text_length : 100[2021/12/26 20:00:08] root INFO: pretrained_model : None[2021/12/26 20:00:08] root INFO: print_batch_step : 5[2021/12/26 20:00:08] root INFO: process_cut_num : 0[2021/12/26 20:00:08] root INFO: process_total_num : 0[2021/12/26 20:00:08] root INFO: save_epoch_step : 3[2021/12/26 20:00:08] root INFO: save_inference_dir : None[2021/12/26 20:00:08] root INFO: save_model_dir : ./output/table_mv3/[2021/12/26 20:00:08] root INFO: use_gpu : False[2021/12/26 20:00:08] root INFO: use_visualdl : False[2021/12/26 20:00:08] root INFO: Loss :[2021/12/26 20:00:08] root INFO: loc_weight : 10000.0[2021/12/26 20:00:08] root INFO: name : TableAttentionLoss[2021/12/26 20:00:08] root 
INFO: structure_weight : 100.0[2021/12/26 20:00:08] root INFO: Metric :[2021/12/26 20:00:08] root INFO: main_indicator : acc[2021/12/26 20:00:08] root INFO: name : TableMetric[2021/12/26 20:00:08] root INFO: Optimizer :[2021/12/26 20:00:08] root INFO: beta1 : 0.9[2021/12/26 20:00:08] root INFO: beta2 : 0.999[2021/12/26 20:00:08] root INFO: clip_norm : 5.0[2021/12/26 20:00:08] root INFO: lr :[2021/12/26 20:00:08] root INFO: learning_rate : 0.001[2021/12/26 20:00:08] root INFO: name : Adam[2021/12/26 20:00:08] root INFO: regularizer :[2021/12/26 20:00:08] root INFO: factor : 0.0[2021/12/26 20:00:08] root INFO: name : L2[2021/12/26 20:00:08] root INFO: PostProcess :[2021/12/26 20:00:08] root INFO: name : TableLabelDecode[2021/12/26 20:00:08] root INFO: Train :[2021/12/26 20:00:08] root INFO: dataset :[2021/12/26 20:00:08] root INFO: data_dir : train_data/table/pubtabnet/train/[2021/12/26 20:00:08] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl[2021/12/26 20:00:08] root INFO: name : PubTabDataSet[2021/12/26 20:00:08] root INFO: transforms :[2021/12/26 20:00:08] root INFO: DecodeImage :[2021/12/26 20:00:08] root INFO: channel_first : False[2021/12/26 20:00:08] root INFO: img_mode : BGR[2021/12/26 20:00:08] root INFO: ResizeTableImage :[2021/12/26 20:00:08] root INFO: max_len : 488[2021/12/26 20:00:08] root INFO: TableLabelEncode : None[2021/12/26 20:00:08] root INFO: NormalizeImage :[2021/12/26 20:00:08] root INFO: mean : [0.485, 0.456, 0.406][2021/12/26 20:00:08] root INFO: order : hwc[2021/12/26 20:00:08] root INFO: scale : 1./255.[2021/12/26 20:00:08] root INFO: std : [0.229, 0.224, 0.225][2021/12/26 20:00:08] root INFO: PaddingTableImage : None[2021/12/26 20:00:08] root INFO: ToCHWImage : None[2021/12/26 20:00:08] root INFO: KeepKeys :[2021/12/26 20:00:08] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask'][2021/12/26 20:00:08] root INFO: loader :[2021/12/26 20:00:08] root INFO: 
batch_size_per_card : 32[2021/12/26 20:00:08] root INFO: drop_last : True[2021/12/26 20:00:08] root INFO: num_workers : 1[2021/12/26 20:00:08] root INFO: shuffle : True[2021/12/26 20:00:08] root INFO: profiler_options : None[2021/12/26 20:00:08] root INFO: train with paddle 2.2.1 and device CPUPlace[2021/12/26 20:00:08] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl[2021/12/26 20:00:08] root INFO: resume from /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy[2021/12/26 20:00:08] root INFO: metric in ckpt ***************[2021/12/26 20:00:08] root INFO: acc:0.7380142622051563[2021/12/26 20:00:08] root INFO: fps:8.360272547972942[2021/12/26 20:00:08] root INFO: best_epoch:7[2021/12/26 20:00:08] root INFO: start_epoch:8eval model:: 0%| | 2/9115 [00:07<8:55:26, 3.53s/it]^Cmain proc 602 exit, kill process group 576main proc 576 exit, kill process group 576
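The `acc` value in the log is a per-table accuracy. A minimal sketch of such a metric follows, assuming that a table counts as correct only when its whole predicted structure sequence equals the label; this is an assumption about table_metric.py, not a copy of it, and the index sequences are made up.

```python
def structure_accuracy(predictions, labels):
    """Fraction of samples whose predicted token sequence matches the label exactly."""
    if not labels:
        return 0.0
    correct = sum(1 for pred, label in zip(predictions, labels) if pred == label)
    return correct / len(labels)

# Toy index sequences: the second prediction differs in its last token
toy_predictions = [[1, 2, 3], [1, 2, 4], [5]]
toy_labels = [[1, 2, 3], [1, 2, 3], [5]]
acc = structure_accuracy(toy_predictions, toy_labels)
print(round(acc, 4))
```

Under this definition a single wrong token makes the whole table count as incorrect, so the metric is strict for long structure sequences.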
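3.6 Model prediction
Once a model has been trained, the saved model can also be used to run inference on a single image or on a folder of images, to inspect the prediction results.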
! python tools/infer_table.py -c configs/table/table_mv3.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy Global.infer_img=/home/aistudio/1.jpg Global.use_gpu=False
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021/12/26 20:00:22] root INFO: Architecture :
[2021/12/26 20:00:22] root INFO:     Backbone :
[2021/12/26 20:00:22] root INFO:         model_name : large
[2021/12/26 20:00:22] root INFO:         name : MobileNetV3
[2021/12/26 20:00:22] root INFO:         scale : 1.0
[2021/12/26 20:00:22] root INFO:     Head :
[2021/12/26 20:00:22] root INFO:         hidden_size : 256
[2021/12/26 20:00:22] root INFO:         l2_decay : 1e-05
[2021/12/26 20:00:22] root INFO:         loc_type : 2
[2021/12/26 20:00:22] root INFO:         max_cell_num : 500
[2021/12/26 20:00:22] root INFO:         max_elem_length : 800
[2021/12/26 20:00:22] root INFO:         max_text_length : 100
[2021/12/26 20:00:22] root INFO:         name : TableAttentionHead
[2021/12/26 20:00:22] root INFO:     algorithm : TableAttn
[2021/12/26 20:00:22] root INFO:     model_type : table
[2021/12/26 20:00:22] root INFO: Eval :
[2021/12/26 20:00:22] root INFO:     dataset :
[2021/12/26 20:00:22] root INFO:         data_dir : train_data/table/pubtabnet/val/
[2021/12/26 20:00:22] root INFO:         label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl
[2021/12/26 20:00:22] root INFO:         name : PubTabDataSet
[2021/12/26 20:00:22] root INFO:         transforms :
[2021/12/26 20:00:22] root INFO:             DecodeImage :
[2021/12/26 20:00:22] root INFO:                 channel_first : False
[2021/12/26 20:00:22] root INFO:                 img_mode : BGR
[2021/12/26 20:00:22] root INFO:             ResizeTableImage :
[2021/12/26 20:00:22] root INFO:                 max_len : 488
[2021/12/26 20:00:22] root INFO:             TableLabelEncode : None
[2021/12/26 20:00:22] root INFO:             NormalizeImage :
[2021/12/26 20:00:22] root INFO:                 mean : [0.485, 0.456, 0.406]
[2021/12/26 20:00:22] root INFO:                 order : hwc
[2021/12/26 20:00:22] root INFO:                 scale : 1./255.
[2021/12/26 20:00:22] root INFO:                 std : [0.229, 0.224, 0.225]
[2021/12/26 20:00:22] root INFO:             PaddingTableImage : None
[2021/12/26 20:00:22] root INFO:             ToCHWImage : None
[2021/12/26 20:00:22] root INFO:             KeepKeys :
[2021/12/26 20:00:22] root INFO:                 keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
[2021/12/26 20:00:22] root INFO:     loader :
[2021/12/26 20:00:22] root INFO:         batch_size_per_card : 16
[2021/12/26 20:00:22] root INFO:         drop_last : False
[2021/12/26 20:00:22] root INFO:         num_workers : 1
[2021/12/26 20:00:22] root INFO:         shuffle : False
[2021/12/26 20:00:22] root INFO: Global :
[2021/12/26 20:00:22] root INFO:     cal_metric_during_train : True
[2021/12/26 20:00:22] root INFO:     character_dict_path : ppocr/utils/dict/table_structure_dict.txt
[2021/12/26 20:00:22] root INFO:     character_type : en
[2021/12/26 20:00:22] root INFO:     checkpoints : /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy
[2021/12/26 20:00:22] root INFO:     debug : False
[2021/12/26 20:00:22] root INFO:     distributed : False
[2021/12/26 20:00:22] root INFO:     epoch_num : 400
[2021/12/26 20:00:22] root INFO:     eval_batch_step : [0, 400]
[2021/12/26 20:00:22] root INFO:     infer_img : /home/aistudio/1.jpg
[2021/12/26 20:00:22] root INFO:     infer_mode : False
[2021/12/26 20:00:22] root INFO:     log_smooth_window : 20
[2021/12/26 20:00:22] root INFO:     max_cell_num : 500
[2021/12/26 20:00:22] root INFO:     max_elem_length : 800
[2021/12/26 20:00:22] root INFO:     max_text_length : 100
[2021/12/26 20:00:22] root INFO:     pretrained_model : None
[2021/12/26 20:00:22] root INFO:     print_batch_step : 5
[2021/12/26 20:00:22] root INFO:     process_cut_num : 0
[2021/12/26 20:00:22] root INFO:     process_total_num : 0
[2021/12/26 20:00:22] root INFO:     save_epoch_step : 3
[2021/12/26 20:00:22] root INFO:     save_inference_dir : None
[2021/12/26 20:00:22] root INFO:     save_model_dir : ./output/table_mv3/
[2021/12/26 20:00:22] root INFO:     use_gpu : False
[2021/12/26 20:00:22] root INFO:     use_visualdl : False
[2021/12/26 20:00:22] root INFO: Loss :
[2021/12/26 20:00:22] root INFO:     loc_weight : 10000.0
[2021/12/26 20:00:22] root INFO:     name : TableAttentionLoss
[2021/12/26 20:00:22] root INFO:     structure_weight : 100.0
[2021/12/26 20:00:22] root INFO: Metric :
[2021/12/26 20:00:22] root INFO:     main_indicator : acc
[2021/12/26 20:00:22] root INFO:     name : TableMetric
[2021/12/26 20:00:22] root INFO: Optimizer :
[2021/12/26 20:00:22] root INFO:     beta1 : 0.9
[2021/12/26 20:00:22] root INFO:     beta2 : 0.999
[2021/12/26 20:00:22] root INFO:     clip_norm : 5.0
[2021/12/26 20:00:22] root INFO:     lr :
[2021/12/26 20:00:22] root INFO:         learning_rate : 0.001
[2021/12/26 20:00:22] root INFO:     name : Adam
[2021/12/26 20:00:22] root INFO:     regularizer :
[2021/12/26 20:00:22] root INFO:         factor : 0.0
[2021/12/26 20:00:22] root INFO:         name : L2
[2021/12/26 20:00:22] root INFO: PostProcess :
[2021/12/26 20:00:22] root INFO:     name : TableLabelDecode
[2021/12/26 20:00:22] root INFO: Train :
[2021/12/26 20:00:22] root INFO:     dataset :
[2021/12/26 20:00:22] root INFO:         data_dir : train_data/table/pubtabnet/train/
[2021/12/26 20:00:22] root INFO:         label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl
[2021/12/26 20:00:22] root INFO:         name : PubTabDataSet
[2021/12/26 20:00:22] root INFO:         transforms :
[2021/12/26 20:00:22] root INFO:             DecodeImage :
[2021/12/26 20:00:22] root INFO:                 channel_first : False
[2021/12/26 20:00:22] root INFO:                 img_mode : BGR
[2021/12/26 20:00:22] root INFO:             ResizeTableImage :
[2021/12/26 20:00:22] root INFO:                 max_len : 488
[2021/12/26 20:00:22] root INFO:             TableLabelEncode : None
[2021/12/26 20:00:22] root INFO:             NormalizeImage :
[2021/12/26 20:00:22] root INFO:                 mean : [0.485, 0.456, 0.406]
[2021/12/26 20:00:22] root INFO:                 order : hwc
[2021/12/26 20:00:22] root INFO:                 scale : 1./255.
[2021/12/26 20:00:22] root INFO:                 std : [0.229, 0.224, 0.225]
[2021/12/26 20:00:22] root INFO:             PaddingTableImage : None
[2021/12/26 20:00:22] root INFO:             ToCHWImage : None
[2021/12/26 20:00:22] root INFO:             KeepKeys :
[2021/12/26 20:00:22] root INFO:                 keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
[2021/12/26 20:00:22] root INFO:     loader :
[2021/12/26 20:00:22] root INFO:         batch_size_per_card : 32
[2021/12/26 20:00:22] root INFO:         drop_last : True
[2021/12/26 20:00:22] root INFO:         num_workers : 1
[2021/12/26 20:00:22] root INFO:         shuffle : True
[2021/12/26 20:00:22] root INFO: profiler_options : None
[2021/12/26 20:00:22] root INFO: train with paddle 2.2.1 and device CPUPlace
[2021/12/26 20:00:22] root INFO: resume from /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy
[2021/12/26 20:00:22] root INFO: infer_img: /home/aistudio/1.jpg
[2021/12/26 20:00:26] root INFO: result: ['<thead><tr><td></td><td></td><td></td><td></td><td></td></tr></thead><tbody><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr></tbody>'], [[32, 9, 104, 40], [232, 8, 307, 41], [429, 7, 500, 44], [559, 8, 656, 44], [715, 7, 780, 44], [37, 45, 99, 73], [190, 44, 342, 74], [432, 45, 502, 74], [565, 44, 655, 73], [712, 46, 777, 74], [30, 81, 101, 109], [202, 80, 337, 110], [433, 81, 503, 111], [578, 83, 638, 110], [698, 82, 790, 110], [31, 119, 104, 148], [197, 116, 347, 147], [443, 117, 492, 148], [572, 118, 643, 147], [698, 118, 797, 147], [35, 154, 101, 183], [199, 152, 342, 184], [436, 154, 501, 184], [558, 155, 670, 184], [701, 153, 801, 183], [40, 188, 93, 217], [217, 187, 314, 219], [417, 187, 516, 218], [556, 187, 667, 217], [716, 188, 772, 216], [48, 227, 98, 255], [223, 224, 313, 256], [429, 226, 500, 256], [558, 226, 667, 256], [722, 225, 772, 254], [47, 262, 99, 291], [217, 260, 313, 293], [439, 261, 506, 293], [557, 260, 678, 292], [722, 261, 777, 290], [36, 295, 95, 324], [210, 296, 317, 326], [443, 296, 499, 326], [547, 296, 681, 326], [701, 300, 767, 328], [42, 332, 99, 361], [191, 330, 350, 360], [451, 331, 493, 361], [557, 331, 683, 361], [717, 335, 785, 362], [45, 369, 97, 396], [186, 367, 355, 400], [444, 369, 504, 398], [581, 369, 640, 397], [723, 368, 773, 396], [37, 404, 95, 431], [192, 404, 351, 433], [438, 404, 511, 432], [560, 405, 658, 432], [723, 404, 775, 431], [46, 444, 104, 469], [188, 441, 346, 470], [444, 444, 496, 471], [544, 445, 681, 471], [721, 444, 773, 470], [35, 478, 104, 503], [190, 475, 345, 505], [436, 477, 504, 504], [559, 476, 662, 505], [712, 477, 778, 504]]
[2021/12/26 20:00:26] root INFO: success!
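The logged result has two parts: a string of HTML structure tokens and a flat list of cell boxes. A minimal sketch of how the two could be paired is given here, under two assumptions that the log suggests but does not state: each `<td>` token corresponds to exactly one box, in reading order, and each box is `[x_min, y_min, x_max, y_max]`. The helper name `group_boxes_by_row` is hypothetical and is not part of PaddleOCR; the toy input borrows a few box values from the log for illustration only.

```python
import re


def group_boxes_by_row(structure_html, boxes):
    """Split a flat list of cell boxes into rows, following the <tr>/<td> tokens.

    Assumption: one box per <td> token, in the same order as the tokens.
    """
    # Each <tr>...</tr> span is one table row.
    rows = re.findall(r"<tr>(.*?)</tr>", structure_html, flags=re.S)
    # Count opening cell tokens; "<td>" and "<td colspan=...>" both match.
    cells_per_row = [len(re.findall(r"<td[\s>]", row)) for row in rows]
    if sum(cells_per_row) != len(boxes):
        raise ValueError(
            "cell count %d does not match box count %d"
            % (sum(cells_per_row), len(boxes))
        )
    grouped, start = [], 0
    for n in cells_per_row:
        grouped.append(boxes[start:start + n])
        start += n
    return grouped


# Toy input in the same format as the logged result (2 rows x 2 cells).
structure = (
    "<thead><tr><td></td><td></td></tr></thead>"
    "<tbody><tr><td></td><td></td></tr></tbody>"
)
boxes = [[32, 9, 104, 40], [232, 8, 307, 41], [37, 45, 99, 73], [190, 44, 342, 74]]

rows = group_boxes_by_row(structure, boxes)
for index, row in enumerate(rows):
    print(index, row)
```

Grouping by row makes it easy to check that the predicted structure and the predicted boxes agree before filling the cells with recognized text; a mismatch in counts raises an error instead of silently misaligning cells.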
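4. Summary
This section introduced the principles of the PP-Structure table recognition algorithm in PaddleOCR and walked through the table structure model workflow, from data processing to completed training.
5. Homework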
https://aistudio.baidu.com/aistudio/education/objective/28711




