OCR Table Recognition in Practice

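This section shows how to train and run table recognition with PaddleOCR. It covers:

  1. The principles behind the table recognition algorithm
  2. The training and prediction workflow of the PaddleOCR table recognition code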

1. Quick Start

To try PP-Structure prediction quickly, first download the PaddleOCR code and install its dependencies:

    # Clone the PaddleOCR repository (uncomment if it is not already present)
    # ! git clone https://github.com/PaddlePaddle/PaddleOCR
    # Install the dependencies
    ! pip install -U pip
    ! pip install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
    ! pip install -r PaddleOCR/requirements.txt
    ! pip install pandas
The install log ends with a dependency-conflict report against packages preinstalled in this environment, followed by the list of newly installed packages:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    paddlefsl 1.0.0 requires numpy~=1.19.2, but you have numpy 1.21.5 which is incompatible.
    paddlefsl 1.0.0 requires pillow==8.2.0, but you have pillow 7.1.2 which is incompatible.
    paddlefsl 1.0.0 requires requests~=2.24.0, but you have requests 2.22.0 which is incompatible.
    blackhole 1.0.1 requires numpy<=1.19.5, but you have numpy 1.21.5 which is incompatible.
    Successfully installed PyWavelets-1.2.0 cssselect-1.1.0 cssutils-2.3.0 fasttext-0.9.1 imgaug-0.4.0 lmdb-1.2.1 lxml-4.7.1 numpy-1.21.5 opencv-contrib-python-4.4.0.46 premailer-3.10.0 pybind11-2.8.1 pyclipper-1.3.0.post2 python-Levenshtein-0.12.2 scikit-image-0.19.1 scipy-1.7.3 shapely-1.8.0 tifffile-2021.11.2

Once installation finishes, the following commands run table recognition:

    # Switch to the working directory
    import os
    os.chdir('/home/aistudio/PaddleOCR/ppstructure')

    # Create the model directory (-p avoids an error if it already exists)
    ! mkdir -p inference
    # Download and unpack the PP-OCRv2 Chinese text detection model
    ! wget -P ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar && cd inference && tar xf ch_PP-OCRv2_det_infer.tar && cd ..
    # Download and unpack the PP-OCRv2 Chinese text recognition model
    ! wget -P ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar && cd inference && tar xf ch_PP-OCRv2_rec_infer.tar && cd ..
    # Download and unpack the lightweight English table structure model
    ! wget -P ./inference/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar && cd inference && tar xf en_ppocr_mobile_v2.0_table_structure_infer.tar && cd ..
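    # Start with the input image
    import cv2
    from matplotlib import pyplot as plt
    %matplotlib inline
    # Read the table image and display it (OpenCV loads BGR; matplotlib expects RGB)
    img = cv2.imread('/home/aistudio/1.jpg')
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))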

image.png

    # https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppstructure/table/predict_table.py#L55
    from table.predict_table import TableSystem, to_excel
    from utility import init_args
    # Initialize the arguments
    args = init_args().parse_args(args=[])
    args.det_model_dir = 'inference/ch_PP-OCRv2_det_infer'
    args.rec_model_dir = 'inference/ch_PP-OCRv2_rec_infer'
    args.table_model_dir = 'inference/en_ppocr_mobile_v2.0_table_structure_infer'
    args.image_dir = '/home/aistudio/1.jpg'
    args.rec_char_dict_path = '../ppocr/utils/ppocr_keys_v1.txt'
    args.table_char_dict_path = '../ppocr/utils/dict/table_structure_dict.txt'
    args.det_limit_side_len = 736
    args.det_limit_type = 'min'
    args.output = '../output/table'
    args.use_gpu = False
    # Initialize the table recognition system
    table_sys = TableSystem(args)
    img = cv2.imread('/home/aistudio/1.jpg')
    # Run table recognition
    pred_html = table_sys(img)
    # Save the result to an Excel file
    to_excel(pred_html, '1.xlsx')
    print(pred_html)
    [2021/12/26 19:55:37] root DEBUG: dt_boxes num : 69, elapse : 2.900609254837036
    [2021/12/26 19:55:43] root DEBUG: rec_res num : 69, elapse : 5.992196321487427
    <html><body><table><thead><tr><td>代号</td><td>项目</td><td>结果</td><td>参考值</td><td>单位</td></tr></thead><tbody><tr><td>ALT</td><td>谷丙转氨酶</td><td>25.6</td><td>0--40</td><td>U/L</td></tr><tr><td>TBIL</td><td>总胆红素</td><td>11.2</td><td><20</td><td>UMOL/L</td></tr><tr><td>DBIL</td><td>直接胆红素</td><td>3.3</td><td>0--7</td><td>UMOL/L</td></tr><tr><td>IBIL</td><td>间接胆红素</td><td>7.9</td><td>1.5--15</td><td>UMOL/L</td></tr><tr><td>TP</td><td>总蛋白</td><td>58.9J</td><td>60--80</td><td>g/L</td></tr><tr><td>ALB</td><td>白蛋白</td><td>35.1</td><td>33--55</td><td>g/L</td></tr><tr><td>GLO</td><td>球蛋白</td><td>23.8</td><td>20--30</td><td>8/L</td></tr><tr><td>A/G</td><td>白球比</td><td>1.5</td><td>1.5--2.5</td><td></td></tr><tr><td>ALP</td><td>碱性磷酸酶</td><td>93</td><td>15--112</td><td>HUSL</td></tr><tr><td>GGT</td><td>谷氨酰转肽酶</td><td>14.3</td><td><50</td><td>U/L</td></tr><tr><td>AST</td><td>谷草转氨酶</td><td>16.3</td><td>8--40</td><td>W/L</td></tr><tr><td>LDH</td><td>乳酸脱氢酶</td><td>167</td><td>114--240</td><td>U/L</td></tr><tr><td>ADA</td><td>腺甘脱氨酶</td><td>12.6</td><td>4--24</td><td>U/L</td></tr></table></body></html>
    # Read the Excel file and print it
    import pandas as pd
    df = pd.read_excel('1.xlsx').fillna('')
    print(df)
        代号  项目      结果     参考值       单位
    0   ALT   谷丙转氨酶    25.6   0--40     U/L
    1   TBIL  总胆红素      11.2   <20       UMOL/L
    2   DBIL  直接胆红素    3.3    0--7      UMOL/L
    3   IBIL  间接胆红素    7.9    1.5--15   UMOL/L
    4   TP    总蛋白        58.9J  60--80    g/L
    5   ALB   白蛋白        35.1   33--55    g/L
    6   GLO   球蛋白        23.8   20--30    8/L
    7   A/G   白球比        1.5    1.5--2.5
    8   ALP   碱性磷酸酶    93     15--112   HUSL
    9   GGT   谷氨酰转肽酶  14.3   <50       U/L
    10  AST   谷草转氨酶    16.3   8--40     W/L
    11  LDH   乳酸脱氢酶    167    114--240  U/L
    12  ADA   腺甘脱氨酶    12.6   4--24     U/L

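2. How Prediction Works

2.1 Pipeline Overview

The PP-Structure table recognition algorithm is an end-to-end method.

It consists of three models:

  1. Text detection model: detects the text in the table
  2. Text recognition model: recognizes the detected text
  3. Table cell and table structure prediction model: predicts the HTML that describes the table structure, together with the cell coordinates

The figure below shows how the three models are chained together: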

Figure 1: Table recognition pipeline

The procedure is as follows:

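  1. Run the text detection model to detect the text in the table.
  2. Run the text recognition model on the detected text. At this point we have the text boxes and their text content.
  3. Run the table cell and table structure prediction model to predict the cell coordinates and the HTML that describes the table structure.
  4. Aggregate the text boxes from step 2 with the cell coordinates from step 3. As the figure below shows, the IoU between the red text detection boxes and the blue cell boxes decides whether a text box is merged into a cell.
  5. After aggregation, sort the text boxes in each cell from top to bottom and from left to right. The indices of the sorted boxes give the corresponding text, and concatenating those strings yields the final content of the cell.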

Figure 2: Aggregating text boxes with cell coordinates
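
Steps 4 and 5 can be sketched in plain Python. This is a minimal illustration, not PaddleOCR's implementation: the function names, the (x0, y0, x1, y1) box format, the rule of assigning each text box to the cell with the highest IoU, and the `min_iou` threshold are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fill_cells(cell_boxes, text_boxes, texts, min_iou=0.0):
    """Assign each text box to its best-overlapping cell, then join the text per cell."""
    if not cell_boxes:
        return []
    matched = {i: [] for i in range(len(cell_boxes))}
    for box, text in zip(text_boxes, texts):
        scores = [iou(box, cell) for cell in cell_boxes]
        best = max(range(len(cell_boxes)), key=scores.__getitem__)
        if scores[best] > min_iou:  # hypothetical threshold: drop boxes with no overlap
            matched[best].append((box, text))
    contents = []
    for i in range(len(cell_boxes)):
        # Sort top-to-bottom, then left-to-right, using each box's top-left corner.
        ordered = sorted(matched[i], key=lambda item: (item[0][1], item[0][0]))
        contents.append("".join(text for _, text in ordered))
    return contents


# Two cells side by side; the second cell contains two lines of text.
cells = [(0, 0, 100, 40), (100, 0, 200, 40)]
text_boxes = [(105, 22, 190, 38), (5, 5, 90, 35), (105, 3, 190, 19)]
texts = ["world", "ALT", "hello "]
print(fill_cells(cells, text_boxes, texts))
```

Sorting by the top-left corner is the simplest possible ordering; boxes on the same line whose tops differ by a pixel or two would need a tolerance on the vertical coordinate.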

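2.2 The Table Structure Prediction Model

Table recognition needs three models: text detection, text recognition, and table structure recognition. The detection and recognition models were covered in earlier lessons, so this section concentrates on the table structure prediction model.

The table structure prediction model predicts the table structure and detects the cell coordinates. It is derived from the RARE algorithm, with changes in the following areas.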

2.2.1 Input Data

For a text recognition model, the dataset labels each character independently. In the table structure prediction model, however, the classes to predict are not single characters. The dictionaries of RARE and of the table structure prediction model compare as follows:

Model | Dictionary
RARE | '<', 's', 'u', 'p', '>', '<', '/', 's', 'u', 'b', '>', '<', 'b', '>', '<', '/', 'b', '>', '<', 'i', '>', '<', '/', 'i', '>'
Table structure prediction model | 'sos', '<thead>', '<tr>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '</tbody>', '<td', ' colspan="5"', '>', ' colspan="2"', ' colspan="3"', ' rowspan="2"', ' colspan="4"', ' colspan="6"', ' rowspan="3"', ' colspan="9"', ' colspan="10"', ' colspan="7"', ' rowspan="4"', ' rowspan="5"', ' rowspan="9"', ' colspan="8"', ' rowspan="8"', ' rowspan="6"', ' rowspan="7"', ' rowspan="10"', 'eos'

In the table structure prediction model, a string such as <thead> is treated as a single character for recognition.
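
To make the difference concrete, the sketch below splits a structure string into tokens of the kind listed in the table dictionary instead of into single characters. The regular expression and the function name `tokenize_structure` are illustrative assumptions, not PaddleOCR code; only the token shapes ('<td', ' colspan="2"', '>') come from the dictionary above.

```python
import re

# One alternative per token shape that appears in the table structure dictionary:
#   '<td' (start of a cell that carries attributes), ' colspan="N"' / ' rowspan="N"',
#   complete tags such as '<tr>' or '</td>', and a lone '>' that closes '<td ...'.
TOKEN_PATTERN = re.compile(r'<td(?=\s)| (?:col|row)span="\d+"|</?\w+>|>')


def tokenize_structure(html):
    """Split a table structure string into dictionary-style tokens."""
    return TOKEN_PATTERN.findall(html)


structure = '<tr><td colspan="2"></td><td></td></tr>'
tokens = tokenize_structure(structure)
print(tokens)
print(len(structure), "characters vs.", len(tokens), "tokens")
```

Predicting whole tags shortens the output sequence considerably compared with character-level decoding, which is the point of the custom dictionary.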

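2.2.2 Model

The figure below compares the table structure recognition model with RARE.
Figure 3: Table structure recognition model
The RARE model consists of TPS + CNN + RNN + AttentionHead. Each part plays the following role:

  1. TPS: rectifies curved text so that the image is restored to horizontal
  2. CNN: extracts features from the image
  3. RNN: further enhances the extracted features and captures semantic features
  4. AttentionHead: produces the output

In the table structure recognition model the input is the whole image, so the TPS module is removed. Experiments also showed that the RNN has little effect on the results, so the RNN module is removed as well. The resulting structure is CNN + AttentionHead.
To output cell coordinates, cell detection was first attempted inside the detection model: schemes 2 and 3 below were tried on top of the DB model.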

Scheme | Result
1. Single-line text detection | image.png
2. Text and cells detected by one model | image.png
3. Text and cells detected by two models | image.png

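As the results show, detecting both text and cells in a segmentation model makes the ground truth (GT) ambiguous: should the background between the lines inside a cell be labeled as text or as background?
Of the three models in the table recognition pipeline, only the text detection model and the table structure recognition model see the whole image. For this reason, a regression-based branch is added to the AttentionHead of the table structure recognition model to predict the cell coordinates (x0, y0, x1, y1).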
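
The idea of a second, regression-based branch can be sketched as follows. This is a toy illustration in plain Python, not the TableAttentionHead implementation: the hidden size, the random weights, the four-token vocabulary, and the use of a sigmoid to keep coordinates in (0, 1) relative to the image size are all assumptions made for the example.

```python
import math
import random

random.seed(0)
HIDDEN, VOCAB = 8, 4  # toy sizes, chosen only for the example


def linear(x, weight, bias):
    """y = W x + b for a single vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Two output branches that share the same per-step hidden state.
W_struct, b_struct = rand_matrix(VOCAB, HIDDEN), [0.0] * VOCAB  # classification branch
W_loc, b_loc = rand_matrix(4, HIDDEN), [0.0] * 4                # regression branch


def head_step(hidden):
    """Return (token probabilities, normalized cell box) for one decoding step."""
    logits = linear(hidden, W_struct, b_struct)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sigmoid squashes each coordinate into (0, 1); scale by image size to get pixels.
    box = [1.0 / (1.0 + math.exp(-v)) for v in linear(hidden, W_loc, b_loc)]
    return probs, box


hidden_state = [random.uniform(-1, 1) for _ in range(HIDDEN)]
probs, box = head_step(hidden_state)
print("token probabilities:", [round(p, 3) for p in probs])
print("normalized (x0, y0, x1, y1):", [round(c, 3) for c in box])
```

Because both branches read the same hidden state, every decoded structure token comes with a box prediction; in practice only the boxes emitted alongside cell tokens would be meaningful.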

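2.3 Forward Pass of the Table Structure Prediction Model

The forward-pass analysis follows an input image from preprocessing to the network output and tracks how the output shape changes in each module, which gives a better understanding of the table cell and table structure prediction model. The modules involved are:

Type | Module name
Data processing | ResizeTableImage
Data processing | PaddingTableImage
Backbone | MobileNetV3
Head | TableAttentionHead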

2.3.1 Input Data Processing

The input image of this example and the outputs of the data processing modules are visualized below:

    # Switch to the PaddleOCR directory
    os.chdir('/home/aistudio/PaddleOCR')
    from ppocr.data import create_operators, transform
    plt.figure(figsize=(24, 8))
    # Read the input image
    img = cv2.imread('/home/aistudio/1.jpg')
    # Show the input image (channels reversed because OpenCV loads BGR)
    plt.subplot(1, 3, 1)
    plt.title('src, shape:{}'.format(img.shape))
    plt.imshow(img[:, :, ::-1])
    # Run ResizeTableImage
    # https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/gen_table_mask.py#L182
    # Scales the long side to max_len; the short side is scaled proportionally
    pre_process_list = [{'ResizeTableImage': {'max_len': args.table_max_len}}]
    preprocess_op = create_operators(pre_process_list)
    data = {'image': img}
    data = transform(data, preprocess_op)
    # Show the image after ResizeTableImage
    plt.subplot(1, 3, 2)
    plt.title('ResizeTableImage, shape:{}'.format(data['image'].shape))
    plt.imshow(data['image'][:, :, ::-1])
    # Run PaddingTableImage
    # https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/data/imaug/gen_table_mask.py#L232
    pre_process_list = [{'PaddingTableImage': None}]
    preprocess_op = create_operators(pre_process_list)
    data = transform(data, preprocess_op)
    # Show the image after PaddingTableImage
    plt.subplot(1, 3, 3)
    plt.title('PaddingTableImage, shape:{}'.format(data['image'].shape))
    plt.imshow(data['image'][:, :, ::-1] / 255)
    plt.show()
    # Define the complete list of preprocessing ops
    pre_process_list = [
        {'ResizeTableImage': {'max_len': args.table_max_len}},
        {'NormalizeImage': {'scale': 1./255., 'mean': [0.485, 0.456, 0.406], 'std': [0.229, 0.224, 0.225], 'order': 'hwc'}},
        {'PaddingTableImage': None},
        {'ToCHWImage': None}
    ]
    # Create the ops
    preprocess_op = create_operators(pre_process_list)
    # Apply the ops
    data = {'image': img}
    data = transform(data, preprocess_op)

[Figure: the source image, the image after ResizeTableImage, and the image after PaddingTableImage]

    # Download the pretrained model
    ! wget -P ./pre_train/ https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar && cd pre_train && tar xf en_ppocr_mobile_v2.0_table_structure_train.tar && cd ..

    import paddle

    # Load the pretrained parameters and split them into backbone and head parameters
    pretrain_params = paddle.load('/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy.pdparams')

    def filter_params(pretrain_params, prefix):
        # Keep the entries whose key starts with `prefix` and strip "<prefix>." from the key
        new_dict = {}
        for k, v in pretrain_params.items():
            if k.startswith(prefix):
                new_dict[k.replace(prefix + '.', '')] = v
        return new_dict

    # Extract the parameters
    backbone_dict = filter_params(pretrain_params, 'backbone')
    head_dict = filter_params(pretrain_params, 'head')
  1. --2021-12-26 19:56:07-- https://paddleocr.bj.bcebos.com/dygraph_v2.1/table/en_ppocr_mobile_v2.0_table_structure_train.tar
  2. Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
  3. Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.
  4. HTTP request sent, awaiting response... 200 OK
  5. Length: 76103680 (73M) [application/x-tar]
  6. Saving to: ‘./pre_train/en_ppocr_mobile_v2.0_table_structure_train.tar.3
  7. en_ppocr_mobile_v2. 100%[===================>] 72.58M 36.6MB/s in 2.0s
  8. 2021-12-26 19:56:09 (36.6 MB/s) - ‘./pre_train/en_ppocr_mobile_v2.0_table_structure_train.tar.3 saved [76103680/76103680]

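2.3.2 Backbone

The backbone is the same as the one used for text detection: it outputs four feature maps at 1/4, 1/8, 1/16, and 1/32 of the input image size. The backbone was covered in the text detection chapter, so it is not described again here.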

    # https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/modeling/backbones/det_mobilenet_v3.py
    from ppocr.modeling.backbones import build_backbone

    # Initialize the backbone
    backbone = build_backbone(dict(name='MobileNetV3', scale=1.0, model_name='large'), model_type='table')
    backbone.eval()
    # Load the backbone parameters
    backbone.set_state_dict(backbone_dict)

    import numpy as np

    # Add a batch dimension and run the backbone
    x = np.expand_dims(data['image'], axis=0)
    x = paddle.to_tensor(x)
    backbone_out = backbone(x)
    for item in backbone_out:
        print(item.shape)
  1. [1, 24, 122, 122]
  2. [1, 40, 61, 61]
  3. [1, 112, 31, 31]
  4. [1, 960, 16, 16]
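
The spatial sizes printed above can be checked with a few lines of arithmetic. The sketch below assumes that each of the five downsampling stages halves the size and rounds up, which is consistent with the shapes printed for a 488 x 488 input; `halve` and `feature_map_sizes` are hypothetical helper names, not PaddleOCR functions.

```python
def halve(n):
    # Assumption: a stride-2 stage maps a side length n to ceil(n / 2)
    return (n + 1) // 2

def feature_map_sizes(input_size):
    """Side lengths of the 1/4, 1/8, 1/16 and 1/32 feature maps."""
    sizes = []
    n = input_size
    for stage in range(1, 6):   # five stride-2 stages: 1/2, 1/4, ..., 1/32
        n = halve(n)
        if stage >= 2:          # the backbone returns the last four scales
            sizes.append(n)
    return sizes

for size in (488, 640, 800):
    sizes = feature_map_sizes(size)
    print(size, sizes, 'flattened last map:', sizes[-1] ** 2)
```

The flattened size of the last feature map (256 for 488, 400 for 640, 625 for 800) is the input width of the `loc_fea_trans` linear layer in the head.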

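2.3.3 Head

The head takes the four feature maps output by the backbone as input and outputs the predictions for the table structure and the cell coordinates.

Its input parameters are:

Parameter        Meaning
in_channels      Number of channels of the input feature maps
hidden_size      Number of hidden units in the RNN module of the attention
max_elem_length  Maximum number of structure tokens to predict
in_max_len       Size of the input image
loc_type         Input to the cell-coordinate branch:
                 1: only the hidden states after attention
                 2: CNN features fused with the attention hidden states

The code is as follows: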

    # https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/modeling/heads/table_att_head.py
    import numpy as np
    import paddle
    from paddle import nn
    import paddle.nn.functional as F
    from ppocr.modeling.heads.table_att_head import AttentionGRUCell

    class TableAttentionHead(nn.Layer):
        def __init__(self,
                     in_channels,
                     hidden_size,
                     loc_type=2,
                     in_max_len=488,       # size of the input image
                     max_elem_length=800,  # maximum number of output labels
                     **kwargs):
            super(TableAttentionHead, self).__init__()
            self.input_size = in_channels[-1]
            self.hidden_size = hidden_size
            self.elem_num = 30
            self.max_elem_length = max_elem_length
            self.structure_attention_cell = AttentionGRUCell(
                self.input_size, hidden_size, self.elem_num, use_gru=False)
            self.structure_generator = nn.Linear(hidden_size, self.elem_num)
            self.loc_type = loc_type
            self.in_max_len = in_max_len
            # Box regression branch
            if self.loc_type == 1:
                self.loc_generator = nn.Linear(hidden_size, 4)
            else:
                if self.in_max_len == 640:
                    # For 640, the last backbone feature map is 20*20, so the flattened input size is 400
                    self.loc_fea_trans = nn.Linear(400, self.max_elem_length + 1)
                elif self.in_max_len == 800:
                    # For 800, the last backbone feature map is 25*25, so the flattened input size is 625
                    self.loc_fea_trans = nn.Linear(625, self.max_elem_length + 1)
                elif self.in_max_len == 488:
                    # For 488, the last backbone feature map is 16*16, so the flattened input size is 256
                    self.loc_fea_trans = nn.Linear(256, self.max_elem_length + 1)
                self.loc_generator = nn.Linear(self.input_size + hidden_size, 4)

        def _char_to_onehot(self, input_char, onehot_dim):
            input_ont_hot = F.one_hot(input_char, onehot_dim)
            return input_ont_hot

        def forward(self, inputs, targets=None):
            # Take the smallest feature map output by the backbone
            fea = inputs[-1]
            if len(fea.shape) == 3:
                pass
            else:
                # Reshape B,C,H,W to B,C,H*W
                last_shape = int(np.prod(fea.shape[2:]))
                fea = paddle.reshape(fea, [fea.shape[0], fea.shape[1], last_shape])
                # Change B,C,W to B,W,C
                fea = fea.transpose([0, 2, 1])
            batch_size = fea.shape[0]
            hidden = paddle.zeros((batch_size, self.hidden_size))
            output_hiddens = []
            if self.training and targets is not None:
                # Training: feed the ground-truth token of each step (teacher forcing)
                structure = targets[0]
                for i in range(self.max_elem_length + 1):
                    elem_onehots = self._char_to_onehot(
                        structure[:, i], onehot_dim=self.elem_num)
                    (outputs, hidden), alpha = self.structure_attention_cell(
                        hidden, fea, elem_onehots)
                    output_hiddens.append(paddle.unsqueeze(outputs, axis=1))
                output = paddle.concat(output_hiddens, axis=1)
                structure_probs = self.structure_generator(output)
                if self.loc_type == 1:
                    loc_preds = self.loc_generator(output)
                    loc_preds = F.sigmoid(loc_preds)
                else:
                    loc_fea = fea.transpose([0, 2, 1])
                    loc_fea = self.loc_fea_trans(loc_fea)
                    loc_fea = loc_fea.transpose([0, 2, 1])
                    loc_concat = paddle.concat([output, loc_fea], axis=2)
                    loc_preds = self.loc_generator(loc_concat)
                    loc_preds = F.sigmoid(loc_preds)
            else:
                # Inference: feed the token predicted at the previous step
                temp_elem = paddle.zeros(shape=[batch_size], dtype="int32")
                structure_probs = None
                loc_preds = None
                elem_onehots = None
                outputs = None
                alpha = None
                max_elem_length = paddle.to_tensor(self.max_elem_length)
                i = 0
                # Attention forward
                while i < max_elem_length + 1:
                    elem_onehots = self._char_to_onehot(
                        temp_elem, onehot_dim=self.elem_num)
                    (outputs, hidden), alpha = self.structure_attention_cell(
                        hidden, fea, elem_onehots)
                    output_hiddens.append(paddle.unsqueeze(outputs, axis=1))
                    structure_probs_step = self.structure_generator(outputs)
                    temp_elem = structure_probs_step.argmax(axis=1, dtype="int32")
                    i += 1
                output = paddle.concat(output_hiddens, axis=1)
                print('Attention output shape', output.shape)
                # Table structure branch
                structure_probs = self.structure_generator(output)
                structure_probs = F.softmax(structure_probs)
                # Cell coordinate branch
                if self.loc_type == 1:
                    loc_preds = self.loc_generator(output)
                    loc_preds = F.sigmoid(loc_preds)
                else:
                    # Change B,W,C to B,C,W
                    loc_fea = fea.transpose([0, 2, 1])
                    loc_fea = self.loc_fea_trans(loc_fea)
                    loc_fea = loc_fea.transpose([0, 2, 1])
                    loc_concat = paddle.concat([output, loc_fea], axis=2)
                    loc_preds = self.loc_generator(loc_concat)
                    loc_preds = F.sigmoid(loc_preds)
            return {'structure_probs': structure_probs, 'loc_preds': loc_preds}
    # Initialize the head
    head = TableAttentionHead(in_channels=backbone.out_channels, hidden_size=256, loc_type=2)
    head.eval()
    # Load the head parameters
    head.set_state_dict(head_dict)
    # Run the head
    print('*'*10, 'head forward shape', '*'*10)
    head_out = head(backbone_out)
    print('*'*10, 'head out shape', '*'*10)
    # Print the head outputs and their shapes
    for key in head_out:
        print(key, head_out[key].shape)
  1. ********** head forward shape **********
  2. Attention output shape [1, 801, 256]
  3. ********** head out shape **********
  4. structure_probs [1, 801, 30]
  5. loc_preds [1, 801, 4]
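
To see where the `[1, 801, 4]` shape of `loc_preds` comes from when `loc_type=2`, the fusion of CNN features and attention hidden states can be walked through with random NumPy arrays. This is a shape-only sketch: the weight matrices `w_trans` and `w_loc` are random stand-ins for `loc_fea_trans` and `loc_generator`, biases are omitted, and no trained parameters are involved.

```python
import numpy as np

B, C, HW = 1, 960, 256      # batch, channels of the last feature map, 16*16 positions
hidden, steps = 256, 801    # hidden_size, max_elem_length + 1

rng = np.random.default_rng(0)
fea = rng.standard_normal((B, HW, C))              # feature map after reshape + transpose
output = rng.standard_normal((B, steps, hidden))   # stacked attention hidden states

w_trans = rng.standard_normal((HW, steps)) * 0.01    # stand-in for loc_fea_trans
w_loc = rng.standard_normal((C + hidden, 4)) * 0.01  # stand-in for loc_generator

loc_fea = fea.transpose(0, 2, 1)        # B, C, HW
loc_fea = loc_fea @ w_trans             # B, C, steps
loc_fea = loc_fea.transpose(0, 2, 1)    # B, steps, C
loc_concat = np.concatenate([output, loc_fea], axis=2)    # B, steps, hidden + C
loc_preds = 1.0 / (1.0 + np.exp(-(loc_concat @ w_loc)))   # sigmoid, B, steps, 4
print(loc_concat.shape, loc_preds.shape)
```

The linear layer over the flattened spatial axis turns the 256 positions into one feature vector per decoding step, so every step has both an attention state and a CNN feature to regress its box from.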

Post-processing
The dictionary file used for post-processing is ppocr/utils/dict/table_structure_dict.txt.
The decoding procedure is:

  1. Apply CTC-style decoding to structure_probs: drop the background characters sos and eos, and keep only one of any run of consecutive repeated characters
  2. The output coordinates are normalized to the range 0-1; multiply them by the image width and height to decode them back into image space

```python
# https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/ppocr/postprocess/rec_postprocess.py#L441
from ppocr.postprocess.rec_postprocess import TableLabelDecode

def post_process(out):
    character_dict_path = '/home/aistudio/PaddleOCR/ppocr/utils/dict/table_structure_dict.txt'
    # Initialize the post-processing op
    post_op = TableLabelDecode(character_dict_path)
    post_result = post_op(out)
    structure_str_list = post_result['structure_str_list']
    # Restore the normalized coordinates to the size of the original image
    res_loc = post_result['res_loc']
    imgh, imgw = img.shape[0:2]
    res_loc_final = []
    for rno in range(len(res_loc[0])):
        x0, y0, x1, y1 = res_loc[0][rno]
        left = max(int(imgw * x0), 0)
        top = max(int(imgh * y0), 0)
        right = min(int(imgw * x1), imgw - 1)
        bottom = min(int(imgh * y1), imgh - 1)
        res_loc_final.append([left, top, right, bottom])
    # Wrap the structure tokens in the outer HTML tags
    structure_str_list = structure_str_list[0]
    structure_str_list = ['<html>', '<body>', '<table>'] + structure_str_list + ['</table>', '</body>', '</html>']
    return structure_str_list, res_loc_final

structure_str_list, res_loc_final = post_process(head_out)
print(structure_str_list)
print(res_loc_final)

# Visualize the predicted boxes
plt.figure(figsize=(24, 8))
img_show = img.copy()
for box in res_loc_final:
    cv2.rectangle(img_show, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)
plt.imshow(img_show)
```
  2. ['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>']
  3. [[32, 9, 104, 40], [232, 8, 307, 41], [429, 7, 500, 44], [559, 8, 656, 44], [715, 7, 780, 44], [37, 45, 99, 73], [190, 44, 342, 74], [432, 45, 502, 74], [565, 44, 655, 73], [712, 46, 777, 74], [30, 81, 101, 109], [202, 80, 337, 110], [433, 81, 503, 111], [578, 83, 638, 110], [698, 82, 790, 110], [31, 119, 104, 148], [197, 116, 347, 147], [443, 117, 492, 148], [572, 118, 643, 147], [698, 118, 797, 147], [35, 154, 101, 183], [199, 152, 342, 184], [436, 154, 501, 184], [558, 155, 670, 184], [701, 153, 801, 183], [40, 188, 93, 217], [217, 187, 314, 219], [417, 187, 516, 218], [556, 187, 667, 217], [716, 188, 772, 216], [48, 227, 98, 255], [223, 224, 313, 256], [429, 226, 500, 256], [558, 226, 667, 256], [722, 225, 772, 254], [47, 262, 99, 291], [217, 260, 313, 293], [439, 261, 506, 293], [557, 260, 678, 292], [722, 261, 777, 290], [36, 295, 95, 324], [210, 296, 317, 326], [443, 296, 499, 326], [547, 296, 681, 326], [701, 300, 767, 328], [42, 332, 99, 361], [191, 330, 350, 360], [451, 331, 493, 361], [557, 331, 683, 361], [717, 335, 785, 362], [45, 369, 97, 396], [186, 367, 355, 400], [444, 369, 504, 398], [581, 369, 640, 397], [723, 368, 773, 396], [37, 404, 95, 431], [192, 404, 351, 433], [438, 404, 511, 432], [560, 405, 658, 432], [723, 404, 775, 431], [46, 444, 104, 469], [188, 441, 346, 470], [444, 444, 496, 471], [544, 445, 681, 471], [721, 444, 773, 470], [35, 478, 104, 503], [190, 475, 345, 505], [436, 477, 504, 504], [559, 476, 662, 505], [712, 477, 778, 504]]
  4. <matplotlib.image.AxesImage at 0x7faad993b2d0>

[Figure: predicted cell boxes drawn on the input image]
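
The coordinate step of the post-processing can be tried without the model. The sketch below repeats the clamping logic on made-up normalized boxes; `denormalize_boxes` and `wrap_html` are hypothetical helper names, and the input values are illustrative rather than model output.

```python
def denormalize_boxes(norm_boxes, img_h, img_w):
    """Scale boxes from the 0-1 range to pixel coordinates and clamp them to the image."""
    boxes = []
    for x0, y0, x1, y1 in norm_boxes:
        left = max(int(img_w * x0), 0)
        top = max(int(img_h * y0), 0)
        right = min(int(img_w * x1), img_w - 1)
        bottom = min(int(img_h * y1), img_h - 1)
        boxes.append([left, top, right, bottom])
    return boxes

def wrap_html(structure_tokens):
    """Join the structure tokens and add the outer HTML tags."""
    tokens = ['<html>', '<body>', '<table>'] + structure_tokens + ['</table>', '</body>', '</html>']
    return ''.join(tokens)

# The second box deliberately runs past the right and bottom edges
boxes = denormalize_boxes([[0.25, 0.5, 0.75, 0.75], [0.5, 0.5, 1.25, 1.0]], img_h=100, img_w=200)
print(boxes)
print(wrap_html(['<tr>', '<td>', '</td>', '</tr>']))
```

Clamping to `img_w - 1` and `img_h - 1` keeps every box inside the image even when the sigmoid output is close to 1.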

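3. Training

Training table recognition requires training three models: text detection, text recognition, and table structure. Training of the text detection and recognition models is covered in earlier lessons, so only the training of the table structure model is described here.
This section uses the PubTabNet dataset and a table structure model with a MobileNetV3 backbone to show how to train, evaluate, and test a table structure model.

3.1 Data preparation

This experiment uses the PubTabNet dataset as the demonstration dataset. A sample image from PubTabNet is shown below:
[Figure: PubTabNet sample image]

Part of the PubTabNet dataset has already been downloaded in the project and is stored in /home/aistudio/data/data119702. Run the commands below to extract it, or download the dataset yourself from https://github.com/ibm-aur-nlp/PubTabNet.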

    # Extract the dataset
    ! cd /home/aistudio/data/data119702 && tar -xf pubtabnet_val.tar && cd -
    ! ls /home/aistudio/data/data119702
  1. /home/aistudio/PaddleOCR
  2. PubTabNet_2.0.0_val.jsonl pubtabnet_val.tar val

After running the commands above, /home/aistudio/data/data119702 contains, besides the original archive, one folder and one file:

    /home/aistudio/data/data119702
    ├─ val/                        image folder
    └─ PubTabNet_2.0.0_val.jsonl   annotation file

The annotation format of the dataset is:

    {
        'filename': 'PMC5755158_010_01.png',  # image name
        'split': 'train',                     # whether the image belongs to the training or validation set
        'imgid': 0,                           # index of the image
        'html': {
            'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]},  # HTML structure tokens of the table
            'cells': [
                {
                    'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'],  # text of a single cell
                    'bbox': [x0, y0, x1, y1]  # coordinates of that cell text
                }
            ]
        }
    }
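
A short sketch can make the relation between the structure tokens and the cell entries concrete: each `<td>` in the structure consumes the next entry of the cell list. The record below is hypothetical (it is not taken from PubTabNet), `rebuild_html` is an illustrative helper, and the key name `cells` follows the loading code used later in this section. Cells with row or column spans, which use split `<td` tokens, are not handled.

```python
import json

# Hypothetical one-line JSONL record in the format described above
line = json.dumps({
    "filename": "example.png",
    "split": "val",
    "imgid": 0,
    "html": {
        "structure": {"tokens": ["<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>"]},
        "cells": [
            {"tokens": ["a", "b"], "bbox": [1, 2, 30, 12]},
            {"tokens": []},  # an empty cell has no bbox
        ],
    },
})

def rebuild_html(record):
    """Insert the text of each cell after its opening <td> token."""
    cells = iter(record["html"]["cells"])
    parts = []
    for token in record["html"]["structure"]["tokens"]:
        parts.append(token)
        if token == "<td>":
            parts.append("".join(next(cells)["tokens"]))
    return "".join(parts)

record = json.loads(line)
html = rebuild_html(record)
print(html)
```

Keeping the structure and the cell contents separate is what lets the structure model predict only the tag sequence and the boxes, while the text itself comes from the detection and recognition models.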

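3.2 Data preprocessing

Training places certain requirements on the format and size of the input images. Before the data is fed to the model, it therefore has to be preprocessed so that the images and labels meet the needs of network training and prediction.
Data preprocessing for the table structure model mainly includes the following steps:

  • DecodeImage: converts the image to NumPy format
  • ResizeTableImage: resizes the image so that the long side has the specified length and the short side is scaled proportionally
  • TableLabelEncode: parses the label information in the annotation file and stores it in a unified format
  • NormalizeImage: scales the pixel values, subtracts the per-channel mean, and divides by the per-channel standard deviation, so that the network input is roughly zero-mean with unit variance; this smooths the optimization and makes training converge more easily
  • PaddingTableImage: pads the short side of the image to the same length as the long side
  • ToCHWImage: the image is stored as [H, W, C] (height, width, channels), while the network expects training data as [C, H, W], so the image data has to be rearranged, for example from [224, 224, 3] to [3, 224, 224]
  • KeepKeys: filters the data dict, keeping only the listed keys

TableLabelEncode
This op parses the label information in the annotation file. First, load the annotation data and take one record: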
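
The NormalizeImage and ToCHWImage steps are simple enough to sketch in NumPy. The mean, std, and scale values are the ones used in the op list earlier in this tutorial; `normalize_hwc` and `to_chw` are hypothetical helper names, not the PaddleOCR implementations.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_hwc(img, scale=1.0 / 255.0):
    """Scale pixels to 0-1, then subtract the mean and divide by the std per channel."""
    return (img.astype(np.float32) * scale - mean) / std

def to_chw(img):
    """Rearrange an H, W, C image into C, H, W."""
    return img.transpose((2, 0, 1))

img = np.full((4, 6, 3), 255, dtype=np.uint8)   # a tiny all-white test image
out = to_chw(normalize_hwc(img))
print(out.shape)
print(out[:, 0, 0])
```

Because the mean and std arrays have one value per channel, they broadcast over the last axis of the [H, W, C] image, which is why normalization is applied before the axes are rearranged.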

    # Load one record from the dataset
    import json
    from pprint import pprint

    with open('/home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl', "rb") as f:
        data_lines = f.readlines()
        for line in data_lines:
            data_line = line.decode('utf-8').strip("\n")
            info = json.loads(data_line)
            # Only the first record is needed
            break

Run the code below to compare the labels before and after encoding by the TableLabelEncode class.

    import os
    import cv2
    from ppocr.data.imaug import TableLabelEncode

    # Initialize the label encoder
    label_encoder_op = TableLabelEncode(max_text_length=100,  # unused
                                        max_elem_length=50,   # maximum number of cells predicted per image
                                        max_cell_num=500,     # unused
                                        character_dict_path='ppocr/utils/dict/table_structure_dict.txt')
    # Build the input data
    cells = info['html']['cells']
    structure = info['html']['structure']
    # Print the labels before encoding
    print("The cells and structure before decode")
    print("cells: ", cells)
    print("structure: ", structure)
    image = cv2.imread(os.path.join('/home/aistudio/data/data119702/val', info['filename']))
    data = {'image': image, 'cells': cells, 'structure': structure}
    # Run the label encoder
    data = label_encoder_op(data)
    # Print the encoded information
    print("The bbox_list and structure after decode")
    print("bbox_list:", data['bbox_list'].tolist())
    print("structure:", data['structure'].tolist())
  1. The cells and structure before decode
  2. cells: [{'tokens': []}, {'tokens': ['<b>', 'W', 'e', 'a', 'n', 'i', 'n', 'g', '</b>'], 'bbox': [66, 4, 96, 13]}, {'tokens': ['<b>', 'W', 'e', 'e', 'k', ' ', '1', '5', '</b>'], 'bbox': [131, 4, 160, 13]}, {'tokens': ['<b>', 'O', 'f', 'f', '-', 't', 'e', 's', 't', '</b>'], 'bbox': [201, 4, 226, 13]}, {'tokens': ['W', 'e', 'a', 'n', 'i', 'n', 'g'], 'bbox': [1, 17, 31, 26]}, {'tokens': ['–'], 'bbox': [66, 21, 72, 25]}, {'tokens': ['–'], 'bbox': [131, 21, 137, 25]}, {'tokens': ['–'], 'bbox': [201, 21, 207, 25]}, {'tokens': ['W', 'e', 'e', 'k', ' ', '1', '5'], 'bbox': [1, 31, 30, 40]}, {'tokens': ['–'], 'bbox': [66, 35, 72, 39]}, {'tokens': ['0', '.', '1', '7', ' ', '±', ' ', '0', '.', '0', '8'], 'bbox': [131, 31, 166, 40]}, {'tokens': ['0', '.', '1', '6', ' ', '±', ' ', '0', '.', '0', '3'], 'bbox': [201, 31, 236, 40]}, {'tokens': ['O', 'f', 'f', '-', 't', 'e', 's', 't'], 'bbox': [1, 45, 26, 54]}, {'tokens': ['–'], 'bbox': [66, 49, 72, 53]}, {'tokens': ['0', '.', '8', '0', ' ', '±', ' ', '0', '.', '2', '4'], 'bbox': [131, 45, 166, 54]}, {'tokens': ['0', '.', '1', '9', ' ', '±', ' ', '0', '.', '0', '9'], 'bbox': [201, 45, 236, 54]}]
  3. structure: {'tokens': ['<thead>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '<tr>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '<td>', '</td>', '</tr>', '</tbody>']}
  4. The bbox_list and structure after decode
  5. bbox_list: [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.06779661029577255, 0.40336135029792786, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.06779661029577255, 0.6722689270973206, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.06779661029577255, 0.9495798349380493, 0.22033898532390594], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.2881355881690979, 0.13025210797786713, 0.4406779706478119], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.35593220591545105, 0.3025210201740265, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.35593220591545105, 0.575630247592926, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.35593220591545105, 0.8697478771209717, 0.4237288236618042], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.5254237055778503, 0.1260504275560379, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.5932203531265259, 0.3025210201740265, 0.6610169410705566], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.5254237055778503, 0.6974790096282959, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.5254237055778503, 0.9915966391563416, 0.6779661178588867], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.004201680887490511, 0.7627118825912476, 0.10924369841814041, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.27731093764305115, 0.8305084705352783, 0.3025210201740265, 0.8983050584793091], [0.0, 0.0, 0.0, 0.0], [0.5504201650619507, 0.7627118825912476, 0.6974790096282959, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.8445377945899963, 0.7627118825912476, 0.9915966391563416, 0.9152542352676392], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 
0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
  6. structure: [0, 1, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 6, 7, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 8, 29, 0, 0, 0, 0, 0, 0]
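
The printed result suggests how the encoding works: the structure tokens become integer IDs framed by a start and an end ID, and each non-empty cell's box is normalized and stored at the position of its `<td>` token. The sketch below is a simplified reimplementation under stated assumptions, not the PaddleOCR code: the `vocab` IDs are read off the printed output, the image size of 238 x 59 is inferred from the printed normalized values, padding to a fixed length is omitted, and `encode_table_label` is a hypothetical name.

```python
# IDs inferred from the printed output above (assumption)
vocab = {"sos": 0, "<thead>": 1, "<tr>": 2, "<td>": 3, "</td>": 4,
         "</tr>": 5, "</thead>": 6, "<tbody>": 7, "</tbody>": 8, "eos": 29}

def encode_table_label(tokens, cells, img_h, img_w, vocab):
    """Return token IDs and one normalized box per token (zeros where there is no cell box)."""
    structure = [vocab["sos"]] + [vocab[t] for t in tokens] + [vocab["eos"]]
    bbox_list = [[0.0, 0.0, 0.0, 0.0] for _ in structure]
    # Position of every <td> in the encoded sequence (+1 because of the leading sos)
    td_positions = [i + 1 for i, t in enumerate(tokens) if t == "<td>"]
    for pos, cell in zip(td_positions, cells):
        if cell["tokens"] and "bbox" in cell:
            x0, y0, x1, y1 = cell["bbox"]
            bbox_list[pos] = [x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h]
    return structure, bbox_list

tokens = ["<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>"]
cells = [{"tokens": []},
         {"tokens": ["<b>", "W", "e", "a", "n", "i", "n", "g", "</b>"], "bbox": [66, 4, 96, 13]}]
structure_ids, bbox_list = encode_table_label(tokens, cells, img_h=59, img_w=238, vocab=vocab)
print(structure_ids)
print(bbox_list[5])
```

As in the printed output, the empty first cell leaves zeros at index 3, and the box of the second cell lands at index 5.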

3.3 Loss function definition

The model loss has two parts:

  1. structure loss: uses CrossEntropyLoss, the usual choice for classification
  2. loc loss: uses MSELoss

The two losses are combined as a weighted sum; in the code, structure_weight and loc_weight are 100 and 10000 respectively.

    total_loss = structure_loss * structure_weight + loc_loss * loc_weight
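
The weighted sum can be worked through on toy numbers in plain Python. This is a minimal sketch, assuming per-step cross-entropy averaged over the steps and a plain mean squared error over the box coordinates; the logits, targets, and boxes are made up, and details of the actual loss class (such as masking positions without a box) are left out.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one step: -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]

def mse(pred, target):
    """Mean squared error over the coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

structure_weight, loc_weight = 100.0, 10000.0

step_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]   # two decoding steps, three classes
step_targets = [0, 1]
pred_box = [0.10, 0.20, 0.50, 0.60]
true_box = [0.12, 0.18, 0.50, 0.65]

structure_loss = sum(cross_entropy(l, t) for l, t in zip(step_logits, step_targets)) / len(step_targets)
loc_loss = mse(pred_box, true_box)
total_loss = structure_loss * structure_weight + loc_loss * loc_weight
print(structure_loss, loc_loss, total_loss)
```

Because the box coordinates are normalized to 0-1, the raw squared errors are very small, which is one plausible reason for the much larger `loc_weight`.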

3.4 Model training

Once the data processing and the loss function are defined, the model can be trained.
Training is done with PaddleOCR and is driven by a configuration file; see https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/configs/table/table_mv3.yml for the full file. The network architecture parameters are:

    Architecture:
      model_type: table
      algorithm: TableAttn
      Backbone:
        name: MobileNetV3
        scale: 1.0
        model_name: large
      Head:
        name: TableAttentionHead
        hidden_size: 256
        loc_type: 2
        max_text_length: 100
        max_elem_length: 800
        max_cell_num: 500

The loss function parameters are:

    Loss:
      name: TableAttentionLoss
      structure_weight: 100.0
      loc_weight: 10000.0

With the configuration in place, training can be started with the commands below.

    # Set up the dataset: the validation split is linked as both the train and val set for this demo
    # !mkdir -p train_data/table/pubtabnet
    !cd train_data/table/pubtabnet && ln -s /home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl PubTabNet_2.0.0_train.jsonl \
        && ln -s /home/aistudio/data/data119702/PubTabNet_2.0.0_val.jsonl PubTabNet_2.0.0_val.jsonl \
        && ln -s /home/aistudio/data/data119702/val train \
        && ln -s /home/aistudio/data/data119702/val val
  1. ln: failed to create symbolic link 'PubTabNet_2.0.0_train.jsonl': File exists
    ! python tools/train.py -c configs/table/table_mv3.yml -o Global.use_gpu=False Global.print_batch_step=1 Train.loader.batch_size_per_card=1 Eval.loader.batch_size_per_card=1
  1. /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  2. import imp
  3. [2021/12/26 19:57:29] root INFO: Architecture :
  4. [2021/12/26 19:57:29] root INFO: Backbone :
  5. [2021/12/26 19:57:29] root INFO: model_name : large
  6. [2021/12/26 19:57:29] root INFO: name : MobileNetV3
  7. [2021/12/26 19:57:29] root INFO: scale : 1.0
  8. [2021/12/26 19:57:29] root INFO: Head :
  9. [2021/12/26 19:57:29] root INFO: hidden_size : 256
  10. [2021/12/26 19:57:29] root INFO: l2_decay : 1e-05
  11. [2021/12/26 19:57:29] root INFO: loc_type : 2
  12. [2021/12/26 19:57:29] root INFO: max_cell_num : 500
  13. [2021/12/26 19:57:29] root INFO: max_elem_length : 800
  14. [2021/12/26 19:57:29] root INFO: max_text_length : 100
  15. [2021/12/26 19:57:29] root INFO: name : TableAttentionHead
  16. [2021/12/26 19:57:29] root INFO: algorithm : TableAttn
  17. [2021/12/26 19:57:29] root INFO: model_type : table
  18. [2021/12/26 19:57:29] root INFO: Eval :
  19. [2021/12/26 19:57:29] root INFO: dataset :
  20. [2021/12/26 19:57:29] root INFO: data_dir : train_data/table/pubtabnet/val/
  21. [2021/12/26 19:57:29] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl
  22. [2021/12/26 19:57:29] root INFO: name : PubTabDataSet
  23. [2021/12/26 19:57:29] root INFO: transforms :
  24. [2021/12/26 19:57:29] root INFO: DecodeImage :
  25. [2021/12/26 19:57:29] root INFO: channel_first : False
  26. [2021/12/26 19:57:29] root INFO: img_mode : BGR
  27. [2021/12/26 19:57:29] root INFO: ResizeTableImage :
  28. [2021/12/26 19:57:29] root INFO: max_len : 488
  29. [2021/12/26 19:57:29] root INFO: TableLabelEncode : None
  30. [2021/12/26 19:57:29] root INFO: NormalizeImage :
  31. [2021/12/26 19:57:29] root INFO: mean : [0.485, 0.456, 0.406]
  32. [2021/12/26 19:57:29] root INFO: order : hwc
  33. [2021/12/26 19:57:29] root INFO: scale : 1./255.
  34. [2021/12/26 19:57:29] root INFO: std : [0.229, 0.224, 0.225]
  35. [2021/12/26 19:57:29] root INFO: PaddingTableImage : None
  36. [2021/12/26 19:57:29] root INFO: ToCHWImage : None
  37. [2021/12/26 19:57:29] root INFO: KeepKeys :
  38. [2021/12/26 19:57:29] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
  39. [2021/12/26 19:57:29] root INFO: loader :
  40. [2021/12/26 19:57:29] root INFO: batch_size_per_card : 1
  41. [2021/12/26 19:57:29] root INFO: drop_last : False
  42. [2021/12/26 19:57:29] root INFO: num_workers : 1
  43. [2021/12/26 19:57:29] root INFO: shuffle : False
  44. [2021/12/26 19:57:29] root INFO: Global :
  45. [2021/12/26 19:57:29] root INFO: cal_metric_during_train : True
  46. [2021/12/26 19:57:29] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt
  47. [2021/12/26 19:57:29] root INFO: character_type : en
  48. [2021/12/26 19:57:29] root INFO: checkpoints : None
  49. [2021/12/26 19:57:29] root INFO: debug : False
  50. [2021/12/26 19:57:29] root INFO: distributed : False
  51. [2021/12/26 19:57:29] root INFO: epoch_num : 400
  52. [2021/12/26 19:57:29] root INFO: eval_batch_step : [0, 400]
  53. [2021/12/26 19:57:29] root INFO: infer_img : doc/table/table.jpg
  54. [2021/12/26 19:57:29] root INFO: infer_mode : False
  55. [2021/12/26 19:57:29] root INFO: log_smooth_window : 20
  56. [2021/12/26 19:57:29] root INFO: max_cell_num : 500
  57. [2021/12/26 19:57:29] root INFO: max_elem_length : 800
  58. [2021/12/26 19:57:29] root INFO: max_text_length : 100
  59. [2021/12/26 19:57:29] root INFO: pretrained_model : None
  60. [2021/12/26 19:57:29] root INFO: print_batch_step : 1
  61. [2021/12/26 19:57:29] root INFO: process_cut_num : 0
  62. [2021/12/26 19:57:29] root INFO: process_total_num : 0
  63. [2021/12/26 19:57:29] root INFO: save_epoch_step : 3
  64. [2021/12/26 19:57:29] root INFO: save_inference_dir : None
  65. [2021/12/26 19:57:29] root INFO: save_model_dir : ./output/table_mv3/
  66. [2021/12/26 19:57:29] root INFO: use_gpu : False
  67. [2021/12/26 19:57:29] root INFO: use_visualdl : False
  68. [2021/12/26 19:57:29] root INFO: Loss :
  69. [2021/12/26 19:57:29] root INFO: loc_weight : 10000.0
  70. [2021/12/26 19:57:29] root INFO: name : TableAttentionLoss
  71. [2021/12/26 19:57:29] root INFO: structure_weight : 100.0
  72. [2021/12/26 19:57:29] root INFO: Metric :
  73. [2021/12/26 19:57:29] root INFO: main_indicator : acc
  74. [2021/12/26 19:57:29] root INFO: name : TableMetric
  75. [2021/12/26 19:57:29] root INFO: Optimizer :
  76. [2021/12/26 19:57:29] root INFO: beta1 : 0.9
  77. [2021/12/26 19:57:29] root INFO: beta2 : 0.999
  78. [2021/12/26 19:57:29] root INFO: clip_norm : 5.0
  79. [2021/12/26 19:57:29] root INFO: lr :
  80. [2021/12/26 19:57:29] root INFO: learning_rate : 0.001
  81. [2021/12/26 19:57:29] root INFO: name : Adam
  82. [2021/12/26 19:57:29] root INFO: regularizer :
  83. [2021/12/26 19:57:29] root INFO: factor : 0.0
  84. [2021/12/26 19:57:29] root INFO: name : L2
  85. [2021/12/26 19:57:29] root INFO: PostProcess :
  86. [2021/12/26 19:57:29] root INFO: name : TableLabelDecode
  87. [2021/12/26 19:57:29] root INFO: Train :
  88. [2021/12/26 19:57:29] root INFO: dataset :
  89. [2021/12/26 19:57:29] root INFO: data_dir : train_data/table/pubtabnet/train/
  90. [2021/12/26 19:57:29] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl
  91. [2021/12/26 19:57:29] root INFO: name : PubTabDataSet
  92. [2021/12/26 19:57:29] root INFO: transforms :
  93. [2021/12/26 19:57:29] root INFO: DecodeImage :
  94. [2021/12/26 19:57:29] root INFO: channel_first : False
  95. [2021/12/26 19:57:29] root INFO: img_mode : BGR
  96. [2021/12/26 19:57:29] root INFO: ResizeTableImage :
  97. [2021/12/26 19:57:29] root INFO: max_len : 488
  98. [2021/12/26 19:57:29] root INFO: TableLabelEncode : None
  99. [2021/12/26 19:57:29] root INFO: NormalizeImage :
  100. [2021/12/26 19:57:29] root INFO: mean : [0.485, 0.456, 0.406]
  101. [2021/12/26 19:57:29] root INFO: order : hwc
  102. [2021/12/26 19:57:29] root INFO: scale : 1./255.
  103. [2021/12/26 19:57:29] root INFO: std : [0.229, 0.224, 0.225]
  104. [2021/12/26 19:57:29] root INFO: PaddingTableImage : None
  105. [2021/12/26 19:57:29] root INFO: ToCHWImage : None
  106. [2021/12/26 19:57:29] root INFO: KeepKeys :
  107. [2021/12/26 19:57:29] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
  108. [2021/12/26 19:57:29] root INFO: loader :
  109. [2021/12/26 19:57:29] root INFO: batch_size_per_card : 1
  110. [2021/12/26 19:57:29] root INFO: drop_last : True
  111. [2021/12/26 19:57:29] root INFO: num_workers : 1
  112. [2021/12/26 19:57:29] root INFO: shuffle : True
  113. [2021/12/26 19:57:29] root INFO: profiler_options : None
  114. [2021/12/26 19:57:29] root INFO: train with paddle 2.2.1 and device CPUPlace
  115. [2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl
  116. [2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl
  117. [2021/12/26 19:57:29] root INFO: train from scratch
  118. [2021/12/26 19:57:29] root INFO: train dataloader has 9115 iters
  119. [2021/12/26 19:57:29] root INFO: valid dataloader has 9115 iters
  120. [2021/12/26 19:57:29] root INFO: During the training process, after the 0th iteration, an evaluation is run every 400 iterations
  121. [2021/12/26 19:57:29] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl
  122. [2021/12/26 19:57:47] root INFO: epoch: [1/400], iter: 1, lr: 0.001000, loss: 358.711182, structure_loss: 277.904785, loc_loss: 80.806374, acc: 0.000000, reader_cost: 0.05254 s, batch_cost: 17.39120 s, samples: 2, ips: 0.11500
  123. [2021/12/26 19:57:55] root INFO: epoch: [1/400], iter: 2, lr: 0.001000, loss: 353.381165, structure_loss: 208.200623, loc_loss: 137.825607, acc: 0.000000, reader_cost: 0.00041 s, batch_cost: 8.65134 s, samples: 1, ips: 0.11559
  124. ^C
  125. main proc 431 exit, kill process group 417
  126. main proc 417 exit, kill process group 417

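3.5 Model evaluation

During training, two kinds of model are saved by default: the most recently trained model, named latest, and the model with the highest accuracy, named best_accuracy. Next, the saved model parameters are used to evaluate accuracy on the test set.
The accuracy evaluation code for the table structure model is in PaddleOCR/ppocr/metrics/table_metric.py; calling tools/eval.py runs the accuracy evaluation on a trained model.

    !python tools/eval.py -c configs/table/table_mv3.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy Global.use_gpu=False Eval.loader.batch_size_per_card=1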
  1. /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  2. import imp
  3. [2021/12/26 20:00:08] root INFO: Architecture :
  4. [2021/12/26 20:00:08] root INFO: Backbone :
  5. [2021/12/26 20:00:08] root INFO: model_name : large
  6. [2021/12/26 20:00:08] root INFO: name : MobileNetV3
  7. [2021/12/26 20:00:08] root INFO: scale : 1.0
  8. [2021/12/26 20:00:08] root INFO: Head :
  9. [2021/12/26 20:00:08] root INFO: hidden_size : 256
  10. [2021/12/26 20:00:08] root INFO: l2_decay : 1e-05
  11. [2021/12/26 20:00:08] root INFO: loc_type : 2
  12. [2021/12/26 20:00:08] root INFO: max_cell_num : 500
  13. [2021/12/26 20:00:08] root INFO: max_elem_length : 800
  14. [2021/12/26 20:00:08] root INFO: max_text_length : 100
  15. [2021/12/26 20:00:08] root INFO: name : TableAttentionHead
  16. [2021/12/26 20:00:08] root INFO: algorithm : TableAttn
  17. [2021/12/26 20:00:08] root INFO: model_type : table
  18. [2021/12/26 20:00:08] root INFO: Eval :
  19. [2021/12/26 20:00:08] root INFO: dataset :
  20. [2021/12/26 20:00:08] root INFO: data_dir : train_data/table/pubtabnet/val/
  21. [2021/12/26 20:00:08] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl
  22. [2021/12/26 20:00:08] root INFO: name : PubTabDataSet
  23. [2021/12/26 20:00:08] root INFO: transforms :
  24. [2021/12/26 20:00:08] root INFO: DecodeImage :
  25. [2021/12/26 20:00:08] root INFO: channel_first : False
  26. [2021/12/26 20:00:08] root INFO: img_mode : BGR
  27. [2021/12/26 20:00:08] root INFO: ResizeTableImage :
  28. [2021/12/26 20:00:08] root INFO: max_len : 488
  29. [2021/12/26 20:00:08] root INFO: TableLabelEncode : None
  30. [2021/12/26 20:00:08] root INFO: NormalizeImage :
  31. [2021/12/26 20:00:08] root INFO: mean : [0.485, 0.456, 0.406]
  32. [2021/12/26 20:00:08] root INFO: order : hwc
  33. [2021/12/26 20:00:08] root INFO: scale : 1./255.
  34. [2021/12/26 20:00:08] root INFO: std : [0.229, 0.224, 0.225]
  35. [2021/12/26 20:00:08] root INFO: PaddingTableImage : None
  36. [2021/12/26 20:00:08] root INFO: ToCHWImage : None
  37. [2021/12/26 20:00:08] root INFO: KeepKeys :
  38. [2021/12/26 20:00:08] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
  39. [2021/12/26 20:00:08] root INFO: loader :
  40. [2021/12/26 20:00:08] root INFO: batch_size_per_card : 1
  41. [2021/12/26 20:00:08] root INFO: drop_last : False
  42. [2021/12/26 20:00:08] root INFO: num_workers : 1
  43. [2021/12/26 20:00:08] root INFO: shuffle : False
  44. [2021/12/26 20:00:08] root INFO: Global :
  45. [2021/12/26 20:00:08] root INFO: cal_metric_during_train : True
  46. [2021/12/26 20:00:08] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt
  47. [2021/12/26 20:00:08] root INFO: character_type : en
  48. [2021/12/26 20:00:08] root INFO: checkpoints : /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy
  49. [2021/12/26 20:00:08] root INFO: debug : False
  50. [2021/12/26 20:00:08] root INFO: distributed : False
  51. [2021/12/26 20:00:08] root INFO: epoch_num : 400
  52. [2021/12/26 20:00:08] root INFO: eval_batch_step : [0, 400]
  53. [2021/12/26 20:00:08] root INFO: infer_img : doc/table/table.jpg
  54. [2021/12/26 20:00:08] root INFO: infer_mode : False
  55. [2021/12/26 20:00:08] root INFO: log_smooth_window : 20
  56. [2021/12/26 20:00:08] root INFO: max_cell_num : 500
  57. [2021/12/26 20:00:08] root INFO: max_elem_length : 800
  58. [2021/12/26 20:00:08] root INFO: max_text_length : 100
  59. [2021/12/26 20:00:08] root INFO: pretrained_model : None
  60. [2021/12/26 20:00:08] root INFO: print_batch_step : 5
  61. [2021/12/26 20:00:08] root INFO: process_cut_num : 0
  62. [2021/12/26 20:00:08] root INFO: process_total_num : 0
  63. [2021/12/26 20:00:08] root INFO: save_epoch_step : 3
  64. [2021/12/26 20:00:08] root INFO: save_inference_dir : None
  65. [2021/12/26 20:00:08] root INFO: save_model_dir : ./output/table_mv3/
  66. [2021/12/26 20:00:08] root INFO: use_gpu : False
  67. [2021/12/26 20:00:08] root INFO: use_visualdl : False
  68. [2021/12/26 20:00:08] root INFO: Loss :
  69. [2021/12/26 20:00:08] root INFO: loc_weight : 10000.0
  70. [2021/12/26 20:00:08] root INFO: name : TableAttentionLoss
  71. [2021/12/26 20:00:08] root INFO: structure_weight : 100.0
  72. [2021/12/26 20:00:08] root INFO: Metric :
  73. [2021/12/26 20:00:08] root INFO: main_indicator : acc
  74. [2021/12/26 20:00:08] root INFO: name : TableMetric
  75. [2021/12/26 20:00:08] root INFO: Optimizer :
  76. [2021/12/26 20:00:08] root INFO: beta1 : 0.9
  77. [2021/12/26 20:00:08] root INFO: beta2 : 0.999
  78. [2021/12/26 20:00:08] root INFO: clip_norm : 5.0
  79. [2021/12/26 20:00:08] root INFO: lr :
  80. [2021/12/26 20:00:08] root INFO: learning_rate : 0.001
  81. [2021/12/26 20:00:08] root INFO: name : Adam
  82. [2021/12/26 20:00:08] root INFO: regularizer :
  83. [2021/12/26 20:00:08] root INFO: factor : 0.0
  84. [2021/12/26 20:00:08] root INFO: name : L2
  85. [2021/12/26 20:00:08] root INFO: PostProcess :
  86. [2021/12/26 20:00:08] root INFO: name : TableLabelDecode
  87. [2021/12/26 20:00:08] root INFO: Train :
  88. [2021/12/26 20:00:08] root INFO: dataset :
  89. [2021/12/26 20:00:08] root INFO: data_dir : train_data/table/pubtabnet/train/
  90. [2021/12/26 20:00:08] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl
  91. [2021/12/26 20:00:08] root INFO: name : PubTabDataSet
  92. [2021/12/26 20:00:08] root INFO: transforms :
  93. [2021/12/26 20:00:08] root INFO: DecodeImage :
  94. [2021/12/26 20:00:08] root INFO: channel_first : False
  95. [2021/12/26 20:00:08] root INFO: img_mode : BGR
  96. [2021/12/26 20:00:08] root INFO: ResizeTableImage :
  97. [2021/12/26 20:00:08] root INFO: max_len : 488
  98. [2021/12/26 20:00:08] root INFO: TableLabelEncode : None
  99. [2021/12/26 20:00:08] root INFO: NormalizeImage :
  100. [2021/12/26 20:00:08] root INFO: mean : [0.485, 0.456, 0.406]
  101. [2021/12/26 20:00:08] root INFO: order : hwc
  102. [2021/12/26 20:00:08] root INFO: scale : 1./255.
  103. [2021/12/26 20:00:08] root INFO: std : [0.229, 0.224, 0.225]
  104. [2021/12/26 20:00:08] root INFO: PaddingTableImage : None
  105. [2021/12/26 20:00:08] root INFO: ToCHWImage : None
  106. [2021/12/26 20:00:08] root INFO: KeepKeys :
  107. [2021/12/26 20:00:08] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
  108. [2021/12/26 20:00:08] root INFO: loader :
  109. [2021/12/26 20:00:08] root INFO: batch_size_per_card : 32
  110. [2021/12/26 20:00:08] root INFO: drop_last : True
  111. [2021/12/26 20:00:08] root INFO: num_workers : 1
  112. [2021/12/26 20:00:08] root INFO: shuffle : True
  113. [2021/12/26 20:00:08] root INFO: profiler_options : None
  114. [2021/12/26 20:00:08] root INFO: train with paddle 2.2.1 and device CPUPlace
  115. [2021/12/26 20:00:08] root INFO: Initialize indexs of datasets:train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl
  116. [2021/12/26 20:00:08] root INFO: resume from /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy
  117. [2021/12/26 20:00:08] root INFO: metric in ckpt ***************
  118. [2021/12/26 20:00:08] root INFO: acc:0.7380142622051563
  119. [2021/12/26 20:00:08] root INFO: fps:8.360272547972942
  120. [2021/12/26 20:00:08] root INFO: best_epoch:7
  121. [2021/12/26 20:00:08] root INFO: start_epoch:8
  122. eval model:: 0%| | 2/9115 [00:07<8:55:26, 3.53s/it]^C
  123. main proc 602 exit, kill process group 576
  124. main proc 576 exit, kill process group 576
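
The progress bar above shows 2 of 9115 evaluation batches done at about 3.53 s per batch on CPU (`use_gpu : False`, eval `batch_size_per_card : 1`), which is presumably why the run was stopped with `^C`. As a rough check of that estimate, the remaining time can be worked out by hand. The figures come from the log; the helper name `remaining_time` is only illustrative and is not part of PaddleOCR.

```python
def remaining_time(total_iters, done_iters, sec_per_iter):
    """Return (hours, minutes, seconds) left, assuming a constant cost per iteration."""
    seconds = round((total_iters - done_iters) * sec_per_iter)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


# Values read from the tqdm line in the log above.
h, m, s = remaining_time(total_iters=9115, done_iters=2, sec_per_iter=3.53)
print(f"about {h}h {m:02d}m {s:02d}s left")
```

This comes to just under nine hours, in line with the `8:55:26` reported by the progress bar; the small difference is expected because the bar displays a rounded rate.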

3.6 Model Prediction

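Once training has finished, the saved model can also be used to run inference on a single image or on every image in a folder, so that you can inspect the prediction results. The command below loads the checkpoint given by Global.checkpoints and runs it on the image given by Global.infer_img.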

  1. ! python tools/infer_table.py -c configs/table/table_mv3.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy Global.infer_img=/home/aistudio/1.jpg Global.use_gpu=False
  1. /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  2. import imp
  3. [2021/12/26 20:00:22] root INFO: Architecture :
  4. [2021/12/26 20:00:22] root INFO: Backbone :
  5. [2021/12/26 20:00:22] root INFO: model_name : large
  6. [2021/12/26 20:00:22] root INFO: name : MobileNetV3
  7. [2021/12/26 20:00:22] root INFO: scale : 1.0
  8. [2021/12/26 20:00:22] root INFO: Head :
  9. [2021/12/26 20:00:22] root INFO: hidden_size : 256
  10. [2021/12/26 20:00:22] root INFO: l2_decay : 1e-05
  11. [2021/12/26 20:00:22] root INFO: loc_type : 2
  12. [2021/12/26 20:00:22] root INFO: max_cell_num : 500
  13. [2021/12/26 20:00:22] root INFO: max_elem_length : 800
  14. [2021/12/26 20:00:22] root INFO: max_text_length : 100
  15. [2021/12/26 20:00:22] root INFO: name : TableAttentionHead
  16. [2021/12/26 20:00:22] root INFO: algorithm : TableAttn
  17. [2021/12/26 20:00:22] root INFO: model_type : table
  18. [2021/12/26 20:00:22] root INFO: Eval :
  19. [2021/12/26 20:00:22] root INFO: dataset :
  20. [2021/12/26 20:00:22] root INFO: data_dir : train_data/table/pubtabnet/val/
  21. [2021/12/26 20:00:22] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_val.jsonl
  22. [2021/12/26 20:00:22] root INFO: name : PubTabDataSet
  23. [2021/12/26 20:00:22] root INFO: transforms :
  24. [2021/12/26 20:00:22] root INFO: DecodeImage :
  25. [2021/12/26 20:00:22] root INFO: channel_first : False
  26. [2021/12/26 20:00:22] root INFO: img_mode : BGR
  27. [2021/12/26 20:00:22] root INFO: ResizeTableImage :
  28. [2021/12/26 20:00:22] root INFO: max_len : 488
  29. [2021/12/26 20:00:22] root INFO: TableLabelEncode : None
  30. [2021/12/26 20:00:22] root INFO: NormalizeImage :
  31. [2021/12/26 20:00:22] root INFO: mean : [0.485, 0.456, 0.406]
  32. [2021/12/26 20:00:22] root INFO: order : hwc
  33. [2021/12/26 20:00:22] root INFO: scale : 1./255.
  34. [2021/12/26 20:00:22] root INFO: std : [0.229, 0.224, 0.225]
  35. [2021/12/26 20:00:22] root INFO: PaddingTableImage : None
  36. [2021/12/26 20:00:22] root INFO: ToCHWImage : None
  37. [2021/12/26 20:00:22] root INFO: KeepKeys :
  38. [2021/12/26 20:00:22] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
  39. [2021/12/26 20:00:22] root INFO: loader :
  40. [2021/12/26 20:00:22] root INFO: batch_size_per_card : 16
  41. [2021/12/26 20:00:22] root INFO: drop_last : False
  42. [2021/12/26 20:00:22] root INFO: num_workers : 1
  43. [2021/12/26 20:00:22] root INFO: shuffle : False
  44. [2021/12/26 20:00:22] root INFO: Global :
  45. [2021/12/26 20:00:22] root INFO: cal_metric_during_train : True
  46. [2021/12/26 20:00:22] root INFO: character_dict_path : ppocr/utils/dict/table_structure_dict.txt
  47. [2021/12/26 20:00:22] root INFO: character_type : en
  48. [2021/12/26 20:00:22] root INFO: checkpoints : /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy
  49. [2021/12/26 20:00:22] root INFO: debug : False
  50. [2021/12/26 20:00:22] root INFO: distributed : False
  51. [2021/12/26 20:00:22] root INFO: epoch_num : 400
  52. [2021/12/26 20:00:22] root INFO: eval_batch_step : [0, 400]
  53. [2021/12/26 20:00:22] root INFO: infer_img : /home/aistudio/1.jpg
  54. [2021/12/26 20:00:22] root INFO: infer_mode : False
  55. [2021/12/26 20:00:22] root INFO: log_smooth_window : 20
  56. [2021/12/26 20:00:22] root INFO: max_cell_num : 500
  57. [2021/12/26 20:00:22] root INFO: max_elem_length : 800
  58. [2021/12/26 20:00:22] root INFO: max_text_length : 100
  59. [2021/12/26 20:00:22] root INFO: pretrained_model : None
  60. [2021/12/26 20:00:22] root INFO: print_batch_step : 5
  61. [2021/12/26 20:00:22] root INFO: process_cut_num : 0
  62. [2021/12/26 20:00:22] root INFO: process_total_num : 0
  63. [2021/12/26 20:00:22] root INFO: save_epoch_step : 3
  64. [2021/12/26 20:00:22] root INFO: save_inference_dir : None
  65. [2021/12/26 20:00:22] root INFO: save_model_dir : ./output/table_mv3/
  66. [2021/12/26 20:00:22] root INFO: use_gpu : False
  67. [2021/12/26 20:00:22] root INFO: use_visualdl : False
  68. [2021/12/26 20:00:22] root INFO: Loss :
  69. [2021/12/26 20:00:22] root INFO: loc_weight : 10000.0
  70. [2021/12/26 20:00:22] root INFO: name : TableAttentionLoss
  71. [2021/12/26 20:00:22] root INFO: structure_weight : 100.0
  72. [2021/12/26 20:00:22] root INFO: Metric :
  73. [2021/12/26 20:00:22] root INFO: main_indicator : acc
  74. [2021/12/26 20:00:22] root INFO: name : TableMetric
  75. [2021/12/26 20:00:22] root INFO: Optimizer :
  76. [2021/12/26 20:00:22] root INFO: beta1 : 0.9
  77. [2021/12/26 20:00:22] root INFO: beta2 : 0.999
  78. [2021/12/26 20:00:22] root INFO: clip_norm : 5.0
  79. [2021/12/26 20:00:22] root INFO: lr :
  80. [2021/12/26 20:00:22] root INFO: learning_rate : 0.001
  81. [2021/12/26 20:00:22] root INFO: name : Adam
  82. [2021/12/26 20:00:22] root INFO: regularizer :
  83. [2021/12/26 20:00:22] root INFO: factor : 0.0
  84. [2021/12/26 20:00:22] root INFO: name : L2
  85. [2021/12/26 20:00:22] root INFO: PostProcess :
  86. [2021/12/26 20:00:22] root INFO: name : TableLabelDecode
  87. [2021/12/26 20:00:22] root INFO: Train :
  88. [2021/12/26 20:00:22] root INFO: dataset :
  89. [2021/12/26 20:00:22] root INFO: data_dir : train_data/table/pubtabnet/train/
  90. [2021/12/26 20:00:22] root INFO: label_file_path : train_data/table/pubtabnet/PubTabNet_2.0.0_train.jsonl
  91. [2021/12/26 20:00:22] root INFO: name : PubTabDataSet
  92. [2021/12/26 20:00:22] root INFO: transforms :
  93. [2021/12/26 20:00:22] root INFO: DecodeImage :
  94. [2021/12/26 20:00:22] root INFO: channel_first : False
  95. [2021/12/26 20:00:22] root INFO: img_mode : BGR
  96. [2021/12/26 20:00:22] root INFO: ResizeTableImage :
  97. [2021/12/26 20:00:22] root INFO: max_len : 488
  98. [2021/12/26 20:00:22] root INFO: TableLabelEncode : None
  99. [2021/12/26 20:00:22] root INFO: NormalizeImage :
  100. [2021/12/26 20:00:22] root INFO: mean : [0.485, 0.456, 0.406]
  101. [2021/12/26 20:00:22] root INFO: order : hwc
  102. [2021/12/26 20:00:22] root INFO: scale : 1./255.
  103. [2021/12/26 20:00:22] root INFO: std : [0.229, 0.224, 0.225]
  104. [2021/12/26 20:00:22] root INFO: PaddingTableImage : None
  105. [2021/12/26 20:00:22] root INFO: ToCHWImage : None
  106. [2021/12/26 20:00:22] root INFO: KeepKeys :
  107. [2021/12/26 20:00:22] root INFO: keep_keys : ['image', 'structure', 'bbox_list', 'sp_tokens', 'bbox_list_mask']
  108. [2021/12/26 20:00:22] root INFO: loader :
  109. [2021/12/26 20:00:22] root INFO: batch_size_per_card : 32
  110. [2021/12/26 20:00:22] root INFO: drop_last : True
  111. [2021/12/26 20:00:22] root INFO: num_workers : 1
  112. [2021/12/26 20:00:22] root INFO: shuffle : True
  113. [2021/12/26 20:00:22] root INFO: profiler_options : None
  114. [2021/12/26 20:00:22] root INFO: train with paddle 2.2.1 and device CPUPlace
  115. [2021/12/26 20:00:22] root INFO: resume from /home/aistudio/PaddleOCR/pre_train/en_ppocr_mobile_v2.0_table_structure_train/best_accuracy
  116. [2021/12/26 20:00:22] root INFO: infer_img: /home/aistudio/1.jpg
  117. [2021/12/26 20:00:26] root INFO: result: ['<thead><tr><td></td><td></td><td></td><td></td><td></td></tr></thead><tbody><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr></tbody>'], [[32, 9, 104, 40], [232, 8, 307, 41], [429, 7, 500, 44], [559, 8, 656, 44], [715, 7, 780, 44], [37, 45, 99, 73], [190, 44, 342, 74], [432, 45, 502, 74], [565, 44, 655, 73], [712, 46, 777, 74], [30, 81, 101, 109], [202, 80, 337, 110], [433, 81, 503, 111], [578, 83, 638, 110], [698, 82, 790, 110], [31, 119, 104, 148], [197, 116, 347, 147], [443, 117, 492, 148], [572, 118, 643, 147], [698, 118, 797, 147], [35, 154, 101, 183], [199, 152, 342, 184], [436, 154, 501, 184], [558, 155, 670, 184], [701, 153, 801, 183], [40, 188, 93, 217], [217, 187, 314, 219], [417, 187, 516, 218], [556, 187, 667, 217], [716, 188, 772, 216], [48, 227, 98, 255], [223, 224, 313, 256], [429, 226, 500, 256], [558, 226, 667, 256], [722, 225, 772, 254], [47, 262, 99, 291], [217, 260, 313, 293], [439, 261, 506, 293], [557, 260, 678, 292], [722, 261, 777, 290], [36, 295, 95, 324], [210, 296, 317, 326], [443, 296, 499, 326], [547, 296, 681, 326], [701, 300, 767, 328], [42, 332, 99, 361], [191, 330, 350, 360], [451, 331, 493, 361], [557, 331, 683, 361], [717, 335, 785, 362], [45, 369, 97, 396], [186, 367, 355, 400], [444, 369, 504, 398], [581, 369, 640, 397], [723, 368, 773, 396], [37, 404, 95, 431], [192, 404, 351, 433], [438, 404, 511, 432], [560, 405, 658, 432], [723, 404, 775, 431], [46, 444, 104, 469], [188, 441, 346, 470], [444, 444, 496, 471], [544, 445, 681, 471], [721, 444, 773, 470], [35, 478, 104, 503], [190, 475, 345, 505], [436, 477, 504, 504], [559, 476, 662, 505], [712, 477, 778, 504]]
  118. [2021/12/26 20:00:26] root INFO: success!
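
The `result` line in the log has two parts: a string of table-structure tokens (`<thead>`, `<tr>`, `<td>` and so on) and a list of cell boxes. The sketch below shows one way to regroup the boxes into rows using only the Python standard library. It assumes that the boxes are listed in the same order as the `<td>` tokens and that each box is `[x_min, y_min, x_max, y_max]`; both are assumptions read off the values in the log rather than documented guarantees. The names `CellCounter` and `pair_cells_with_boxes` are hypothetical and not part of PaddleOCR, and the input is a shortened stand-in for the logged result.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count the cells in each <tr> of a table-structure token string."""

    def __init__(self):
        super().__init__()
        self.rows = []  # rows[i] = number of cells found in the i-th <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append(0)
        elif tag in ("td", "th") and self.rows:
            self.rows[-1] += 1


def pair_cells_with_boxes(structure, boxes):
    """Split a flat box list into rows that follow the token string.

    Assumes one box per cell, in the same order as the cell tokens.
    """
    counter = CellCounter()
    counter.feed(structure)
    if sum(counter.rows) != len(boxes):
        raise ValueError("cell count and box count differ")
    grid, start = [], 0
    for cells_in_row in counter.rows:
        grid.append(boxes[start:start + cells_in_row])
        start += cells_in_row
    return grid


# Shortened stand-in for the logged result: one header row and one body row.
structure = ("<thead><tr><td></td><td></td></tr></thead>"
             "<tbody><tr><td></td><td></td></tr></tbody>")
boxes = [[32, 9, 104, 40], [232, 8, 307, 41],
         [37, 45, 99, 73], [190, 44, 342, 74]]

grid = pair_cells_with_boxes(structure, boxes)
for row_index, row in enumerate(grid):
    print(row_index, row)
```

Grouping the boxes by row makes it easier to check the prediction visually, for example by drawing each row's boxes on the input image in a different colour.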

4. Summary

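This section introduced the principles of the PP-Structure table recognition algorithm in PaddleOCR and walked through the workflow for the table structure model, from data processing to training.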

5. Homework

https://aistudio.baidu.com/aistudio/education/objective/28711