Background

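A new feature in our product needed OCR, and after some research we chose Tess4j. Because of the OS version on our servers, we could not find many deployment guides online, so I am writing the process down here in the hope that it brings you some clarity.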

Deploying on CentOS

CentOS 6.10

Steps (largely adapted from an existing installation guide):

    # Install development tools and the Tesseract prerequisites
    yum -y groupinstall "development tools"
    yum -y install libpng-devel libtiff-devel libjpeg-devel
    # Install the CentOS Software Collections yum repository and a newer GCC
    yum -y install centos-release-scl
    yum -y install devtoolset-7-gcc devtoolset-7-gcc-c++ devtoolset-7-binutils
    # scl enables the new GCC only temporarily (in a new shell); exiting the shell or rebooting restores the system version
    scl enable devtoolset-7 bash
    # Alternatively, enable it in the current shell with source (add this line to ~/.bashrc to make it persistent)
    source /opt/rh/devtoolset-7/enable
    # Install autoconf, as recent a version as possible; the build requires autoconf v2.64 or later
    cd /usr/src/
    wget ftp://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
    tar xvvfz autoconf-2.69.tar.gz
    cd autoconf-2.69/
    ./configure --prefix=/usr
    make
    make install
    # Install autoconf-archive
    cd /usr/src/
    wget http://ftpmirror.gnu.org/autoconf-archive/autoconf-archive-2019.01.06.tar.xz
    tar xvvfJ autoconf-archive-2019.01.06.tar.xz
    cd autoconf-archive-2019.01.06/
    ./configure --prefix=/usr
    make
    make install
    # Install Leptonica; tesseract v4.0.0 requires Leptonica v1.77 or later
    cd /usr/src/
    wget http://leptonica.org/source/leptonica-1.77.0.tar.gz
    tar xvvfz leptonica-1.77.0.tar.gz
    cd leptonica-1.77.0/
    ./configure --prefix=/usr/local/
    make
    make install
    # With the dependencies in place, install the latest tesseract release (4.1.1)
    cd /usr/src/
    wget https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz -O tesseract-4.1.1.tar.gz
    tar xvvfz tesseract-4.1.1.tar.gz
    cd tesseract-4.1.1
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
    ./autogen.sh
    ./configure --prefix=/usr/local/ --with-extra-libraries=/usr/local/lib/ --disable-openmp
    make install
    # Verify the installation; the expected output is:
    tesseract --version
    # tesseract 4.1.1
    # leptonica-1.77.0
    # libjpeg 6b (libjpeg-turbo 1.2.1) : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
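
The version check above can also be automated, for example in a deployment smoke test. Below is a minimal sketch. It assumes the `tesseract --version` text has already been captured as a string, and the class and method names are hypothetical, not part of Tess4j. It extracts both versions and compares Leptonica against the 1.77 minimum stated in the install comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses `tesseract --version` output and checks the Leptonica version. */
public class TesseractVersionCheck {

    // Matches e.g. "tesseract 4.1.1" at the start of a line.
    private static final Pattern TESSERACT = Pattern.compile("(?m)^tesseract\\s+(\\d+(?:\\.\\d+)*)");
    // Matches e.g. "leptonica-1.77.0" anywhere in the output.
    private static final Pattern LEPTONICA = Pattern.compile("leptonica-(\\d+(?:\\.\\d+)*)");

    /** Returns the first captured version for the pattern, or null if it is absent. */
    private static String find(Pattern p, String output) {
        Matcher m = p.matcher(output);
        return m.find() ? m.group(1) : null;
    }

    public static String tesseractVersion(String output) { return find(TESSERACT, output); }

    public static String leptonicaVersion(String output) { return find(LEPTONICA, output); }

    /** Compares dotted numeric versions component by component; missing parts count as 0. */
    public static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Sample text copied from the expected output above.
        String sample = "tesseract 4.1.1\n leptonica-1.77.0\n"
                + "  libjpeg 6b (libjpeg-turbo 1.2.1) : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3\n";
        String lept = leptonicaVersion(sample);
        System.out.println("tesseract " + tesseractVersion(sample) + ", leptonica " + lept);
        System.out.println("Leptonica >= 1.77: " + (lept != null && compareVersions(lept, "1.77") >= 0));
    }
}
```

Comparing numerically rather than as strings matters here: a plain string comparison would rank "1.8" above "1.77".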

Common problems:

  • Command not found

    tesseract: command not found

Fix: add /usr/local/bin to $PATH. Edit ~/.bash_profile and add export PATH="$PATH:/usr/local/bin", then run source ~/.bash_profile or log in again.
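
The shell reports "command not found" when none of the directories listed in $PATH contains an executable with that name. A process started from a service manager or cron may not read ~/.bash_profile, so it can see a different PATH from your login shell. Below is a minimal sketch of the lookup, useful for checking which PATH the Java process actually sees. `PathLookup` and `findOnPath` are hypothetical names, and the sketch skips empty PATH entries for simplicity.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical `which`-style helper: finds an executable in a PATH-style string. */
public class PathLookup {

    /** Returns the first entry of pathValue that contains an executable regular file called name. */
    public static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            // POSIX treats an empty entry as the current directory; this sketch ignores it.
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        System.out.println("PATH=" + path);
        System.out.println("tesseract: "
                + findOnPath("tesseract", path).map(Path::toString).orElse("not found on PATH"));
    }
}
```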

  • Unable to load the native library

    Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract)

Fix: copy the tesseract and leptonica shared libraries (.so files) from /usr/local/lib into /usr/lib.
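
Tess4j loads libtesseract through JNA, and JNA also searches the directories listed in the jna.library.path system property. This offers an alternative to copying the .so files, either by starting the JVM with -Djna.library.path=/usr/local/lib or by setting the property in code before the first Tess4j call. Below is a minimal sketch of the in-code variant, assuming the libraries stay in /usr/local/lib as installed above; `JnaLibraryPath` and `withLibraryDir` are hypothetical names. One caveat: libraries that libtesseract itself links against (such as liblept) are resolved by the system dynamic linker, not by JNA. They may therefore still need /usr/local/lib in LD_LIBRARY_PATH or in the ld.so configuration.

```java
import java.io.File;

/** Hypothetical helper: appends a directory to jna.library.path instead of copying .so files. */
public class JnaLibraryPath {

    /** Returns existing with dir appended (no duplicates); existing may be null or empty. */
    public static String withLibraryDir(String existing, String dir) {
        if (existing == null || existing.isEmpty()) {
            return dir;
        }
        for (String entry : existing.split(File.pathSeparator)) {
            if (entry.equals(dir)) {
                return existing; // already present, leave unchanged
            }
        }
        return existing + File.pathSeparator + dir;
    }

    public static void main(String[] args) {
        // Must run before the first Tess4j/JNA call that loads libtesseract.
        String updated = withLibraryDir(System.getProperty("jna.library.path"), "/usr/local/lib");
        System.setProperty("jna.library.path", updated);
        System.out.println("jna.library.path=" + updated);
    }
}
```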

  • Missing locale environment variable

    !strcmp(locale, "C"):Error:Assert failed:in file ../../../src/api/baseapi.cpp

Fix: export LC_ALL=C in the environment that starts the Java application.
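
Java code cannot change its own process environment, so the variable has to be exported before the JVM starts. What the application can do is check at startup and warn early, rather than failing on the assertion. Below is a minimal sketch that resolves each locale category with the POSIX precedence rules: LC_ALL first, then the category variable, then LANG, defaulting to "C". `LocaleCheck` is a hypothetical name. The sketch conservatively requires every category to resolve to "C", matching the LC_ALL=C fix above.

```java
import java.util.Map;

/** Hypothetical startup guard: checks that the environment selects the "C" locale. */
public class LocaleCheck {

    // POSIX locale categories; all are required to resolve to "C" in this sketch.
    static final String[] CATEGORIES = {
        "LC_COLLATE", "LC_CTYPE", "LC_MESSAGES", "LC_MONETARY", "LC_NUMERIC", "LC_TIME"
    };

    /** POSIX precedence: LC_ALL, then the category variable, then LANG; otherwise "C". */
    public static String effectiveLocale(Map<String, String> env, String category) {
        for (String key : new String[] {"LC_ALL", category, "LANG"}) {
            String value = env.get(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    /** True if every category resolves to "C" or its synonym "POSIX". */
    public static boolean isCLocale(Map<String, String> env) {
        for (String category : CATEGORIES) {
            String locale = effectiveLocale(env, category);
            if (!locale.equals("C") && !locale.equals("POSIX")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        if (!isCLocale(System.getenv())) {
            System.err.println("Warning: locale is not C; export LC_ALL=C before starting the JVM");
        } else {
            System.out.println("Locale OK");
        }
    }
}
```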

  • Writing JPG files fails (OpenJDK does not have a native JPEG encoder)

    javax.imageio.IIOException: Invalid argument to native writeImage

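Fix: before writing the JPEG, draw the image into a BufferedImage without an alpha channel, created with new BufferedImage(width, height, BufferedImage.TYPE_3BYTE_BGR);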
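
Below is a sketch of that fix. It assumes the source image may carry an alpha channel, for example TYPE_INT_ARGB. `JpegWriteFix`, `toBgr` and `writeJpeg` are hypothetical names. The sketch copies the pixels into a TYPE_3BYTE_BGR image and then writes it with ImageIO. It also checks the boolean returned by ImageIO.write, because some JDK versions report an unsupported image by returning false instead of throwing.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: writes JPEGs from a 3-byte BGR copy (no alpha channel). */
public class JpegWriteFix {

    /** Draws src onto a new TYPE_3BYTE_BGR image of the same size. */
    public static BufferedImage toBgr(BufferedImage src) {
        BufferedImage bgr = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D g = bgr.createGraphics();
        try {
            // Transparent pixels are painted in the background color (white here).
            g.drawImage(src, 0, 0, Color.WHITE, null);
        } finally {
            g.dispose();
        }
        return bgr;
    }

    /** Encodes img as JPEG bytes; throws if no JPEG writer accepted the image. */
    public static byte[] writeJpeg(BufferedImage img) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(toBgr(img), "jpg", out)) {
            throw new IOException("no JPEG writer accepted the image");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage argb = new BufferedImage(64, 32, BufferedImage.TYPE_INT_ARGB);
        byte[] jpeg = writeJpeg(argb);
        System.out.println("JPEG bytes: " + jpeg.length);
    }
}
```

White as the fill color is just a choice made for this sketch. JPEG has no alpha channel, so transparent areas must be flattened onto some background, and for OCR input a light background is usually the safer choice.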

CentOS 7

Steps: