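Background
A new product feature needs OCR, and after some evaluation we chose Tess4j. Because our servers run an old OS release, I could not find many deployment guides online, so I am recording the steps here in the hope that they save you some trouble.
Deploying on CentOS
CentOS 6.10
Steps (mainly following an existing installation guide):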
# Install the development tools and the Tesseract prerequisites
yum -y groupinstall "development tools"
yum -y install libpng-devel libtiff-devel libjpeg-devel
# Install the CentOS Software Collections yum repository and a newer GCC
yum -y install centos-release-scl
yum -y install devtoolset-7-gcc devtoolset-7-gcc-c++ devtoolset-7-binutils
# scl enables the new gcc only temporarily; exiting the shell or rebooting restores the system version
scl enable devtoolset-7 bash
# Alternatively, source the enable script in the current shell (add this line to ~/.bash_profile to make it persistent)
source /opt/rh/devtoolset-7/enable
# Install autoconf; use as recent a version as possible (the build asks for autoconf v2.64 or later)
cd /usr/src/
wget ftp://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
tar xvvfz autoconf-2.69.tar.gz
cd autoconf-2.69/
./configure --prefix=/usr
make
make install
# Install autoconf-archive
cd /usr/src/
wget http://ftpmirror.gnu.org/autoconf-archive/autoconf-archive-2019.01.06.tar.xz
tar xvvfJ autoconf-archive-2019.01.06.tar.xz
cd autoconf-archive-2019.01.06/
./configure --prefix=/usr
make
make install
# Install Leptonica; tesseract v4.0.0 requires Leptonica v1.77 or later
cd /usr/src/
wget http://leptonica.org/source/leptonica-1.77.0.tar.gz
tar xvvfz leptonica-1.77.0.tar.gz
cd leptonica-1.77.0/
./configure --prefix=/usr/local/
make
make install
# The dependencies are in place; now install the latest tesseract
cd /usr/src/
wget https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz -O tesseract-4.1.1.tar.gz
tar xvvfz tesseract-4.1.1.tar.gz
cd tesseract-4.1.1
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
./autogen.sh
./configure --prefix=/usr/local/ --with-extra-libraries=/usr/local/lib/ --disable-openmp
make install
# Check that the installation succeeded
tesseract --version
# tesseract 4.1.1
# leptonica-1.77.0
# libjpeg 6b (libjpeg-turbo 1.2.1) : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
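Since the end goal is to call Tesseract from a Java application, it can be handy to repeat the final check from Java at startup. The following is a minimal sketch, not part of the original steps: the class name TesseractVersionCheck and its methods are hypothetical, and it assumes only that the tesseract binary is on the PATH of the user running the JVM and that the first line of its output looks like "tesseract 4.1.1", as in the sample output above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks which tesseract version the JVM's user can run.
public class TesseractVersionCheck {

    // Matches a line such as "tesseract 4.1.1" and captures the version token.
    private static final Pattern VERSION_LINE =
            Pattern.compile("^tesseract\\s+(\\S+)", Pattern.MULTILINE);

    /** Returns the version found in the command output, or null if none is found. */
    public static String parseVersion(String output) {
        Matcher m = VERSION_LINE.matcher(output);
        return m.find() ? m.group(1) : null;
    }

    /** Runs `tesseract --version` and returns everything it prints. */
    public static String runVersionCommand() throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("tesseract", "--version");
        pb.redirectErrorStream(true); // merge stderr into stdout so nothing is missed
        Process p = pb.start();       // throws IOException if the binary is not on PATH
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        p.waitFor();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String version = parseVersion(runVersionCommand());
        System.out.println(version == null
                ? "tesseract version not recognized"
                : "Found tesseract " + version);
    }
}
```

If the binary cannot be found, ProcessBuilder.start() throws an IOException, which usually means the PATH seen by the Java process differs from the one in your login shell.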
Common problems:
- Command not found
tesseract: command not found
Fix: add /usr/local/bin to $PATH by adding export PATH="$PATH:/usr/local/bin" to ~/.bash_profile
- The native library cannot be loaded
Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract)
Fix: copy the tesseract and leptonica shared library (.so) files from /usr/local/lib to /usr/lib
- Missing environment variable
!strcmp(locale, "C"):Error:Assert failed:in file ../../../src/api/baseapi.cpp
Fix: export LC_ALL=C
- Writing JPEG fails: OpenJDK does not have a native JPEG encoder
javax.imageio.IIOException: Invalid argument to native writeImage
Fix: create the image with a type that has no alpha channel: new BufferedImage(width, height, BufferedImage.TYPE_3BYTE_BGR);
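The last fix in the list above can be sketched as follows. This is an illustration under assumptions rather than the code from the original project: the class JpegSafeWriter and its method names are hypothetical, and the white background fill is my own choice for handling transparent pixels. It uses only the standard java.awt and javax.imageio APIs to copy an arbitrary image into a TYPE_3BYTE_BGR buffer before handing it to the JPEG writer.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: re-draws an image without an alpha channel before JPEG encoding.
public class JpegSafeWriter {

    /** Copies any image into a TYPE_3BYTE_BGR buffer, which has no alpha channel. */
    public static BufferedImage toBgr(BufferedImage src) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D g = dst.createGraphics();
        try {
            // Assumption: transparent areas should become white rather than black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, dst.getWidth(), dst.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /** Encodes the image as JPEG in memory after converting it to BGR. */
    public static byte[] encodeJpeg(BufferedImage src) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is registered.
        if (!ImageIO.write(toBgr(src), "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // An ARGB image is the kind of input that can trigger the error above.
        BufferedImage argb = new BufferedImage(64, 32, BufferedImage.TYPE_INT_ARGB);
        byte[] jpeg = encodeJpeg(argb);
        System.out.println("Encoded " + jpeg.length + " bytes");
    }
}
```

Converting at the point of writing keeps the rest of the application free to work with whatever image type it already uses.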
CentOS 7
Steps:
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
yum update
yum install tesseract
yum install tesseract-langpack-deu
References:
- Installing Tesseract OCR 4.0 on CentOS 6
- Installing tesseract on Linux and common problems when deploying a tess4j project
- !strcmp(locale, "C"):Error:Assert failed:in file ../../../src/api/baseapi.cpp, line 191 #105
