What is this document for?
"gosseract" is a Tesseract-OCR wrapper for Go. This document reproduces and resolves an issue reported to "gosseract".
Reproduce the issue
Set up an environment that simulates CentOS
# because I'm using MacOS
% docker-machine create -d virtualbox goss-test
% eval $(docker-machine env goss-test)
% docker run --rm -t -i library/centos:centos7
##
# Dive into CentOS
##
try to install Golang itself
[root@2bc64375ee2c /]# yum search go | grep golang
[root@2bc64375ee2c /]# yum install -y golang
[root@2bc64375ee2c /]# go version
go version go1.6.3 linux/amd64
# successfully
try to get gosseract
[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
package github.com/otiai10/gosseract: cannot download, $GOPATH not set. For more details see: go help gopath
OK, set $GOPATH
[root@2bc64375ee2c /]# export GOPATH=/
[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
go: missing Git command. See https://golang.org/s/gogetcmd
package github.com/otiai10/gosseract: exec: "git": executable file not found in $PATH
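Setting GOPATH=/ is fine for a throwaway container, but it makes go get create src, pkg, and bin directly under the filesystem root. Outside a disposable environment, a dedicated workspace directory is more usual. A minimal sketch, assuming $HOME/go as the workspace (that path is an assumption, not what this session used):

```shell
# Sketch: use a dedicated workspace directory instead of "/".
# "$HOME/go" is an assumed location, not something the original session used.
export GOPATH="${GOPATH:-$HOME/go}"
mkdir -p "$GOPATH/src" "$GOPATH/bin"

# Binaries installed by `go get` land in $GOPATH/bin.
export PATH="$PATH:$GOPATH/bin"

echo "GOPATH=$GOPATH"
```

The rest of this walkthrough keeps GOPATH=/, so the paths in its output start with /src/.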
OK, install git
[root@2bc64375ee2c /]# yum install -y git
[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
go build github.com/otiai10/gosseract/tesseract: g++: exec: "g++": executable file not found in $PATH
OK, install g++
[root@2bc64375ee2c /]# yum install -y gcc-c++
[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
# github.com/otiai10/gosseract/tesseract
src/github.com/otiai10/gosseract/tesseract/tess.cpp:1:31: fatal error: tesseract/baseapi.h: No such file or directory
 #include <tesseract/baseapi.h>
                               ^
compilation terminated.
[root@2bc64375ee2c /]#
OK, the problem is reproduced now
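The compiler error above means that tesseract/baseapi.h is not in any directory g++ searches. A quick way to confirm this without running the whole build is to look for the header directly. The following is a rough sketch: find_header is a hypothetical helper, and /usr/include and /usr/local/include are assumed to be the relevant search directories.

```shell
# find_header HEADER DIR...
# Prints the full path of the first DIR that contains HEADER and returns 0;
# returns 1 if none of the directories contain it.
# (find_header is a hypothetical helper written for this illustration.)
find_header() {
  header="$1"
  shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      echo "$dir/$header"
      return 0
    fi
  done
  return 1
}

# Assumed search directories; adjust them if your compiler uses others.
find_header tesseract/baseapi.h /usr/include /usr/local/include \
  || echo "tesseract/baseapi.h not found; the Tesseract headers are not installed"
```

If the helper prints nothing but the "not found" message, the Tesseract development files still need to be installed, which is what the next section does.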
Solution
Install tesseract-ocr.
for preparation
[root@2bc64375ee2c /]# yum install -y autoconf automake libtool
[root@2bc64375ee2c /]# yum install -y libjpeg-devel libpng-devel libtiff-devel zlib-devel
[root@2bc64375ee2c /]# wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
bash: wget: command not found
[root@2bc64375ee2c /]# yum install -y wget
install leptonica
[root@2bc64375ee2c /]# wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
[root@2bc64375ee2c /]# tar -xzvf leptonica-1.72.tar.gz
[root@2bc64375ee2c /]# cd leptonica-1.72
[root@2bc64375ee2c leptonica-1.72]# ./configure
[root@2bc64375ee2c leptonica-1.72]# make
[root@2bc64375ee2c leptonica-1.72]# make install
# ...
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH` environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH` environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR` linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf`

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
# ...
[root@2bc64375ee2c leptonica-1.72]#
[root@2bc64375ee2c leptonica-1.72]# cd ..
install tesseract
[root@2bc64375ee2c /]# wget https://github.com/tesseract-ocr/tesseract/archive/3.02.02.tar.gz
[root@2bc64375ee2c /]# tar -xzvf 3.02.02.tar.gz
[root@2bc64375ee2c /]# cd tesseract-3.02.02/
[root@2bc64375ee2c tesseract-3.02.02]# ./autogen.sh
[root@2bc64375ee2c tesseract-3.02.02]# ./configure
[root@2bc64375ee2c tesseract-3.02.02]# make
[root@2bc64375ee2c tesseract-3.02.02]# make install
# ...
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH` environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH` environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR` linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf`

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
# ...
[root@2bc64375ee2c tesseract-3.02.02]# tesseract --version
tesseract 3.02.02
 leptonica-1.72
  libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7
[root@2bc64375ee2c tesseract-3.02.02]#
tesseract-ocr is now installed successfully. It would be better to install tessdata for at least one language at this point, but skip that for now and go back to gosseract.
[root@2bc64375ee2c /]# cd $GOPATH/src/github.com/otiai10/gosseract/
[root@2bc64375ee2c gosseract]# go test
# github.com/otiai10/gosseract
all_test.go:12:2: cannot find package "github.com/otiai10/mint" in any of:
    /usr/lib/golang/src/github.com/otiai10/mint (from $GOROOT)
    /src/github.com/otiai10/mint (from $GOPATH)
FAIL    github.com/otiai10/gosseract [setup failed]
[root@2bc64375ee2c gosseract]# go get -u github.com/otiai10/mint
[root@2bc64375ee2c gosseract]# go test
/tmp/go-build910821104/github.com/otiai10/gosseract/_test/gosseract.test: error while loading shared libraries: liblept.so.4: cannot open shared object file: No such file or directory
exit status 127
FAIL    github.com/otiai10/gosseract    0.002s
[root@2bc64375ee2c gosseract]# # OK, now facing new problem
check files in /usr/local/lib
[root@2bc64375ee2c gosseract]# ls /usr/local/lib/liblept.so.4
/usr/local/lib/liblept.so.4
# Yes, there is
need to add /usr/local/lib to the shared library search path
[root@2bc64375ee2c gosseract]# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
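One subtlety with the export above: when LD_LIBRARY_PATH starts out empty, the result is ":/usr/local/lib", and the dynamic loader treats the empty entry before the colon as the current directory. A slightly more careful variant is sketched below. add_lib_dir is a hypothetical helper name, and the function is only an illustration of the idea.

```shell
# add_lib_dir DIR
# Appends DIR to LD_LIBRARY_PATH unless it is already listed, and avoids
# creating an empty entry when the variable was previously unset or empty.
# (add_lib_dir is a hypothetical helper written for this illustration.)
add_lib_dir() {
  dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*)
      ;;  # already present, nothing to do
    *)
      LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir"
      ;;
  esac
  export LD_LIBRARY_PATH
}

add_lib_dir /usr/local/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Either form only lasts for the current shell session. The libtool message shown earlier also lists the persistent option of adding the directory to /etc/ld.so.conf, which on CentOS is normally followed by running ldconfig as root.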
[root@2bc64375ee2c gosseract]# go test
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract could not load any languages!
Could not initialize tesseract.
exit status 1
FAIL    github.com/otiai10/gosseract    0.006s
OK, it's almost there. Now we need traineddata for eng.
[root@2bc64375ee2c gosseract]# wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
[root@2bc64375ee2c gosseract]# mv eng.traineddata /usr/local/share/tessdata/
[root@2bc64375ee2c gosseract]# go test
PASS
ok      github.com/otiai10/gosseract    0.278s
[root@2bc64375ee2c gosseract]#
Yay!
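To catch the missing language data before running go test, the lookup described in the Tesseract error message can be approximated in the shell. This is a sketch only: has_traineddata is a hypothetical helper, and the /usr/local/share fallback is taken from the path printed in the error above, so it assumes the same source build as this session.

```shell
# has_traineddata LANG
# Succeeds if LANG.traineddata exists under the "tessdata" directory.
# Following the error message in this session, TESSDATA_PREFIX is read as
# the parent directory of "tessdata"; /usr/local/share is the assumed
# default for this particular build.
# (has_traineddata is a hypothetical helper written for this illustration.)
has_traineddata() {
  lang="$1"
  prefix="${TESSDATA_PREFIX:-/usr/local/share}"
  [ -f "${prefix%/}/tessdata/${lang}.traineddata" ]
}

if has_traineddata eng; then
  echo "eng.traineddata found"
else
  echo "eng.traineddata missing"
fi
```

The same check works for any other language code whose traineddata file you place in that directory.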
refs
- Docker run reference - Docker-docs-ja 17.06.Beta documentation
- How to add g++ to CentOS - Reinvention of the Wheel
- git - make: install: Command not found - Stack Overflow
- CentOS Install Tesseract-OCR | alantamproject
- Release Tarball Version 3.02.02 · tesseract-ocr/tesseract · GitHub
- java - Unable to load library 'tesseract': libtesseract.so: cannot open shared object file: No such file or directory - Stack Overflow
- How to make the system recognize a shared library
- Adding shared libraries - tetsuyai’s blog
- Adding a shared library to the path | hajichan.net technical version
- tessdata/eng.traineddata at master · tesseract-ocr/tessdata · GitHub