DRY Notes (DRYな備忘録)

Don't Repeat Yourself.

How to install gosseract on CentOS 7

What is this document for?

"gosseract" is a Tesseract-OCR wrapper for Go (Golang). This document reproduces an issue reported to "gosseract", where installing it on CentOS 7 fails, and shows how to fix it.

Reproduce the issue

Set up a CentOS 7 environment (a Docker container)

# because I'm using macOS
% docker-machine create -d virtualbox goss-test
% eval $(docker-machine env goss-test)
% docker run --rm -t -i library/centos:centos7

##
# Dive into CentOS
##

try to install Golang itself

[root@2bc64375ee2c /]# yum search go | grep golang
[root@2bc64375ee2c /]# yum install -y golang
[root@2bc64375ee2c /]# go version
go version go1.6.3 linux/amd64
# installed successfully

try to get gosseract

[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
package github.com/otiai10/gosseract: cannot download, $GOPATH not set. For more details see: go help gopath

OK, set $GOPATH

[root@2bc64375ee2c /]# export GOPATH=/
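With GOPATH set, go get (Go 1.6 only has GOPATH mode) downloads each package into $GOPATH/src/<import path>. With GOPATH=/ that is /src/github.com/otiai10/gosseract, which is why a /src/... path shows up later. "/" is fine for a throwaway container, but a dedicated workspace keeps src, pkg and bin out of the filesystem root. A minimal sketch, assuming $HOME/go as the workspace (the default GOPATH from Go 1.8 on) and a single-entry GOPATH; gopath_src_dir is a hypothetical helper, not part of Go:

```shell
# Dedicated Go workspace instead of GOPATH=/ (assumed location: $HOME/go).
export GOPATH="$HOME/go"
# go get / go install put built commands in $GOPATH/bin.
export PATH="$PATH:$GOPATH/bin"

# Hypothetical helper: the directory where GOPATH-mode go get stores
# the source for an import path (single-entry GOPATH assumed).
gopath_src_dir() {
  printf '%s/src/%s\n' "$GOPATH" "$1"
}

# Usage: cd "$(gopath_src_dir github.com/otiai10/gosseract)"
```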

[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
go: missing Git command. See https://golang.org/s/gogetcmd
package github.com/otiai10/gosseract: exec: "git": executable file not found in $PATH

OK, install git

[root@2bc64375ee2c /]# yum install -y git

[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
go build github.com/otiai10/gosseract/tesseract: g++: exec: "g++": executable file not found in $PATH
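Each missing tool only shows up on the next go get, one at a time. A minimal sketch that checks them all up front; missing_packages is a hypothetical helper, and the command-to-package mapping (go to golang, g++ to gcc-c++, git to git) is just what this walkthrough needed on CentOS 7:

```shell
# Hypothetical helper: print the yum package for every given command
# that is not on PATH yet.
missing_packages() {
  for tool in "$@"; do
    # Skip tools that are already installed.
    if command -v "$tool" >/dev/null 2>&1; then
      continue
    fi
    case "$tool" in
      go)  echo golang ;;    # installed above with: yum install -y golang
      g++) echo gcc-c++ ;;   # CentOS package that provides g++
      *)   echo "$tool" ;;   # e.g. git is packaged as git
    esac
  done
}

# Usage (as root; if nothing is missing, yum refuses to run without
# package names, so check the output first):
#   yum install -y $(missing_packages go git g++)
```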

OK, install g++

[root@2bc64375ee2c /]# yum install -y gcc-c++
[root@2bc64375ee2c /]# go get github.com/otiai10/gosseract
# github.com/otiai10/gosseract/tesseract
src/github.com/otiai10/gosseract/tesseract/tess.cpp:1:31: fatal error: tesseract/baseapi.h: No such file or directory
 #include <tesseract/baseapi.h>
                               ^
compilation terminated.
[root@2bc64375ee2c /]#

OK, the problem is reproduced now
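tess.cpp is compiled by cgo with g++, so the build needs Tesseract's development headers on the include path, and none are installed yet. A minimal sketch for checking that before (and after) building; find_header is a hypothetical helper, and the directory list in the usage line assumes the usual system path plus /usr/local/include, where a source build with the default ./configure prefix should put them:

```shell
# Hypothetical helper: print the first candidate include directory that
# contains the given header; return non-zero if none does.
find_header() {
  header="$1"; shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      printf '%s\n' "$dir/$header"
      return 0
    fi
  done
  return 1
}

# Usage:
#   find_header tesseract/baseapi.h /usr/include /usr/local/include \
#     || echo "tesseract headers are not installed"
```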

Solution

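The missing header belongs to Tesseract, so install tesseract-ocr from source, together with leptonica, the image processing library it depends on.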

prepare build tools and image libraries

[root@2bc64375ee2c /]# yum install -y autoconf automake libtool
[root@2bc64375ee2c /]# yum install -y libjpeg-devel libpng-devel libtiff-devel zlib-devel
[root@2bc64375ee2c /]# wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
bash: wget: command not found
[root@2bc64375ee2c /]# yum install -y wget

install leptonica

[root@2bc64375ee2c /]# wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
[root@2bc64375ee2c /]# tar -xzvf leptonica-1.72.tar.gz
[root@2bc64375ee2c /]# cd leptonica-1.72
[root@2bc64375ee2c leptonica-1.72]# ./configure
[root@2bc64375ee2c leptonica-1.72]# make
[root@2bc64375ee2c leptonica-1.72]# make install

# ...
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH` environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH` environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR` linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf`

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
# ...

[root@2bc64375ee2c leptonica-1.72]#
[root@2bc64375ee2c leptonica-1.72]# cd ..

install tesseract

[root@2bc64375ee2c /]# wget https://github.com/tesseract-ocr/tesseract/archive/3.02.02.tar.gz
[root@2bc64375ee2c /]# tar -xzvf 3.02.02.tar.gz
[root@2bc64375ee2c /]# cd tesseract-3.02.02/
[root@2bc64375ee2c tesseract-3.02.02]# ./autogen.sh
[root@2bc64375ee2c tesseract-3.02.02]# ./configure
[root@2bc64375ee2c tesseract-3.02.02]# make
[root@2bc64375ee2c tesseract-3.02.02]# make install

# ... (the same libtool notice as for leptonica: libraries installed in /usr/local/lib)
[root@2bc64375ee2c tesseract-3.02.02]# tesseract --version
tesseract 3.02.02
 leptonica-1.72
  libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7

[root@2bc64375ee2c tesseract-3.02.02]#

tesseract-ocr is now installed. Ideally you would also install tessdata (trained data) for the languages you need at this point, but let's skip that for now.
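The leptonica and tesseract builds above repeat the same extract, configure, make, make install steps. A minimal sketch that folds them into one function; build_autotools is a hypothetical name, it runs ./autogen.sh only when the tree has no configure script yet (the case for the GitHub archive of tesseract 3.02.02, while the leptonica release tarball ships one), and the MAKE override is my own addition:

```shell
# Hypothetical helper: extract a source tarball and run the autotools
# build used for both leptonica and tesseract above.
#   $1 = tarball, $2 = directory it extracts to
build_autotools() {
  tarball="$1"; srcdir="$2"
  tar -xzf "$tarball" || return 1
  (
    cd "$srcdir" || exit 1
    # No configure script (e.g. a GitHub archive): generate it first.
    if [ ! -x ./configure ]; then
      ./autogen.sh || exit 1
    fi
    # MAKE can be overridden, e.g. MAKE="make -j4".
    ./configure && ${MAKE:-make} && ${MAKE:-make} install
  )
}

# Usage (as root, after downloading the tarballs with wget):
#   build_autotools leptonica-1.72.tar.gz leptonica-1.72
#   build_autotools 3.02.02.tar.gz tesseract-3.02.02
```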

[root@2bc64375ee2c /]# cd $GOPATH/src/github.com/otiai10/gosseract/
[root@2bc64375ee2c gosseract]# go test
# github.com/otiai10/gosseract
all_test.go:12:2: cannot find package "github.com/otiai10/mint" in any of:
    /usr/lib/golang/src/github.com/otiai10/mint (from $GOROOT)
    /src/github.com/otiai10/mint (from $GOPATH)
FAIL    github.com/otiai10/gosseract [setup failed]
[root@2bc64375ee2c gosseract]# go get -u github.com/otiai10/mint
[root@2bc64375ee2c gosseract]# go test
/tmp/go-build910821104/github.com/otiai10/gosseract/_test/gosseract.test: error while loading shared libraries: liblept.so.4: cannot open shared object file: No such file or directory
exit status 127
FAIL    github.com/otiai10/gosseract    0.002s
[root@2bc64375ee2c gosseract]#

# OK, now we're facing a new problem

check whether the library is in /usr/local/lib

[root@2bc64375ee2c gosseract]# ls /usr/local/lib/liblept.so.4
/usr/local/lib/liblept.so.4

# Yes, it's there

need to add /usr/local/lib to the shared library search path
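liblept.so.4 exists, but /usr/local/lib is not on the dynamic loader's search path, which is what the libtool notice after make install was warning about. One detail with $LD_LIBRARY_PATH:/usr/local/lib: if the variable starts out empty, the result begins with ":", and glibc's loader treats an empty entry as the current directory. A minimal sketch of an append that avoids this, plus the persistent fix from the libtool notice as comments; path_append and the file name local.conf are my own choices:

```shell
# Hypothetical helper: append a directory to a colon-separated list
# without leaving an empty entry when the list starts out empty.
path_append() {
  if [ -z "$1" ]; then
    printf '%s\n' "$2"
  else
    printf '%s:%s\n' "$1" "$2"
  fi
}

# Per-session fix:
#   export LD_LIBRARY_PATH="$(path_append "$LD_LIBRARY_PATH" /usr/local/lib)"
#
# Persistent, system-wide alternative (as root). CentOS 7's /etc/ld.so.conf
# includes /etc/ld.so.conf.d/*.conf, and ldconfig rebuilds the loader cache:
#   echo /usr/local/lib > /etc/ld.so.conf.d/local.conf
#   ldconfig
```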

[root@2bc64375ee2c gosseract]# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
[root@2bc64375ee2c gosseract]# go test
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract could not load any languages!
Could not initialize tesseract.
exit status 1
FAIL    github.com/otiai10/gosseract    0.006s

OK, almost there. Now we need the traineddata for English (eng).
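The error message says where Tesseract 3.02 looks for language data: the tessdata directory under TESSDATA_PREFIX, which for this install defaults to /usr/local/share/. A minimal sketch that prints the expected path for a language, assuming that default (taken from the error message) when TESSDATA_PREFIX is unset; traineddata_path is a hypothetical helper:

```shell
# Hypothetical helper: where tesseract 3.x expects <lang>.traineddata.
# TESSDATA_PREFIX is the *parent* of the tessdata directory; the fallback
# /usr/local/share/ is the default seen in the error message.
traineddata_path() {
  prefix="${TESSDATA_PREFIX:-/usr/local/share/}"
  # Strip one trailing slash so both ".../share" and ".../share/" work.
  printf '%s/tessdata/%s.traineddata\n' "${prefix%/}" "$1"
}

# Usage:
#   [ -f "$(traineddata_path eng)" ] || echo "eng.traineddata is missing"
```

The tessdata repository's master branch has since moved on to data for newer Tesseract releases, so a file downloaded from it today may not load in 3.02; data matching the installed version is the safer choice.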

[root@2bc64375ee2c gosseract]# wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
[root@2bc64375ee2c gosseract]# mv eng.traineddata /usr/local/share/tessdata/
[root@2bc64375ee2c gosseract]# go test
PASS
ok      github.com/otiai10/gosseract    0.278s
[root@2bc64375ee2c gosseract]#

Yay!
