コンパイルして、共有ライブラリとして読み込まれる.so
ファイルをつくれることを確認したい。APIファイル(.hとか)はReleases · tesseract-ocr/tesseract · GitHubを解凍すれば同梱されてる。ついでに同環境下でそのTesseract-OCRがちゃんと動くことも確認したい。
事前準備: Dockerで雑に使い捨て開発環境つくる個人的なメモ - DRYな備忘録。
参考
ログ
root@f456604ccbed:/# cd
root@f456604ccbed:~# mkdir workspace && cd workspace
root@f456604ccbed:~/workspace#
root@f456604ccbed:~/workspace# wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
root@f456604ccbed:~/workspace# tar -zxvf 3.04.01.tar.gz
root@f456604ccbed:~/workspace# cd tesseract-3.04.01/
root@f456604ccbed:~/workspace/tesseract-3.04.01#
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./autogen.sh
Running aclocal
./autogen.sh: 60: ./autogen.sh: aclocal: not found
Something went wrong, bailing out!
root@f456604ccbed:~/workspace/tesseract-3.04.01# apt-get install -y autotools-dev
root@f456604ccbed:~/workspace/tesseract-3.04.01# apt-get install -y automake
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./autogen.sh
Running aclocal
Running libtoolize
./autogen.sh: 65: ./autogen.sh: libtoolize: not found
./autogen.sh: 65: ./autogen.sh: glibtoolize: not found
Something went wrong, bailing out!
root@f456604ccbed:~/workspace/tesseract-3.04.01# apt-get install -y build-essential libtool
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./autogen.sh
Running aclocal
Running libtoolize
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config`.
libtoolize: copying file `config/ltmain.sh`
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4`.
libtoolize: copying file `m4/libtool.m4`
libtoolize: copying file `m4/ltoptions.m4`
libtoolize: copying file `m4/ltsugar.m4`
libtoolize: copying file `m4/ltversion.m4`
libtoolize: copying file `m4/lt~obsolete.m4`
Running autoheader
Running automake --add-missing --copy
configure.ac:321: installing 'config/compile'
Running autoconf
All done.
To build the software now, do something like:
$ ./configure [--enable-debug] [...other options]
root@f456604ccbed:~/workspace/tesseract-3.04.01#
autogen.shの成功
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./configure
checking for leptonica... configure: error: leptonica not found
root@f456604ccbed:~/workspace/tesseract-3.04.01# cd ..
root@f456604ccbed:~/workspace# wget https://github.com/DanBloomberg/leptonica/archive/v1.73.tar.gz
root@f456604ccbed:~/workspace# tar -zxvf v1.73.tar.gz
root@f456604ccbed:~/workspace# cd leptonica-1.73/
root@f456604ccbed:~/workspace/leptonica-1.73# ./configure
bash: ./configure: Permission denied
root@f456604ccbed:~/workspace/leptonica-1.73# chmod 755 ./configure
root@f456604ccbed:~/workspace/leptonica-1.73#
root@f456604ccbed:~/workspace/leptonica-1.73# ./configure
root@f456604ccbed:~/workspace/leptonica-1.73# make
root@f456604ccbed:~/workspace/leptonica-1.73# make install
Making install in src
make[1]: Entering directory '/root/workspace/leptonica-1.73/src'
make[2]: Entering directory '/root/workspace/leptonica-1.73/src'
test -z "/usr/local/lib" || /bin/mkdir -p "/usr/local/lib"
----------------------------------------------------------------------
Libraries have been installed in:
/usr/local/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH` environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH` environment variable
during linking
- use the `-Wl,-rpath -Wl,LIBDIR` linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf`
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
root@f456604ccbed:~/workspace/leptonica-1.73# ls -l /usr/local/lib/
total 22156
-rw-r--r-- 1 root staff 14116202 Nov 6 20:35 liblept.a
-rwxr-xr-x 1 root staff 943 Nov 6 20:35 liblept.la
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so -> liblept.so.5.0.0
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so.5 -> liblept.so.5.0.0
-rwxr-xr-x 1 root staff 8559120 Nov 6 20:35 liblept.so.5.0.0
drwxr-sr-x 2 root staff 4096 Nov 6 20:35 pkgconfig
root@f456604ccbed:~/workspace/leptonica-1.73#
leptonicaのコンパイルは完了
root@f456604ccbed:~/workspace/leptonica-1.73# cd ../tesseract-3.04.01/
root@f456604ccbed:~/workspace/tesseract-3.04.01# export LIBLEPT_HEADERSDIR=/root/workspace/leptonica-1.73/src
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./configure
Configuration is done.
You can now build and install tesseract by running:
$ make
$ sudo make install
You can not build training tools because of missing dependency.
Check configure output for details.
training toolsがうんちゃらと言っているものの、tesseractのconfigureは完了
root@f456604ccbed:~/workspace/tesseract-3.04.01# make
root@f456604ccbed:~/workspace/tesseract-3.04.01# make install
----------------------------------------------------------------------
Libraries have been installed in:
/usr/local/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH` environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH` environment variable
during linking
- use the `-Wl,-rpath -Wl,LIBDIR` linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf`
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
tesseractのmake, make installも完了。確認する
root@f456604ccbed:~/workspace/tesseract-3.04.01# cd
root@f456604ccbed:~# ls -l /usr/local/lib/
total 137364
-rw-r--r-- 1 root staff 14116202 Nov 6 20:35 liblept.a
-rwxr-xr-x 1 root staff 943 Nov 6 20:35 liblept.la
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so -> liblept.so.5.0.0
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so.5 -> liblept.so.5.0.0
-rwxr-xr-x 1 root staff 8559120 Nov 6 20:35 liblept.so.5.0.0
-rw-r--r-- 1 root staff 87030250 Nov 6 20:45 libtesseract.a
-rwxr-xr-x 1 root staff 987 Nov 6 20:45 libtesseract.la
lrwxrwxrwx 1 root staff 21 Nov 6 20:45 libtesseract.so -> libtesseract.so.3.0.4
lrwxrwxrwx 1 root staff 21 Nov 6 20:45 libtesseract.so.3 -> libtesseract.so.3.0.4
-rwxr-xr-x 1 root staff 30937064 Nov 6 20:45 libtesseract.so.3.0.4
drwxr-sr-x 2 root staff 4096 Nov 6 20:45 pkgconfig
root@f456604ccbed:~# which tesseract
/usr/local/bin/tesseract
root@f456604ccbed:~#
まあたぶんtraineddataが無いのでtesseractコマンド自体は失敗すると予想される。今回の目的は「OSのパッケージマネージャを使わず、tesseract/leptonicaのヘッダファイルとコンパイル済み.soファイルの入手」だったので、とりあえず目的達成できたと思う。
番外: tesseractコマンドの挙動確認
root@f456604ccbed:~# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
root@f456604ccbed:~# tesseract --list-langs
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract could not load any languages!
Could not initialize tesseract.
root@f456604ccbed:~#
予想通り、eng.traineddataが無いと言われる。
root@f456604ccbed:~# mkdir -p data/tessdata
root@f456604ccbed:~# wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata?raw=true
root@f456604ccbed:~# pwd
/root
root@f456604ccbed:~# mv eng.traineddata\?raw\=true /root/data/tessdata/eng.traineddata
root@f456604ccbed:~# export TESSDATA_PREFIX=/root/data
root@f456604ccbed:~# tesseract --list-langs
List of available languages (1):
eng
root@f456604ccbed:~#
traineddataの配置と認識確認できた。
root@f456604ccbed:~# cd
root@f456604ccbed:~# wget https://cloud.githubusercontent.com/assets/931554/20041852/bda107d4-a46f-11e6-8c49-6d022007e445.jpg -O sample.jpg
root@f456604ccbed:~# tesseract sample.jpg stdout
Error in pixReadMemJpeg: function not present
Error in pixReadMem: jpeg: no pix returned
Error during processing.
root@f456604ccbed:~#
むむ。
stackoverflow.com
Leptonicaを入れる前にlibjpegを入れる必要があったっぽい。このへんでもう別コンテナで仕切り直したいな、という気持ちがある。
root@f456604ccbed:~# cd /root/workspace/
root@f456604ccbed:~/workspace# wget https://github.com/LuaDist/libjpeg/archive/8.4.0.tar.gz
root@f456604ccbed:~/workspace# tar -zxvf 8.4.0.tar.gz
root@f456604ccbed:~/workspace# cd libjpeg-8.4.0
root@f456604ccbed:~/workspace/libjpeg-8.4.0# configure
root@f456604ccbed:~/workspace/libjpeg-8.4.0# make
root@f456604ccbed:~/workspace/libjpeg-8.4.0# make install
----------------------------------------------------------------------
Libraries have been installed in:
/usr/local/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH` environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH` environment variable
during linking
- use the `-Wl,-rpath -Wl,LIBDIR` linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf`
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
で、もっかいleptonicaのmakeをする
root@f456604ccbed:~# cd /root/workspace/leptonica-1.7
root@f456604ccbed:~/workspace/leptonica-1.73# ./configure
root@f456604ccbed:~/workspace/leptonica-1.73# make
root@f456604ccbed:~/workspace/leptonica-1.73# make install
これでどうや
root@f456604ccbed:~/workspace/leptonica-1.73# cd
root@f456604ccbed:~# tesseract sample.jpg stdout
Error in pixGenHalftoneMask: pix too small: w = 173, h = 64
otiai’lO / gosseract
root@f456604ccbed:~#
これが
よっしゃ!
これで、OSのパッケージマネージャを使わず、make/make installでTesseract-OCRが動く環境を確認できた。
DRYな備忘録として