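I want to confirm that I can compile Tesseract and produce a .so file that can be loaded as a shared library. The API files (.h and so on) are bundled in the archives on "Releases · tesseract-ocr/tesseract · GitHub" once you extract them. While I'm at it, I also want to confirm that the Tesseract-OCR built this way actually works in the same environment.
Preparation: see "Dockerで雑に使い捨て開発環境つくる個人的なメモ - DRYな備忘録" (my personal notes on roughly throwing together a disposable dev environment with Docker).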
References
- GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)
- Compiling · tesseract-ocr/tesseract Wiki · GitHub
- GitHub - DanBloomberg/leptonica: Official github repository for the Leptonica image processing library. See Leptonica.org for more documentation and recent releases. Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications
- Releases · DanBloomberg/leptonica · GitHub
- What is a .so file | .so format | so file: meaning / definition / explanation : IT用語辞典 (IT terminology dictionary)
- package management - How do I install aclocal? - Ask Ubuntu
- debian - autoreconf fails with 'Can't exec "libtoolize"' - Unix & Linux Stack Exchange
- install from source - tesseract-ocr `./configure` triggering Error "leptonica not found" - Ask Ubuntu
- How to create configure (how to use autotools) - のぴぴのメモ
- The difference between ./configure, make, and make install - by shigemk2
- What do ./configure; make; make install mean? - ITmedia Enterprise
- compilation - configure: error: leptonica library missing (when building tesseract-ocr-3.01 on MinGW) - Stack Overflow
- FAQ · tesseract-ocr/tesseract Wiki · GitHub
- unclear "leptonica not found" message · Issue #215 · tesseract-ocr/tesseract · GitHub
- GitHub - tesseract-ocr/tessdata
- Tesseract Image Issue - Stack Overflow
- GitHub - LuaDist/libjpeg: Independent JPEG Group's JPEG software
Log
root@f456604ccbed:/# cd
root@f456604ccbed:~# mkdir workspace && cd workspace
root@f456604ccbed:~/workspace#
root@f456604ccbed:~/workspace# wget https://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
root@f456604ccbed:~/workspace# tar -zxvf 3.04.01.tar.gz
root@f456604ccbed:~/workspace# cd tesseract-3.04.01/
root@f456604ccbed:~/workspace/tesseract-3.04.01#
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./autogen.sh
Running aclocal
./autogen.sh: 60: ./autogen.sh: aclocal: not found
Something went wrong, bailing out!
root@f456604ccbed:~/workspace/tesseract-3.04.01# apt-get install -y autotools-dev
root@f456604ccbed:~/workspace/tesseract-3.04.01# apt-get install -y automake
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./autogen.sh
Running aclocal
Running libtoolize
./autogen.sh: 65: ./autogen.sh: libtoolize: not found
./autogen.sh: 65: ./autogen.sh: glibtoolize: not found
Something went wrong, bailing out!
root@f456604ccbed:~/workspace/tesseract-3.04.01# apt-get install -y build-essential libtool
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./autogen.sh
Running aclocal
Running libtoolize
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config`.
libtoolize: copying file `config/ltmain.sh`
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4`.
libtoolize: copying file `m4/libtool.m4`
libtoolize: copying file `m4/ltoptions.m4`
libtoolize: copying file `m4/ltsugar.m4`
libtoolize: copying file `m4/ltversion.m4`
libtoolize: copying file `m4/lt~obsolete.m4`
Running autoheader
Running automake --add-missing --copy
configure.ac:321: installing 'config/compile'
Running autoconf

All done.
To build the software now, do something like:

$ ./configure [--enable-debug] [...other options]
root@f456604ccbed:~/workspace/tesseract-3.04.01#
autogen.sh succeeded.
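To avoid discovering the missing tools one failed run at a time, a small preflight check can list them up front. This is only a rough sketch: `missing_tools` is a hypothetical helper name of mine, and the tool list in the comment is simply what autogen.sh printed in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print each command from the argument list
# that cannot be found on PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call, using the tools autogen.sh ran in the log above:
#   missing_tools aclocal libtoolize autoheader automake autoconf
```

In the log, installing automake and libtool was what made aclocal and libtoolize available, so anything this prints is a hint about which package to install before running ./autogen.sh.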
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./configure
# (snip)
checking for leptonica... configure: error: leptonica not found
root@f456604ccbed:~/workspace/tesseract-3.04.01# cd ..
root@f456604ccbed:~/workspace# wget https://github.com/DanBloomberg/leptonica/archive/v1.73.tar.gz
root@f456604ccbed:~/workspace# tar -zxvf v1.73.tar.gz
root@f456604ccbed:~/workspace# cd leptonica-1.73/
root@f456604ccbed:~/workspace/leptonica-1.73# ./configure
bash: ./configure: Permission denied
root@f456604ccbed:~/workspace/leptonica-1.73# chmod 755 ./configure
root@f456604ccbed:~/workspace/leptonica-1.73#
root@f456604ccbed:~/workspace/leptonica-1.73# ./configure
root@f456604ccbed:~/workspace/leptonica-1.73# make
root@f456604ccbed:~/workspace/leptonica-1.73# make install
Making install in src
make[1]: Entering directory '/root/workspace/leptonica-1.73/src'
make[2]: Entering directory '/root/workspace/leptonica-1.73/src'
test -z "/usr/local/lib" || /bin/mkdir -p "/usr/local/lib"
# (snip)
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH` environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH` environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR` linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf`

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
# (rest omitted)
root@f456604ccbed:~/workspace/leptonica-1.73# ls -l /usr/local/lib/
total 22156
-rw-r--r-- 1 root staff 14116202 Nov 6 20:35 liblept.a
-rwxr-xr-x 1 root staff 943 Nov 6 20:35 liblept.la
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so -> liblept.so.5.0.0
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so.5 -> liblept.so.5.0.0
-rwxr-xr-x 1 root staff 8559120 Nov 6 20:35 liblept.so.5.0.0
drwxr-sr-x 2 root staff 4096 Nov 6 20:35 pkgconfig
root@f456604ccbed:~/workspace/leptonica-1.73#
Leptonica is compiled and installed.
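The libtool notice above lists `LD_LIBRARY_PATH` as one way to make /usr/local/lib visible at run time. Appending with a bare `$LD_LIBRARY_PATH:/usr/local/lib` leaves an empty leading entry when the variable was unset, and as far as I know ld.so treats an empty entry as the current directory. Here is a minimal sketch that avoids both that and duplicate entries; `append_path_entry` is a hypothetical helper name, not something libtool provides.

```shell
#!/bin/sh
# Hypothetical helper: print CURRENT with NEW appended as a colon-separated
# entry, skipping the append if NEW is already present and never emitting
# an empty entry.
append_path_entry() {
    current=$1
    new=$2
    case ":$current:" in
        *":$new:"*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "${current:+$current:}$new" ;;
    esac
}

# Example use for the directory from the libtool notice:
#   LD_LIBRARY_PATH=$(append_path_entry "$LD_LIBRARY_PATH" /usr/local/lib)
#   export LD_LIBRARY_PATH
```

The notice's other option, adding the directory to /etc/ld.so.conf, would make the setting system-wide instead of per-shell. For a throwaway container the environment variable seemed simpler to me.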
root@f456604ccbed:~/workspace/leptonica-1.73# cd ../tesseract-3.04.01/
root@f456604ccbed:~/workspace/tesseract-3.04.01# export LIBLEPT_HEADERSDIR=/root/workspace/leptonica-1.73/src
root@f456604ccbed:~/workspace/tesseract-3.04.01# ./configure
# (snip)
Configuration is done.
You can now build and install tesseract by running:

$ make
$ sudo make install

You can not build training tools because of missing dependency.
Check configure output for details.
It grumbles something about the training tools, but tesseract's configure finished.
root@f456604ccbed:~/workspace/tesseract-3.04.01# make
root@f456604ccbed:~/workspace/tesseract-3.04.01# make install
# (snip)
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH` environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH` environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR` linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf`

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
tesseract's make and make install finished too. Let's check.
root@f456604ccbed:~/workspace/tesseract-3.04.01# cd
root@f456604ccbed:~# ls -l /usr/local/lib/
total 137364
-rw-r--r-- 1 root staff 14116202 Nov 6 20:35 liblept.a
-rwxr-xr-x 1 root staff 943 Nov 6 20:35 liblept.la
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so -> liblept.so.5.0.0
lrwxrwxrwx 1 root staff 16 Nov 6 20:35 liblept.so.5 -> liblept.so.5.0.0
-rwxr-xr-x 1 root staff 8559120 Nov 6 20:35 liblept.so.5.0.0
-rw-r--r-- 1 root staff 87030250 Nov 6 20:45 libtesseract.a
-rwxr-xr-x 1 root staff 987 Nov 6 20:45 libtesseract.la
lrwxrwxrwx 1 root staff 21 Nov 6 20:45 libtesseract.so -> libtesseract.so.3.0.4
lrwxrwxrwx 1 root staff 21 Nov 6 20:45 libtesseract.so.3 -> libtesseract.so.3.0.4
-rwxr-xr-x 1 root staff 30937064 Nov 6 20:45 libtesseract.so.3.0.4
drwxr-sr-x 2 root staff 4096 Nov 6 20:45 pkgconfig
root@f456604ccbed:~# which tesseract
/usr/local/bin/tesseract
root@f456604ccbed:~#
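There is probably no traineddata yet, so I expect the tesseract command itself to fail. The goal this time was "obtain the tesseract/leptonica header files and compiled .so files without using the OS package manager", so I think the goal is achieved for now.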
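Instead of eyeballing the `ls -l` output, the same check can be scripted. This is just a sketch with hypothetical helper names (`has_shared_lib`, `report_libs`); it only tests for the unversioned `libNAME.so` name, which in the listing above is a symlink to the versioned file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR contains libNAME.so (file or symlink).
has_shared_lib() {
    [ -e "$1/lib$2.so" ]
}

# Hypothetical helper: print "NAME: found" or "NAME: missing" for each
# library name, looking in the directory given as the first argument.
report_libs() {
    dir=$1
    shift
    for name in "$@"; do
        if has_shared_lib "$dir" "$name"; then
            printf '%s: found\n' "$name"
        else
            printf '%s: missing\n' "$name"
        fi
    done
}

# Example call for the two libraries built in this post:
#   report_libs /usr/local/lib lept tesseract
```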
Extra: checking how the tesseract command behaves
root@f456604ccbed:~# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
root@f456604ccbed:~# tesseract --list-langs
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract could not load any languages!
Could not initialize tesseract.
root@f456604ccbed:~#
As expected, it complains that eng.traineddata is missing.
root@f456604ccbed:~# mkdir -p data/tessdata
root@f456604ccbed:~# wget https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata?raw=true
root@f456604ccbed:~# pwd
/root
root@f456604ccbed:~# mv eng.traineddata\?raw\=true /root/data/tessdata/eng.traineddata
root@f456604ccbed:~# export TESSDATA_PREFIX=/root/data
root@f456604ccbed:~# tesseract --list-langs
List of available languages (1):
eng
root@f456604ccbed:~#
The traineddata is in place and tesseract recognizes it.
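The part that is easy to get wrong is that `TESSDATA_PREFIX` points at the parent of the `tessdata` directory, not at `tessdata` itself, as the error message above says. Here is a small sketch that spells out the resulting path; the helper names `traineddata_path` and `has_traineddata` are mine, and the layout is assumed from that error message rather than from the Tesseract sources.

```shell
#!/bin/sh
# Hypothetical helper: print where LANG's data file is expected for a given
# TESSDATA_PREFIX value (PREFIX/tessdata/LANG.traineddata).
# A single trailing slash on PREFIX is tolerated.
traineddata_path() {
    printf '%s/tessdata/%s.traineddata\n' "${1%/}" "$2"
}

# Hypothetical helper: succeed if that file exists.
has_traineddata() {
    [ -f "$(traineddata_path "$1" "$2")" ]
}

# Example, matching the layout used in the log above:
#   traineddata_path /root/data eng
#   has_traineddata "$TESSDATA_PREFIX" eng || echo "eng data not found"
```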
root@f456604ccbed:~# cd
root@f456604ccbed:~# wget https://cloud.githubusercontent.com/assets/931554/20041852/bda107d4-a46f-11e6-8c49-6d022007e445.jpg -O sample.jpg
root@f456604ccbed:~# tesseract sample.jpg stdout
Error in pixReadMemJpeg: function not present
Error in pixReadMem: jpeg: no pix returned
Error during processing.
root@f456604ccbed:~#
Hmm.
It looks like libjpeg needed to be installed before Leptonica. At this point I kind of feel like starting over in a fresh container.
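One way to check this without running OCR would be to look at what the installed liblept.so is linked against. The sketch below reads `ldd` output on stdin and looks for a given library name. `mentions_lib` is a hypothetical helper, and the idea that a JPEG-enabled Leptonica build shows libjpeg in `ldd` is my assumption, not something the log confirms.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and succeed if a line
# starts (after leading whitespace) with NAME.so, e.g. "libjpeg.so.8 => ...".
mentions_lib() {
    grep -q "^[[:space:]]*$1\.so"
}

# Example use against the library installed earlier:
#   if ldd /usr/local/lib/liblept.so | mentions_lib libjpeg; then
#       echo "liblept is linked against libjpeg"
#   else
#       echo "no libjpeg dependency found"
#   fi
```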
root@f456604ccbed:~# cd /root/workspace/
root@f456604ccbed:~/workspace# wget https://github.com/LuaDist/libjpeg/archive/8.4.0.tar.gz
root@f456604ccbed:~/workspace# tar -zxvf 8.4.0.tar.gz
root@f456604ccbed:~/workspace# cd libjpeg-8.4.0
root@f456604ccbed:~/workspace/libjpeg-8.4.0# ./configure
root@f456604ccbed:~/workspace/libjpeg-8.4.0# make
root@f456604ccbed:~/workspace/libjpeg-8.4.0# make install
# (snip)
----------------------------------------------------------------------
Libraries have been installed in:
   /usr/local/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR`
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH` environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH` environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR` linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf`

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
Then build leptonica once more.
root@f456604ccbed:~# cd /root/workspace/leptonica-1.73
root@f456604ccbed:~/workspace/leptonica-1.73# ./configure
root@f456604ccbed:~/workspace/leptonica-1.73# make
root@f456604ccbed:~/workspace/leptonica-1.73# make install
How about now?
root@f456604ccbed:~/workspace/leptonica-1.73# cd
root@f456604ccbed:~# tesseract sample.jpg stdout
Error in pixGenHalftoneMask: pix too small: w = 173, h = 64
otiai’lO / gosseract
root@f456604ccbed:~#
That is the text recognized from sample.jpg.
Nice!
With this, I have confirmed an environment where Tesseract-OCR works, built with make/make install and without using the OS package manager.
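If I redo this in a fresh container, the lesson from the log is the order: libjpeg first, then leptonica, then tesseract. Below is a rough sketch of a helper that walks source directories in a given order. `build_in_order` and the `DRY_RUN` switch are hypothetical names of mine; with `DRY_RUN=1` it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical helper: for each source directory, in the order given, run
# ./configure, make and make install. With DRY_RUN=1, only print the commands.
# Stops at the first directory that fails to build.
build_in_order() {
    for src in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'cd %s && ./configure && make && make install\n' "$src"
        else
            (cd "$src" && ./configure && make && make install) || return 1
        fi
    done
}

# Example (directory names taken from the log above):
#   DRY_RUN=1 build_in_order libjpeg-8.4.0 leptonica-1.73 tesseract-3.04.01
```

This leaves out the extra steps the log needed: `chmod 755 ./configure` for leptonica, and `./autogen.sh` plus `LIBLEPT_HEADERSDIR` for tesseract. Those would still have to happen before the matching directory is built.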
Noted here as a DRY memo (DRYな備忘録).