SliTaz Packages

Community Doc Forum Pro Shop Bugs Hg
.
Name tesseract-ocr
Version 5.2.0
Category office
Description The most accurate open source OCR engine available.
Maintainer pascal.bellard​@​slitaz.org
License Apache
Website https://github.com/tesseract-ocr/tesseract
Sizes
Depends ongcc83-lib-base icongcc83-lib-base  giflib icongiflib  jpeg iconjpeg  leptonica iconleptonica  libpng iconlibpng  tiff icontiff 
tesseract-ocr screenshot
Download package tesseract-ocr screenshot
Show receipt
Show files list
Show cooking log

Description

This package contains an OCR engine - libtesseract and a command line program - tesseract.

Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.

The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and GitHub's log of contributors.

Tesseract has unicode (UTF-8) support, and can recognize more than 100 languages "out of the box".

Tesseract supports various output formats: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The main branch also has experimental support for ALTO (XML) output.

You should note that in many cases, in order to get better OCR results, you'll need to improve the quality of the image you are giving Tesseract.

6025 packages and 203154 files in current database (Thu Apr 25 06:17:47 2024)