The Tesseract OCR application performs OCR on an image file using Tesseract, an open-source OCR engine.

Features


Some features of the Tesseract OCR application include:

  • Multiple language support.
  • Several page segmentation modes.
  • Multiple color scales and output formats for image creation.
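
The Aspire configuration for these features is not shown on this page. As a rough illustration only, the sketch below shows how language and page segmentation choices map onto the underlying Tesseract command line. The helper name build_tesseract_command and the file names are hypothetical; the -l and --psm options belong to the Tesseract command-line tool itself (--psm is the spelling used by Tesseract 4 and later).

```python
from typing import List, Optional, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    languages: Sequence[str] = ("eng",),
    psm: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the Tesseract command-line tool.

    Hypothetical helper for illustration; it is not part of Aspire.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Language codes are joined with '+' in the order given.
    command = ["tesseract", image_path, output_base, "-l", "+".join(languages)]
    if psm is not None:
        # --psm selects the page segmentation mode (Tesseract 4 and later).
        command += ["--psm", str(psm)]
    return command


if __name__ == "__main__":
    # Example file names are placeholders.
    print(build_tesseract_command("page.png", "page", ["eng", "fra"], psm=6))
```

The resulting list can be passed to subprocess.run on a machine where Tesseract is installed.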

Limitations 


Because Tesseract is a third-party tool that must be set up separately from Aspire, the application has the following limitations, as per the API:

  • Tesseract must be installed separately.
  • Any Tesseract feature must be properly installed before it is used.
    • For example, OCR for other languages such as French or Spanish.
  • When performing OCR on a file, the order in which the languages are provided affects the output.
    • See the documentation here.
    • For a list of all available languages, see here.
  • Multipage TIFF files are not currently supported.
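
Because Tesseract and its language data are installed separately, it can help to verify the installation before running OCR. The following is a minimal sketch, assuming the tesseract executable is on the PATH. All function names are hypothetical; --list-langs is an option of the Tesseract command-line tool.

```python
import shutil
import subprocess
from typing import List, Sequence


def tesseract_available() -> bool:
    """Return True if a 'tesseract' executable is found on the PATH."""
    return shutil.which("tesseract") is not None


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from the text printed by 'tesseract --list-langs'.

    Assumption: each language code sits on its own line and contains no
    spaces, while the header line and any warnings do contain spaces.
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines if line and " " not in line]


def installed_languages() -> List[str]:
    """Ask the local Tesseract installation which languages it has."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract versions print the list to stderr, so read both streams.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


def missing_languages(requested: Sequence[str], installed: Sequence[str]) -> List[str]:
    """Return the requested codes that are not installed, keeping request order."""
    return [code for code in requested if code not in installed]


if __name__ == "__main__":
    if not tesseract_available():
        print("Tesseract is not installed or not on the PATH.")
    else:
        # 'eng', 'fra' and 'spa' are example codes only.
        print(missing_languages(["eng", "fra", "spa"], installed_languages()))
```

An empty result from missing_languages suggests that the requested languages are available to the local installation.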

Future Development Plan  


  • Add multipage TIFF support.


Is there anything we should add? Please let us know.
