The Tesseract OCR application performs an OCR operation on an image file using the open-source tool Tesseract.

Features


Some features of the Tesseract OCR application include:

  • Multiple language support.
  • Several page segmentation modes.
  • Multiple image creation color scales and formats.
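
To illustrate the language and page segmentation options, here is a minimal sketch of how the standalone `tesseract` command line could be assembled for them. It assumes a Tesseract 4 or later installation, where the `-l` and `--psm` options accept a `+`-joined language list and a mode number from 0 to 13. The helper name `build_tesseract_command` and the file names are hypothetical. How Aspire itself invokes Tesseract is not described on this page, so this only shows the underlying tool's options.

```python
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    languages: Sequence[str] = ("eng",),
    psm: int = 3,
) -> List[str]:
    """Build an argument list for the tesseract CLI (hypothetical helper).

    languages: Tesseract language codes, kept in the order given.
    psm: page segmentation mode; Tesseract 4+ accepts values 0 to 13.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    if not 0 <= psm <= 13:
        raise ValueError("page segmentation mode must be between 0 and 13")
    return [
        "tesseract",
        image_path,
        output_base,               # tesseract appends the output extension itself
        "-l", "+".join(languages),  # e.g. "fra+eng"
        "--psm", str(psm),
    ]
```

For example, `build_tesseract_command("scan.png", "scan_text", ["fra", "eng"], psm=6)` produces an argument list that could be passed to `subprocess.run` on a machine where Tesseract and both language packs are installed.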

Limitations 


Since the Tesseract OCR application is a third-party tool that needs to be set up separately from Aspire, it has the following limitations as per the API:

  • It must be installed separately.
  • Each Tesseract feature must be properly installed before it is used.
    • For example: OCR for other languages such as French or Spanish, among others.
  • When performing OCR on a file, the order in which the languages are provided affects the output.
    • See the documentation here.
    • For a list of all available languages, see here.
  • Multipage TIFF files are not currently supported.
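
Because each language must be installed before it can be used, it can help to check a requested language list against what the local Tesseract installation reports. The sketch below is one way to do that, assuming the `tesseract --list-langs` command prints a header line starting with "List of available languages" followed by one language code per line. The function names are hypothetical and this is not part of Aspire.

```python
import subprocess
from typing import List, Sequence


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line (assumed wording).
        if not line or line.startswith("List of available languages"):
            continue
        codes.append(line)
    return codes


def missing_languages(requested: Sequence[str], available: Sequence[str]) -> List[str]:
    """Return the requested codes that are not installed, in the requested order."""
    installed = set(available)
    return [code for code in requested if code not in installed]


def installed_languages() -> List[str]:
    """Query the local Tesseract installation; requires tesseract on the PATH."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some older versions print the list to stderr, so both streams are read.
    return parse_list_langs(result.stdout + result.stderr)
```

A caller could run `missing_languages(["fra", "spa", "eng"], installed_languages())` before starting OCR and stop early if the result is not empty. The requested order is preserved because the order of languages affects the OCR output.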

Future Development Plan  


  • Add multipage TIFF support.


Is there anything we should add? Please let us know.