The Elastic Cache Lookup joins the information extracted from an Elasticsearch index and generates it as a subjob of an Aspire document.

The Tesseract OCR application performs an OCR operation on an image file using the open-source tool Tesseract.

Features

Some features of the Elastic Cache Lookup component include:

Connector-independence.
Runs from any machine with access to Elasticsearch.

Content Retrieved

Some features of the Tesseract OCR application include:

Multiple language support.
Several page segmentation modes.
Multiple image creation color scales and formats.

Limitations

Since the Tesseract OCR application is a third-party tool that needs to be set up separately from Aspire, it has the following limitations as per the API:

It must be installed separately.
Before using a Tesseract feature, that feature must be properly installed.
- For example, OCR for other languages such as French or Spanish.
When performing OCR on a file, the order in which the languages are provided affects the output.
- See the documentation here.
- For a list of all available languages, see here.
Multipage TIFF files are not currently supported.
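
To illustrate the point about language order, here is a minimal sketch of how a Tesseract command line could be assembled from an ordered list of languages. It assumes the standard `tesseract` command-line tool (version 4 or later, where the page segmentation flag is `--psm`) and is not the Aspire component's own implementation. The helper name `build_tesseract_command` and the file names are hypothetical.

```python
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    output_base: str,
    languages: Sequence[str],
    psm: int = 3,
) -> List[str]:
    """Build an argument list for the tesseract CLI (hypothetical helper).

    The languages are joined with '+' in the order given, so the caller's
    ordering is passed to Tesseract unchanged. psm=3 is the default page
    segmentation mode listed in Tesseract's help output.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    lang_arg = "+".join(languages)
    return [
        "tesseract", image_path, output_base,
        "-l", lang_arg,
        "--psm", str(psm),
    ]


# Same image, two different language orders: the commands differ,
# and per the limitation above the OCR output may differ as well.
french_first = build_tesseract_command("scan.png", "scan_out", ["fra", "spa"])
spanish_first = build_tesseract_command("scan.png", "scan_out", ["spa", "fra"])
```

The resulting list can be passed to `subprocess.run(french_first, check=True)` on a machine where Tesseract and the French and Spanish language data are installed. Running `tesseract --list-langs` shows which languages are available locally.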

Future Development Plan

Add multipage TIFF support.

Is there anything we should add? Please let us know.

The Elastic Cache Lookup extracts everything that is inside the Elasticsearch index "_source" field.
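
As a rough illustration of what "everything inside the `_source` field" means, the sketch below pulls the `_source` object out of each hit in an Elasticsearch search response. It works on a plain dictionary shaped like the standard search response body, so it runs without a cluster. The function name `extract_sources`, the index name, and the document fields are hypothetical, and this is not the component's actual code.

```python
from typing import Any, Dict, List


def extract_sources(search_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the '_source' object of every hit in a search response.

    Hits without a '_source' key (for example when source retrieval was
    disabled in the query) are skipped.
    """
    hits = search_response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits if "_source" in hit]


# Hypothetical response body; index name and field names are made up.
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "cache", "_id": "1",
             "_source": {"title": "First", "owner": "alice"}},
            {"_index": "cache", "_id": "2",
             "_source": {"title": "Second", "owner": "bob"}},
        ],
    }
}

sources = extract_sources(sample_response)
```

Each returned dictionary holds the complete stored document, which is the information the lookup would then join onto the Aspire document.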
