FAQs

Is any other component needed before the Tesseract OCR component?

Yes. To configure the component properly, we recommend using a normalize mime type component before it.

Is any preprocessing needed before the Tesseract OCR component?

Multipage TIFF files are not currently supported, so you need to split any multipage TIFF into single-page files before the OCR process.
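
One possible way to do the split is sketched below in Python. It assumes the third-party Pillow library is installed; the function name and the output naming scheme are hypothetical, chosen only for illustration, and are not part of the component.

```python
from pathlib import Path

from PIL import Image, ImageSequence  # Pillow, a third-party library (assumed installed)


def split_multipage_tiff(tiff_path, output_dir):
    """Write each page of a TIFF as its own single-page TIFF file.

    Returns the list of files written, in page order.
    """
    tiff_path = Path(tiff_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    with Image.open(tiff_path) as image:
        for index, page in enumerate(ImageSequence.Iterator(image), start=1):
            # Hypothetical naming scheme: <original name>_page_001.tif
            target = output_dir / f"{tiff_path.stem}_page_{index:03d}.tif"
            page.save(target)  # saves only the current frame
            written.append(target)
    return written
```

Each resulting file can then be sent through the OCR process on its own.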

Can I have multiple Tesseract OCR versions installed?

We recommend having only one version installed, because installations sometimes do not complete properly, as the example in the Troubleshooting section below shows.
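
As a rough check for duplicate installations, the Python sketch below lists every tesseract executable reachable through the PATH variable. The function name is hypothetical, and the check assumes the installations are on PATH; a copy installed elsewhere will not be reported.

```python
import os
from pathlib import Path


def find_tesseract_executables(path_value=None):
    """Return every tesseract executable found in the PATH directories."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    names = ("tesseract.exe", "tesseract")  # Windows and Unix-style names
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file() and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    for executable in find_tesseract_executables():
        print(executable)
```

More than one line of output suggests that several installations are present.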

Troubleshooting

Problem

Sometimes you may encounter an NPE (NullPointerException) during OCR processing. If you enable debug logging, you can find the real issue:

Error opening data file <path to Tesseract>\eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Solution

This can happen if you have multiple installations of Tesseract. You can use either of two approaches to solve it:

Method 1 (Recommended)

Clean installation:

1. Uninstall every Tesseract installation on your machine.
2. Restart your machine.
3. Install version 5.0.2 again and make sure you select the English language.
4. Restart again.
5. Verify that TESSDATA_PREFIX is set to the tessdata folder in your Tesseract installation.
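
The final verification can be sketched in Python as follows. This is a minimal check under stated assumptions: the function name is hypothetical, and it only confirms that the folder named by TESSDATA_PREFIX exists and contains the language file named in the error message above (eng.traineddata).

```python
import os
from pathlib import Path


def check_tessdata(language="eng", env=None):
    """Return (ok, message) describing whether TESSDATA_PREFIX looks usable."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return False, "TESSDATA_PREFIX is not set"
    folder = Path(prefix)
    if not folder.is_dir():
        return False, f"TESSDATA_PREFIX is not a directory: {folder}"
    data_file = folder / f"{language}.traineddata"
    if not data_file.is_file():
        return False, f"Missing language file: {data_file}"
    return True, f"Found {data_file}"


if __name__ == "__main__":
    ok, message = check_tessdata()
    print("OK" if ok else "PROBLEM", "-", message)
```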

Method 2

Set the TESSDATA_PREFIX environment variable properly:

If your installation completed properly, you should have a tessdata folder in your Tesseract installation directory that contains the language files you need (in this case, eng.traineddata).

Set the variable to this folder.
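
If you want to try a value before changing the system-wide setting, the Python sketch below builds an environment for a single child process. The helper name is hypothetical, and it assumes the tessdata folder sits directly inside the installation directory, as described above. It does not change the variable for the rest of the system.

```python
import os
from pathlib import Path


def build_tesseract_env(install_dir, language="eng", base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX pointing at
    <install_dir>/tessdata, or raise if the language file is missing."""
    tessdata = Path(install_dir) / "tessdata"
    data_file = tessdata / f"{language}.traineddata"
    if not data_file.is_file():
        raise FileNotFoundError(f"Language file not found: {data_file}")
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = str(tessdata)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run, for example when running tesseract --list-langs to confirm that the eng language is found.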

Contact Us: [email protected]
