FAQs

Specific

Is any other component needed before the Tesseract OCR component?

Yes. To configure the component properly, we recommend using a normalize mime type component before it.


Is any preprocessing needed before the Tesseract OCR component?

Multipage TIFF files are not currently supported, so you need to split each multipage TIFF into single-page files before the OCR process.
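
One possible way to do the split is sketched below in Python. It assumes the third-party Pillow library is available; the function name split_multipage_tiff and the output file naming are illustrative only and are not part of the component.

```python
from pathlib import Path

from PIL import Image, ImageSequence  # third-party: pip install Pillow


def split_multipage_tiff(src, out_dir):
    """Write each page of a multipage TIFF to its own single-page TIFF file.

    Hypothetical helper: the name and the output naming scheme are assumptions.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = []
    with Image.open(src) as img:
        for index, frame in enumerate(ImageSequence.Iterator(img), start=1):
            target = out_dir / f"{src.stem}_page{index:03d}.tif"
            # copy() detaches the current frame from the multi-frame file
            frame.copy().save(target, format="TIFF")
            pages.append(target)
    return pages
```

The returned list holds the paths of the single-page files, in page order, which can then be sent to the OCR process one by one.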

Can I have multiple Tesseract OCR versions installed?

We recommend having only one version installed, because installations sometimes do not complete properly, as the example in the Troubleshooting section shows.
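
As a rough check for leftover installations, the sketch below lists every tesseract executable reachable through the PATH environment variable. It is only an approximation: installations that are not on PATH will not be detected, and the function name find_tesseract_executables is hypothetical.

```python
import os
from pathlib import Path


def find_tesseract_executables(path_value=None):
    """Return every distinct tesseract executable found in the PATH directories."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    names = ("tesseract.exe", "tesseract")  # Windows and Unix-style names
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                resolved = candidate.resolve()
                if resolved not in found:
                    found.append(resolved)
    return found
```

If the returned list has more than one entry, more than one installation is visible on PATH, which is the situation this FAQ recommends avoiding.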

Troubleshooting


Problem

Sometimes you may encounter an NPE (NullPointerException) during OCR processing. If you enable debug logging, you can find the real issue:

Error opening data file <path to Tesseract>\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.

Solution

This can happen if you have multiple installations of Tesseract. There are two ways to solve it:

Method 1 (Recommended)

Clean installation:

  • Uninstall every Tesseract version installed on your machine.
  • Restart your machine.
  • Reinstall version 5.0.2 and make sure the English language is selected.
  • Restart again.
  • Verify that TESSDATA_PREFIX is set to the tessdata folder in your Tesseract installation.
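
The final verification step could be scripted along these lines. This is a minimal sketch that assumes the language file sits directly inside the folder named by TESSDATA_PREFIX, as the error message above suggests; check_tessdata is a hypothetical helper name.

```python
import os
from pathlib import Path


def check_tessdata(language="eng", env=None):
    """Return a list of problems with TESSDATA_PREFIX; an empty list means it looks fine."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return ["TESSDATA_PREFIX is not set"]
    folder = Path(prefix)
    if not folder.is_dir():
        return [f"TESSDATA_PREFIX is not a directory: {folder}"]
    # Assumption: <language>.traineddata lives directly in the tessdata folder
    if not (folder / f"{language}.traineddata").is_file():
        return [f"{language}.traineddata not found in {folder}"]
    return []
```

Calling check_tessdata() with no arguments inspects the current environment for the eng language; any returned message points at the step that still needs fixing.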

Method 2

Set the TESSDATA_PREFIX environment variable properly:

  • If your installation completed properly, you should have a tessdata folder inside your Tesseract installation directory. Make sure it contains the languages you need (in this case, eng).
  • Set the variable to this folder.
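
Before changing the system-wide variable, it may help to try a candidate folder for a single process first. The sketch below builds an environment copy with TESSDATA_PREFIX overridden; env_with_tessdata is a hypothetical helper, and the path in the usage comment is a placeholder to replace with your own tessdata folder.

```python
import os


def env_with_tessdata(tessdata_dir, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX pointing at tessdata_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = str(tessdata_dir)
    return env


# Example usage (placeholder path, requires tesseract on PATH):
#   import subprocess
#   subprocess.run(["tesseract", "--list-langs"],
#                  env=env_with_tessdata("<your tessdata folder>"))
```

If the listed languages include eng, the folder is a reasonable value for the permanent environment variable.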