
FAQs

Is any other component needed before the Tesseract OCR component?

Yes. To configure the component properly, we recommend using a normalize MIME type component before it.


Is any preprocessing needed before the Tesseract OCR component?

Yes. Multipage TIFF files are currently not supported, so you need to split them into single-page files before the OCR process.
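If you are not sure whether an input file is a multipage TIFF, a quick check can flag it before it reaches the OCR step. The Python sketch below is not part of the Tesseract OCR component. It walks the chain of Image File Directories (IFDs) in a classic TIFF file, where each IFD describes one page. The function names `count_tiff_pages` and `needs_split` are hypothetical, and BigTIFF files are not handled.

```python
import struct
from pathlib import Path


def count_tiff_pages(data: bytes) -> int:
    """Count the pages (IFDs) in the bytes of a classic TIFF file.

    Hypothetical helper: IFDs form a linked list that starts at the
    offset stored in the 8-byte header. BigTIFF (magic 43) is rejected.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file (bad byte-order mark)")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (magic number != 42)")

    pages = 0
    seen = set()  # guards against a corrupt chain that loops back on itself
    while offset != 0:
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        (entry_count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        next_ptr = offset + 2 + 12 * entry_count  # each IFD entry is 12 bytes
        if next_ptr + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack(endian + "I", data[next_ptr:next_ptr + 4])
        pages += 1
    return pages


def needs_split(path: str) -> bool:
    """Return True if the TIFF file at `path` has more than one page."""
    return count_tiff_pages(Path(path).read_bytes()) > 1
```

Files for which `needs_split` returns True still have to be split with an image tool of your choice before OCR; this sketch only detects them.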

Can I have multiple Tesseract OCR versions installed?

We recommend installing only one version, because installations sometimes do not complete properly, as the following troubleshooting example shows.

Troubleshooting

Problem

You might get a NullPointerException (NPE) during the OCR process. If you enable the debug option, the log shows the real cause:

Error opening data file <path to Tesseract>/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Solution

This can happen if you have multiple Tesseract installations. Depending on the cause, you can use one of two approaches to solve it.

Either the language you are trying to use is not installed:

Method 1 (Recommended)

Clean installation:

  • Uninstall all Tesseract programs on your machine
  • Restart your machine
  • Reinstall version 5.0.2 and verify that you have selected English or any other language you want to use
  • Restart again
  • Verify that the TESSDATA_PREFIX environment variable points to the tessdata folder of your Tesseract installation
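Before uninstalling, it can help to see how many Tesseract executables are actually reachable. The following Python sketch (the name `find_tesseract_executables` is hypothetical) lists every `tesseract` or `tesseract.exe` file found in the directories on your PATH. It assumes the installations you care about are on the PATH: an installation that was never added to the PATH will not be found, and the sketch does not check whether the files are executable.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_tesseract_executables(path_value: Optional[str] = None) -> List[Path]:
    """List Tesseract executables found on PATH, in PATH order.

    Hypothetical helper: it only inspects the PATH directories and
    reports each distinct file once (symlinks to the same file count once).
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    names = ["tesseract.exe", "tesseract"] if os.name == "nt" else ["tesseract"]
    found: List[Path] = []
    seen = set()
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH segments
        for name in names:
            candidate = Path(entry) / name
            if candidate.is_file():
                resolved = candidate.resolve()
                if resolved not in seen:
                    seen.add(resolved)
                    found.append(candidate)
    return found


if __name__ == "__main__":
    for exe in find_tesseract_executables():
        print(exe)
```

More than one result suggests multiple installations; the first entry is typically the one a plain `tesseract` command would run.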

Or the TESSDATA_PREFIX variable might not be defined correctly:

Method 2

Set the TESSDATA_PREFIX environment variable correctly:

  • If your installation completed properly, you should have a tessdata folder in your Tesseract installation. Verify that it contains the languages you need, in this case "eng" (the eng.traineddata file).

  • Set the variable to this folder.
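After setting the variable, you can confirm it from a script. The Python sketch below (the name `check_tessdata` is hypothetical) checks that TESSDATA_PREFIX is set, that it points to an existing folder, and that `<language>.traineddata` exists there for each language you list, mirroring the eng.traineddata file named in the error message. It assumes Tesseract 4/5 behavior, where TESSDATA_PREFIX points to the tessdata folder itself, and it also flags the common case where the variable points to that folder's parent.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_tessdata(languages: List[str],
                   env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with TESSDATA_PREFIX (empty if none found).

    Hypothetical helper: assumes TESSDATA_PREFIX names the tessdata
    folder itself, as in Tesseract 4 and 5.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return ["TESSDATA_PREFIX is not set"]
    folder = Path(prefix)
    if not folder.is_dir():
        return [f"TESSDATA_PREFIX points to a missing folder: {folder}"]

    problems = []
    for lang in languages:
        # Each language is loaded from <lang>.traineddata in this folder.
        if not (folder / f"{lang}.traineddata").is_file():
            problems.append(f"{lang}.traineddata not found in {folder}")
    if problems and (folder / "tessdata").is_dir():
        problems.append(
            f"{folder} contains a 'tessdata' subfolder; "
            "TESSDATA_PREFIX may be pointing at its parent")
    return problems


if __name__ == "__main__":
    issues = check_tessdata(["eng"])
    print("\n".join(issues) if issues else "TESSDATA_PREFIX looks fine")
```

Run the check in the same environment as the process that hosts the OCR component, since a variable set only in your own shell may not be visible to a service.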


