FAQs

Is any other component needed before the Tesseract OCR component?

Yes. To configure the component properly, we recommend placing a Normalize MIME Type component before it.


Is any preprocessing needed before the Tesseract OCR component?

Multipage TIFF files are not currently supported, so you need to split each multipage TIFF into single-page files before the OCR process.
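As a rough illustration of that splitting step, the sketch below writes each page of a multipage TIFF to its own file. It assumes the third-party Pillow library is available (pip install Pillow); the function names and the "_pageN" naming scheme are hypothetical and not part of the component.

```python
from pathlib import Path
from typing import List


def page_path(source: Path, index: int) -> Path:
    """Build the output name for a 1-based page index, e.g. scan.tif -> scan_page1.tif."""
    return source.with_name(f"{source.stem}_page{index}{source.suffix}")


def split_tiff(source: Path) -> List[Path]:
    """Write every page of a multipage TIFF to its own single-page file."""
    # Assumption: Pillow is installed. It is imported here so that the
    # naming helper above stays usable without it.
    from PIL import Image, ImageSequence

    outputs = []
    with Image.open(source) as image:
        for index, frame in enumerate(ImageSequence.Iterator(image), start=1):
            target = page_path(source, index)
            # Saving without save_all=True writes only the current frame.
            frame.save(target)
            outputs.append(target)
    return outputs
```

Calling split_tiff(Path("scan.tif")) would be expected to produce scan_page1.tif, scan_page2.tif, and so on next to the original file, and those single-page files can then be sent to the OCR process.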

Can I have multiple Tesseract OCR versions installed?

We recommend having only one version installed, because installations sometimes do not complete properly, as the following example shows.

Troubleshooting


Problem

Sometimes you may encounter a NullPointerException (NPE) during the OCR process. If you enable debug logging, you can find the real issue:

Error opening data file <path to Tesseract>\eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Solution

This can happen if you have multiple installations of Tesseract. There are two ways to solve it:

Method 1 (Recommended)

Clean installation:

  • Uninstall every Tesseract installation on your machine
  • Restart your machine
  • Install version 5.0.2 again and make sure you select the English language
  • Restart again
  • Verify that TESSDATA_PREFIX is set to the tessdata folder in your Tesseract installation
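After the clean installation, one way to confirm that only a single Tesseract remains is to look for its executable in every PATH entry. The sketch below is a minimal, hedged example: the function name is hypothetical, and it only detects installations that are on PATH, so a copy installed elsewhere would not be reported.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_tesseract_executables(path_value: Optional[str] = None) -> List[Path]:
    """Return every tesseract executable found in the PATH entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        # Check the Windows and the Unix-style executable names.
        for name in ("tesseract.exe", "tesseract"):
            candidate = Path(entry) / name
            if candidate.is_file() and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    for executable in find_tesseract_executables():
        print(executable)
```

If the script prints more than one path, more than one installation is still reachable, and the uninstall step may need to be repeated.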
Method 2

Set the TESSDATA_PREFIX environment variable properly:

  • If your installation completed properly, it should contain a tessdata folder with the language files you need (in this case, eng.traineddata):

  • Set the variable to this folder. 
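To double-check the variable, a small script can confirm that TESSDATA_PREFIX points to a folder that contains the language file named in the error message. This is only a sketch under stated assumptions: the function name is hypothetical, and it assumes the variable points directly at the tessdata folder, as described above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def check_tessdata_prefix(
    env: Optional[Mapping[str, str]] = None, lang: str = "eng"
) -> Tuple[bool, str]:
    """Check that TESSDATA_PREFIX points to a folder containing <lang>.traineddata."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return False, "TESSDATA_PREFIX is not set"
    folder = Path(prefix)
    if not folder.is_dir():
        return False, f"{folder} is not a directory"
    data_file = folder / f"{lang}.traineddata"
    if not data_file.is_file():
        return False, f"{data_file} is missing"
    return True, f"found {data_file}"


if __name__ == "__main__":
    ok, message = check_tessdata_prefix()
    print("OK" if ok else "PROBLEM", "-", message)
```

Run it from a new terminal session so that it sees the updated environment variable. A "PROBLEM" line indicates which part of the setup to revisit.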


