Yes. To configure the component properly, we recommend using a normalized MIME type.
Multipage TIFF files are not currently supported, so you need to split any multipage TIFF into single-page files before the OCR process.
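As a pre-check, a file can be inspected to see whether it needs splitting at all. The sketch below is only an illustration, not part of the component: it uses the Python standard library to count the image file directories (IFDs) chained in a classic TIFF file, which normally corresponds to the number of pages. The function name `count_tiff_pages` is hypothetical, BigTIFF files are not handled, and the splitting itself still has to be done with an imaging tool of your choice.

```python
import struct
from pathlib import Path


def count_tiff_pages(path):
    """Count the IFDs (normally one per page) chained in a classic TIFF file."""
    data = Path(path).read_bytes()

    # Bytes 0-1 give the byte order used by the rest of the file.
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")

    # Bytes 2-3 hold the magic number 42 for classic TIFF.
    (magic,) = struct.unpack_from(endian + "H", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")

    # Bytes 4-7 hold the offset of the first IFD.
    (offset,) = struct.unpack_from(endian + "I", data, 4)

    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("corrupt TIFF: IFD chain loops")
        seen.add(offset)
        # Each IFD starts with a 2-byte entry count, followed by
        # 12-byte entries and a 4-byte offset to the next IFD (0 = last).
        (entries,) = struct.unpack_from(endian + "H", data, offset)
        pages += 1
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * entries)
    return pages
```

A result greater than 1 means the file should be split before it is sent to OCR.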
We recommend having only one version of Tesseract installed, because installations sometimes do not complete properly, as the next example shows.
You might get an NPE during the OCR process, but if you enable the debug option, you'll find the real cause:
Error opening data file <path to Tesseract>/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.
This can happen if you have multiple Tesseract installations, and there are two common causes:
Either the language you are trying to use is not installed properly:
Fix: perform a clean installation of Tesseract.
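Before reinstalling, it can help to see how many Tesseract executables are actually reachable. The following is a minimal sketch, assuming the executable is named `tesseract` (or `tesseract.exe` on Windows) and is found through the `PATH` variable. The function name `find_tesseract_binaries` is hypothetical, and installations outside `PATH` will not be detected.

```python
import os
from pathlib import Path


def find_tesseract_binaries(search_path=None):
    """Return every distinct Tesseract executable found on the search path."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")

    names = ("tesseract", "tesseract.exe")
    found = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue
        for name in names:
            candidate = Path(entry) / name
            # Keep only real, executable files.
            if candidate.is_file() and os.access(candidate, os.X_OK):
                resolved = candidate.resolve()
                # Symlinks to the same binary are reported once.
                if resolved not in found:
                    found.append(resolved)
    return found
```

If `find_tesseract_binaries()` returns more than one path, several installations are competing, and removing all but one is a reasonable first step.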
Or you might not have the TESSDATA_PREFIX variable correctly defined:
Fix: set the TESSDATA_PREFIX environment variable so that it points to your "tessdata" directory.
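Both causes can be checked before running OCR. The sketch below assumes, as the error message above suggests, that `TESSDATA_PREFIX` should point directly at the "tessdata" directory containing `<lang>.traineddata`. Some older Tesseract versions expect the parent directory instead, so adjust the check if that applies to your installation. The function name `diagnose_tessdata` is hypothetical.

```python
import os
from pathlib import Path


def diagnose_tessdata(lang="eng", env=None):
    """Return a description of the problem, or None if the setup looks fine."""
    env = os.environ if env is None else env

    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return "TESSDATA_PREFIX is not set"

    tessdata = Path(prefix)
    if not tessdata.is_dir():
        return f"TESSDATA_PREFIX points to a missing directory: {tessdata}"

    # Assumption: the language file sits directly inside this directory.
    if not (tessdata / f"{lang}.traineddata").is_file():
        return f"{lang}.traineddata not found in {tessdata}"

    return None
```

Calling `diagnose_tessdata("eng")` in the same environment as the OCR process returns `None` when the variable and the language file are in place, and a short message naming the problem otherwise.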