Yes. To configure the component properly, we recommend using a normalized MIME type.
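What counts as "normalized" depends on the component's own MIME type registry, which is not described here. As a rough illustration only, the sketch below assumes that normalizing means dropping parameters, lowercasing, and mapping common aliases to one canonical name. The `ALIASES` table and the `normalize_mime_type` helper are hypothetical and are not part of the component.

```python
# Hypothetical alias table: extend it to match the types your component expects.
ALIASES = {
    "image/tif": "image/tiff",
    "image/x-tiff": "image/tiff",
    "image/jpg": "image/jpeg",
    "image/pjpeg": "image/jpeg",
}


def normalize_mime_type(value: str) -> str:
    """Return a lowercase MIME type without parameters, with aliases resolved."""
    # Drop parameters such as "; charset=binary" and surrounding whitespace.
    base = value.split(";", 1)[0].strip().lower()
    # Map known aliases to their canonical form; leave other types unchanged.
    return ALIASES.get(base, base)
```

For example, `normalize_mime_type(" Image/TIF; charset=binary ")` gives `image/tiff` under these assumptions.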
Multipage TIFF files are not currently supported, so you need to split a multipage TIFF into single-page files before running the OCR process.
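Which tool you use for splitting is up to you. To find out which files need splitting in the first place, a page count is enough. The sketch below is a minimal example that assumes classic TIFF files (not BigTIFF) and counts the top-level image directories (IFDs) in the file. `count_tiff_pages` and `is_multipage` are hypothetical helper names, not part of the component.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the top-level IFDs (pages) in the bytes of a classic TIFF file."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file (bad byte-order mark)")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled)")

    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("corrupt TIFF: the IFD chain loops")
        seen.add(offset)
        (entries,) = struct.unpack_from(endian + "H", data, offset)
        # Each IFD entry is 12 bytes; the offset of the next IFD follows them.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * entries)
        pages += 1
    return pages


def is_multipage(path: str) -> bool:
    """Return True if the TIFF file at `path` has more than one page."""
    with open(path, "rb") as handle:
        return count_tiff_pages(handle.read()) > 1
```

Files for which `is_multipage` returns `True` are the ones to split before sending them to OCR. Note that some files store a thumbnail as an extra top-level image, so treat the count as an indication rather than a guarantee.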
We recommend keeping only one version of Tesseract installed, because installations sometimes do not complete properly, as the next example shows.
You might get an NPE (NullPointerException) during the OCR process. If you enable the debug option, the log shows the real cause:
Error opening data file <path to Tesseract>/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
This can happen if you have multiple Tesseract installations. There are two likely causes, each with its own fix.
Either the language you are trying to use is not properly installed:
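Do a clean installation: remove the existing Tesseract installations and reinstall a single version together with the language data you need.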
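Before reinstalling, it can help to see which Tesseract executables are actually reachable on your PATH. The sketch below is a minimal check that assumes the executable is named `tesseract` (or `tesseract.exe` on Windows). `find_executables` is a hypothetical helper, and it only looks at PATH, so installations outside PATH are not detected.

```python
import os


def find_executables(name="tesseract", path=None):
    """Return the distinct executables called `name` found on the search path."""
    path = os.environ.get("PATH", "") if path is None else path
    names = [name, name + ".exe"] if os.name == "nt" else [name]
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for candidate in names:
            full = os.path.join(directory, candidate)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                # Resolve symlinks so the same binary is reported only once.
                real = os.path.realpath(full)
                if real not in found:
                    found.append(real)
    return found


if __name__ == "__main__":
    for executable in find_executables():
        print(executable)
```

If more than one path is printed, you most likely have several installations, and removing all but one is a good first step.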
Or the TESSDATA_PREFIX variable is not correctly defined:
Set the TESSDATA_PREFIX environment variable so that it points to your "tessdata" directory.
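To confirm the setting, you can check that the variable is defined and that the language file named in the error message exists. The sketch below is a minimal check that assumes, as the error message above suggests, that TESSDATA_PREFIX points directly at the directory containing `<lang>.traineddata`. `check_tessdata` is a hypothetical helper, not part of the component or of Tesseract.

```python
import os


def check_tessdata(lang="eng", env=None):
    """Return None if the language data is found, else a short problem description."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return "TESSDATA_PREFIX is not set"
    if not os.path.isdir(prefix):
        return f"TESSDATA_PREFIX points to a missing directory: {prefix}"
    # Assumption: the traineddata file sits directly inside TESSDATA_PREFIX.
    data_file = os.path.join(prefix, lang + ".traineddata")
    if not os.path.isfile(data_file):
        return f"language data not found: {data_file}"
    return None


if __name__ == "__main__":
    problem = check_tessdata("eng")
    print(problem or "TESSDATA_PREFIX looks fine for 'eng'")
```

Run it in the same environment (same user and same shell or service configuration) as the OCR process, since the variable may be set in one environment and missing in another.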