This section lists all configuration parameters available for the Tesseract OCR component.
Group | Element | Type | Default | Description |
---|---|---|---|---|
OCR Settings | tesseractPath | text | - | Tesseract binary location |
OCR Settings | processTimeout | number | 600000 | Time (in milliseconds) to wait before killing a Tesseract process |
OCR Settings | imageDirectory | text | - | Directory used to store the temporary files generated during OCR |
OCR Settings | maxSize | text | 10mb | Apply image correction only to images smaller than this size (e.g., 250kb, 5mb, 1gb) |
OCR Settings | confidenceThreshold | number | 80.0 | Minimum confidence value required to accept the OCR output |
Image creation settings | outputFormat | select | jpg | Image format (jpg, png, tiff) |
Image creation settings | imageType | select | bilevel | Image color scale (bilevel, gray, rgba, rgb) |
Image creation settings | dpi | number | 300 | Image resolution in dots per inch |
Mime Type settings | mimeTypeXPath | text | /doc/mimeType | XPath expression used to get the document MIME type |
Mime Type settings | pdfMimeTypes | array | - | MIME types for PDF documents |
Mime Type settings | imageMimeTypes | array | - | MIME types for image documents |
Page splitter settings | startPage | number | 0 | Page at which to start OCR processing. A value of 0 starts from the first page. |
Page splitter settings | endPage | number | 20 | Last page to process with OCR |
Advanced settings | processThreads | number | 8 | Maximum number of threads used by the application |
Advanced settings | processQueue | number | 30 | Size of the application process queue; should be at least 3 times the number of process threads |
Advanced settings | backoffTime | number | 1000 | Time (in milliseconds) to wait before retrying to add a job to the queue when it is full |
Advanced settings | debug | boolean | false | Check to enable debug messages |
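
Some of the parameters above constrain each other or accept only a fixed set of values. The following is a minimal sketch of a pre-flight check that a deployment script could run before applying a configuration. The function name `check_settings` and its messages are hypothetical and not part of the component; the allowed values, defaults, and the "queue at least 3 times the threads" rule are taken from the table.

```python
# Hypothetical helper: check_settings is illustrative only and is not
# provided by the Tesseract OCR component.
ALLOWED_OUTPUT_FORMATS = {"jpg", "png", "tiff"}
ALLOWED_IMAGE_TYPES = {"bilevel", "gray", "rgba", "rgb"}


def check_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []

    # Select-type parameters: fall back to the documented defaults.
    output_format = str(settings.get("outputFormat", "jpg")).lower()
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        problems.append(f"outputFormat '{output_format}' is not jpg, png or tiff")

    image_type = str(settings.get("imageType", "bilevel")).lower()
    if image_type not in ALLOWED_IMAGE_TYPES:
        problems.append(f"imageType '{image_type}' is not bilevel, gray, rgba or rgb")

    # The queue should hold at least 3 times the number of process threads.
    threads = int(settings.get("processThreads", 8))
    queue = int(settings.get("processQueue", 30))
    if queue < 3 * threads:
        problems.append(
            f"processQueue ({queue}) is smaller than 3 x processThreads ({3 * threads})"
        )

    return problems
```

With the default values (8 threads, queue of 30) the check reports no problems; raising `processThreads` to 16 without also raising `processQueue` to at least 48 would be flagged.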
Example configuration:

```json
{
  "tesseractPath": "C:\\Tesseract-OCR\\tesseract",
  "processTimeout": 600000,
  "imageDirectory": "C:\\dev\\tempDir",
  "maxSize": "10mb",
  "confidenceThreshold": 80,
  "outputFormat": "png",
  "imageType": "bilevel",
  "dpi": 300,
  "mimeTypeXPath": "/doc/normalizedMimeType",
  "pdfMimeTypes": "aspire/pdf",
  "imageMimeTypes": "aspire/drawing",
  "startPage": 0,
  "endPage": 20,
  "processThreads": 8,
  "processQueue": 30,
  "backoffTime": 1000,
  "debug": true
}
```
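
The `maxSize` value is written as a size string such as `250kb`, `5mb` or `1gb`. As an illustration of how a script might compare a file size against it, the sketch below converts such a string to bytes. It assumes binary multiples (1 kb = 1024 bytes); how the component itself interprets the units is not stated here, and `parse_size` is a hypothetical helper name.

```python
import re

# Assumption: binary multiples. The component may interpret units differently.
_UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '10mb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(kb|mb|gb)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size string: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```

Under this assumption, `parse_size("10mb")` returns 10485760, so an image would qualify for correction only if its size in bytes is below that value.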