This section lists the configuration parameters available for the Tesseract OCR component.
| Section | Element | Type | Default | Description |
|---|---|---|---|---|
| OCR settings | tesseractPath | text | - | Location of the Tesseract binary |
| | processTimeout | number | 600000 | Time (in milliseconds) to wait before killing a Tesseract process |
| | imageDirectory | text | - | Directory used to store the temporary files generated during OCR |
| | maxSize | text | 10mb | Apply image correction only to images under this size (e.g., 250kb, 5mb, 1gb) |
| | confidenceThreshold | number | 80.0 | Minimum confidence value required to accept the OCR output |
| Image creation settings | outputFormat | select | jpg | Image format (jpg, png, tiff) |
| | imageType | select | bilevel | Image color scale (bilevel, gray, rgba, rgb) |
| | dpi | number | 300 | Image resolution in dots per inch |
| MIME type settings | mimeTypeXPath | text | /doc/mimeType | XPath expression that selects the document MIME type |
| | pdfMimeTypes | array | - | MIME types treated as PDF documents |
| | imageMimeTypes | array | - | MIME types treated as image documents |
| Page splitter settings | startPage | number | 0 | First page to process with OCR. A value of 0 starts from the first page. |
| | endPage | number | 20 | Last page to process with OCR |
| Advanced settings | processThreads | number | 8 | Maximum number of threads used by the application |
| | processQueue | number | 30 | Size of the application process queue; should be at least 3 times processThreads |
| | backoffTime | number | 1000 | Time (in milliseconds) to wait before trying again to add a job to the queue when it is full |
| | debug | boolean | false | Enables debug messages |
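The `maxSize` value is a size string such as `250kb`, `5mb` or `1gb`. The sketch below shows one way such a string could be converted to bytes and used to decide whether an image gets corrected. It assumes binary multiples (1kb = 1024 bytes) and treats an image exactly at the limit as "under" it; both are assumptions, not documented behavior of the component. `parse_size` and `should_correct` are hypothetical helper names.

```python
import re

# Assumption: binary multiples (1kb = 1024 bytes). Switch to powers of
# 1000 if the component turns out to use decimal units.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a size string such as '250kb', '5mb' or '1gb' to bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit.lower()])


def should_correct(image_bytes: int, max_size: str = "10mb") -> bool:
    """Return True if an image of image_bytes is within maxSize.

    Assumption: an image exactly at the limit still counts as under it.
    """
    return image_bytes <= parse_size(max_size)
```

Parsing once at configuration time, rather than for every image, would avoid repeating the regular-expression match per document.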
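`processTimeout` bounds how long a single Tesseract process may run. The following is a minimal sketch of that idea using Python's `subprocess.run`, which kills the child process when its `timeout` expires. `run_with_timeout` is a hypothetical helper, and the command used in the demo is a stand-in child process, not the component's actual Tesseract invocation.

```python
import subprocess
import sys
from typing import List, Optional

PROCESS_TIMEOUT_MS = 600000  # default processTimeout from the table above


def run_with_timeout(cmd: List[str],
                     timeout_ms: int = PROCESS_TIMEOUT_MS
                     ) -> Optional[subprocess.CompletedProcess]:
    """Run cmd, killing it if it runs longer than timeout_ms.

    Returns the finished process, or None if it was killed on timeout.
    """
    try:
        # On timeout, subprocess.run kills the child and then raises
        # TimeoutExpired.
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_ms / 1000)
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in child that sleeps longer than the 200 ms limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_ms=200))  # prints None
```

In a real deployment, `cmd` would start with the configured `tesseractPath`. The remaining arguments depend on how the component calls Tesseract and are not covered here.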
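The MIME type settings work together: `mimeTypeXPath` selects the MIME type from the document, and the selected value is compared with `pdfMimeTypes` and `imageMimeTypes`. The sketch below illustrates that routing on a small XML document. It handles only simple absolute paths such as `/doc/mimeType`, because Python's `xml.etree.ElementTree` supports just a subset of XPath. It also treats both MIME type settings as lists, as the `array` type in the table suggests. The `"skip"` result is hypothetical, since the table does not say what happens to other types.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def select_text(doc_xml: str, xpath: str) -> Optional[str]:
    """Return the text selected by a simple absolute path like /doc/mimeType."""
    root = ET.fromstring(doc_xml)
    steps = xpath.strip("/").split("/")
    # ElementTree rejects absolute paths on an element, so match the root
    # step by hand and search for the remaining steps below it.
    if steps[0] != root.tag:
        return None
    if len(steps) == 1:
        return root.text
    return root.findtext("/".join(steps[1:]))


def route(doc_xml: str, xpath: str,
          pdf_types: Iterable[str], image_types: Iterable[str]) -> str:
    """Decide how a document would be handled based on its MIME type."""
    mime = select_text(doc_xml, xpath)
    if mime in set(pdf_types):
        return "pdf"
    if mime in set(image_types):
        return "image"
    return "skip"  # hypothetical: behavior for other types is not documented
```

A full XPath engine (for example, lxml) would be needed for predicates or relative paths. The hand-rolled root check keeps this sketch on the standard library only.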
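The advanced settings describe a bounded work queue: `processThreads` workers consume jobs from a queue of size `processQueue`, and a producer that finds the queue full waits `backoffTime` milliseconds before trying again. The sketch below models that producer/consumer pattern with Python's `queue` and `threading` modules. It is an illustration of the pattern only, not the component's implementation. The job payloads, the `ocr:` result strings and the helper names are all hypothetical.

```python
import queue
import threading
import time

PROCESS_THREADS = 8     # processThreads default
PROCESS_QUEUE = 30      # processQueue default
BACKOFF_TIME_MS = 1000  # backoffTime default


def submit(jobs: queue.Queue, job, backoff_ms: int = BACKOFF_TIME_MS) -> int:
    """Put job on the bounded queue, sleeping backoff_ms while it is full.

    Returns how many times the caller had to back off.
    """
    waits = 0
    while True:
        try:
            jobs.put_nowait(job)
            return waits
        except queue.Full:
            waits += 1
            time.sleep(backoff_ms / 1000)


def _worker(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this thread
            return
        with lock:
            results.append(f"ocr:{job}")  # stand-in for running Tesseract


def run(job_ids, threads: int = PROCESS_THREADS,
        queue_size: int = PROCESS_QUEUE,
        backoff_ms: int = BACKOFF_TIME_MS) -> list:
    """Process job_ids with a fixed pool of threads and a bounded queue."""
    jobs: queue.Queue = queue.Queue(maxsize=queue_size)
    results: list = []
    lock = threading.Lock()
    pool = [threading.Thread(target=_worker, args=(jobs, results, lock))
            for _ in range(threads)]
    for t in pool:
        t.start()
    for job in job_ids:
        submit(jobs, job, backoff_ms)
    for _ in pool:
        submit(jobs, None, backoff_ms)  # one stop sentinel per worker
    for t in pool:
        t.join()
    return results
```

A queue several times larger than the thread count, as the table recommends, keeps workers busy between producer back-offs.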
Example configuration:

```json
{
  "tesseractPath": "C:\\Tesseract-OCR\\tesseract",
  "processTimeout": 600000,
  "imageDirectory": "C:\\dev\\tempDir",
  "maxSize": "10mb",
  "confidenceThreshold": 80,
  "outputFormat": "png",
  "imageType": "bilevel",
  "dpi": 300,
  "mimeTypeXPath": "/doc/normalizedMimeType",
  "pdfMimeTypes": "aspire/pdf",
  "imageMimeTypes": "aspire/drawing",
  "startPage": 0,
  "endPage": 20,
  "processThreads": 8,
  "processQueue": 30,
  "backoffTime": 1000,
  "debug": true
}
```
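Several constraints are spread across the table descriptions, such as the allowed `select` values and the rule that `processQueue` should be at least 3 times `processThreads`. The sketch below checks a configuration dictionary like the example above against those constraints. It assumes lowercase `select` values, as in the example. It also assumes `endPage` should not precede `startPage` and that `confidenceThreshold` lies on Tesseract's 0 to 100 scale; these last two checks are assumptions, not rules stated in the table. `validate` is a hypothetical helper.

```python
from typing import List

# Allowed select values, taken from the table (assumed lowercase).
ALLOWED = {
    "outputFormat": {"jpg", "png", "tiff"},
    "imageType": {"bilevel", "gray", "rgba", "rgb"},
}


def validate(config: dict) -> List[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    # Defaults below mirror the table when a key is missing.
    threads = config.get("processThreads", 8)
    queue_size = config.get("processQueue", 30)
    if queue_size < 3 * threads:
        problems.append("processQueue should be at least 3 times processThreads")
    start, end = config.get("startPage", 0), config.get("endPage", 20)
    if start < 0 or end < start:  # assumption: pages run forward from startPage
        problems.append("expected 0 <= startPage <= endPage")
    if not 0 <= config.get("confidenceThreshold", 80.0) <= 100:  # assumption
        problems.append("confidenceThreshold is assumed to be between 0 and 100")
    return problems
```

Loading the example with `json.load` and passing the resulting dictionary to `validate` should return an empty list, since its queue of 30 is at least 3 × 8 = 24.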
Example configuration for the Elastic Cache Lookup component:

```json
{
  "Elasticsearch Settings": [
    {
      "url": "http://localhost:9200",
      "authType": "none",
      "index": "index_name"
    }
  ],
  "Connection Settings": [
    {
      "idleConnectionTimeout": 3600000,
      "maxConnections": 100,
      "maxConnectionsPerRoute": 10,
      "connectionTimeout": 15000,
      "socketTimeout": 15000,
      "useThrottling": false,
      "maxRetries": 3,
      "retryWaitTime": 5000
    }
  ],
  "Cache": [
    {
      "cache": true,
      "eviction": "size",
      "evictionMaxSize": 1000
    }
  ],
  "Lookup Fields": [
    {
      "esIndexLookupField": "indexName",
      "sourceLookupField": "myid",
      "sourceLookupFieldToUpperCase": true,
      "lookupOutputField": "myidOutput",
      "debug": false,
      "size": 1000
    }
  ]
}
```
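The "Lookup Fields" entry in the Elastic Cache Lookup example pairs a field of the incoming document (`sourceLookupField`, optionally upper-cased) with a field in the Elasticsearch index (`esIndexLookupField`), and names the field that receives the result (`lookupOutputField`). The sketch below shows one plausible reading of that mapping. It assumes an exact-match `term` query, that `size` caps the number of returned hits, and that each hit's `_source` is what gets stored. None of these details are confirmed by the example, and both helper names are hypothetical.

```python
import json


def build_lookup_query(document: dict, lookup: dict) -> dict:
    """Build a search body for one "Lookup Fields" entry (assumed term query)."""
    value = str(document[lookup["sourceLookupField"]])
    if lookup.get("sourceLookupFieldToUpperCase", False):
        value = value.upper()
    return {
        "size": lookup.get("size", 10),  # assumed: maximum number of hits
        "query": {"term": {lookup["esIndexLookupField"]: value}},
    }


def store_lookup_result(document: dict, lookup: dict, response: dict) -> None:
    """Copy the _source of each hit in a search response into lookupOutputField."""
    hits = response.get("hits", {}).get("hits", [])
    document[lookup["lookupOutputField"]] = [hit["_source"] for hit in hits]


if __name__ == "__main__":
    lookup = {"esIndexLookupField": "indexName", "sourceLookupField": "myid",
              "sourceLookupFieldToUpperCase": True,
              "lookupOutputField": "myidOutput", "size": 1000}
    # With the example settings, this body would be POSTed to
    # http://localhost:9200/index_name/_search
    print(json.dumps(build_lookup_query({"myid": "ab-12"}, lookup)))
```

A `term` query matches the stored value exactly, which is why upper-casing the source value can matter when the index stores upper-case keys.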
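The "Cache" entry enables caching with `"eviction": "size"` and `evictionMaxSize` set to 1000, which suggests that the cache holds a bounded number of entries. The sketch below shows a size-bounded cache built on `collections.OrderedDict`. It assumes a least-recently-used policy picks the entry to evict; the example does not say which entry is dropped. The class name is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class SizeBoundedCache:
    """Hold at most max_size entries, evicting the least recently used one."""

    def __init__(self, max_size: int = 1000) -> None:
        self.max_size = max_size
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._data)
```

For caching the results of a single function, `functools.lru_cache(maxsize=1000)` gives the same bounded LRU behavior without a custom class.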
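The connection settings include `maxRetries` and `retryWaitTime` (in milliseconds). The sketch below shows a generic retry loop driven by those two values. It assumes that `maxRetries` counts retries after the first attempt (so up to four calls with the example value of 3) and that only connection-level errors are retried. Both points are assumptions, and `with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_retries: int = 3,
                 retry_wait_ms: int = 5000) -> T:
    """Run call(); on a connection error, wait and retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return call()
        except OSError:  # ConnectionError and socket timeouts subclass OSError
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            time.sleep(retry_wait_ms / 1000)
```

A fixed wait matches a single `retryWaitTime` value. An exponential back-off would need an additional setting that the example does not show.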