Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
description | Yes | - | No | Name of the component application. | "tesseract-ocr" |
properties | Yes | - | No | Configuration object that holds the fields listed in the rows below. | |
tesseractPath | Yes | - | No | Full path where the Tesseract application is installed. | C:\Tesseract-OCR\tesseract |
processTimeout | Yes | 600000 | No | Maximum time (in milliseconds) to wait for the process. | 600000 |
imageDirectory | Yes | - | No | Directory used to store the temporary files generated during OCR. | C:\tempDir |
maxSize | Yes | 10mb | No | Apply image correction only to images that fall under this size (e.g., 250kb, 5mb, 1gb). | 10mb |
confidenceThreshold | Yes | 80.0 | No | Minimum confidence value required to accept the OCR output. | 80.0 |
outputFormat | Yes | jpg | No | Image format of the output. | png |
imageType | Yes | bilevel | No | Image color scale of the output. | bilevel |
dpi | Yes | 300 | No | Dots per inch of the output image. | 300 |
mimeTypeXPath | Yes | /doc/mimeType | No | XPath expression used to get the document MIME type. | /doc/normalizedMimeType |
pdfMimeTypes | Yes | - | Yes | MIME type for PDF documents. | aspire/pdf |
imageMimeTypes | Yes | - | Yes | MIME type for image documents. | aspire/drawing |
startPage | Yes | 0 | No | Page at which OCR processing starts. A value of 0 starts from the first page. | 0 |
endPage | Yes | 20 | No | Last page to process with OCR. | 20 |
processThreads | Yes | 8 | No | Maximum number of threads used by the application. | 8 |
processQueue | Yes | 30 | No | Size of the application process queue. Should be at least 3 times the number of process threads. | 30 |
backoffTime | Yes | 1000 | No | Time (in milliseconds) to wait before trying to add a job to the queue when it is full. | 1000 |
debug | No | false | No | Enables debug messages. | false |
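
The `maxSize` field takes size strings such as `250kb`, `5mb`, or `1gb`. As a rough illustration of how such a value could be interpreted, the Python sketch below converts a size string to bytes and checks whether an image falls under the limit. The helper names `parse_size` and `within_max_size` are hypothetical, and both the binary multipliers (1 kb = 1024 bytes) and the strict "less than" comparison are assumptions; the component's actual parsing rules are not stated on this page.

```python
import re

# Assumed multipliers: binary units (1 kb = 1024 bytes).
# The real component may use different rules.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size string such as '250kb' or '10mb' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if not match:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown size unit: {unit!r}")
    return int(float(number) * _UNITS[unit])


def within_max_size(image_bytes: int, max_size: str) -> bool:
    """Return True when the image is smaller than the configured limit (assumed strict)."""
    return image_bytes < parse_size(max_size)


if __name__ == "__main__":
    print(parse_size("10mb"))
    print(within_max_size(5 * 1024 * 1024, "10mb"))
```

Under these assumptions, a 5 MB image would qualify for image correction with the default `maxSize` of `10mb`, while a 1 GB image would not.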
NOTE: The following structure is not ordered by the sections of the component configuration as found on the Tesseract OCR Component - App Bundle page.
```json
{
  "type": "application",
  "appName": "Tesseract Ocr",
  "appType": "tesseract-ocr",
  "config": "com.accenture.aspire:app-ocr-processor",
  "description": "tesseract-ocr",
  "properties": {
    "tesseractPath": "C:\\Tesseract-OCR\\tesseract",
    "processTimeout": 600000,
    "imageDirectory": "C:\\tempDir",
    "maxSize": "10mb",
    "confidenceThreshold": 80,
    "outputFormat": "png",
    "imageType": "bilevel",
    "dpi": 300,
    "mimeTypeXPath": "/doc/normalizedMimeType",
    "pdfMimeTypes": "aspire/pdf",
    "imageMimeTypes": "aspire/drawing",
    "startPage": 0,
    "endPage": 20,
    "processThreads": 8,
    "processQueue": 30,
    "backoffTime": 1000,
    "debug": true
  }
}
```
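
The rules in the field table can be checked before a configuration is deployed. The Python sketch below is one possible pre-flight check, not part of the component itself: the function name `check_config` and the list `REQUIRED_PROPERTIES` are hypothetical. It assumes every field marked "Required: Yes" must be present even when a default is listed, and it assumes `startPage` should not exceed `endPage`. The queue rule (at least 3 times the process threads) comes from the table.

```python
from typing import List

# Fields marked "Required: Yes" under "properties" in the table above.
REQUIRED_PROPERTIES = [
    "tesseractPath", "processTimeout", "imageDirectory", "maxSize",
    "confidenceThreshold", "outputFormat", "imageType", "dpi",
    "mimeTypeXPath", "pdfMimeTypes", "imageMimeTypes", "startPage",
    "endPage", "processThreads", "processQueue", "backoffTime",
]


def check_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    props = config.get("properties")
    if not isinstance(props, dict):
        return ["missing 'properties' object"]

    problems = [
        f"missing required property: {name}"
        for name in REQUIRED_PROPERTIES
        if name not in props
    ]

    # Rule from the table: the queue should hold at least 3x the thread count.
    threads, queue = props.get("processThreads"), props.get("processQueue")
    if isinstance(threads, int) and isinstance(queue, int) and queue < 3 * threads:
        problems.append(
            f"processQueue ({queue}) is below 3 x processThreads ({3 * threads})"
        )

    # Assumption: the page range must not be inverted.
    start, end = props.get("startPage"), props.get("endPage")
    if isinstance(start, int) and isinstance(end, int) and start > end:
        problems.append(f"startPage ({start}) is greater than endPage ({end})")

    return problems
```

A configuration loaded with `json.load` can be passed straight to `check_config`; printing the returned list gives a quick report of what to fix.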