Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
description | Yes | - | No | Name of the component application. | "tesseract-ocr" |
properties | Yes | - | No | Configuration object. | |
tesseractPath | Yes | - | No | Path where the tesseract application is installed. | C:\Tesseract-OCR\tesseract |
processTimeout | Yes | 600000 | No | Maximum time (in milliseconds) to wait for the process. | 600000 |
imageDirectory | Yes | - | No | Directory used to store the temporary files generated during OCR. | C:\tempDir |
maxSize | Yes | 10mb | No | Apply image correction only to images under this size (e.g. 250kb, 5mb, 1gb). | 10mb |
confidenceThreshold | Yes | 80.0 | No | Minimum confidence value required to accept the OCR output. | 80.0 |
outputFormat | Yes | - | No | Image format of the output. | png |
imageType | Yes | - | No | Image color scale of the output. | bilevel |
dpi | Yes | 300 | No | Dots per inch of the output image. | 300 |
mimeTypeXPath | | | | | /doc/normalizedMimeType |
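As an illustration of how the maxSize and confidenceThreshold settings might be interpreted, here is a minimal Python sketch. It is not the component's implementation: the function names are hypothetical, the units are assumed to be binary multiples (1kb = 1024 bytes), and treating the size limit as inclusive is also an assumption.

```python
import re

# Assumption: binary multiples (1kb = 1024 bytes); the component may count differently.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size string such as '250kb' or '10mb' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def should_correct_image(image_bytes: int, max_size: str = "10mb") -> bool:
    """Hypothetical check: correct only images no larger than max_size (inclusive by assumption)."""
    return image_bytes <= parse_size(max_size)


def accept_ocr_output(confidence: float, threshold: float = 80.0) -> bool:
    """Hypothetical check: keep the OCR text only when confidence reaches the threshold."""
    return confidence >= threshold
```

For example, `should_correct_image(5 * 1024 * 1024)` is true under the default 10mb limit, while `accept_ocr_output(79.9)` is false under the default threshold of 80.0.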
NOTE: The following structure is not ordered by the sections of the component configuration, as found on the Elastic Cache Lookup App Bundle page
    {
      "type": "application",
      "_type": "application",
      "appName": "Tesseract Ocr",
      "appType": "tesseract-ocr",
      "config": "com.accenture.aspire:app-ocr-processor",
      "description": "tesseract-ocr",
      "properties": {
        "tesseractPath": "C:\\Tesseract-OCR\\tesseract",
        "processTimeout": 600000,
        "imageDirectory": "C:\\dev\\tempDir",
        "maxSize": "10mb",
        "confidenceThreshold": 80,
        "outputFormat": "png",
        "imageType": "bilevel",
        "dpi": 300,
        "mimeTypeXPath": "/doc/normalizedMimeType",
        "pdfMimeTypes": "aspire/pdf",
        "imageMimeTypes": "aspire/drawing",
        "startPage": 0,
        "endPage": 20,
        "processThreads": 8,
        "processQueue": 30,
        "backoffTime": 1000,
        "debug": true
      }
    }
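The Tesseract OCR settings above could also be assembled and checked in a script before being submitted. The Python sketch below is only an illustration: `build_ocr_config` and `REQUIRED_PROPERTIES` are hypothetical names, and the list of required properties is an assumption read off the rows marked "Yes" in the field table.

```python
import json

# Assumption: these are the properties the field table marks as required.
REQUIRED_PROPERTIES = (
    "tesseractPath", "processTimeout", "imageDirectory", "maxSize",
    "confidenceThreshold", "outputFormat", "imageType", "dpi",
)


def build_ocr_config(**overrides) -> dict:
    """Return a config dict using the example values from this page, with optional overrides."""
    properties = {
        "tesseractPath": "C:\\Tesseract-OCR\\tesseract",
        "processTimeout": 600000,
        "imageDirectory": "C:\\dev\\tempDir",
        "maxSize": "10mb",
        "confidenceThreshold": 80,
        "outputFormat": "png",
        "imageType": "bilevel",
        "dpi": 300,
    }
    properties.update(overrides)
    # Reject the config early if a required property is missing or empty.
    missing = [key for key in REQUIRED_PROPERTIES if properties.get(key) in (None, "")]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    return {"description": "tesseract-ocr", "properties": properties}


if __name__ == "__main__":
    # json.dumps takes care of escaping the backslashes in the Windows paths.
    print(json.dumps(build_ocr_config(dpi=400), indent=2))
```

Building the dictionary first and serializing it with `json.dumps` avoids hand-escaping the Windows paths, which is an easy place to introduce mistakes.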
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
description | Yes | - | No | Name of the component application. | "Elastic Cache Lookup" |
properties | Yes | - | No | Configuration object | |
Server url | Yes | - | No | Complete URL where the feeds will be sent. | http://localhost:9200/bulk_ |
Authentication | No | None | Yes | User with the permissions to read from the Elastic index specified. | none, basic, aws |
Index | Yes | - | No | The Elastic index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \, /, ?, ", <, >, |, (space character), ,, #. 3) Cannot start with -, _, +. 4) | [{"index":"index1"}] |
Idle connection timeout | Yes | 3600000 | No | Maximum time (in milliseconds) to keep an idle connection open. | 3600000 |
Max connections | Yes | 100 | No | Maximum number of connections to be opened. | 100 |
Connections per target | Yes | 10 | No | Maximum number of connections opened for the same target. | 10 |
Connection timeout | Yes | 15000 | No | Maximum time (in milliseconds) to wait for the connection. | 15000 |
Socket timeout | Yes | 15000 | No | Maximum time (in milliseconds) to wait for a socket response. | 15000 |
Throttling period | Yes | 5000 | No | Time period (in milliseconds) to throttle the connection. | 5000 |
Max connections per period | Yes | 500 | No | Maximum number of connections used during the throttling period. | 500 |
Maximum retries | Yes | 3 | No | Maximum number of retries for a failed document. | 3 |
Retry delay | Yes | 5000 | No | Time (in milliseconds) to wait before a retry. | 5000 |
Max number of entries | No | 1000 | No | Max total number of entries to keep in the cache. | 1000 |
Max Total Weight (MB) | No | 500 | No | Maximum total weight (in MB) of the entries the cache may hold. | 500 |
Time (min) | No | 5 | No | Remove records that have been idle for an amount of time in minutes. | 5 |
Index lookup field | Yes | - | No | Elastic index field name for the lookup. | [{"index":"index1"}] |
Source lookup field | Yes | - | No | Specify field name from the incoming AspireObject for the lookup. Field availability will be searched first in 'doc' and then in 'doc.connectorSpecific' section. | myid |
Uppercase the source lookup field value | No | true | No | Convert the value of the source field into UPPERCASE value. | TRUE |
Lookup output field | Yes | - | No | Output fields from the lookup will be placed under this configured object. | myidOutput |
Debug | No | false | No | Option if you want debug messages enabled. | TRUE |
Hit size | No | 1000 | No | Maximum number of hits returned by the cache lookup. If -1, all hits are returned. | 1000 |
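The index name limitations listed for the Index field can be checked before the configuration is saved. The following Python sketch covers only rules 1 to 3 as written in the table. Rule 4 is cut off in the source and is deliberately not implemented. The function name is hypothetical.

```python
# Characters the table lists as not allowed anywhere in an index name.
FORBIDDEN_CHARS = set('\\/?"<>| ,#')
# Characters the table lists as not allowed at the start of an index name.
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name: str) -> list:
    """Return a list of rule violations for an index name (empty list means it passed)."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("cannot start with -, _ or +")
    return problems
```

Returning a list of problems rather than a single boolean lets a caller report every violation at once, for example `index_name_problems("_My Index")` reports all three rules.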
    {
      "description": "Elastic Cache Lookup",
      "properties": {
        "url": "http://localhost:9200",
        "authType": "none",
        "index": "index_name",
        "idleConnectionTimeout": 3600000,
        "maxConnections": 100,
        "maxConnectionsPerRoute": 10,
        "connectionTimeout": 15000,
        "socketTimeout": 15000,
        "useThrottling": true,
        "maxRetries": 3,
        "retryWaitTime": 5000,
        "cache": true,
        "eviction": "size",
        "evictionMaxSize": 1000,
        "esIndexLookupField": "indexNaame",
        "sourceLookupField": "myid",
        "sourceLookupFieldToUpperCase": true,
        "lookupOutputField": "myidOutput",
        "debug": true,
        "size": 1000
      }
    }
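The field table says the source lookup field is searched first in 'doc' and then in 'doc.connectorSpecific', and that its value can optionally be uppercased. A rough Python sketch of that resolution order follows. It models the incoming AspireObject as a nested dictionary, which is an assumption made purely for illustration, and `find_lookup_value` is a hypothetical name, not part of the component.

```python
from typing import Optional


def find_lookup_value(aspire_doc: dict, field: str, to_upper: bool = True) -> Optional[str]:
    """Resolve the lookup value: check 'doc' first, then 'doc.connectorSpecific'."""
    doc = aspire_doc.get("doc", {})
    # Order matters: a field present directly under 'doc' wins over connectorSpecific.
    for section in (doc, doc.get("connectorSpecific", {})):
        if field in section:
            value = str(section[field])
            return value.upper() if to_upper else value
    return None  # Field not found in either section.


example = {"doc": {"connectorSpecific": {"myid": "abc-123"}}}
print(find_lookup_value(example, "myid"))                  # uppercased by default
print(find_lookup_value(example, "myid", to_upper=False))  # value left unchanged
```

The `to_upper` flag mirrors the "Uppercase the source lookup field value" setting, whose default is true according to the table.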