Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
description | Yes | - | No | Name of the component application. | "Elastic Cache Lookup" |
properties | Yes | - | No | Configuration object | |
Server url | Yes | - | No | Complete URL where the feeds will be sent. | http://localhost:9200/bulk_ |
Authentication | No | None | Yes | User with the permissions to read from the Elastic index specified. | none, basic, aws |
Index | Yes | - | No | The elastic index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \\, \/, ?, \", <, >, \|, (space character), (comma), # 3) Cannot start with -, _, + 4) | [{"index":"index1"}] |
Idle connection timeout | Yes | 3600000 | No | Maximum time (in milliseconds) to keep an idle connection open. | 3600000 |
Max connections | Yes | 100 | No | Maximum number of connections to be opened. | 100 |
Connections per target | Yes | 10 | No | Maximum number of connections opened for the same target. | 10 |
Connection timeout | Yes | 15000 | No | Maximum time (in milliseconds) to wait for the connection. | 15000 |
Socket timeout | Yes | 15000 | No | Maximum time (in milliseconds) to wait for a socket response. | 15000 |
Throttling period | Yes | 5000 | No | Time period (in milliseconds) to throttle the connection. | 5000 |
Max connections per period | Yes | 500 | No | Maximum number of connections used during the throttling period. | 500 |
Maximum retries | Yes | 3 | No | Maximum number of retries for a failed document. | 3 |
Retry delay | Yes | 5000 | No | Time (in milliseconds) to wait before a retry. | 5000 |
Max number of entries | No | 1000 | No | Maximum total number of entries to keep in the cache. | 1000 |
Max Total Weight (MB) | No | 500 | No | Maximum total weight (in MB) of the entries the cache may contain. | 500 |
Time (min) | No | 5 | No | Removes records that have been idle for this number of minutes. | 5 |
Index lookup field | Yes | - | No | Elastic index field name for the lookup. | [{"index":"index1"}] |
Source lookup field | Yes | - | No | Field name from the incoming AspireObject to use for the lookup. The field is searched for first in 'doc' and then in the 'doc.connectorSpecific' section. | myid |
Uppercase the source lookup field value | No | true | No | Converts the value of the source lookup field to uppercase. | FALSE |
Lookup output field | Yes | - | No | Output fields from the lookup will be placed under this configured object. | myidOutput |
Debug | No | false | No | Enables debug messages. | FALSE |
Hit size | No | 1000 | No | Maximum number of hits returned by the cache lookup. If -1, all hits are returned. | 1000 |
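The "Source lookup field" and "Uppercase the source lookup field value" rows describe a two-step search followed by an optional case conversion. The following is a minimal Python sketch of that behaviour, not the component's actual implementation. It assumes the document can be modelled as a plain dict with a nested `connectorSpecific` dict; the function name `resolve_source_value` is hypothetical.

```python
from typing import Any, Optional


def resolve_source_value(doc: dict[str, Any], field: str, to_upper: bool = True) -> Optional[str]:
    """Return the lookup value for `field`, or None if it is absent.

    Assumption: `doc` is a plain dict standing in for the 'doc' section of an
    AspireObject, and 'connectorSpecific' is a nested dict inside it.
    """
    # Search the top level of 'doc' first, then fall back to 'doc.connectorSpecific'.
    value = doc.get(field)
    if value is None:
        value = doc.get("connectorSpecific", {}).get(field)
    if value is None:
        return None
    text = str(value)
    # Mirrors the "Uppercase the source lookup field value" option (default true).
    return text.upper() if to_upper else text


if __name__ == "__main__":
    doc = {"title": "report", "connectorSpecific": {"myid": "abc-123"}}
    print(resolve_source_value(doc, "myid"))                  # ABC-123
    print(resolve_source_value(doc, "myid", to_upper=False))  # abc-123
```

With the default of true, a source value of `abc-123` is looked up as `ABC-123`, so the values stored in the index lookup field would presumably need to be uppercase as well for a match.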
NOTE: The following structure is not ordered by the sections of the component configuration, as found on the Elastic Cache Lookup App Bundle page.

    {
      "description": "Elastic Cache Lookup",
      "properties": {
        "url": "http://localhost:9200",
        "authType": "none",
        "index": "index_name",
        "idleConnectionTimeout": 3600000,
        "maxConnections": 100,
        "maxConnectionsPerRoute": 10,
        "connectionTimeout": 15000,
        "socketTimeout": 15000,
        "useThrottling": false,
        "maxRetries": 3,
        "retryWaitTime": 5000,
        "cache": true,
        "eviction": "size",
        "evictionMaxSize": 1000,
        "esIndexLookupField": "indexName",
        "sourceLookupField": "myid",
        "sourceLookupFieldToUpperCase": false,
        "lookupOutputField": "myidOutput",
        "debug": false,
        "size": 1000
      }
    }
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
description | Yes | - | No | Name of the component application. | "Elastic Cache Lookup" |
properties | Yes | - | No | Configuration object | |
Server url | Yes | - | No | Complete URL where the feeds will be sent. | http://localhost:9200/bulk_ |
Authentication | No | None | Yes | User with the permissions to read from the Elastic index specified. | none, basic, aws |
Index | Yes | - | No | The elastic index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \\, \/, ?, \", <, >, \|, (space character), (comma), # 3) Cannot start with -, _, + 4) | [{"index":"index1"}] |
Idle connection timeout | Yes | 3600000 | No | Maximum time (in milliseconds) to keep an idle connection open. | 3600000 |
Max connections | Yes | 100 | No | Maximum number of connections to be opened. | 100 |
Connections per target | Yes | 10 | No | Maximum number of connections opened for the same target. | 10 |
Connection timeout | Yes | 15000 | No | Maximum time (in milliseconds) to wait for the connection. | 15000 |
Socket timeout | Yes | 15000 | No | Maximum time (in milliseconds) to wait for a socket response. | 15000 |
Throttling period | Yes | 5000 | No | Time period (in milliseconds) to throttle the connection. | 5000 |
Max connections per period | Yes | 500 | No | Maximum number of connections used during the throttling period. | 500 |
Maximum retries | Yes | 3 | No | Maximum number of retries for a failed document. | 3 |
Retry delay | Yes | 5000 | No | Time (in milliseconds) to wait before a retry. | 5000 |
Max number of entries | No | 1000 | No | Maximum total number of entries to keep in the cache. | 1000 |
Max Total Weight (MB) | No | 500 | No | Maximum total weight (in MB) of the entries the cache may contain. | 500 |
Time (min) | No | 5 | No | Removes records that have been idle for this number of minutes. | 5 |
Index lookup field | Yes | - | No | Elastic index field name for the lookup. | [{"index":"index1"}] |
Source lookup field | Yes | - | No | Field name from the incoming AspireObject to use for the lookup. The field is searched for first in 'doc' and then in the 'doc.connectorSpecific' section. | myid |
Uppercase the source lookup field value | No | true | No | Converts the value of the source lookup field to uppercase. | TRUE |
Lookup output field | Yes | - | No | Output fields from the lookup will be placed under this configured object. | myidOutput |
Debug | No | false | No | Enables debug messages. | TRUE |
Hit size | No | 1000 | No | Maximum number of hits returned by the cache lookup. If -1, all hits are returned. | 1000 |
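The index name limitations listed in the "Index" row can be checked before a configuration is deployed. The following Python sketch covers only the three rules that are fully stated in the table (the fourth is cut off in the table and is not implemented). The names `FORBIDDEN_CHARS`, `FORBIDDEN_PREFIXES` and `index_name_problems` are hypothetical and not part of the component.

```python
# Characters rule 2 forbids: backslash, slash, ?, ", <, >, |, space, comma, #
FORBIDDEN_CHARS = set('\\/?"<>| ,#')
# Prefixes rule 3 forbids.
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name: str) -> list[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    # Rule 1: lowercase only.
    if name != name.lower():
        problems.append("must be lowercase")
    # Rule 2: no forbidden characters anywhere in the name.
    bad = sorted(c for c in set(name) if c in FORBIDDEN_CHARS)
    if bad:
        problems.append(f"contains forbidden characters: {bad}")
    # Rule 3: must not start with -, _ or +.
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("must not start with -, _ or +")
    return problems


if __name__ == "__main__":
    for candidate in ("index_name", "Index1", "_logs", "my index"):
        print(candidate, "->", index_name_problems(candidate) or "ok")
```

Returning a list of problems rather than a single boolean makes it easier to report every violation in one pass.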
    {
      "description": "Elastic Cache Lookup",
      "properties": {
        "url": "http://localhost:9200",
        "authType": "none",
        "index": "index_name",
        "idleConnectionTimeout": 3600000,
        "maxConnections": 100,
        "maxConnectionsPerRoute": 10,
        "connectionTimeout": 15000,
        "socketTimeout": 15000,
        "useThrottling": true,
        "maxRetries": 3,
        "retryWaitTime": 5000,
        "cache": true,
        "eviction": "size",
        "evictionMaxSize": 1000,
        "esIndexLookupField": "indexName",
        "sourceLookupField": "myid",
        "sourceLookupFieldToUpperCase": true,
        "lookupOutputField": "myidOutput",
        "debug": true,
        "size": 1000
      }
    }
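A configuration like the one above can be sanity-checked before use. The Python sketch below reports which required keys are missing. The key list in `REQUIRED_KEYS` is an assumption inferred by matching the "Required" column to the key names in the example JSON; the table itself does not state the label-to-key mapping, so adjust it as needed. `missing_keys` is a hypothetical helper, not part of the component.

```python
import json

# Assumption: JSON keys corresponding to fields marked "Yes" in the Required
# column, inferred from the example configuration.
REQUIRED_KEYS = (
    "url",
    "index",
    "esIndexLookupField",
    "sourceLookupField",
    "lookupOutputField",
)


def missing_keys(config_text: str) -> list[str]:
    """Return the required keys that are absent from a JSON configuration."""
    config = json.loads(config_text)
    props = config.get("properties", {})
    # Top-level fields first, then the entries expected under "properties".
    missing = [k for k in ("description", "properties") if k not in config]
    missing += [f"properties.{k}" for k in REQUIRED_KEYS if k not in props]
    return missing


if __name__ == "__main__":
    sample = '{"description": "Elastic Cache Lookup", "properties": {"url": "http://localhost:9200"}}'
    print(missing_keys(sample))
```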