The Tesseract Ocr component can be configured using the Aspire workflow section. It requires the following entities to be created:

Below are examples of how to configure the component.

Create Workflow

NOTE: Some options in the following table are collapsed, or are displayed only when other options (such as a checkbox or a select list) are selected.





Property | Required | Default | Multiple | Description | Example
--- | --- | --- | --- | --- | ---
description | Yes | - | No | Name of the component application. |
properties | Yes | - | No | Configuration object. |
tesseractPath | Yes | - | No | Complete path where the Tesseract application is installed. | C:\Tesseract-OCR\tesseract
processTimeout | Yes | 600000 | No | Maximum time (in milliseconds) to wait for the process. | 600000
imageDirectory | | | | Directory used to store the temporary files generated during OCR. | C:\tempDir
maxSize | | | | Apply image correction only to images under this size (e.g. 250kb, 5mb, 1gb). | 10mb
confidenceThreshold | Yes | 80.0 | No | Minimum confidence value required to accept the OCR output. | 80.0
outputFormat | Yes | jpg | No | Image format of the output. | png
imageType | | | No | Image color scale of the output. | bilevel
dpi | Yes | 300 | No | Dots per inch of the output image. | 300
mimeTypeXPath | Yes | /doc/mimeType | No | XPath expression used to get the document MIME type. | /doc/normalizedMimeType
pdfMimeTypes | Yes | - | Yes | MIME type for PDF documents. | aspire/pdf
imageMimeTypes | Yes | - | Yes | MIME type for image documents. | aspire/drawing
startPage | Yes | 0 | No | Page at which OCR processing starts. A value of 0 starts from the first page. | 0
endPage | Yes | 20 | No | Last page to process with OCR. | 20
processThreads | Yes | 8 | No | Maximum number of threads used by the application. | 8
processQueue | Yes | 30 | No | Size of the application process queue. It should be at least 3 times the number of process threads. | 30
backoffTime | Yes | 1000 | No | Time (in milliseconds) to wait before trying again to add a job to the queue when it is full. | 1000
debug | No | false | No | Enables debug messages. | false
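
The constraints stated in the table can be checked before a configuration is submitted. The following is a minimal sketch in Python and is not part of Aspire. The function name `check_ocr_properties` is hypothetical, the required properties without a default (tesseractPath, pdfMimeTypes, imageMimeTypes) are read from the table, and treating a startPage greater than endPage as an error is an assumption made for illustration.

```python
def check_ocr_properties(props):
    """Return a list of problems found in a Tesseract OCR properties dict.

    Defaults (8 threads, queue of 30, pages 0 to 20) are taken from the table.
    """
    problems = []

    # The table asks for a queue of at least 3 times the process threads.
    threads = props.get("processThreads", 8)
    queue = props.get("processQueue", 30)
    if queue < 3 * threads:
        problems.append(
            f"processQueue ({queue}) should be at least {3 * threads} "
            f"(3 x processThreads)"
        )

    # Assumption: a start page after the end page is treated as a mistake.
    start = props.get("startPage", 0)
    end = props.get("endPage", 20)
    if start > end:
        problems.append(f"startPage ({start}) is greater than endPage ({end})")

    # Properties listed as required with no default value.
    for name in ("tesseractPath", "pdfMimeTypes", "imageMimeTypes"):
        if not props.get(name):
            problems.append(f"{name} is required and has no default")

    return problems
```

An empty list means that none of these checks failed; it does not guarantee that Aspire will accept the configuration.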


NOTE: The following structure is not ordered by the sections of the component configuration as found on the OCR Components App Bundle page.

Code Block
{
    "type": "application",
    "appName": "Tesseract Ocr",
    "appType": "tesseract-ocr",
    "config": "",
    "description": "tesseract-ocr",
    "properties": {
        "tesseractPath": "C:\\Tesseract-OCR\\tesseract",
        "processTimeout": 600000,
        "imageDirectory": "C:\\tempDir",
        "maxSize": "10mb",
        "confidenceThreshold": 80,
        "outputFormat": "png",
        "imageType": "bilevel",
        "dpi": 300,
        "mimeTypeXPath": "/doc/normalizedMimeType",
        "pdfMimeTypes": "aspire/pdf",
        "imageMimeTypes": "aspire/drawing",
        "startPage": 0,
        "endPage": 20,
        "processThreads": 8,
        "processQueue": 30,
        "backoffTime": 1000,
        "debug": true
    }
}
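
When the same configuration is needed for several environments, it can be convenient to generate the JSON instead of editing it by hand. The sketch below assumes Python; `build_ocr_application` is a hypothetical helper, not an Aspire API. The top-level keys and the property values are copied from the example above, and how the resulting JSON is submitted to Aspire is left out because it is not described here.

```python
import json


def build_ocr_application(tesseract_path, image_directory, **overrides):
    """Build the Tesseract OCR application object as a plain dict.

    Values are the ones from the example configuration; any of them can be
    replaced through keyword arguments named after the property.
    """
    properties = {
        "tesseractPath": tesseract_path,
        "processTimeout": 600000,
        "imageDirectory": image_directory,
        "maxSize": "10mb",
        "confidenceThreshold": 80,
        "outputFormat": "png",
        "imageType": "bilevel",
        "dpi": 300,
        "mimeTypeXPath": "/doc/normalizedMimeType",
        "pdfMimeTypes": "aspire/pdf",
        "imageMimeTypes": "aspire/drawing",
        "startPage": 0,
        "endPage": 20,
        "processThreads": 8,
        "processQueue": 30,
        "backoffTime": 1000,
        "debug": False,
    }

    # Reject misspelled property names instead of silently adding them.
    unknown = set(overrides) - set(properties)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    properties.update(overrides)

    return {
        "type": "application",
        "appName": "Tesseract Ocr",
        "appType": "tesseract-ocr",
        "config": "",
        "description": "tesseract-ocr",
        "properties": properties,
    }


# Usage: process up to 50 pages and serialize the result as JSON text.
app = build_ocr_application(r"C:\Tesseract-OCR\tesseract", r"C:\tempDir", endPage=50)
print(json.dumps(app, indent=4))
```

Using `json.dumps` takes care of escaping the backslashes in Windows paths, which is an easy detail to get wrong when the file is written manually.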
