Step 2. Add a new Content Source

For this step, follow the steps in the Configuration Tutorial for the connector of your choice. For more information, see the Connector list.
Note: Make sure the "Disable Text extraction" check box in Advanced Properties is selected.

Step 3. Add a new Tesseract OCR application to the Workflow

To add a Tesseract OCR application, drag the Tesseract OCR rule from the Workflow Library and drop it onto the Workflow Tree at the point where you want to add it. This automatically opens the Tesseract OCR window, where you configure the application.

Step 3a. Specify Application Information

In the Tesseract OCR window, specify the information to set up the application.

Tesseract executable file path:
- Path to the tesseract executable file.
  Note: On Windows, if the tesseract installation directory is in the PATH environment variable, you can enter just the program name.
Page segmentation mode:
- Page segmentation mode to be used during OCR execution.
  See link for more information.
Languages to detect:
- Languages used for the OCR execution.
- The order of the languages affects the output. See here.
  Note: The training data for a language must be installed before that language can be used.
Process timeout:
- Time in milliseconds to wait before killing the tesseract process.
Accept patterns:
- Regular expression matched against the document URL to determine whether the document is processed by the application.
Debug:
- Enable debug messages.
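The executable path, page segmentation mode, languages, and process timeout settings map naturally onto a tesseract command line. The following is a minimal Python sketch, not the Aspire implementation, assuming the application runs the tesseract CLI (version 4 or later, which accepts `--psm`) and reads the recognized text from standard output. The function names `build_tesseract_command` and `run_ocr` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(
    executable: str,
    image_path: str,
    languages: List[str],
    psm: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build a tesseract command that prints text to stdout."""
    # "stdout" as the output base makes tesseract print the recognized text
    # instead of writing it to <outputbase>.txt.
    cmd = [executable, image_path, "stdout"]
    if languages:
        # Languages are joined with "+" in the configured order.
        cmd += ["-l", "+".join(languages)]
    if psm is not None:
        # Page segmentation mode (tesseract 4+ syntax).
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(
    executable: str,
    image_path: str,
    languages: List[str],
    timeout_ms: int,
    psm: Optional[int] = None,
) -> str:
    """Hypothetical helper: run tesseract and return the recognized text."""
    # shutil.which accepts a full path or a bare program name; a bare name
    # is looked up in the PATH, mirroring the Windows note above.
    resolved = shutil.which(executable)
    if resolved is None:
        raise FileNotFoundError(f"tesseract executable not found: {executable!r}")
    cmd = build_tesseract_command(resolved, image_path, languages, psm)
    # The setting is in milliseconds; subprocess.run expects seconds. On
    # timeout, subprocess.run kills the child and raises TimeoutExpired.
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout_ms / 1000,
        check=True,
    )
    return result.stdout
```

For example, `run_ocr("tesseract", "scan.png", ["eng", "deu"], timeout_ms=60000, psm=3)` would resolve tesseract through the PATH, run it with `-l eng+deu --psm 3`, and either return the text or raise `subprocess.TimeoutExpired` after 60 seconds. The file name and language codes here are placeholders.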
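The Accept patterns setting decides which documents reach the OCR step. The following minimal Python sketch shows one way such a URL filter could work. It is not the Aspire implementation: it assumes a pattern only needs to match part of the URL (`re.search`) and that an empty pattern list accepts every document. The function name `is_accepted` is hypothetical.

```python
import re
from typing import Iterable


def is_accepted(url: str, accept_patterns: Iterable[str]) -> bool:
    """Hypothetical helper: True if the document URL should be processed."""
    patterns = list(accept_patterns)
    if not patterns:
        # Assumption: with no patterns configured, every document is processed.
        return True
    # Assumption: a partial match is enough. Use re.fullmatch instead if the
    # whole URL must match the pattern.
    return any(re.search(pattern, url) for pattern in patterns)
```

With a pattern such as `(?i)\.(tiff?|png|jpe?g)$`, a URL ending in `invoice.TIF` would be sent to tesseract, while one ending in `report.docx` would be skipped.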
