The Content Type Detector component can be configured using the Aspire Admin UI. It requires the following entities to be created:
Connector
Seed
This component is a workflow application, and it is used in the “onAddUpdate” Workflow Event.
Create Workflow
On the Aspire Admin UI, go to the workflow page.
All existing workflows will be listed. Click the new button.
Enter the new workflow description.
Select the “Create” button.
Go to the Workflow Event “onAddUpdate”.
In “Type criteria”, search the Applications options and drag the Content Type Detector component into the onAddUpdate section.
Enter a new description for this application component.
General:
Ignore Delete Jobs: Select if delete jobs need to be ignored.
Fetch file: Fetch the file before text extraction. If you disable this, make sure some preceding stage or component has assigned a content stream to the job.
Use the default document path: Select so that Aspire will use the fetchUrl or displayUrl as the location of the file. Clear if your Aspire document stores the path to the file in a different location.
Document fetch path: The location in the Aspire document of the path to the file to fetch.
Max Lookahead in MBytes for type detection: The maximum amount of the file stream, in megabytes, to read when detecting the type, especially for CSV/TSV detection.
Max percent of column variability to allow in text separated files: The maximum percentage of variability to allow in the number of columns when detecting the Content Type of separated-value files. NOTE: If you set a high variability, files may be detected with the wrong type.
Apache Tika configuration path: Path to the Apache Tika configuration file.
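To make the lookahead and column variability options more concrete, the following Python sketch shows one way such a check could work. It is an illustration only, not Aspire's actual detection logic: the function names, the definition of variability (the percentage of rows whose column count differs from the most common count), and the UTF-8 decoding are all assumptions made for this example.

```python
import csv
import io
from collections import Counter


def column_variability_percent(sample: str, delimiter: str) -> float:
    """Percentage of non-empty rows whose column count differs from the most common one.

    Hypothetical definition for illustration; Aspire may compute this differently.
    """
    rows = [row for row in csv.reader(io.StringIO(sample, newline=""), delimiter=delimiter) if row]
    if not rows:
        return 100.0
    counts = Counter(len(row) for row in rows)
    _, rows_with_common_count = counts.most_common(1)[0]
    return 100.0 * (len(rows) - rows_with_common_count) / len(rows)


def looks_separated(data: bytes, delimiter: str,
                    max_lookahead_mb: float, max_variability_percent: float) -> bool:
    """Decide whether the first max_lookahead_mb megabytes look like a separated-value file."""
    limit = int(max_lookahead_mb * 1024 * 1024)
    # Only the lookahead window is inspected, never the whole stream.
    sample = data[:limit].decode("utf-8", errors="replace")
    rows = [row for row in csv.reader(io.StringIO(sample, newline=""), delimiter=delimiter) if row]
    if not rows:
        return False
    most_common_count, _ = Counter(len(row) for row in rows).most_common(1)[0]
    if most_common_count < 2:
        # A single column means the delimiter never appears in a typical row.
        return False
    return column_variability_percent(sample, delimiter) <= max_variability_percent


# Three of the four rows have 3 columns, so the variability is 25 percent.
data = b"a,b,c\n1,2,3\n4,5,6\n7,8\n"
accepted = looks_separated(data, ",", max_lookahead_mb=1, max_variability_percent=30)
rejected = looks_separated(data, ",", max_lookahead_mb=1, max_variability_percent=10)
```

Under these assumptions, a threshold of 30 accepts the sample as comma-separated while a threshold of 10 rejects it. This is the trade-off described in the note above: a higher threshold tolerates irregular rows but makes it easier for ordinary text to be detected as a separated-value file.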