
The Content Type Detector connector can be configured using the REST API. It requires the following entities to be created:

  • Credential
  • Connection
  • Connector
  • Seed

Below are examples of how to create and update the Credential, the Connection, and the Seed. For the Connector, please check this page.


Create Credential

Field | Required | Default | Multiple | Notes | Example
type | Yes | - | No | The value must be "Content Type Detector". | "Content Type Detector"
description | Yes | - | No | Name of the credential object. | "Content Type Detector Credential"
properties | Yes | - | No | Configuration object. |

Example

POST aspire/_api/credentials
{
    "type": "Content Type Detector",
    "description": "Content Type Detector Credential",
    "properties": {
    }
}
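If you script this call, you can assemble the request above in Python using only the standard library. This is a sketch. The base URL (http://localhost:50505) is a hypothetical placeholder for your Aspire host. Authentication is not shown, although your deployment may require it.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the host and port of your Aspire instance.
ASPIRE_URL = "http://localhost:50505"


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the Aspire REST API."""
    return urllib.request.Request(
        url=f"{ASPIRE_URL}/{path.lstrip('/')}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


credential = {
    "type": "Content Type Detector",
    "description": "Content Type Detector Credential",
    "properties": {},
}
req = build_request("POST", "aspire/_api/credentials", credential)
# urllib.request.urlopen(req) would send it; authentication is not shown here.
```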

Update Credential

Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | Id of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
description | Yes | - | No | Name of the credential object. | "Content Type Detector Credential"
properties | Yes | - | No | Configuration object. |

Example

PUT aspire/_api/credentials/2a5ca234-e328-4d40-bb2a-2df3e550b065
{
    "id": "2a5ca234-e328-4d40-bb2a-2df3e550b065",
    "description": "Content Type Detector Credential",
    "properties": {
    }
}
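In the example above, the id appears twice: once in the URL and once in the body. One way to keep the two consistent in a script is to derive the URL from the body. This is a minimal sketch. The base URL is a hypothetical placeholder, and the request is only built, not sent.

```python
import json
import urllib.request

ASPIRE_URL = "http://localhost:50505"  # hypothetical host and port


def build_update(collection: str, body: dict) -> urllib.request.Request:
    """Build a PUT request for aspire/_api/<collection>/<id>.

    The id is read from the body, so the URL and the payload cannot disagree.
    """
    obj_id = body.get("id")
    if not obj_id:
        raise ValueError("an update body must include the id of the object")
    return urllib.request.Request(
        url=f"{ASPIRE_URL}/aspire/_api/{collection}/{obj_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


update = {
    "id": "2a5ca234-e328-4d40-bb2a-2df3e550b065",
    "description": "Content Type Detector Credential",
    "properties": {},
}
req = build_update("credentials", update)
```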

Create Connection

Field | Required | Default | Multiple | Notes | Example
type | Yes | - | No | The value must be "Content Type Detector". | "Content Type Detector"
description | Yes | - | No | Name of the connection object. | "My Content Type Detector Connection"
throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
Ignore Delete Jobs | No | True | No | Option to skip delete jobs. | TRUE
Fetch file | No | False | No | Select if you need to fetch the file. | FALSE
Use default document path | No | True | No | Select so that Aspire uses the fetchUrl or displayUrl as the location of the file. | FALSE
Document fetch path | Yes | - | No | Location in the Aspire document of the path to the file to fetch. | "/doc/fetchUrl"
Max Lookahead in MBytes for type detection | Yes | 0.5 | No | Maximum amount of the file stream, in MB, to consume when detecting the type. | 0.5
Max percent of column variability to allow in text separated files | Yes | 0 | No | Maximum percentage of variability to allow in the number of columns. | 0
Apache Tika configuration path | No | None | No | Path to the Apache Tika configuration file. | "/path/to/tikaConfig.xml"
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
properties | Yes | - | No | Configuration object. |

Example

POST aspire/_api/connections
{
    "type": "Content Type Detector",
    "description": "My Content Type Detector Connection",
    "properties": {
    }
}
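The table above lists defaults for the type-detection settings. The sketch below fills those defaults into a properties object and checks the required values before sending. The JSON key names (ignoreDeleteJobs, enableFetchUrl, defaultFetchPath, fetchPath, maxLookaheadSize, variabilityPercent, tikaConfig) are an assumption. They are inferred from the seed examples on this page, which list them in the same order as the table rows. The 0 to 100 range check on variabilityPercent is also an assumption.

```python
from typing import Any, Dict

# Defaults copied from the table above. The JSON key names are an assumption,
# inferred from the property names used in the seed examples on this page.
DEFAULTS: Dict[str, Any] = {
    "ignoreDeleteJobs": True,   # Ignore Delete Jobs
    "enableFetchUrl": False,    # Fetch file
    "defaultFetchPath": True,   # Use default document path
    "maxLookaheadSize": 0.5,    # Max Lookahead in MBytes for type detection
    "variabilityPercent": 0,    # Max percent of column variability
    "tikaConfig": None,         # Apache Tika configuration path (optional)
}


def build_properties(**overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides over the documented defaults and sanity-check them."""
    props = {**DEFAULTS, **overrides}
    if not props.get("fetchPath"):
        raise ValueError("fetchPath (Document fetch path) is required")
    if props["maxLookaheadSize"] <= 0:
        raise ValueError("maxLookaheadSize must be greater than 0")
    # Assumption: the variability is a percentage between 0 and 100.
    if not 0 <= props["variabilityPercent"] <= 100:
        raise ValueError("variabilityPercent must be between 0 and 100")
    # Drop optional settings left unset instead of sending them as null.
    return {key: value for key, value in props.items() if value is not None}


connection = {
    "type": "Content Type Detector",
    "description": "My Content Type Detector Connection",
    "properties": build_properties(fetchPath="/doc/fetchUrl"),
}
```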

Update Connection

Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | Id of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0"
description | No | - | No | Name of the connection object. | "My Content Type Detector Connection"
throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
properties | Yes | - | No | Configuration object. |

Example

PUT aspire/_api/connections/89d6632a-a296-426c-adb0-d442adcab4b0
{
    "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
    "description": "My Content Type Detector Connection",
    "properties": {
        "ignoreDeleteJobs": true
    }
}
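In the table above, only the id is marked as required for an update. A script can therefore send just the fields it wants to change. The sketch below builds such a body and rejects unknown field names. It assumes that fields omitted from the body keep their current values on the server, which this page does not state explicitly.

```python
from typing import Any, Dict

# Optional fields listed in the Update Connection table above.
UPDATABLE_FIELDS = {"description", "throttlePolicy", "routingPolicies", "properties"}


def build_connection_update(connection_id: str, **changes: Any) -> Dict[str, Any]:
    """Return a PUT body holding the id plus only the fields being changed.

    Assumption: fields left out of the body keep their current values.
    """
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    body: Dict[str, Any] = {"id": connection_id}
    body.update(changes)
    return body


body = build_connection_update(
    "89d6632a-a296-426c-adb0-d442adcab4b0",
    description="My Content Type Detector Connection",
    properties={"ignoreDeleteJobs": True},
)
```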

Create Connector Instance

To create the Connector object using the REST API, check this page.

Update Connector Instance

To update the Connector object using the REST API, check this page.

Create Seed

Field | Required | Default | Multiple | Notes | Example
seed | Yes | - | No | <seed description> |
type | Yes | - | No | The value must be "Content Type Detector". | "Content Type Detector"
description | Yes | - | No | Name of the seed object. | "My Content Type Detector Seed"
connector | Yes | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | Yes | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"]
properties | Yes | - | No | Configuration object. |

Example

POST aspire/_api/seeds
{
    "type": "Content Type Detector",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "My Content Type Detector Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "enableFetchUrl": false,
        "fetchPath": "/doc/fetchUrl",
        "maxLookaheadSize": 0.5,
        "variabilityPercent": 0,
        "tikaConfig": "/path/to/tikaConfig.xml"
    }
}
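The seed table requires the connector type and the connection type to match the seed type. The sketch below checks this locally before the seed is created. The two lookup dictionaries are hypothetical in-memory records of objects you created earlier. This page does not say how to read them back from Aspire.

```python
from typing import Dict, List


def check_seed_references(
    seed: Dict, connectors: Dict[str, Dict], connections: Dict[str, Dict]
) -> List[str]:
    """Return problems with the connector and connection that a seed points to."""
    problems = []
    for field, known in (("connector", connectors), ("connection", connections)):
        ref = seed.get(field)
        target = known.get(ref)
        if target is None:
            problems.append(f"{field} {ref!r} does not exist")
        elif target["type"] != seed["type"]:
            problems.append(
                f"{field} type {target['type']!r} does not match seed type {seed['type']!r}"
            )
    return problems


# Hypothetical local records of the connector and connection created earlier.
connectors = {"82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31": {"type": "Content Type Detector"}}
connections = {"602d3700-28dd-4a6a-8b51-e4a663fe9ee6": {"type": "Content Type Detector"}}
seed = {
    "type": "Content Type Detector",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
}
print(check_seed_references(seed, connectors, connections))  # []
```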

Update Seed

Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | Id of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
seed | No | - | No | <seed description> |
description | No | - | No | Name of the seed object. | "My Content Type Detector Seed"
connector | No | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | No | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
workflows.add | No | [ ] | Yes | The ids of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
workflows.remove | No | [ ] | Yes | The ids of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
routingPolicies.add | No | [ ] | Yes | The ids of the routing policies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
routingPolicies.remove | No | [ ] | Yes | The ids of the routing policies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"]
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag3"]
tags.add | No | [ ] | Yes | The tags to add. | ["tag4"]
tags.remove | No | [ ] | Yes | The tags to remove. | ["tag2"]
properties | Yes | - | No | Configuration object. |
Ignore Delete Jobs | No | True | No | Option to skip delete jobs. | TRUE
Fetch file | No | False | No | Select if you need to fetch the file. | FALSE
Use default document path | No | True | No | Select so that Aspire uses the fetchUrl or displayUrl as the location of the file. | FALSE
Document fetch path | Yes | - | No | Location in the Aspire document of the path to the file to fetch. | "/doc/fetchUrl"
Max Lookahead in MBytes for type detection | Yes | 0.5 | No | Maximum amount of the file stream, in MB, to consume when detecting the type. | 0.5
Max percent of column variability to allow in text separated files | Yes | 0 | No | Maximum percentage of variability to allow in the number of columns. | 0
Apache Tika configuration path | No | None | No | Path to the Apache Tika configuration file. | "/path/to/tikaConfig.xml"

Example

PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "seed": "<seed example>",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "My Content Type Detector Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "ignoreDeleteJobs": true,
        "enableFetchUrl": true,
        "defaultFetchPath": true,
        "fetchPath": "/doc/fetchUrl",
        "maxLookaheadSize": 0.5,
        "variabilityPercent": 0,
        "tikaConfig": "/path/to/tikaConfig.xml"
    }
}
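The workflows.add, routingPolicies.add, and tags.add fields, and their .remove counterparts, let an update change a list without resending all of it. The sketch below models the expected effect on a local copy of the list, which can help when predicting the result of an update. It assumes that removals are applied before additions and that duplicates are skipped. This page does not confirm either behavior, and the sketch does not show how these fields are encoded in the request body.

```python
from typing import Iterable, List


def apply_list_changes(
    current: Iterable[str], add: Iterable[str] = (), remove: Iterable[str] = ()
) -> List[str]:
    """Model the *.add / *.remove update fields on a local copy of a list.

    Assumption: removals are applied first, then additions, skipping duplicates.
    """
    to_remove = set(remove)
    result = [item for item in current if item not in to_remove]
    for item in add:
        if item not in result:
            result.append(item)
    return result


tags = apply_list_changes(["tag1", "tag2"], add=["tag4"], remove=["tag2"])
print(tags)  # ['tag1', 'tag4']
```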