Seed

The Content Type Detector component can be configured using the REST API. It requires the following entities to be created:

  • Credential
  • Connection
  • Connector
  • Aspire workflow section. 


    Below are examples of how to configure the component, including how to create the Connection and the Seed. For the Connector, please check this page.


    Create Credential


    | Field | Required | Default | Multiple | Notes | Example |
    |---|---|---|---|---|---|
    | type | Yes | - | No | The value must be "Content Type Detector". | "Content Type Detector" |
    | description | Yes | - | No | Name of the credential object. | "Content Type Detector Credential" |
    | properties | Yes | - | No | Configuration object | |
    Example

    POST aspire/_api/credentials
    {
        "type": "<Connector Type>",
        "description": "<Connector Name> Credential",
        "properties": {
    		
        }
    }
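
As a rough illustration, the request above could be prepared from Python using only the standard library. The base URL `http://localhost:50443` and the helper name `build_request` are assumptions made for this sketch and are not taken from the Aspire documentation. Authentication is omitted, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: replace with the address of your own Aspire instance.
ASPIRE_BASE_URL = "http://localhost:50443"


def build_request(method, path, payload):
    """Build (but do not send) a JSON request for the Aspire REST API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{ASPIRE_BASE_URL}/{path.lstrip('/')}",
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Payload mirroring the "Create Credential" fields described above.
credential = {
    "type": "Content Type Detector",
    "description": "Content Type Detector Credential",
    "properties": {},
}
request = build_request("POST", "aspire/_api/credentials", credential)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL and body before anything reaches the server.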

    Update Credential

    | Field | Required | Default | Multiple | Notes | Example |
    |---|---|---|---|---|---|
    | id | Yes | - | No | Id of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
    | description | Yes | - | No | Name of the credential object. | "Content Type Detector Credential" |
    | properties | Yes | - | No | Configuration object | |

    Example

    PUT aspire/_api/credentials/2a5ca234-e328-4d40-bb2a-2df3e550b065
    {
        "id": "2a5ca234-e328-4d40-bb2a-2df3e550b065",
        "description": "<Connector Name> Credential",
        "properties": {
    		
        }
    }
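
In the update example above, the id appears both in the URL and in the body. A small sketch of deriving the path from the payload, so the two cannot drift apart, might look like this. The function name `update_target` is hypothetical, and whether Aspire rejects a mismatch between the two ids is not stated on this page.

```python
def update_target(collection, payload):
    """Return (method, path) for updating the entity described by payload."""
    entity_id = payload.get("id")
    if not entity_id:
        raise ValueError("an update payload needs an 'id'")
    # The path is built from the body's id, so both always agree.
    return "PUT", f"aspire/_api/{collection}/{entity_id}"


update = {
    "id": "2a5ca234-e328-4d40-bb2a-2df3e550b065",
    "description": "Content Type Detector Credential",
    "properties": {},
}
method, path = update_target("credentials", update)
```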

    Create Connection

    | Field | Required | Default | Multiple | Notes | Example |
    |---|---|---|---|---|---|
    | type | Yes | - | No | The value must be "Content Type Detector". | "Content Type Detector" |
    | description | Yes | - | No | Name of the connection object. | "My Content Type Detector Connection" |
    | throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
    | Ignore Delete Jobs | No | True | No | Option to skip delete jobs. | TRUE |
    | Fetch file | No | False | No | Select if you need to fetch a file. | FALSE |
    | Use default document path | No | True | No | Select so that Aspire will use the fetchUrl or displayUrl as the location of the file. | FALSE |
    | Document fetch path | Yes | - | No | Location in the Aspire document of the path to the file to fetch. | "/doc/fetchUrl" |
    | Max Lookahead in MBytes for type detection | Yes | 0.5 | No | Maximum amount of the file stream, in MB, to consume when detecting the type. | 0.5 |
    | Max percent of column variability to allow in text separated files | Yes | 0 | No | Maximum percentage of variability to allow in the number of columns. | 0 |
    | Apache Tika configuration path | No | None | No | Path to the Apache Tika configuration file. | "/path/to/tikaConfig.xml" |
    | routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
    | properties | Yes | - | No | Configuration object | |

    Example

    POST aspire/_api/connections
    {
        "type": "<Connector Type>",
        "description": "<Connector Name> Test Connector",
        "properties": {

        }
    }
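
The detector settings listed in the table above could be assembled into a `properties` object along these lines. This is only a sketch: the mapping from the table labels to JSON keys (`ignoreDeleteJobs`, `enableFetchUrl`, `defaultFetchPath`, `fetchPath`, `maxLookaheadSize`, `variabilityPercent`, `tikaConfig`) is inferred from the JSON examples elsewhere on this page, the helper name `build_properties` is hypothetical, and the 0 to 100 range check is an assumption about what a percentage may contain.

```python
# Defaults taken from the "Default" column of the table above.
DEFAULTS = {
    "ignoreDeleteJobs": True,
    "enableFetchUrl": False,
    "defaultFetchPath": True,
    "maxLookaheadSize": 0.5,
    "variabilityPercent": 0,
}
# Marked "Required" in the table and listed without a default.
REQUIRED = ("fetchPath",)


def build_properties(**overrides):
    """Merge caller overrides over the defaults and check required keys."""
    props = {**DEFAULTS, **overrides}
    missing = [key for key in REQUIRED if key not in props]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    # Assumption: a percentage is expected to lie between 0 and 100.
    if not 0 <= props["variabilityPercent"] <= 100:
        raise ValueError("variabilityPercent must be between 0 and 100")
    return props


properties = build_properties(
    fetchPath="/doc/fetchUrl",
    tikaConfig="/path/to/tikaConfig.xml",
)
```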

    Update Connection

    | Field | Required | Default | Multiple | Notes | Example |
    |---|---|---|---|---|---|
    | id | Yes | - | No | Id of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0" |
    | description | No | - | No | Name of the connection object. | "My Content Type Detector Connection" |
    | throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
    | routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
    | properties | Yes | - | No | Configuration object | |

    Example

    {
        "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
        "description": "<Connector Name> Test Connector",
        "properties": {
            "ignoreDeleteJobs": true
        }
    }

    Create Connector Instance

    To create the Connector object using the REST API, check this page.

    Update Connector Instance

    To update the Connector object using the REST API, check this page.

    Create Seed

    | Field | Required | Default | Multiple | Notes | Example |
    |---|---|---|---|---|---|
    | seed | Yes | - | No | <seed description> | |
    | type | Yes | - | No | The value must be "Content Type Detector". | "Content Type Detector" |
    | description | Yes | - | No | Name of the seed object. | "My Content Type Detector Seed" |
    | connector | Yes | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
    | connection | Yes | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
    | workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
    | throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
    | routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
    | tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"] |
    | properties | Yes | - | No | Configuration object | |

    Example

    POST aspire/_api/seeds
    {
        "type": "<Connector Type>",
        "seed": "directory",
        "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
        "description": "<connector>_Test_Seed",
        "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
        "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
        "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
        "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
        "tags": ["tag1", "tag2"],
        "properties": {
            "enableFetchUrl": false,
            "fetchPath": "/doc/fetchUrl",
            "maxLookaheadSize": 0.5,
            "variabilityPercent": 0,
            "tikaConfig": "/path/to/tikaConfig.xml"
        }
    }

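
A seed refers to other objects (connector, connection, workflows, policies) by id, so a mistyped id is an easy error to make. The sketch below checks that those ids are at least well formed before a request is sent. It assumes ids are always UUIDs, which matches every example on this page but is not stated as a rule, and the helper names are hypothetical.

```python
import uuid


def is_uuid(value):
    """Return True if value parses as a UUID string."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def check_seed_ids(seed):
    """Return the names of seed fields holding ids that are not valid UUIDs."""
    bad = [
        key
        for key in ("connector", "connection", "throttlePolicy")
        if key in seed and not is_uuid(seed[key])
    ]
    for key in ("workflows", "routingPolicies"):
        if any(not is_uuid(item) for item in seed.get(key, [])):
            bad.append(key)
    return bad


seed = {
    "type": "Content Type Detector",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
}
problems = check_seed_ids(seed)
```

This only catches malformed ids; it cannot tell whether an id refers to an object that actually exists on the server.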
    Update Seed


    | Field | Required | Default | Multiple | Notes | Example |
    |---|---|---|---|---|---|
    | id | Yes | - | No | Id of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
    | seed | No | - | No | <seed description> | |
    | description | No | - | No | Name of the seed. | "My Content Type Detector Seed" |
    | connector | No | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
    | Ignore Delete Jobs | No | True | No | Option to skip delete jobs. | TRUE |
    | Fetch file | No | False | No | Select if you need to fetch a file. | FALSE |
    | Use default document path | No | True | No | Select so that Aspire will use the fetchUrl or displayUrl as the location of the file. | FALSE |
    | Document fetch path | Yes | - | No | Location in the Aspire document of the path to the file to fetch. | "/doc/fetchUrl" |
    | Max Lookahead in MBytes for type detection | Yes | 0.5 | No | Maximum amount of the file stream, in MB, to consume when detecting the type. | 0.5 |
    | Max percent of column variability to allow in text separated files | Yes | 0 | No | Maximum percentage of variability to allow in the number of columns. | 0 |
    | Apache Tika configuration path | No | None | No | Path to the Apache Tika configuration file. | "/path/to/tikaConfig.xml" |
    | connection | No | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
    | workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
    | workflows.add | No | [ ] | Yes | The ids of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
    | workflows.remove | No | [ ] | Yes | The ids of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
    | throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
    | routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
    | routingPolicies.add | No | [ ] | Yes | The ids of the routing policies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
    | routingPolicies.remove | No | [ ] | Yes | The ids of the routing policies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"] |
    | tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag3"] |
    | tags.add | No | [ ] | Yes | The tags to add. | ["tag4"] |
    | tags.remove | No | [ ] | Yes | The tags to remove. | ["tag2"] |
    | properties | Yes | - | No | Configuration object | |

    Example

    PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
    {
        "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
        "seed": "<seed example>",
        "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
        "description": "<connector>_Test_Seed",
        "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
        "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
        "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
        "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
        "tags": ["tag", "tag2"],
        "properties": {
            "ignoreDeleteJobs": true,
            "enableFetchUrl": true,
            "defaultFetchPath": true,
            "fetchPath": "/doc/fetchUrl",
            "maxLookaheadSize": 0.5,
            "variabilityPercent": 0,
            "tikaConfig": "/path/to/tikaConfig.xml"
        }
    }
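
The `.add` and `.remove` variants in the table above (for example `tags.add` and `tags.remove`) allow a list to be changed incrementally instead of being replaced. One possible way to derive them from a current and a desired list is sketched below. The helper name `list_delta` is hypothetical, and sending these fields as flat keys in the JSON body is an assumption based on how the table names them.

```python
def list_delta(field, current, desired):
    """Return the '<field>.add' / '<field>.remove' entries that turn
    the current list into the desired one; empty changes are omitted."""
    current_set, desired_set = set(current), set(desired)
    delta = {}
    to_add = [item for item in desired if item not in current_set]
    to_remove = [item for item in current if item not in desired_set]
    if to_add:
        delta[f"{field}.add"] = to_add
    if to_remove:
        delta[f"{field}.remove"] = to_remove
    return delta


update = {"id": "2f287669-d163-4e35-ad17-6bbfe9df3778"}
# The seed currently has tag1 and tag2; tag1 and tag4 are wanted.
update.update(list_delta("tags", ["tag1", "tag2"], ["tag1", "tag4"]))
```

Here `update` ends up with the id plus `"tags.add": ["tag4"]` and `"tags.remove": ["tag2"]`. The same helper applies to `workflows` and `routingPolicies`.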