The File System Connector can be configured using the REST API. It requires the following entities to be created:

  • Connection
  • Connector
  • Seed

Below are examples of how to create the Connection and the Seed. For the Connector, please check this page.


Create Connection


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| type | Yes | - | No | The value must be "filesystem". | "filesystem" |
| description | Yes | - | No | Name of the connection object. | "MyFileSystemConnection" |
| throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| properties | Yes | - | No | Configuration object. | |
| url | Yes | - | No | Path of the base directory to crawl. All the seeds will be prefixed with this value to form the full path. | "C:\\Directory" |
| ignoreSymLinks | No | false | No | If enabled, symbolic links will not be processed and links in the root items will cause an error. | true / false |
| stopOnScanError | No | true | No | If enabled, the crawl will stop if there is an error on the scan phase. | true / false |
| indexContainers | No | false | No | Enable to index the directories. | true / false |
| scanRecursively | No | true | No | Enable to scan discovered directories recursively. | true / false |
| include | No | [ ] | Yes | Patterns to match against the document URL. If any of them match, the document will be included in the crawl. | [ ".*pdf$", ".*docx$" ] |
| exclude | No | [ ] | Yes | Patterns to match against the document URL. If any of them match, the document will be excluded from the crawl. | [ ".*png$", ".*jpeg$" ] |
| scanExcludedItems | No | false | No | Enable to force the scan of excluded directories, so child items within the scope can be found. | true / false |
| staticAcl | No | [ ] | Yes | Static ACL configuration object. | |
| name | Yes | - | No | Name of the static ACL. | "group1" |
| domain | No | "" | No | Domain of the static ACL. | "testDomain" |
| entity | No | "user" | No | Entity (user / group) represented by the static ACL. | "user" / "group" |
| access | No | "allow" | No | Access (allow / deny) granted by the ACL. | "allow" / "deny" |
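The include and exclude rules described in the table can be illustrated with a small sketch. It is not the connector's implementation: Python's `re` module stands in for whatever regex engine the connector uses, and two behaviours are assumptions made for the illustration (an exclude match wins over an include match, and an empty include list accepts everything). The function name `in_scope` is hypothetical.

```python
import re


def in_scope(url, include=(), exclude=()):
    """Return True if a document URL passes the include/exclude patterns."""
    # Assumption: an exclude match always wins over an include match.
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # Assumption: an empty include list means "include everything".
    if not include:
        return True
    return any(re.search(pattern, url) for pattern in include)


# Patterns taken from the Example column of the table.
include = [r".*pdf$", r".*docx$"]
exclude = [r".*png$", r".*jpeg$"]

print(in_scope(r"C:\Directory\report.pdf", include, exclude))  # True
print(in_scope(r"C:\Directory\logo.png", include, exclude))    # False
```

Testing patterns this way before a crawl can catch a missing `$` or an unescaped dot early, although the connector's own matching remains the final word.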

Example

POST aspire/_api/connections
{
    "type": "filesystem",
    "description": "FileSystem Test Connector",
    "properties": {
        "url": "C:\\Directory",
        "ignoreSymLinks": true,
        "stopOnScanError": true,
        "indexContainers": true,
        "scanExcludedItems": true,
        "includes": ".*\\.txt",
        "excludes": ".*\\.png",
        "staticAcl": [{
                "name": "test-user",
                "domain": "test-domain",
                "entity": "user",
                "access": "allow"
            }, {
                "name": "test-group",
                "domain": "",
                "entity": "group",
                "access": "deny"
            }
        ]
    }
}
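As a rough illustration of sending such a request from a script, the sketch below builds the POST with Python's standard library and checks the fields the table marks as required. The base URL (host and port) and the function name `build_connection_request` are assumptions for the example, authentication is left out, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical server address; replace with the address of your Aspire server.
BASE_URL = "http://localhost:50505"


def build_connection_request(base_url, payload):
    """Build (but do not send) a POST request that creates a connection."""
    # Fields marked as required in the Create Connection table.
    missing = [key for key in ("type", "description", "properties") if key not in payload]
    if "url" not in payload.get("properties", {}):
        missing.append("properties.url")
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return urllib.request.Request(
        f"{base_url}/aspire/_api/connections",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "type": "filesystem",
    "description": "FileSystem Test Connector",
    "properties": {"url": "C:\\Directory", "stopOnScanError": True},
}
request = build_connection_request(BASE_URL, payload)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```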

Update Connection


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| id | Yes | - | No | Id of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0" |
| type | Yes | - | No | The value must be "filesystem". | "filesystem" |
| description | No | - | No | Name of the connection object. | "MyFileSystemConnection" |
| throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| properties | Yes | - | No | Configuration object. | |
| url | Yes | - | No | Path of the base directory to crawl. All the seeds will be prefixed with this value to form the full path. | "C:\\Directory" |
| ignoreSymLinks | No | false | No | If enabled, symbolic links will not be processed and links in the root items will cause an error. | true / false |
| stopOnScanError | No | true | No | If enabled, the crawl will stop if there is an error on the scan phase. | true / false |
| indexContainers | No | false | No | Enable to index the directories. | true / false |
| scanRecursively | No | true | No | Enable to scan discovered items recursively. | true / false |
| include | No | [ ] | Yes | Patterns to match against the document URL. If any of them match, the document will be included in the crawl. | [ ".*pdf$", ".*docx$" ] |
| exclude | No | [ ] | Yes | Patterns to match against the document URL. If any of them match, the document will be excluded from the crawl. | [ ".*png$", ".*jpeg$" ] |
| scanExcludedItems | No | false | No | Enable to force the scan of excluded directories, so child items within the scope can be found. | true / false |
| staticAcl | No | [ ] | Yes | Static ACL configuration object. | |
| name | Yes | - | No | Name of the static ACL. | "group1" |
| domain | No | "" | No | Domain of the static ACL. | "testDomain" |
| entity | No | "user" | No | Entity (user / group) represented by the static ACL. | "user" / "group" |
| access | No | "allow" | No | Access (allow / deny) granted by the ACL. | "allow" / "deny" |
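The staticAcl defaults listed in the table (empty domain, "user" entity, "allow" access) can be made explicit on the client side before sending a request. The helper below is a hypothetical sketch of that idea; the name `normalize_static_acl` and the validation it performs are not part of the API.

```python
def normalize_static_acl(entry):
    """Fill in the documented defaults for one staticAcl entry and validate it."""
    if not entry.get("name"):
        raise ValueError("a staticAcl entry requires a 'name'")
    acl = {
        "name": entry["name"],
        "domain": entry.get("domain", ""),      # documented default: ""
        "entity": entry.get("entity", "user"),  # documented default: "user"
        "access": entry.get("access", "allow"), # documented default: "allow"
    }
    if acl["entity"] not in ("user", "group"):
        raise ValueError("entity must be 'user' or 'group'")
    if acl["access"] not in ("allow", "deny"):
        raise ValueError("access must be 'allow' or 'deny'")
    return acl


print(normalize_static_acl({"name": "group1"}))
# {'name': 'group1', 'domain': '', 'entity': 'user', 'access': 'allow'}
```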

Example

PUT aspire/_api/connections/89d6632a-a296-426c-adb0-d442adcab4b0
{
    "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
    "type": "filesystem",
    "description": "FileSystem Test Connector",
    "throttlePolicy": "f5587cee-9116-4011-b3a9-6b235b333a1b",
    "properties": {
        "url": "C:\\Directory",
        "ignoreSymLinks": true,
        "stopOnScanError": true,
        "indexContainers": true,
        "scanRecursively": true,
        "scanExcludedItems": true,
        "includes": ".*\\.txt",
        "excludes": ".*\\.png",
        "staticAcl": [{
                "name": "test-user",
                "domain": "test-domain",
                "entity": "user",
                "access": "allow"
            }, {
                "name": "test-group",
                "domain": "",
                "entity": "group",
                "access": "deny"
            }
        ]
    }
}
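An update differs from a create mainly in the HTTP method and in the id, which appears both in the path and in the body. A minimal sketch of keeping the two in sync follows; the base URL and the function name `build_connection_update` are assumptions, authentication is omitted, and the request is only built, not sent.

```python
import json
import urllib.request


def build_connection_update(base_url, connection_id, properties, **fields):
    """Build (but do not send) a PUT request that updates a connection."""
    # The same id goes into the path and the body so they cannot drift apart.
    payload = {"id": connection_id, "type": "filesystem", "properties": properties}
    payload.update(fields)
    return urllib.request.Request(
        f"{base_url}/aspire/_api/connections/{connection_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


update = build_connection_update(
    "http://localhost:50505",  # hypothetical server address
    "89d6632a-a296-426c-adb0-d442adcab4b0",
    {"url": "C:\\Directory", "scanRecursively": True},
    description="FileSystem Test Connector",
)
print(update.get_method(), update.full_url)
```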

Create Connector


To create the Connector object using the REST API, check this page.

Update Connector


To update the Connector object using the REST API, check this page.

Create Seed


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| seed | Yes | - | No | The subdirectory to crawl. This value will be appended to the url of the connection. | "directory" |
| type | Yes | - | No | The value must be "filesystem". | "filesystem" |
| description | Yes | - | No | Name of the seed object. | "MyFileSystemConnection" |
| connector | Yes | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
| connection | Yes | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
| workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"] |

Example

POST aspire/_api/seeds
{
    "type": "filesystem",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "FileSystem_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "seed": "directory"
    }
}
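Because the seed is appended to the url of the connection, it can help to preview the resulting path. The sketch below assumes Windows-style joining, since the example url is a Windows path; the connector's actual joining rule is not described here, and the function name `full_seed_path` is hypothetical.

```python
import ntpath


def full_seed_path(connection_url, seed):
    """Preview the path crawled for a seed: connection url followed by the seed."""
    # Strip leading separators so the seed is always treated as relative
    # to the connection url (assumption made for this sketch).
    return ntpath.join(connection_url, seed.lstrip("\\/"))


print(full_seed_path("C:\\Directory", "directory"))  # C:\Directory\directory
```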

Update Seed


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| id | Yes | - | No | Id of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
| seed | No | - | No | The subdirectory to crawl. This value will be appended to the url of the connection. | "directory" |
| description | No | - | No | Name of the seed object. | "MyFileSystemConnection" |
| connector | No | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
| connection | No | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
| workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| workflows.add | No | [ ] | Yes | The ids of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| workflows.remove | No | [ ] | Yes | The ids of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| routingPolicies.add | No | [ ] | Yes | The ids of the routing policies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| routingPolicies.remove | No | [ ] | Yes | The ids of the routing policies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"] |
| tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag3"] |
| tags.add | No | [ ] | Yes | The tags to add. | ["tag4"] |
| tags.remove | No | [ ] | Yes | The tags to remove. | ["tag2"] |

Example

PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "type": "filesystem",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "FileSystem_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag", "tag2"],
    "properties": {
        "seed": "directory"
    }
}
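The `.add` and `.remove` variants of workflows, routingPolicies and tags change a list incrementally instead of replacing it. The sketch below shows one plausible reading of that behaviour; the ordering (remove first, then add) and the de-duplication are assumptions for illustration, not documented server behaviour, and `apply_list_update` is a hypothetical name.

```python
def apply_list_update(current, add=(), remove=()):
    """Return the list that results from removing and then adding items."""
    # Assumption: removals are applied before additions.
    result = [item for item in current if item not in remove]
    for item in add:
        # Assumption: adding an item that is already present has no effect.
        if item not in result:
            result.append(item)
    return result


tags = ["tag1", "tag2", "tag3"]
print(apply_list_update(tags, add=["tag4"], remove=["tag2"]))
# ['tag1', 'tag3', 'tag4']
```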