The Selenium Crawler can be configured using the REST API. It requires the following entities to be created:

  • Credential
  • Connection
  • Connector
  • Seed

Below are the examples of how to create the Credential, Connection and the Seed. For the Connector, please check this page.


Create Credential


Field | Required | Default | Multiple | Notes | Example
type | Yes | - | No | The value must be "selenium". | "selenium"
description | Yes | - | No | Name of the credential object. | "My Selenium Credential"
properties | Yes | - | No | Configuration object |
authenticationHandler | Yes | - | Yes | Authentication handlers | []
host | No | - | No | Hostname of the URLs to which this handler applies. If no hostname is set, the handler is used for all hosts. | "domain.com"
port | No | -1 | No | Port of the URL. If set to -1, any port is accepted. | 8080
loginUrl | Yes | - | No | URL of the login page | "http://yoursite/login.php"
user | Yes | - | No | Username | "admin"
password | Yes | - | No | User password | "password"
authenticationType | Yes | - | No | Authentication implementation the crawler uses to log in to the page (SIMPLE/SCRIPT) | "SIMPLE"




Simple Authentication
userSelectorType | Yes | Id | No | Username field selector type (Class, Css, Id, Name, XPath) | "Id"
userSelector | Yes | - | No | Username field: field on the login form where the username should be set | "txtUsername"
passwordSelectorType | Yes | Id | No | Password field selector type (Class, Css, Id, Name, XPath) | "Id"
passwordSelector | Yes | - | No | Password field: field on the login form where the password should be set | "txtPassword"






customField | No | - | Yes | Custom field | []
selectorType | Yes | Id | No | Type of selector used to locate the field within the page (Class, Css, Id, Name, XPath) | "Id"
selector | Yes | - | No | Value of the selector used to locate the field within the page. | "myField"
fieldType | No | Text | No | Type of field to locate within the page (Button, Checkbox, Select, RadioButton, Text) | "Text"
fieldValue | Yes | - | No | Value of the field | "myFieldValue"






submitSelectorType | Yes | Id | No | Type of selector used to locate the submit element within the page (Class, Css, Id, Name, XPath) | "Id"
submitSelector | Yes | - | - | Value of the selector used to locate the submit element within the page. | "btnSubmit"




Scripted authentication
authenticationScript | Yes | - | No | Groovy code with instructions to fill the login form (see the script template below) |

Authentication Script

// Write the instructions to fill the login form
//
// Available variables:
//
// - seedId, String. Id of the current seed.
// - logger, ALogger. Logger instance.
// - driver, WebDriver. Selenium WebDriver instance for controlling a browser.
// - loginUrl, String. Login URL.
// - username, String. Username
// - password, String. Password (decrypted). IMPORTANT: DO NOT LOG THIS VALUE


Field | Required | Default | Multiple | Notes | Example
verificationType | Yes | - | No | Verification style (SIMPLE, SCRIPT) | "SIMPLE"




Simple verification
verificationField | Yes | - | Yes | Fields to verify | []
fieldSelectorType | Yes | Id | No | Type of selector used to locate the field within the page (Class, Css, Id, Name, XPath) | "Id"
fieldSelector | Yes | - | No | Value of the selector used to locate the field within the page. | "myField"




Scripted verification
verificationScript | Yes | - | No | Groovy code with instructions to validate whether the login was successful. Must return a boolean (see the script template below). |

Verification Script

// Check if the session is still valid.
// Must return a boolean.
// If the session is invalid, a login shall be attempted
// 
// Available variables:
//
// - seedId, String. Id of the current seed.
// - logger, ALogger. Logger instance.
// - driver, WebDriver. Selenium WebDriver instance for controlling a browser.
// - loginUrl, String. Login URL.

return false;
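
The verification step can be read as a small control flow: check the session first, and attempt a login only when the check fails. The Java sketch below illustrates that reading. `SessionGuard`, `ensureLoggedIn`, and the two callbacks are hypothetical names used for illustration, not part of the crawler's API, and verifying a second time after the login attempt is an assumption.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of the verify-then-login flow; not crawler code.
class SessionGuard {
    /**
     * Runs the verification callback. If it reports an invalid session,
     * runs the login callback once and verifies again (assumed behavior).
     * Returns true when the session is valid at the end.
     */
    static boolean ensureLoggedIn(BooleanSupplier verification, Runnable login) {
        if (verification.getAsBoolean()) {
            return true;              // session still valid, no login needed
        }
        login.run();                  // stands in for SIMPLE or SCRIPT authentication
        return verification.getAsBoolean();
    }

    public static void main(String[] args) {
        boolean[] loggedIn = {false};
        boolean ok = ensureLoggedIn(() -> loggedIn[0], () -> loggedIn[0] = true);
        System.out.println("session valid: " + ok);
    }
}
```

Under this reading, a verification script that always returns false, like the template above, would trigger a login attempt on every check.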

Example

POST aspire/_api/credentials
{
    "type": "selenium",
    "description": "Selenium Desc",             
     "properties": {
       "authenticationHandler": [
          {
            "host": "domain.com",
            "port": 80,
            "loginUrl": "http://yoursite/login.php",
            "user": "user",
            "password": "password",
            "authenticationType": "SIMPLE",
            "userSelectorType": "Id",
            "userSelector": "txtUsername",
            "passwordSelectorType": "Id",
            "passwordSelector": "txtPassword",
            "customField": [],
            "submitSelectorType": "Id",
            "submitSelector": "btnSubmit",
            "verificationType": "SIMPLE",
            "verificationField": [
              {
                "fieldSelectorType": "Id",
                "fieldSelector": "myField"
               }
             ]
           }
         ]
    }
}
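
The `host` and `port` fields of an authentication handler decide which URLs the handler applies to: an unset host applies to all hosts, and a port of -1 accepts any port. The Java sketch below shows one way to read those two rules. `HandlerMatcher` is a hypothetical helper, and both the exact (case-insensitive) host comparison and the scheme default ports are assumptions, not documented crawler behavior.

```java
import java.net.URI;

// Hypothetical helper illustrating the documented host/port rules.
class HandlerMatcher {
    /**
     * @param handlerHost host configured on the handler; null or empty means "all hosts"
     * @param handlerPort port configured on the handler; -1 means "any port"
     */
    static boolean applies(String handlerHost, int handlerPort, URI url) {
        boolean hostMatches = handlerHost == null || handlerHost.isEmpty()
                || handlerHost.equalsIgnoreCase(url.getHost());
        if (!hostMatches) {
            return false;
        }
        return handlerPort == -1 || handlerPort == effectivePort(url);
    }

    // URI.getPort() returns -1 when the URL has no explicit port, so fall back
    // to the scheme default (assumption: 443 for https, 80 otherwise).
    static int effectivePort(URI url) {
        if (url.getPort() != -1) {
            return url.getPort();
        }
        return "https".equalsIgnoreCase(url.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://domain.com/login.php");
        System.out.println(applies("domain.com", 80, page));  // host and port match
        System.out.println(applies("", -1, page));            // catch-all handler
        System.out.println(applies("other.com", -1, page));   // different host
    }
}
```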

Update Credential


Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | ID of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
description | Yes | - | No | Name of the credential object. | "Selenium Credential"
properties | Yes | - | No | Configuration object (see Create Credential) |




Example

PUT aspire/_api/credentials/2a5ca234-e328-4d40-bb2a-2df3e550b065
{
    "id": "2a5ca234-e328-4d40-bb2a-2df3e550b065",
    "description": "SeleniumCredential",
      "properties": {
       "authenticationHandler": [
          {
            "host": "domain.com",
            "port": 80,
            "loginUrl": "http://yoursite/login.php",
            "user": "user",
            "password": "password",
            "authenticationType": "SIMPLE",
            "userSelectorType": "Id",
            "userSelector": "txtUsername",
            "passwordSelectorType": "Id",
            "passwordSelector": "txtPassword",
            "customField": [],
            "submitSelectorType": "Id",
            "submitSelector": "btnSubmit",
            "verificationType": "SIMPLE",
            "verificationField": [
              {
                "fieldSelectorType": "Id",
                "fieldSelector": "myField"
               }
             ]
           }
         ]
    } 
 }
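
The create and update calls differ only in the HTTP method and in whether the credential ID is appended to the path. The Java sketch below builds both requests with the JDK HTTP client without sending them. The base URL `http://localhost:50505` is an assumption; replace it with the address of your Aspire server. `CredentialRequests` is a hypothetical helper name, and any authentication the API itself may require is not covered.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (does not send) requests for the credential endpoints.
class CredentialRequests {
    // Assumed base URL of the Aspire server.
    static final String BASE_URL = "http://localhost:50505";

    /** POST aspire/_api/credentials with the given JSON body. */
    static HttpRequest create(String json) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/aspire/_api/credentials"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    /** PUT aspire/_api/credentials/{id} with the given JSON body. */
    static HttpRequest update(String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/aspire/_api/credentials/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String body = "{}"; // placeholder: use the JSON document from the example above
        HttpRequest request = update("2a5ca234-e328-4d40-bb2a-2df3e550b065", body);
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```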

Create Connection


Field | Required | Default | Multiple | Notes | Example
type | Yes | - | No | The value must be "selenium". | "selenium"
description | Yes | - | No | Name of the connection object. | "My Selenium Connection"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "6b235b333a1b"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this connection will use. | ["17f75ce7d0c7", "d42780003b36"]
credential | Yes | - | No | ID of the credential | "6b235b333a1b"
properties | Yes | - | No | Configuration object |

Web driver
webDriverImplementation | Yes | CHROME | No | Web driver implementation to use. This determines the browser that Selenium will control (CHROME, FIREFOX). | "CHROME"
webDriverPath | Yes | - | No | Path to the web driver. Once a driver implementation is selected, a path to the driver executable must be provided. | "/driver/chromedriver.exe"
headlessMode | No | true | No | Enables headless mode. When the connector runs in headless mode, no browser UI window is displayed. | true
Scope
crawlScope | Yes | HOST | No | Determines the scope of the crawl. The values are HOST, EVERYTHING, CUSTOM. EVERYTHING allows all URLs to be crawled. HOST allows only URLs like 'abc.domain.com' when your seeds contain 'abc.domain.com'. CUSTOM allows you to add one or more regular expression patterns to match the host name. These scopes can be extended by the include patterns. | "HOST"
scopePattern | Yes when crawlScope is "CUSTOM" | - | Yes | Patterns for the CUSTOM scope. Any URL matching one of these patterns is included in the scope. Patterns are evaluated against the document URL. | [".*\\.example.com", ".*\\.another.com"]
obeyRobots | No | true | No | If checked, the crawler obeys the robots.txt restrictions of each site. | true
caseSensitiveUrls | No | true | No | If unchecked, all URLs are converted to lower case before processing. | true
maxHops | Yes | 5 | No | Crawl depth: how many hops from the seed the crawler is allowed to follow links. | 5
includes | No | - | Yes | The document is processed by the connector if it matches one of these patterns. Patterns are evaluated against the document URL. | ".*\\.pdf"
excludes | No | - | Yes | The document is not processed by the connector if it matches one of these patterns. Patterns are evaluated against the document URL. | ".*\\.xml"
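
The three `crawlScope` values can be modeled as a simple check on a candidate URL. The Java sketch below is a hypothetical model, not the crawler's implementation. It assumes that CUSTOM patterns must match the whole host name (this page describes them both as matching the host name and as evaluated against the document URL), and it does not model how include patterns extend the scope. `CrawlScopeCheck` and `inScope` are illustrative names.

```java
import java.net.URI;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of the three crawlScope values; not the crawler's code.
class CrawlScopeCheck {
    static boolean inScope(String crawlScope, URI seed, URI candidate, List<String> scopePatterns) {
        String host = candidate.getHost();
        if (host == null) {
            return false;
        }
        switch (crawlScope) {
            case "EVERYTHING":
                return true;
            case "HOST":
                // Only URLs on the same host as the seed.
                return host.equalsIgnoreCase(seed.getHost());
            case "CUSTOM":
                // Assumption: a pattern must match the whole host name.
                for (String pattern : scopePatterns) {
                    if (Pattern.matches(pattern, host)) {
                        return true;
                    }
                }
                return false;
            default:
                throw new IllegalArgumentException("Unknown crawlScope: " + crawlScope);
        }
    }

    public static void main(String[] args) {
        URI seed = URI.create("https://abc.domain.com/");
        List<String> patterns = List.of(".*\\.example.com", ".*\\.another.com");
        System.out.println(inScope("HOST", seed, URI.create("https://abc.domain.com/page"), patterns));
        System.out.println(inScope("CUSTOM", seed, URI.create("https://www.example.com/a"), patterns));
        System.out.println(inScope("CUSTOM", seed, URI.create("https://example.com/a"), patterns));
    }
}
```

Under the whole-host assumption, the example pattern `.*\\.example.com` requires a dot before `example`, so it matches `www.example.com` but not the bare `example.com`.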
Document processing
cleanupRule | No | - | Yes (see fields below) | Content cleanup rules. Specific behavior applies to the URLs that match the following patterns. |
pattern | Yes | - | No | The URL is matched against this pattern to check whether it should be cleansed. | ".*\\.xml$"
contentType | Yes | - | No | Regular expression evaluated against the document MIME type to check whether the document should be cleansed. | "text/html\\.*"
noIndexClass | No | - | No | Comma-separated list of CSS classes that will be removed from the page content. | "noindex, nofollow"
cleanupPattern | No | - | No | Regular expression; matching text is removed from the page content. | "<!-- noindex -->.*<!-- /noindex -->"
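
The effect of a `cleanupPattern` can be tried out with plain Java regular expressions. The sketch below is a minimal illustration, not the crawler's implementation. `CleanupPatternDemo` is a hypothetical name, and compiling the pattern with `DOTALL` (so that `.` also spans line breaks) is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of removing text that matches a cleanupPattern.
class CleanupPatternDemo {
    // Assumption: DOTALL, so '.' also matches line breaks inside the marked region.
    static String cleanup(String content, String cleanupPattern) {
        return Pattern.compile(cleanupPattern, Pattern.DOTALL).matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>keep</p><!-- noindex --><nav>menu</nav><!-- /noindex --><p>also keep</p>";
        // The marked region and both markers are removed.
        System.out.println(cleanup(html, "<!-- noindex -->.*<!-- /noindex -->"));
    }
}
```

With the greedy `.*` of the example pattern, a page that contains several marked regions loses everything between the first opening marker and the last closing marker. A reluctant quantifier (`.*?`) removes each region separately.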






pageSettings | No | - | Yes (see fields below) | Page settings. The crawler applies the following behavior to the URLs that match the patterns. |
urlPattern | Yes | - | No | The URL is matched against this pattern to check whether these page settings apply. | ".*\\.xml$"
cooldown | Yes | 1s | No | Time to wait for the page to finish loading before further processing. | "1s"
useLinkExtractionScript | No | false | No | Override link extraction logic. Check this field to override the way the crawler extracts links from a page. | false
linkExtractionScript | No | - | No | Link extraction script. Script with the instructions to extract the links of a page (see the script below). |

/* Add the discovered URLs to the variable 'discoveredUrls'
 *
 * Available variables:
 *  - seedId: String, Id of the current seed
 *  - logger: ALogger, logger implementation
 *  - driver: WebDriver, Selenium WebDriver instance (used below to locate elements)
 *  - discoveredUrls: List<String>, List with all the URLs discovered in the current page
 */
import com.accenture.aspire.framework.utilities.StringUtilities;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriverException;

def pageLinks = [];

try {
    pageLinks = driver.findElements(By.tagName("a"));
}
catch (WebDriverException wde) {
    /* Do Nothing */
}

try {
    pageLinks = pageLinks + driver.findElements(By.xpath("//*[@src]"));
}
catch (WebDriverException wde) {
    /* Do Nothing */
}

pageLinks.each { url ->
    String link = url.getAttribute("href");
    
    if (StringUtilities.isEmpty(link))
        link = url.getAttribute("src");
    
    discoveredUrls.add(link);
}
Crawler
maxContentSize | Yes | 10mb | No | Max content size for a page. Maximum content size allowed to be fetched. | "15mb"
showNon200AsErrors | No | true | No | Show 400s and 500s status codes as errors. Uncheck if you want to only mark those URLs as "excluded" instead of "errored". | true
stopOnScanError | No | true | No | Stop on scan error. If checked, scan errors stop the crawl from continuing. | true
logCrawledUrls | No | true | No | Log crawled URLs. If checked, a log with all the crawled URLs is created. | true
debugContentOutput | No | false | No | Write contents to file. If checked, the crawler writes every page to the local file system. The files are created in the folder "data/CONTENT-SOURCE-NAME/output". | true
incrementalUrlCleanupRegex | No | - | No | URL cleanup for incremental crawls. Regex for cleaning up the URL in case of dynamically generated parameters. This prevents incremental crawls from treating URLs as different documents when the only difference is a dynamic parameter. For example, http://myhost/my-page.html?mydinamic=123456 is transformed to http://myhost/my-page.html for incremental purposes, but the original URL is still used for fetching. | "\\?.*"
excludeMultimedia | No | true | No | Reject images, videos, JavaScript, and CSS. If checked, js, css, swf, gif, png, jpg, jpeg, bmp, mp3, mp4, avi, mpg and mpeg files are excluded from the crawl. | false
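
The `incrementalUrlCleanupRegex` example can be reproduced with a plain regex replacement. The Java sketch below assumes that the text matched by the regex is simply removed from the URL, which is consistent with the documented example but is not confirmed as the crawler's implementation. `IncrementalUrlCleanup` and `incrementalKey` are hypothetical names.

```java
// Hypothetical helper; the crawler's internal implementation is not shown on this page.
class IncrementalUrlCleanup {
    /** Removes every match of cleanupRegex from the URL; an empty regex leaves the URL unchanged. */
    static String incrementalKey(String url, String cleanupRegex) {
        if (cleanupRegex == null || cleanupRegex.isEmpty()) {
            return url;
        }
        return url.replaceAll(cleanupRegex, "");
    }

    public static void main(String[] args) {
        String fetched = "http://myhost/my-page.html?mydinamic=123456";
        // The Java string "\\?.*" is the regex \?.* (a literal '?' and everything after it).
        System.out.println(incrementalKey(fetched, "\\?.*"));
    }
}
```

Two URLs that differ only in their query string map to the same key, so an incremental crawl would treat them as the same document.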
Network
connectionTimeout | Yes | 10s | No | Timeout used when a connection is established. | "20s"
connectionRequestTimeout | Yes | 10s | No | Timeout used when requesting a connection from the connection manager. | "15s"
socketTimeout | Yes | 10s | No | Timeout used for waiting for data (maximum period of inactivity between two consecutive data packets). | "10s"
useProxy | No (see fields below) | false | No | Use proxy. If checked, the crawler connects through a proxy. | true
proxyHost | Yes | - | No | Proxy hostname | "your-proxy.domain.com"
proxyPort | Yes | 8080 | No | Proxy port | 8080
proxyAuthentication | No (if yes, see fields below) | false | No | Use proxy authentication | true
proxyUser | Yes | - | No | Proxy username | "user"
proxyPassword | Yes | - | No | Proxy password | "password"
Security
staticAcl | No | - | Yes (see fields below) | Static ACLs. These ACLs are added to all of the documents. |
name | Yes | - | No | Name of the ACL | "john.doe"
domain | No | - | No | Domain to which the ACL belongs | "domain"
entity | No | user | No | Whether this ACL is for a group or a user (user/group) | "group"
access | No | allow | No | Whether this ACL allows or denies access to crawled files (allow/deny) | "deny"

Example

POST aspire/_api/connections
{
   "type": "selenium",
   "description": "Selenium",
   "properties": {
     "webDriverImplementation": "CHROME",
     "webDriverPath": "/tmp/ach1/driver/chromedriver.exe",
     "headlessMode": true,
     "crawlScope": "CUSTOM",
     "scopePattern": [
       ".*\\.example.com",
       ".*\\.another.com"
     ],
     "obeyRobots": true,
     "caseSensitiveUrls": true,
     "maxHops": 5,
     "includes": ".*\\.pdf",
     "excludes": ".*\\.xml",
     "pageSettings": [
       {
         "urlPattern": ".*\\.xml$",
         "cooldown": "1s",
         "useLinkExtractionScript": true,
         "linkExtractionScript": "/* Add the discovered URLs to the variable 'discoveredUrls'\r\n *\r\n * Avalable variables:\r\n *  - seedId: String, Id of the current seed\r\n *  - logger: ALogger, logger implementation\r\n *  - discoveredUrls: List<String>, List with all the urls discovered in the current page\r\n */\r\nimport com.accenture.aspire.framework.utilities.StringUtilities;\r\nimport org.openqa.selenium.WebDriverException;\r\n\r\ndef pageLinks = [];\r\n\r\ntry {\r\n    pageLinks = driver.findElements(By.tagName(\"a\"));\r\n}\r\ncatch (WebDriverException wde) {\r\n    /* Do Nothing */\r\n}\r\n\r\ntry {\r\n    pageLinks = pageLinks + driver.findElements(By.xpath(\"//*[@src]\"));\r\n}\r\ncatch (WebDriverException wde) {\r\n    /* Do Nothing */\r\n}\r\n\r\npageLinks.each { url ->\r\n    String link = url.getAttribute(\"href\");\r\n    \r\n    if (StringUtilities.isEmpty(link))\r\n        link = url.getAttribute(\"src\");\r\n    \r\n    discoveredUrls.add(link);\r\n}"
         },
       {
        "urlPattern": ".*tmp[^/]$",
        "cooldown": "1s",
        "useLinkExtractionScript": false
        }
      ],
      "cleanupRule": [
        {
          "pattern": ".*\\.xml$",
          "contentType": "text/html\\.*",
          "noIndexClass": "",
          "cleanupPattern": ""
         },
         {
           "pattern": ".*tmp[^/]$",
           "contentType": "text/html\\.*",
           "noIndexClass": "",
           "cleanupPattern": ""
          }
        ],
        "maxContentSize": "10mb",
        "showNon200AsErrors": false,
        "stopOnScanError": true,
        "logCrawledUrls": true,
        "debugContentOutput": false,
        "incrementalUrlCleanupRegex": "",
        "excludeMultimedia": true,
        "connectTimeout": "10s",
        "connectionRequestTimeout": "10s",
        "socketTimeout": "10s",
        "useProxy": true,
        "proxyHost": "your-proxy.domain.com",
        "proxyPort": 8080,
        "proxyAuthentication": true,
        "proxyUser": "admin",
        "proxyPassword": "password",
        "staticAcl": [
          {
           "name": "john.doe",
           "domain": "",
           "entity": "user",
           "access": "allow"
          },
          {
            "name": "jana.doe",
            "domain": "",
            "entity": "user",
            "access": "allow"
          }
        ]
     }
}

Update Connection


Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | ID of the connection to update | "d442adcab4b0"
description | No | - | No | Name of the connection object. | "My Selenium Connection"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this connection will use. | ["17f75ce7d0c7", "d42780003b36"]
credential | No | - | No | ID of the credential | "6b235b333a1b"
properties | No | - | No | Configuration object (see Create Connection) |




Example

PUT aspire/_api/connections/89d6632a-a296-426c-adb0-d442adcab4b0
{
   "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
   "description": "Selenium",
    "properties": {
     "webDriverImplementation": "CHROME",
     "webDriverPath": "/tmp/ach1/driver/chromedriver.exe",
     "headlessMode": true,
     "crawlScope": "CUSTOM",
     "scopePattern": [
       ".*\\.example.com",
       ".*\\.another.com"
     ],
     "obeyRobots": true,
     "caseSensitiveUrls": true,
     "maxHops": 5,
     "includes": ".*\\.pdf",
     "excludes": ".*\\.xml",
     "pageSettings": [
       {
         "urlPattern": ".*\\.xml$",
         "cooldown": "1s",
         "useLinkExtractionScript": true,
         "linkExtractionScript": "/* Add the discovered URLs to the variable 'discoveredUrls'\r\n *\r\n * Avalable variables:\r\n *  - seedId: String, Id of the current seed\r\n *  - logger: ALogger, logger implementation\r\n *  - discoveredUrls: List<String>, List with all the urls discovered in the current page\r\n */\r\nimport com.accenture.aspire.framework.utilities.StringUtilities;\r\nimport org.openqa.selenium.WebDriverException;\r\n\r\ndef pageLinks = [];\r\n\r\ntry {\r\n    pageLinks = driver.findElements(By.tagName(\"a\"));\r\n}\r\ncatch (WebDriverException wde) {\r\n    /* Do Nothing */\r\n}\r\n\r\ntry {\r\n    pageLinks = pageLinks + driver.findElements(By.xpath(\"//*[@src]\"));\r\n}\r\ncatch (WebDriverException wde) {\r\n    /* Do Nothing */\r\n}\r\n\r\npageLinks.each { url ->\r\n    String link = url.getAttribute(\"href\");\r\n    \r\n    if (StringUtilities.isEmpty(link))\r\n        link = url.getAttribute(\"src\");\r\n    \r\n    discoveredUrls.add(link);\r\n}"
         },
       {
        "urlPattern": ".*tmp[^/]$",
        "cooldown": "1s",
        "useLinkExtractionScript": false
        }
      ],
      "cleanupRule": [
        {
          "pattern": ".*\\.xml$",
          "contentType": "text/html\\.*",
          "noIndexClass": "",
          "cleanupPattern": ""
         },
         {
           "pattern": ".*tmp[^/]$",
           "contentType": "text/html\\.*",
           "noIndexClass": "",
           "cleanupPattern": ""
          }
        ],
        "maxContentSize": "10mb",
        "showNon200AsErrors": false,
        "stopOnScanError": true,
        "logCrawledUrls": true,
        "debugContentOutput": false,
        "incrementalUrlCleanupRegex": "",
        "excludeMultimedia": true,
        "connectTimeout": "10s",
        "connectionRequestTimeout": "10s",
        "socketTimeout": "10s",
        "useProxy": true,
        "proxyHost": "your-proxy.domain.com",
        "proxyPort": 8080,
        "proxyAuthentication": true,
        "proxyUser": "admin",
        "proxyPassword": "password",
        "staticAcl": [
          {
           "name": "john.doe",
           "domain": "",
           "entity": "user",
           "access": "allow"
          },
          {
            "name": "jana.doe",
            "domain": "",
            "entity": "user",
            "access": "allow"
          }
        ]
     }
}

Create Connector


For the creation of the Connector object using the REST API, check this page.

Update Connector


For the update of the Connector object using the REST API, check this page.

Create Seed


Field | Required | Default | Multiple | Notes | Example
seed | Yes | - | No | URL where the crawl will start. | "https://example.com"
type | Yes | - | No | The value must be "selenium". | "selenium"
description | Yes | - | No | Name of the seed object. | "My Selenium seed"
connector | Yes | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "e3ca414b0d31"
connection | Yes | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "e4a663fe9ee6"
workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "6b235b333a1b"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["17f75ce7d0c7", "d42780003b36"]
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"]
properties | Yes | - | No | Configuration object |
isSitemap | No | false | No | Sitemap URL. Check if the start URL is for a sitemap. | false

Example

POST aspire/_api/seeds
{
    "type": "selenium",
    "seed": "https://www.autoopravna-lahoda.cz/",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "Selenium_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {          
       "isSitemap": false
    }
}
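
A seed ties together an existing connector and connection, so a client usually assembles the request body from their IDs. The Java sketch below builds a body with the required fields plus `workflows`, using only the JDK. `SeedPayload` and its methods are hypothetical helpers, and the escaping handles only quotes and backslashes; a JSON library is the safer choice in real code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that assembles the JSON body for POST aspire/_api/seeds.
class SeedPayload {
    // Minimal JSON string escaping: backslashes and double quotes only.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String array(List<String> values) {
        return values.stream().map(SeedPayload::quote).collect(Collectors.joining(", ", "[", "]"));
    }

    static String build(String startUrl, String description, String connectorId,
                        String connectionId, List<String> workflowIds, boolean isSitemap) {
        return "{"
                + "\"type\": \"selenium\", "
                + "\"seed\": " + quote(startUrl) + ", "
                + "\"description\": " + quote(description) + ", "
                + "\"connector\": " + quote(connectorId) + ", "
                + "\"connection\": " + quote(connectionId) + ", "
                + "\"workflows\": " + array(workflowIds) + ", "
                + "\"properties\": {\"isSitemap\": " + isSitemap + "}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(build("https://example.com", "My Selenium seed",
                "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
                "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
                List.of("f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"), false));
    }
}
```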

Update Seed


Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | ID of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
seed | Yes | - | No | URL where the crawl will start. | "https://example.com"
description | No | - | No | Name of the seed object. | "My Selenium Seed"
connector | No | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | No | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag3"]
properties | Yes | - | No | Configuration object (see Create Seed) |

Example

PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "seed": "https://example.com",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "Selenium_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag", "tag2"],
    "properties": {
        "isSitemap": false
    }
}