The following authentication types are supported for crawled repositories:
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
type | Yes | - | No | The value must be "rest-api". | "rest-api" |
description | Yes | - | No | Name of the credential object. | "My REST Credential" |
properties | Yes | - | No | Configuration object | |
type | Yes | - | No | Authentication type: basic, apiToken, bearer, or none. | "basic" |
type: basic | |||||
loginAccount | Yes | - | No | User name. | "admin" |
password | Yes | - | No | Password (can be encrypted in Aspire fashion) | "adminPassword" |
type: apiToken | |||||
headerName | Yes | - | No | The name of the HTTP header field to be sent with a request | "tokenName1" |
headerValue | Yes | - | No | The value of the "headerName" field | "tokenValue1" |
type: bearer | |||||
preExpirationLimitInMs | Yes | 0 | No | Pre-expiration limit: the time (in ms) used to calculate when to request a new accessToken | 5000 |
query | Yes | - | No | Bearer query: a JSON object describing the request sent to obtain the accessToken | |
urlTemplate | Yes | - | No | The context path of the URL | "/login" |
method | Yes | - | No | HTTP method. Must be POST in this version | "POST" |
body | Yes | - | No | The query body. The fields ${loginAccount} and ${password} are expected to be used as part of the body. | "{\"username\" : \"${username}\",\"password\" : \"${password}\"}" |
queryType | Yes | - | No | Use the value "metadataExtraction" here | "metadataExtraction" |
resultField | Yes | - | No | The field in the response that holds the access token | "accessToken" |
loginAccount | Yes | - | No | User name. Used as a value for ${loginAccount} query body field | "admin" |
password | Yes | - | No | Password. Used as a value for ${password} query body field | "adminPassword" |
{ "type": "rest-api", "description": "My credential", "properties": { "type": "bearer", "query": { "urlTemplate": "/login", "method": "POST", "body": "{\"username\" : \"${username}\",\"password\" : \"${password}\"}", "queryType": "metadataExtraction", "resultField": "accessToken", "username": "admin", "password": "encrypted:xxxxx" } } }
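The bearer query body is a JSON document stored inside a JSON string, so hand-written escaping is easy to get wrong. The following is a minimal Python sketch of one way to assemble such a payload and let the serializer do the escaping. The function names `build_bearer_credential` and `should_refresh` are hypothetical, the placement of `username` and `password` mirrors the example above, and the refresh rule (request a new token once the current time is within `preExpirationLimitInMs` of expiry) is an assumption about how the limit is applied, not a statement of the connector's implementation.

```python
import json


def build_bearer_credential(description, login_path, username, password,
                            result_field="accessToken"):
    """Assemble a bearer credential payload shaped like the example above."""
    # The login body is JSON stored as a string. Serializing it separately
    # means the outer json.dumps call produces the \" escaping for us.
    login_body = json.dumps(
        {"username": "${username}", "password": "${password}"}
    )
    return {
        "type": "rest-api",
        "description": description,
        "properties": {
            "type": "bearer",
            "query": {
                "urlTemplate": login_path,
                "method": "POST",
                "body": login_body,
                "queryType": "metadataExtraction",
                "resultField": result_field,
                # Placement copied from the example; treat as an assumption.
                "username": username,
                "password": password,
            },
        },
    }


def should_refresh(now_ms, expires_at_ms, pre_expiration_limit_ms):
    """Hypothetical rule: refresh once inside the pre-expiration window."""
    return now_ms >= expires_at_ms - pre_expiration_limit_ms


if __name__ == "__main__":
    payload = build_bearer_credential(
        "My credential", "/login", "admin", "encrypted:xxxxx"
    )
    print(json.dumps(payload, indent=2))
```

Serializing the payload with `json.dumps` also guards against the missing or trailing commas that are easy to introduce when editing the request by hand.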
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
type | Yes | - | No | The value must be "rest-api". | "rest-api" |
description | Yes | - | No | Name of the connection object. | "My REST Connection" |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["17f75ce7d0c7", "d42780003b36"] |
credential | Yes | - | No | Id of the credential | "6b235b333a1b" |
properties | Yes | - | No | Configuration object | |
baseUrl | Yes | - | No | The URL of your REST service API | "https://your-service/api/v2/" |
trustAllCertificates | Yes | false | No | If selected, no HTTPS certificate validation will be done. | true |
{ "type": "rest-api", "description": "Rest conn 3", "credential": "0b6fd9c8-d722-4874-aca1-e57c6eff2089", "properties": { "baseUrl": "http://aspire_manager:50443/aspire/_api" } }
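The connection's baseUrl and a query's urlTemplate (a context path such as "/login") together identify the endpoint that is called. The sketch below shows one plausible way to combine them in Python. The helper `request_url` is hypothetical, and the joining rule is an assumption made for illustration; the source does not describe how the connector builds the final URL.

```python
def request_url(base_url, url_template):
    """Join a base URL and a context path with exactly one slash between them.

    Assumption: the context path is appended to the base URL. Note that
    urllib.parse.urljoin would instead discard the base path when the
    template starts with "/", which is why plain string joining is used.
    """
    return base_url.rstrip("/") + "/" + url_template.lstrip("/")


if __name__ == "__main__":
    print(request_url("https://your-service/api/v2/", "/login"))
```

Under this assumption, a trailing slash on baseUrl and a leading slash on urlTemplate can be combined freely without producing a doubled slash.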
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | Id of the connection to update | "d442adcab4b0" |
description | No | - | No | Name of the connection object. | "My REST Connection" |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["17f75ce7d0c7", "d42780003b36"] |
credential | No | - | No | Id of the credential | "6b235b333a1b" |
properties | No | - | No | Configuration object | |
(see create connection) |
{ "id": "89d6632a-a296-426c-adb0-d442adcab4b0", "description": "REST connection", "properties": { "baseUrl": "http://aspire_manager:50443/aspire/_api" } }
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
seed | Yes | - | No | The name of the database. It will replace the marker {DATABASE} used in the field jdbcUrl of connection object | "test_db" |
type | Yes | - | No | The value must be "rdb-snapshot". | "rdb-snapshot" |
description | Yes | - | No | Name of the seed object. | "My RDB Seed" |
connector | Yes | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "e3ca414b0d31" |
connection | Yes | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "e4a663fe9ee6" |
workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["5696c3f0bda4"] |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["17f75ce7d0c7", "d42780003b36"] |
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed | ["tag1", "tag2"] |
properties | Yes | - | No | Configuration object | |
crawlRules | Yes | - | Yes | Crawl rules | |
condition | No | - | No | Groovy script that determines whether a given item should execute this set of queries. The following matches the root item: item.getType().toString().equals('root'). The following matches any entity extracted from a scan: item.getType().toString().equals('entity') | "item.getType().toString().equals('root')" |
shouldStop | No | false | No | If selected, then no other queries will be executed for the given item. | true |
shouldIndex | No | false | No | If selected, the item matching this crawl rule will be indexed. | true |
queries | No | - | Yes | Queries to execute inside the rule | |
urlTemplate | Yes | - | No | The query to execute. If ${metadataParameter} is found inside the field it will be replaced with a specific value (for example from the scan result entity) | "/serviceEndpoint/${name}" |
method | Yes | - | No | HTTP method. Options: GET, POST, PUT | "GET" |
body (if method POST or PUT) | Yes (if method is POST or PUT) | - | No | The body of the POST or PUT request. Can include parameters to be replaced, such as ${param1.paramA} | "{\"username\" : \"${username}\",\"password\" : \"${password}\"}" |
contentType (if method POST or PUT) | No | json | No | The MIME type of the body: json/xml/text | "xml" |
queryType | Yes | - | No | The query type: scan/metadataExtraction/binaryFetch | "scan" |
Scan | |||||
childrenPath | No | response | No | Extraction path. The path to the response array that contains the children to extract. For example, if the response comes as {"response":{"entities":[{1},{2},{..},{n}]}}, then response.entities should be used. If the response itself is the array, leave this field empty | "response.entities" |
idField | Yes | - | No | Child ID field. Field within each child holding its ID. For example if each child has the following structure: {"entity":{"entityId":"abc-ef-1234"}, "att1":"val1"} then entity.entityId should be used | "entity.entityId" |
signatureFields | No | - | Yes | Incremental configuration signature fields | |
path | Yes | - | No | Signature JSON path (e.g. $.attribute). The JSON path used to extract the fields that make up the signature. See https://github.com/json-path/JsonPath for JsonPath documentation | "$.attribute" |
Extended signatures | |||||
extendedSignature | No | false | No | Use this option if extra requests must be executed to obtain the metadata needed to detect modifications properly. Use it carefully, as it degrades the performance of incremental crawls linearly. | true |
queries | No | - | Yes | Queries | |
queryType | Yes | - | No | Query type. Must be "metadataExtraction" | "metadataExtraction" |
urlTemplate | Yes | - | No | The query to execute | "/serviceEndpoint/${metadataParameter}" |
method | Yes | - | No | HTTP method. Options: GET, POST, PUT | "GET" |
body (if method POST or PUT) | Yes (if method is POST or PUT) | - | No | The body of the POST or PUT request. Can include parameters to be replaced, such as ${param1.paramA} | "{\"username\" : \"${username}\",\"password\" : \"${password}\"}" |
contentType (if method POST or PUT) | No | json | No | The MIME type of the body: json/xml/text | "xml" |
signatureFields | No | - | Yes | Signature fields | |
path | Yes | - | No | Signature JSON path (e.g. $.attribute). The JSON path used to extract the fields that make up the signature. See https://github.com/json-path/JsonPath for JsonPath documentation | "$.attribute" |
resultField | Yes | - | No | Internal name of the metadata field into which the results are extracted | |
{ "seed":"test_db", "type":"rdb-snapshot", "description" : "RDB_TEST", "properties" : { "idColumn" : "film_id", "stringIdColumn" : false, "aclSQL" : null, "aclColumn" : "acl", "quoteId" : "doNotQuote", "discoverySQL" : "SELECT film_id, title FROM film", "extractionSQL" : "SELECT * FROM film WHERE film_id IN {IDS}", "fullSQL" : null } }
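The crawl rule fields urlTemplate, childrenPath and idField all rely on path-like references: ${...} markers that are replaced with metadata values, and dotted paths that point into a JSON response. The Python sketch below illustrates those semantics as the field notes describe them. The helpers `resolve`, `expand` and `extract_children` are hypothetical names, and the sketch assumes plain nested objects with dot-separated keys; it is not the connector's actual code.

```python
import re

# Matches ${name} and ${param1.paramA} style markers.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(data, dotted_path):
    """Walk a dotted path such as 'entity.entityId' through nested dicts."""
    current = data
    for key in dotted_path.split("."):
        current = current[key]
    return current


def expand(template, metadata):
    """Replace every ${path} marker with the value found at that path."""
    return _PLACEHOLDER.sub(
        lambda match: str(resolve(metadata, match.group(1))), template
    )


def extract_children(response, children_path, id_field):
    """Return (id, child) pairs for each child found in a scan response.

    An empty children_path means the response itself is the array.
    """
    children = resolve(response, children_path) if children_path else response
    return [(resolve(child, id_field), child) for child in children]


if __name__ == "__main__":
    response = {
        "response": {
            "entities": [
                {"entity": {"entityId": "abc-ef-1234"}, "att1": "val1"},
            ]
        }
    }
    for child_id, child in extract_children(
        response, "response.entities", "entity.entityId"
    ):
        # Build a follow-up query URL from the extracted ID.
        print(expand("/serviceEndpoint/${id}", {"id": child_id}))
```

A missing key raises `KeyError` in this sketch, which makes a mistyped childrenPath or idField fail loudly instead of silently producing no children.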
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | Id of the seed to update | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
(see "Create seed" for the other fields) |
{ "id": "2f287669-d163-4e35-ad17-6bbfe9df3778", "seed":"test_db", "description" : "RDB_TEST", "properties" : { "idColumn" : "film_id", "stringIdColumn" : false, "aclSQL" : null, "aclColumn" : "acl", "quoteId" : "doNotQuote", "discoverySQL" : "SELECT film_id, title FROM film", "extractionSQL" : "SELECT * FROM film WHERE film_id IN {IDS}", "fullSQL" : null } }