The REST connector can retrieve data from any JSON-based REST endpoint. It is configured to query a base endpoint, extract JSON elements from the response, and send each element as an individual document. Each extracted entity can be enriched with additional metadata from the same endpoint, and can also be used as the starting point for recursive scans that discover more content.
The connector configuration is based on crawl rules. Each rule is evaluated for every entity discovered; if an entity matches a crawl rule, the connector executes the list of requests configured for that rule. There are three types of requests: scan (to discover new entities), metadata extraction (to enrich the current entity with more data), and binary fetch (to fetch documents associated with the current entity).
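The rule evaluation described above can be sketched roughly as follows. This is an illustrative model only, not the connector's actual configuration schema: the `CrawlRule` and `Request` classes, the `requests_for` helper, the `kind` field, and the example paths are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model of crawl rules; the names are illustrative and
# do not reflect the connector's real configuration format.
REQUEST_TYPES = ("scan", "metadata", "binary")


@dataclass
class Request:
    type: str  # one of REQUEST_TYPES
    path: str  # path relative to the base endpoint


@dataclass
class CrawlRule:
    name: str
    matches: Callable[[Dict], bool]  # predicate applied to each entity
    requests: List[Request] = field(default_factory=list)


def requests_for(entity: Dict, rules: List[CrawlRule]) -> List[Request]:
    """Collect the requests of every rule that the entity matches."""
    planned: List[Request] = []
    for rule in rules:
        if rule.matches(entity):
            planned.extend(rule.requests)
    return planned


# Assumed example: folders are scanned for children, files are
# enriched with metadata and then have their binary content fetched.
rules = [
    CrawlRule(
        "folders",
        lambda e: e.get("kind") == "folder",
        [Request("scan", "/folders/${entityId}/children")],
    ),
    CrawlRule(
        "files",
        lambda e: e.get("kind") == "file",
        [
            Request("metadata", "/files/${entityId}"),
            Request("binary", "/files/${entityId}/content"),
        ],
    ),
]

planned = requests_for({"kind": "file", "id": "42"}, rules)
```

In this sketch, an entity that matches no rule simply produces an empty request list, and entities returned by a scan request would be fed back through the same rules.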
Each request may be executed with entity-specific metadata. For example, if a metadata enrichment needs to execute GET /entities/${entityId}, then ${entityId} may be configured to be replaced with a known field from the source entity, such as its ID.
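As a minimal sketch of this substitution, assuming placeholders of the form ${name} and a configured mapping from placeholder names to entity fields, the replacement could look like the following. The `expand` function and the `field_map` argument are hypothetical and only illustrate the idea; they are not the connector's API.

```python
import re
from urllib.parse import quote

# Matches placeholders such as ${entityId}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def expand(template: str, entity: dict, field_map: dict) -> str:
    """Replace each ${name} with the entity field configured for it.

    field_map is an assumed configuration shape: it maps a placeholder
    name (for example "entityId") to a field of the source entity
    (for example "id").
    """

    def replace(match: re.Match) -> str:
        field = field_map[match.group(1)]
        # URL-encode the value so it is safe to use as a path segment.
        return quote(str(entity[field]), safe="")

    return PLACEHOLDER.sub(replace, template)


url = expand("/entities/${entityId}", {"id": "a b/7"}, {"entityId": "id"})
print(url)  # /entities/a%20b%2F7
```

Encoding the substituted value is a design choice in this sketch: it keeps IDs that contain spaces or slashes from changing the shape of the request path.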
Name | Supported |
---|---|
Content Crawling | yes |
Identity Crawling | no |
Snapshot-based Incrementals | yes |
Non-snapshot-based Incrementals | no |
Document Hierarchy | yes |
The connector cannot paginate when each page is reachable only through a link (such as a next-page link) that must be followed. This feature may be added in the future.
The connector cannot make requests to sites outside the base REST endpoint. This capability may be added in the future.
The connector does not extract ACLs without explicit configuration, because there is no single standard for how REST endpoints present permissions data.