The REST connector can retrieve data from any JSON-based REST endpoint. It is configured to query a base endpoint, extract JSON elements from the response, and send each element as an individual document. Each extracted entity can be enriched with more metadata from the same endpoint, and can also be scanned recursively to discover further content.
The connector configuration is based on crawl rules. Every rule is evaluated for every entity discovered, and when an entity matches a crawl rule, the connector executes the list of requests configured for that rule. There are three types of requests: scan (to discover new entities), metadata extraction (to enrich the current entity with more data), and binary fetch (to fetch documents associated with the current entity).
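The rule evaluation described above can be sketched in Python as follows. This is only an illustration of the idea, not the connector's configuration format: the `Request` and `CrawlRule` classes, the `type` field on entities, and the history path are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    kind: str  # one of "scan", "metadata", "binary"
    path: str  # request path; placeholders are left unexpanded in this sketch


@dataclass
class CrawlRule:
    name: str
    matches: Callable[[dict], bool]  # decides whether the rule applies to an entity
    requests: list[Request] = field(default_factory=list)


def requests_for(entity: dict, rules: list[CrawlRule]) -> list[Request]:
    """Evaluate every rule against the entity and collect the requests of each match."""
    planned: list[Request] = []
    for rule in rules:
        if rule.matches(entity):
            planned.extend(rule.requests)
    return planned


# Hypothetical rules: the "type" field and the history path are assumptions.
RULES = [
    CrawlRule(
        "country",
        lambda e: e.get("type") == "country",
        [Request("metadata", "/country/${id}"), Request("scan", "/country/${id}/states")],
    ),
    CrawlRule(
        "state",
        lambda e: e.get("type") == "state",
        [Request("metadata", "/state/${id}"), Request("binary", "/state/${id}/history")],
    ),
]

for request in requests_for({"type": "country", "id": "crc"}, RULES):
    print(request.kind, request.path)
```

An entity that matches no rule simply produces an empty request list in this sketch.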
Each request may be executed with entity-specific metadata. For example, if a metadata enrichment needs to execute GET /entities/${entityId}, then ${entityId} may be configured to be replaced with a known field from the source entity, such as its ID.
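The placeholder substitution can be illustrated with Python's standard `string.Template`, which happens to use the same `${name}` syntax. This is a minimal sketch of the concept and not how the connector itself performs the replacement; the `expand_request` helper and the sample entity are assumptions made for the example.

```python
from string import Template


def expand_request(template: str, entity: dict) -> str:
    """Replace ${field} placeholders with values taken from the source entity."""
    # substitute() raises KeyError if the entity lacks a referenced field.
    return Template(template).substitute(entity)


# Hypothetical source entity; extra fields that the template does not use are ignored.
entity = {"entityId": "crc", "name": "Costa Rica"}
print(expand_request("/entities/${entityId}", entity))
```

Raising an error on a missing field, rather than silently leaving the placeholder in the URL, is a design choice of this sketch.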
The following example illustrates how the connector works and how it should be configured. Suppose there is a REST API that lists information about every state (or province) of every country in the world.
The API exposes five endpoints:
Lists all countries in the world and returns the following response:
```json
{
  "countries" : [
    {
      "id" : "crc",
      "name" : "Costa Rica"
    },
    {
      "id" : "cze",
      "name" : "Czech Republic"
    },
    ...
  ]
}
```
Lists all states within a country. It takes the country ID as its parameter.
For instance, GET /country/crc/states would return:
```json
{
  "states" : [
    {
      "id" : "crc-sj",
      "name" : "San Jose"
    },
    {
      "id" : "crc-her",
      "name" : "Heredia"
    },
    ...
  ]
}
```
Given a country ID, returns the data specific to that country.
For instance, GET /country/crc would return:
```json
{
  "officialName" : "República de Costa Rica",
  "population" : "5000000",
  "languages" : ["spanish", "bribri"],
  "areaKmSqr" : "51100"
}
```
Given a state ID, returns the data specific to that state.
For instance, GET /state/crc-sj would return:
```json
{
  "officialName" : "Provincia de San José",
  "population" : "2158898",
  "languages" : ["spanish"],
  "areaKmSqr" : "2044"
}
```
Given a state ID, downloads a PDF with the full history of that state.
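To see how these endpoints fit together in a crawl, the sketch below walks an in-memory stand-in for the example API: it scans countries, enriches each with its metadata, scans that country's states, and enriches each state in turn. It is a simplified illustration, not the connector's implementation. The `/countries` path, the `FAKE_API` dictionary, the `get` and `crawl` helpers, and the `type` and `parent` fields are assumptions, and the PDF download is left out.

```python
# In-memory stand-in for the example API, so the sketch runs without a network.
# "/countries" is an assumed path for the endpoint that lists all countries.
FAKE_API = {
    "/countries": {"countries": [{"id": "crc", "name": "Costa Rica"}]},
    "/country/crc": {"officialName": "República de Costa Rica", "population": "5000000"},
    "/country/crc/states": {"states": [{"id": "crc-sj", "name": "San Jose"}]},
    "/state/crc-sj": {"officialName": "Provincia de San José", "population": "2158898"},
}


def get(path: str) -> dict:
    """Pretend HTTP GET that returns the parsed JSON body for a path."""
    return FAKE_API[path]


def crawl() -> list[dict]:
    """Scan countries and states, enriching each entity with its metadata."""
    documents = []
    for country in get("/countries")["countries"]:  # scan: discover countries
        country_id = country["id"]
        metadata = get(f"/country/{country_id}")  # metadata extraction
        documents.append({**country, **metadata, "type": "country"})
        for state in get(f"/country/{country_id}/states")["states"]:  # recursive scan
            state_metadata = get(f"/state/{state['id']}")  # metadata extraction
            documents.append({**state, **state_metadata, "type": "state", "parent": country_id})
    return documents


for doc in crawl():
    print(doc["type"], doc["id"], doc["officialName"])
```

Each discovered element becomes its own document, and the `parent` field records the country a state was found under.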
Name | Supported |
---|---|
Content Crawling | yes |
Identity Crawling | no |
Snapshot-based Incrementals | yes |
Non-snapshot-based Incrementals | no |
Document Hierarchy | yes |
The connector cannot paginate when each page of results is reachable only by following a link supplied with the previous page. This feature may be added in the future.
The connector cannot make external requests to sites outside the base REST endpoint. This may be added in the future.
The connector does not extract ACLs without explicit configuration, because there is no single standard for how REST endpoints should present permissions data.
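To help recognize the pagination style that the first limitation refers to, the sketch below shows a client following page links until none remain. It illustrates the API pattern only and does not reflect connector behavior. The `PAGES` data, the `items` and `next` field names, and the URLs are hypothetical.

```python
# Hypothetical paginated API: each page carries a link to the next one.
PAGES = {
    "/items?page=1": {"items": ["a", "b"], "next": "/items?page=2"},
    "/items?page=2": {"items": ["c"], "next": None},
}


def fetch_all_following_links(start: str) -> list[str]:
    """Collect items by following each page's "next" link until it is absent."""
    items: list[str] = []
    url = start
    while url is not None:
        page = PAGES[url]  # stands in for an HTTP GET of the page URL
        items.extend(page["items"])
        url = page["next"]  # the next URL is only known after reading this page
    return items


print(fetch_all_following_links("/items?page=1"))
```

An API shaped like this, where the next URL cannot be computed in advance, is the case the limitation describes.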