
The Kinesis connector fetches data from Amazon Kinesis Data Streams.


Features

Some of the features of the Kinesis connector:

  • Support for incremental and full crawling (with limitations; see Limitations below)
  • Configurable starting point for data retrieval, specified either as a timestamp or as a sequence number
  • Search engine independence: Aspire can publish the retrieved content to any search engine
  • Runs from any machine with access to Amazon Kinesis
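The configurable starting point corresponds to the shard iterator types of the Kinesis GetShardIterator API (`AT_SEQUENCE_NUMBER`, `AT_TIMESTAMP`, `TRIM_HORIZON`, and others). The Python sketch below shows one plausible mapping from a configured starting point to a GetShardIterator request. The helper name `shard_iterator_request` and the stream name `my-stream` are hypothetical, and falling back to `TRIM_HORIZON` when no starting point is configured is an assumption, not documented connector behavior.

```python
from datetime import datetime, timezone
from typing import Optional


def shard_iterator_request(stream_name: str, shard_id: str,
                           sequence_number: Optional[str] = None,
                           timestamp: Optional[datetime] = None) -> dict:
    """Build GetShardIterator parameters for one shard (hypothetical helper).

    The keys and iterator types come from the Kinesis GetShardIterator API;
    how the connector itself builds this request is an assumption.
    """
    if sequence_number is not None and timestamp is not None:
        raise ValueError("specify a sequence number or a timestamp, not both")
    params = {"StreamName": stream_name, "ShardId": shard_id}
    if sequence_number is not None:
        # Start at the record with this exact sequence number.
        params["ShardIteratorType"] = "AT_SEQUENCE_NUMBER"
        params["StartingSequenceNumber"] = sequence_number
    elif timestamp is not None:
        # Start at the first record that arrived at or after this time.
        params["ShardIteratorType"] = "AT_TIMESTAMP"
        params["Timestamp"] = timestamp
    else:
        # Assumed default: start at the oldest record not yet trimmed.
        params["ShardIteratorType"] = "TRIM_HORIZON"
    return params


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(shard_iterator_request("my-stream", "shardId-000000000000", timestamp=start))
```

With boto3, the resulting dictionary could be passed as `client.get_shard_iterator(**params)`, where `client = boto3.client("kinesis")`.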


Content Retrieved


The Kinesis connector publishes all the data available with each record:

  • Data (as text)
  • Approximate arrival timestamp
  • Sequence number
  • Shard ID
  • Partition key
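As an illustration, the fields listed above correspond to the attributes of a record returned by the Kinesis GetRecords API (`Data`, `ApproximateArrivalTimestamp`, `SequenceNumber`, `PartitionKey`), plus the ID of the shard being read, which GetRecords does not return per record. In this Python sketch, the function `record_to_document` and its output key names are hypothetical, not the connector's actual schema, and decoding the data as UTF-8 is an assumption. The input dictionary mirrors the shape boto3 returns, where `Data` is already base64-decoded bytes.

```python
from datetime import datetime, timezone


def record_to_document(record: dict, shard_id: str) -> dict:
    """Flatten one GetRecords record into the fields the connector publishes.

    Hypothetical sketch: the output key names are illustrative only.
    """
    return {
        # Kinesis stores data as opaque bytes; UTF-8 decoding is an assumption.
        "data": record["Data"].decode("utf-8", errors="replace"),
        "approximateArrivalTimestamp": record["ApproximateArrivalTimestamp"].isoformat(),
        "sequenceNumber": record["SequenceNumber"],
        # Not part of the record itself; known from the shard being read.
        "shardId": shard_id,
        "partitionKey": record["PartitionKey"],
    }


if __name__ == "__main__":
    sample = {
        "Data": b'{"event": "click"}',
        "ApproximateArrivalTimestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "SequenceNumber": "49590338271490256608559692538361571095921575989136588898",
        "PartitionKey": "user-42",
    }
    print(record_to_document(sample, "shardId-000000000000"))
```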

Limitations


Due to API limitations and the nature of Kinesis Data Streams itself, the Kinesis connector has the following limitations:

  • Because the data is streamed, crawls (full or incremental) run continuously. A crawl ends only when it is paused or stopped, or when the shards picked up at the start of the crawl are closed by a reshard operation.
  • Kinesis data is fetched as text only.
  • Cannot adapt to resharding. To retrieve all the data after a reshard operation, restart the crawl.
  • Cannot keep track of expired (trimmed) records. As a result, the connector cannot post update or delete operations to publishers.
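The continuous-crawl and resharding limitations follow from how Kinesis shard iterators behave: GetRecords on an open shard always returns a `NextShardIterator`, even when no new records are available, while a closed shard, once fully read, returns none. The Python sketch below illustrates this for a single shard. `read_shard` is a hypothetical name, and `get_records` is a caller-supplied stand-in for a GetRecords call (for example, a wrapper around boto3's `get_records(ShardIterator=...)`); this is not the connector's implementation.

```python
import time
from typing import Callable, Iterator, Optional


def read_shard(get_records: Callable[[str], dict],
               iterator: Optional[str],
               should_stop: Callable[[], bool] = lambda: False,
               poll_interval: float = 1.0) -> Iterator[dict]:
    """Yield records from one shard until stopped or until the shard is closed.

    Hypothetical sketch, not the connector's implementation.
    """
    while iterator is not None and not should_stop():
        response = get_records(iterator)
        records = response["Records"]
        yield from records
        if not records:
            # Idle open shard: wait before polling again.
            time.sleep(poll_interval)
        # A missing or null NextShardIterator means the shard was closed
        # (for example by a reshard) and all of its records have been read.
        iterator = response.get("NextShardIterator")
```

Because this reader follows only the shard it started on, records written to the child shards created by a reshard are never read, which matches the need to restart the crawl after resharding.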