Step 2b: Specify the Connector Information
Kafka Servers: The server(s) to connect to, in <host>:<port> format. To connect to several servers, enter them as a comma-separated list (for example, kafka1:9092,kafka2:9092).
Topic: The Kafka topic (message stream) to subscribe to.
Starting Offset for Full Crawls: Where in the topic a full crawl starts fetching messages:
- Earliest: Start from the earliest message available in the stream.
- Manually Specify Offsets: Specify, for each partition, the offset of the message to start from:
  - Partition: The partition number.
  - Offset: The offset of the message to start from.
Note: For any partition whose offset is not specified, Kafka defaults to fetching from the earliest available message.
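As an illustration of how these settings fit together, here is a minimal Python sketch, assuming the server list is plain "<host>:<port>" entries separated by commas and that the offset choice is either "earliest" or a per-partition map. The function names `parse_servers` and `resolve_start_offsets` and the `EARLIEST` sentinel are hypothetical and are not part of the connector or of any Kafka client library.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical sentinel: "start from the earliest message still available in the partition".
EARLIEST = "earliest"

StartPosition = Union[int, str]


def parse_servers(servers: str) -> List[Tuple[str, int]]:
    """Split a comma-separated '<host>:<port>' list into (host, port) pairs."""
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected <host>:<port>, got {entry!r}")
        pairs.append((host, int(port)))
    if not pairs:
        raise ValueError("at least one server is required")
    return pairs


def resolve_start_offsets(
    partitions: List[int],
    mode: str,
    manual: Optional[Dict[int, int]] = None,
) -> Dict[int, StartPosition]:
    """Return where a full crawl starts in every partition.

    mode is "earliest" or "manual". In manual mode, any partition missing
    from `manual` falls back to EARLIEST, matching the documented default.
    """
    if mode == "earliest":
        return {p: EARLIEST for p in partitions}
    if mode != "manual":
        raise ValueError(f"unknown mode {mode!r}")
    manual = manual or {}
    unknown = sorted(set(manual) - set(partitions))
    if unknown:
        raise ValueError(f"offsets given for unknown partitions: {unknown}")
    if any(offset < 0 for offset in manual.values()):
        raise ValueError("offsets must be non-negative")
    return {p: manual.get(p, EARLIEST) for p in partitions}
```

For example, `parse_servers("kafka1:9092,kafka2:9093")` yields `[("kafka1", 9092), ("kafka2", 9093)]`, and `resolve_start_offsets([0, 1, 2], "manual", {0: 150, 2: 42})` yields `{0: 150, 1: "earliest", 2: 42}`: partition 1 has no offset, so it starts from the earliest message.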
Step 3: Initiate a Crawl
Now that the content source is set up, the crawl can be initiated.
- Click the crawl type option to set it to "Full". (The crawl type is "Incremental" by default, and the first crawl behaves like a full crawl. After the first crawl, set it to "Incremental" to crawl for any changes made in the repository.)
- Click on the Start button.
During the Crawl
During the crawl, you can do the following:
- Click on the "Refresh" button on the Content Sources page to view the latest status of the crawl.
The status shows RUNNING while the crawl is in progress and CRAWLED when it has finished.
- Click on "Complete" to view the number of documents crawled so far, the number of documents submitted, and the number of documents with errors.
If there are errors, a clickable "Error" flag appears; click it to open a page with detailed error messages.
Note: For the Kafka connector, Full and Incremental crawls behave differently:
- Full: Fetches all messages that are available when the crawl starts, and stops once those messages have been processed. Messages that arrive after the crawl starts are not included.
- Incremental: Fetches messages continuously, and stops only when the content source is manually stopped or paused.
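The difference between the two crawl types can be sketched in Python against a hypothetical in-memory topic. `InMemoryTopic`, `full_crawl`, and `incremental_crawl` are illustrative names, not the connector's implementation or a Kafka client API. The sketch assumes a full crawl snapshots each partition's end offset when it starts and stops on reaching it, while an incremental crawl keeps polling until a stop signal (`should_stop`, standing in for manually stopping or pausing the content source).

```python
import time
from typing import Callable, Dict, List


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: one append-only log per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.logs: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}

    def append(self, partition: int, message: str) -> None:
        self.logs[partition].append(message)

    def end_offsets(self) -> Dict[int, int]:
        # Offset the next appended message will receive in each partition.
        return {p: len(log) for p, log in self.logs.items()}

    def fetch(self, partition: int, offset: int, max_records: int) -> List[str]:
        return self.logs[partition][offset:offset + max_records]


def full_crawl(topic: InMemoryTopic, start: Dict[int, int], batch_size: int = 100) -> List[str]:
    """Process only the messages that exist when the crawl starts, then stop."""
    stop_at = topic.end_offsets()  # snapshot taken once, at crawl start
    processed: List[str] = []
    for p, end in stop_at.items():
        position = start.get(p, 0)  # 0 means "earliest" in this in-memory model
        while position < end:
            batch = topic.fetch(p, position, min(batch_size, end - position))
            processed.extend(batch)
            position += len(batch)
    return processed


def incremental_crawl(
    topic: InMemoryTopic,
    start: Dict[int, int],
    should_stop: Callable[[], bool],
    batch_size: int = 100,
    idle_sleep: float = 0.1,
) -> List[str]:
    """Keep polling every partition for new messages until should_stop() is True."""
    position = {p: start.get(p, 0) for p in topic.logs}
    processed: List[str] = []
    while not should_stop():  # e.g. the content source was stopped or paused
        fetched_any = False
        for p in topic.logs:
            batch = topic.fetch(p, position[p], batch_size)
            processed.extend(batch)
            position[p] += len(batch)
            fetched_any = fetched_any or bool(batch)
        if not fetched_any:
            time.sleep(idle_sleep)  # nothing new yet; wait before polling again
    return processed
```

Because the full crawl fixes its stopping point up front, it always terminates; the incremental loop only ends when the stop signal fires. The in-memory model also assumes consecutive offsets starting at 0, which a real Kafka partition does not guarantee (for example, after log retention or compaction).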