The Box Scanner component performs full and incremental scans over a Box repository. It maintains a stream position value, which it uses to retrieve all the events (add, update or delete) that occurred from slightly before that stream position up to the current stream_position. Updated content is then submitted to the configured pipeline in AspireObjects attached to Jobs. As well as the URL of the changed item, each AspireObject also contains metadata extracted from the repository. Updated content is split into three types: add, update and delete. Each type of content is published as a different event so that it may be handled by a different Aspire pipeline.
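The incremental scan described above can be pictured with a small sketch. This is not the scanner's actual code: the `Event` type, the `incremental_scan` function, the integer positions, the `overlap` parameter and the `box://` URLs are all hypothetical, chosen only to illustrate the idea. It assumes that, because events from slightly before the stored stream position are returned again, already-handled events have to be skipped (here by remembering their ids).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one repository event."""
    event_id: str
    position: int   # where the event sits in the stream
    action: str     # "add", "update" or "delete"
    url: str


def incremental_scan(events, stream_position, seen_ids, overlap=1):
    """Group new events by type and advance the stream position.

    Events from slightly before `stream_position` (controlled by the
    assumed `overlap` parameter) are re-read, so events whose id is
    already in `seen_ids` are skipped instead of being published twice.
    """
    published = {"add": [], "update": [], "delete": []}
    new_position = stream_position
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= stream_position - overlap:
            continue  # older than the overlap window, not re-read
        new_position = max(new_position, event.position)
        if event.event_id in seen_ids:
            continue  # re-read because of the overlap, already handled
        seen_ids.add(event.event_id)
        published[event.action].append(event.url)
    return published, new_position


events = [
    Event("e1", 1, "add", "box://a"),
    Event("e2", 2, "update", "box://a"),
    Event("e3", 3, "add", "box://b"),
    Event("e4", 4, "delete", "box://a"),
]
seen = {"e1", "e2"}  # handled during an earlier scan
published, position = incremental_scan(events, 2, seen)
```

Each list in `published` corresponds to one of the three event types, which is what allows each type to be routed to its own pipeline.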

The scanner reacts to an incoming job. This job may instruct the scanner to start, stop, pause or resume. Typically the start job will contain all the information required to perform the crawl. However, the scanner can be configured with default values via the application.xml file. When pausing or stopping, the scanner will wait until all the jobs it published have completed before completing itself.
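The control behaviour above, in particular the wait on pause or stop, can be sketched as a small state machine. This is only an illustration under assumptions: the `ScannerSketch` class, its method names and its state names ("pausing", "stopping" and so on) are hypothetical and are not taken from the Aspire API.

```python
class ScannerSketch:
    """Hypothetical model of a scanner reacting to control jobs."""

    def __init__(self):
        self.state = "idle"
        self.outstanding = 0   # published jobs that have not completed yet
        self._target = None    # "paused" or "stopped" while waiting

    def handle(self, action):
        """React to an incoming control job."""
        if action in ("start", "resume"):
            self.state = "running"
            self._target = None
        elif action in ("pause", "stop"):
            self._target = "paused" if action == "pause" else "stopped"
            self._settle()
        else:
            raise ValueError(f"unknown action: {action}")

    def publish_job(self):
        """Record that a job was published to the pipeline."""
        if self.state != "running":
            raise RuntimeError("scanner is not running")
        self.outstanding += 1

    def job_completed(self):
        """Record that a previously published job has finished."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding jobs")
        self.outstanding -= 1
        self._settle()

    def _settle(self):
        # Only finish pausing or stopping once every published job is done.
        if self._target is None:
            return
        if self.outstanding == 0:
            self.state = self._target
            self._target = None
        else:
            self.state = "pausing" if self._target == "paused" else "stopping"


scanner = ScannerSketch()
scanner.handle("start")
scanner.publish_job()
scanner.publish_job()
scanner.handle("stop")
state_while_waiting = scanner.state   # "stopping": two jobs still outstanding
scanner.job_completed()
scanner.job_completed()
final_state = scanner.state           # "stopped"
```

In this sketch the scanner only reaches "paused" or "stopped" after the last outstanding job reports completion, mirroring the wait described above.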

Box Scanner
Factory Name: com.searchtechnologies.aspire:aspire-box-scanner
Inputs: AspireObject from a content source submitter holding all the information required for a crawl
Outputs: Jobs from the crawl