  • Performs incremental crawling (so that only new/updated documents are indexed)
  • Extracts metadata
  • Is search engine independent
  • Runs from any machine with HTTP access to the given HDFS NameNode
  • Filters the crawled documents by paths (including file names) using regex patterns
  • Supports Kerberized clusters by using a delegation token
  • Supports archive file processing; for more information, see Archive files processing
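
The regex-based path filtering listed above can be illustrated with a minimal sketch. It is not the connector's implementation: the function name is_crawlable, the include/exclude split, the rule that an exclude match wins, and the sample paths are all assumptions made for illustration.

```python
import re


def is_crawlable(path, include_patterns, exclude_patterns):
    """Return True if a full path (including file name) passes the filters.

    Hypothetical helper: the connector's real option names and matching
    rules are not described here, so this only illustrates the idea.
    """
    # Assumption: an exclude match always wins over an include match.
    if any(re.search(p, path) for p in exclude_patterns):
        return False
    # Assumption: with no include patterns, anything not excluded is kept.
    if not include_patterns:
        return True
    return any(re.search(p, path) for p in include_patterns)


# Made-up sample paths, as a crawler might see them while walking HDFS.
paths = [
    "/data/reports/2023/summary.pdf",
    "/data/reports/2023/.summary.pdf.crc",
    "/data/tmp/scratch.txt",
    "/data/logs/app.log",
]

include = [r"^/data/(reports|logs)/"]   # only these directory trees
exclude = [r"/\.[^/]*$", r"\.log$"]     # hidden files and .log files

kept = [p for p in paths if is_crawlable(p, include, exclude)]
print(kept)
```

Because the patterns are applied to the whole path, one expression can filter on a directory, a file name, or an extension.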