Performs incremental crawling (so that only new/updated documents are indexed)
Extracts metadata
Is search engine independent
Runs from any machine with HTTP access to the given HDFS Namenode
Filters the crawled documents by path (including file names) using regex patterns
Supports archive file processing; for more information, visit Archive files processing
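
As an illustration of the path-based regex filtering listed above, the following is a minimal sketch only. It is not the connector's code or configuration: the pattern lists, the `should_crawl` helper, and the sample paths are all hypothetical, and the rule that exclude patterns take precedence over include patterns is an assumption made for the example.

```python
import re

# Hypothetical patterns for illustration; not taken from the connector's configuration.
INCLUDE_PATTERNS = [re.compile(r".*\.(pdf|docx?)$")]
EXCLUDE_PATTERNS = [re.compile(r".*/tmp/.*")]


def should_crawl(path, includes=INCLUDE_PATTERNS, excludes=EXCLUDE_PATTERNS):
    """Return True if the full path (directories plus file name) passes the filters.

    Assumption: an exclude match always wins, and an empty include list
    means "include everything that is not excluded".
    """
    if any(p.search(path) for p in excludes):
        return False
    if not includes:
        return True
    return any(p.search(path) for p in includes)


# Sample paths, invented for the example.
paths = [
    "/data/reports/q1.pdf",
    "/data/tmp/q1.pdf",
    "/data/notes/readme.txt",
    "/data/letters/a.doc",
]
selected = [p for p in paths if should_crawl(p)]
print(selected)
```

Because the patterns are matched against the whole path, a single expression can filter on a directory, a file name, or an extension.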