...
- Performs incremental crawling (so that only new/updated documents are indexed)
- Extracts metadata
- Is search engine independent
- Runs from any machine with HTTP access to the given HDFS Namenode
- Filters the crawled documents by paths (including file names) using regex patterns
- Supports archive file processing; for more information, visit Archive files processing
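
To illustrate the path filtering feature listed above, here is a minimal Python sketch of regex-based filtering. It is not the connector's actual implementation: the function name `filter_paths`, the include/exclude split, and the sample paths are all assumptions made for illustration. The only detail taken from the feature list is that documents are filtered by path, file name included, using regex patterns.

```python
import re


def filter_paths(paths, include_patterns=(), exclude_patterns=()):
    """Return the paths that pass hypothetical include/exclude regex filters.

    Assumption: a path is kept when it matches at least one include pattern
    (or no include patterns are given) and matches no exclude pattern.
    """
    includes = [re.compile(p) for p in include_patterns]
    excludes = [re.compile(p) for p in exclude_patterns]

    kept = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        if includes and not any(r.search(path) for r in includes):
            continue
        # Any matching exclude pattern drops the path.
        if any(r.search(path) for r in excludes):
            continue
        kept.append(path)
    return kept


# Hypothetical HDFS paths, used only to exercise the filter.
sample_paths = [
    "/data/reports/2020/summary.pdf",
    "/data/reports/2020/draft.tmp",
    "/data/logs/app.log",
]

selected = filter_paths(
    sample_paths,
    include_patterns=[r"^/data/reports/"],
    exclude_patterns=[r"\.tmp$"],
)
print(selected)
```

The patterns are matched against the full path with `re.search`, so a pattern can target a directory prefix (`^/data/reports/`) or a file name suffix (`\.tmp$`).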