The Hadoop Distributed File System (HDFS) connector crawls content from a given HDFS cluster using the WebHDFS HTTP interface.
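For orientation, the following is a minimal sketch of what a directory listing over WebHDFS looks like. It is not the connector's own code. It only builds a `LISTSTATUS` request URL and parses a response of the shape documented for the WebHDFS REST API. The host name `namenode.example.com`, the path `/data/docs`, the port 9870 and the helper names are illustrative assumptions; use the values that apply to your cluster.

```python
import json
from urllib.parse import quote, urlencode


def liststatus_url(namenode, path, port=9870, user=None):
    """Build a WebHDFS LISTSTATUS URL for a directory.

    The port is an assumption: use whatever HTTP port your Namenode exposes.
    """
    params = {"op": "LISTSTATUS"}
    if user:
        # user.name is the WebHDFS query parameter for simple (non-Kerberos) auth.
        params["user.name"] = user
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


def parse_liststatus(body):
    """Return (name, type, modification time in ms) for each listed entry."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["modificationTime"]) for s in statuses]


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "a.txt", "type": "FILE", "modificationTime": 1320171722771},
            {"pathSuffix": "logs", "type": "DIRECTORY", "modificationTime": 1320895981256},
        ]
    }
})

if __name__ == "__main__":
    print(liststatus_url("namenode.example.com", "/data/docs"))
    for entry in parse_liststatus(SAMPLE_RESPONSE):
        print(entry)
```

Because the exchange is plain HTTP and JSON, a request like this can be issued from any machine that can reach the Namenode's HTTP port.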
Features
Features of the HDFS connector include:
- Performs incremental crawling (so that only new/updated documents are indexed)
- Extracts metadata
- Is search-engine independent
- Runs from any machine with HTTP access to the given HDFS Namenode
- Filters the crawled documents by paths (including file names) using regex patterns
- Supports archive file processing; for more information, see Archive files processing
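As an illustration of how path filtering and incremental crawling can combine, the sketch below decides whether a file should be indexed. This is an assumed approach, not the connector's documented implementation: it applies include and exclude regex patterns to the full path, then compares the file's modification time with the value recorded during the previous crawl. The function name `should_index`, the snapshot dictionary and the sample paths are hypothetical.

```python
import re


def should_index(path, mtime_ms, last_crawl, include=(), exclude=()):
    """Return True if `path` passes the regex filters and is new or updated.

    `last_crawl` maps each path to the modification time (ms) seen in the
    previous crawl; a hypothetical stand-in for a real crawl snapshot.
    """
    # An exclude pattern that matches anywhere in the path rejects the file.
    if any(re.search(p, path) for p in exclude):
        return False
    # When include patterns are given, at least one must match.
    if include and not any(re.search(p, path) for p in include):
        return False
    previous = last_crawl.get(path)
    # New file (no record) or modified since the previous crawl.
    return previous is None or mtime_ms > previous


if __name__ == "__main__":
    snapshot = {"/data/docs/a.txt": 1000}
    include = [r"\.txt$"]
    exclude = [r"/tmp/"]
    print(should_index("/data/docs/a.txt", 1000, snapshot, include, exclude))  # unchanged
    print(should_index("/data/docs/a.txt", 2000, snapshot, include, exclude))  # updated
    print(should_index("/data/docs/b.txt", 500, snapshot, include, exclude))   # new
    print(should_index("/data/tmp/c.txt", 500, snapshot, include, exclude))    # excluded
```

Matching against the full path is what lets a single pattern filter by directory, file name or extension.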