Changes made by Andres Aguilar. Saved on Dec 06, 2016.

The Hadoop Distributed File System (HDFS) connector crawls content from any given HDFS cluster using the WebHDFS HTTP interface.
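
To give a sense of what crawling over WebHDFS involves, here is a minimal Python sketch that builds a directory-listing request and parses the JSON it returns. The `LISTSTATUS` operation and the `FileStatuses`/`FileStatus` response layout come from the public WebHDFS REST API; the host name, port, path, and helper function names are hypothetical, and this is not the connector's own code.

```python
import json
from urllib.parse import quote, urlencode


def liststatus_url(host, port, path):
    """Build a WebHDFS LISTSTATUS URL for a directory (hypothetical helper)."""
    query = urlencode({"op": "LISTSTATUS"})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def parse_liststatus(body, parent):
    """Return (full_path, type, modification_time_ms) for each listed entry."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    base = parent.rstrip("/")
    return [
        (f"{base}/{s['pathSuffix']}", s["type"], s["modificationTime"])
        for s in statuses
    ]


# Sample response body, trimmed to the fields used here (illustrative values).
SAMPLE = (
    '{"FileStatuses":{"FileStatus":['
    '{"pathSuffix":"a.txt","type":"FILE","modificationTime":1320171722771}'
    "]}}"
)

url = liststatus_url("namenode.example.com", 50070, "/data/docs")
entries = parse_liststatus(SAMPLE, "/data/docs")
```

A real crawl would send an HTTP GET to `url` and recurse into entries whose type is `DIRECTORY`. The port depends on the cluster configuration, so it is left as a parameter.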

Features


Some of the features of the HDFS connector include:

Performs incremental crawling (so that only new or updated documents are indexed)
Extracts metadata
Is search engine independent
Runs from any machine with HTTP access to the given HDFS NameNode
Filters the crawled documents by path (including file names) using regex patterns
Supports archive file processing; for more information, see Archive files processing
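
The path filtering and incremental crawling features can be illustrated with a short Python sketch. It assumes that include and exclude filters are regex patterns matched against the full path, and that "new or updated" is decided by comparing each file's modification time with the time of the previous crawl. Both are assumptions made for illustration; the connector's actual configuration options and change-detection mechanism are not described here, and `select_for_indexing` is a hypothetical name.

```python
import re


def select_for_indexing(entries, include, exclude, last_crawl_ms):
    """Pick paths that pass the regex filters and changed since the last crawl.

    entries: iterable of (path, modification_time_ms) pairs.
    include/exclude: lists of regex pattern strings (assumed semantics).
    """
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    selected = []
    for path, mtime in entries:
        # With include patterns present, a path must match at least one.
        if include_res and not any(r.search(path) for r in include_res):
            continue
        # Any exclude match drops the path.
        if any(r.search(path) for r in exclude_res):
            continue
        # Incremental step: skip files unchanged since the previous crawl.
        if mtime <= last_crawl_ms:
            continue
        selected.append(path)
    return selected


entries = [
    ("/data/docs/a.pdf", 200),
    ("/data/docs/b.pdf", 50),
    ("/data/tmp/c.pdf", 300),
    ("/data/docs/d.log", 400),
]
changed = select_for_indexing(entries, [r"\.pdf$"], [r"^/data/tmp/"], 100)
```

In this example only `/data/docs/a.pdf` is selected: `b.pdf` is older than the last crawl, `c.pdf` is under an excluded path, and `d.log` does not match the include pattern.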