HDFS Connector Introduction

Created by Andres Aguilar, last modified by user-1b188 on Dec 07, 2016



The Hadoop Distributed File System (HDFS) connector crawls content from any HDFS cluster through the WebHDFS HTTP interface.
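The following is a minimal Python sketch of what crawling over WebHDFS involves. It is not the connector's own code. It builds a WebHDFS `LISTSTATUS` request for one directory and parses the `FileStatuses` JSON that the NameNode returns. The NameNode address, port 50070 (the Hadoop 2.x default web port), and the `crawler` user name are assumptions for illustration. `liststatus_url`, `parse_liststatus` and `list_directory` are hypothetical helpers.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(namenode: str, path: str, user: Optional[str] = None) -> str:
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS directory path."""
    params = {"op": "LISTSTATUS"}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a query parameter.
        params["user.name"] = user
    # WebHDFS resources live under /webhdfs/v1 followed by the HDFS path.
    return f"{namenode.rstrip('/')}/webhdfs/v1{quote(path)}?{urlencode(params)}"


def parse_liststatus(body: str, parent: str) -> List[Dict[str, object]]:
    """Turn a LISTSTATUS JSON response into full paths with type and mtime."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    base = parent.rstrip("/")
    return [
        {
            "path": f"{base}/{s['pathSuffix']}",
            "type": s["type"],                          # "FILE" or "DIRECTORY"
            "modificationTime": s["modificationTime"],  # milliseconds since epoch
        }
        for s in statuses
    ]


def list_directory(namenode: str, path: str, user: Optional[str] = None,
                   timeout: float = 30.0) -> List[Dict[str, object]]:
    """Fetch and parse one directory listing over HTTP (needs a reachable NameNode)."""
    url = liststatus_url(namenode, path, user)
    with urlopen(url, timeout=timeout) as resp:
        return parse_liststatus(resp.read().decode("utf-8"), path)
```

A full crawler would call `list_directory` again for every entry whose `type` is `DIRECTORY` and would read file contents with `op=OPEN`. Because everything goes over plain HTTP, no Hadoop client libraries are required on the crawling machine.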

Features

Features of the HDFS connector include:

Performs incremental crawling, so only new or updated documents are indexed
Extracts metadata
Is independent of the search engine
Runs from any machine with HTTP access to the HDFS NameNode
Filters crawled documents by path (including file name) using regex patterns
Supports archive file processing; for more information, see Archive files processing
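The incremental crawling and regex path filtering listed above can be sketched in Python as follows. This is a hypothetical illustration and not the connector's actual logic. It makes the following assumptions:

* The crawler keeps a map from each path to the last indexed `modificationTime`, which is the millisecond timestamp WebHDFS reports.
* Include and exclude filters are regular expressions matched anywhere in the full path.
* An empty include list accepts every path.

`select_for_indexing` is a made-up name.

```python
import re
from typing import Dict, Iterable, List, Tuple


def select_for_indexing(
    files: Iterable[Tuple[str, int]],
    last_seen: Dict[str, int],
    include: List[str],
    exclude: List[str],
) -> List[str]:
    """Return paths that pass the regex filters and are new or modified.

    `last_seen` is updated in place so the next crawl skips unchanged files.
    """
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    selected = []
    for path, mtime in files:
        # Path filter: must match an include pattern (if any are given)...
        if inc and not any(r.search(path) for r in inc):
            continue
        # ...and must not match any exclude pattern.
        if any(r.search(path) for r in exc):
            continue
        # Incremental check: skip files not modified since they were last indexed.
        previous = last_seen.get(path)
        if previous is not None and mtime <= previous:
            continue
        selected.append(path)
        last_seen[path] = mtime
    return selected
```

This sketch only detects additions and updates. Removing documents for files that were deleted from HDFS would also require comparing the stored paths against the latest listing.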
