The Hadoop Distributed File System (HDFS) connector crawls content from any given HDFS cluster using the WebHDFS HTTP interface.

Features


Some of the features of the HDFS connector include:

WebHDFS Operations


The connector uses only two WebHDFS operations: OPEN, which reads the content of a file, and LISTSTATUS, which lists the contents of a directory:

http://<host>:<port>/webhdfs/v1/<path>?op=OPEN

http://<host>:<port>/webhdfs/v1/<path>?op=LISTSTATUS
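
As an illustration only, the two requests can be sketched in Python using the standard library. This is not the connector's own code. The helper names (webhdfs_url, parse_liststatus, list_directory, read_file) are hypothetical, and the sketch assumes an unsecured cluster with no authentication parameters. The JSON layout parsed here (FileStatuses, FileStatus, pathSuffix, type) is the one WebHDFS returns for LISTSTATUS.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def webhdfs_url(host, port, path, op):
    """Build a WebHDFS URL of the form http://<host>:<port>/webhdfs/v1/<path>?op=<op>."""
    # Drop the leading slash so the path joins cleanly after /webhdfs/v1/
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?op={op}"


def parse_liststatus(body):
    """Return (name, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_directory(host, port, path):
    """Issue op=LISTSTATUS and return the directory entries."""
    with urlopen(webhdfs_url(host, port, path, "LISTSTATUS")) as resp:
        return parse_liststatus(resp.read())


def read_file(host, port, path):
    """Issue op=OPEN and return the file content as bytes."""
    # urlopen follows the HTTP redirect that WebHDFS issues for OPEN
    with urlopen(webhdfs_url(host, port, path, "OPEN")) as resp:
        return resp.read()
```

A crawl would then amount to calling list_directory on a starting path, recursing into entries whose type is DIRECTORY, and calling read_file on entries whose type is FILE.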