Features
The CDH HDFS publisher sends content as Key/AspireObject entries to a specific folder in HDFS, using either the HDFS API or the WebHDFS HTTP interface.
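To make the WebHDFS option concrete, here is a minimal sketch of how the URL for a WebHDFS file creation request is formed. It only builds the URL for the first step of a `CREATE` operation (an HTTP PUT to the NameNode, which normally answers with a redirect to a DataNode). The host name, port 50070, target path, and user name are illustrative assumptions, and the helper `webhdfs_create_url` is hypothetical; it is not part of the publisher.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, hdfs_path, user=None, overwrite=False):
    """Build the WebHDFS URL for the first step of a file CREATE (hypothetical helper)."""
    params = {"op": "CREATE", "overwrite": str(overwrite).lower()}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a query parameter.
        params["user.name"] = user
    # WebHDFS paths live under the /webhdfs/v1 prefix on the NameNode HTTP port.
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, quote(hdfs_path), urlencode(params)
    )


# Assumed values for illustration only.
url = webhdfs_create_url(
    "namenode.example.com", 50070, "/aspire/out/part-00000.json", user="aspire"
)
print(url)
```

A client would send a PUT to this URL without a body, then send the file data in a second PUT to the location returned in the redirect.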
Some of the features of the CDH HDFS publisher include:
- Works with CDH 5.
- Runs on any machine with access to the HDFS cluster (Windows and Linux).
- The output key can be taken from an entry in each document's AspireObject.
- AspireObjects are serialized/deserialized as JSON in the HDFS files.
- Output file size can be customized to take advantage of HDFS block sizes. Customizing the size also makes it easier to move the smaller files of a single collection.
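The key selection, JSON serialization, and size-based file splitting described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the publisher's implementation: the on-disk layout (one tab-separated key/JSON entry per line), the `id` key field, the 64-byte limit, and the names `RollingWriter` and `output_key` are all hypothetical, and in-memory buffers stand in for HDFS files.

```python
import io
import json


def output_key(doc, key_field="id"):
    """Pick the output key from one entry of the document (assumed field name)."""
    return str(doc[key_field])


class RollingWriter:
    """Collects key/JSON lines and starts a new buffer once a size limit would be exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.files = [io.BytesIO()]  # stand-ins for HDFS output files

    def write(self, key, obj):
        # Assumed layout: key, tab, JSON document, newline.
        line = (key + "\t" + json.dumps(obj, sort_keys=True) + "\n").encode("utf-8")
        current = self.files[-1]
        # Roll over only if the current file already holds data and would overflow.
        if current.tell() > 0 and current.tell() + len(line) > self.max_bytes:
            current = io.BytesIO()
            self.files.append(current)
        current.write(line)


def read_entries(buffers):
    """Deserialize every key/JSON line back into (key, object) pairs."""
    entries = []
    for buf in buffers:
        for raw in buf.getvalue().decode("utf-8").splitlines():
            key, payload = raw.split("\t", 1)
            entries.append((key, json.loads(payload)))
    return entries


docs = [
    {"id": "doc-1", "title": "First"},
    {"id": "doc-2", "title": "Second"},
    {"id": "doc-3", "title": "Third"},
]
writer = RollingWriter(max_bytes=64)  # tiny limit so the rollover is visible
for doc in docs:
    writer.write(output_key(doc), doc)

restored = read_entries(writer.files)
```

In a real deployment the limit would more plausibly be chosen relative to the cluster's HDFS block size, so that each output file fills whole blocks rather than leaving many small files.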