Features

The CDH HDFS publisher sends content in the form of Key/AspireObject entries to a specific folder in HDFS, using either the HDFS API or the WebHDFS HTTP interface.
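To make the WebHDFS option more concrete, the sketch below builds the URL for the first step of a WebHDFS file creation (`op=CREATE`). It is only an illustration of the REST interface, not the publisher's own code: the function name, host, port, path, and user are all placeholders chosen for this example.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, hdfs_path, user=None, overwrite=False):
    """Build the WebHDFS URL for the first step of a file CREATE.

    hdfs_path must be an absolute HDFS path (starting with "/").
    All argument values are supplied by the caller; none are defaults
    of the publisher.
    """
    params = {"op": "CREATE", "overwrite": str(overwrite).lower()}
    if user:
        # Simple (non-Kerberos) authentication passes the user name as a parameter.
        params["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, quote(hdfs_path), urlencode(params)
    )


# Hypothetical values, for illustration only.
url = webhdfs_create_url(
    "namenode.example.com", 50070, "/aspire/out/part-00000.json", user="aspire"
)
```

In WebHDFS, a PUT to this URL is answered with a redirect to a DataNode, and the file content is then sent in a second PUT to the redirected location.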

Some of the features of the CDH HDFS publisher include:

  • Works with CDH 5.
  • Runs on any machine with access to the HDFS cluster (Windows and Linux).
  • The output key can be taken from an entry in each document's AspireObject.
  • AspireObjects are serialized/deserialized as JSON in the HDFS files.
  • Output file size can be customized to take advantage of HDFS block sizes. This also makes it easier to move the smaller files of a single collection.
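The key, JSON serialization, and file-size features above can be pictured with a small sketch. It assumes a tab-separated `key<TAB>json` line layout and plain dictionaries standing in for AspireObjects; both are assumptions made for illustration and may not match the publisher's actual file format. The names `to_line`, `split_into_files`, `key_field`, and `max_bytes` are hypothetical.

```python
import json


def to_line(doc, key_field):
    """Serialize one document as 'key<TAB>json' (assumed layout)."""
    key = str(doc[key_field])
    return key + "\t" + json.dumps(doc, sort_keys=True)


def split_into_files(docs, key_field, max_bytes):
    """Group serialized lines into batches of at most max_bytes each.

    A single line larger than max_bytes still gets a batch of its own.
    """
    files, current, size = [], [], 0
    for doc in docs:
        line = to_line(doc, key_field)
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + line_bytes > max_bytes:
            files.append(current)
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        files.append(current)
    return files


# Dictionaries stand in for AspireObjects in this sketch.
docs = [
    {"id": "a", "title": "x"},
    {"id": "b", "title": "y"},
    {"id": "c", "title": "z"},
]
files = split_into_files(docs, key_field="id", max_bytes=60)
```

In practice the size limit would be chosen relative to the HDFS block size of the cluster, so that each output file fills roughly one block or a whole number of blocks.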