The Post HDFS stage writes key/value pairs into HDFS, where the key is a user-defined field from the job's AspireObject (or the job ID, if no key is defined) and the value is the job's AspireObject. Key/value pairs are written to a single file until a file size threshold is reached. A new file is then created with a sequential ID (e.g. aspire-00000, aspire-00001, aspire-00002, ..., aspire-N).
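The key selection and file rollover described above can be illustrated with a minimal, in-memory sketch. It does not touch HDFS and is not the component's actual code: the class `RollingFileSketch` and its methods are hypothetical names, the size of a pair is assumed to be the UTF-8 byte length of key plus value, and the five-digit counter format is inferred from the example file names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of size-based file rollover; names are illustrative only. */
final class RollingFileSketch {
    private final String prefix;
    private final long maxBytes;
    private final List<String> files = new ArrayList<>();
    private long currentBytes = 0;

    RollingFileSketch(String prefix, long maxBytes) {
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    /** Builds a name such as aspire-00000 from a prefix and a sequential counter. */
    static String fileName(String prefix, int index) {
        return String.format(Locale.ROOT, "%s-%05d", prefix, index);
    }

    /** Uses the user-defined field when present, otherwise falls back to the job ID. */
    static String chooseKey(String fieldValue, String jobId) {
        return (fieldValue == null || fieldValue.isEmpty()) ? jobId : fieldValue;
    }

    /** Records one key/value pair and returns the name of the file it would go to. */
    String write(String key, String value) {
        // Assumption: pair size is the UTF-8 byte length of key plus value.
        long pairBytes = key.getBytes(StandardCharsets.UTF_8).length
                + value.getBytes(StandardCharsets.UTF_8).length;
        // Start a new file when none exists yet or the threshold has been reached.
        if (files.isEmpty() || currentBytes >= maxBytes) {
            files.add(fileName(prefix, files.size()));
            currentBytes = 0;
        }
        currentBytes += pairBytes;
        return files.get(files.size() - 1);
    }

    public static void main(String[] args) {
        RollingFileSketch sink = new RollingFileSketch("aspire", 10);
        System.out.println(sink.write(chooseKey("k1", "job-1"), "12345678")); // aspire-00000
        System.out.println(sink.write(chooseKey(null, "j2"), "x"));           // aspire-00001
        System.out.println(sink.write(chooseKey("k3", "job-3"), "y"));        // aspire-00001
    }
}
```

In this sketch the first pair fills the 10-byte threshold exactly, so the second pair opens aspire-00001 and the third pair stays in that same file.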
Communication with HDFS goes through the HDFS API FileSystem methods.
This section lists the configuration parameters available for the Post HDFS component.
|Parameter||Type||Default||Description|
|hdfsUrl||String||hdfs://localhost:8020||The HDFS Namenode URL.|
|folderPath||String|| ||The path within the HDFS server where the files will be stored. If empty, the user home folder will be used.|
|filePrefixName||String||aspire||The prefix of the names of the files that will be stored. Each file name is completed with a sequential counter value (e.g. aspire-00000).|
|fileSize||long||HDFS Default Block Size||The max size of each file to be created. When the file size is reached, a new file is created.|
|outputKey||String|| ||An AXPath of the metadata field to use as the output key.|
|ignoreAspireBatch||boolean||true||Tells the component whether or not to create a new file for each Aspire batch. NOTE: If this is false and Aspire job batching is enabled, the fileSize value is ignored and each file contains exactly as many key/value pairs as the batch size.|
|timeout||int||30000||Time in milliseconds to wait until the file can be closed, after the last job has been processed.|
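The interaction between ignoreAspireBatch and fileSize noted in the table can be summarized as a small decision function. This is a sketch of the documented rule only, not the component's implementation; the class `RolloverPolicySketch`, the enum `Trigger`, and the `batchingEnabled` argument are hypothetical names introduced for illustration.

```java
/** Hypothetical sketch of which event closes a file, per the parameter notes. */
final class RolloverPolicySketch {
    /** What causes the current file to be closed and a new one started. */
    enum Trigger { FILE_SIZE, BATCH_END }

    static Trigger trigger(boolean ignoreAspireBatch, boolean batchingEnabled) {
        // fileSize is ignored only when batches are honored AND batching is enabled.
        if (!ignoreAspireBatch && batchingEnabled) {
            return Trigger.BATCH_END;
        }
        return Trigger.FILE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(trigger(true, true));   // FILE_SIZE (the default setting)
        System.out.println(trigger(false, true));  // BATCH_END
        System.out.println(trigger(false, false)); // FILE_SIZE
    }
}
```

With the default ignoreAspireBatch value of true, files roll over on size alone, regardless of whether job batching is enabled.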
This section provides an example of a Post HDFS configuration for a local HDFS server.