The Post WebHDFS stage writes key/value pairs into HDFS, where the key is a user-defined field from the job's AspireObject (or the job ID, if no key is defined) and the value is the job's AspireObject. Key/value pairs are written to a single file until a file size threshold is reached. A new file is then created with a sequential ID (e.g. aspire-00000, aspire-00001, aspire-00002, ..., aspire-N).
Communication with HDFS goes through the WebHDFS REST API.
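As a rough illustration of what WebHDFS requests look like, the sketch below builds the URLs for the CREATE and APPEND operations. In the WebHDFS REST API, CREATE is issued as an HTTP PUT and APPEND as an HTTP POST, with the operation passed in the `op` query parameter and the user in `user.name`. The helper name `webhdfs_url` and the way the URL, folder path, and file name are joined are assumptions made for this example; they are not taken from the component's source.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(hdfs_url, folder_path, file_name, op, username=None):
    """Build a WebHDFS request URL (hypothetical helper, for illustration only).

    Assumes folder_path already contains the /webhdfs/v1 prefix, as in the
    example configuration on this page.
    """
    # Join folder and file name, avoiding duplicate or missing slashes.
    parts = [p.strip("/") for p in (folder_path, file_name) if p.strip("/")]
    path = "/".join(parts)

    # 'op' selects the WebHDFS operation; 'user.name' identifies the caller.
    query = {"op": op}
    if username:
        query["user.name"] = username

    return f"{hdfs_url.rstrip('/')}/{quote(path)}?{urlencode(query)}"


# CREATE is sent with HTTP PUT, APPEND with HTTP POST.
create_url = webhdfs_url("http://localhost:8020/", "/webhdfs/v1/user/jsmith/test/",
                         "aspire-00000", "CREATE", "jsmith")
append_url = webhdfs_url("http://localhost:8020/", "/webhdfs/v1/user/jsmith/test/",
                         "aspire-00000", "APPEND", "jsmith")
print(create_url)
print(append_url)
```

The sketch only constructs URLs and does not contact a server. A real client also has to follow the redirect that the NameNode returns before sending the file data.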
Post WebHDFS | |
---|---|
Factory Name | com.searchtechnologies.aspire:aspire-post-webhdfs |
subType | default |
Inputs | An AspireObject with the metadata of each document to be posted and a key (optional). |
Outputs | An HDFS file entry consisting of the key and a JSON representation of the AspireObject as the value. |
Configuration
This section lists all configuration parameters available for the Post WebHDFS component.
Element | Type | Default | Description |
---|---|---|---|
hdfsUrl | String | http://localhost:8020 | The HDFS NameNode URL. |
folderPath | String | | The path within the HDFS server where the files will be stored. If empty, the user's home folder is used. |
filePrefixName | String | aspire | The prefix of the names of the stored files. Each file name is completed with a sequential counter value (e.g. aspire-00000). |
username | String | | The user name to set on the HTTP calls. |
fileSize | long | 64*1024*1024 (64 MB) | The maximum size of each file. When this size is reached, a new file is created. |
outputKey | String | | An AXPath to the metadata field to use as the output key. |
ignoreAspireBatch | boolean | true | Tells the component whether or not to create a new file for each Aspire batch. |
timeout | int | 30000 | Time in milliseconds to wait, after the last job has been processed, until the file can be closed. |
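How filePrefixName and fileSize interact can be sketched as follows. This is a minimal in-memory model, not the component's implementation: the class name `RollingWriter`, the tab-separated entry layout, and the choice to roll over before an entry would exceed the limit are all assumptions made for illustration.

```python
def file_name(prefix, index):
    """Return a name such as aspire-00000 (prefix plus zero-padded counter)."""
    return f"{prefix}-{index:05d}"


class RollingWriter:
    """In-memory stand-in for size-based file rollover (hypothetical)."""

    def __init__(self, prefix="aspire", max_bytes=64 * 1024 * 1024):
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.index = 0          # sequential counter used in the file name
        self.current_size = 0   # bytes written to the current file
        self.files = {}         # file name -> list of encoded entries

    def write(self, key, value):
        # Assumed entry layout: key, tab, value, newline.
        entry = f"{key}\t{value}\n".encode("utf-8")

        # Start a new file when this entry would push a non-empty
        # file past the size threshold.
        if self.current_size > 0 and self.current_size + len(entry) > self.max_bytes:
            self.index += 1
            self.current_size = 0

        name = file_name(self.prefix, self.index)
        self.files.setdefault(name, []).append(entry)
        self.current_size += len(entry)
        return name


# Each entry below is 11 bytes, so a 25-byte limit fits two entries per file.
writer = RollingWriter(max_bytes=25)
names = [writer.write("k", "x" * 8) for _ in range(3)]
print(names)
```

With the default 64 MB threshold the same logic applies; the tiny limit here only keeps the example short.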
Example Configuration
This section provides an example Post WebHDFS configuration for a local HDFS server.
    <component name="PostWebHDFS" subType="default" factoryName="aspire-post-webhdfs">
      <hdfsUrl>http://localhost:8020/</hdfsUrl>
      <folderPath>/webhdfs/v1/user/jsmith/test/</folderPath>
      <filePrefixName>aspire-</filePrefixName>
      <username>jsmith</username>
      <outputKey>weekDay</outputKey>
    </component>
Output
    Monday
    Wednesday
HDFS Configuration Requirements
This component uses the APPEND operation to add data to HDFS files, so append support must be enabled in the config/hdfs-site.xml configuration file on your HDFS server.
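One way to check this setting is to parse hdfs-site.xml and look up the relevant property. The sketch below assumes the property is named `dfs.support.append`, which is the name commonly used by Hadoop; treat it as an assumption and confirm it against the documentation for your Hadoop version. The function name `append_enabled` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def append_enabled(hdfs_site_xml, prop="dfs.support.append"):
    """Return True/False if the property is set, or None if it is absent.

    The property name is an assumption; adjust it for your Hadoop version.
    """
    root = ET.fromstring(hdfs_site_xml)
    # Hadoop config files hold <property><name>..</name><value>..</value></property>.
    for node in root.findall("property"):
        if (node.findtext("name") or "").strip() == prop:
            return (node.findtext("value") or "").strip().lower() == "true"
    return None


# Sample content standing in for a real config/hdfs-site.xml file.
sample = """
<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
"""
print(append_enabled(sample))
```

A result of None means the property is not set explicitly, in which case the server falls back to its built-in default.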