The Load HDFS stage retrieves the AspireObject associated with a key from HDFS and sets it on the incoming AspireObject. When the stage is initialized, it creates a MapDB cache of the key/value pairs stored in HDFS, which makes subsequent retrievals faster.
Communication with HDFS goes through the HDFS API FileSystem methods.
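The lookup-and-attach behavior described above can be sketched as follows. This is a minimal illustration, not the stage's actual implementation: plain `java.util` maps stand in for the AspireObject, the MapDB cache, and the HDFS read, and the class name `LoadHdfsSketch`, its methods, and the lazy cache fill on a miss are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of the Load HDFS flow.
 * Plain maps stand in for AspireObject, the MapDB cache, and HDFS itself.
 */
public class LoadHdfsSketch {
    // Stand-in for the MapDB cache of key/value pairs (assumption: filled lazily on a miss).
    private final Map<String, Map<String, String>> cache = new HashMap<>();
    // Stand-in for a read through the HDFS FileSystem API; returns null if the key is unknown.
    private final Function<String, Map<String, String>> hdfsReader;
    private final String keyField;     // corresponds to the keyField setting
    private final String outputField;  // corresponds to the outputField setting
    private int hdfsReads = 0;         // counts simulated HDFS reads

    public LoadHdfsSketch(Function<String, Map<String, String>> hdfsReader,
                          String keyField, String outputField) {
        this.hdfsReader = hdfsReader;
        this.keyField = keyField;
        this.outputField = outputField;
    }

    /** Pre-populates the cache, loosely mirroring what warmOnStartup=true suggests. */
    public void warm(Iterable<String> keys) {
        for (String key : keys) {
            lookup(key);
        }
    }

    /** Returns the cached value, reading from the simulated HDFS only on a miss. */
    private Map<String, String> lookup(String key) {
        Map<String, String> cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        hdfsReads++;
        Map<String, String> loaded = hdfsReader.apply(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    /** Reads the key from keyField and stores the retrieved value under outputField. */
    public void process(Map<String, Object> doc) {
        Object key = doc.get(keyField);
        if (key == null) {
            return; // nothing to look up
        }
        Map<String, String> value = lookup(key.toString());
        if (value != null) {
            doc.put(outputField, value);
        }
    }

    public int getHdfsReads() {
        return hdfsReads;
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("weekDay", "Monday");
        record.put("name", "jsmith");
        Map<String, Map<String, String>> fakeHdfs = new HashMap<>();
        fakeHdfs.put("Monday", record);

        LoadHdfsSketch stage = new LoadHdfsSketch(fakeHdfs::get, "hdfsKey", "hdfsValue");
        Map<String, Object> doc = new HashMap<>();
        doc.put("hdfsKey", "Monday");
        stage.process(doc); // first call reads from the simulated HDFS
        stage.process(doc); // second call is served from the cache
        System.out.println("hdfsValue = " + doc.get("hdfsValue"));
        System.out.println("simulated HDFS reads = " + stage.getHdfsReads());
    }
}
```

In this sketch, calling `warm` before processing moves the read cost to startup, so the first `process` call for a warmed key no longer touches the simulated HDFS.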
Configuration
Element | Type | Default | Description |
---|---|---|---|
hdfsLocation | String | hdfs://localhost:8020 | The HDFS Namenode URL. |
folderPath | String | | The path within the HDFS server from which the files are retrieved. |
dbFile | String | data/${app.bundle.name}/jobs.mapdb | The folder where the MapDB files are stored. |
warmOnStartup | boolean | false | Whether to warm the MapDB cache on initialization. |
keyField | String | hdfsKey | The field of the incoming AspireObject that holds the key to retrieve. |
outputField | String | hdfsValue | The field of the incoming AspireObject in which the retrieved AspireObject is stored. |
Example
This section provides an example Load HDFS configuration for a local HDFS server.
<component name="PostHDFS" subType="default" factoryName="aspire-hadoop-hdfs">
  <hdfsLocation>hdfs://localhost:8020/</hdfsLocation>
  <folderPath>/user/jsmith/test/</folderPath>
  <warmOnStartup>true</warmOnStartup>
  <dbFile>jobs.mapdb</dbFile>
  <keyField>hdfsKey</keyField>
  <outputField>hdfsValue</outputField>
</component>
Output
<doc>
  <hdfsKey>Monday</hdfsKey>
  <hdfsValue>
    <doc>
      <weekDay>Monday</weekDay>
      <name>jsmith</name>
      <date>2013/07/16</date>
      <url>http://www.searctechnologies.com/products/we-are-great.html</url>
    </doc>
  </hdfsValue>
</doc>