The Hadoop Emit stage writes a key/value pair into a Hadoop Context. The Hadoop Context reference is in a job variable called hadoopContext. The key is configured as a SimpleTemplate and the value is the job's AspireObject.

Configuration

Element | Type | Default | Description
keyTemplate | SimpleTemplate String | {hadoopKey} | A simple template to extract the value of the key from the Aspire Job.

Example Configuration

This section provides example configurations for the Hadoop Emit stage.

Set a key from a job variable

<component name="Emit" subType="default" factoryName="aspire-hadoop-emit">
  <keyTemplate>{hadoopKey}</keyTemplate>
</component>
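
Since {hadoopKey} is the documented default for keyTemplate, the configuration above should behave the same as one that leaves the element out. A minimal sketch, assuming the stage falls back to that default when keyTemplate is absent (this fallback is inferred from the Default column, not confirmed separately):

<component name="Emit" subType="default" factoryName="aspire-hadoop-emit"/>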

Set a key from a field in the AspireObject

<component name="Emit" subType="default" factoryName="aspire-hadoop-emit">
  <keyTemplate>{XML:url}</keyTemplate>
</component>
<component name="Emit" subType="default" factoryName="aspire-hadoop-emit">
  <keyTemplate>{TAG:url}</keyTemplate>
</component>

Note: Both TAG and XML work the same way.

Set a key by querying the AspireObject with AXPath

<component name="Emit" subType="default" factoryName="aspire-hadoop-emit">
  <keyTemplate>{XPATH:/doc/field[@name='url']/.}</keyTemplate>
</component>

Set a key as the JobId

<component name="Emit" subType="default" factoryName="aspire-hadoop-emit">
  <keyTemplate>{JOBID}</keyTemplate>
</component>


Output

The stage has no job output. The key/value pair is written directly to the Hadoop Context referenced by the job variable hadoopContext.
