The Load XML stage loads XML from a stream into the job's AspireObject. The XML is loaded as a sub-element. This stage is typically used after a Fetch URL stage, which creates the stream.

XML Loader (Aspire 2)
Factory Name: com.searchtechnologies.aspire:aspire-xml-files (previously aspire.XML)
subType: loadXML
Inputs: Either object['contentStream'] (an InputStream containing the XML file to load) or object['contentBytes'] (a byte array containing the XML file to load).
Outputs: The XML read from the content stream or bytes is loaded into memory and stored as a sub-element of the <doc> element of the AspireObject attached to the job.
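
The input/output behaviour described above can be pictured with a minimal sketch that uses only the JDK's DOM classes. It is not the Aspire implementation and does not use the AspireObject API; the class and method names (LoadXmlSketch, loadAsSubElement) are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Illustration only: plain JDK DOM standing in for the AspireObject (assumption).
final class LoadXmlSketch {

    /** Parses the stream and appends its root element as a child of a new <doc> element. */
    static Element loadAsSubElement(InputStream contentStream) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document loaded = builder.parse(contentStream);

            // Stand-in for the job's document: an empty <doc> root.
            Document target = builder.newDocument();
            Element doc = target.createElement("doc");
            target.appendChild(doc);

            // Deep-copy the loaded root into the target document and nest it under <doc>.
            Node copy = target.importNode(loaded.getDocumentElement(), true);
            doc.appendChild(copy);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not load XML", e);
        }
    }

    /** Same as above, for the byte-array form of the input. */
    static Element loadAsSubElement(byte[] contentBytes) {
        return loadAsSubElement(new ByteArrayInputStream(contentBytes));
    }

    public static void main(String[] args) {
        String xml = "<testRootNode><speech name=\"George Washington\">The period...</speech></testRootNode>";
        Element doc = loadAsSubElement(xml.getBytes(StandardCharsets.UTF_8));
        // Prints the name of the element nested directly under <doc>.
        System.out.println(doc.getFirstChild().getNodeName());
    }
}
```

The two overloads mirror the two accepted inputs: a stream, or a byte array wrapped in a stream before parsing.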

Example Configuration

Simple

<component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files"/>

With Locally Stored DTDs

Use this version if the XML file references DTDs that you cannot access over the internet.

  <component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files">
    <localResourceDir>resources/dtds</localResourceDir>
  </component>
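
To illustrate the general idea of serving DTDs from a local directory, here is a minimal sketch built on the JDK's standard EntityResolver interface. How the stage itself maps a DTD reference to a file under localResourceDir is not described here, so the file-name mapping below (localFileFor) is an assumption, and all class and method names are hypothetical.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Illustration only: plain JDK parsing, not the Aspire implementation.
final class LocalDtdResolverSketch {

    /** Assumed mapping: keep only the file name of the system ID and look for it in localDir. */
    static File localFileFor(String systemId, File localDir) {
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        return new File(localDir, name);
    }

    /** Resolver that serves a DTD from localDir when a file with a matching name exists. */
    static EntityResolver resolverFor(File localDir) {
        return (publicId, systemId) -> {
            if (systemId == null) {
                return null;
            }
            File local = localFileFor(systemId, localDir);
            // Returning null lets the parser fall back to its default lookup of the system ID.
            return local.isFile() ? new InputSource(local.toURI().toString()) : null;
        };
    }

    /** Parses the XML text, resolving any referenced DTDs against localDir first. */
    static Document parse(String xml, File localDir) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolverFor(localDir));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With this sketch, a call such as LocalDtdResolverSketch.parse(xml, new File("resources/dtds")) would read a DTD referenced by a remote URL from the local directory instead, provided a file with the same name is present there.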

Example Use Within A Pipeline

  <pipeline name="process-feedOne-test">
    <stages>
      <stage component="FetchUrl" />
      <stage component="LoadXML" />
    </stages>
  </pipeline>

Example

In the following example, suppose that the URL "file:test.xml" refers to a file which contains the following:

<testRootNode>
  <speech name="George Washington">The period for a new election of a citizen, 
    to administer the executive government of the United States, being not far distant, 
    and the time actually arrived...
  </speech>
  <speech name="Abraham Lincoln">Four score and seven years ago our forefathers 
    brought forth upon this country...
  </speech>
  <speech name="Thomas Jefferson">We hold these truths to be self-evident, 
    that all men are created equal, that they are endowed by their Creator 
    with certain unalienable Rights, that among these are Life, Liberty and 
    the pursuit of Happiness...
  </speech>
</testRootNode>


Further suppose that "file:test.xml" is read by the Fetch URL stage. After the Load XML stage executes, the AspireObject will contain the following structure. Notice how <testRootNode> is nested within the <doc> node, which is the root node of the AspireObject.

<doc>
  <fetchUrl>file:test.xml</fetchUrl>
  <protocol source="FetchURLStage/protocol">file</protocol>
  <mimeType source="FetchURLStage/mimeType">application/xml</mimeType>
  <extension source="FetchURLStage">
    <field name="modificationDate">2009-12-06T05:06:06Z</field>
    <field name="content-type">application/xml</field>
    <field name="content-length">618</field>
    <field name="last-modified">Sun, 06 Dec 2009 05:06:06 GMT</field>
  </extension>
  <testRootNode>
    <speech name="George Washington">The period for a new election of a citizen, 
      to administer the executive government of the United States, being not far distant, 
      and the time actually arrived...
    </speech>
    <speech name="Abraham Lincoln">Four score and seven years ago our forefathers 
      brought forth upon this country...
    </speech>
    <speech name="Thomas Jefferson">We hold these truths to be self-evident, 
      that all men are created equal, that they are endowed by their Creator 
      with certain unalienable Rights, that among these are Life, Liberty and 
      the pursuit of Happiness...
    </speech>
  </testRootNode>
</doc>
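
Because the loaded XML sits under <doc>, later processing would address it with paths that start at /doc/testRootNode. The following minimal sketch shows this with the JDK's XPath API applied to the XML text of the structure above. It does not use the AspireObject API, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: queries the XML text of the <doc> structure with plain JDK XPath.
final class SpeechNamesSketch {

    /** Returns the name attribute of every <speech> under /doc/testRootNode, in document order. */
    static List<String> speechNames(String docXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList names = (NodeList) xpath.evaluate(
                "/doc/testRootNode/speech/@name",
                new InputSource(new StringReader(docXml)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                result.add(names.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

Applied to the structure shown above, this would yield the three speaker names from the name attributes of the <speech> elements.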