The Load XML stage loads XML from a stream into the job's AspireObject. The XML is loaded as a sub-element. This stage is typically used after a Fetch URL stage (which creates the stream).
XML Loader | |
---|---|
Factory Name | com.searchtechnologies.aspire:aspire-xml-files (previously aspire.XML) |
subType | loadXML |
Inputs | Either object['contentStream'] (an InputStream which contains the XML file to be loaded) or object['contentBytes'] (an array of bytes which contains the XML file to be loaded). |
Outputs | The XML file specified by the content stream or bytes is loaded into memory and stored as a sub-element of the <doc> element, which is the root of the AspireObject attached to the job. |
Element | Type | Default | Description |
---|---|---|---|
localResourceDir | string | null | The directory on the local system where DTD files and other required XML resources are located. The local directory is consulted for these DTD files before they are fetched over the web. This often works better for large and complex files from third-party sources, and on machines that are not connected to the internet (e.g., behind a firewall). It also improves the performance of fetching these files. If null (the default), DTD files are always fetched over the internet. |
cleanse | boolean | false | Set to true to strip non-readable characters (e.g., ASCII code 15) from the XML content. |
encoding | string | null | Specifies an explicit XML character encoding. The specified encoding is used to read any XML file whose encoding cannot be determined automatically from the input XML stream. |
<component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files"/>
Use the following version if the XML file references DTDs that cannot be accessed over the internet.
<component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files">
  <localResourceDir>resources/dtds</localResourceDir>
</component>
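The optional cleanse and encoding elements from the configuration table can presumably be set in the same way. The following is a minimal sketch, assuming both are child elements of the component just as localResourceDir is, and that "UTF-8" is an accepted encoding name (the value is illustrative and does not come from this page):

```xml
<component name="LoadXML" subType="loadXML" factoryName="aspire-xml-files">
  <!-- Look for DTDs locally before going to the web -->
  <localResourceDir>resources/dtds</localResourceDir>
  <!-- Strip non-readable characters from the XML content (default: false) -->
  <cleanse>true</cleanse>
  <!-- Assumed value: fallback encoding when none can be detected from the stream -->
  <encoding>UTF-8</encoding>
</component>
```

Any element left out keeps the default listed in the table.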
<pipeline name="process-feedOne-test">
  <stages>
    <stage component="FetchUrl" />
    <stage component="LoadXML" />
  </stages>
</pipeline>
In the following example, suppose there is a file called "file:test.xml" that contains the following:
<testRootNode>
  <speech name="George Washington">The period for a new election of a citizen, to administer the executive government of the United States, being not far distant, and the time actually arrived... </speech>
  <speech name="Abraham Lincoln">Four score and seven years ago our forefathers brought forth upon this country... </speech>
  <speech name="Thomas Jefferson">We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness... </speech>
</testRootNode>
Further suppose that "file:test.xml" is read by the Fetch URL stage. After the Load XML stage executes, the AspireObject contains the following structure. Notice how <testRootNode> is nested within the <doc> node, which is the root node of the AspireObject.
<doc>
  <fetchUrl>file:test.xml</fetchUrl>
  <protocol source="FetchURLStage/protocol">file</protocol>
  <mimeType source="FetchURLStage/mimeType">application/xml</mimeType>
  <extension source="FetchURLStage">
    <field name="modificationDate">2009-12-06T05:06:06Z</field>
    <field name="content-type">application/xml</field>
    <field name="content-length">618</field>
    <field name="last-modified">Sun, 06 Dec 2009 05:06:06 GMT</field>
  </extension>
  <testRootNode>
    <speech name="George Washington">The period for a new election of a citizen, to administer the executive government of the United States, being not far distant, and the time actually arrived... </speech>
    <speech name="Abraham Lincoln">Four score and seven years ago our forefathers brought forth upon this country... </speech>
    <speech name="Thomas Jefferson">We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness... </speech>
  </testRootNode>
</doc>