HTTP Feeder

Use the HTTP Feeder to receive RESTful requests and feed them to an Aspire pipeline. The feeder turns Aspire into a "RESTful Web Service": it accepts requests from outside clients, processes them as jobs, and returns the results.

The HTTP Feeder registers a new servlet URL on the Aspire server. For example, if your servletName is "submitFiles", the new URL will be http://server:50505/submitFiles. This URL is separate from the standard Aspire admin user interface (which is under "/aspire").

The HTTP Feeder has two modes of operation: 1) input parameters specified on the URL, and 2) input data POSTed to the feeder. With parameters on the URL, the input parameters are added to the AspireObject that is fed down the pipeline. With POSTed data, the data is either form parameters, which are added to the AspireObject in the same way, or data streamed to the servlet, which is attached to the published job as a stream.

The HTTP Feeder can also be used to upload files, using a Multipart form submission. See below for details.

HTTP Feeder
Factory Name	com.searchtechnologies.aspire:aspire-http-feeder
subType	default
Inputs	RESTful requests in standard URL query string format (name=value pairs).
Outputs	AspireObjects containing HTTP Request data, including all name=value pairs from the query string.

Using the HTTP Feeder as a User Interface

The HTTP Feeder can be used as a user interface. See here for details.

Parameters Specified on the URL

In the first mode, parameters are specified on the URL in param=value format. For example:

http://server:50505/submitFiles?param1=value1&param2=value2

These parameters are stored as top-level XML elements in the resulting AspireObject, which is passed down the pipeline. For example:

 <doc>
   <feederLabel>HttpFeeder</feederLabel>
   <param1 source="HTTPFeederServlet">value1</param1>
   <param2 source="HTTPFeederServlet">value2</param2>
 </doc>

The pipeline is then responsible for processing the job as necessary (for example, with Groovy scripting). The results are returned to the client as XML data.
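For illustration, here is a minimal Java 11+ client for this mode. It URL-encodes each name and value, appends them to the servlet URL, and sends a GET with java.net.http.HttpClient. The class name UrlParamClient is hypothetical, and the host, port, servlet name, and parameters are the placeholders used above; substitute your own.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class UrlParamClient {
    // Builds base?name1=value1&name2=value2, URL-encoding every name and value as UTF-8
    static String buildUrl(String base, Map<String, String> params) {
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("param1", "value1");
        params.put("param2", "value2");
        String url = buildUrl("http://server:50505/submitFiles", params);

        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // the processed job, returned as XML by default
    }
}
```

Encoding matters once values contain spaces, & or =: with this sketch, a value of "a b&c" is sent as a+b%26c, so it arrives as one parameter instead of being split.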

Information from the Servlet

The HTTP Feeder also adds information from the servlet to the job it publishes. This information is added to the <aspireHttpFeederServlet> element, as attributes and (for the query string) as a child element:

 <doc>
   <aspireHttpFeederServlet remotePort="52124" relativePath="/xml-search" serverName="localhost" source="HTTPFeederServlet"
      remoteHost="127.0.0.1" serverPort="50505" remoteAddr="127.0.0.1" fullPath="/cgi-bin/xml-search" servletPath="/cgi-bin">
     <queryString>param1=value1&param2=value2</queryString>
   </aspireHttpFeederServlet>
   .
   .
 </doc>

The following information is available:

Attribute	Description
source	The name of the HttpFeeder.
remoteHost	The hostname of the client (e.g., browser).
remoteAddr	The IP address of the client (e.g., browser).
remotePort	The port used by the client (e.g., browser).
serverName	The name of the server running the HttpFeeder.
serverPort	The port the HttpFeeder is listening on.
servletPath	The path the HttpFeeder is responding to.
fullPath	The full path requested by the client.
relativePath	The path requested by the client relative to the servletPath.
queryString	The entire query string (i.e., everything after the ? in the URL).
maxUploadSize	The maximum size of a file that can be uploaded, in bytes. Defaults to 10,485,760 bytes (10 MB). A suffix of b, kb, mb, or gb may be added to give the value in bytes, kilobytes, megabytes, or gigabytes; without a suffix, the value is in bytes.
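If you want to check what byte limit a maxUploadSize value corresponds to, the conversion can be sketched as follows. This is an illustration, not the feeder's own parsing code. It assumes binary multiples (1 kb = 1,024 bytes), which is what the documented default implies: 10,485,760 bytes is exactly 10 * 1,024 * 1,024. Tolerating upper-case suffixes and surrounding spaces is an additional assumption of the sketch, and the class name is hypothetical.

```java
import java.util.Locale;

class UploadSize {
    private static final long KB = 1024L;

    // Converts a maxUploadSize value ("10mb", "512kb", "2gb", "100b", "1048576") to bytes.
    // Assumes binary multiples, matching the documented default of 10,485,760 bytes = 10 MB.
    // Accepting upper-case suffixes and surrounding spaces is an assumption of this sketch.
    static long toBytes(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        int suffixLength = 0;
        if (v.endsWith("gb")) {
            multiplier = KB * KB * KB;
            suffixLength = 2;
        } else if (v.endsWith("mb")) {
            multiplier = KB * KB;
            suffixLength = 2;
        } else if (v.endsWith("kb")) {
            multiplier = KB;
            suffixLength = 2;
        } else if (v.endsWith("b")) {
            suffixLength = 1; // explicit bytes
        }
        String digits = v.substring(0, v.length() - suffixLength).trim();
        return Long.parseLong(digits) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(toBytes("10mb")); // prints 10485760, the documented default
    }
}
```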

XML Data POSTed to the Service

To POST data to the service, set the XMLContent parameter (see Configuration below) to true.

Despite its name, XMLContent does not require the content to be XML. The content can be HTML, PDF, or anything else. The parameter may be renamed in the future.

When XMLContent is true, data streamed to the servlet via POST is attached as an input stream to the job published by the feeder. You can access the data using the Standards.Basic.getContentStream(Job j) method in the com.searchtechnologies.aspire.framework package.

This also means that you can follow the HTTP Feeder with any pipeline stage that uses the content stream. For example, XML Sub Job Extractor, Tabular Files Extractor, XML File Loader, and Extract Text can all be the first pipeline stage to receive the job.

The curl command (available on most Linux installations, or on Windows through Cygwin, http://www.cygwin.com) is a convenient way to test submitting data to the service. For example, to POST a document as the content to an Aspire servlet:

 curl --data-binary "@data/full_text.xml" http://localhost:50505/submitFiles
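The same request can be made from Java 11 or later with java.net.http.HttpClient, for example from an integration test. This sketch assumes the feeder has XMLContent set to true, listens on the default port 50505 used elsewhere on this page, and uses the servlet name submitFiles; the class and method names are hypothetical. Note that curl --data-binary sends a Content-Type of application/x-www-form-urlencoded unless told otherwise; this sketch sends application/xml instead, and this page does not say whether the feeder looks at that header.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

class PostContent {
    // Builds a POST whose body is the raw bytes of a document, like curl --data-binary "@file".
    // The Content-Type value is an assumption; pick one that matches your content.
    static HttpRequest buildRequest(String url, byte[] content, String contentType) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    public static void main(String[] args) throws Exception {
        byte[] content = Files.readAllBytes(Path.of("data", "full_text.xml"));
        HttpRequest request = buildRequest("http://localhost:50505/submitFiles", content, "application/xml");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```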

Multipart Form Submissions

HTML supports submitting "multipart forms" made up of multiple parameters, some of which may represent uploaded file content.

For the HTTP Feeder to receive multipart forms, you need to enable them (with the <multipartForm> configuration element) and specify how posted files are handled: as a stream (set <fileHandler> to stream) or as files (set <fileHandler> to file). If you choose to handle posted files as files, you must also specify the directory they are uploaded to (<uploadDir>).

Note: Setting the XMLContent option of the HttpFeeder to true automatically disables multipart form processing.

Stream Handler

When the file handler is set to stream, only a single file may be uploaded at a time. All parameters received BEFORE the file are added to the job's AspireObject as XML elements; parameters received AFTER the file are ignored. The file itself is attached to the job as an InputStream, and subsequent stages can access the data using the Standards.Basic.getContentStream(Job j) method in the com.searchtechnologies.aspire.framework package, so data can be streamed directly from the client through whatever processing you need to do. The file is NOT stored locally on the Aspire server by the HttpFeeder.

Example configuration

 <component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
   .
   .
   <multipartForm>
     <fileHandler>stream</fileHandler>
   </multipartForm>
 </component>

File Handler

When the file handler is set to file, multiple files may be uploaded by a single form submission. Using the file handler requires the HttpFeeder <uploadDir> to be configured. Any file submitted will be uploaded and saved to this directory. The uploaded file is saved using its original filename (filename only, not the complete path).

No streams are added to the Aspire job. To reference the file, read the job's AspireObject and extract the value of the element corresponding to the HTML form input that uploaded the file. This value is the full path to the saved copy of the uploaded file on the Aspire server.

For example, if the file was uploaded via the following form:

 <form enctype="multipart/form-data" method="POST" action="http://localhost:50505/xmlfeed">
   XML file to push:
   <input type="file" name="data">
   <input type="submit" value="Submit">
 </form>

The AspireObject for the job would look similar to:

 <doc>
   <aspireHttpFeederServlet remotePort="56494" serverName="localhost" source="HTTPFeederServlet" remoteHost="127.0.0.1" serverPort="50505" remoteAddr="127.0.0.1" fullPath="/xmlfeed" servletPath="/xmlfeed">
     <queryString/>
   </aspireHttpFeederServlet>
   <data>C:\tmp\3.0distroTest\distro-test\target\aspire-distribution-1.0-distribution/data/upload\htmlContentFeed.xml</data>
 </doc>

All ordinary HTML form input parameters will be added to the job's AspireObject as XML tags.

Example configuration

 <component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
   .
   .
   <multipartForm>
     <fileHandler>file</fileHandler>
     <uploadDir>data/upload</uploadDir>
   </multipartForm>
 </component>
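To exercise either file handler without a browser, a client can build the multipart/form-data body itself. The Java 11+ sketch below mirrors the form shown earlier (a file input named data, posted to /xmlfeed) and adds one ordinary field, description, which is a hypothetical name. It places the ordinary fields before the file, because the stream handler ignores parameters received after the file. The boundary format and the class name are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;

class MultipartUpload {
    private static final String CRLF = "\r\n";

    // Builds a multipart/form-data body: ordinary fields first, then one file part.
    // The order matters for the stream handler, which ignores parameters sent after the file.
    static byte[] buildBody(String boundary, Map<String, String> fields,
                            String fileField, String fileName, byte[] fileBytes) {
        StringBuilder head = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            head.append("--").append(boundary).append(CRLF)
                .append("Content-Disposition: form-data; name=\"").append(e.getKey()).append('"').append(CRLF)
                .append(CRLF)
                .append(e.getValue()).append(CRLF);
        }
        head.append("--").append(boundary).append(CRLF)
            .append("Content-Disposition: form-data; name=\"").append(fileField)
            .append("\"; filename=\"").append(fileName).append('"').append(CRLF)
            .append("Content-Type: application/octet-stream").append(CRLF)
            .append(CRLF);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.toString().getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileBytes);
        out.writeBytes((CRLF + "--" + boundary + "--" + CRLF).getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args.length > 0 ? args[0] : "htmlContentFeed.xml");
        String boundary = "AspireBoundary" + UUID.randomUUID().toString().replace("-", "");
        byte[] body = buildBody(boundary, Map.of("description", "test upload"), "data",
                file.getFileName().toString(), Files.readAllBytes(file));

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:50505/xmlfeed"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

With the file handler, the job's AspireObject should then contain a data element holding the saved path, as in the sample above.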

Configuration

Element	Type	Default	Description
branches	parent tag	None	The configuration of the pipeline to publish to. See below.
waitForJob	boolean	true	Indicates whether the component waits for the job to complete.
servletName	String	httpFeeder	Name of the servlet that receives requests. For example, if servletName is "submitFiles", then you would send requests to the HTTP Feeder using the "http://localhost:50505/submitFiles?params..." URL.
feederLabel	String	HttpFeeder	The <feederLabel> value to be included with the document as it is sent to the pipeline. For example, HttpFeeder.
XMLContent	boolean	false	Set this parameter to true if you will be POSTing data (XML or any other content) to the HTTP Feeder. The data is attached as an input stream to the job published by the feeder, and subsequent stages can access it using the Standards.Basic.getContentStream(Job j) method in the com.searchtechnologies.aspire.framework package.
xsltFileName	String	null	The path of the XSL transform file used to format the output XML. Path names are relative to Aspire Home.
outputMime	String	text/xml	Specifies the mime type which the HTTP feeder will report back to the HTTP client. Change this to "text/html" if your transform creates HTML which should be shown by a browser.
resultMimeTypeField	String		Sets the mime type from the value found in the specified field. The field must be a child of the root element (i.e., a parameter value of mimeType looks for the value in the /doc/mimeType field of the default AspireObject). If the field does not exist or is empty, the mime type reverts to the value of the <outputMime> parameter. NOTE: The value is extracted before the transformation (if any) is applied.
multipartForm	parent tag		Enables multipart form submission, which allows files, as well as other input elements, to be uploaded to the HTTP server through HTML forms.
multipartForm/fileHandler	String	stream	Specifies the type of handler to use for posted files. The stream (default) handler attaches the uploaded file to the job as an InputStream, and subsequent stages can access the data using the Standards.Basic.getContentStream(Job j) method in the com.searchtechnologies.aspire.framework package. The file handler saves the file to the specified directory (see below); no input stream is attached to the job. See above for more details and restrictions.
multipartForm/uploadDir	String		Specifies the directory where files from multipart forms are uploaded when using the file handler. See above for more details.
saxonProcessor	boolean	false	Set to true to use the Saxon processor for transforms written in XSLT 2.0.
debugOutFile	String		Specifies the location where the XSLT-processed output is written. This is used for debugging transforms.
headers	parent tag	None	The configuration of the HTTP headers. See below.
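The resultMimeTypeField fallback rule can be restated as code. The sketch below is not the feeder's implementation: it evaluates the configured field as a direct child of the root element of the job's XML (for example /doc/mimeType) with the JDK's javax.xml.xpath API, and falls back to outputMime when the field is missing or empty. Because the feeder reads the value before any XSLT transform, the input here is the untransformed job XML. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ResultMimeType {
    // Returns the text of the configured field if it is a non-empty child of the root
    // element (e.g. /doc/mimeType); otherwise returns outputMime.
    static String choose(String jobXml, String resultMimeTypeField, String outputMime) {
        if (resultMimeTypeField == null || resultMimeTypeField.isEmpty()) {
            return outputMime;
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(jobXml)));
            // evaluate() returns "" when no such child exists
            String value = XPathFactory.newInstance().newXPath()
                    .evaluate("/*/" + resultMimeTypeField, doc);
            return value.isEmpty() ? outputMime : value;
        } catch (Exception e) {
            throw new IllegalArgumentException("Job XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String job = "<doc><mimeType>text/html</mimeType></doc>";
        System.out.println(choose(job, "mimeType", "text/xml")); // prints text/html
    }
}
```

For a job without a top-level mimeType element, the same call returns text/xml, the <outputMime> default.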

Example Configurations for HTML Form-Style Parameters

This configuration handles either parameters specified on the URL with HTTP GET or parameters POSTed from an HTML <form>.

 <component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
   <servletName>submitFiles</servletName>
   <feederLabel>HttpFeeder</feederLabel>
   <xsltFileName>config/categorizeOutput.xsl</xsltFileName>
   <branches>
     <branch event="onPublish" pipelineManager="CategorizeFolderOrFile" />
   </branches> 
 </component>
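For the POST variant of this configuration, a client sends the same name=value pairs in the request body with the application/x-www-form-urlencoded content type, as a browser does for a plain HTML <form method="POST">. A minimal Java 11+ sketch, using the servlet name from the example above and hypothetical class and method names:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormPostClient {
    // Encodes fields as URL-encoded name=value pairs joined by &
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Builds a POST request carrying the form-encoded body
    static HttpRequest buildRequest(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("param1", "value1");
        fields.put("param2", "value2");
        HttpRequest request = buildRequest("http://localhost:50505/submitFiles", fields);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```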

Example configuration for posting XML to Aspire

 <component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
   <servletName>submitFiles</servletName>
   <feederLabel>HttpFeeder</feederLabel>
   <XMLContent>true</XMLContent>
   <xsltFileName>config/extractor.xsl</xsltFileName>
   <branches>
     <branch event="onPublish" pipelineManager="CategorizeFolderOrFile" />
   </branches> 
 </component>

Example configuration for configuring HTTP headers

You can specify HTTP headers in the configuration as follows. The feeder then adds these headers to its response.

 <component name="MyHTTPFeeder" factoryName="aspire-http-feeder" subType="default">
   .
   .
   .
   <headers>
     <header name="Authorisation">simple</header>
     <header name="Accept">text/plain</header>
   </headers>
 </component>

Serving Files

The HTTPFeeder can also serve ordinary HTML files, so it can act as a complete, end-to-end user interface for simple applications.

Files are stored inside the Aspire Home directory, in the "web/httpfeeder/<servlet-name>" directory.

For example, a request for:

http://localhost:50505/submitFiles/test.html

will access the file:

$ASPIRE_HOME/web/httpfeeder/submitFiles/test.html

Note that "index.html" is also supported. A request for:

http://localhost:50505/submitFiles/

will return the following file, if it exists:

$ASPIRE_HOME/web/httpfeeder/submitFiles/index.html
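The mapping from request path to file can be sketched as below. It takes the path after the servlet name (the relativePath described earlier), resolves it under $ASPIRE_HOME/web/httpfeeder/<servletName>, and substitutes index.html for a path that ends in a slash. Rejecting paths that escape that directory (such as /../../secret.txt) is a precaution added in this sketch; this page does not describe how the feeder handles such paths. The class name and the /opt/aspire home directory are hypothetical.

```java
import java.nio.file.Path;

class StaticFileResolver {
    // Resolves a servlet-relative request path to a file under
    // <aspireHome>/web/httpfeeder/<servletName>; a trailing slash maps to index.html.
    // Returns null if the resolved path would fall outside that directory.
    static Path resolve(Path aspireHome, String servletName, String relativePath) {
        Path root = aspireHome.resolve("web").resolve("httpfeeder").resolve(servletName).normalize();
        String rel = relativePath == null ? "" : relativePath;
        if (rel.isEmpty() || rel.endsWith("/")) {
            rel = rel + "index.html";
        }
        while (rel.startsWith("/")) {
            rel = rel.substring(1); // make it relative so resolve() stays under root
        }
        Path target = root.resolve(rel).normalize();
        return target.startsWith(root) ? target : null;
    }

    public static void main(String[] args) {
        Path home = Path.of("/opt/aspire"); // hypothetical $ASPIRE_HOME
        System.out.println(resolve(home, "submitFiles", "/test.html"));
        System.out.println(resolve(home, "submitFiles", "/"));
    }
}
```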

Returning Binary Data

Raw binary data can be returned from the HTTPFeeder. This happens automatically if both of the following conditions are met:

- The output mime type is "application/octet-stream". This can be set with either the <outputMime> or the <resultMimeTypeField> configuration parameter.
- There is a job variable called "byteDataResults".

Note: The job variable must (currently) hold data of type ByteArrayOutputStream.

When both conditions are met, the HTTPFeeder does the following:

1. Fetches the array of bytes from the ByteArrayOutputStream.
2. Sets the returned content-length to the length of the array of bytes.
3. Writes the byte data back to the client.
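The conditions and steps above can be restated as a small Java sketch, which may help when deciding what a pipeline stage must put in byteDataResults. It is an illustration, not the feeder's source: the client connection is modeled as a plain OutputStream, and the method returns the content length it would report, or -1 when the binary path does not apply. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BinaryResult {
    static final String OCTET_STREAM = "application/octet-stream";

    // Binary output happens only when the mime type is application/octet-stream AND
    // the byteDataResults job variable holds a ByteArrayOutputStream.
    // Returns the content length written, or -1 when the rule does not apply.
    static int writeIfBinary(String mimeType, Object byteDataResults, OutputStream client) {
        if (!OCTET_STREAM.equals(mimeType) || !(byteDataResults instanceof ByteArrayOutputStream)) {
            return -1;
        }
        byte[] bytes = ((ByteArrayOutputStream) byteDataResults).toByteArray(); // fetch the bytes
        int contentLength = bytes.length;                                       // reported as content-length
        try {
            client.write(bytes);                                                // write back to the client
            client.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contentLength;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream results = new ByteArrayOutputStream();
        results.writeBytes(new byte[] {0x50, 0x4B, 0x03, 0x04}); // arbitrary sample bytes
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        System.out.println(writeIfBinary(OCTET_STREAM, results, client)); // prints 4
    }
}
```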
