Aspire programming usually means creating custom components (typically pipeline stages) that process jobs. These custom components can be written in Groovy (directly in the application.xml file) or in Java.
The following is intended to provide an orientation to programming with Aspire for those approaching it for the first time.
Know these Java Objects
Programming Aspire is mostly about these three fundamental Java classes:
1. ComponentImpl - Introduction
This is your connection to Aspire at large. Use ComponentImpl to access the Aspire Application, other components, logging, and configuration data. All components (including pipeline stages) extend ComponentImpl.
2. Job - Introduction
Job represents a unit of work in Aspire. Jobs are created by feeders (or sub job extractors) and fed through pipelines. Jobs contain metadata and other objects which are manipulated by pipeline stages to do things.
3. AspireObject - Introduction
The AspireObject is the primary holder of hierarchically structured, tagged data. It can hold either XML or JSON, and can even be used to convert between the two. It can be used with XPath and XSLT. Every Job has an AspireObject to hold metadata appropriate to the job.
Groovy Scripting
Once you know the objects above, you can start creating your own Groovy scripts. Groovy scripts are written directly into your application.xml file, using the aspire-groovy component, like this:
    <component name="MyComponent" subType="default" factoryName="aspire-groovy">
      <script>
        <![CDATA[
          println "Hello World, my document looks like this:";
          println doc.toXmlString(true);
        ]]>
      </script>
    </component>
Inside Groovy scripts, you can use the "component" variable to call any method of ComponentImpl, the "job" variable to access the current job object, and the "doc" variable (as in the example above) to access the current AspireObject.
Go to Groovy Scripting to learn more about the Groovy scripting component.
Set Up Your Environment
When programming new Aspire components in Java, we recommend using Eclipse and Maven. Specifically:
- Install the Java JDK (choose the version appropriate for your target Aspire version).
- Install the Eclipse IDE
- Install Maven command line
- Install m2Eclipse (Maven for Eclipse)
The version of Java you should use depends on the Aspire version you are targeting:
- Aspire 2.1.2 and earlier runs on Java 1.6 or Java 1.7
- Aspire 2.2 and up requires Java 1.7
Search Technologies uses Subversion internally for source code control, but it is not required for programming Aspire.
See Developer Environment Setup for step-by-step details on setting up your environment.
Component Class Hierarchy
Aspire is made up of a few basic Java interfaces: Component, Stage, ComponentManager (a group of components), Pipeline Manager (processes jobs through pipelines), and the Aspire Application itself.
These Java interfaces are arranged into a hierarchy like this:
[Figure: component class hierarchy]
You will only ever need to concern yourself with "Stage" and "Component". All of the other interfaces are fully implemented by the Aspire framework.
Each of these interfaces has an associated "Impl" class. For example, StageImpl, ComponentImpl, etc.
The Anatomy of a Component
Most new components are pipeline stages, and all of these have the same basic structure:
    public class MyComponent extends StageImpl {

      public void initialize(Element config) throws AspireException {
        . . .
      }

      public void process(Job j) throws AspireException {
        . . .
      }

      public void close() throws AspireException {
        . . .
      }
    }
Those are the only methods that you will ever need to implement.
initialize(Element config)
- Put code to initialize your component here, for example opening files, opening connections to databases, initializing data structures, etc.
- "config" will be a W3C Element object that contains the XML that was specified in the application.xml for your component. Extract any configuration data you need from this Element.
- initialize() is guaranteed to be called before any job is processed with process().
- initialize() is guaranteed to be called by only a single thread.
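As a rough illustration of extracting configuration data from a W3C Element, the sketch below uses only the standard org.w3c.dom and javax.xml.parsers APIs. The class name ConfigElementDemo, the helper getSetting, and the <maxRetries> and <outputDir> tags are hypothetical and chosen for illustration only. The Element is parsed from a string here, whereas in a real component it would be the one passed to initialize().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: pulling settings out of a W3C Element such as the one passed to initialize().
// The <maxRetries> and <outputDir> tags are hypothetical, not real Aspire settings.
class ConfigElementDemo {

  // Returns the trimmed text of the first descendant element named `tag`,
  // or `fallback` when no such element exists.
  static String getSetting(Element config, String tag, String fallback) {
    NodeList nodes = config.getElementsByTagName(tag);
    if (nodes.getLength() == 0) {
      return fallback;
    }
    return nodes.item(0).getTextContent().trim();
  }

  // Parses an XML string into its root Element (a stand-in for the Element
  // that would normally be supplied to the component).
  static Element parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
        .getDocumentElement();
  }

  public static void main(String[] args) throws Exception {
    Element config = parse(
        "<component name=\"MyComponent\"><maxRetries> 3 </maxRetries></component>");
    int maxRetries = Integer.parseInt(getSetting(config, "maxRetries", "1"));
    String outputDir = getSetting(config, "outputDir", "/tmp/out");
    System.out.println(maxRetries + " " + outputDir);
  }
}
```

Supplying a fallback for each setting keeps the component usable when a tag is omitted from the configuration.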
process(Job j)
- Put the code that actually processes the job here.
- j.get() retrieves the associated AspireObject (aka the "document"). This is the primary metadata holder for the job.
- Jobs can also hold any other object. In fact, Job implements Map<String,Object>, so you can literally store anything in a Job. Anything at all. (Note: Data stored in the map are called Job "variables")
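The idea of Job "variables" can be sketched with a plain Map<String,Object>. In the example below a HashMap stands in for Job so that the code runs without the Aspire libraries. The class name JobVariablesDemo, the two stage methods, and the keys "retryCount" and "sourceUrl" are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Job "variables" idea using a plain Map<String, Object>.
// A HashMap stands in for Job here; the keys are hypothetical.
class JobVariablesDemo {

  // An earlier stage stores arbitrary objects under string keys.
  static void earlierStage(Map<String, Object> job) {
    job.put("retryCount", Integer.valueOf(2));
    job.put("sourceUrl", "http://example.com/doc1");
  }

  // A later stage reads a value back, checking its type before casting.
  // Returns 0 when the variable is missing or is not an Integer.
  static int laterStage(Map<String, Object> job) {
    Object value = job.get("retryCount");
    return (value instanceof Integer) ? ((Integer) value).intValue() : 0;
  }

  public static void main(String[] args) {
    Map<String, Object> job = new HashMap<String, Object>();
    earlierStage(job);
    System.out.println(laterStage(job));
  }
}
```

Because the values are typed as Object, a stage that reads a variable should check the type before casting, as laterStage does.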
close()
- Code for freeing any resources used by your component goes here, for example closing file pointers, releasing connections, releasing memory, etc.
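The order in which the three methods are used can be sketched with stand-in classes. Nothing below is an Aspire type: CountingStage and LifecycleDemo are hypothetical, the method signatures are simplified (no Element, Job, or AspireException), and the run method only imitates what a framework would do, namely initialize once, process each job, then close.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in lifecycle sketch; none of these types are Aspire classes.
class LifecycleDemo {

  // Hypothetical stage with the same three-method shape described above.
  static class CountingStage {
    private List<String> seen;      // resource created in initialize()
    private boolean closed = false;

    // Allocates the resources that process() relies on.
    void initialize() {
      seen = new ArrayList<String>();
    }

    // Records the job ID; rejects calls made before initialize() or after close().
    void process(String jobId) {
      if (seen == null || closed) {
        throw new IllegalStateException("process() called outside initialize()/close()");
      }
      seen.add(jobId);
    }

    // Marks the stage as closed; a real stage would release files or connections here.
    void close() {
      closed = true;
    }

    int processedCount() {
      return seen == null ? 0 : seen.size();
    }
  }

  // Imitates the framework: initialize once, process every job, always close.
  static int run(String[] jobIds) {
    CountingStage stage = new CountingStage();
    stage.initialize();
    try {
      for (String id : jobIds) {
        stage.process(id);
      }
    } finally {
      stage.close();
    }
    return stage.processedCount();
  }

  public static void main(String[] args) {
    System.out.println(run(new String[] {"job-1", "job-2", "job-3"}));
  }
}
```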
The Stage Archetype
Creating new components is made much easier with a Maven archetype. This archetype will create a complete, working Java Maven project for a brand-new pipeline stage, complete with unit tests!
The stage archetype will prompt you to enter some data so it can create the stage properly. Specifically, you will be asked for:
- The Maven coordinates (group ID, artifact ID, version) for your new pipeline stage
- The Java class name to use for your new component
- Some textual information to help annotate your Maven POM file (does not affect functionality)
See Creating a New Pipeline Stage for a detailed, step-by-step description of how to use the Stage archetype to create a new Aspire pipeline stage.
Where to Go From Here
- Create a new pipeline stage, explore it, run the unit test, and see how it works.
- Read the Javadoc for ComponentImpl, Job, and AspireObject.
- Browse the Component Development Topics for lots of detailed information on how to handle all sorts of situations that you will likely encounter.