Pipelines

Most pipeline configurations are a simple list of stages, for example:



 <pipelines>
  <pipeline name="doc-process" default="true">
    <stages>
      <stage component="fetchUrl" />
      <stage component="extractText" />
      <stage component="splitter" />
      <stage component="dateChooser" />
      <stage component="extractDomain" />
      <stage component="printToFile" />
      <stage component="feed2Solr" />
    </stages>
  </pipeline>
</pipelines>


Enabling and Disabling Pipelines and Stages

...

These flags are useful for turning pipelines, and references to stages, on or off in response to property settings (supplied either by an App Bundle or in the settings.xml file).

Example:



 <pipelines>
  <!-- The next two pipelines are declared, but disabled. -->
  <pipeline name="doc-process1" enable="false">
    <stages>
      <stage component="fetchUrl" />
      <stage component="extractText" />
      <stage component="splitter" />
      <stage component="dateChooser" />
      <stage component="extractDomain" />
      <stage component="printToFile" />
      <stage component="feed2Solr" />
    </stages>
  </pipeline>
  <pipeline name="doc-process2" disable="true">
    <stages>
      <stage component="fetchUrl" />
      <stage component="extractText" />
      <stage component="splitter" />
      <stage component="dateChooser" />
      <stage component="extractDomain" />
      <stage component="printToFile" />
      <stage component="feed2Solr" />
    </stages>
  </pipeline>
   
  <!-- The next pipeline is enabled, but disables the 'splitter', 'dateChooser' and 'extractDomain' components. -->
  <pipeline name="doc-process3" enable="true">
    <stages>
      <stage component="fetchUrl" />
      <stage component="extractText" />
      <stage component="splitter" enable="false" />
      <stage component="dateChooser" disable="true" />
      <stage component="extractDomain" enable="false" />
      <stage component="printToFile" />
      <stage component="feed2Solr" />
    </stages>
  </pipeline>
</pipelines>


If neither @enable nor @disable is present, the pipeline or stage is assumed to be enabled.
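
As a sketch of how these flags might be driven by a property, the fragment below assumes that `${...}` placeholders are substituted before the configuration is parsed, and that a property named `enableSplitter` (a hypothetical name) is defined by the App Bundle or in settings.xml. Neither the placeholder syntax nor the property name comes from this page, so check both against your installation.

```xml
<pipelines>
  <pipeline name="doc-process" default="true">
    <stages>
      <stage component="fetchUrl" />
      <stage component="extractText" />
      <!-- Assumption: ${enableSplitter} is replaced with "true" or "false" before parsing. -->
      <stage component="splitter" enable="${enableSplitter}" />
      <!-- No @enable or @disable here, so this stage is enabled. -->
      <stage component="feed2Solr" />
    </stages>
  </pipeline>
</pipelines>
```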

Pipeline Configuration

pipeline/@name: The name of the pipeline. It can be used to branch from one pipeline to another (see the branching statements below).
pipeline/@default: "true" if this pipeline is the default pipeline for the pipeline manager. Jobs sent to the pipeline manager are automatically sent to the default pipeline unless another pipeline is specified by name.
pipeline/@enable: "true" if the pipeline should be enabled.
pipeline/@disable: "true" if the pipeline should be disabled.
pipeline/stages/stage: The list of stages that make up the pipeline. Each pipeline is a single linear list of stages.
pipeline/stages/stage/@component: The name of the component that serves as the pipeline stage. All pipeline stages are also Aspire components, but the reverse is not true.
pipeline/stages/stage/@enable: "true" if the stage should be enabled.
pipeline/stages/stage/@disable: "true" if the stage should be disabled.

Typically, component references are "local" references, i.e., references to components defined within the same pipeline manager. However, it is perfectly okay to use absolute path names, such as /Common/OtherPipelineManager/OtherStage, or relative paths, such as ../OtherPipelineManager/OtherStage, as the component attribute. In this way you can share components across pipeline manager configurations.
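
The three reference styles could appear side by side in one pipeline, as in the sketch below. The paths are the illustrative ones from the paragraph above, not components that necessarily exist in your configuration.

```xml
<pipeline name="doc-process" default="true">
  <stages>
    <!-- Local reference: a component defined in this pipeline manager. -->
    <stage component="fetchUrl" />
    <!-- Absolute path to a component that belongs to another pipeline manager. -->
    <stage component="/Common/OtherPipelineManager/OtherStage" />
    <!-- Relative path to a component that belongs to another pipeline manager. -->
    <stage component="../OtherPipelineManager/OtherStage" />
  </stages>
</pipeline>
```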

...

Pipelines can also be configured with branches, which determine what happens to a job/document when certain events occur. Branches are configured inside the pipeline using a <branches> tag, as shown below:



 <pipelines>
  <pipeline name="doc-process" default="true">
    <stages>
      <stage component="FetchUrl" />
      <stage component="ExtractText" />
      <stage component="Splitter" />
      <stage component="DateChooser" />
      <stage component="ExtractDomain" />
      <stage component="PrintToFile" />
      <stage component="Feed2Solr" />
    </stages>
    <branches>
      <branch event="onError" pipeline="error-pipeline" />
      <branch event="onComplete" pipelineManager="SomeOtherPipemgr" pipeline="some-other-pipeline" />
      <branch event="onMyEvent" pipelineManager="SomeOtherPipemgr" pipeline="some-other-pipeline" stage="some-stage"/>
    </branches>
  </pipeline>
 
  <pipeline name="error-pipeline">
    <!-- process packages for which exception errors are thrown -->
    .
    .
    .
  </pipeline>
</pipelines>


If @pipelineManager is not specified, the event branches within the same pipeline manager. If @pipeline is not specified, the event branches to the same pipeline on this pipeline manager (when @pipelineManager is not given) or to the default pipeline of the specified pipeline manager. If @stage is specified, processing of the job continues with that stage (which could be in the middle of the pipeline) on the pipeline manager and pipeline determined by the rules above.
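
To make these rules concrete, the fragment below pairs each combination of attributes with the destination the rules imply. The event, pipeline, pipeline manager, and stage names are the placeholder names from the example above. The comments restate the rules and have not been verified against a running system.

```xml
<branches>
  <!-- @pipeline only: branches to "error-pipeline" on this pipeline manager. -->
  <branch event="onError" pipeline="error-pipeline" />
  <!-- @pipelineManager only: branches to the default pipeline of "SomeOtherPipemgr". -->
  <branch event="onComplete" pipelineManager="SomeOtherPipemgr" />
  <!-- @stage only: stays on this pipeline manager and this pipeline, continuing at "some-stage". -->
  <branch event="onMyEvent" stage="some-stage" />
</branches>
```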

Three built-in events can be triggered for a job that is being processed by the pipeline:

onError: If any exception error is thrown by a pipeline stage while processing a job, the pipeline manager looks for an "onError" branch and, if one exists, routes the job to the specified destination.
onComplete: When the job has completed a pipeline, the pipeline manager looks for an "onComplete" branch. If one exists, the job is routed to the specified destination.
onTerminate: If a job is terminated by a pipeline stage (note: this is different from an exception error, see below), the pipeline manager looks for an "onTerminate" branch and, if one is found, routes the terminated job to the specified destination. Once the job is routed, it is no longer "terminated" and continues as before.

In addition to these built-in events, components may raise other events of their own.
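
As an illustration, a single pipeline could declare a branch for each of the three built-in events. In this sketch the destination pipelines are assumed to be declared elsewhere in the same pipeline manager, and "post-process" is a hypothetical name that does not appear elsewhere on this page.

```xml
<pipelines>
  <pipeline name="doc-process" default="true">
    <stages>
      <stage component="fetchUrl" />
      <stage component="feed2Solr" />
    </stages>
    <branches>
      <!-- A stage threw an exception while processing the job. -->
      <branch event="onError" pipeline="error-pipeline" />
      <!-- The job reached the end of this pipeline. -->
      <branch event="onComplete" pipeline="post-process" />
      <!-- A stage terminated the job without throwing an exception. -->
      <branch event="onTerminate" pipeline="process-terminate" />
    </branches>
  </pipeline>
</pipelines>
```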

...

Also note that pipelines can have branches and that a new, optional "onTerminate" event has been added to the pipeline manager.

For example:



   <pipeline name="test2">
    <stages>
      <stage component="Schwarzenegger"/>
      <stage component="OldFashionedSgml"/>
    </stages>
    <branches>
      <branch event="onTerminate" pipeline="process-terminate"/>
    </branches>
  </pipeline>
   <pipeline name="process-terminate">
    <stages>
      <stage component="NewFangledXml"/>
      <stage component="AndAnother"/>
    </stages>
  </pipeline>


In the above example, the "Schwarzenegger" stage causes the job to be terminated (Arnold is the Terminator, right?). This is trapped by the pipeline's "onTerminate" branch, which then sends the job to the "process-terminate" pipeline where it continues.

...