Saga Library processes text through a pipeline of text-processing stages. The typical pipeline consists of the following sets of stages:
The pipeline can be specified in JSON format and stored in a resource (see Resources). A sample is shown below:
```json
{
  "reader": {
    "type": "SimpleReader",
    "splitRegex": "\r\n"
  },
  "stages": [
    { "type": "WhitespaceTokenizer" },
    { "type": "CharacterSplitter" },
    { "type": "com.accenture.saga.engine.stages.CaseAnalysisStage" },
    {
      "type": "DictionaryTagger",
      "dictionary": "resources-provider:dictionary",
      "required": ["TOKEN", "ALL_LOWER_CASE"]
    }
  ]
}
```
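The JSON sample has a single `reader` object and an ordered `stages` array. Each entry is identified by `type` and carries any stage-specific fields alongside it, such as `dictionary` and `required` for `DictionaryTagger`. The sketch below models that shape as plain Java records so the nesting is easy to see. It is an illustration only: the record names `ReaderConfig`, `StageConfig` and `PipelineConfig` are hypothetical and are not part of the Saga API. No JSON parsing is done; the sample is rebuilt by hand. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;

public class PipelineConfigSketch {
    // Hypothetical model of the "reader" section: a type plus its parameters.
    record ReaderConfig(String type, Map<String, Object> params) {}

    // Hypothetical model of one "stages" entry: a type plus stage-specific fields.
    record StageConfig(String type, Map<String, Object> params) {}

    // The two top-level sections of the pipeline configuration.
    record PipelineConfig(ReaderConfig reader, List<StageConfig> stages) {}

    // Rebuilds, by hand, the same structure as the JSON sample.
    static PipelineConfig sample() {
        ReaderConfig reader = new ReaderConfig("SimpleReader", Map.of("splitRegex", "\r\n"));
        List<StageConfig> stages = List.of(
            new StageConfig("WhitespaceTokenizer", Map.of()),
            new StageConfig("CharacterSplitter", Map.of()),
            new StageConfig("com.accenture.saga.engine.stages.CaseAnalysisStage", Map.of()),
            new StageConfig("DictionaryTagger", Map.of(
                "dictionary", "resources-provider:dictionary",
                "required", List.of("TOKEN", "ALL_LOWER_CASE"))));
        return new PipelineConfig(reader, stages);
    }

    public static void main(String[] args) {
        PipelineConfig config = sample();
        System.out.println("reader: " + config.reader().type());
        // Stages are kept in the same order as the "stages" array.
        for (int i = 0; i < config.stages().size(); i++) {
            System.out.println("stage " + i + ": " + config.stages().get(i).type());
        }
    }
}
```

Keeping `stages` as a `List` preserves the array order. Ordering matters in a pipeline: in this sample, `DictionaryTagger` names `TOKEN` and `ALL_LOWER_CASE` as required, and those flags are presumably produced by the stages listed before it.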
The pipeline configuration has two sections, `reader` and `stages`:
Stage configurations are documented with each pipeline stage.
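The `type` field specifies the Java class that implements the pipeline stage. As the sample shows, this can be a short name such as `WhitespaceTokenizer` or a fully qualified class name such as `com.accenture.saga.engine.stages.CaseAnalysisStage`.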
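The sample mixes short `type` values (`WhitespaceTokenizer`, `DictionaryTagger`) with one fully qualified value (`com.accenture.saga.engine.stages.CaseAnalysisStage`). The exact resolution rule Saga uses is not given here. The following is one plausible rule, sketched purely as an assumption: a value that contains a dot is treated as a fully qualified class name, and any other value is prefixed with a default package. The default package below is borrowed from the `CaseAnalysisStage` entry and is a guess. The class `StageTypeResolver` and its helper `resolveStageClass` are hypothetical names.

```java
public class StageTypeResolver {
    // Assumed default package for short stage names. It is taken from the
    // sample's fully qualified CaseAnalysisStage entry, not from Saga docs.
    static final String DEFAULT_PACKAGE = "com.accenture.saga.engine.stages";

    // Hypothetical helper: returns the class name a "type" value would refer to.
    static String resolveStageClass(String type) {
        if (type == null || type.isBlank()) {
            throw new IllegalArgumentException("stage type must not be empty");
        }
        // A dot means the value is already a fully qualified class name.
        if (type.contains(".")) {
            return type;
        }
        // Otherwise treat it as a short name in the assumed default package.
        return DEFAULT_PACKAGE + "." + type;
    }

    public static void main(String[] args) {
        System.out.println(resolveStageClass("WhitespaceTokenizer"));
        System.out.println(resolveStageClass("com.accenture.saga.engine.stages.CaseAnalysisStage"));
    }
}
```

A real loader would then instantiate the resolved name, for example with `Class.forName`. That step is left out here because the Saga stage classes are not on the classpath of this sketch.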