Saga Library processes text through a pipeline of stages. A typical pipeline consists of the following sets of stages:
The pipeline can be specified in JSON, which can be stored in a resource (see Resources). A sample is shown below:
```json
{
  "reader": {
    "type": "SimpleReader",
    "splitRegex": "\r\n"
  },
  "stages": [
    { "type": "WhitespaceTokenizer" },
    { "type": "CharacterSplitter" },
    { "type": "com.accenture.saga.engine.stages.CaseAnalysisStage" },
    {
      "type": "DictionaryTagger",
      "dictionary": "resources-provider:dictionary",
      "required": ["TOKEN", "ALL_LOWER_CASE"]
    }
  ]
}
```
There are two sections to the pipeline configuration, matching the two top-level fields of the sample: "reader" and "stages".
Stage configurations are documented with ... each pipeline stage.
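As a rough illustration of how the configuration is laid out, the sketch below parses the sample with plain Python and lists the reader type and the stage types in order. It is not part of Saga: `summarize_pipeline` is a hypothetical helper, and the check for exactly the "reader" and "stages" fields is an assumption based on the sample rather than on Saga's own validation rules.

```python
import json

# Sample configuration copied from this page. The raw string keeps the
# JSON escape sequence "\r\n" intact so that json.loads can decode it.
SAMPLE_CONFIG = r"""
{
  "reader": {
    "type": "SimpleReader",
    "splitRegex": "\r\n"
  },
  "stages": [
    { "type": "WhitespaceTokenizer" },
    { "type": "CharacterSplitter" },
    { "type": "com.accenture.saga.engine.stages.CaseAnalysisStage" },
    {
      "type": "DictionaryTagger",
      "dictionary": "resources-provider:dictionary",
      "required": ["TOKEN", "ALL_LOWER_CASE"]
    }
  ]
}
"""


def summarize_pipeline(config_text):
    """Hypothetical helper: return the reader type and the ordered stage types."""
    config = json.loads(config_text)
    # Assumption: both top-level sections seen in the sample must be present.
    for section in ("reader", "stages"):
        if section not in config:
            raise ValueError(f"missing section: {section}")
    reader_type = config["reader"]["type"]
    stage_types = [stage["type"] for stage in config["stages"]]
    return reader_type, stage_types


reader_type, stage_types = summarize_pipeline(SAMPLE_CONFIG)
print("reader:", reader_type)
for position, stage_type in enumerate(stage_types, start=1):
    print(f"stage {position}: {stage_type}")
```

The stages are kept in a list because their order in the JSON array is the order in which they appear in the configuration. Every entry carries a "type" field, and any other fields (such as "dictionary" and "required" above) are specific to that stage.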
The "type" field specifies the Java class that implements the pipeline stage. This can be: