The Saga Library processes text through a pipeline of processing stages.
...
A typical pipeline consists of the following sets of stages:
...
...
...
...
...
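The Java sketch below shows the pipeline idea in its simplest form: each stage takes the output of the previous stage. The `Stage` interface and the `WhitespaceSplit` and `LowerCase` classes are hypothetical stand-ins, not Saga classes. Tokens are modeled as plain strings for brevity, because this section does not describe the library's actual stage API or data model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: Stage and the two stages below are illustrative, not Saga's API.
public class PipelineSketch {

    // A stage receives the current list of tokens and returns a new list.
    interface Stage {
        List<String> process(List<String> tokens);
    }

    // Splits each incoming token on runs of whitespace, dropping empty pieces.
    static final class WhitespaceSplit implements Stage {
        public List<String> process(List<String> tokens) {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                for (String part : t.trim().split("\\s+")) {
                    if (!part.isEmpty()) out.add(part);
                }
            }
            return out;
        }
    }

    // Lower-cases every token.
    static final class LowerCase implements Stage {
        public List<String> process(List<String> tokens) {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        }
    }

    // Runs the stages in order, feeding each stage's output into the next one.
    static List<String> run(List<Stage> stages, String text) {
        List<String> tokens = List.of(text);
        for (Stage s : stages) tokens = s.process(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        List<Stage> pipeline = List.of(new WhitespaceSplit(), new LowerCase());
        System.out.println(run(pipeline, "  Saga Processes   TEXT "));
    }
}
```

Running `main` prints `[saga, processes, text]`. Because stage order is just list order, reordering or adding stages requires no change to `run`.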
The pipeline can be specified in JSON, which can be stored in a resource (see Resources). A sample is shown below:
```json
...
...
  "reader": {
    "type": "SimpleReader",
    "splitRegex": "\r\n"
  },
  "stages": [
    { "type": "WhitespaceTokenizer" },
    { "type": "CharacterSplitter" },
    { "type": "com.accenture.saga.engine.stages.CaseAnalysisStage" },
    {
      "type": "DictionaryTagger",
      "dictionary": "resources-provider:dictionary",
      "required": ["TOKEN", "ALL_LOWER_CASE"]
    }
  ]
```
...
There are two sections to the pipeline configuration:
...
...
...
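The sketch below mirrors the two sections of the sample, `"reader"` and `"stages"`, as Java records (Java 16 or later). It is a rough illustration only. The record names are hypothetical, and stage-specific fields such as `"dictionary"` are omitted. The `read` method assumes that `SimpleReader` splits the raw input wherever `splitRegex` matches. That interpretation is inferred from the field name, not from documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: these records mirror the JSON sample; they are not Saga classes.
public class PipelineConfigSketch {

    // "reader" section: which reader to use and the regex it splits input on.
    record ReaderConfig(String type, String splitRegex) {}

    // One entry of the "stages" array; other stage-specific fields are omitted.
    record StageConfig(String type, List<String> required) {}

    // The whole configuration: one reader plus an ordered list of stages.
    record PipelineConfig(ReaderConfig reader, List<StageConfig> stages) {}

    // Assumed reader behavior: split the raw input wherever splitRegex matches.
    static List<String> read(ReaderConfig reader, String input) {
        return Arrays.asList(input.split(reader.splitRegex()));
    }

    public static void main(String[] args) {
        PipelineConfig config = new PipelineConfig(
            new ReaderConfig("SimpleReader", "\r\n"),
            List.of(
                new StageConfig("WhitespaceTokenizer", List.of()),
                new StageConfig("CharacterSplitter", List.of()),
                new StageConfig("com.accenture.saga.engine.stages.CaseAnalysisStage", List.of()),
                new StageConfig("DictionaryTagger", List.of("TOKEN", "ALL_LOWER_CASE"))));

        // Each CR LF-separated line becomes one piece of input for the stages.
        for (String line : read(config.reader(), "first line\r\nsecond line")) {
            System.out.println(line);
        }
        System.out.println(config.stages().size() + " stages");
    }
}
```

The `"\r\n"` in the JSON sample and the `"\r\n"` Java literal both denote the carriage-return and line-feed characters, which match themselves as a regular expression.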
Stage configurations are documented with
...
each pipeline stage.
The "type" field specifies the Java class that implements the pipeline stage. This can be:
...
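The sample uses both short names (`WhitespaceTokenizer`) and a fully qualified class name (`com.accenture.saga.engine.stages.CaseAnalysisStage`) for `"type"`. The sketch below shows one plausible way to resolve such a value to a class. It hypothetically assumes that short names are registered aliases and that any value containing a dot is a fully qualified name, which is loaded with `Class.forName`. The alias table and the `com.example.stages` package are invented for illustration, and the library's actual resolution rules may differ.

```java
import java.util.Map;

// Hypothetical sketch: the alias table and its target class names are invented.
public class StageTypeSketch {

    // Assumed: short stage names map to fully qualified class names.
    static final Map<String, String> ALIASES = Map.of(
        "WhitespaceTokenizer", "com.example.stages.WhitespaceTokenizer");

    // A value containing a dot is treated as a fully qualified class name;
    // anything else must be a known alias.
    static String resolveClassName(String type) {
        if (type.contains(".")) return type;
        String fqcn = ALIASES.get(type);
        if (fqcn == null) throw new IllegalArgumentException("Unknown stage type: " + type);
        return fqcn;
    }

    // Loads the resolved class by name via reflection.
    static Class<?> loadStageClass(String type) throws ClassNotFoundException {
        return Class.forName(resolveClassName(type));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolveClassName("WhitespaceTokenizer"));
        System.out.println(resolveClassName("com.accenture.saga.engine.stages.CaseAnalysisStage"));
        // A JDK class stands in for a stage class so the demo runs without Saga.
        System.out.println(loadStageClass("java.util.ArrayList").getSimpleName());
    }
}
```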