The Markup Producer stage produces strings of TEXT_BLOCK or SENTENCE text with normalized tags, based on the graph path with the highest confidence.
Operates On: Every lexical item in the graph.
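This page does not spell out the exact markup format. As a rough sketch only, assuming that a tagged span is replaced by its tag in braces (as in the `{measurement}` vertex of the example graph further down) and that the pieces are joined with a separator, the idea could look like this. The function name `render_markup` and the `(text, tag)` tuple layout are hypothetical and not part of the stage's API.

```python
def render_markup(tokens, separator=" "):
    """Join tokens, replacing tagged ones with a {tag} placeholder.

    The {tag} format is an assumption made for illustration.
    """
    parts = []
    for text, tag in tokens:
        # Tagged spans contribute their normalized tag, plain tokens their text.
        parts.append("{%s}" % tag if tag else text)
    return separator.join(parts)


# Hypothetical highest-confidence path for "300 ml of water":
# one tagged span followed by two plain tokens.
path = [("300 ml", "measurement"), ("of", None), ("water", None)]
print(render_markup(path))
```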
Generic Configuration Parameters
- boundaryFlags (type=string array | optional) - List of vertex flags that indicate the beginning and end of a text block. Tokens to process must lie between two vertices marked with a flag from this list (e.g. ["TEXT_BLOCK_SPLIT"]).
- skipFlags (type=string array | optional) - Flags to be skipped by this stage. Tokens marked with a flag from this list are ignored by this stage, and no processing is performed on them.
- requiredFlags (type=string array | optional) - Lex item flags required for a token to be processed. Tokens need to have all of the specified flags in order to be processed.
- atLeastOneFlag (type=string array | optional) - Lex item flags needed for a token to be processed. Tokens need at least one of the flags specified in this array.
- confidenceAdjustment (type=double | default=1 | required) - Adjustment factor, from 0.0 to 2.0, to apply to the confidence value (applies to every pattern match).
    - 0.0 to < 1.0 decreases the confidence value
    - 1.0 leaves the confidence value unchanged
    - > 1.0 to 2.0 increases the confidence value
- debug (type=boolean | default=false | optional) - Enables all debug log functionality for the stage, if any.
- enable (type=boolean | default=true | optional) - Indicates whether the current stage should be considered by the Pipeline Manager.
    - Only applies to automatic pipeline building
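The flag and confidence parameters above can be sketched as follows. This is an illustration of the described rules, not the stage's actual implementation: the function names are hypothetical, the flag names `TOKEN`, `SKIP`, `A` and `B` are made up, and treating `confidenceAdjustment` as a plain multiplier is an assumption that is merely consistent with the three ranges listed.

```python
def is_eligible(token_flags, skip_flags=(), required_flags=(), at_least_one_flag=()):
    """Return True if a token carrying `token_flags` would be processed."""
    flags = set(token_flags)
    # skipFlags: a token carrying any of these is ignored.
    if flags & set(skip_flags):
        return False
    # requiredFlags: every listed flag must be present on the token.
    if not set(required_flags) <= flags:
        return False
    # atLeastOneFlag: if the list is non-empty, one match is enough.
    any_of = set(at_least_one_flag)
    if any_of and not (flags & any_of):
        return False
    return True


def adjust_confidence(confidence, factor=1.0):
    """Scale a confidence value by `factor` (assumed to be multiplicative)."""
    if not 0.0 <= factor <= 2.0:
        raise ValueError("confidenceAdjustment must be between 0.0 and 2.0")
    return confidence * factor
```

With these definitions, a token flagged `["TOKEN", "SKIP"]` is rejected when `skip_flags=["SKIP"]`, and a factor of 1.0 leaves the confidence unchanged.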
Configuration Parameters
- normalizeTags (String array, optional) - List
- replaceTags (String array, optional) - Defaults to empty. If non-empty, only entities whose names appear in this whitelist are added to the JSON output.
- replacePattern (String, optional) - Defaults to empty. If non-empty, every entity is added to the JSON output except those in the blacklist.
- separator (String, optional) -
- preferFlags (String array, optional) -
{
  "type": "JsonProducerStage",
  "name": "JsonProducer",
  "boundaryFlags": [
    "TEXT_BLOCK_SPLIT"
  ],
  "onlyEntities": true,
  "queueTimeout": 10,
  "queueRetries": 1
}
Example Output
If you have a text block like the following:
V----------[300 ml of water]----------V
^----------[300 ml of water]----------^
^-[300]-V---[ml]---V--[of]--V-[water]-^
^-[{#}]-^-[{unit}]-^-[have]-^
^-[{measurement}]--^
the stage will produce the following JSON (if onlyEntities = true):
{"entities":[{
"text":"300 ml",
"value":[
{
"value":"300",
"entity":"#"
},
{
"value":"milliliters",
"entity":"unit"
}
],
"entity":"measurement",
"startPos":0,
"endPos":6
}]}
or the following (if onlyEntities = false):
{"tokens":[
{
"text":"300 ml",
"value":[
{
"value":"300",
"entity":"#"
},
{
"value":"milliliters",
"entity":"unit"
}
],
"entity":"measurement",
"startPos":0,
"endPos":6
},
{
"text":"of",
"startPos":7,
"endPos":9
},
{
"text":"water",
"startPos":10,
"endPos":15
}
]}
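To show how such output might be consumed, here is a minimal sketch that loads the `onlyEntities = false` result and checks the offsets against the source text. The sample is reduced to the fields used here, and the helper names `check_offsets` and `entity_spans` are hypothetical. Reading `startPos`/`endPos` as a half-open character range is an observation from this one example, not a documented guarantee.

```python
import json

SOURCE_TEXT = "300 ml of water"

# The tokens from the example output, keeping only the fields used below.
OUTPUT = json.loads("""
{"tokens":[
  {"text":"300 ml","entity":"measurement","startPos":0,"endPos":6},
  {"text":"of","startPos":7,"endPos":9},
  {"text":"water","startPos":10,"endPos":15}
]}
""")


def check_offsets(source, tokens):
    """Return True if every token's text equals source[startPos:endPos]."""
    return all(source[t["startPos"]:t["endPos"]] == t["text"] for t in tokens)


def entity_spans(tokens):
    """Return (entity, text) pairs for the tokens that carry an entity tag."""
    return [(t["entity"], t["text"]) for t in tokens if "entity" in t]


print(check_offsets(SOURCE_TEXT, OUTPUT["tokens"]))
print(entity_spans(OUTPUT["tokens"]))
```

Tokens without an `entity` key, such as "of" and "water", are plain text; filtering on that key gives roughly the same selection that `onlyEntities = true` produces.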