The Markup Producer Stage produces one string per TEXT_BLOCK, with tags normalized, following the highest-confidence path through the graph.


Operates On:  Every lexical item in the graph.

Generic Configuration Parameters

  • boundaryFlags ( type=string array | optional ) - List of vertex flags that indicate the beginning and end of a text block.
    Tokens to process must lie between two vertices marked with this flag (e.g., ["TEXT_BLOCK_SPLIT"]).
  • skipFlags ( type=string array | optional ) - Flags to be skipped by this stage.
    Tokens marked with this flag will be ignored by this stage, and no processing will be performed.
  • requiredFlags ( type=string array | optional ) - Lex item flags required on every token to be processed.
    Tokens need to have all of the specified flags in order to be processed.
  • atLeastOneFlag ( type=string array | optional ) - Lex item flags needed by every token to be processed.
    Tokens will need at least one of the flags specified in this array.
  • confidenceAdjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, to apply to the confidence value (applies to every pattern match).
    • 0.0 to < 1.0  decreases confidence value
    • 1.0 confidence value remains the same
    • > 1.0 to  2.0 increases confidence value
  • debug ( type=boolean | default=false | optional ) - Enable all debug log functionality for the stage, if any.
  • enable ( type=boolean | default=true | optional ) - Indicates whether the current stage should be considered by the Pipeline Manager.
    • Only applies to automatic pipeline building
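
The flag parameters above combine into a single per-token decision. The following is a minimal sketch of one plausible reading, not the stage's actual implementation: it assumes that an unset option is an empty list, that skipFlags is checked first, and that confidenceAdjustment is applied by multiplication. The class FlagFilterSketch, its methods, and the flag names TOKEN and STEMMED are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating the generic parameters; not part of the stage's API. */
final class FlagFilterSketch {

    /** Returns true when a token carrying tokenFlags would be processed (assumed semantics). */
    static boolean shouldProcess(Set<String> tokenFlags,
                                 List<String> skipFlags,
                                 List<String> requiredFlags,
                                 List<String> atLeastOneFlag) {
        // skipFlags: a single matching flag is enough to ignore the token.
        for (String flag : skipFlags) {
            if (tokenFlags.contains(flag)) {
                return false;
            }
        }
        // requiredFlags: every listed flag must be present on the token.
        if (!tokenFlags.containsAll(requiredFlags)) {
            return false;
        }
        // atLeastOneFlag: when the list is non-empty, one match is needed.
        if (atLeastOneFlag.isEmpty()) {
            return true;
        }
        for (String flag : atLeastOneFlag) {
            if (tokenFlags.contains(flag)) {
                return true;
            }
        }
        return false;
    }

    /** Assumption: the adjustment factor multiplies the confidence value. */
    static double adjustConfidence(double confidence, double factor) {
        if (factor < 0.0 || factor > 2.0) {
            throw new IllegalArgumentException("confidenceAdjustment must be in [0.0, 2.0]");
        }
        return confidence * factor;
    }

    public static void main(String[] args) {
        Set<String> token = Set.of("TOKEN", "LOWERED");
        System.out.println(shouldProcess(token, List.of(),
                List.of("TOKEN"), List.of("LOWERED", "STEMMED")));
        System.out.println(adjustConfidence(0.8, 1.0));
    }
}
```

Under these assumptions a factor of 1.0 leaves the confidence unchanged, a factor below 1.0 lowers it, and a factor above 1.0 raises it, which matches the three ranges listed for confidenceAdjustment.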

Generic Producer Configuration Parameters

  • name ( type=string | required ) - Unique name to identify the stage in the pipeline
    •  It is used programmatically to retrieve the stage and consume the produced output.
  • queueSize ( type=integer | default=10000 | optional ) - Max number of produced items to keep in memory at a time.
  • queueTimeout ( type=integer | default=1000 | optional ) - Blocking queue timeout to wait for new items, in milliseconds
  • queueRetries ( type=integer | default=3 | optional ) - Number of retries before unblocking the queue
  • singleSubscriber ( type=boolean | default=false | optional )
    • If true, a single queue is created and a thread can consume the items being produced by multiple reset/advance calls to the engine. 
    • If false, each time an engine reset is issued, the queue is cleared (consume the queue before reset). 
    • Single subscriber works well with asynchronous subscription to the queue.
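
To make the interaction of queueSize, queueTimeout and queueRetries concrete, here is a minimal sketch built on the standard java.util.concurrent classes. It does not use the engine's own queue, whose internals are not described here. The class QueueConsumerSketch and the rule "stop after queueRetries consecutive timed-out polls" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical consumer illustrating the queue parameters; not the engine's implementation. */
final class QueueConsumerSketch {

    /** Drains the queue, giving up after `retries` consecutive polls that time out. */
    static List<String> drain(BlockingQueue<String> queue, long timeoutMillis, int retries) {
        List<String> items = new ArrayList<>();
        int emptyPolls = 0;
        while (emptyPolls < retries) {
            String item;
            try {
                // Waits up to timeoutMillis for an item; returns null on timeout.
                item = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop consuming.
                Thread.currentThread().interrupt();
                break;
            }
            if (item == null) {
                emptyPolls++;
            } else {
                emptyPolls = 0; // A produced item resets the retry counter.
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // The capacity plays the role of queueSize.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10000);
        queue.add("first item");
        queue.add("second item");
        System.out.println(drain(queue, 50, 3));
    }
}
```

With the documented defaults (queueTimeout=1000, queueRetries=3), a consumer written this way would wait roughly three seconds on an empty queue before unblocking.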

Get a reference to the producer stage and consume the queue

Producer Stage
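import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
// Look up the producer by the name set in its "name" configuration parameter
ResultsProducerStage producer = engine.getProducer("ProducerName");
String inputText = "This is a test entry";
Reader in = new StringReader(inputText);
engine.reset(in);
// Advance through the whole input so the stage can queue its output
while (engine.advance() != null);
List<String> output = (List<String>) producer.stream().collect(Collectors.toList());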

Configuration Parameters

  • normalizeTags ( type=string array | required ) - List of tag names to normalize on the output
  • replaceTags ( type=string array | optional ) - Defaults to empty. If non empty, list of tag names to apply the replace pattern to
  • replacePattern ( type=string array | required ) - Required when replaceTags is set. The pattern expects a %tag and/or %value placeholder. Each occurrence of a tag from replaceTags is replaced with this pattern.
  • separator ( type=string array | default=" " | optional ) - Used to separate tokens
  • preferFlags ( type=string array | optional ) - If non-empty, when a token has multiple flags and one of them is in preferFlags, that flag takes precedence over the others, and the token value from that variation is used. For example, the LOWERED version can be preferred over the original text version.
  • ignoreTags ( type=string array | optional ) - Ignore matches with tags specified in the ignoreTags list
  • anyWithTags ( type=string array | optional ) - Include matches with tags specified in the anyWithTags list
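
The relationship between normalizeTags, replaceTags, replacePattern and separator can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the stage's code: the pattern "<START:%tag> %value <END>" is inferred from the example output on this page, the pattern is treated as a single string, normalized tags are assumed to render as {tag}, and replaceTags is assumed to take precedence over normalizeTags. The class MarkupRenderSketch and its methods are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical renderer illustrating the tag parameters; not the stage's implementation. */
final class MarkupRenderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("%tag|%value");

    /** Substitutes %tag and %value in a single pass so inserted text is never re-scanned. */
    static String applyPattern(String replacePattern, String tag, String value) {
        Matcher m = PLACEHOLDER.matcher(replacePattern);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group().equals("%tag") ? tag : value;
            // quoteReplacement keeps '$' and '\' in the inserted text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Renders one token; tag is null for tokens that matched no tag. */
    static String renderToken(String tag, String value, List<String> normalizeTags,
                              List<String> replaceTags, String replacePattern) {
        if (tag != null && replaceTags.contains(tag)) {
            return applyPattern(replacePattern, tag, value);
        }
        if (tag != null && normalizeTags.contains(tag)) {
            return "{" + tag + "}"; // Assumed normalized form.
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> normalize = List.of("#", "measurement");
        List<String> replace = List.of("ingredient");
        String pattern = "<START:%tag> %value <END>";
        // Tokens are joined with the separator (default " ").
        System.out.println(String.join(" ",
                renderToken("#", "300", normalize, replace, pattern),
                renderToken("measurement", "ml", normalize, replace, pattern),
                renderToken(null, "of", normalize, replace, pattern),
                renderToken("ingredient", "Water", normalize, replace, pattern)));
    }
}
```

The single-pass substitution is a deliberate choice: chaining two plain replacements would also rewrite a %value that happened to appear inside an inserted tag name.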


Example Output

If you have a text block like the following:

V-----------------------------------[300 ml of Water. Use XX g of FLOUR]------------------------------------V 
^----------------[300 ml of Water]----------------V-------------------[Use XX g of FLOUR]-------------------^ 
^-[300]-V------[ml]-------V-[of]-V----[Water]-----^-[Use]-V-[XX]--V-------[g]-------V-[of]-V----[FLOUR]-----^ 
^-[{#}]-^-[{measurement}]-^      ^----[water]-----^-[use]-^-[xx]--^-[{measurement}]-^      ^----[flour]-----^ 
                                                          ^-[{#}]-^                        ^-[{ingredient}]-^ 
                                                                                           ^-[{ingredient}]-^ 
                                 ^-[{ingredient}]-^ 
                                 ^-[{ingredient}]-^ 

the stage will produce the following output:

{#} {measurement} of <START:ingredient> Water <END>
use {#} {measurement} of <START:ingredient> FLOUR <END>