Best Bets Stage

This stage maintains a list of tokens used to identify possible subjects of interest and suggest a URL reference. The Best Bets recog

This stage is based on the Dictionary Tagger Stage.

Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

This stage is a Recognizer for Saga Solution, and can also be used as part of a manual pipeline or a base pipeline.

Generic Configuration Parameters

boundaryFlags ( type=string array | optional ) - List of vertex flags that indicate the beginning and end of a text block.
Tokens to process must be between two vertices marked with this flag (e.g., ["TEXT_BLOCK_SPLIT"]).
skipFlags ( type=string array | optional ) - Flags to be skipped by this stage.
Tokens marked with this flag will be ignored by this stage, and no processing will be performed.
requiredFlags ( type=string array | optional ) - Lex item flags that every token must have in order to be processed.
Tokens need to have all of the specified flags in order to be processed.
atLeastOneFlag ( type=string array | optional ) - Lex item flags of which every token must have at least one in order to be processed.
Tokens need to have at least one of the flags specified in this array.
confidenceAdjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, to apply to the confidence value (applies to every pattern match).
- 0.0 to < 1.0 decreases confidence value
- 1.0 confidence value remains the same
- > 1.0 to 2.0 increases confidence value
debug ( type=boolean | default=false | optional ) - Enable all debug log functionality for the stage, if any.
enable ( type=boolean | default=true | optional ) - Indicates whether the current stage should be considered by the Pipeline Manager.
- Only applies for automatic pipeline building
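
The flag and confidence parameters above can be illustrated with a minimal Python sketch. It is not the stage's actual implementation: the function names (is_eligible, adjust_confidence) are hypothetical, and applying confidenceAdjustment as a simple multiplication is an assumption, since the parameter description only calls it an adjustment factor.

```python
def is_eligible(token_flags, skip_flags=(), required_flags=(), at_least_one_flag=()):
    """Return True if a token with the given flags would be processed.

    Hypothetical helper mirroring skipFlags, requiredFlags and atLeastOneFlag.
    """
    flags = set(token_flags)
    # skipFlags: any match means the token is ignored.
    if flags & set(skip_flags):
        return False
    # requiredFlags: the token must carry all of them.
    if not set(required_flags) <= flags:
        return False
    # atLeastOneFlag: if configured, the token must carry at least one.
    if at_least_one_flag and not (flags & set(at_least_one_flag)):
        return False
    return True


def adjust_confidence(confidence, confidence_adjustment=1.0):
    """Scale a match confidence by the adjustment factor.

    Assumption: the factor is applied by multiplication.
    """
    if not 0.0 <= confidence_adjustment <= 2.0:
        raise ValueError("confidenceAdjustment must be between 0.0 and 2.0")
    return confidence * confidence_adjustment
```

Under these assumptions, a factor of 0.5 halves the confidence of every match, 1.0 leaves it unchanged, and 2.0 doubles it.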

Use the same configuration template as in the Dictionary Tagger Stage.


SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
PROCESSED - Placed on all the tokens that compose the semantic tag.
BESTBET - Identifies the token as a possible reference to a subject for which Saga has a link.

This stage is an extension of a Dictionary Tagger Stage and so the resource structure behaves in the same way.

id(required, string) - Identifies the entity by unique ID. This identifier must be unique across all entities (across all dictionaries).
Typically, this is an identifier with meaning to the larger application which is using the Language Processing Toolkit.
tags(required, array of string) - The list of semantic tags that will be added to the interpretation graph whenever any of the patterns are matched.
- These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
- Typically, multiple tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}
patterns(required, array of string) - A list of patterns to match in the content.

- Patterns will be tokenized and there may be multiple variations which can match.

confidence(optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
- This is the confidence of the entity, in comparison to all of the other entities. Essentially, the likelihood that this entity will be randomly encountered.
Title: this text will appear as the hit title in the search results page in ESUI.
Description: this text will appear as the hit description in the search results page in ESUI.
URL: the URL that ESUI will navigate to when the user clicks the title in the search results page.
Use partial matching: If true, when a pattern is composed of several words, the matching will only require a percentage of the words present in the pattern. This percentage can be configured in the recognizer settings; by default it is set to 50%.
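
As an illustration of the "Use partial matching" option, the following Python sketch treats a multi-word pattern as matched when at least the configured fraction of its words appears among the text tokens. The function name partial_match, the case-insensitive comparison, and rounding the required word count up are all assumptions made for this example; the actual matching logic of the stage may differ.

```python
import math


def partial_match(pattern, text_tokens, min_fraction=0.5):
    """Return True if enough words of the pattern occur in text_tokens.

    min_fraction corresponds to the percentage configured in the
    recognizer settings (0.5 is the documented default of 50%).
    """
    words = pattern.lower().split()
    if not words:
        return False
    tokens = {token.lower() for token in text_tokens}
    # Single-word patterns must match exactly; partial matching
    # only applies to patterns composed of several words.
    if len(words) == 1:
        return words[0] in tokens
    # Assumption: round up, and always require at least one word.
    needed = max(1, math.ceil(len(words) * min_fraction))
    found = sum(1 for word in words if word in tokens)
    return found >= needed
```

With the default of 50%, a four-word pattern such as "annual leave policy form" would match text containing any two of its words under this sketch.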

Other, Optional Fields

display(optional, string) - What to show the user when browsing this entity.
context(optional, object) - A context vector that can help disambiguate this entity from others with the same pattern.
Format TBD, but probably a list of weighted words, phrases and tags.
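
To make the resource structure concrete, here is a hedged Python sketch of a single Best Bets entry and a check for the required fields. The key names id, tags, patterns, confidence and display come from the field list above; the keys title, description, url and usePartialMatching are assumed spellings for the fields listed by label, and every value is invented for illustration. The context field is omitted because its format is still TBD.

```python
# Fields documented as required for every entry.
REQUIRED_FIELDS = ("id", "tags", "patterns")

# Hypothetical entry; all values are placeholders.
entry = {
    "id": "bestbet-001",
    "tags": ["best-bet"],
    "patterns": ["vacation policy", "time off"],
    "confidence": 0.9,
    "title": "Vacation Policy",                     # assumed key name
    "description": "How to request time off.",      # assumed key name
    "url": "https://intranet.example.com/hr/vacation-policy",  # assumed key name
    "usePartialMatching": True,                     # assumed key name
    "display": "Vacation Policy",
}


def missing_required(candidate):
    """Return the required fields that are absent or empty in an entry."""
    return [field for field in REQUIRED_FIELDS if not candidate.get(field)]
```

Calling missing_required(entry) on the sample above returns an empty list, while an entry containing only an id would report tags and patterns as missing.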