This stage flags vertices with SKIP_SENTENCE. The flagged vertex marks the start of a sentence, so a later stage can use the flag to ignore the complete sentence.

The conditions evaluated by the processor are:

  • Sentence length, measured as the number of tokens (not vertices).
  • A list of tags that work as an exception to the count: if one of these tags is found within the sentence, the count is irrelevant and the sentence is not flagged (allow listing).
  • A list of tags that, if found in the sentence, cause it to be flagged (deny listing).

Deny listing a tag always takes precedence over the other conditions, so any sentence containing a deny-listed tag is always flagged as SKIP_SENTENCE. Allow-listed tags always take precedence over the token limit. The token limit applies only when neither list matches.
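
The precedence described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on this page, not the stage's actual implementation: the `SkipSentenceConfig` class and the `should_skip` function are hypothetical names, and the fields simply mirror the configuration parameters documented below.

```python
from dataclasses import dataclass, field


@dataclass
class SkipSentenceConfig:
    # Hypothetical container; field names mirror the documented parameters.
    removeSimpleSentence: bool = True
    minTokensOnSentence: int = 3
    keepSemanticTags: bool = False
    tagsList: list = field(default_factory=list)      # allow list
    markTagsList: list = field(default_factory=list)  # deny list


def should_skip(token_count, tags, cfg):
    """Return True if the sentence start vertex would be flagged SKIP_SENTENCE."""
    tags = set(tags)
    # 1. A deny-listed tag wins over everything else.
    if tags & set(cfg.markTagsList):
        return True
    # 2. An allow-listed tag overrides the token limit (only when enabled).
    if cfg.keepSemanticTags and tags & set(cfg.tagsList):
        return False
    # 3. Otherwise the inclusive token limit decides (only when enabled).
    return cfg.removeSimpleSentence and token_count <= cfg.minTokensOnSentence
```

For instance, with `keepSemanticTags` enabled, `tagsList=["works"]` and `markTagsList=["filtered"]`, a two-token sentence containing both tags would still be flagged, because the deny list is checked first.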


Operates On:  Lexical Items with VERTEX and possibly other flags as specified below.

Recognizer: false

Info

Currently, only the Python Model Recognizer Stage can use this flag.

Generic Configuration Parameters

Configuration Parameters

  • removeSimpleSentence (type=boolean | default=true) - Enables marking of the sentence by length limit.
    • When this parameter is enabled, the minTokensOnSentence parameter is taken into account.
  • minTokensOnSentence (type=integer | default=3) - Sentences with this number of tokens or fewer are flagged.
    • This parameter is inclusive, meaning that sentences up to 3 tokens long (by default) are flagged.
  • keepSemanticTags (type=boolean | default=false) - Enables the list of tag exceptions for the length limit.
    • If the sentence length is within the minTokensOnSentence value but the sentence contains a tag (flagged as SEMANTIC_TAG) that is in the list of "keep" tags, the sentence vertex is not flagged.
  • tagsList - List of tags (comma separated) used as exceptions to the flagging of the vertex.
    • At least one of the tags must be present in the sentence for it not to be flagged.
  • markTagsList - List of tags used to mark the sentence (flag the vertex).
    • At least one of the tags must be present in the sentence for it to be flagged.
boundaryFlags: text block split
"removeSimpleSentence": true,
"minTokensOnSentence": 3,
"keepSemanticTags": true,
"tagsList": ["works"],
"markTagsList": ["filtered"]

Example Output

V----------------------[This is short.  This is a longer sentence.  This {works}. This is a {filtered}]-----------------------V
^-[This]-V-[is]-V-[short]-V-[This]-V-[is]-V-[a]-V-[longer]-V-[sentence]-V-[This]-V-{works}-V-[This]-V-[is]-V-[a]-V-{filtered}-^
1                         2                                             3                  4

Vertex 1: SKIP_SENTENCE (3 or fewer tokens)
Vertex 2: not flagged (more than 3 tokens)
Vertex 3: not flagged (tag {works} found)
Vertex 4: SKIP_SENTENCE (tag {filtered} found, flagged)
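
As a rough cross-check of this example, the sketch below applies the sample configuration to the sample text. It is a simplified stand-in rather than Saga code: `flag_sentences` is a hypothetical helper, sentences are split naively on periods, tokens on whitespace, and a `{tag}` is assumed to count as one token.

```python
import re

# Values copied from the example configuration on this page.
CONFIG = {
    "removeSimpleSentence": True,
    "minTokensOnSentence": 3,
    "keepSemanticTags": True,
    "tagsList": ["works"],
    "markTagsList": ["filtered"],
}

TEXT = "This is short.  This is a longer sentence.  This {works}. This is a {filtered}"


def flag_sentences(text, cfg):
    """Return (sentence, token_count, flagged) for each sentence in text."""
    results = []
    # Naive segmentation: split on periods and drop empty pieces.
    for sentence in filter(None, (s.strip() for s in text.split("."))):
        tokens = sentence.split()
        tags = set(re.findall(r"\{(\w+)\}", sentence))
        if tags & set(cfg["markTagsList"]):
            flagged = True   # deny list always wins
        elif cfg["keepSemanticTags"] and tags & set(cfg["tagsList"]):
            flagged = False  # allow list overrides the token limit
        else:
            flagged = (cfg["removeSimpleSentence"]
                       and len(tokens) <= cfg["minTokensOnSentence"])
        results.append((sentence, len(tokens), flagged))
    return results


for number, (sentence, count, flagged) in enumerate(flag_sentences(TEXT, CONFIG), start=1):
    status = "SKIP_SENTENCE" if flagged else "not flagged"
    print(f"Vertex {number}: {status} ({count} tokens) {sentence!r}")
```

Under these assumptions, only the first and fourth sentence start vertices come out flagged.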

Output Flags

Vertex Flags:

Info

No vertices are created in this stage

  • SKIP_SENTENCE - Identifies the vertex as the start of a sentence that should be skipped.