...

Flags are bits which can be turned on (i.e. 'set') for lexical items and vertexes. Flags are typically used for unambiguous, processing-related functions. Their function is often to control down-stream processing to make the pipelines more efficient.

By convention, once a flag is set it is never un-set. Nothing enforces this, though: the bits can be changed at any time, so the rule works on the honor system.

Flags typically identify obvious and unambiguous characteristics of the lexical item and/or vertex. Examples include lexical item type (TEXT_BLOCK, TOKEN, SEMANTIC_TAG), case (ALL_UPPER_CASE, TITLE_CASE, MIXED_CASE), vertex characters (WHITESPACE, PUNCTUATION), etc.
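The idea of flags as set-only bits can be illustrated with a minimal Python sketch. The flag names come from the list above; everything else (the ItemFlag enum, the LexicalItem class with its set_flag and has_flag methods, and the tokens_only helper) is hypothetical and is not the framework's actual API. For simplicity the sketch puts all flags in one enum, although some of them apply to vertexes rather than lexical items.

```python
from enum import Flag, auto


class ItemFlag(Flag):
    """Hypothetical flag set; each member occupies one bit."""
    TEXT_BLOCK = auto()
    TOKEN = auto()
    SEMANTIC_TAG = auto()
    ALL_UPPER_CASE = auto()
    TITLE_CASE = auto()
    MIXED_CASE = auto()
    WHITESPACE = auto()    # described above as a vertex flag
    PUNCTUATION = auto()   # described above as a vertex flag


class LexicalItem:
    """Hypothetical stand-in for a lexical item that carries flags."""

    def __init__(self, text):
        self.text = text
        self.flags = ItemFlag(0)  # no bits set yet

    def set_flag(self, flag):
        # Only ever turns bits on. There is deliberately no clear_flag
        # method, which mirrors the "never un-set" convention.
        self.flags |= flag

    def has_flag(self, flag):
        return (self.flags & flag) == flag


def tokens_only(items):
    """Down-stream filter: keep only the items flagged as TOKEN."""
    return [item for item in items if item.has_flag(ItemFlag.TOKEN)]


block = LexicalItem("The President spoke")
block.set_flag(ItemFlag.TEXT_BLOCK)

token = LexicalItem("President")
token.set_flag(ItemFlag.TOKEN | ItemFlag.TITLE_CASE)

kept = tokens_only([block, token])
```

A later stage that only cares about tokens can call tokens_only and skip everything else with a single bit test per item, which is the kind of down-stream saving that flags are meant to provide.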

Flags Only Describe the Lexical Item Itself

...

Note that you can traverse the component links from the derived item ("president") to the original item ("President") to determine whether a token was originally TITLE_CASE.
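A traversal of this kind might look like the following Python sketch. It assumes each item keeps a list of the items it was derived from; the LexicalItem class, its components attribute, and the derived_from_flagged function are hypothetical names chosen for illustration, and flags are modeled as plain strings to keep the example short.

```python
class LexicalItem:
    """Hypothetical lexical item with flags and component links."""

    def __init__(self, text, flags=(), components=()):
        self.text = text
        self.flags = set(flags)             # flags describe this item only
        self.components = list(components)  # items this one was derived from


def derived_from_flagged(item, flag):
    """Return True if any item reachable through component links has flag.

    The starting item itself is not checked, because the question is
    about where it came from, not about its own flags.
    """
    stack = list(item.components)
    seen = set()
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue  # guard against visiting a shared component twice
        seen.add(id(current))
        if flag in current.flags:
            return True
        stack.extend(current.components)
    return False


original = LexicalItem("President", flags={"TOKEN", "TITLE_CASE"})
derived = LexicalItem("president", flags={"TOKEN"}, components=[original])

was_title_case = derived_from_flagged(derived, "TITLE_CASE")
```

Here the derived item "president" does not carry TITLE_CASE itself, but following its component link to "President" shows that the original token did.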

Semantic Tags

Semantic tags identify interpretations of