...

Not shown in the above diagram are the confidence factors, which are attached to every interpretation.

Interpretation Graphs are made from Vertexes and Lexical Items

  • Lexical Items - Can be a text block, token, or semantic tag
    • Typically important carriers of semantic information
  • Vertexes - Are the junction points between interpretations
    • Typically the white-space or punctuation between lexical items

It is this "node and edge" structure which makes this an interpretation graph.
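As a rough illustration of this structure, the sketch below models vertexes as nodes and lexical items as edges between them. The class names, the `add_item` and `paths` helpers, the "New York" example, and the `{city}` tag are all hypothetical and chosen only for illustration; they are not the actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Vertex:
    """Junction point between interpretations (hypothetical class)."""
    text: str  # characters the vertex covers, e.g. " " or ", "
    outgoing: List["LexicalItem"] = field(default_factory=list)


@dataclass(eq=False)
class LexicalItem:
    """Edge that carries semantic information (hypothetical class)."""
    text: str
    kind: str  # "TEXT_BLOCK", "TOKEN", or "SEMANTIC_TAG"
    start: Vertex
    end: Vertex


def add_item(text: str, kind: str, start: Vertex, end: Vertex) -> LexicalItem:
    """Create a lexical item and attach it to its start vertex."""
    item = LexicalItem(text, kind, start, end)
    start.outgoing.append(item)
    return item


def paths(start: Vertex, end: Vertex) -> List[List[str]]:
    """List every interpretation (sequence of item texts) from start to end."""
    if start is end:
        return [[]]
    found = []
    for item in start.outgoing:
        for rest in paths(item.end, end):
            found.append([item.text] + rest)
    return found


# Two tokens and one alternative semantic tag spanning the same text.
v0, v1, v2 = Vertex(""), Vertex(" "), Vertex("")
add_item("New", "TOKEN", v0, v1)
add_item("York", "TOKEN", v1, v2)
add_item("{city}", "SEMANTIC_TAG", v0, v2)

interpretations = paths(v0, v2)
```

Here `interpretations` holds both the token-by-token reading and the single tagged reading, which is the sense in which the graph carries alternative interpretations side by side.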

Interpretation Graphs are "Add Only"

Information can only be added to an interpretation graph. It can never be removed or changed. By this we mean:

...

And so, we never actually remove any interpretations from the graph. Instead, all interpretations are kept at all times, and disambiguation is used to choose which interpretation is most likely to be correct for the application.
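A minimal sketch of the "add only" idea follows. It assumes that each interpretation carries a numeric confidence and that disambiguation simply picks the highest one; the real disambiguation logic may well be more involved. The class names, the `{city}` and `{person}` tags, and the confidence values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interpretation:
    """One reading of the text; frozen so it cannot be changed once added."""
    label: str
    confidence: float


class AddOnlyInterpretations:
    """Container that exposes add and read operations, but no removal."""

    def __init__(self) -> None:
        self._items: List[Interpretation] = []

    def add(self, label: str, confidence: float) -> Interpretation:
        interp = Interpretation(label, confidence)
        self._items.append(interp)
        return interp

    def all(self) -> Tuple[Interpretation, ...]:
        # Return a tuple so callers cannot mutate the stored list.
        return tuple(self._items)

    def best(self) -> Interpretation:
        # Assumed disambiguation rule: highest confidence wins.
        return max(self._items, key=lambda i: i.confidence)


readings = AddOnlyInterpretations()
readings.add("{city}", 0.8)
readings.add("{person}", 0.3)

chosen = readings.best()
kept = readings.all()
```

Choosing `{city}` does not discard `{person}`: both remain in `kept`, so a later stage with more context could still select the other reading.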

Everything is Saved

Along with the "add only" approach, we endeavor to save everything. For example:

  • Lexical items contain character buffers of the text for the item.
  • Vertexes contain character buffers of the characters which they cover (e.g. the spaces, punctuation, etc.).

Further, every vertex and lexical item identifies the start and end character position (from the original content stream) which it covers.
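One consequence of saving everything is that the original content can be rebuilt from the graph. The sketch below illustrates this under simplifying assumptions: alphanumeric runs become lexical items, everything else becomes a vertex, and end positions are exclusive. The `Span` class and `split_spans` function are hypothetical names, not part of the actual system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    kind: str   # "LEXICAL_ITEM" or "VERTEX"
    text: str   # character buffer for the characters covered
    start: int  # start position in the original content
    end: int    # end position (exclusive, by assumption)


def split_spans(content: str) -> List[Span]:
    """Split content into alternating lexical-item and vertex spans."""
    spans = []
    pos = 0
    while pos < len(content):
        is_word = content[pos].isalnum()
        end = pos
        # Extend the span while characters stay in the same class.
        while end < len(content) and content[end].isalnum() == is_word:
            end += 1
        kind = "LEXICAL_ITEM" if is_word else "VERTEX"
        spans.append(Span(kind, content[pos:end], pos, end))
        pos = end
    return spans


content = "Hello, world!"
spans = split_spans(content)

# Because nothing is thrown away, joining the buffers restores the input.
rebuilt = "".join(s.text for s in spans)
```

Each span's buffer also equals `content[span.start:span.end]`, so either the buffer or the positions can be used to recover the covered text.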

Flags

Flags are bits which can be turned on (i.e. 'set') for lexical items and vertexes.

Once they are set, they should never be un-set. Frankly, nothing prevents you from changing the bits at any time, so this is more of an honor system.

Flags typically identify obvious and unambiguous characteristics of the lexical item and/or vertex. Examples include lexical item type (TEXT_BLOCK, TOKEN, SEMANTIC_TAG), case (ALL_UPPER_CASE, TITLE_CASE, MIXED_CASE), and vertex characters (WHITESPACE, PUNCTUATION).

Flags are typically used to control down-stream processing to make the pipelines more efficient.
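The flag behavior described above could be sketched with Python's standard `enum.Flag`, as below. The flag names come from the examples in this section; the `ItemFlag` and `Flagged` classes and the `needs_case_normalization` check are hypothetical and only illustrate how a down-stream stage might use flags to skip items cheaply.

```python
from enum import Flag, auto


class ItemFlag(Flag):
    TEXT_BLOCK = auto()
    TOKEN = auto()
    SEMANTIC_TAG = auto()
    ALL_UPPER_CASE = auto()
    TITLE_CASE = auto()
    MIXED_CASE = auto()
    WHITESPACE = auto()
    PUNCTUATION = auto()


class Flagged:
    """Holder for flag bits that offers set and test, but no un-set method."""

    def __init__(self) -> None:
        self._flags = ItemFlag(0)

    def set_flag(self, flag: ItemFlag) -> None:
        self._flags |= flag

    def has_flag(self, flag: ItemFlag) -> bool:
        return (self._flags & flag) == flag


def needs_case_normalization(item: Flagged) -> bool:
    """Hypothetical stage filter: only title-case tokens need work."""
    return item.has_flag(ItemFlag.TOKEN) and item.has_flag(ItemFlag.TITLE_CASE)


token = Flagged()
token.set_flag(ItemFlag.TOKEN | ItemFlag.TITLE_CASE)

space = Flagged()
space.set_flag(ItemFlag.WHITESPACE)

to_process = [needs_case_normalization(x) for x in (token, space)]
```

Omitting an un-set method keeps the honor system visible in the interface, although the private attribute can of course still be changed directly.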

Flags Only Describe the Lexical Item Itself

It may seem obvious, but flags describe the Lexical Item itself, and do not describe any items from which it was derived.

...

Note that you can traverse the component links from the derived item ("president") to the original item ("President") to determine whether a token was originally TITLE_CASE.
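The traversal could look roughly like the following sketch. It assumes each item keeps a list of the items it was derived from; the `Item` class, its `components` attribute, and the `has_flag_in_lineage` helper are hypothetical names used only for illustration.

```python
from typing import Iterable


class Item:
    """Lexical item with its own flags and links to its source items."""

    def __init__(self, text: str, flags: Iterable[str] = (),
                 components: Iterable["Item"] = ()) -> None:
        self.text = text
        self.flags = frozenset(flags)        # describes this item only
        self.components = tuple(components)  # items this one was derived from


def has_flag_in_lineage(item: Item, flag: str) -> bool:
    """Return True if the item, or any item it derives from, has the flag."""
    stack = [item]
    while stack:
        current = stack.pop()
        if flag in current.flags:
            return True
        stack.extend(current.components)
    return False


original = Item("President", flags={"TOKEN", "TITLE_CASE"})
derived = Item("president", flags={"TOKEN"}, components=[original])

own_flag = "TITLE_CASE" in derived.flags
lineage_flag = has_flag_in_lineage(derived, "TITLE_CASE")
```

The derived item does not carry TITLE_CASE itself, so `own_flag` is false, while following the component link back to "President" makes `lineage_flag` true.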

Semantic Tags