Semantic Tags

Semantic tags identify interpretations (typically semantic ones) of sections of the content. This can include anything from entities (like {person}, {place}, etc.) to full sentence interpretation (as in {person-fact-request}, {restrictive-covenant-term}, {language-fluency-statement}, etc.) or possibly more.

Unlike flags (see above), the Language Processing Toolkit does not pre-define any semantic tags. Instead, semantic tags are determined based on the requirements of the text to be processed.

Specifically:

  • Taggers will add semantic tags for entities
    • For example, a tagger can look up names in a dictionary and tag those names where they occur in the document
  • Advanced pattern recognizers will identify combinations of tags and literal text and create new tags
    • They are called "advanced" because they allow for patterns which have nested and recursive tagging
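The two roles above can be sketched in a few lines of Python. This is only an illustration, not the toolkit's API: the Tag class, the DICTIONARY contents, the function names, and the {person-residence-fact} tag are all hypothetical. A dictionary tagger marks entities, and a pattern recognizer then matches a mix of tags and literal text to create a new tag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    name: str   # semantic tag name, e.g. "person"
    start: int  # index of the first token covered
    end: int    # index one past the last token covered

# Hypothetical dictionary mapping lowercased surface forms to tags.
DICTIONARY = {"alice": "person", "paris": "place"}

def dictionary_tagger(tokens):
    """Tag every token that appears in the dictionary."""
    return [Tag(DICTIONARY[t.lower()], i, i + 1)
            for i, t in enumerate(tokens) if t.lower() in DICTIONARY]

def pattern_recognizer(tokens, tags, pattern, new_tag):
    """Emit `new_tag` over every span that matches `pattern`.

    Pattern elements are literal tokens ("in") or tag references ("{person}").
    """
    by_start = {}
    for tag in tags:
        by_start.setdefault(tag.start, []).append(tag)

    def match(pos, idx):
        # Return every end position reachable by matching pattern[idx:] at pos.
        if idx == len(pattern):
            return [pos]
        elem = pattern[idx]
        ends = []
        if elem.startswith("{") and elem.endswith("}"):
            for tag in by_start.get(pos, []):
                if tag.name == elem[1:-1]:
                    ends.extend(match(tag.end, idx + 1))
        elif pos < len(tokens) and tokens[pos].lower() == elem:
            ends.extend(match(pos + 1, idx + 1))
        return ends

    return [Tag(new_tag, s, e) for s in range(len(tokens)) for e in match(s, 0)]

tokens = "Alice lived in Paris".split()
entity_tags = dictionary_tagger(tokens)
fact_tags = pattern_recognizer(
    tokens, entity_tags,
    ["{person}", "lived", "in", "{place}"], "person-residence-fact")
```

Because the recognizer consumes and produces the same Tag type, its output can be fed back in as input to a later pattern, which is one simple way to get the nested tagging described above.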

Semantic Tags will be Ambiguous

A key philosophy of this toolkit is that ambiguity is embraced rather than dreaded. To this end, the system will generate all possible semantic tags, including many and various ambiguous alternatives.
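As a minimal sketch of what keeping every alternative could look like, the hypothetical tagger below returns all candidate tags for each token instead of choosing one. The dictionary contents and the function name are assumptions made for illustration only.

```python
# Hypothetical dictionary: one surface form may map to several tags.
AMBIGUOUS_DICTIONARY = {
    "washington": ["person", "place"],
    "jordan": ["person", "place"],
}

def all_interpretations(tokens):
    """Return every (token_index, tag) pair; no alternative is discarded."""
    return [(i, tag)
            for i, token in enumerate(tokens)
            for tag in AMBIGUOUS_DICTIONARY.get(token.lower(), [])]

interpretations = all_interpretations(["Washington", "visited", "Jordan"])
```

Here both "Washington" and "Jordan" keep their {person} and {place} readings, leaving the choice between them to later stages.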

Confidence Values

All lexical items will have a confidence value, which describes the confidence of the interpretation. This is key for semantic tags, where the confidence value can initially come from external sources (e.g. the likelihood of an entity occurring randomly) and then build up based on context and how the entity participates in larger patterns.

In addition, patterns can be generated by statistical techniques and then entered into the system. Systems which generate patterns in this way are encouraged to include a confidence value, which is then combined with the confidence of the supporting parts to generate a confidence value for every interpretation.
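The text does not say how these values are combined. One plausible rule, shown here purely as an assumption, is to treat the confidences as independent and multiply the pattern's confidence by the confidence of each supporting part. The function name and the numbers are hypothetical.

```python
import math

def combine_confidence(pattern_confidence, part_confidences):
    """Assumed rule: pattern confidence times the product of the parts' confidences."""
    return pattern_confidence * math.prod(part_confidences)

# A pattern with confidence 0.9 built from parts with confidence 0.8 and 0.5.
combined = combine_confidence(0.9, [0.8, 0.5])
```

Under this rule the combined value is roughly 0.36, so an interpretation is never more confident than its weakest part. Other rules (for example a minimum or a weighted average) would fit the description equally well.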

Confidence can be Strengthened with Context


Using the Output

The output of the processing engine will be an interpretation graph with confidence values.
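The structure of the interpretation graph is not specified here. As a sketch under stated assumptions, suppose each edge covers a span of token positions and carries a tag and a confidence, and that a consumer wants the single most confident reading of the whole text. The edge layout, the example values, and the best_path function are all hypothetical.

```python
# Hypothetical graph: each edge is (start, end, tag, confidence) over token positions.
EDGES = [
    (0, 1, "person", 0.6),
    (0, 1, "place", 0.4),
    (1, 2, "visited", 1.0),
    (2, 3, "place", 0.7),
    (2, 3, "person", 0.3),
    (0, 3, "person-visit-fact", 0.5),
]

def best_path(edges, n_positions):
    """Return (confidence, tags) of the highest-confidence path from 0 to n_positions.

    Assumes every edge runs forward (end > start) and that path confidence
    is the product of its edge confidences.
    """
    best = {0: (1.0, [])}
    for pos in range(n_positions):
        if pos not in best:
            continue  # position not reachable from the start
        conf, tags = best[pos]
        for start, end, tag, edge_conf in edges:
            if start == pos:
                candidate = (conf * edge_conf, tags + [tag])
                if end not in best or candidate[0] > best[end][0]:
                    best[end] = candidate
    return best.get(n_positions)

confidence, tags = best_path(EDGES, 3)
```

With these example values the single {person-visit-fact} edge (0.5) beats the best token-by-token reading (0.6 × 1.0 × 0.7 = 0.42). A consumer that wants to preserve ambiguity could instead keep the top few paths rather than only the best one.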