UNDER CONSTRUCTION

The Saga Language Processing Toolkit (Saga Library) processes raw text into normalized tokens, entities, and semantic tags. The output can be used for question answering, full-text analysis (fact extraction), semantic search, content vectors and matching, and many other purposes.

  • Handles the full range of text processing
    • Token extraction and cleansing, entity extraction, syntactic analysis, and semantic analysis
  • Scalable to extremely large dictionaries and pattern databases (tens of millions of patterns or more)
    • Makes it possible to build patterns from machine learning algorithms
  • Disambiguation is a first-class citizen
    • Saves all interpretations all the time (nothing is thrown away)
    • Multiple disambiguation methods
  • Confidence is captured at every step
    • Confidence builds up as patterns are matched
  • Fast enough to process documents for full database scans
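
To make the disambiguation and confidence points above concrete, here is a minimal Java sketch. It does not use the Saga API: the class `InterpretationSketch`, its `Interpretation` type, and the `combine` rule (a noisy-OR, chosen only because it makes confidence grow as more evidence arrives) are all hypothetical names and assumptions made for illustration. It shows two competing readings of the same text being kept side by side, one of them gaining confidence when a second pattern matches, and a "best" reading being selected without discarding the other.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names come from the Saga library. */
public class InterpretationSketch {

    /** One candidate reading of a text span, with a confidence between 0 and 1. */
    static final class Interpretation {
        final String text;
        final String tag;
        final double confidence;

        Interpretation(String text, String tag, double confidence) {
            this.text = text;
            this.tag = tag;
            this.confidence = confidence;
        }
    }

    /**
     * Assumed combination rule (noisy-OR): 1 - (1 - a) * (1 - b).
     * The result is never lower than either input, so confidence
     * can only build up as further patterns match.
     */
    static double combine(double a, double b) {
        return 1.0 - (1.0 - a) * (1.0 - b);
    }

    /** Returns the highest-confidence reading; the list itself is left untouched. */
    static Interpretation best(List<Interpretation> candidates) {
        Interpretation top = candidates.get(0);
        for (Interpretation candidate : candidates) {
            if (candidate.confidence > top.confidence) {
                top = candidate;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        // Two competing readings of the same token are both kept.
        List<Interpretation> readings = new ArrayList<>();
        readings.add(new Interpretation("Paris", "{city}", 0.6));
        readings.add(new Interpretation("Paris", "{person}", 0.3));

        // A second (hypothetical) pattern supports the city reading with 0.5.
        Interpretation city = readings.get(0);
        readings.set(0, new Interpretation(city.text, city.tag, combine(city.confidence, 0.5)));

        Interpretation top = best(readings);
        System.out.println("Best reading: " + top.tag + ", alternatives kept: " + (readings.size() - 1));
    }
}
```

Keeping the losing reading in the list, rather than deleting it, is what allows a later step to revise the choice if new evidence favors the other interpretation.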

Use Cases:

  • Query interpretation
  • Question answering
  • Chatbots
  • Full document fact extraction
  • Vector generation for statistical analysis and machine learning

Getting Started

The Language Processing Toolkit is a lean Java library that can be used anywhere.

  • Getting Started
