Saga Library (formerly Language Processing Toolkit) processes raw text into normalized tokens, entities, and semantic tags. The output can be used for question answering, full-text analysis (fact extraction), semantic search, content vector generation and matching, and many other purposes.


  • Handles the full range of text processing
    • Token extraction and cleansing, entity extraction, syntactic analysis, and semantic analysis
  • Scales to extremely large dictionaries and pattern databases (tens of millions of patterns)
    • Makes it possible to build patterns from machine learning algorithms
  • Disambiguation is a first-class citizen
    • Keeps every interpretation at every stage (nothing is thrown away)
    • Multiple disambiguation methods
  • Confidence is captured at every step
    • Confidence builds up as patterns are matched
  • Fast enough to process documents for full database scans
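
To make the disambiguation and confidence points concrete, here is a minimal Python sketch of the general idea. It is not Saga's API: the `Interpretation` class, the `DICTIONARY` contents, the confidence values, and the `interpret` and `best_path` functions are all hypothetical, chosen only for illustration. It shows every dictionary match being kept with its own confidence, and a separate step that picks the highest-scoring reading without discarding the alternatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    start: int         # index of the first token covered
    end: int           # index one past the last token covered
    tag: str           # hypothetical entity tag
    confidence: float  # hypothetical score for this match


# Hypothetical dictionary: token sequence -> possible (tag, confidence) pairs.
# The longer pattern carries a higher confidence than the single-token ones.
DICTIONARY = {
    ("paris",): [("city", 0.7), ("person", 0.3)],
    ("paris", "hilton"): [("person", 0.9)],
}


def interpret(text):
    """Return the tokens plus every dictionary match, ambiguous or not."""
    tokens = text.lower().split()
    found = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            for tag, conf in DICTIONARY.get(tuple(tokens[start:end]), []):
                found.append(Interpretation(start, end, tag, conf))
    return tokens, found


def best_path(tokens, found):
    """Pick non-overlapping interpretations with the highest total confidence.

    best[i] holds (score, path) for the first i tokens. The full `found`
    list is left untouched, so the alternatives stay available.
    """
    best = [(0.0, [])]
    for i in range(1, len(tokens) + 1):
        candidate = best[i - 1]  # default: leave token i-1 untagged
        for interp in found:
            if interp.end == i:
                prev_score, prev_path = best[interp.start]
                total = prev_score + interp.confidence
                if total > candidate[0]:
                    candidate = (total, prev_path + [interp])
        best.append(candidate)
    return best[-1][1]


tokens, found = interpret("Paris Hilton visited Paris")
for interp in best_path(tokens, found):
    print(tokens[interp.start:interp.end], interp.tag, interp.confidence)
```

In this sketch the first "Paris" is matched three ways (city, person, and as part of "Paris Hilton"), and all three matches remain in `found` after the best reading is chosen. A production system at the scale described above would need indexed pattern lookup rather than the nested loops used here.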

Use Cases

  • Query interpretation
  • Question answering
  • Chatbots
  • Full document fact extraction
  • Vector generation for statistical analysis and machine learning

