
Creates a bag-of-words (BOW) or TF-IDF tag that carries the vector information for the document, text block, or sentence. The stage accumulates the vector until the engine cannot read any further.

Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.
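
As a rough illustration of the two vector types, the sketch below builds a BOW vector and a TF-IDF vector over a fixed vocabulary. The function names, the sample vocabulary, the document-frequency counts, and the smoothed IDF formula are all assumptions made for this example; the weighting that Saga actually applies is not stated on this page.

```python
import math
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary term occurs in the token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tfidf_vector(tokens, vocabulary, doc_freq, num_docs):
    """Weight each term frequency by a smoothed inverse document frequency.

    The smoothing used here (add one to numerator and denominator, then
    add one to the logarithm) is an assumption, not Saga's documented formula.
    """
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    vector = []
    for term in vocabulary:
        tf = counts[term] / total
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))) + 1
        vector.append(tf * idf)
    return vector


# Hypothetical vocabulary and corpus statistics, for illustration only.
vocabulary = ["pilot", "aircraft", "gear", "runway"]
doc_freq = {"pilot": 3, "aircraft": 5, "gear": 2, "runway": 1}
tokens = ["the", "pilot", "landed", "the", "aircraft", "after", "gear", "failed"]

bow = bow_vector(tokens, vocabulary)
tfidf = tfidf_vector(tokens, vocabulary, doc_freq, num_docs=10)
```

Both vectors have one position per vocabulary entry. BOW keeps raw counts, while TF-IDF gives a larger weight to terms that are rare across the dataset.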

Recognizer: false

Include Page
Generic Configuration Parameters

Configuration Parameters

  • Parameter
    name: vocabulary
    summary: JSON map resource in which the vocabulary is stored
    required: true
  • Parameter
    name: vectorType
    summary: Type of algorithm to use when building the vector; either BOW or TF_IDF
    default: BOW
    required: true
  • Parameter
    name: datasetId
    summary: Dataset ID from which the vocabulary was extracted
    required: true
  • Parameter
    name: min
    summary: Minimum number of tokens to match
    type: integer
    default: 1
    required: true
  • Parameter
    name: max
    summary: Maximum number of tokens to match
    type: integer
    default: 2
    required: true
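
One way to read min and max is as bounds on the length of the token sequences that are looked up in the vocabulary. The sketch below lists every run of consecutive tokens within those bounds. The helper name `token_spans` is hypothetical, and this reading of the two parameters is an assumption rather than a description of the stage's internals.

```python
def token_spans(tokens, min_len=1, max_len=2):
    """Return every run of min_len..max_len consecutive tokens, joined by a space."""
    spans = []
    for start in range(len(tokens)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(tokens):
                break  # longer runs from this start would also overflow
            spans.append(" ".join(tokens[start:end]))
    return spans


tokens = ["the", "pilot", "landed", "safely"]
spans = token_spans(tokens, min_len=1, max_len=2)
```

With the defaults of 1 and 2, this yields single tokens and adjacent pairs such as "pilot landed".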


Saga_config_stage
title: Default Config
boundaryFlags: text block split
requiredFlags: token, semantic tag
skipFlags: skip
"vocabulary": "saga-provider:saga_vocabulary",
"vectorType": "BOW",
"datasetId": "dataset-234ifgbqafgoail3",
"min": 1,
"max": 3,
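
Before deploying a configuration like the one above, it can help to sanity-check it against the parameter list. The sketch below is a hypothetical helper, not part of Saga: it wraps the fragment in braces (dropping the trailing comma) so that it parses as standalone JSON, then checks the required parameters, the allowed vectorType values, and that min does not exceed max.

```python
import json

ALLOWED_VECTOR_TYPES = {"BOW", "TF_IDF"}
REQUIRED_KEYS = ("vocabulary", "vectorType", "datasetId", "min", "max")


def check_stage_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required parameter: {key}"
        for key in REQUIRED_KEYS
        if key not in config
    ]
    vector_type = config.get("vectorType")
    if vector_type is not None and vector_type not in ALLOWED_VECTOR_TYPES:
        problems.append(f"vectorType must be BOW or TF_IDF, got {vector_type!r}")
    low, high = config.get("min"), config.get("max")
    for name, value in (("min", low), ("max", high)):
        if value is not None and not isinstance(value, int):
            problems.append(f"{name} must be an integer")
    if isinstance(low, int) and isinstance(high, int) and low > high:
        problems.append("min must not be greater than max")
    return problems


# The fragment from this page, wrapped so it is valid standalone JSON.
raw = """{
  "vocabulary": "saga-provider:saga_vocabulary",
  "vectorType": "BOW",
  "datasetId": "dataset-234ifgbqafgoail3",
  "min": 1,
  "max": 3
}"""
problems = check_stage_config(json.loads(raw))
```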

Example Output

In this example, the stage loads a predefined vocabulary and generates a vector for the sentence using BOW. The second graph shows the same sentence processed with TF_IDF.

Saga_graph
V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-------------------------------------------------------------------[{BOW}]-------------------------------------------------------------------^ 


V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-----------------------------------------------------------------[{TF_IDF}]------------------------------------------------------------------^ 

Output Flags

Lex-Item Flags:

  • WEIGHT_VECTOR - Identifies the tag as a weight vector representation of a sentence

Resource Data

Description of resource.

Resource Format

Saga_json
Title: Entity JSON Format
"_id" : "KGAAJGsBemSwA0nZTLXA",
"tag": "recipe",
"pattern": "(\"how many\"|\"how much\") {ingredient} ",
"confAdjust": 0.95

. . . additional fields as needed go here . . . 
Note
  • Multiple entries can have the same pattern. If the pattern is matched, then it will be tagged with multiple (ambiguous) entry IDs.
  • Additional fielded data can be added to the record, as needed by downstream processes.
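
To see which entries would produce ambiguous tags, the records can be grouped by pattern. The sketch below is only illustrative: the helper name `ids_by_pattern`, the entry IDs, and the simplified patterns are hypothetical and are not taken from a real Saga resource.

```python
from collections import defaultdict


def ids_by_pattern(entries):
    """Map each pattern string to the list of entry IDs that declare it."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["pattern"]].append(entry["_id"])
    return dict(index)


# Hypothetical records; only the fields needed for the grouping are shown.
entries = [
    {"_id": "entry-1", "tag": "recipe", "pattern": "how much {ingredient}"},
    {"_id": "entry-2", "tag": "shopping", "pattern": "how much {ingredient}"},
    {"_id": "entry-3", "tag": "recipe", "pattern": "how many {ingredient}"},
]

index = ids_by_pattern(entries)
# Patterns declared by more than one entry would be tagged with several IDs.
ambiguous = {pattern: ids for pattern, ids in index.items() if len(ids) > 1}
```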

Fields

  • Parameter
    name: display
    summary: What to show the user when browsing this entity
    required: true
  • Parameter
    name: tag
    summary: Tag that identifies any match in the graph as an interpretation
    required: true
    • These will all be added to the interpretation graph with the SEMANTIC_TAG flag.

      Tip

      Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}

  • Parameter
    name: pattern
    summary: Pattern to match in the content
    required: true

Include Page
Generic Resource Fields