Creates a bag-of-words (BOW) or TF-IDF tag that holds the vector information for the document, text block, or sentence. The vector is accumulated until the engine cannot read any further.

Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Include Page: Generic Configuration Parameters

Configuration Parameters

  • vocabulary (required) - JSON map resource in which the vocabulary is stored
  • vectorType (required, default: BOW) - Type of algorithm to use when building the vector; either BOW or TF_IDF
  • datasetId (required) - Dataset ID from which the vocabulary was extracted
  • min (integer, required, default: 1) - Minimum number of tokens to match
  • max (integer, required, default: 2) - Maximum number of tokens to match


Saga_config_stage
"vocabulary": "saga-provider:saga_vocabulary",
"vectorType": "BOW",
"datasetId": "dataset-234ifgbqafgoail3",
"min": 1,
"max": 3,
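
The parameters above can be checked before they are handed to the stage. The following is a minimal sketch, not part of Saga: the function `check_stage_config` is hypothetical, and the rule that `max` must not be smaller than `min` is an assumption that the parameter list does not state explicitly.

```python
import json

ALLOWED_VECTOR_TYPES = {"BOW", "TF_IDF"}


def check_stage_config(config: dict) -> dict:
    """Return a copy of config with the documented defaults applied.

    Raises ValueError when a documented constraint is violated.
    """
    # Defaults taken from the parameter list: BOW, min 1, max 2.
    merged = {"vectorType": "BOW", "min": 1, "max": 2, **config}

    # vocabulary and datasetId have no default, so they must be supplied.
    for name in ("vocabulary", "datasetId"):
        if not merged.get(name):
            raise ValueError(f"missing required parameter: {name}")

    if merged["vectorType"] not in ALLOWED_VECTOR_TYPES:
        raise ValueError("vectorType must be BOW or TF_IDF")

    if not isinstance(merged["min"], int) or not isinstance(merged["max"], int):
        raise ValueError("min and max must be integers")

    # Assumption: a valid token range satisfies 1 <= min <= max.
    if merged["min"] < 1 or merged["max"] < merged["min"]:
        raise ValueError("expected 1 <= min <= max")

    return merged


# The configuration fragment from this page, wrapped in braces so it parses.
stage = json.loads("""{
  "vocabulary": "saga-provider:saga_vocabulary",
  "vectorType": "BOW",
  "datasetId": "dataset-234ifgbqafgoail3",
  "min": 1,
  "max": 3
}""")
checked = check_stage_config(stage)
```

A check like this only catches malformed values early; whether the referenced vocabulary resource actually exists still has to be verified by the engine.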

Example Output

In this example, the stage loads a predefined vocabulary and generates a vector for the sentence using BOW. The second graph shows the same sentence processed with TF_IDF.

Saga_graph
V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-------------------------------------------------------------------[{BOW}]-------------------------------------------------------------------^ 


V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-----------------------------------------------------------------[{TF_IDF}]------------------------------------------------------------------^ 
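
The graphs above show single tokens and two-token sequences feeding one vector tag. The following is a rough sketch of how a BOW vector could be accumulated from such token sequences. It is an illustration under stated assumptions, not the stage's implementation: the helper names `ngrams` and `bow_vector` and the sample vocabulary are hypothetical, and tokens are assumed to be already lowercased and stripped of punctuation.

```python
from collections import Counter


def ngrams(tokens, min_len, max_len):
    """Yield every run of min_len..max_len consecutive tokens, space-joined."""
    for size in range(min_len, max_len + 1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def bow_vector(tokens, vocabulary, min_len=1, max_len=2):
    """Count how often each vocabulary entry occurs among the token n-grams.

    The result has one position per vocabulary entry, in vocabulary order.
    """
    known = set(vocabulary)
    counts = Counter(g for g in ngrams(tokens, min_len, max_len) if g in known)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary; a real one is loaded from the vocabulary resource.
vocabulary = ["pilot", "aircraft", "gear failed", "engine"]
sentence = (
    "the pilot landed safely the aircraft after gear failed "
    "when approaching the runaway"
)
vector = bow_vector(sentence.split(), vocabulary)
```

Here `min_len` and `max_len` play the role of the `min` and `max` parameters: with a maximum of 2, the two-token entry "gear failed" can be matched, while entries that never occur in the sentence keep a count of zero.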

Output Flags

Lex-Item Flags:

  • WEIGHT_VECTOR - Identifies the tag as a weight vector representation of a sentence

Resource Data

Description of resource.

Resource Format

Saga_json
Entity JSON Format
"count" : 15,
"docsPerTerm" : 15,
"datasetId" : "f92e1394-5f52-3331-aa6a-9c510ad31da5",
"tokenCount" : 1,
"docCount" : 204021,
"word" : "depict"

. . . additional fields as needed go here . . . 
Note
  • Multiple entries can have the same pattern. If the pattern is matched, then it will be tagged with multiple (ambiguous) entry IDs.
  • Additional fielded data can be added to the record as needed by downstream processes.


Fields

  • count (integer, required) - Number of times the word appeared
  • docsPerTerm (integer, required) - Number of documents in which the word appeared
  • datasetId (required) - Dataset ID from which the vocabulary was extracted
  • tokenCount (integer, required) - Number of tokens in the word
  • docCount (integer, required) - Number of documents in the dataset
  • word (required) - Word of the vocabulary

Include Page: Generic Resource Fields
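
The docsPerTerm and docCount fields are the inputs a TF-IDF weighting typically needs. The following is a minimal sketch using the textbook formula tf * log(docCount / docsPerTerm). This page does not state which TF-IDF variant (smoothing, normalization) the stage applies, so the formula and the function name `tf_idf_weight` are assumptions for illustration only.

```python
import math


def tf_idf_weight(term_count_in_text, docs_per_term, doc_count):
    """Textbook TF-IDF: term frequency times log(N / document frequency).

    Assumption: no smoothing or normalization; the stage may differ.
    """
    if docs_per_term <= 0 or doc_count <= 0:
        raise ValueError("docs_per_term and doc_count must be positive")
    return term_count_in_text * math.log(doc_count / docs_per_term)


# Values taken from the sample vocabulary record on this page.
entry = {"docsPerTerm": 15, "docCount": 204021}

# Weight of the entry when it occurs once in the text being vectorized.
weight = tf_idf_weight(1, entry["docsPerTerm"], entry["docCount"])
```

A word that appears in few documents of the dataset (small docsPerTerm relative to docCount) receives a large weight, while a word that appears in every document receives a weight of zero under this formula.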