Excerpt
Creates a bag-of-words (BOW) or TF-IDF tag containing the vector information for the document, text block, or sentence. The stage accumulates the vector until the engine cannot read any further.

Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

Saga_is_recognizer

Recognizer	false

Warning
This stage is disabled in version 1.2.2

Include Page

	Generic Configuration Parameters

Configuration Parameters

Parameter
summary JSON map resource in which the vocabulary is stored
name vocabulary
required true
Parameter
summary Type of algorithm to use when building the vector; can be either BOW or TF_IDF
default BOW
name vectorType
required true
Parameter
summary Dataset ID from which the vocabulary was extracted
name datasetId
required true
Parameter
summary Minimum number of tokens to match
default 1
name min
type integer
required true
Parameter
summary Maximum number of tokens to match
default 2
name max
type integer
required true

Saga_config_stage
"vocabulary": "saga-provider:saga_vocabulary", "vectorType": "BOW", "datasetId": "dataset-234ifgbqafgoail3", "min": 1, "max": 3,
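The parameters above can be sanity-checked before a stage is configured. The following is a minimal sketch in Python, not part of Saga itself: the function name `validate_stage_config` and the rule `1 <= min <= max` are assumptions made for illustration, while the parameter names, defaults, and allowed `vectorType` values come from the table above.

```python
import json

# Values taken from the parameter table above.
ALLOWED_VECTOR_TYPES = {"BOW", "TF_IDF"}
DEFAULTS = {"vectorType": "BOW", "min": 1, "max": 2}


def validate_stage_config(config):
    """Hypothetical helper: apply documented defaults and reject bad values."""
    merged = {**DEFAULTS, **config}
    # vocabulary and datasetId have no documented default, so they must be set.
    for key in ("vocabulary", "datasetId"):
        if not merged.get(key):
            raise ValueError(f"missing required parameter: {key}")
    if merged["vectorType"] not in ALLOWED_VECTOR_TYPES:
        raise ValueError(f"vectorType must be one of {sorted(ALLOWED_VECTOR_TYPES)}")
    # Assumption: a match needs at least one token and min cannot exceed max.
    if not 1 <= merged["min"] <= merged["max"]:
        raise ValueError("expected 1 <= min <= max")
    return merged


raw = (
    '{"vocabulary": "saga-provider:saga_vocabulary", "vectorType": "BOW", '
    '"datasetId": "dataset-234ifgbqafgoail3", "min": 1, "max": 3}'
)
config = validate_stage_config(json.loads(raw))
print(config["vectorType"], config["min"], config["max"])
```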

Example Output

In this example the stage loads a predefined vocabulary and generates a vector for the sentence using BOW. The second graph shows the same sentence processed with TF_IDF.

Saga_graph

V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-------------------------------------------------------------------[{BOW}]-------------------------------------------------------------------^ 


V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-----------------------------------------------------------------[{TF_IDF}]------------------------------------------------------------------^
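The graphs above show single tokens and two-token spans feeding one vector tag. As a rough illustration of that idea, the sketch below builds a bag-of-words count vector from token n-grams in Python. It is not the stage's implementation: the helper names `ngrams` and `bow_vector`, the sample vocabulary, and the choice to order the vector by vocabulary position are all assumptions.

```python
from collections import Counter


def ngrams(tokens, min_n, max_n):
    """Yield every run of min_n..max_n consecutive tokens, joined by a space."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def bow_vector(tokens, vocabulary, min_n=1, max_n=2):
    """Count how often each vocabulary word occurs among the token n-grams."""
    known = set(vocabulary)
    counts = Counter(g for g in ngrams(tokens, min_n, max_n) if g in known)
    # One vector position per vocabulary word, in vocabulary order.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary; a real one would come from the vocabulary resource.
vocabulary = ["pilot", "aircraft", "the runway", "gear failed"]
tokens = "the pilot landed safely the aircraft after gear failed".split()
vector = bow_vector(tokens, vocabulary)
print(vector)
```

Here `min_n` and `max_n` play the role of the `min` and `max` configuration parameters.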

Output Flags

Lex-Item Flags:

WEIGHT_VECTOR - Identifies the tag as a weight vector representation of a sentence
TOKEN - Identifies that the Lex-Items produced by this stage are tokens and not text blocks.

Vertex Flags:

Info
No vertices are created in this stage

Resource Data

Description of resource.

Resource Format

Saga_json

Title	Entity Json Vocabulary Format

"count" : 15,
"docsPerTerm" : 15,
"datasetId" : "f92e1394-5f52-3331-aa6a-9c510ad31da5",
"tokenCount" : 1,
"docCount" : 204021,
"word" : "..."

. . . additional fields as needed go here . . .

Note
Multiple entries can have the same pattern. If the pattern is matched, it will be tagged with multiple (ambiguous) entry IDs. Additional fielded data can be added to the record as needed by downstream processes.

Fields

Parameter
summary Number of times the word appeared
name count
type integer
required true
Parameter
summary Number of documents in which the word appeared
name docsPerTerm
type integer
Parameter
summary What to show the user when browsing this entity
name display
required true
Parameter
summary Dataset ID from which the vocabulary was extracted
name datasetId
required true
Parameter
summary Number of tokens in the word
name tokenCount
type integer
required true
Parameter
summary Number of documents in the dataset
name docCount
type integer
required true

Parameter
summary word of the vocabulary
name word
required true
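The `docsPerTerm` and `docCount` fields are the quantities a TF-IDF weight is usually built from. The sketch below uses the textbook formula, weight = term frequency × log(docCount / docsPerTerm), in Python. The exact weighting the stage applies is not stated on this page, so the formula, the function name `tf_idf_weight`, and the in-sentence term frequency of 2 are assumptions.

```python
import math


def tf_idf_weight(term_frequency, entry):
    """Textbook TF-IDF: term frequency times log inverse document frequency."""
    # A term that appears in every document gets log(1) = 0, so it carries no weight.
    idf = math.log(entry["docCount"] / entry["docsPerTerm"])
    return term_frequency * idf


# Field values copied from the resource format example above.
entry = {"count": 15, "docsPerTerm": 15, "tokenCount": 1, "docCount": 204021}

# Assumed: the word occurs twice in the sentence being vectorized.
weight = tf_idf_weight(2, entry)
print(round(weight, 3))
```

With BOW the vector position would hold the raw count (2 here). With TF_IDF it would hold the weighted value, so rare vocabulary words contribute more than common ones.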

Generic Resource Fields
