Excerpt
Creates a bag-of-words (BOW) or TF-IDF tag that carries the vector for the document, text block, or sentence. The stage keeps accumulating the vector until the engine can read no further.

Operates On: All lexical items.

Saga_is_recognizer

Recognizer	false

Include Page

	Generic Configuration Parameters

Configuration Parameters

Parameter
summary JSON map resource in which the vocabulary is stored
name vocabulary
required true
Parameter
summary Type of algorithm to use when building the vector; either BOW or TF_IDF
default BOW
name vectorType
required true
Parameter
summary Dataset ID from which the vocabulary was extracted
name datasetId
required true
Parameter
summary Minimum number of tokens to match
default 1
name min
type integer
required true
Parameter
summary Maximum number of tokens to match
default 2
name max
type integer
required true

Saga_config_stage
"vocabulary": "saga-provider:saga_vocabulary", "vectorType": "BOW", "datasetId": "dataset-234ifgbqafgoail3", "min": 1, "max": 3,
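The min and max parameters bound how many tokens a single match may span. As a rough illustration only, and assuming matches are contiguous runs of tokens (the stage's actual matching logic is not described here), the candidates could be enumerated as in the sketch below. The function name `ngrams` and the sample tokens are hypothetical.

```python
def ngrams(tokens, min_n=1, max_n=2):
    """Return every run of min_n..max_n consecutive tokens, space-joined."""
    out = []
    for start in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            # Skip runs that would extend past the end of the token list.
            if start + n <= len(tokens):
                out.append(" ".join(tokens[start:start + n]))
    return out


# Hypothetical, already-normalized tokens from the start of a sentence.
tokens = ["the", "pilot", "landed", "safely"]
candidates = ngrams(tokens, min_n=1, max_n=2)
```

Each candidate would then be looked up in the vocabulary resource; candidates that are not in the vocabulary contribute nothing to the vector.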

Example Output

In this example, the stage loads a predefined vocabulary and generates a vector for the sentence using BOW. The second graph shows the same sentence processed with TF_IDF.

Saga_graph

V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-------------------------------------------------------------------[{BOW}]-------------------------------------------------------------------^ 


V---------------------------[The pilot landed safely the aircraft after gear failed when approaching the runaway.]----------------------------V 
^-[The]-V-[pilot]-V-[landed]-V-[safely]-V-[the]-V-[aircraft]-V-[after]-V-[gear]-V-[failed]-V-[when]-V-[approaching]-V-[the]-V---[runaway.]----^ 
^-[the]-^         ^---[landed safely]---^---[the aircraft]---^---[after gear]---^---[failed when]---^---[approaching the]---^-[runaway]-V-[.]-^ 
        ^---[pilot landed]---^---[safely the]---^---[aircraft after]---^---[gear failed]---^---[when approaching]---^       ^---[runaway .]---^ 
^---[The pilot]---^                                                                                                 ^-----[the runaway.]------^ 
^---[the pilot]---^                                                                                                 ^---[the runaway]---^ 
^-----------------------------------------------------------------[{TF_IDF}]------------------------------------------------------------------^
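The {BOW} tag in the first graph stands for a count vector over the vocabulary. A minimal sketch of that idea follows, assuming one vector slot per vocabulary word in a fixed order. The vocabulary, the candidate list, and the function name `bow_vector` are made up for illustration; how Saga lays out the vector internally is not specified on this page.

```python
from collections import Counter


def bow_vector(candidates, vocabulary):
    """Count how often each vocabulary word occurs among the candidates."""
    counts = Counter(candidates)
    # One slot per vocabulary word, in vocabulary order; unseen words stay 0.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and candidate matches (single tokens and token pairs).
vocabulary = ["pilot", "aircraft", "gear failed", "runway"]
candidates = ["the", "pilot", "the pilot", "aircraft",
              "gear", "failed", "gear failed", "pilot"]
vector = bow_vector(candidates, vocabulary)
```

Candidates that are not in the vocabulary (such as "the" here) are simply ignored, so the vector length always equals the vocabulary size.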

Output Flags

Lex-Item Flags:

WEIGHT_VECTOR - Identifies the tag as a weight vector representation of a sentence
TOKEN - Identifies that the Lex-Items produced by this stage are tokens and not text blocks.

Vertex Flags:

Info
No vertices are created in this stage

Resource Data

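The vocabulary resource is a JSON map. Each entry describes one vocabulary word together with the statistics of the dataset from which it was extracted.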

Resource Format

Saga_json

Title	Vocabulary Format

"count" : 15,
"docsPerTerm" : 15,
"datasetId" : "f92e1394-5f52-3331-aa6a-9c510ad31da5",
"tokenCount" : 1,
"docCount" : 204021,
"word" : "depict"

Fields

Parameter
summary Number of times the word appeared
name count
type integer
required true
Parameter
summary Number of documents in which the word appeared
name docsPerTerm
type integer
required true
Parameter
summary Dataset ID from which the vocabulary was extracted
name datasetId
required true
Parameter
summary Number of tokens in the word
name tokenCount
type integer
required true
Parameter
summary Number of documents in the dataset
name docCount
type integer
required true
Parameter
summary The vocabulary word
name word
required true
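The docsPerTerm and docCount fields are the two quantities a TF-IDF weight needs. The sketch below shows one way they could be combined, assuming the classic formula tf * ln(docCount / docsPerTerm). This page does not state which TF-IDF variant the stage uses, so treat the formula, the function name `tfidf_weights`, and the sample term counts as assumptions.

```python
import math


def tfidf_weights(term_counts, entries):
    """Weight each matched term by tf * ln(docCount / docsPerTerm)."""
    by_word = {entry["word"]: entry for entry in entries}
    weights = {}
    for term, tf in term_counts.items():
        entry = by_word.get(term)
        if entry is None:
            # Terms missing from the vocabulary get no weight.
            continue
        idf = math.log(entry["docCount"] / entry["docsPerTerm"])
        weights[term] = tf * idf
    return weights


# The vocabulary entry shown in the Vocabulary Format example.
entries = [{
    "count": 15,
    "docsPerTerm": 15,
    "datasetId": "f92e1394-5f52-3331-aa6a-9c510ad31da5",
    "tokenCount": 1,
    "docCount": 204021,
    "word": "depict",
}]

# Hypothetical term counts for one sentence.
weights = tfidf_weights({"depict": 2, "unknown": 1}, entries)
```

A word that appears in few documents relative to docCount, like "depict" in the example entry, receives a high weight, while a word present in nearly every document would receive a weight close to zero.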
