Operates On: Lexical Items with TOKEN or SEMANTIC_TAG and other possible flags as specified below, but not on TEXT_BLOCK.

Saga_is_recognizer

	Generic Configuration Parameters


Configuration Parameters

Parameter
summary The resource that contains the pattern database. See below for the format.
name patterns
required true

...

The tokens to process must lie between two vertices marked with these flags (e.g. ["TEXT_BLOCK_SPLIT"]).

...

Tokens marked with these flags will be ignored by this stage, and no processing will be performed on them.

...

Tokens must have all of the specified flags in order to be processed.

...

Enables all debug log functionality of the stage, if any.

Info

In version 1.2.2 this parameter was added:

Parameter
summary If true the stage will prefer larger patterns
default true
name preferLarge
type boolean

Saga_config_stage

boundaryFlags	text block split
stage	Fragmentation
requiredFlags	token, semantic tag
skipFlags	skip

"patterns":"saga_provider:fragmented_patterns",
"maxRepeats": 5

Code Block

language	js
theme	Eclipse
title	Example Configuration

{
 "type":"Fragmentation",
 "patterns":"fragmented-provider:patterns",
 "boundaryFlags":["TEXT_BLOCK_SPLIT"]
}
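
As a rough illustration of how the parameters above fit together, the following Python sketch parses a stage configuration and checks that the required `patterns` parameter is present. The helper `check_fragmentation_config` is hypothetical and not part of Saga; only the parameter names (`patterns`, `preferLarge`, `boundaryFlags`) and the `preferLarge` default come from this page.

```python
import json

# Hypothetical helper, not part of Saga: checks a Fragmentation stage
# configuration against the parameters documented on this page.
def check_fragmentation_config(config):
    if config.get("type") != "Fragmentation":
        raise ValueError("expected a Fragmentation stage configuration")
    if "patterns" not in config:
        # 'patterns' is documented as a required parameter.
        raise ValueError("missing required parameter: patterns")
    checked = dict(config)
    # 'preferLarge' is documented with a default of true.
    checked.setdefault("preferLarge", True)
    if not isinstance(checked["preferLarge"], bool):
        raise ValueError("preferLarge must be a boolean")
    return checked

raw = """
{
  "type": "Fragmentation",
  "patterns": "fragmented-provider:patterns",
  "boundaryFlags": ["TEXT_BLOCK_SPLIT"]
}
"""
config = check_fragmentation_config(json.loads(raw))
print(json.dumps(config, indent=2))
```

A check of this kind only catches missing or mistyped keys early; it says nothing about whether the referenced resource actually exists.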

Example Output

Description

...


V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^

...

              ^---{place}---^           ^----{food}----^         ^---{food}---^
^----------{person}---------^           ^-----------------{food}--------------^

Output Flags

Lex-Item Flags:

SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
FRAGMENT - Identifies all lexical items that were created from a fragmentation pattern.

...

Vertex Flags:

...

Info
No vertices are created in this stage

Resource Data

The resource data is a database of fragmented patterns and the resulting semantic tags they produce.

Resource Format

The only required file is the entity dictionary. It is a series of JSON records, typically indexed by entity ID.

Description of entity

...

Entity JSON Format

Title	Entity Json Format

"tag": "{city}",
"pattern": "(\"how many\"|\"how much\") {ingredient}",
"confAdjust": 0.95
. . . additional fields as needed go here . . .

Code Block

language	js
theme	Eclipse
title	Entity JSON Format

"_id" : "KGAAJGsBemSwA0nZTLXA",
"tag": ["recipe"],
"pattern": "{number} {ingredient}",
"options": {
  "minTokens": 3,
  "maxTokens": 2,
  "combination": true
},
"confAdjust": 0.95
. . . additional fields as needed go here . . .

...

Note
Multiple entries can have the same pattern. If the pattern is matched, then it will be tagged with multiple (ambiguous) entry IDs.
Additional fielded data can be added to the record as needed by downstream processes.
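
The note above can be pictured with a small Python sketch that groups entries by pattern, so that one matched pattern yields every entry ID that declares it. The records, their IDs, and the function `index_by_pattern` are made up for illustration; this is not how Saga stores or looks up patterns internally.

```python
from collections import defaultdict

# Hypothetical records; the IDs and tags are invented for this example.
entries = [
    {"_id": "entry-1", "tag": "recipe", "pattern": "{number} {ingredient}"},
    {"_id": "entry-2", "tag": "shopping-item", "pattern": "{number} {ingredient}"},
    {"_id": "entry-3", "tag": "question", "pattern": "how many {ingredient}"},
]

def index_by_pattern(records):
    """Map each pattern string to the list of entry IDs that declare it."""
    index = defaultdict(list)
    for record in records:
        index[record["pattern"]].append(record["_id"])
    return dict(index)

pattern_index = index_by_pattern(entries)
# A match on this pattern is ambiguous: it carries both entry IDs.
print(pattern_index["{number} {ingredient}"])
```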

Fields

...

Typically this is an identifier with meaning to the larger application which is using the Language Processing Toolkit.

Parameter
summary Tag that identifies any match in the graph as an interpretation
name tag
required true

...

- These will all be added to the interpretation graph with the SEMANTIC_TAG flag.

...

- Tip
  Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}

...

Parameter
summary Pattern to match in the content

...

Patterns will be tokenized, and there may be multiple variations which can match.
NOTE: Currently, tokens are separated on simple white-space and punctuation, and then reduced to lowercase.
TODO: This will need to be improved in the future, perhaps by specifying a pipeline to perform the tokenization and to allow for multiple variations.

name pattern
required true
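
The tokenization described in the NOTE above can be approximated with the Python sketch below. It is an illustration only: the function `tokenize_pattern` is hypothetical, and keeping `{tag}` references whole (rather than splitting them at the braces) is an assumption that this page does not confirm.

```python
import re

# Assumption: {tag} references are kept as single tokens. All other text is
# split on white-space and punctuation (underscores count as word characters).
TOKEN_RE = re.compile(r"\{[^{}\s]+\}|\w+")

def tokenize_pattern(pattern):
    """Split a pattern into tokens and reduce them to lowercase."""
    return [token.lower() for token in TOKEN_RE.findall(pattern)]

print(tokenize_pattern("How many {ingredient}?"))
```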

Parameter
summary Object with options applicable for this entity
name options
type json
- Parameter
  summary Minimum number of tokens the match must contain to be valid. The default is the number of tokens contained in each pattern.
  name minTokens
  type integer
- Parameter
  summary Maximum number of tokens the match may contain to be valid. The default is the number of tokens contained in each pattern.
  name maxTokens
  type integer
- Parameter
  summary Indicates if the given tokens can be matched in any order as long as all appear in the match. If false, the tokens must be in the order provided.
  default true
  name combination

...

This is the confidence of the entity, in comparison to all of the other entities. Essentially, the likelihood that this entity will be randomly encountered.

Other, Optional Fields

	Generic Resource Fields
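
The `options` fields described under Fields (`minTokens`, `maxTokens`, `combination`) can be read as a validity check on a candidate match. The Python sketch below shows one possible reading, assuming that extra tokens may sit between the pattern tokens; the function `is_valid_match` is hypothetical, and the actual Saga matcher may behave differently.

```python
def is_valid_match(pattern_tokens, matched_tokens, min_tokens=None,
                   max_tokens=None, combination=True):
    """Check a candidate match against one reading of the entity options."""
    # Per this page, both limits default to the number of pattern tokens.
    if min_tokens is None:
        min_tokens = len(pattern_tokens)
    if max_tokens is None:
        max_tokens = len(pattern_tokens)
    if not (min_tokens <= len(matched_tokens) <= max_tokens):
        return False
    if combination:
        # Any order: every pattern token must appear in the match.
        remaining = list(matched_tokens)
        for token in pattern_tokens:
            if token not in remaining:
                return False
            remaining.remove(token)
        return True
    # Ordered: the pattern tokens must appear as a subsequence of the match.
    position = 0
    for token in matched_tokens:
        if position < len(pattern_tokens) and token == pattern_tokens[position]:
            position += 1
    return position == len(pattern_tokens)

pattern = ["{number}", "{ingredient}"]
print(is_valid_match(pattern, ["{ingredient}", "{number}"]))
print(is_valid_match(pattern, ["{ingredient}", "{number}"], combination=False))
```

Under this reading, raising `maxTokens` above the pattern length is what lets a match absorb tokens that are not part of the pattern.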
