
Operates On: Lexical Items with TOKEN or SEMANTIC_TAG and other possible flags as specified below, but not on TEXT_BLOCK.

Saga_is_recognizer

Include Page

	Generic Configuration Parameters

Configuration Parameters

boundaryFlags (string array, optional)
- The tokens to process must lie between two vertices marked with these flags (e.g. ["TEXT_BLOCK_SPLIT"]).
skipFlags (string array, optional) - Flags to be skipped by this stage
- Tokens marked with these flags will be ignored by this stage, and no processing will be performed on them.
requiredFlags (string array, optional)
- Tokens need to have all of the specified flags in order to be processed.
atLeastOneFlag (string array, optional)
- Tokens need to have at least one of the flags specified in this array.
debug (boolean, optional)
- Enables all debug log functionality of the stage, if any.
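
The way the three flag filters combine can be illustrated with a small sketch. This is not the stage's actual implementation: the function name should_process and its argument names are hypothetical, and it assumes that a single matching skip flag is enough to exclude a token and that an empty atLeastOneFlag list imposes no restriction.

```python
def should_process(token_flags, skip_flags=(), required_flags=(), at_least_one_flag=()):
    """Hypothetical helper: decide whether a token passes the generic flag filters."""
    flags = set(token_flags)
    # skipFlags (assumption): any one matching flag excludes the token.
    if flags & set(skip_flags):
        return False
    # requiredFlags: the token must carry every listed flag.
    if not set(required_flags) <= flags:
        return False
    # atLeastOneFlag: if the list is non-empty, at least one flag must be present.
    if at_least_one_flag and not flags & set(at_least_one_flag):
        return False
    return True
```

For example, should_process({"TOKEN", "SKIP"}, skip_flags=["SKIP"]) is rejected by the first check, while should_process({"SEMANTIC_TAG"}, at_least_one_flag=["TOKEN", "SEMANTIC_TAG"]) passes all three.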

Configuration Parameters

patterns (string, required) - The resource which contains the pattern database
- See below for the format.

Example Configuration

{
 "type":"Fragmentation",
 "patterns":"fragmented-provider:patterns",
 "boundaryFlags":["TEXT_BLOCK_SPLIT"]
}


patterns (string, required) - The resource that contains the pattern database.
- For the patterns format see below.
preferLarge (boolean, default true) - If true, the stage will prefer larger patterns.

stage: Fragmentation
boundaryFlags: text block split
requiredFlags: token, semantic tag
skipFlags: skip

"patterns":"saga_provider:fragmented_patterns",
"preferLarge":true

Example Output

V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^

              ^---{place}---^           ^----{food}----^         ^---{food}---^
^----------{person}---------^           ^-----------------{food}--------------^

Output Flags

Lex-Item Flags:

SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
FRAGMENT - Identifies all lexical items that were created from a fragmentation pattern.

Vertex Flags:

Info
No vertices are created in this stage

Resource Data

The resource data is a database of fragmented patterns, and the resulting semantic tags they produce.

Resource Format

The only required file is the entity dictionary. It is a series of JSON records, typically indexed by entity ID.

Entity JSON Format

"tag": "{city}",
"pattern": "(\"how many\"|\"how much\") {ingredient}",
"confAdjust": 0.95
. . . additional fields as needed go here . . .

"_id" : "KGAAJGsBemSwA0nZTLXA",
"tag":["recipe"],
"pattern": "{number} {ingredient}",
"options": {
  "minTokens": 3,
  "maxTokens": 2,
  "combination": true
},
"confAdjust":0.95
. . . additional fields as needed go here . . .

Note
Multiple entries can have the same pattern. If the pattern is matched, then it will be tagged with multiple (ambiguous) entry IDs.
Additional fielded data can be added to the record, as needed by downstream processes.
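
The note above can be made concrete with a short sketch that groups entries by pattern, so that a single matched pattern resolves to every entry ID that shares it. The function name index_by_pattern, the entry IDs, and the tags shown here are hypothetical illustrations, not values from a real pattern database, and the stage may organize its lookup differently.

```python
from collections import defaultdict

def index_by_pattern(entries):
    """Hypothetical helper: map each pattern string to the IDs of all entries using it."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["pattern"]].append(entry["_id"])
    return dict(index)

# Illustrative records only; the IDs and tags are made up for this sketch.
entries = [
    {"_id": "entry-1", "tag": ["recipe"], "pattern": "{number} {ingredient}"},
    {"_id": "entry-2", "tag": ["shopping-item"], "pattern": "{number} {ingredient}"},
    {"_id": "entry-3", "tag": ["question"], "pattern": "how many {ingredient}"},
]

# Both entry-1 and entry-2 share a pattern, so a match on it is ambiguous.
ambiguous_ids = index_by_pattern(entries)["{number} {ingredient}"]
```

A downstream process that receives more than one ID for a match would then need its own rule for choosing between them, for instance by using the additional fields stored on each record.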

Fields

Typically this is an identifier with meaning to the larger application which is using the Language Processing Toolkit.

tag (required) - Tag which will identify any match in the graph, as an interpretation.
- These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
- Tip: Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}
pattern (required) - Pattern to match in the content.
- Patterns will be tokenized and there may be multiple variations which can match.
- NOTE: Currently, tokens are separated on simple white-space and punctuation, and then reduced to lowercase.
- TODO: This will need to be improved in the future, perhaps by specifying a pipeline to perform the tokenization and to allow for multiple variations.
options (json) - Object with options applicable for this entity.
- minTokens (integer) - Minimum number of tokens the match must contain to be valid. The default is the number of tokens contained in each pattern.
- maxTokens (integer) - Maximum number of tokens the match must contain to be valid. The default is the number of tokens contained in each pattern.
- combination (default true) - Indicates if the given tokens can be matched in any order, as long as all appear in the match. If false, the tokens must be in the order provided.

This is the confidence of the entity, in comparison to all of the other entities. Essentially, the likelihood that this entity will be randomly encountered.

Other, Optional Fields
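
As a rough illustration of the pattern and options fields, the sketch below tokenizes plain text the way the NOTE describes (split on white-space and punctuation, then lowercase) and checks a candidate token span against a pattern's tokens. The names tokenize and is_match are hypothetical, tag references such as {ingredient} are not handled, and the exact way the stage applies minTokens, maxTokens, and combination is an assumption made for this sketch.

```python
import re
from collections import Counter

def tokenize(text):
    """Split on white-space and punctuation, then lowercase."""
    return [t for t in re.split(r"[^\w]+", text.lower()) if t]

def is_match(pattern_tokens, candidate_tokens, combination=True,
             min_tokens=None, max_tokens=None):
    """Hypothetical check of a candidate token span against one pattern."""
    # Assumption: both limits default to the pattern's own token count.
    n = len(pattern_tokens)
    low = n if min_tokens is None else min_tokens
    high = n if max_tokens is None else max_tokens
    if not low <= len(candidate_tokens) <= high:
        return False
    if combination:
        # Any order: every pattern token must appear in the candidate.
        missing = Counter(pattern_tokens) - Counter(candidate_tokens)
        return not missing
    # Fixed order: pattern tokens must appear as an in-order subsequence.
    remaining = iter(candidate_tokens)
    return all(tok in remaining for tok in pattern_tokens)
```

With combination left at its default, is_match(tokenize("macaroni cheese"), tokenize("Cheese, macaroni")) succeeds because order is ignored; passing combination=False makes the same call fail. Raising max_tokens to 3 lets the ordered pattern match "macaroni and cheese", where an extra token sits between the two pattern tokens.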

Include Page

	Generic Resource Fields
