This stage uses Apache Lucene™ to create a custom Lucene pipeline. It offers a wide range of tokenizers and filters that can be combined to suit the user's needs.

Operates On: Lexical Items with TOKEN and possibly other flags as specified below.


Info
A Lucene Custom Analyzer is composed of two components: the Tokenizer and the Filters (which can be stacked to use more than one at a time).
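The tokenizer-plus-stacked-filters composition described above can be sketched in a few lines. This is a conceptual illustration in plain Python, not the Lucene API and not this stage's implementation; `build_analyzer`, `whitespace_tokenizer`, `lowercase_filter`, and `strip_punctuation_filter` are hypothetical names chosen for the example.

```python
import string
from typing import Callable, Iterable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split text on runs of whitespace (hypothetical stand-in for a tokenizer)."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase every token (hypothetical filter)."""
    return [t.lower() for t in tokens]


def strip_punctuation_filter(tokens: List[str]) -> List[str]:
    """Strip leading/trailing ASCII punctuation and drop tokens left empty."""
    stripped = (t.strip(string.punctuation) for t in tokens)
    return [t for t in stripped if t]


def build_analyzer(
    tokenizer: Tokenizer, filters: Iterable[TokenFilter]
) -> Callable[[str], List[str]]:
    """Compose exactly one tokenizer with zero or more filters, applied in order."""
    filter_chain = list(filters)

    def analyze(text: str) -> List[str]:
        tokens = tokenizer(text)
        for token_filter in filter_chain:
            tokens = token_filter(tokens)
        return tokens

    return analyze


analyzer = build_analyzer(
    whitespace_tokenizer, [lowercase_filter, strip_punctuation_filter]
)
print(analyzer("Hey there! I am using Lucene Pipeline"))
# ['hey', 'there', 'i', 'am', 'using', 'lucene', 'pipeline']
```

The order of the filter list matters: each filter receives the output of the one before it, which is what "stacking" means here.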

Include Page

	Generic Configuration Parameters

Configuration Parameters

Tokenizer (required, default: None) - Tokenizer to use for the pipeline (only one can be used at a time). More than 10 different tokenizers are available, from n-gram tokenizers to Japanese tokenizers.
Filter (default: None) - Filter to use for the pipeline (can be stacked).

Code Block


"atLeastOneFlag": [],
"boundaryFlags": [],
"confidenceAdjustment": 1,
"debug": false,
"requiredFlags": [],
"skipFlags": [],
"tokenizer": "whitespace",
"filter": null

Example Output

Using Whitespace Tokenizer alone

Code Block


V-------------[Hey there! I am using Lucene Pipeline]-------------V 
^-[Hey]-V-[there!]-V-[I]-V-[am]-V-[using]-V-[Lucene]-V-[Pipeline]-^
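The split shown above can be reproduced with a minimal sketch. It assumes the whitespace tokenizer simply emits each run of non-whitespace characters, which is why `there!` keeps its exclamation mark. The function name `whitespace_tokens` is hypothetical, and the character offsets are included only to illustrate where each token sits in the input text.

```python
import re
from typing import Iterator, Tuple


def whitespace_tokens(text: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (token, start, end) for each run of non-whitespace characters."""
    for match in re.finditer(r"\S+", text):
        yield match.group(), match.start(), match.end()


text = "Hey there! I am using Lucene Pipeline"
for token, start, end in whitespace_tokens(text):
    # One line per token, with its character span in the original text.
    print(f"[{token}] {start}-{end}")
```

No filter is applied here, matching the "Whitespace Tokenizer alone" case.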

Output Flags

Lex-Item Flags:

ALL_LETTERS - All of the characters in the token are letters.
ALL_PUNCTUATION - All of the characters in the token are punctuation.
ALL_DIGITS - All of the characters in the token are digits (0-9).
TOKEN - All tokens produced are tagged as TOKEN.
HAS_LETTER - Tokens produced with at least one letter character are tagged as HAS_LETTER.
HAS_DIGIT - Tokens produced with at least one digit character are tagged as HAS_DIGIT.
HAS_PUNCTUATION - Tokens produced with at least one punctuation character are tagged as HAS_PUNCTUATION. (Tokens tagged as ALL_PUNCTUATION are not also tagged as HAS_PUNCTUATION.)
LUCENE_STAGE - All tokens produced by this stage are tagged as LUCENE_STAGE.
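The flag rules in this list can be expressed as a small function. This is a sketch under stated assumptions, not the stage's actual code: it treats "punctuation" as ASCII punctuation, "digits" as 0-9, and it tags HAS_LETTER and HAS_DIGIT alongside ALL_LETTERS and ALL_DIGITS, since the list states an exclusion only for punctuation. The name `lex_item_flags` is hypothetical.

```python
import string
from typing import Set


def lex_item_flags(token: str) -> Set[str]:
    """Return the flags the list above would assign to a single token."""
    # Every token produced by the stage carries these two flags.
    flags = {"TOKEN", "LUCENE_STAGE"}

    has_letter = any(c.isalpha() for c in token)
    has_digit = any(c in string.digits for c in token)
    # Assumption: punctuation means ASCII punctuation only.
    has_punct = any(c in string.punctuation for c in token)

    if has_letter:
        flags.add("HAS_LETTER")
    if has_digit:
        flags.add("HAS_DIGIT")

    if token and all(c.isalpha() for c in token):
        flags.add("ALL_LETTERS")
    if token and all(c in string.digits for c in token):
        flags.add("ALL_DIGITS")

    # ALL_PUNCTUATION and HAS_PUNCTUATION are mutually exclusive.
    if token and all(c in string.punctuation for c in token):
        flags.add("ALL_PUNCTUATION")
    elif has_punct:
        flags.add("HAS_PUNCTUATION")

    return flags


for token in ["Hey", "there!", "2024", "!!"]:
    print(token, sorted(lex_item_flags(token)))
```

For example, `there!` mixes letters and punctuation, so it receives HAS_LETTER and HAS_PUNCTUATION but neither ALL_LETTERS nor ALL_PUNCTUATION.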

Vertex Flags:

Info
No vertices are created in this stage.
