Parts Of Speech Stage

The Parts of Speech stage tags each word in a text (corpus) with the part of speech it corresponds to, such as noun, verb, or adjective, based on both its definition and its context. Tagging is done with OpenNLP (https://opennlp.apache.org/) and its POS Tagger. Each token is tagged with flags, which means this stage does not create any semantic tags.

Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

Library: saga-parts-of-speech-stage

This stage is a Recognizer for the Saga Solution, and it can also be used as part of a manual pipeline or a base pipeline.

Currently, only English is supported.

Generic Configuration Parameters

boundaryFlags ( type=string array | optional ) - List of vertex flags that indicate the beginning and end of a text block.
Tokens to process must be inside two vertices marked with one of these flags (e.g., ["TEXT_BLOCK_SPLIT"]).
skipFlags ( type=string array | optional ) - Flags to be skipped by this stage.
Tokens marked with any of these flags are ignored by this stage, and no processing is performed on them.
requiredFlags ( type=string array | optional ) - Lexical item flags that every token must have to be processed.
Tokens need to have all of the specified flags in order to be processed.
atLeastOneFlag ( type=string array | optional ) - Lexical item flags, at least one of which a token must have to be processed.
Tokens need at least one of the flags specified in this array.
confidenceAdjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, applied to the confidence value (applies to every pattern match).
- 0.0 to < 1.0 decreases confidence value
- 1.0 confidence value remains the same
- > 1.0 to 2.0 increases confidence value
debug ( type=boolean | default=false | optional ) - Enable all debug log functionality for the stage, if any.
enable ( type=boolean | default=true | optional ) - Indicates whether the current stage should be considered by the Pipeline Manager
- Only applies for automatic pipeline building
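The token-selection parameters above can be read as a simple filter applied to each token's flags. The sketch below illustrates that reading under stated assumptions; it is not Saga's implementation. It assumes a token's flags are a set of strings, that an empty or absent atLeastOneFlag list imposes no constraint, and that confidenceAdjustment is applied by plain multiplication. The names `is_eligible` and `adjust_confidence` are hypothetical, and boundaryFlags is not modeled.

```python
from typing import Iterable, Optional, Set


def is_eligible(
    token_flags: Set[str],
    skip_flags: Optional[Iterable[str]] = None,
    required_flags: Optional[Iterable[str]] = None,
    at_least_one_flag: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical check of whether a token should be processed."""
    skip = set(skip_flags or [])
    required = set(required_flags or [])
    any_of = set(at_least_one_flag or [])

    # skipFlags: a token carrying any of these flags is ignored.
    if token_flags & skip:
        return False
    # requiredFlags: the token must carry every listed flag.
    if not required <= token_flags:
        return False
    # atLeastOneFlag: assumed to impose no constraint when empty.
    if any_of and not (token_flags & any_of):
        return False
    return True


def adjust_confidence(confidence: float, adjustment: float = 1.0) -> float:
    """Scale a confidence value; multiplication is an assumption, not documented."""
    if not 0.0 <= adjustment <= 2.0:
        raise ValueError("confidenceAdjustment must be between 0.0 and 2.0")
    return confidence * adjustment
```

Under these assumptions, `is_eligible({"TOKEN", "ALL_DIGITS"}, skip_flags=["ALL_DIGITS"])` returns `False`, so numeric tokens such as `61` would be left untagged if `ALL_DIGITS` were listed in skipFlags.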

Configuration Parameters

prob ( type=double | default=0.7 | optional ) - Probability threshold that a part-of-speech tag must meet to be accepted.
language ( type=string | default=en | optional ) - Prefix of the part-of-speech model to use. Currently, only English is supported.
modelPath ( type=string | optional ) - Path to the folder where the models are stored.
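OpenNLP's maximum-entropy POS tagger can report a probability for each tag it assigns (`POSTaggerME.probs()` in the Java API, called after `tag()`). One plausible reading of prob is that a tag is kept only when its probability reaches the threshold. The sketch below assumes a `>=` comparison and that tokens below the threshold simply receive no POS flags; `accepted_tags` is a hypothetical name, and the probabilities in the usage line are invented for illustration, not real model output.

```python
from typing import Dict, Sequence


def accepted_tags(
    tokens: Sequence[str],
    tags: Sequence[str],
    probs: Sequence[float],
    threshold: float = 0.7,
) -> Dict[int, str]:
    """Return {token index: tag} for tags whose probability meets the threshold."""
    if not (len(tokens) == len(tags) == len(probs)):
        raise ValueError("tokens, tags and probs must have the same length")
    # Assumed rule: keep a tag when its probability is >= threshold.
    return {i: tag for i, (tag, p) in enumerate(zip(tags, probs)) if p >= threshold}


# Illustrative probabilities only, not produced by a real tagger.
print(accepted_tags(["will", "join", "the", "board"],
                    ["MD", "VB", "DT", "NN"],
                    [0.98, 0.91, 0.99, 0.55]))  # {0: 'MD', 1: 'VB', 2: 'DT'}
```

Raising prob makes the stage more conservative: fewer tokens are flagged, but the flags that remain are ones the model was more certain about.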

Example Configuration

{
	"prob": 0.7,
	"language": "en",
	"modelPath": null
}
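The example configuration above simply restates the defaults. A minimal sketch of reading such a JSON block, filling in the documented defaults for missing keys and rejecting any language other than English, is shown below. `PosConfig` and `load_pos_config` are hypothetical names, not part of the saga-parts-of-speech-stage library.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PosConfig:
    prob: float = 0.7                 # documented default
    language: str = "en"              # documented default; only English is supported
    model_path: Optional[str] = None  # no default is documented


def load_pos_config(text: str) -> PosConfig:
    """Parse a stage configuration, falling back to the documented defaults."""
    raw = json.loads(text)
    cfg = PosConfig(
        prob=float(raw.get("prob", 0.7)),
        language=raw.get("language", "en"),
        model_path=raw.get("modelPath"),  # JSON null becomes None
    )
    if cfg.language != "en":
        raise ValueError(f"unsupported language: {cfg.language!r}")
    return cfg
```

For example, `load_pos_config('{}')` yields the same values as the example configuration.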

Example Output

V-----------------------------------[Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .]-----------------------------------V 
^-[Pierre]-V-[Vinken]-V-[,]-V-[61]-V-[years]-V-[old]-V-[,]-V-[will]-V-[join]-V-[the]-V-[board]-V-[as]-V-[a]-V-[nonexecutive]-V-[director]-V-[Nov.]-V-[29]-V-[.]-^ 

Item [as] - [TOKEN,ORIGINAL,POS_TOKEN,POS_IN]
Item [years] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NNS]
Item [old] - [TOKEN,ORIGINAL,POS_TOKEN,POS_JJ]
Item [director] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]
Item [Pierre] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN]
Item [the] - [TOKEN,ORIGINAL,POS_TOKEN,POS_DT]
Item [Nov.] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN,HAS_PUNCTUATION]
Item [61] - [ALL_DIGITS,TOKEN,ORIGINAL,POS_TOKEN,POS_CD]
Item [29] - [ALL_DIGITS,TOKEN,ORIGINAL,POS_TOKEN,POS_CD]
Item [will] - [TOKEN,ORIGINAL,POS_TOKEN,POS_MD]
Item [,] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_,]
Item [,] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_,]
Item [join] - [TOKEN,ORIGINAL,POS_TOKEN,POS_VB]
Item [board] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]
Item [.] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_.]
Item [nonexecutive] - [TOKEN,ORIGINAL,POS_TOKEN,POS_JJ]
Item [a] - [TOKEN,ORIGINAL,POS_TOKEN,POS_DT]
Item [Vinken] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN]
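In the example output, each tagged item carries POS_TOKEN plus one tag-specific POS_ flag, alongside flags from other stages such as ALL_DIGITS or HAS_PUNCTUATION. A downstream consumer could recover the tag by stripping the prefix from the POS_ flag that is not POS_TOKEN. The sketch below works on flag lists rather than on the printed lines, because tags such as `,` would break naive comma splitting of the text. It assumes no other stage emits POS_-prefixed flags; `pos_tag_of` is a hypothetical helper.

```python
from typing import Iterable, Optional

POS_PREFIX = "POS_"
POS_TOKEN = "POS_TOKEN"


def pos_tag_of(flags: Iterable[str]) -> Optional[str]:
    """Return the tag encoded in a token's flags, or None if untagged."""
    for flag in flags:
        # Skip the generic marker; any other POS_ flag is assumed to hold the tag.
        if flag.startswith(POS_PREFIX) and flag != POS_TOKEN:
            return flag[len(POS_PREFIX):]
    return None


# Flag lists copied from the example output above.
print(pos_tag_of(["TOKEN", "ORIGINAL", "POS_NNP", "POS_TOKEN"]))  # NNP
print(pos_tag_of(["TOKEN", "ORIGINAL", "ALL_PUNCTUATION", "POS_TOKEN", "POS_,"]))  # ,
print(pos_tag_of(["TOKEN", "ORIGINAL"]))  # None
```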

Output Flags

Lex-Item Flags:

TOKEN - All tokens produced are tagged as TOKEN
POS_TOKEN - Identifies a token for which a part of speech was recognized
POS_??? - Flags all TOKENs where a part of speech was recognized.

Note the '???' at the end of the flag: it is replaced by the acronym of the part of speech identified.
For example, if a base-form verb is detected, the acronym is VB and the flag is "POS_VB".
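The naming rule above is mechanical: the flag is POS_ followed by the tag. A sketch of that rule follows, assuming the tag is appended verbatim (the example output shows POS_, and POS_. for punctuation, which is consistent with this). `flags_for_tag` is a hypothetical name, and the order of the returned flags is arbitrary.

```python
from typing import List


def flags_for_tag(tag: str) -> List[str]:
    """Return the POS flags a token would receive for a given tag."""
    if not tag:
        raise ValueError("tag must be a non-empty string")
    # POS_TOKEN marks that a part of speech was recognized;
    # POS_<tag> records which one, e.g. VB -> POS_VB.
    return ["POS_TOKEN", "POS_" + tag]


print(flags_for_tag("VB"))  # ['POS_TOKEN', 'POS_VB']
```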

Vertex Flags:

No vertices are created in this stage

Flag	Definition
POS_CC	Coordinating conjunction
POS_CD	Cardinal number
POS_DT	Determiner
POS_EX	Existential there
POS_FW	Foreign word
POS_IN	Preposition or subordinating conjunction
POS_JJ	Adjective
POS_JJR	Adjective, comparative
POS_JJS	Adjective, superlative
POS_LS	List item marker
POS_MD	Modal
POS_NN	Noun, singular or mass
POS_NNS	Noun, plural
POS_NNP	Proper noun, singular
POS_NNPS	Proper noun, plural
POS_PDT	Predeterminer
POS_POS	Possessive ending
POS_PRP	Personal pronoun
POS_PRP$	Possessive pronoun
POS_RB	Adverb
POS_RBR	Adverb, comparative
POS_RBS	Adverb, superlative
POS_RP	Particle
POS_SYM	Symbol
POS_TO	to
POS_UH	Interjection
POS_VB	Verb, base form
POS_VBD	Verb, past tense
POS_VBG	Verb, gerund or present participle
POS_VBN	Verb, past participle
POS_VBP	Verb, non-3rd person singular present
POS_VBZ	Verb, 3rd person singular present
POS_WDT	Wh-determiner
POS_WP	Wh-pronoun
POS_WP$	Possessive wh-pronoun
POS_WRB	Wh-adverb
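Several tags in the table share a prefix and form a family: the NN tags are nouns, the VB tags are verbs, the JJ tags are adjectives, and RB, RBR and RBS are adverbs. A downstream stage that only cares about, say, nouns could match on the family instead of listing every flag. The sketch below shows that grouping; it is an illustration for consumers of these flags, not a feature of the stage, and `coarse_category` is a hypothetical name.

```python
from typing import Optional

# Coarse families taken from the flag table above (illustrative grouping).
FAMILIES = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS (WRB is listed separately as Wh-adverb)
}


def coarse_category(flag: str) -> Optional[str]:
    """Map a flag such as POS_NNS to a coarse family, or None."""
    if not flag.startswith("POS_") or flag == "POS_TOKEN":
        return None
    tag = flag[len("POS_"):]
    for prefix, family in FAMILIES.items():
        if tag.startswith(prefix):
            return family
    return None
```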
