The Parts of Speech stage tags each word in a text (corpus) with its part of speech, such as noun, verb, or adjective, based on both the word's definition and its context. Tagging is performed with the OpenNLP (https://opennlp.apache.org/) POS Tagger.

Each token is tagged with flags only; this stage creates no semantic tags.

Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Library: saga-parts-of-speech-stage

Warning

Currently, only English is supported.

Generic Configuration Parameters

Configuration Parameters

  • prob (double, default: 0.7) - Probability threshold used to accept a part-of-speech prediction.
  • language (default: en) - Prefix of the model to use as the part-of-speech model. Currently, only English is supported.
  • modelPath - Path to the folder where the models are stored.


Example Configuration

{
	// Saga_config_stage: stage = PartsOfSpeech, boundaryFlags = text block split
	"prob": 0.7,
	"language": "en",
	"modelPath": null
}
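As a rough illustration of how the three parameters above might be read and checked before use, here is a minimal Python sketch. It is not Saga's own loader. The function name `load_pos_config` is hypothetical, the `[0, 1]` range check on `prob` is an assumption based on it being a probability threshold, and the `None` default for `modelPath` simply mirrors the `null` in the example configuration.

```python
import json

# Defaults taken from the parameter list; modelPath has no documented
# default, so None (JSON null) is an assumption based on the example.
DEFAULTS = {"prob": 0.7, "language": "en", "modelPath": None}


def load_pos_config(text):
    """Parse a JSON config string and fill in missing parameters (hypothetical helper)."""
    config = {**DEFAULTS, **json.loads(text)}
    # Assumption: prob is a probability, so it should lie in [0, 1].
    if not 0.0 <= config["prob"] <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    # The page states that only English is currently supported.
    if config["language"] != "en":
        raise ValueError("only the 'en' model prefix is supported")
    return config


print(load_pos_config('{"prob": 0.85}'))
```

Parameters missing from the input fall back to the documented defaults, so a config that only overrides `prob` still yields a complete settings dictionary.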


Example Output


V-----------------------------------[Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .]-----------------------------------V 
^-[Pierre]-V-[Vinken]-V-[,]-V-[61]-V-[years]-V-[old]-V-[,]-V-[will]-V-[join]-V-[the]-V-[board]-V-[as]-V-[a]-V-[nonexecutive]-V-[director]-V-[Nov.]-V-[29]-V-[.]-^ 

Item [as] - [TOKEN,ORIGINAL,POS_TOKEN,POS_IN]
Item [years] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NNS]
Item [old] - [TOKEN,ORIGINAL,POS_TOKEN,POS_JJ]
Item [director] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]
Item [Pierre] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN]
Item [the] - [TOKEN,ORIGINAL,POS_TOKEN,POS_DT]
Item [Nov.] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN,HAS_PUNCTUATION]
Item [61] - [ALL_DIGITS,TOKEN,ORIGINAL,POS_TOKEN,POS_CD]
Item [29] - [ALL_DIGITS,TOKEN,ORIGINAL,POS_TOKEN,POS_CD]
Item [will] - [TOKEN,ORIGINAL,POS_TOKEN,POS_MD]
Item [,] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_,]
Item [,] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_,]
Item [join] - [TOKEN,ORIGINAL,POS_TOKEN,POS_VB]
Item [board] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]
Item [.] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_.]
Item [nonexecutive] - [TOKEN,ORIGINAL,POS_TOKEN,POS_JJ]
Item [a] - [TOKEN,ORIGINAL,POS_TOKEN,POS_DT]
Item [Vinken] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN]
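For readers who post-process a textual dump like the one above, the following Python sketch parses one `Item [...] - [...]` line and pulls out the part-of-speech tag. It only assumes the line layout shown in this example output; `parse_item` and `pos_tag` are hypothetical helper names, not part of Saga. Punctuation tags such as `POS_,` contain the list separator itself, so the split skips any comma that directly follows `POS_`.

```python
import re

# Matches lines such as: Item [as] - [TOKEN,ORIGINAL,POS_TOKEN,POS_IN]
ITEM_LINE = re.compile(r"^Item \[(.*)\] - \[(.*)\]$")


def parse_item(line):
    """Split one output line into the token text and its list of flags."""
    match = ITEM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an item line: {line!r}")
    text, flag_list = match.groups()
    # Do not split on a comma that is itself the tag in "POS_,".
    flags = re.split(r"(?<!POS_),", flag_list)
    return text, flags


def pos_tag(flags):
    """Return the tag from the first POS_<tag> flag, ignoring POS_TOKEN."""
    for flag in flags:
        if flag.startswith("POS_") and flag != "POS_TOKEN":
            return flag[len("POS_"):]
    return None


text, flags = parse_item("Item [Nov.] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN,HAS_PUNCTUATION]")
print(text, pos_tag(flags))
```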


Output Flags

Lex-Item Flags:

  • TOKEN - All tokens produced are tagged as TOKEN.
  • POS_TOKEN - Identifies the token as having a recognized part of speech.
  • POS_??? - Flags every TOKEN for which a part of speech was recognized.

    Note

    The '???' at the end of the flag is replaced by the abbreviation of the part of speech identified.

    For example, if a base form verb is detected, the abbreviation is VB and the flag is "POS_VB".
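The flag naming described above, combined with the `prob` threshold, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the stage's actual logic. The helper `pos_flags` is hypothetical, a prediction is assumed to be accepted when its probability is greater than or equal to the threshold, and a rejected prediction is assumed to add no flags at all.

```python
POS_TOKEN = "POS_TOKEN"


def pos_flags(tag, probability, threshold=0.7):
    """Return the flags added to a token for one tagger prediction (hypothetical helper)."""
    # Assumption: predictions below the threshold leave the token without POS flags.
    if probability < threshold:
        return set()
    # POS_TOKEN marks the token as tagged; POS_<tag> carries the tag itself.
    return {POS_TOKEN, "POS_" + tag}


print(sorted(pos_flags("VB", 0.93)))  # ['POS_TOKEN', 'POS_VB']
print(sorted(pos_flags("VB", 0.41)))  # []
```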

Vertex Flags:

Info

No vertices are created in this stage.

Flag      Definition

POS_CC    Coordinating conjunction
POS_CD    Cardinal number
POS_DT    Determiner
POS_EX    Existential there
POS_FW    Foreign word
POS_IN    Preposition or subordinating conjunction
POS_JJ    Adjective
POS_JJR   Adjective, comparative
POS_JJS   Adjective, superlative
POS_LS    List item marker
POS_MD    Modal
POS_NN    Noun, singular or mass
POS_NNS   Noun, plural
POS_NNP   Proper noun, singular
POS_NNPS  Proper noun, plural
POS_PDT   Predeterminer
POS_POS   Possessive ending
POS_PRP   Personal pronoun
POS_PRP$  Possessive pronoun
POS_RB    Adverb
POS_RBR   Adverb, comparative
POS_RBS   Adverb, superlative
POS_RP    Particle
POS_SYM   Symbol
POS_TO    to
POS_UH    Interjection
POS_VB    Verb, base form
POS_VBD   Verb, past tense
POS_VBG   Verb, gerund or present participle
POS_VBN   Verb, past participle
POS_VBP   Verb, non-3rd person singular present
POS_VBZ   Verb, 3rd person singular present
POS_WDT   Wh-determiner
POS_WP    Wh-pronoun
POS_WP$   Possessive wh-pronoun
POS_WRB   Wh-adverb
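Downstream rules often only care whether a token is a noun, verb, adjective, or adverb rather than about the exact flag. The Python sketch below shows one way to collapse the flags listed above into coarse categories by tag prefix. The grouping and the name `coarse_category` are illustrative assumptions, not something this stage provides.

```python
# Tag prefixes shared by related flags in the table above,
# for example NN, NNS, NNP and NNPS are all nouns.
COARSE_PREFIXES = (
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
)


def coarse_category(flag):
    """Map a POS_<tag> flag to a coarse category (hypothetical helper)."""
    # POS_TOKEN and non-POS flags carry no tag, so they have no category.
    if not flag.startswith("POS_") or flag == "POS_TOKEN":
        return None
    tag = flag[len("POS_"):]
    for prefix, category in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return category
    return "other"


for flag in ("POS_NNPS", "POS_VBZ", "POS_JJR", "POS_RBS", "POS_DT", "POS_TOKEN"):
    print(flag, coarse_category(flag))
```

Prefix matching keeps the mapping short, but it relies on the tag names in the table; a tag set with different naming would need an explicit lookup table instead.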