
The Parts of Speech stage tags each word in a text (corpus) with a particular part of speech, such as noun, verb, or adjective, based on its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph.

Each token is tagged by adding flags to it; this stage does not create semantic tags.

Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Library: saga-parts-of-speech-stage

Saga_is_recognizer

Warning

Currently only English is supported.

Generic Configuration Parameters

Configuration Parameters

  • prob (type=double | default=0.7) - Probability threshold used to decide whether a part of speech is accepted for a token.
  • language (default=en) - Prefix of the model to use as the part of speech model. Currently only English is supported.
  • modelPath - Path to the folder where the models are stored.


Example configuration (stage: PartsOfSpeech, boundaryFlags: text block split):
"prob": 0.7,
"language": "en",
"modelPath": null,
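The three settings shown above can also be assembled in code and serialized to JSON. This is a minimal sketch: the function name pos_stage_settings is hypothetical, the range check on prob is an assumption (it treats the value as a probability between 0 and 1), and the rest of the surrounding stage definition is deliberately left out because it is not shown on this page.

```python
import json
from typing import Any, Dict, Optional


def pos_stage_settings(prob: float = 0.7,
                       language: str = "en",
                       model_path: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build the stage-specific settings as a dict."""
    # Assumption: prob is a probability, so it should lie in [0, 1].
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    return {"prob": prob, "language": language, "modelPath": model_path}


# Python's None is written out as JSON null, matching "modelPath": null.
print(json.dumps(pos_stage_settings(), indent=2))
```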

Example Output

V-----------------------------------[Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .]-----------------------------------V 
^-[Pierre]-V-[Vinken]-V-[,]-V-[61]-V-[years]-V-[old]-V-[,]-V-[will]-V-[join]-V-[the]-V-[board]-V-[as]-V-[a]-V-[nonexecutive]-V-[director]-V-[Nov.]-V-[29]-V-[.]-^ 

Item [as] - [TOKEN,ORIGINAL,POS_TOKEN,POS_IN]
Item [years] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NNS]
Item [old] - [TOKEN,ORIGINAL,POS_TOKEN,POS_JJ]
Item [director] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]
Item [Pierre] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN]
Item [the] - [TOKEN,ORIGINAL,POS_TOKEN,POS_DT]
Item [Nov.] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN,HAS_PUNCTUATION]
Item [61] - [ALL_DIGITS,TOKEN,ORIGINAL,POS_TOKEN,POS_CD]
Item [29] - [ALL_DIGITS,TOKEN,ORIGINAL,POS_TOKEN,POS_CD]
Item [will] - [TOKEN,ORIGINAL,POS_TOKEN,POS_MD]
Item [,] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_,]
Item [,] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_,]
Item [join] - [TOKEN,ORIGINAL,POS_TOKEN,POS_VB]
Item [board] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]
Item [.] - [TOKEN,ORIGINAL,ALL_PUNCTUATION,POS_TOKEN,POS_.]
Item [nonexecutive] - [TOKEN,ORIGINAL,POS_TOKEN,POS_JJ]
Item [a] - [TOKEN,ORIGINAL,POS_TOKEN,POS_DT]
Item [Vinken] - [TOKEN,ORIGINAL,POS_NNP,POS_TOKEN]
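When working with a text dump like the one above, it can help to pull the part of speech out of each "Item" line. The sketch below is a plain text parser written for this example, not a Saga API: the names parse_item and pos_tag are hypothetical, and it assumes every line follows the "Item [text] - [FLAG,FLAG,...]" layout shown here. The comma flag "POS_," needs special care because the flag list itself is comma-separated.

```python
import re
from typing import List, Optional, Tuple

ITEM_RE = re.compile(r"^Item \[(.*)\] - \[(.*)\]$")


def parse_item(line: str) -> Tuple[str, List[str]]:
    """Split one 'Item [text] - [flags]' line into its text and flag list."""
    match = ITEM_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an Item line: {line!r}")
    parts = match.group(2).split(",")
    flags: List[str] = []
    i = 0
    while i < len(parts):
        # "POS_," splits into "POS_" followed by an empty string; rejoin it.
        if parts[i] == "POS_" and i + 1 < len(parts) and parts[i + 1] == "":
            flags.append("POS_,")
            i += 2
        else:
            flags.append(parts[i])
            i += 1
    return match.group(1), flags


def pos_tag(flags: List[str]) -> Optional[str]:
    """Return the tag name after 'POS_', ignoring the generic POS_TOKEN flag."""
    for flag in flags:
        if flag.startswith("POS_") and flag != "POS_TOKEN":
            return flag[len("POS_"):]
    return None


text, flags = parse_item("Item [board] - [TOKEN,ORIGINAL,POS_TOKEN,POS_NN]")
print(text, pos_tag(flags))
```

The order of flags within a line varies in the output above (POS_NNP appears before POS_TOKEN for "Pierre"), which is why pos_tag scans the whole list instead of reading a fixed position.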

Output Flags

Lex-Item Flags:

  • TOKEN - All tokens produced are tagged as TOKEN.
  • POS_TOKEN - Marks a token for which a part of speech was recognized.
  • LANG_??? - Flags all TOKENs where a part of speech was recognized.

    Note

    The '???' at the end of the flag is replaced by an ISO three-letter language code.

    For example, if Spanish is detected, the three-letter code is SPA and the flag is "LANG_SPA".

Vertex Flags:

Info

No vertices are created in this stage.

Flag        Definition
POS_CC      Coordinating conjunction
POS_CD      Cardinal number
POS_DT      Determiner
POS_EX      Existential there
POS_FW      Foreign word
POS_IN      Preposition or subordinating conjunction
POS_JJ      Adjective
POS_JJR     Adjective, comparative
POS_JJS     Adjective, superlative
POS_LS      List item marker
POS_MD      Modal
POS_NN      Noun, singular or mass
POS_NNS     Noun, plural
POS_NNP     Proper noun, singular
POS_NNPS    Proper noun, plural
POS_PDT     Predeterminer
POS_POS     Possessive ending
POS_PRP     Personal pronoun
POS_PRP$    Possessive pronoun
POS_RB      Adverb
POS_RBR     Adverb, comparative
POS_RBS     Adverb, superlative
POS_RP      Particle
POS_SYM     Symbol
POS_TO      to
POS_UH      Interjection
POS_VB      Verb, base form
POS_VBD     Verb, past tense
POS_VBG     Verb, gerund or present participle
POS_VBN     Verb, past participle
POS_VBP     Verb, non-3rd person singular present
POS_VBZ     Verb, 3rd person singular present
POS_WDT     Wh-determiner
POS_WP      Wh-pronoun
POS_WP$     Possessive wh-pronoun
POS_WRB     Wh-adverb
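Downstream code often only needs a coarse category (noun, verb, adjective, adverb) instead of the fine-grained flag. The sketch below groups the flags listed above by prefix. The function name coarse_category and the grouping itself are assumptions made for this example, not something the stage provides.

```python
def coarse_category(flag: str) -> str:
    """Hypothetical helper: map a POS_* flag to a coarse word class."""
    # Prefixes follow the flag definitions above:
    # NN* are nouns, VB* verbs, JJ* adjectives, RB* adverbs.
    prefixes = {
        "POS_NN": "noun",
        "POS_VB": "verb",
        "POS_JJ": "adjective",
        "POS_RB": "adverb",
    }
    for prefix, category in prefixes.items():
        if flag.startswith(prefix):
            return category
    # Everything else (determiners, modals, POS_TOKEN, punctuation, ...)
    return "other"


for flag in ("POS_NNPS", "POS_VBZ", "POS_JJR", "POS_RBS", "POS_RP", "POS_DT"):
    print(flag, "->", coarse_category(flag))
```

Note that POS_RP (particle) does not match the POS_RB prefix, so it falls through to "other".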