
The Part of Speech stage tags each word in a text (corpus) with its part of speech, such as noun, verb, or adjective, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. Each token is tagged with flags, so this stage creates no semantic tags.

Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Library: saga-parts-of-speech-stage

This stage is a Recognizer for Saga Solution and can also be used as part of a manual pipeline or a base pipeline.

Currently, only English is supported.

Generic Configuration Parameters

  • boundaryFlags ( type=string array | optional ) - List of vertex flags that indicate the beginning and end of a text block.
    Tokens to process must be between two vertices marked with this flag (e.g. ["TEXT_BLOCK_SPLIT"])
  • skipFlags ( type=string array | optional ) - Flags to be skipped by this stage.
    Tokens marked with this flag will be ignored by this stage, and no processing will be performed.
  • requiredFlags ( type=string array | optional ) - Lex-item flags required on every token to be processed.
    Tokens need to have all of the specified flags in order to be processed.
  • atLeastOneFlag ( type=string array | optional ) - Lex-item flags of which every token to be processed needs at least one.
    Tokens will need at least one of the flags specified in this array.
  • confidenceAdjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, applied to the confidence value (applies to every pattern match).
    • 0.0 to < 1.0 decreases the confidence value
    • 1.0 leaves the confidence value unchanged
    • > 1.0 to 2.0 increases the confidence value
  • debug ( type=boolean | default=false | optional ) - Enable all debug log functionality for the stage, if any.
  • enable ( type=boolean | default=true | optional ) - Indicates whether the current stage should be considered by the Pipeline Manager
    • Only applies to automatic pipeline building
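
The flag-based parameters above can be read as a simple eligibility check per token. The following is a minimal Python sketch of that reading, not the stage's actual implementation: the function names, the order of the checks, and the multiplicative use of confidenceAdjustment are assumptions made for illustration, and boundaryFlags is not modeled.

```python
from typing import Iterable, Set


def is_eligible(token_flags: Set[str],
                skip_flags: Iterable[str] = (),
                required_flags: Iterable[str] = (),
                at_least_one_flag: Iterable[str] = ()) -> bool:
    """Return True if a token carrying token_flags would be processed.

    Hypothetical helper: mirrors the documented meaning of skipFlags,
    requiredFlags and atLeastOneFlag; the check order is an assumption.
    """
    skip = set(skip_flags)
    required = set(required_flags)
    any_of = set(at_least_one_flag)

    if token_flags & skip:                 # any skip flag present: ignore token
        return False
    if not required <= token_flags:        # all required flags must be present
        return False
    if any_of and not (token_flags & any_of):  # need at least one of these
        return False
    return True


def adjust_confidence(confidence: float, adjustment: float = 1.0) -> float:
    """Scale a confidence value by the factor (assumed to be a multiplier)."""
    if not 0.0 <= adjustment <= 2.0:
        raise ValueError("confidenceAdjustment must be between 0.0 and 2.0")
    return confidence * adjustment
```

With this reading, a factor of 1.0 returns the confidence unchanged, values below 1.0 lower it, and values above 1.0 raise it, which matches the three cases listed for confidenceAdjustment.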

Configuration Parameters

  • prob ( type=double | default=0.7 | optional ) - Minimum probability for a predicted part of speech to be accepted
  • language ( type=string | default=en | optional ) - Prefix of the part-of-speech model to use. Currently only English is supported
  • modelPath ( type=string | optional ) - Path to the folder where the models are stored
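
As a rough illustration of how these parameters might be combined, the sketch below builds a settings object in Python and prints it as JSON. Only parameter names documented on this page are used; the values are illustrative, the model path is a hypothetical placeholder, and any additional keys a real stage definition requires (such as a stage type identifier) are omitted because they are not given here.

```python
import json

# Illustrative settings using only parameter names documented on this page.
pos_stage_config = {
    "prob": 0.7,                            # documented default
    "language": "en",                       # documented default
    "modelPath": "/path/to/models",         # hypothetical placeholder path
    "boundaryFlags": ["TEXT_BLOCK_SPLIT"],  # example value from this page
    "requiredFlags": ["TOKEN"],             # illustrative choice
    "confidenceAdjustment": 1.0,            # documented default
    "debug": False,                         # documented default
    "enable": True,                         # documented default
}

print(json.dumps(pos_stage_config, indent=2))
```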



Example Output

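
As a hedged illustration of the kind of result this stage produces, the Python sketch below attaches flags to tokens based on a tagger prediction and the prob threshold. The predictions (tags and probabilities) are invented for illustration and are not real output from the stage. The helper name, the inclusive comparison against prob, and the assumption that POS_TOKEN is added together with the specific POS_ flag are also assumptions.

```python
from typing import Set


def flags_for_token(base_flags: Set[str], tag: str, probability: float,
                    prob: float = 0.7) -> Set[str]:
    """Return the flags a token might carry after tagging.

    Hypothetical helper: the tag is accepted only when its probability
    reaches the threshold (inclusive comparison is an assumption).
    """
    flags = set(base_flags)
    if probability >= prob:
        flags.update({"POS_TOKEN", "POS_" + tag})
    return flags


# Invented predictions: (token text, predicted tag, probability).
predictions = [("The", "DT", 0.98), ("dog", "NN", 0.95), ("barks", "VBZ", 0.55)]

for text, tag, probability in predictions:
    print(text, sorted(flags_for_token({"TOKEN"}, tag, probability)))
```

In this sketch the last token keeps only its TOKEN flag, because its invented probability falls below the default threshold of 0.7.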

Output Flags

Lex-Item Flags:

  • TOKEN - All tokens produced are tagged as TOKEN 
  • POS_TOKEN -  Identifies the token as recognized as a part of speech
  • LANG_??? - Flags all TOKENs where a part of speech was recognized. 

    Note the '???' at the end of the flag. It is replaced by a three-letter ISO language code.

    For example, if Spanish is detected, the three-letter code is SPA, and the flag will be "LANG_SPA".
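
The naming rule for the language flag can be sketched as follows. This is a minimal Python illustration, assuming the flag is simply "LANG_" followed by the upper-cased three-letter code; the helper name and the small two-letter to three-letter lookup table are hypothetical and cover only two languages.

```python
# Small illustrative subset of two-letter to three-letter ISO language codes.
ISO_THREE_LETTER = {"en": "eng", "es": "spa"}


def language_flag(code: str) -> str:
    """Build a LANG_??? flag name from a two- or three-letter language code."""
    normalized = code.lower()
    three_letter = ISO_THREE_LETTER.get(normalized, normalized)
    if len(three_letter) != 3:
        raise ValueError(f"no three-letter code known for {code!r}")
    return "LANG_" + three_letter.upper()
```

For example, `language_flag("spa")` and `language_flag("es")` both yield the "LANG_SPA" flag named above.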

Vertex Flags:

No vertices are created in this stage

Flag       Definition
POS_CC     Coordinating conjunction
POS_CD     Cardinal number
POS_DT     Determiner
POS_EX     Existential there
POS_FW     Foreign word
POS_IN     Preposition or subordinating conjunction
POS_JJ     Adjective
POS_JJR    Adjective, comparative
POS_JJS    Adjective, superlative
POS_LS     List item marker
POS_MD     Modal
POS_NN     Noun, singular or mass
POS_NNS    Noun, plural
POS_NNP    Proper noun, singular
POS_NNPS   Proper noun, plural
POS_PDT    Predeterminer
POS_POS    Possessive ending
POS_PRP    Personal pronoun
POS_PRP$   Possessive pronoun
POS_RB     Adverb
POS_RBR    Adverb, comparative
POS_RBS    Adverb, superlative
POS_RP     Particle
POS_SYM    Symbol
POS_TO     to
POS_UH     Interjection
POS_VB     Verb, base form
POS_VBD    Verb, past tense
POS_VBG    Verb, gerund or present participle
POS_VBN    Verb, past participle
POS_VBP    Verb, non-3rd person singular present
POS_VBZ    Verb, 3rd person singular present
POS_WDT    Wh-determiner
POS_WP     Wh-pronoun
POS_WP$    Possessive wh-pronoun
POS_WRB    Wh-adverb
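
The flag names above appear to follow the Penn Treebank tag set with a "POS_" prefix. Under that assumption, the Python sketch below converts a raw tag into its flag name and looks up its definition. The helper name is hypothetical, and the dictionary holds only a few entries copied from the table.

```python
from typing import Tuple

# A few entries copied from the table above (not the full list).
POS_DEFINITIONS = {
    "POS_DT": "Determiner",
    "POS_NN": "Noun, singular or mass",
    "POS_POS": "Possessive ending",
    "POS_PRP$": "Possessive pronoun",
    "POS_VBZ": "Verb, 3rd person singular present",
}


def describe_tag(tag: str) -> Tuple[str, str]:
    """Map a raw tag such as "NN" to its flag name and definition.

    The prefix is always added, so the tag "POS" (possessive ending)
    becomes the flag "POS_POS".
    """
    flag = "POS_" + tag
    return flag, POS_DEFINITIONS.get(flag, "(definition not in this subset)")
```

Calling `describe_tag("NN")` returns the pair `("POS_NN", "Noun, singular or mass")`.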

