Identifies accents in tokens and creates a new token without the accents, using the most similar letter as a replacement.


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Note

Currently, the only Unicode scripts supported are:

  • Latin
  • Greek

Tokens in any other Unicode script are left untouched.
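
The replacement behavior described above can be approximated with Unicode canonical decomposition. The following is a minimal sketch and not the stage's actual implementation: it assumes that "the most similar letter" is the base letter left after removing combining marks, and it approximates the script check by looking at the Unicode character name. The function names are hypothetical.

```python
import unicodedata

# Assumption: script membership is approximated by the Unicode character name,
# because the Python standard library does not expose the Script property.
SUPPORTED_SCRIPTS = ("LATIN", "GREEK")


def is_supported_script(ch):
    """Return True if the character name starts with a supported script name."""
    return unicodedata.name(ch, "").startswith(SUPPORTED_SCRIPTS)


def strip_accents(token):
    """Replace accented Latin/Greek letters with their base letter."""
    out = []
    # Compose first so that "A" + combining grave is handled like "À".
    for ch in unicodedata.normalize("NFC", token):
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) > 1 and is_supported_script(ch):
            # Keep the base letter, drop the combining marks.
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
        else:
            # No decomposition (e.g. Ø, Þ, ß) or unsupported script: keep as-is.
            out.append(ch)
    return "".join(out)
```

Under these assumptions, letters without a canonical decomposition (such as Ø, Ð, Þ, ß, Œ and Æ) pass through unchanged, which matches the Example Output graphs below, while characters from other scripts, such as Cyrillic, are never modified.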

Generic Configuration Parameters

Configuration Parameters

  • boundaryFlags: text block split
  • requiredFlags: token

Example Output

V-----------[ÀÁÂÃÄÅ ÈÉÊË ÌÍÎÑÏ ÒÓÔÕÖ ÙÚÛÜ ÝŸ]-----------V 
^-[ÀÁÂÃÄÅ]-V-[ÈÉÊË]-V-[ÌÍÎÑÏ]-V-[ÒÓÔÕÖ]-V-[ÙÚÛÜ]-V-[ÝŸ]-^ 
^-[àáâãäå]-^-[èéêë]-^-[ìíîñï]-^-[òóôõö]-^-[ùúûü]-^-[ýÿ]-^ 
^-[AAAAAA]-^-[EEEE]-^-[IIINI]-^-[OOOOO]-^-[UUUU]-^-[YY]-^ 
^-[aaaaaa]-^-[eeee]-^-[iiini]-^-[ooooo]-^-[uuuu]-^-[yy]-^ 


V---------[ÏÐÑÒ ÓÔÕÖ × ØÙÑÚ ÛÜÝÞß!]---------V 
^-[ÏÐÑÒ]-V-[ÓÔÕÖ]-V-[×]-V-[ØÙÑÚ]-V-[ÛÜÝÞß!]-^ 
^-[ïðñò]-^-[óôõö]-^     ^-[øùñú]-^-[ûüýþß!]-^ 
^-[IÐNO]-^-[OOOO]-^     ^-[ØUNU]-^-[UUYÞß!]-^ 
^-[iðno]-^-[oooo]-^     ^-[øunu]-^-[uuyþß!]-^ 


V-------------[Ç Š Ž Œ Æ Þ Ð]-------------V 
^-[Ç]-V-[Š]-V-[Ž]-V-[Œ]-V-[Æ]-V-[Þ]-V-[Ð]-^ 
^-[ç]-^-[š]-^-[ž]-^-[œ]-^-[æ]-^-[þ]-^-[ð]-^ 
^-[C]-^-[S]-^-[Z]-^ 
^-[c]-^-[s]-^-[z]-^ 

V----------[Star, Inc., Lighting the Way...]----------V 
^-[Star,]-V-[Inc.,]-V-[Lighting]-V-[the]-V--[Way...]--^ 
^-[star,]-^-[inc.,]-^-[lighting]-^       ^--[way...]--^ 
                                         ^-[Way...]-^ 
                                         ^-[way...]-^ 

Output Flags

Lex-Item Flags:

  • HAS_ACCENT - Identifies all lexical items that contain an accent and belong to a supported Unicode script.
  • ACCENT_STRIPPED - Set on every token created without the accents.
  • TOKEN - All tokens produced are tagged as TOKEN.
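
To make the flag assignment concrete, here is a small sketch of how the three flags could be applied. It is only an illustration: `LexFlag`, `LexItem`, `process` and `_strip` are hypothetical names, not part of the Saga API, and the stripping logic assumes Unicode canonical decomposition restricted to Latin and Greek character names.

```python
import unicodedata
from dataclasses import dataclass
from enum import Flag, auto


class LexFlag(Flag):
    # Hypothetical stand-ins for the lex-item flags listed above.
    TOKEN = auto()
    HAS_ACCENT = auto()
    ACCENT_STRIPPED = auto()


@dataclass
class LexItem:
    text: str
    flags: LexFlag


def _strip(text):
    """Assumed stripping rule: drop combining marks from Latin/Greek letters."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) > 1 and unicodedata.name(ch, "").startswith(("LATIN", "GREEK")):
            parts = "".join(c for c in parts if not unicodedata.combining(c))
            out.append(parts)
        else:
            out.append(ch)
    return "".join(out)


def process(item):
    """Flag an accented token and return its accent-free twin, or None."""
    if LexFlag.TOKEN not in item.flags:
        return None  # required flag missing: item is skipped
    stripped = _strip(item.text)
    if stripped == item.text:
        return None  # nothing to strip: no new token is created
    item.flags |= LexFlag.HAS_ACCENT
    return LexItem(stripped, LexFlag.TOKEN | LexFlag.ACCENT_STRIPPED)
```

For example, calling `process(LexItem("café", LexFlag.TOKEN))` in this sketch marks the original item with HAS_ACCENT and returns a new item with the text "cafe" flagged TOKEN and ACCENT_STRIPPED, while an unaccented token such as "the" yields no new item.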

Vertex Flags:

Info

No vertices are created in this stage.