Identifies geographic locations based on the loaded patterns.

Configuration

  • Remove Accents & Diacritics ( type=boolean | default=false | optional ) - Removes accents and other diacritical marks, such as "Ä, ë, Ï", that might affect the recognizer and replaces the characters with their normalized forms "A, e, I".
  • Remove Characters ( type=boolean | default=false | optional ) - Removes user-defined characters and replaces them with whitespace.
  • To Lowercase ( type=boolean | default=true | optional ) - Transforms every line to lowercase.
  • Characters To Remove ( type=string | default=false | optional ) - User-defined characters to remove. Only active if "Remove Characters" is set to true.
  • Language Tokenizer ( type=string | default=Latin-script Alphabet | optional ) - Internal tokenizer of the recognizer.
    • The available tokenizers are:
      • Latin-script alphabet
      • Korean
      • Japanese
      • Chinese

  • Minimum Length Characters ( type=integer | default=3 | required ) - Sets the minimum length, in characters, for GeoNames.
  • stream, lake, ... ( type=boolean | default=false | optional ) - Filter all rivers, lakes, streams, etc. from GeoNames.
  • city, village, ... ( type=boolean | default=true | optional ) - Filter all cities, villages, towns, etc. from GeoNames.
  • spot, building, farm ( type=boolean | default=false | optional ) - Filter all spots, buildings and farms from GeoNames.
  • undersea ( type=boolean | default=false | optional ) - Filter all undersea elements from GeoNames.
  • uncategorized ( type=boolean | default=false | optional ) - Filter all uncategorized elements from GeoNames.
  • country, state, region, ... ( type=boolean | default=false | optional ) - Filter all countries, states, regions, etc. from GeoNames.
  • parks, area, ... ( type=boolean | default=false | optional ) - Filter all parks, areas, etc. from GeoNames.
  • road, railroad ( type=boolean | default=false | optional ) - Filter all roads and railroads from GeoNames.
  • mountain, hill, rock, ... ( type=boolean | default=false | optional ) - Filter all mountains, hills, rocks, etc. from GeoNames.
  • forest, heath, ... ( type=boolean | default=false | optional ) - Filter all forests, heaths, etc. from GeoNames.
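The text normalization options above ("Remove Accents & Diacritics", "Remove Characters", "Characters To Remove", "To Lowercase") can be illustrated with a minimal Python sketch. This is not the recognizer's actual code: the function name, the parameter names, and the order in which the steps run are assumptions made for illustration only.

```python
import unicodedata


def normalize_line(line, remove_accents=False, remove_characters=False,
                   characters_to_remove="", to_lowercase=True):
    """Illustrative pre-processing only; hypothetical, not the recognizer's code."""
    if remove_accents:
        # Decompose each character (NFD), then drop the combining marks,
        # so that "Ä" becomes "A" and "ë" becomes "e".
        decomposed = unicodedata.normalize("NFD", line)
        line = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if remove_characters and characters_to_remove:
        # Replace every user-defined character with a single space.
        line = line.translate({ord(ch): " " for ch in characters_to_remove})
    if to_lowercase:
        line = line.lower()
    return line


# Defaults mirror the documented ones: only lowercasing is active.
print(normalize_line("San José"))
# Accent removal without lowercasing reproduces the example in the option text.
print(normalize_line("Ä, ë, Ï", remove_accents=True, to_lowercase=False))
```

With the documented defaults, only the lowercase step changes the input; the other two steps take effect only when their options are enabled.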


Adding a new GeoName

Click the button to open the "Add new" dialog.


  • Patterns ( type=string | required ) - Patterns to find in GeoNames
  • ID ( type=string | default=A0000 | required ) - ID for the pattern
  • Display ( type=string | optional ) - Display on UI for the pattern
  • Confidence Adjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, applied to the confidence value (applies to every pattern match).
    • 0.0 to < 1.0 decreases the confidence value
    • 1.0 leaves the confidence value unchanged
    • > 1.0 to 2.0 increases the confidence value
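The effect of the adjustment factor can be sketched as follows. The documentation only states the direction of the change, so this sketch assumes the factor is multiplied with the confidence value and that the result is capped at 1.0; both are assumptions, and the function name is hypothetical.

```python
def adjust_confidence(confidence, adjustment=1.0):
    """Hypothetical helper: scale a match confidence by the adjustment factor."""
    if not 0.0 <= adjustment <= 2.0:
        raise ValueError("adjustment must be between 0.0 and 2.0")
    # Assumption: the factor is applied multiplicatively and the result
    # is capped at 1.0 so it stays a valid confidence value.
    return min(1.0, confidence * adjustment)


print(adjust_confidence(0.5, 0.5))  # factor below 1.0 lowers the value
print(adjust_confidence(0.5, 1.0))  # factor of 1.0 leaves it unchanged
print(adjust_confidence(0.5, 1.5))  # factor above 1.0 raises the value
```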

General Settings

The general settings can be accessed by clicking on

More settings may be displayed in the same dialog; these vary per recognizer.


  • Enable - Enables the processor to be used in pipelines.
  • Base Pipeline - Indicates the last pipeline stage needed by the recognizer.
  • Skip Flags ( optional ) - Lexical item flags to be ignored by this processor.
  • Boundary Flags ( optional ) - List of vertex flags that indicate the beginning and end of a text block.
  • Required Flags ( optional ) - Lexical item flags that every token must have in order to be processed.
  • At Least One Flag ( optional ) - Lexical item flags of which every token must have at least one in order to be processed.
  • Don't Process Flags ( optional ) - List of lexical item flags that are not processed. Unlike "Skip Flags", which only skips the token and continues along the same path, this drops the path in the Saga graph.
  • Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, applied to the confidence value (applies to every match).
    • 0.0 to < 1.0 decreases the confidence value
    • 1.0 leaves the confidence value unchanged
    • > 1.0 to 2.0 increases the confidence value
  • Debug - Enable debug logging.
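The interaction between the flag settings above can be sketched as a small decision function. This is an illustrative model only: the names, the flag values, the order of the checks, and the handling of tokens that miss a required flag (treated here as skipped) are all assumptions, not the Saga implementation.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"      # the recognizer examines the token
    SKIP = "skip"            # token is passed over, the path continues
    DROP_PATH = "drop_path"  # the whole path in the graph is abandoned


def decide(token_flags, skip_flags=(), required_flags=(),
           at_least_one_flags=(), dont_process_flags=()):
    """Hypothetical model of how the flag settings could combine."""
    flags = set(token_flags)
    if flags & set(dont_process_flags):
        return Action.DROP_PATH
    if flags & set(skip_flags):
        return Action.SKIP
    # Assumption: a token missing a required flag is simply not processed.
    if not set(required_flags) <= flags:
        return Action.SKIP
    if at_least_one_flags and not (flags & set(at_least_one_flags)):
        return Action.SKIP
    return Action.PROCESS


# The flag names below are made up for the example.
print(decide({"TOKEN", "ALL_UPPER"}, required_flags={"TOKEN"}))
print(decide({"TOKEN", "STOP_WORD"}, skip_flags={"STOP_WORD"}))
print(decide({"TOKEN", "URL"}, dont_process_flags={"URL"}))
```

The sketch keeps the skipped and dropped outcomes separate because the settings describe them differently: a skipped token leaves the rest of its path available, while a dropped path is abandoned entirely.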
