Identifies tokens which look like numbers and flags them with the "NUMBER" flag.
Currently handles the following situations:
- Integers: 0, 1, 100
- Floats: 0.5, 12.2, 3.14159 (.)
- Negative: -1003, -12.2
- Thousands with/without separator: 1000000, 1,000,000 (,)
- Numbers with scientific notation: 1,1x10^-8, 1.1x10⁻⁸, 1,1x10^8, -1.1x10⁻⁸
- Roman numerals: MMC, XII, IV
- English ordinal numbers: 1st, 12th, 23rd
- Exponents: 4⁹, 4^9, 4^-9, 4⁻⁹
Does NOT currently recognize:
- Computer literals: 0xBEA1, 07832
- European numbers with commas and periods swapped
- Positive: +102.3
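The recognized and unrecognized forms listed above can be approximated with a handful of regular expressions. The following is only a minimal Python sketch, not the stage's actual implementation: the pattern names, the `classify` helper, the handling of Unicode superscript exponents, and the rejection of leading zeros (one reading of the "07832" entry) are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the stage's real rules are not shown on this page.
SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

# Integers, floats, negatives, optional thousands separators.
# Assumption: leading zeros (e.g. 07832) are rejected as computer literals.
PLAIN = r"-?(?:[1-9]\d{0,2}(?:,\d{3})+|0|[1-9]\d*)(?:\.\d+)?"

# Exponent written with a caret (^-9) or with superscript characters.
EXPONENT = rf"(?:\^-?\d+|⁻?[{SUPERSCRIPT_DIGITS}]+)"

# Order matters: the more specific forms are tried before the plain form.
PATTERNS = {
    "scientific": re.compile(rf"-?\d+(?:[.,]\d+)?x10{EXPONENT}"),
    "exponent": re.compile(rf"\d+{EXPONENT}"),
    "ordinal": re.compile(r"\d+(?:st|nd|rd|th)"),
    "roman": re.compile(
        r"(?=[MDCLXVI])M*(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
    ),
    "plain": re.compile(PLAIN),
}


def classify(token):
    """Return the name of the first pattern matching the whole token, else None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(token):
            return name
    return None


for token in ["1,000,000", "-12.2", "1,1x10^-8", "4^-9", "23rd", "MMC", "+102.3", "0xBEA1"]:
    print(token, "->", classify(token))
```

In this sketch a token would receive the NUMBER flag whenever `classify` returns a name, and tokens such as `+102.3` or `0xBEA1` fall through to `None`.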
Operates On: Lexical Items with TOKEN
Configuration Parameters
- skipFlags (string array, optional) - Flags to be skipped by this stage
- Tokens marked with these flags are ignored by this stage; no processing is performed on them.
- requiredFlags (string array, optional)
- Tokens must have all of the specified flags in order to be processed.
- debug (boolean, optional)
- Enable all debug log functionality of the stage, if any.
- scientificNotation (boolean, optional)
- Enable recognition of scientific notation.
- ordinals (boolean, optional)
- Enable recognition of ordinal numbers.
- romans (boolean, optional)
- Enable recognition of Roman numerals.
- ordinalsLang (string, optional)
- Choose the language for ordinals. Currently only English ("en") is supported.
{
"type": "NumberRecognizer",
"scientificNotation": false,
"ordinals": true,
"romans": false,
"ordinalsLang": "en"
}
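The skipFlags and requiredFlags parameters together decide whether a token is examined at all. A rough Python sketch of that gating follows. The `should_process` helper and the `URL` and `ALL_UPPER` flag names are hypothetical, and it is an assumption here that a single matching skip flag is enough to exclude a token.

```python
def should_process(token_flags, skip_flags=(), required_flags=()):
    """Decide whether a token would be examined by the stage (illustrative only)."""
    flags = set(token_flags)
    # Assumption: carrying any one of the skip flags excludes the token.
    if flags & set(skip_flags):
        return False
    # Per the parameter description, all required flags must be present.
    return set(required_flags) <= flags


# Hypothetical flag names, used only to exercise the helper.
print(should_process({"TOKEN"}, skip_flags=["URL"]))
print(should_process({"TOKEN", "URL"}, skip_flags=["URL"]))
print(should_process({"TOKEN"}, required_flags=["TOKEN", "ALL_UPPER"]))
```

With both parameters left empty, every token passes the check, which matches their being optional.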
Flags
Lex-Item Flags:
- NUMBER - Flagged on all tokens which are numbers according to the rules above.
Vertex Flags:
None.
Example
V----------[1984 42 -10 3.14 ]-----------V
^--[1984]--V--[42]--V--[-10]--V--[3.14]--^
Item [1984] - [NUMBER,TOKEN]
Item [42] - [NUMBER,TOKEN]
Item [-10] - [NUMBER,TOKEN]
Item [3.14] - [NUMBER,TOKEN]
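The example above can be imitated in a few lines of Python to show how the NUMBER flag sits alongside TOKEN on each lexical item. This is only a sketch under stated assumptions: `LexItem`, `flag_numbers`, and the whitespace tokenization are hypothetical stand-ins, and the pattern covers just the integer, float, and negative forms used in the example.

```python
import re
from dataclasses import dataclass, field

# Simplified pattern: integers, floats, and negatives only (an assumption).
SIMPLE_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


@dataclass
class LexItem:
    """Hypothetical stand-in for a lexical item: its text plus a set of flags."""
    text: str
    flags: set = field(default_factory=lambda: {"TOKEN"})


def flag_numbers(text):
    """Split on whitespace and add NUMBER to every TOKEN item that looks numeric."""
    items = [LexItem(part) for part in text.split()]
    for item in items:
        if "TOKEN" in item.flags and SIMPLE_NUMBER.fullmatch(item.text):
            item.flags.add("NUMBER")
    return items


for item in flag_numbers("1984 42 -10 3.14 "):
    print(f"Item [{item.text}] - [{','.join(sorted(item.flags))}]")
```

The flag is added to the existing set rather than replacing it, so each item keeps its TOKEN flag, as in the listing above.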