All confidence values range from 0.0 to 1.0
Current philosophy on confidence calculations
Original Text & Token Confidence = 0.5
Stages that increase ambiguity have ConfAdj < 1.0
- Lower Case, Lemmatize: ConfAdj = 0.9
Splitting does not change confidence
- Char Change Splitter, Advanced Splitter: ConfAdj = 1.0
Less Useful Items have lowered confidence
- Stop Words: ConfAdj = 0.8
Simple Recognizers increase confidence
- Email, date, and number recognizers: ConfAdj = 1.1 (default)
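
The notes above do not say how a ConfAdj is combined with the current confidence. The sketch below assumes each stage multiplies the running confidence by its ConfAdj and that the result is clamped to the 0.0 to 1.0 range. The stage keys, `CONF_ADJ`, and `apply_stages` are hypothetical names used only for illustration, not part of any real API.

```python
# Hypothetical stage names mapped to the ConfAdj values listed above.
CONF_ADJ = {
    "lower_case": 0.9,
    "lemmatize": 0.9,
    "char_change_splitter": 1.0,
    "advanced_splitter": 1.0,
    "stop_words": 0.8,
    "email_recognizer": 1.1,
    "date_recognizer": 1.1,
    "number_recognizer": 1.1,
}

# Original text and tokens start at 0.5.
INITIAL_CONFIDENCE = 0.5


def apply_stages(stages, confidence=INITIAL_CONFIDENCE):
    """Return the confidence after running the given stages in order.

    Assumption: adjustments are multiplicative, and the value is clamped
    to [0.0, 1.0] after every stage.
    """
    for stage in stages:
        confidence *= CONF_ADJ[stage]
        confidence = min(1.0, max(0.0, confidence))
    return confidence
```

Under these assumptions, a token that is lower-cased and then flagged as a stop word ends at roughly 0.5 × 0.9 × 0.8 = 0.36, while a token matched by a single recognizer rises to roughly 0.55. Splitter stages leave the value unchanged.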