All confidence values range from 0.0 to 1.0

Confidence = a factor expressing the degree of belief that the interpretation is correct

Note: Currently it’s a factor, not a probability

  • FUTURE WORK:  Make confidence a true probability score

Note: Multiple different interpretations can be correct at the same time

  • Example: a Token interpretation and a Semantic interpretation of the same text

Confidence Adjustment (ConfAdj) = a human-adjustable configuration parameter that boosts or reduces confidence values

  • Confidence Adjustment < 1.0: Reduce confidence
    • Multiplied by the underlying confidence value
    • For example: 0.7 = “new confidence is 70% of old confidence”
  • Confidence Adjustment = 1.0: Leave confidence alone
  • Confidence Adjustment > 1.0: Increase confidence
    • The amount above 1.0 is the fraction of the remaining distance to 1.0 that confidence moves
    • For example: 1.3 = “move 30% of the way towards 1.0”
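Written as one piecewise rule, with c the original confidence, a the ConfAdj and c' the new confidence (symbol names chosen here only for brevity), the cases above read as follows. The second case matches the worked example for ConfAdj = 1.3.

```latex
c' =
\begin{cases}
  c \cdot a           & \text{if } a \le 1.0 \\
  c + (a - 1)(1 - c)  & \text{if } a > 1.0
\end{cases}
```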
Confidence Adjustment: Examples

Original Confidence = 0.5

If ConfAdj = 0.7
  • New Confidence = 0.35
  • (0.5 * 0.7) = 0.35

If ConfAdj = 1.3

  • New Confidence = 0.65
  • 0.5 + (1.3 - 1) * (1 - 0.5) = 0.65

If ConfAdj = 0

  • New Confidence = 0

If ConfAdj = 1.0

  • Confidence is unchanged
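The adjustment rule and the worked examples can be reproduced with a small function. This is a minimal sketch; the name `adjust_confidence` is hypothetical and not taken from any real configuration or API.

```python
def adjust_confidence(confidence: float, conf_adj: float) -> float:
    """Apply a Confidence Adjustment (ConfAdj) to a confidence value in [0.0, 1.0]."""
    if conf_adj <= 1.0:
        # Reduce (or leave unchanged): new confidence is conf_adj times the old one.
        return confidence * conf_adj
    # Increase: move (conf_adj - 1.0) of the remaining distance towards 1.0.
    # Note: this sketch does not clamp, so a ConfAdj above 2.0 would exceed 1.0.
    return confidence + (conf_adj - 1.0) * (1.0 - confidence)


# The worked examples, all starting from an original confidence of 0.5.
for adj in (0.7, 1.3, 0.0, 1.0):
    print(f"ConfAdj = {adj}: new confidence = {adjust_confidence(0.5, adj):.2f}")
```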

Current philosophy on confidence calculations

Original Text & Token Confidence = 0.5

Stages that increase ambiguity have ConfAdj < 1.0

  • Lower Case, Lemmatize: ConfAdj = 0.9

Splitting does not change confidence

  • Char Change Splitter, Advanced Splitter: ConfAdj = 1.0

Less useful items have lower confidence

  • Stop Words: ConfAdj = 0.8

Simple Recognizers increase confidence

  • Email, date, and number recognizers: ConfAdj = 1.1 (default)
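The sketch below traces a token's confidence through several of the stages listed above. It assumes that each stage applies its ConfAdj to the confidence of the token it receives, starting from 0.5; the section does not spell out how stages chain, so treat that as an assumption. The stage keys in `STAGE_CONF_ADJ` and the `trace` helper are hypothetical labels, not configuration names from any real system.

```python
# Hypothetical stage keys mapped to the ConfAdj values listed above.
STAGE_CONF_ADJ = {
    "lower_case": 0.9,
    "lemmatize": 0.9,
    "char_change_splitter": 1.0,
    "advanced_splitter": 1.0,
    "stop_words": 0.8,
    "email_recognizer": 1.1,
    "date_recognizer": 1.1,
    "number_recognizer": 1.1,
}

ORIGINAL_CONFIDENCE = 0.5  # starting confidence for original text and tokens


def adjust_confidence(confidence, conf_adj):
    """Scale down when ConfAdj <= 1.0; otherwise move part of the way towards 1.0."""
    if conf_adj <= 1.0:
        return confidence * conf_adj
    return confidence + (conf_adj - 1.0) * (1.0 - confidence)


def trace(stages):
    """Apply each stage's ConfAdj in order (assumed chaining) and record each result."""
    confidence = ORIGINAL_CONFIDENCE
    history = []
    for stage in stages:
        confidence = adjust_confidence(confidence, STAGE_CONF_ADJ[stage])
        history.append((stage, confidence))
    return history


# Splitting keeps 0.5; lower-casing gives 0.5 * 0.9 = 0.45; lemmatizing gives 0.45 * 0.9 = 0.405.
for stage, conf in trace(["advanced_splitter", "lower_case", "lemmatize"]):
    print(f"{stage}: {conf:.3f}")
```

Under this assumption, every ambiguity-adding stage compounds, while a recognizer such as the email recognizer lifts a 0.5 token to 0.55.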

Larger patterns increase confidence

  • Regex patterns: global default ConfAdj = 1.1
  • Can be further adjusted per individual pattern

Advanced patterns combine confidence of their sub-components

  • Using the “Probability Combination” formula:
    • ProbComb(c1, c2, c3, …) = 1 - (1 - c1)(1 - c2)(1 - c3)…
  • An additional ConfAdj, usually 1.1, is applied on top of the combined value
  • Can be further adjusted per individual pattern
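A minimal sketch of how an advanced pattern's confidence could be computed: the sub-component confidences are combined with ProbComb, then the pattern's ConfAdj (1.1 by default here) is applied with the boost rule from the Confidence Adjustment section. The function names are hypothetical, and applying the adjustment after the combination is an assumption based on “additional adjustment on top”.

```python
from math import prod


def prob_comb(confidences):
    """Probability Combination: 1 - (1 - c1)(1 - c2)(1 - c3)..."""
    return 1.0 - prod(1.0 - c for c in confidences)


def adjust_confidence(confidence, conf_adj):
    """Scale down when ConfAdj <= 1.0; otherwise move part of the way towards 1.0."""
    if conf_adj <= 1.0:
        return confidence * conf_adj
    return confidence + (conf_adj - 1.0) * (1.0 - confidence)


def advanced_pattern_confidence(sub_confidences, conf_adj=1.1):
    """Combine sub-component confidences, then apply the pattern's ConfAdj on top."""
    return adjust_confidence(prob_comb(sub_confidences), conf_adj)


# Sub-components at 0.5 and 0.6 combine to 1 - (0.5)(0.4) = 0.8;
# a ConfAdj of 1.1 then moves 10% of the way towards 1.0: 0.8 + 0.1 * 0.2 = 0.82.
print(f"{advanced_pattern_confidence([0.5, 0.6]):.2f}")
```

Because each factor (1 - ci) is at most 1.0, ProbComb never falls below the strongest sub-component, which is why combining evidence raises the pattern's confidence.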

Best Overall Interpretation is based on Average Confidence

  • “Best Route” = The route through the interpretation graph with the best average confidence from start to finish
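A minimal sketch of “Best Route” selection, under assumptions this page does not spell out: the interpretation graph is modelled as a DAG whose nodes are token-boundary positions 0..n and whose edges are interpretations spanning from one position to a later one, and “average” means the plain mean of the edge confidences on the route (not weighted by span length). A best average cannot be found with an ordinary shortest-path search, so the sketch records, for each position and each route length, the best total confidence, then picks the length with the highest mean. `Interpretation` and `best_route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    start: int          # boundary position where the interpretation begins
    end: int            # boundary position where it ends (start < end <= n assumed)
    label: str
    confidence: float


def best_route(interpretations, n):
    """Return (average, route) for the 0 -> n route with the highest average confidence."""
    # best[pos][k] = (total confidence, route) of the best route reaching pos with k edges
    best = [dict() for _ in range(n + 1)]
    best[0][0] = (0.0, [])
    by_start = {}
    for interp in interpretations:
        by_start.setdefault(interp.start, []).append(interp)
    # Edges always move forward, so each position is final once it is reached here.
    for pos in range(n + 1):
        for k, (total, route) in list(best[pos].items()):
            for interp in by_start.get(pos, []):
                candidate = (total + interp.confidence, route + [interp])
                previous = best[interp.end].get(k + 1)
                if previous is None or candidate[0] > previous[0]:
                    best[interp.end][k + 1] = candidate
    finals = [(total / k, route) for k, (total, route) in best[n].items() if k > 0]
    if not finals:
        return None  # no route covers the whole text
    return max(finals, key=lambda f: f[0])


# "new york office": two token routes vs. a higher-confidence city interpretation.
interps = [
    Interpretation(0, 1, "token:new", 0.5),
    Interpretation(1, 2, "token:york", 0.5),
    Interpretation(0, 2, "city:new york", 0.8),
    Interpretation(2, 3, "token:office", 0.5),
]
average, route = best_route(interps, 3)
print(round(average, 2), [i.label for i in route])
```

Here the all-token route averages 0.5 while the city route averages (0.8 + 0.5) / 2 = 0.65, so the city interpretation wins even though it covers the text with fewer edges.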