Identifies tokens that look like dates or time indicators and flags them with the "DATE_TIME" flag.

Operates On: Lexical Items with TOKEN and possibly other flags as specified below, but not on ALL_PUNCTUATION.

This stage is a Recognizer for Saga Solution, and can also be used as part of a manual pipeline or a base pipeline.


Currently handles the following situations:

  • Month and Day only: Jan 25, January 25
  • Year and Month only: 2019 Jan, 2019 January
  • Month and Year only: Jan 2019, January 2019, January of 2019, Jan of '19
  • Day Month Year: 26 April 2019, 26th of April, 2019
  • Month Day Year: Jan. 22, 2001, January the 22nd of 2001
  • Day Name, Day Month Year: Friday, 26 Apr 19, Friday the 26th of April 2019
  • YYYYMMDD Format: 20190125
  • MMDDYYYY Format: 01252019
  • Using separators too: 2019-01-25, 2019/1/25, 01/25/2019, 2019/01/2
  • Dates with Time: 2019-01-25T10:25
  • Dates with Time without separators: 20190125T102500
  • Using the 'Z' character at the end: 2019-01-25T10:25:10Z
  • Using AM or PM for the time: 2019-01-25T10:25:10AM, 2019-01-25T10:25:10pm
  • Using 24hr format: 2019-01-25T14:25:10Z
  • The time only: 01:59:59PM, 01:59:59am
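A few of the formats above can be sketched with regular expressions. The following is a minimal illustration only: the pattern names, the MONTH expression, and the classify function are hypothetical and are not the stage's actual rules, and only a subset of the listed formats is covered.

```python
import re

# Hypothetical month expression: full names and common abbreviations,
# with an optional trailing period (e.g. "Jan.").
MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)\.?"
)

# Hypothetical patterns for some of the listed formats (illustrative only).
PATTERNS = {
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "iso_datetime": re.compile(
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2})?(?:Z|[AaPp][Mm])?"
    ),
    "compact_date": re.compile(r"\d{8}"),
    "compact_datetime": re.compile(r"\d{8}T\d{6}"),
    "time_only": re.compile(r"\d{2}:\d{2}:\d{2}[AaPp][Mm]"),
    "month_day": re.compile(MONTH + r" \d{1,2}", re.IGNORECASE),
    "month_year": re.compile(MONTH + r" (?:of )?(?:\d{4}|'\d{2})", re.IGNORECASE),
}


def classify(text):
    """Return the sorted names of all patterns that match the whole string."""
    return sorted(name for name, rx in PATTERNS.items() if rx.fullmatch(text))


if __name__ == "__main__":
    for sample in ["Jan 25", "January of 2019", "20190125T102500", "01:59:59PM"]:
        print(sample, classify(sample))
```

Matching the whole string with fullmatch keeps "Jan 2019" from being read as a month and a two-digit day followed by stray digits.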


Generic Configuration Parameters

  • boundaryFlags ( type=string array | optional ) - List of vertex flags that indicate the beginning and end of a text block.
    Tokens to process must lie between two vertices marked with one of these flags (e.g., ["TEXT_BLOCK_SPLIT"]).
  • skipFlags ( type=string array | optional ) - Flags to be skipped by this stage.
    Tokens marked with this flag will be ignored by this stage, and no processing will be performed.
  • requiredFlags ( type=string array | optional ) - Lex-item flags that every token must have in order to be processed.
    Tokens need to have all of the specified flags in order to be processed.
  • atLeastOneFlag ( type=string array | optional ) - Lex-item flags of which every token to be processed must have at least one.
    Tokens will need at least one of the flags specified in this array.
  • confidenceAdjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, applied to the confidence value of every pattern match.
    • 0.0 to < 1.0 decreases the confidence value
    • 1.0 leaves the confidence value unchanged
    • > 1.0 to 2.0 increases the confidence value
  • debug ( type=boolean | default=false | optional ) - Enable all debug log functionality for the stage, if any.
  • enable ( type=boolean | default=true | optional ) - Indicates whether the current stage should be considered by the Pipeline Manager.
    • Applies only to automatic pipeline building
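The interplay of skipFlags, requiredFlags, atLeastOneFlag and confidenceAdjustment can be sketched as follows. This is an illustration under stated assumptions, not the stage's implementation: the function names are hypothetical, an empty flag list is assumed to impose no constraint, and the adjustment is assumed to be a simple multiplication (the description above only says the factor is "applied" to the confidence value).

```python
def is_eligible(token_flags, skip_flags=(), required_flags=(), at_least_one_flag=()):
    """Decide whether a token should be processed, given its set of flags."""
    flags = set(token_flags)
    if flags & set(skip_flags):
        return False  # a token carrying any skip flag is ignored
    if not set(required_flags) <= flags:
        return False  # every required flag must be present
    if at_least_one_flag and not (flags & set(at_least_one_flag)):
        return False  # at least one of these flags must be present
    return True


def adjust_confidence(confidence, adjustment=1.0):
    """Scale a confidence value by a factor restricted to 0.0..2.0.

    Assumption: the factor is multiplied into the confidence value.
    """
    if not 0.0 <= adjustment <= 2.0:
        raise ValueError("confidenceAdjustment must be between 0.0 and 2.0")
    return confidence * adjustment


if __name__ == "__main__":
    print(is_eligible({"TOKEN"}, required_flags=["TOKEN"]))
    print(is_eligible({"TOKEN", "ALL_PUNCTUATION"}, skip_flags=["ALL_PUNCTUATION"]))
    print(adjust_confidence(0.5, 2.0))
```

Whether the adjusted value is capped afterwards is not stated in this description, so the sketch leaves it uncapped.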

Configuration Parameters

  • dayNames ( type=string | default=monday tuesday wednesday thursday friday saturday sunday mon tue tu tues thu th thur thurs fri | optional ) - Day names separated by space
    • Defaults:

      • monday
      • tuesday
      • wednesday
      • thursday
      • friday
      • saturday
      • sunday
      • mon
      • tue
      • tu
      • tues
      • thu
      • th
      • thur
      • thurs
      • fri
  • monthNames ( type=string | optional ) - Month names separated by space
    • Defaults:

      • january
      • february
      • march
      • april
      • may
      • june
      • july
      • august
      • september
      • october
      • november
      • december
      • jan
      • feb
      • mar
      • apr
      • jun
      • jul
      • aug
      • sep
      • sept
      • oct
      • nov
      • dec
      • jan.
      • feb.
      • mar.
      • apr.
      • jun.
      • jul.
      • aug.
      • sep.
      • sept.
      • oct.
      • nov.
      • dec.
  • ordinalWords ( type=string | optional ) - Ordinal numbers written as words, separated by space
    • Defaults:

      • first
      • second
      • third
      • fourth
      • fifth
      • sixth
      • seventh
      • eighth
      • ninth
      • tenth
      • eleventh
      • twelfth
      • thirteenth
      • fourteenth
      • fifteenth
      • sixteenth
      • seventeenth
      • eighteenth
      • nineteenth
      • twentieth
      • twenty-first
      • twenty-second
      • twenty-third
      • twenty-fourth
      • twenty-fifth
      • twenty-sixth
      • twenty-seventh
      • twenty-eighth
      • twenty-ninth
      • thirtieth
      • thirty-first
  • recognizeDate ( type=boolean | default=true | optional ) - Recognize dates. (Completely independent from Time; can be used alone or with the Recognize Time flag checked.)
  • recognizeTime ( type=boolean | default=true | optional ) - Recognize time. (Completely independent from Date; can be used alone or with the Recognize Date flag checked.)
  • timeIndicators ( type=string | optional ) - Time indicators separated by space
    • Defaults:

      • am
      • pm
      • a.m
      • p.m
      • a.m.
      • p.m.

"dayNames": "monday tuesday wednesday thursday friday saturday sunday mon tue tu tues thu th thur thurs fri",
"monthNames": "january february march april may june july august september october november december jan feb mar apr jun jul aug sep sept oct nov dec jan. feb. mar. apr. jun. jul. aug. sep. sept. oct. nov. dec.",
"ordinalWords": "first second third fourth fifth sixth seventh eighth ninth tenth eleventh twelfth thirteenth fourteenth fifteenth sixteenth seventeenth eighteenth nineteenth twentieth twenty-first twenty-second twenty-third twenty-fourth twenty-fifth twenty-sixth twenty-seventh twenty-eighth twenty-ninth thirtieth thirty-first",
"recognizeDate": true,
"recognizeTime": true,
"timeIndicators": "am pm a.m p.m a.m. p.m."
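Because dayNames, monthNames, ordinalWords and timeIndicators are space-separated strings, each can be turned into a lookup set by splitting on whitespace. The sketch below shows one way to do that; the helper names are hypothetical, the monthNames value is shortened for brevity, and case-insensitive comparison is an assumption rather than documented behaviour.

```python
# Values taken from the configuration fragment above (monthNames shortened here).
config = {
    "dayNames": "monday tuesday wednesday thursday friday saturday sunday "
                "mon tue tu tues thu th thur thurs fri",
    "monthNames": "january february march jan feb mar jan. feb. mar.",
    "timeIndicators": "am pm a.m p.m a.m. p.m.",
}


def to_lookup(value):
    """Split a space-separated parameter into a set of lowercase entries."""
    return frozenset(value.lower().split())


LOOKUPS = {name: to_lookup(value) for name, value in config.items()}


def parameters_containing(word):
    """Return the names of the parameters whose list contains the word."""
    key = word.lower()  # assumption: matching ignores case
    return sorted(name for name, entries in LOOKUPS.items() if key in entries)


if __name__ == "__main__":
    for word in ["Friday", "Jan.", "PM", "wed"]:
        print(word, parameters_containing(word))
```

With the default dayNames value, abbreviations such as wed, sat and sun are not in the list, so a lookup built this way would not find them unless dayNames is overridden.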

Example Output

V-----[2016-06-06T15:30:30Z] started.-----V 
^-[2016-06-06T15:30:30Z]-V---[started.]---^ 
^------[{datetime}]------^ 

V--------------------------------------[On January 1, 2017 drilling started, on February 1st of 2017 equipment broke down.]---------------------------------------V 
^-[On]-V--[January]--V---[1,]----V-[2017]-V-[drilling]-V---[started,]----V-[on]-V-[February]--V---[1st]----V-[of]-V-[2017]-V-[equipment]-V-[broke]-V---[down.]----^ 
       ^-[{_month_}]-^-[1]-V-[,]-^                     ^-[started]-V-[,]-^      ^-[{_month_}]-^-[1]-V-[st]-^                                       ^-[down]-V-[.]-^ 
       ^----------[{_datetime_}]----------^                                     ^--------------[{_datetime_}]--------------^

Output Flags

Lex-Item Flags

  • DATE_TIME - Identifies the token as a date or time indicator.
  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
  • TIME - Identifies the token as a time indicator only.
  • HAS_PUNCTUATION - Tokens produced with at least one punctuation character are tagged as HAS_PUNCTUATION. (ALL_PUNCTUATION will not be tagged as HAS_PUNCTUATION).
  • ALL_PUNCTUATION - Tokens processed or produced composed only of punctuation characters are tagged as ALL_PUNCTUATION.


Vertex Flags

No vertices are created in this stage.
