Breaks TEXT_BLOCK tokens into smaller TEXT_BLOCK tokens, separating the unquoted text from the quoted text. This breaker respects the grammatical rules of quotes.

Operates On: Lexical Items with TEXT_BLOCK and possibly other flags as specified below.

Configuration Parameters

  • singleQuotes (boolean, required) - Indicates whether the stage must also break nested quotes (i.e. single-quoted text) out of the quoted text it finds.
    • Nested quotes are indicated by a single quote ('). A single quote outside of double quotes is not treated as a quote and is not processed as one.
  • skipFlags (string array, optional) - Flags to be skipped by this stage.
    • Tokens marked with these flags are ignored by this stage, and no processing is performed on them.
  • requiredFlags (string array, optional)
    • Tokens must have all of the specified flags in order to be processed.
  • debug (boolean, optional)
    • Enables all debug log functionality of the stage, if any.
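
The interaction between skipFlags and requiredFlags can be illustrated with a minimal Python sketch. This is not the stage's implementation; the function name `should_process` is hypothetical, and the sketch assumes that a single matching skip flag is enough to skip a token, which the parameter description does not state explicitly.

```python
def should_process(token_flags, skip_flags=(), required_flags=()):
    """Return True when a token passes both flag filters."""
    flags = set(token_flags)
    # Assumption: one matching skip flag is enough to skip the token.
    if flags & set(skip_flags):
        return False
    # Per the parameter description, every required flag must be present.
    return set(required_flags) <= flags
```

Under these assumptions, a token flagged TEXT_BLOCK and PROCESSED is ignored when skipFlags contains PROCESSED, and a token flagged only TEXT_BLOCK is ignored when requiredFlags lists both TEXT_BLOCK and QUOTED_TEXT.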


Example Configuration
{
 "type": "QuotationBreakerStage",
 "singleQuotes": true,
 "debug": true
}



Example Output

V----[Lamarr said, "The case is far from over, and we will win."]----V  
^--[Lamarr said,]--V--[The case is far from over, and we will win.]--^  

V----[He said, "I don't care."]----V  
^--[He said,]--V--[I don't care.]--^

With nested quotes
V----------[Dan said: "In a town outside Brisbane, I saw 'Tourists go home' written on a wall. But then someone told me, 'Pay it no mind, lad.' "]-----------V  
^--[Dan said:]--V---------[In a town outside Brisbane, I saw 'Tourists go home' written on a wall. But then someone told me, 'Pay it no mind, lad.']---------^  
                ^--[In a town outside Brisbane, I saw]--V--[Tourists go home]--V--[written on a wall. But then someone told me,]--V--[Pay it no mind, lad.]--^  
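
The segmentation shown in the examples above can be approximated with a short Python sketch. It is only an illustration under stated assumptions, not the stage's actual algorithm: the names `split_quoted` and `break_quotations` are hypothetical, and the apostrophe rule (a single quote counts as a quote mark only when it is not attached to a word character on its outer side) is a guess at what "grammatical rules" might mean here.

```python
import re

# Hypothetical heuristic: a single quote opens a nested quote when it is not
# preceded by a word character, and closes one when it is not followed by one.
# An apostrophe inside a word ("don't") matches neither condition.
SINGLE_QUOTED = re.compile(r"(?<!\w)'(.+?)'(?!\w)")
DOUBLE_QUOTED = re.compile(r'"(.+?)"')


def split_quoted(text, pattern):
    """Split text into (segment, is_quoted) pairs using the given pattern."""
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            parts.append((before, False))
        parts.append((match.group(1).strip(), True))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        parts.append((rest, False))
    return parts


def break_quotations(text, single_quotes=False):
    """Break on double quotes, then optionally on nested single quotes."""
    result = []
    for segment, quoted in split_quoted(text, DOUBLE_QUOTED):
        nested = []
        # Single quotes are only examined inside double-quoted text.
        if quoted and single_quotes:
            nested = split_quoted(segment, SINGLE_QUOTED)
        result.append({"text": segment, "quoted": quoted, "nested": nested})
    return result
```

Calling `break_quotations` on the "Dan said" sentence with `single_quotes=True` yields the same four nested segments as the third row of the diagram. The heuristic is deliberately simple and would misread a plural possessive such as `tourists'` as a closing quote.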


Output Flags

Lex-Item Flags:

  • PROCESSED - Placed on all the tokens that compose the semantic tag.
  • TEXT_BLOCK - Flags all text blocks produced by the SimpleReader
  • QUOTED_TEXT - Marks the TEXT_BLOCK between quotes as QUOTED_TEXT

Vertex Flags:

  • DOUBLE_QUOTE_BEGIN - Indicates the start of double-quoted text. The vertex also contains the double quote and any whitespace next to it.
  • DOUBLE_QUOTE_END - Indicates the end of double-quoted text. The vertex also contains the double quote and any whitespace next to it.
  • SINGLE_QUOTE_BEGIN - Indicates the start of single-quoted text. The vertex also contains the single quote and any whitespace next to it.
  • SINGLE_QUOTE_END - Indicates the end of single-quoted text. The vertex also contains the single quote and any whitespace next to it.
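
To make the relationship between the lex-item flags and the vertex flags concrete, here is a rough Python sketch that labels the double-quote vertices and the text between them. It is an assumption-laden illustration only: the function `label_double_quotes` and the tuple layout are hypothetical, the sketch assumes that quoted items keep TEXT_BLOCK alongside QUOTED_TEXT, and it does not model the PROCESSED flag or single quotes.

```python
import re

# Assumption: a quote vertex captures the double quote together with the
# whitespace on either side of it.
QUOTE_WITH_SPACE = re.compile(r'\s*"\s*')


def label_double_quotes(text):
    """Return (kind, text, flags) triples in reading order."""
    elements = []
    pos = 0
    inside = False
    for match in QUOTE_WITH_SPACE.finditer(text):
        if match.start() > pos:
            # Assumption: quoted items keep TEXT_BLOCK and gain QUOTED_TEXT.
            flags = ["TEXT_BLOCK", "QUOTED_TEXT"] if inside else ["TEXT_BLOCK"]
            elements.append(("item", text[pos:match.start()], flags))
        flag = "DOUBLE_QUOTE_END" if inside else "DOUBLE_QUOTE_BEGIN"
        elements.append(("vertex", match.group(0), [flag]))
        inside = not inside
        pos = match.end()
    if pos < len(text):
        # Trailing text, including text after an unclosed quote, stays plain.
        elements.append(("item", text[pos:], ["TEXT_BLOCK"]))
    return elements
```

For the sentence `He said, "I don't care."` this produces a plain item, a DOUBLE_QUOTE_BEGIN vertex holding the space and the opening quote, a QUOTED_TEXT item, and a DOUBLE_QUOTE_END vertex holding the closing quote.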

