Info

Feature compatible only with Saga 1.3+ responses

PyQPL allows you to work with responses from Saga, enabling you to convert LexItems into synonyms, QPL keywords, and more complex query structures. By leveraging the capabilities of Saga, including the ability to maintain, configure, and tune dictionaries for entity extraction, PyQPL with Saga provides a powerful tool for creating queries with mutable syntax and enhanced capabilities.



Saga QPL Options

SagaQPLOptions lets you specify which Saga tags are transformed into QPL (Query Processing Language) structures. These options enable synonym expansion, QPL keyword replacement, and complex QPL structures through saga_special_case.
Columns: Field, Type, Required, Default, Description, Examples

implicit_operator

Literal['or', 'and']

(error)

or

Default operator to use when the relationship between two operands is ambiguous

Info

Applicable only to the parser



"or", "and"

fields

List[str] OR Dict[str, float]

OR List[QPLField] OR str

(tick)



Fields to use when matching terms, phrases, spans, ...


In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5

As string:

"field"

As a List:

["field1", "field2^2", "field3^3"]

As a Dictionary:

{"field1": 1, "field2" : 2, "field3": 3.0}

As a List of QPLFields:

[{"name":"field1", "boost": 1}, {"name":"field2", "boost": 2}, {"name":"field3", "boost": 3}]

date_fields

List[str] OR Dict[str, float]

OR List[QPLField] OR str

(error)

[ ]

Fields to use for date ranges, if no range queries are 


In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5

As string:

"field"

As a List:

["field1", "field2^2", "field3^3"]

As a Dictionary:

{"field1": 1, "field2" : 2, "field3": 3.0}

As a List of QPLFields:

[{"name":"field1", "boost": 1}, {"name":"field2", "boost": 2}, {"name":"field3", "boost": 3}]

range_fields

List[str] OR Dict[str, float]

OR List[QPLField] OR str

(error)

[ ]

Fields to use for ranges


In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5

As string:

"field"

As a List:

["field1", "field2^2", "field3^3"]

As a Dictionary:

{"field1": 1, "field2" : 2, "field3": 3.0}

As a List of QPLFields:

[{"name":"field1", "boost": 1}, {"name":"field2", "boost": 2}, {"name":"field3", "boost": 3}]

date_format

str

(error)


Date format used to convert date values in the query. 

Note

The format to use must be compatible with the engine specific query




timezone

str

(error)


Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC.

Note

The timezone to use must be compatible with the engine specific query




slop_near

number

(error)

10

Slop value used for the NEAR operator


slop_before

number

(error)

2

Slop value used for the BEFORE operator


slop_adj

number

(error)

0

Slop value used for the ADJ operator


slop_span_not

number

(error)

0

Slop value used for the SPAN NOT operator


wildcard

bool

(error)

False

Use wildcard operators

Info

Applicable only to the parser




grammar

File Path Or str

(error)


File path to the grammar, or the raw grammar itself as a string, for cases where you need to parse text to QPL with custom operators. The QPL parser uses an LALR parser implemented with the Lark library; for more information see https://lark-parser.readthedocs.io/en/latest/grammar.html#


Warning

We recommend consulting the development team before starting a new grammar

Info

Applicable only to the parser



For more detail on how to build a grammar please check Grammar Composition from the Lark documentation

custom_operators

Dict[str, Operand]

(error)

{}

Dictionary with the type name of the custom operator as keys and the class implementing its logic as values. All classes must inherit from Operator

Info

Applicable only to the parser



Info

A more detailed description of custom operators can be found on the Custom Operator page



synonyms_call

Func[]

(error)


Function returning requested synonyms for the specified string

Info

Applicable only to the parser



Info

A more detailed description of synonym expansion can be found on the Synonym Expansion page



saga_keywords

List[str]

(error)

[]

Saga tags to be normalized as QPL keywords. The display of the tag is used to replace the tagged text.
Example: "This {tag} here", where {tag}'s display is OR, will be normalized to "This OR here".

["tag1", "tag2"]

saga_synonyms

List[str]

(error)

[]

Saga tags marked as synonyms. For each entity in the matched tag, the list of patterns within the entity is used as the synonyms for the matched text.

["tag1", "tag2"]

saga_synonyms_boost

float

(error)

0.8

Boost to be used on each synonym added to the query.

0.8

saga_special_case

Dict[str, Callable[[LexItem], str]]

(error)

{}

Dictionary with tags as keys, where each key is assigned a function that receives a LexItem and transforms it into a suitable query statement.

def use_case(token: LexItem) -> str:
    ...

--------
{

'tagName': use_case

}
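
The fields, date_fields, and range_fields options accept four shapes (a string, a list of strings, a dictionary, or a list of QPLField-like mappings). As a rough illustration of how those shapes relate to each other, the sketch below normalizes each one into (name, boost) pairs. It is not PyQPL's implementation: normalize_fields and parse_field are hypothetical helpers, and the default boost of 1.0 for fields without a ^ suffix is an assumption.

```python
from typing import List, Tuple


def parse_field(text: str) -> Tuple[str, float]:
    """Split 'name^boost' into (name, boost); assume boost 1.0 when no ^ is given."""
    name, sep, boost = text.partition("^")
    return name, (float(boost) if sep else 1.0)


def normalize_fields(spec) -> List[Tuple[str, float]]:
    """Turn any of the four documented shapes into a list of (name, boost) pairs."""
    if isinstance(spec, str):
        return [parse_field(spec)]
    if isinstance(spec, dict):
        return [(name, float(boost)) for name, boost in spec.items()]
    pairs = []
    for item in spec:
        if isinstance(item, str):
            pairs.append(parse_field(item))
        else:
            # QPLField-like mapping: {"name": ..., "boost": ...}
            pairs.append((item["name"], float(item.get("boost", 1.0))))
    return pairs


as_string = normalize_fields("field")
as_list = normalize_fields(["field1", "field2^2", "field3^0.5"])
as_dict = normalize_fields({"field1": 1, "field2": 2})
as_qpl_fields = normalize_fields([{"name": "field1", "boost": 3}])
```

All four calls produce the same kind of result, which is why the option can accept whichever shape is most convenient for the caller.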



Keyword Replacement

Keyword replacement can be used as an alternative to manipulating the grammar. With saga_keywords you specify tags that will be normalized to their display value, which should be a QPL keyword. That way you can transform tokens such as "and", "y", "und", "et", "&&" into the keyword AND, or "not", "no", "nicht", "pas", "!" into the keyword NOT.


In Saga you must have a tag that, once matched, produces a result like the one below. The important parts are:

  • tags: This is where Saga QPL will look for the tags to transform
  • display: stored in the entities, this is the keyword Saga QPL will use

    Note

    It doesn't matter if the display is in lower case: since the tag is being used in saga_keywords, all displays are transformed into upper case.

Code Block
languagejs
themeDJango
titleExample Keyword entity
{
  "stage": "DictionaryTagger",
  "confidence": 0.5,
  "match": "nicht",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{unaryOperator}",
  "startPos": 7,
  "endPos": 10,
  "metadata": {
    "display": "not",
    "id": "A0006"
  },
  "entities": [
    {
      "display": "not",
      "patterns": [
        "not",
        "non",
        "nicht",
        "no",
        "pass"
      ],
      "id": "A0006",
      "fields": {},
      "tags": [
        "unaryOperator"
      ]
    }
  ],
  "tags": [
    "unaryOperator"
  ]
}

Info

This is the lexical item obtained for the token "nicht" from the query "cancer nicht headache"
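
To make the normalization step concrete, here is a minimal sketch of the idea using plain dictionaries as stand-ins for LexItems. The replace_keywords helper is hypothetical and is not part of PyQPL; it only illustrates that a token carrying one of the configured tags is swapped for the upper-cased display of its entity, while other tokens are left as they are.

```python
from typing import List


def replace_keywords(lex_items: List[dict], saga_keywords: List[str]) -> str:
    """Rebuild the query text, swapping tagged tokens for their upper-cased display."""
    tokens = []
    for item in lex_items:
        if set(item.get("tags", [])) & set(saga_keywords):
            # The display stored in the entity is the keyword to use.
            tokens.append(item["entities"][0]["display"].upper())
        else:
            tokens.append(item["match"])
    return " ".join(tokens)


# Trimmed-down stand-ins for the LexItems of "cancer nicht headache".
items = [
    {"match": "cancer"},
    {
        "match": "nicht",
        "tags": ["unaryOperator"],
        "entities": [{"display": "not", "tags": ["unaryOperator"]}],
    },
    {"match": "headache"},
]

query = replace_keywords(items, ["unaryOperator"])
print(query)
```

The resulting text contains the keyword NOT in place of "nicht", so a parser can then treat it as an operator rather than a term.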


Example Implementation

Use SagaQPLOptions instead of QPLOptions, and assign saga_keywords the list of tags to normalize as QPL keywords.

Code Block
languagepy
themeDJango
titleProvide Saga keywords
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_keywords=['unaryOperator']
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API

Code Block
titleGenerate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer nicht headache'
}))

saga_response = response.json()


Execute the parsing of the entire Saga response and print the qpl_tree

Code Block
languagepy
themeDJango
titleExecute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
Info

The .pretty() function of the QPL tree shows a visual representation of how the query is formed

As shown below, the tagged token has now been replaced with the QPL keyword. On the left you can see how the original query would look without keyword replacement, and on the right the same query with keyword replacement applied

Code Block
languagetext
titleQuery without keyword replacement
and
  term	cancer
  term	nicht
  term	headache
Code Block
languagetext
titleQuery with keyword replacement
and
  term	cancer
  not
    term	headache

Synonym Expansion

Excerpt

Synonym expansion with Saga uses Saga's entity extraction to expand the matches with the patterns provided in each entity. To use this functionality, add the tags you want to expand to saga_synonyms; Saga QPL will look for LexItems with these tags and expand them.

Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost; the original token is left untouched.


In Saga you must have a tag that, once matched, produces a result like the one below. The important parts are:

  • entities/patterns: This is where Saga QPL will look for the patterns and use them as synonyms for the expansion


Code Block
languagejs
themeDJango
titleExample synonym entity
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "cancer",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{synonyms}",
  "startPos": 0,
  "endPos": 6,
  "metadata": {
    "display": "cancer",
    "id": "syn0000000141"
  },
  "entities": [
    {
      "display": "cancer",
      "patterns": [
        "malignancy",
        "cancer"
      ],
      "id": "syn0000000141",
      "fields": {},
      "tags": [
        "synonym",
        "synonyms"
      ]
    },
    {
      "display": "cancer",
      "patterns": [
        "363346000",
        "cancer",
        "cancers",
        "malignancies",
        "malignancy",
        "malignant growth",
        "malignant neoplasm",
        "malignant neoplasms",
        "malignant neoplastic disease",
        "malignant tumor",
        "malignant tumors",
        "neoplasm malignant",
        "neoplasm/cancer",
        "tumor, malignant"
      ],
      "id": "363346000",
      "tags": [
        "snomed",
        "synonyms"
      ]
    }
  ],
  "tags": [
    "snomed",
    "synonym",
    "synonyms"
  ]
}
Info

This is the lexical item obtained for the token "cancer" from the query "cancer not headache"
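
As an illustration of how the patterns of such a LexItem could become boosted synonyms, the sketch below walks over entities/patterns, skips duplicates, keeps the original match at boost 1, and gives every other pattern the synonym boost. It uses a plain dictionary as a stand-in for a LexItem; expand_synonyms is a hypothetical helper and not PyQPL's actual code.

```python
from typing import List, Tuple


def expand_synonyms(lex_item: dict, boost: float = 0.8) -> List[Tuple[str, float]]:
    """Return (text, boost) pairs: the original match first, then unique patterns."""
    original = lex_item["match"]
    expanded = [(original, 1.0)]  # the original token is left untouched
    seen = {original}
    for entity in lex_item.get("entities", []):
        for pattern in entity.get("patterns", []):
            if pattern not in seen:
                seen.add(pattern)
                expanded.append((pattern, boost))
    return expanded


# Trimmed-down version of the "cancer" LexItem shown above.
cancer_item = {
    "match": "cancer",
    "entities": [
        {"patterns": ["malignancy", "cancer"]},
        {"patterns": ["363346000", "cancer", "cancers", "malignancy"]},
    ],
}

synonyms = expand_synonyms(cancer_item, boost=0.8)
for text, weight in synonyms:
    print(text, weight)
```

Each pair corresponds to one boosted alternative for the matched token; patterns repeated across entities appear only once.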

Example Implementation

Use SagaQPLOptions instead of QPLOptions, and assign saga_synonyms the list of tags to expand as synonyms. Optionally, you can specify a custom boost for the synonyms

Code Block
languagepy
themeDJango
titleProvided source for synonyms
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API

Code Block
titleGenerate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer not headache'
}))

saga_response = response.json()


Execute the parsing of the entire Saga response and print the qpl_tree

Code Block
languagepy
themeDJango
titleExecute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
Info

The .pretty() function of the QPL tree shows a visual representation of how the query is formed


As shown below, the query has now been expanded with the synonyms found for each of the tokens. On the left you can see how the original query would look without expansion, and on the right the same query with synonym expansion applied

Code Block
languagetext
titleQuery without synonym expansion
and
  term	cancer
  term	not
  term	headache
Code Block
languagetext
titleQuery with synonym expansion
and
  or
    boost
      term	cancer
      1
    boost
      term	malignancy
      0.8
    boost
      term	363346000
      0.8
    boost
      term	cancers
      0.8
    boost
      term	malignancies
      0.8
    boost
      phrase	"malignant growth"
      0.8
    boost
      phrase	"malignant neoplasm"
      0.8
    boost
      phrase	"malignant neoplasms"
      0.8
    boost
      phrase	"malignant neoplastic disease"
      0.8
    boost
      phrase	"malignant tumor"
      0.8
    boost
      phrase	"malignant tumors"
      0.8
    boost
      phrase	"neoplasm malignant"
      0.8
    boost
      phrase	"neoplasm/cancer"
      0.8
    boost
      phrase	"tumor, malignant"
      0.8
  term	not
  or
    boost
      term	headache
      1
    boost
      term	25064002
      0.8
    boost
      term	cephalalgia
      0.8
    boost
      term	cephalgia
      0.8
    boost
      term	cephalgias
      0.8
    boost
      phrase	"cranial pain"
      0.8
    boost
      phrase	"have headaches"
      0.8
    boost
      phrase	"head ache"
      0.8
    boost
      phrase	"head pain"
      0.8
    boost
      phrase	"head pain cephalgia"
      0.8
    boost
      phrase	"head pains"
      0.8
    boost
      term	headaches
      0.8
    boost
      phrase	"mild global headache"
      0.8
    boost
      phrase	"mild headache"
      0.8
    boost
      phrase	"pain head"
      0.8
    boost
      phrase	"pain in head"
      0.8
    boost
      phrase	"pain, head"
      0.8
Note

In this case, since we are not doing keyword replacement and "not" is not in upper case, it remains a term

Special Cases

Sometimes a case is too specific to be coded into PyQPL itself; this is where Saga special cases come in. Saga identifies the case with a tag, the tagged content is transformed into a string representation of a QPL query that replaces the original content, and that string is finally parsed by PyQPL.

The saga_special_case parameter accepts a dictionary with tag names as its keys; the value of each key is a callable that receives a LexItem and returns a string (the string representation of the query)
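
The dispatch described above can be pictured with a small sketch: for each LexItem, look up its tags in the dictionary and, if a handler is found, use the string it returns in place of the original text. Plain dictionaries stand in for LexItems here, and apply_special_cases and shout are hypothetical names used only for illustration; how PyQPL performs this step internally is not shown in this page.

```python
from typing import Callable, Dict, List


def apply_special_cases(
    lex_items: List[dict],
    special_cases: Dict[str, Callable[[dict], str]],
) -> str:
    """Rebuild the query text, letting tagged items be rewritten by their handler."""
    parts = []
    for item in lex_items:
        # Pick the handler of the first tag that has one registered, if any.
        handler = next(
            (special_cases[tag] for tag in item.get("tags", []) if tag in special_cases),
            None,
        )
        parts.append(handler(item) if handler else item["match"])
    return " ".join(parts)


def shout(item: dict) -> str:
    """Toy handler: return the matched text in upper case."""
    return item["match"].upper()


items = [{"match": "small", "tags": ["specialCase"]}, {"match": "cell"}]
rewritten = apply_special_cases(items, {"specialCase": shout})
print(rewritten)
```

The rewritten string is what would then be handed to the parser, so a handler can return any valid QPL expression.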


Example Implementation

This example uses a scenario in which the user searches for "small cell", referring to cases of cancer. Because of the nature of the query, results such as "small non cell cancer" are also returned, which refer to a non cancer case, the opposite of what the user wants. If the user types "small cancer -non", any result containing the word "non" is removed, even when "non" has no relationship with "small cell". Instead, we want results where the token "non" is not near the token "small".

We will assume this LexItem is returned in the Saga response:

Code Block
languagepy
themeDJango
{
    "stage": "DictionaryTagger",
    "confidence": 1,
    "match": "small",
    "flags": [
        "ENTITY",
        "SEMANTIC_TAG"
    ],
    "text": "{specialCase}",
    "startPos": 0,
    "endPos": 5,
    "metadata": {
        "not": "non small",
        "display": "small",
        "id": "A0000"
    },
    "entities": [
        {
            "display": "small",
            "patterns": [
                "small"
            ],
            "id": "A0000",
            "fields": {
                "not": "non small"
            },
            "tags": [
                "specialCase"
            ]
        }
    ],
    "tags": [
        "specialCase"
    ]
}
Info

This is the lexical item obtained for the token "small" from the query "small cell". Notice that this token has metadata included


We start by creating the function that receives the LexItem and transforms it into a query string. As shown below, it uses the information added in the metadata and the match, but you can use anything available in the LexItem

Code Block
languagepy
themeDJango
def span_not(token: LexItem) -> str:

    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())

        return f'{token.match} SPAN_NOT ({exclude_value})'
    else:
        return token.match
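
To see what this function returns without running Saga, the sketch below repeats it against a minimal stand-in for LexItem. FakeLexItem is a hypothetical dataclass that only carries the two attributes the function reads (match and metadata); the real LexItem class has more fields.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeLexItem:
    """Stand-in for LexItem with only the attributes span_not uses."""
    match: str
    metadata: Dict[str, str] = field(default_factory=dict)


def span_not(token: FakeLexItem) -> str:
    # Same logic as above: join the words of metadata['not'] with NEAR.
    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    return token.match


tagged = span_not(FakeLexItem(match='small', metadata={'not': 'non small', 'display': 'small'}))
plain = span_not(FakeLexItem(match='cell'))
print(tagged)
print(plain)
```

A token without the "not" metadata falls through unchanged, so the same function can safely be registered for a tag whose entities do not all define that field.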


Use SagaQPLOptions instead of QPLOptions, and assign saga_special_case with a dictionary of the tags we want to work with as the keys, and the functions making the transformation as the values.

Code Block
titleProvide special cases
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_special_case={
        'specialCase': span_not
    }
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API
Code Block
titleGenerate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'small cell'
}))

saga_response = response.json()
Code Block
titleExecute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
Info

The .pretty() function of the QPL tree shows a visual representation of how the query is formed


As shown below, the tagged text has now been transformed into the new structure, which is then parsed. On the left you can see how the original query would look without the special case, and on the right the same query with the special case applied

Code Block
titleQuery without special case
and
  term	small
  term	cell
Code Block
titleQuery with special case
and
  span_not
    term	small
    near
      term	non
      term	small
  term	cell
Info

This is equivalent to the user typing "small SPAN_NOT non NEAR small cell", but since no regular user is going to do that, this implementation works best
