| Field | Type | Default | Description | Examples |
|---|---|---|---|---|
| implicit_operator | Literal['or', 'and'] | 'or' | Default operator to use when the relationship between two operands is ambiguous. Applicable only to the parser. | "or", "and" |
| fields | List[str], Dict[str, float], List[QPLField], or str | | Fields to use when matching terms, phrases, spans, etc. In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5. | Given as a string, a list, a dictionary, or a list of QPLFields |
| date_fields | List[str], Dict[str, float], List[QPLField], or str | [] | Fields to use for date ranges. The string formats accept the same ^ boost syntax as fields, e.g. ^2 or ^0.5. | Given as a string, a list, a dictionary, or a list of QPLFields |
| range_fields | List[str], Dict[str, float], List[QPLField], or str | [] | Fields to use for ranges. The string formats accept the same ^ boost syntax as fields, e.g. ^2 or ^0.5. | Given as a string, a list, a dictionary, or a list of QPLFields |
| date_format | str | | Date format used to convert date values in the query. The format must be compatible with the engine-specific query. | |
| timezone | str | | Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC. The time zone must be compatible with the engine-specific query. | |
| slop_near | number | 10 | Slop value used for the NEAR operator. | |
| slop_before | number | 2 | Slop value used for the BEFORE operator. | |
| slop_adj | number | 0 | Slop value used for the ADJ operator. | |
| slop_span_not | number | 0 | Slop value used for the SPAN NOT operator. | |
| wildcard | bool | False | Use wildcard operators. Applicable only to the parser. | |
| grammar | file path or str | | File path to the grammar, or the raw grammar as a string, in case you need to parse text to QPL with custom operators. The QPL parser uses a LALR parser implemented with the Lark library; for more information see https://lark-parser.readthedocs.io/en/latest/grammar.html. We recommend consulting the development team before writing a new grammar. Applicable only to the parser. | For details on how to build a grammar, see "Grammar Composition" in the Lark documentation |
| custom_operators | Dict[str, Operand] | {} | Dictionary with the custom operator's type name as key and the class implementing its logic as value. All classes must inherit from Operator. Applicable only to the parser. | |
| synonyms_call | Callable | | Function returning the requested synonyms for a given string. Applicable only to the parser. | |
| saga_keywords | List[str] | [] | Saga tags to be normalized as QPL keywords. The tag's display value is used to replace the matched text. | ["tag1", "tag2"] |
| saga_synonyms | List[str] | [] | Saga tags marked as synonyms. For each entity in the matched tag, the entity's list of patterns is used as the synonyms to replace the matched text. | ["tag1", "tag2"] |
| saga_synonyms_boost | float | 0.8 | Boost applied to each synonym added to the query. | 0.8 |
| saga_special_case | Dict[str, Callable[[LexItem], str]] | {} | Dictionary with tags as keys; each key is assigned a function that receives a LexItem and transforms it into a suitable query statement. | def use_case(token: LexItem) -> str: ... |
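The `^` boost notation accepted by the string forms of `fields`, `date_fields`, and `range_fields` can be sketched as a small parser. This is a hypothetical helper for illustration only; it is not part of PyQPL:

```python
# Hypothetical helper illustrating the "field^boost" string notation
# described above; not part of the PyQPL API.
def parse_boosted_field(spec: str) -> tuple[str, float]:
    """Split 'title^2' into ('title', 2.0); a bare name gets boost 1.0."""
    name, sep, boost = spec.partition('^')
    return name, float(boost) if sep else 1.0

print(parse_boosted_field('title^2'))   # ('title', 2.0)
print(parse_boosted_field('content'))   # ('content', 1.0)
```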
Keyword replacement can be used as an alternative to manipulating the grammar. With saga_keywords you can specify tags that will be normalized to their display value, which should be a QPL keyword. That way you can transform tokens such as "and", "y", "und", "et", "&&" into the keyword AND, or "not", "no", "nicht", "pas", "!" into the keyword NOT.
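The normalization idea can be sketched with a plain lookup table. This is illustrative only; in PyQPL the mapping comes from Saga tags and their display values, not a hard-coded dictionary:

```python
# Illustrative sketch: map surface tokens to QPL keywords via a
# display-value lookup (in PyQPL this mapping comes from Saga tags).
KEYWORD_DISPLAYS = {
    'and': 'AND', 'y': 'AND', 'und': 'AND', 'et': 'AND', '&&': 'AND',
    'not': 'NOT', 'no': 'NOT', 'nicht': 'NOT', 'pas': 'NOT', '!': 'NOT',
}

def normalize_keywords(tokens: list[str]) -> list[str]:
    # Displays are upper-cased, mirroring how saga_keywords treats them.
    return [KEYWORD_DISPLAYS.get(t.lower(), t) for t in tokens]

print(normalize_keywords(['cancer', 'nicht', 'headache']))
# ['cancer', 'NOT', 'headache']
```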
In Saga you must have a tag that, once matched, produces a result like the one below. The important bits are:

- display: stored in the entities, this is the keyword Saga QPL will use
- it doesn't matter if the display is in lower case; since the tag is used in saga_keywords, all displays are transformed to upper case
```json
{
  "stage": "DictionaryTagger",
  "confidence": 0.5,
  "match": "nicht",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{unaryOperator}",
  "startPos": 7,
  "endPos": 10,
  "metadata": {
    "display": "not",
    "id": "A0006"
  },
  "entities": [
    {
      "display": "not",
      "patterns": [
        "not",
        "non",
        "nicht",
        "no",
        "pass"
      ],
      "id": "A0006",
      "fields": {},
      "tags": [
        "unaryOperator"
      ]
    }
  ],
  "tags": [
    "unaryOperator"
  ]
}
```
This is the lexical item obtained for the token "nicht" from the query "cancer nicht headache".
Use SagaQPLOptions instead of QPLOptions, and assign saga_keywords the list of tags to normalize as QPL keywords:
```python
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_keywords=['unaryOperator']
)
parser = SagaParser(options=options)
```
Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API:
```python
import requests
import json

response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({'unit': 'unit_name', 'doc': 'cancer nicht headache'})
)
saga_response = response.json()
```
Parse the entire Saga response and print the qpl_tree:
```python
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
```
The .pretty() function of the QPL tree shows a visual representation of how the query is formed. The tagged token has been replaced with its keyword. Without keyword replacement the query would parse as:

```
and
  term cancer
  term nicht
  term headache
```

With keyword replacement applied:

```
and
  term cancer
  not
    term headache
```
Synonym expansion with Saga uses Saga's entity extraction to expand matches with the patterns provided in each entity. To make use of this functionality, add the tags you want to expand to saga_synonyms; Saga will look for LexItems with these tags and expand them. Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost; the original token is left untouched.

In Saga you must have a tag that, once matched, produces a result like the one below. This is the lexical item obtained for the token "cancer" from the query "cancer not headache":

```json
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "cancer",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{synonyms}",
  "startPos": 0,
  "endPos": 6,
  "metadata": {
    "display": "cancer",
    "id": "syn0000000141"
  },
  "entities": [
    {
      "display": "cancer",
      "patterns": [
        "malignancy",
        "cancer"
      ],
      "id": "syn0000000141",
      "fields": {},
      "tags": [
        "synonym",
        "synonyms"
      ]
    },
    {
      "display": "cancer",
      "patterns": [
        "363346000",
        "cancer",
        "cancers",
        "malignancies",
        "malignancy",
        "malignant growth",
        "malignant neoplasm",
        "malignant neoplasms",
        "malignant neoplastic disease",
        "malignant tumor",
        "malignant tumors",
        "neoplasm malignant",
        "neoplasm/cancer",
        "tumor, malignant"
      ],
      "id": "363346000",
      "tags": [
        "snomed",
        "synonyms"
      ]
    }
  ],
  "tags": [
    "snomed",
    "synonym",
    "synonyms"
  ]
}
```

Use SagaQPLOptions instead of QPLOptions, and assign saga_synonyms the list of tags to expand as synonyms; optionally, specify a custom boost for the synonyms.
Example Implementation
```python
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)
parser = SagaParser(options=options)
```
Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API:

```python
import requests
import json

response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({'unit': 'unit_name', 'doc': 'cancer not headache'})
)
saga_response = response.json()
```
Parse the entire Saga response and print the qpl_tree; the .pretty() function shows a visual representation of how the query is formed:

```python
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
```
Each matched token is expanded with the synonyms found. Without expansion the query would parse as:

```
and
  term cancer
  term not
  term headache
```

With synonym expansion applied:

```
and
  or
    boost term cancer 1
    boost term malignancy 0.8
    boost term 363346000 0.8
    boost term cancers 0.8
    boost term malignancies 0.8
    boost phrase "malignant growth" 0.8
    boost phrase "malignant neoplasm" 0.8
    boost phrase "malignant neoplasms" 0.8
    boost phrase "malignant neoplastic disease" 0.8
    boost phrase "malignant tumor" 0.8
    boost phrase "malignant tumors" 0.8
    boost phrase "neoplasm malignant" 0.8
    boost phrase "neoplasm/cancer" 0.8
    boost phrase "tumor, malignant" 0.8
  term not
  or
    boost term headache 1
    boost term 25064002 0.8
    boost term cephalalgia 0.8
    boost term cephalgia 0.8
    boost term cephalgias 0.8
    boost phrase "cranial pain" 0.8
    boost phrase "have headaches" 0.8
    boost phrase "head ache" 0.8
    boost phrase "head pain" 0.8
    boost phrase "head pain cephalgia" 0.8
    boost phrase "head pains" 0.8
    boost term headaches 0.8
    boost phrase "mild global headache" 0.8
    boost phrase "mild headache" 0.8
    boost phrase "pain head" 0.8
    boost phrase "pain in head" 0.8
    boost phrase "pain, head" 0.8
```
In this case, since we are not doing keyword replacement and "not" is not in upper case, it remains as a term.
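The expansion logic can be sketched as a function that rewrites one token into an OR of boosted alternatives. This is a simplified illustration using a query-string form; PyQPL builds an actual query tree, and the function name here is hypothetical:

```python
# Simplified sketch of synonym expansion: rewrite one token into an
# OR of boosted alternatives, leaving the original token untouched.
def expand_token(token: str, patterns: list[str], boost: float = 0.8) -> str:
    parts = [f'{token}^1']  # the original token keeps full weight
    for p in patterns:
        if p == token:
            continue  # skip the pattern that duplicates the original
        alt = f'"{p}"' if ' ' in p else p  # multi-word patterns become phrases
        parts.append(f'{alt}^{boost}')
    return '(' + ' OR '.join(parts) + ')'

print(expand_token('cancer', ['malignancy', 'cancer', 'malignant tumor']))
# (cancer^1 OR malignancy^0.8 OR "malignant tumor"^0.8)
```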
Sometimes a case is too specific to be coded into PyQPL; this is where Saga special cases come into action. Saga identifies the case with a tag; the tag is then transformed into a string representation of a QPL query, replacing the original content, and finally parsed by PyQPL.
The saga_special_case parameter accepts a dictionary with tag names as its keys; the value of each key is a callable that receives a LexItem and returns a string (the string representation of the query).
For this example, consider a scenario where the user searches for "small cell", referring to cases of cancer. Due to the nature of the query, results like "non small cell cancer" are returned, which refer to non-cancer cases, the opposite of what the user wants. If the user types "small cancer -non", any result containing the word "non" would be removed, even where it has no relationship to "small cell". Instead, we want results where the token "non" is not near the token "small".
We will assume this LexItem is returned in the Saga response:
```json
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "small",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{specialCase}",
  "startPos": 0,
  "endPos": 5,
  "metadata": {
    "not": "non small",
    "display": "small",
    "id": "A0000"
  },
  "entities": [
    {
      "display": "small",
      "patterns": [
        "small"
      ],
      "id": "A0000",
      "fields": {
        "not": "non small"
      },
      "tags": [
        "specialCase"
      ]
    }
  ],
  "tags": [
    "specialCase"
  ]
}
```
This is the lexical item obtained for the token "small" from the query "small cell". Notice this token has metadata included.
We start by creating the function that will receive the LexItem and transform it into a string query. As you see below, we make use of the information added in the metadata and the match, but you can use anything available in the LexItem:
```python
def span_not(token: LexItem) -> str:
    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    else:
        return token.match
```
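As a quick sanity check, the transformation can be exercised with a minimal stand-in for LexItem. The dataclass below is hypothetical, just enough to run the function outside a live Saga instance; the real LexItem comes from Saga and carries more fields:

```python
from dataclasses import dataclass, field

# Minimal stand-in for Saga's LexItem (hypothetical; only the fields
# span_not reads), so the transformation can be tested in isolation.
@dataclass
class LexItem:
    match: str
    metadata: dict = field(default_factory=dict)

def span_not(token: LexItem) -> str:
    # Same logic as above, repeated so this snippet is self-contained.
    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    return token.match

print(span_not(LexItem('small', {'not': 'non small'})))
# small SPAN_NOT (non NEAR small)
print(span_not(LexItem('cell')))
# cell
```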
Use SagaQPLOptions instead of QPLOptions, and assign saga_special_case a dictionary with the tags we want to handle as keys, and the functions making the transformation as values:
```python
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_special_case={'specialCase': span_not}
)
parser = SagaParser(options=options)
```
```python
import requests
import json

response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({'unit': 'unit_name', 'doc': 'small cell'})
)
saga_response = response.json()
```
```python
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
```
The .pretty() function of the QPL tree shows a visual representation of how the query is formed. The tagged text has been transformed into the new structure, which is then parsed. Without the special case the query would parse as:

```
and
  term small
  term cell
```

With the special case applied:

```
and
  span_not
    term small
    near
      term non
      term small
  term cell
```
This is equivalent to the user typing "small SPAN_NOT (non NEAR small) cell", but since no regular user is going to type that, this implementation works best.