Feature compatible only with Saga 1.3+ responses

PyQPL allows you to work with responses from Saga, converting LexItems into synonyms, QPL keywords, and more complex query structures. Because Saga lets you maintain, configure, and tune dictionaries for entity extraction, PyQPL with Saga is a powerful tool for creating queries with mutable syntax and enhanced capabilities.


Saga QPL Options

SagaQPLOptions lets you specify which Saga tags are transformed into QPL (Query Processing Language) structures. These options enable features such as synonym expansion, QPL keyword replacement, and complex QPL structures using saga_special_case.
Each option is listed with its field name, type, whether it is required, its default, a description, and examples.

Field: implicit_operator
Type: Literal['or', 'and']
Required: No
Default: or
Description: Default operator to use when the relationship between two operands is ambiguous.
Note: Applicable only to the parser.
Examples: "or", "and"

Field: fields
Type: List[str] OR Dict[str, float] OR List[QPLField] OR str
Required: Yes
Default: (none)
Description: Fields to use when matching terms, phrases, spans, ... In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5.
Examples:
  As a string: "field"
  As a List: ["field1", "field2^2", "field3^3"]
  As a Dictionary: {"field1": 1, "field2": 2, "field3": 3.0}
  As a List of QPLFields: [{"name": "field1", "boost": 1}, {"name": "field2", "boost": 2}, {"name": "field3", "boost": 3}]

Field: date_fields
Type: List[str] OR Dict[str, float] OR List[QPLField] OR str
Required: No
Default: [ ]
Description: Fields to use when matching date ranges, if no range queries are 
In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5.
Examples:
  As a string: "field"
  As a List: ["field1", "field2^2", "field3^3"]
  As a Dictionary: {"field1": 1, "field2": 2, "field3": 3.0}
  As a List of QPLFields: [{"name": "field1", "boost": 1}, {"name": "field2", "boost": 2}, {"name": "field3", "boost": 3}]

Field: range_fields
Type: List[str] OR Dict[str, float] OR List[QPLField] OR str
Required: No
Default: [ ]
Description: Fields to use when matching ranges. In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5.
Examples:
  As a string: "field"
  As a List: ["field1", "field2^2", "field3^3"]
  As a Dictionary: {"field1": 1, "field2": 2, "field3": 3.0}
  As a List of QPLFields: [{"name": "field1", "boost": 1}, {"name": "field2", "boost": 2}, {"name": "field3", "boost": 3}]

Field: date_format
Type: str
Required: No
Default: (none)
Description: Date format used to convert date values in the query. The format must be compatible with the engine-specific query.

Field: timezone
Type: str
Required: No
Default: (none)
Description: Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC. The timezone must be compatible with the engine-specific query.

Field: slop_near
Type: number
Required: No
Default: 10
Description: Slop value used for the NEAR operator.

Field: slop_before
Type: number
Required: No
Default: 2
Description: Slop value used for the BEFORE operator.

Field: slop_adj
Type: number
Required: No
Default: 0
Description: Slop value used for the ADJ operator.

Field: slop_span_not
Type: number
Required: No
Default: 0
Description: Slop value used for the SPAN NOT operator.

Field: wildcard
Type: bool
Required: No
Default: False
Description: Use wildcard operators.
Note: Applicable only to the parser.

Field: grammar
Type: File Path OR str
Required: No
Default: (none)
Description: File path to the grammar, or the raw grammar as a string, in case you need to parse text to QPL with custom operators. The QPL parser uses an LALR parser implemented with the Lark library; for more information see https://lark-parser.readthedocs.io/en/latest/grammar.html#
Note: We recommend consulting the development team before starting a new grammar.
Note: Applicable only to the parser.
Note: For more detail on how to build a grammar, see Grammar Composition in the Lark documentation.

Field: custom_operators
Type: Dict[str, Operand]
Required: No
Default: {}
Description: Dictionary with the name (type) of each custom operator as keys and the class implementing its logic as values. All classes must inherit from Operator.
Note: Applicable only to the parser.
Note: A more detailed use of custom operators can be found on the Custom Operator page.

Field: synonyms_call
Type: Func[]
Required: No
Default: (none)
Description: Function returning the requested synonyms for the specified string.
Note: Applicable only to the parser.
Note: A more detailed use of synonym expansion can be found on the Synonym Expansion page.

Field: saga_keywords
Type: List[str]
Required: No
Default: []
Description: Saga tags to be normalized as QPL keywords. The display value of the tag is used to replace the tag. Example: "This {tag} here", where {tag}'s display is OR, will be normalized to "This OR here".
Examples: ["tag1", "tag2"]

Field: saga_synonyms
Type: List[str]
Required: No
Default: []
Description: Saga tags marked as synonyms. For each entity in the matched tag, the list of patterns within the entity is used as the synonyms to replace the matched text.
Examples: ["tag1", "tag2"]

Field: saga_synonyms_boost
Type: float
Required: No
Default: 0.8
Description: Boost to be used on each synonym added to the query.
Examples: 0.8

Field: saga_special_case
Type: Dict[str, Callable[[LexItem], str]]
Required: No
Default: {}
Description: Dictionary with tags as keys, where each key is assigned a function that receives a LexItem and transforms it into a suitable query statement.
Examples:
  def use_case(token: LexItem) -> str:
      ...

  {'tagName': use_case}
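
The four accepted shapes of fields, date_fields, and range_fields can be read as different spellings of the same list of (name, boost) pairs. The following is a minimal sketch of that idea in plain Python, not PyQPL's own code: the helper names split_boost and normalize_fields are hypothetical, and the default boost of 1.0 for a field without ^ is an assumption.

```python
from typing import Dict, List, Tuple, Union


def split_boost(spec: str) -> Tuple[str, float]:
    """Split "name^boost" into (name, boost). Assumes a boost of 1.0 when no ^ is given."""
    name, sep, boost = spec.partition("^")
    return name, (float(boost) if sep else 1.0)


def normalize_fields(
    fields: Union[str, Dict[str, float], List[Union[str, Dict[str, object]]]]
) -> List[Tuple[str, float]]:
    """Hypothetical helper: reduce every accepted shape to a list of (name, boost)."""
    if isinstance(fields, str):
        return [split_boost(fields)]
    if isinstance(fields, dict):
        return [(name, float(boost)) for name, boost in fields.items()]
    pairs = []
    for item in fields:
        if isinstance(item, str):
            pairs.append(split_boost(item))
        else:
            # QPLField-like mapping with "name" and an optional "boost"
            pairs.append((item["name"], float(item.get("boost", 1.0))))
    return pairs


print(normalize_fields(["field1", "field2^2", "field3^0.5"]))
print(normalize_fields({"field1": 1, "field2": 2, "field3": 3.0}))
```

Both calls produce pairs of the same form, which is why the string and dictionary formats can be used interchangeably in the examples that follow.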



Keyword Replacement

Keyword replacement can be used as an alternative to manipulating the grammar. With saga_keywords you can specify tags that will be normalized to their display value, which should be a QPL keyword. That way you can transform tokens such as "and", "y", "und", "et", "&&" into the keyword AND, or "not", "no", "nicht", "pas", "!" into the keyword NOT.


In Saga you must have a tag that, once matched, produces a result like the one below. The important parts are:

  • tags: This is where Saga QPL will look for the tags to transform
  • display: stored in the entities, this is the keyword Saga QPL will use

    It doesn't matter if the display is in lower case: since the tag is used in saga_keywords, all displays are transformed into upper case.

Example Keyword entity
{
  "stage": "DictionaryTagger",
  "confidence": 0.5,
  "match": "nicht",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{unaryOperator}",
  "startPos": 7,
  "endPos": 10,
  "metadata": {
    "display": "not",
    "id": "A0006"
  },
  "entities": [
    {
      "display": "not",
      "patterns": [
        "not",
        "non",
        "nicht",
        "no",
        "pass"
      ],
      "id": "A0006",
      "fields": {},
      "tags": [
        "unaryOperator"
      ]
    }
  ],
  "tags": [
    "unaryOperator"
  ]
}

This is the lexical item obtained for the token "nicht" from the query "cancer nicht headache"
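
The normalization described above can be sketched in plain Python. This is an illustration under stated assumptions, not PyQPL's implementation: the route is modeled as a list of dictionaries shaped like the lexical item above, untagged tokens carry only "match", and normalize_keywords is a hypothetical helper name.

```python
from typing import Dict, List


def normalize_keywords(tokens: List[Dict], saga_keywords: List[str]) -> str:
    """Replace tokens tagged with a keyword tag by their upper-cased display value."""
    wanted = set(saga_keywords)
    words = []
    for token in tokens:
        if wanted & set(token.get("tags", [])):
            # Use the display of the first entity that carries a keyword tag.
            entity = next(
                e for e in token["entities"] if wanted & set(e.get("tags", []))
            )
            words.append(entity["display"].upper())
        else:
            words.append(token["match"])
    return " ".join(words)


route = [
    {"match": "cancer"},
    {
        "match": "nicht",
        "tags": ["unaryOperator"],
        "entities": [{"display": "not", "tags": ["unaryOperator"]}],
    },
    {"match": "headache"},
]

print(normalize_keywords(route, ["unaryOperator"]))  # cancer NOT headache
```

With an empty saga_keywords list the same route is returned unchanged, so "nicht" would stay an ordinary term.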


Example Implementation

Use SagaQPLOptions instead of QPLOptions, and assign saga_keywords the list of tags to normalize as QPL keywords.

Provide keyword tags
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_keywords=['unaryOperator']
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit. The simplest method is to make an HTTP request to the Saga Client API.

Generate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer nicht headache'
}))

saga_response = response.json()


Execute the parsing of the entire Saga response and print the qpl_tree

Execute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

The .pretty() function of the QPL tree shows a visual representation of how the query is formed.

As shown below, the tagged token has been replaced by its keyword. The first tree shows the original query without keyword replacement, and the second shows the same query with keyword replacement applied.

Query without keyword replacement
and
  term	cancer
  term	nicht
  term	headache
Query with keyword replacement
and
  term	cancer
  not
    term	headache

Synonym Expansion

Synonym expansion with Saga uses Saga's entity extraction to expand the matches with the patterns provided in each entity. To use this functionality, add the tags you want to expand to saga_synonyms; Saga QPL will look for LexItems with these tags and expand them.

Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost. The original token is left untouched.


In Saga you must have a tag that, once matched, produces a result like the one below. The important part is:

  • entities/patterns: This is where Saga QPL will look for the patterns and use them as synonyms for the expansion


Example Synonym entity
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "cancer",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{synonyms}",
  "startPos": 0,
  "endPos": 6,
  "metadata": {
    "display": "cancer",
    "id": "syn0000000141"
  },
  "entities": [
    {
      "display": "cancer",
      "patterns": [
        "malignancy",
        "cancer"
      ],
      "id": "syn0000000141",
      "fields": {},
      "tags": [
        "synonym",
        "synonyms"
      ]
    },
    {
      "display": "cancer",
      "patterns": [
        "363346000",
        "cancer",
        "cancers",
        "malignancies",
        "malignancy",
        "malignant growth",
        "malignant neoplasm",
        "malignant neoplasms",
        "malignant neoplastic disease",
        "malignant tumor",
        "malignant tumors",
        "neoplasm malignant",
        "neoplasm/cancer",
        "tumor, malignant"
      ],
      "id": "363346000",
      "tags": [
        "snomed",
        "synonyms"
      ]
    }
  ],
  "tags": [
    "snomed",
    "synonym",
    "synonyms"
  ]
}

This is the lexical item obtained for the token "cancer" from the query "cancer not headache"
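
How the patterns of such a lexical item turn into boosted synonyms can be sketched as follows. This is a simplified illustration, not PyQPL's implementation: expand_synonyms is a hypothetical helper, the token is a shortened copy of the item above, and skipping duplicate patterns is an assumption made here to keep each synonym once.

```python
from typing import Dict, List, Tuple


def expand_synonyms(
    token: Dict, saga_synonyms: List[str], boost: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (text, boost) pairs: the original match at 1.0, each new pattern at `boost`."""
    wanted = set(saga_synonyms)
    expanded = [(token["match"], 1.0)]  # the original token is left untouched
    seen = {token["match"]}
    for entity in token.get("entities", []):
        if not wanted & set(entity.get("tags", [])):
            continue  # entity does not carry a synonym tag
        for pattern in entity.get("patterns", []):
            if pattern not in seen:
                seen.add(pattern)
                expanded.append((pattern, boost))
    return expanded


token = {
    "match": "cancer",
    "entities": [
        {"patterns": ["malignancy", "cancer"], "tags": ["synonym", "synonyms"]},
        {
            "patterns": ["363346000", "cancer", "cancers", "malignancy"],
            "tags": ["snomed", "synonyms"],
        },
    ],
}

for text, weight in expand_synonyms(token, ["synonyms"]):
    print(text, weight)
```

Each resulting pair corresponds to one boosted alternative inside an OR group of the query.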

Example Implementation

Use SagaQPLOptions instead of QPLOptions, and assign saga_synonyms the list of tags to expand as synonyms. Optionally, you can specify a custom boost for the synonyms.

Provide source for synonyms
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit. The simplest method is to make an HTTP request to the Saga Client API.

Generate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer not headache'
}))

saga_response = response.json()


Execute the parsing of the entire Saga response and print the qpl_tree

Execute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

The .pretty() function of the QPL tree shows a visual representation of how the query is formed.

As shown below, the query has now been expanded with the synonyms found for each token. The first tree shows the original query without expansion, and the second shows the same query with synonym expansion applied.

Query without synonym expansion
and
  term	cancer
  term	not
  term	headache
Query with synonym expansion
and
  or
    boost
      term	cancer
      1
    boost
      term	malignancy
      0.8
    boost
      term	363346000
      0.8
    boost
      term	cancers
      0.8
    boost
      term	malignancies
      0.8
    boost
      phrase	"malignant growth"
      0.8
    boost
      phrase	"malignant neoplasm"
      0.8
    boost
      phrase	"malignant neoplasms"
      0.8
    boost
      phrase	"malignant neoplastic disease"
      0.8
    boost
      phrase	"malignant tumor"
      0.8
    boost
      phrase	"malignant tumors"
      0.8
    boost
      phrase	"neoplasm malignant"
      0.8
    boost
      phrase	"neoplasm/cancer"
      0.8
    boost
      phrase	"tumor, malignant"
      0.8
  term	not
  or
    boost
      term	headache
      1
    boost
      term	25064002
      0.8
    boost
      term	cephalalgia
      0.8
    boost
      term	cephalgia
      0.8
    boost
      term	cephalgias
      0.8
    boost
      phrase	"cranial pain"
      0.8
    boost
      phrase	"have headaches"
      0.8
    boost
      phrase	"head ache"
      0.8
    boost
      phrase	"head pain"
      0.8
    boost
      phrase	"head pain cephalgia"
      0.8
    boost
      phrase	"head pains"
      0.8
    boost
      term	headaches
      0.8
    boost
      phrase	"mild global headache"
      0.8
    boost
      phrase	"mild headache"
      0.8
    boost
      phrase	"pain head"
      0.8
    boost
      phrase	"pain in head"
      0.8
    boost
      phrase	"pain, head"
      0.8

In this case, since we are not doing keyword replacement and "not" is not in upper case, it remains a term.

Special Cases

Sometimes there will be cases too specific to be coded into PyQPL. This is where Saga special cases come in: Saga identifies the case to be treated with a tag, the tag is then transformed into a string representation of a QPL query that replaces the original content, and the result is finally parsed by PyQPL.

The saga_special_case parameter accepts a dictionary with tag names as its keys; the value of each key is a callable that receives a LexItem and returns a string (the string representation of the query).


Example Implementation

For this example, the user wants to look for "small cell", referring to cases of cancer. Due to the nature of the query, results such as "small non cell cancer" are returned, which refer to a non cancer case, the opposite of what the user wants. If the user types "small cancer -non", any result containing the word "non" is removed, even when "non" has no relationship with "small cell". Instead, we want results where the token "non" is not near the token "small".

We will assume this LexItem is returned in the Saga response:

{
    "stage": "DictionaryTagger",
    "confidence": 1,
    "match": "small",
    "flags": [
        "ENTITY",
        "SEMANTIC_TAG"
    ],
    "text": "{specialCase}",
    "startPos": 0,
    "endPos": 5,
    "metadata": {
        "not": "non small",
        "display": "small",
        "id": "A0000"
    },
    "entities": [
        {
            "display": "small",
            "patterns": [
                "small"
            ],
            "id": "A0000",
            "fields": {
                "not": "non small"
            },
            "tags": [
                "specialCase"
            ]
        }
    ],
    "tags": [
        "specialCase"
    ]
}

This is the lexical item obtained for the token "small" from the query "small cell". Notice that this token includes metadata.


We start by creating the function that receives the LexItem and transforms it into a string query. As you can see below, it uses the match and the information added in the metadata, but you can use anything available in the LexItem.

def span_not(token: LexItem) -> str:

    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())

        return f'{token.match} SPAN_NOT ({exclude_value})'
    else:
        return token.match
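
To see what this function returns without a running Saga instance, it can be exercised against a stand-in object. The sketch below assumes a minimal dataclass named LexItem with only the two attributes the function reads (match and metadata); it is a hypothetical substitute for the real LexItem class, whose full shape is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LexItem:
    """Hypothetical stand-in exposing only the attributes span_not reads."""
    match: str
    metadata: Dict[str, str] = field(default_factory=dict)


def span_not(token: LexItem) -> str:
    # Same logic as the function above, repeated so the sketch is self-contained.
    if token.metadata and 'not' in token.metadata:
        # "non small" -> "non NEAR small"
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    return token.match


tagged = LexItem(match="small", metadata={"not": "non small", "display": "small", "id": "A0000"})
plain = LexItem(match="cell")

print(span_not(tagged))  # small SPAN_NOT (non NEAR small)
print(span_not(plain))   # cell
```

A token without a "not" entry in its metadata falls through to the plain match, so the special case only changes tokens that Saga has annotated.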


Use SagaQPLOptions instead of QPLOptions, and assign saga_special_case with a dictionary of the tags we want to work with as the keys, and the functions making the transformation as the values.

Provide special cases
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_special_case={
        'specialCase': span_not
    }
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit. The simplest method is to make an HTTP request to the Saga Client API.
Generate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'small cell'
}))

saga_response = response.json()
Execute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

The .pretty() function of the QPL tree shows a visual representation of how the query is formed.

As shown below, the tagged text has been transformed into the new structure, which is then parsed. The first tree shows the original query without the special case, and the second shows the same query with the special case applied.

Query without special case
and
  term	small
  term	cell
Query with special case
and
  span_not
    term	small
    near
      term	non
      term	small
  term	cell

This is equivalent to the user typing "small SPAN_NOT (non NEAR small) cell", but since no regular user is going to do that, this implementation works best.
