Info
This feature is compatible only with Saga 1.3+ responses.

PyQPL can work with responses from Saga, converting LexItems into synonyms, QPL keywords, and more complex query structures. Because Saga lets you maintain, configure, and tune the dictionaries used for entity extraction, combining it with PyQPL lets you change how queries are interpreted (for example, which words act as operators or which synonyms are added) without modifying the grammar.


Saga QPL Options

SagaQPLOptions lets you specify which Saga tags are transformed into QPL (Query Processing Language) structures. These options enable synonym expansion, QPL keyword replacement, and complex QPL structures through saga_special_case.

Each option is listed below with its type, its default value (where one is defined), a description, and examples.

implicit_operator

Type: Literal['or', 'and']

Default: or

Default operator to use when the relationship between two operands is ambiguous.

Info
Applicable only to the parser

Examples: "or", "and"

fields

Type: List[str] OR Dict[str, float] OR List[QPLField] OR str

Fields to use when matching terms, phrases, spans, etc.

In the string formats, you can boost a field by appending ^ followed by the boost value, e.g. ^2 or ^0.5.

Examples:

As a string:

"field"

As a list:

["field1", "field2^2", "field3^3"]

As a dictionary:

{"field1": 1, "field2": 2, "field3": 3.0}

As a list of QPLFields:

[{"name": "field1", "boost": 1}, {"name": "field2", "boost": 2}, {"name": "field3", "boost": 3}]

date_fields

Type: List[str] OR Dict[str, float] OR List[QPLField] OR str

Default: []

Fields to use for date ranges, if no range queries are …

In the string formats, you can boost a field by appending ^ followed by the boost value, e.g. ^2 or ^0.5.

Examples:

As a string:

"field"

As a list:

["field1", "field2^2", "field3^3"]

As a dictionary:

{"field1": 1, "field2": 2, "field3": 3.0}

As a list of QPLFields:

[{"name": "field1", "boost": 1}, {"name": "field2", "boost": 2}, {"name": "field3", "boost": 3}]

range_fields

Type: List[str] OR Dict[str, float] OR List[QPLField] OR str

Default: []

Fields to use for range queries.

In the string formats, you can boost a field by appending ^ followed by the boost value, e.g. ^2 or ^0.5.

Examples:

As a string:

"field"

As a list:

["field1", "field2^2", "field3^3"]

As a dictionary:

{"field1": 1, "field2": 2, "field3": 3.0}

As a list of QPLFields:

[{"name": "field1", "boost": 1}, {"name": "field2", "boost": 2}, {"name": "field3", "boost": 3}]

date_format

Type: str

Date format used to convert date values in the query.

Note
The format must be compatible with the target engine's query syntax.

timezone

Type: str

Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC.

Note
The time zone must be compatible with the target engine's query syntax.

slop_near

Type: number

Default: 10

Slop value used for the NEAR operator.

slop_before

Type: number

Default: 2

Slop value used for the BEFORE operator.

slop_adj

Type: number

Default: 0

Slop value used for the ADJ operator.

slop_span_not

Type: number

Default: 0

Slop value used for the SPAN NOT operator.

wildcard

Type: bool

Default: False

Whether to enable wildcard operators.

Info
Applicable only to the parser

grammar

Type: File path OR str

Path to a grammar file, or the raw grammar as a string, for when you need to parse text to QPL with custom operators. The QPL parser uses a LALR parser implemented with the Lark library; for more information, see https://lark-parser.readthedocs.io/en/latest/grammar.html#

Warning
We recommend consulting the development team before creating a new grammar.

Info
Applicable only to the parser

For more detail on how to build a grammar, see Grammar Composition in the Lark documentation.

custom_operators

Type: Dict[str, Operand]

Default: {}

Dictionary with the custom operator type names as keys and the classes implementing their logic as values. All classes must inherit from Operator.

Info
Applicable only to the parser

Info
For more detailed usage of custom operators, see the Custom Operator page.

synonyms_call

Type: Func[]

Function that returns the synonyms for a given string.

Info
Applicable only to the parser

Info
For more detailed usage of synonym expansion, see the Synonym Expansion page.

saga_keywords

Type: List[str]

Default: []

Saga tags to be normalized as QPL keywords. The tag's display value replaces the tagged text.
Example: in "This {tag} here", where the display of {tag} is OR, the text is normalized to "This OR here".

Examples: ["tag1", "tag2"]

saga_synonyms

Type: List[str]

Default: []

Saga tags marked as synonyms. For each entity in a matched tag, the entity's list of patterns is used as synonyms for the matched text.

Examples: ["tag1", "tag2"]

saga_synonyms_boost

Type: float

Default: 0.8

Boost applied to each synonym added to the query.

Examples: 0.8

saga_special_case

Type: Dict[str, Callable[[LexItem], str]]

Default: {}

Dictionary with tag names as keys, where each key maps to a function that receives a LexItem and transforms it into a suitable query string.

Examples:

def use_case(token: LexItem) -> str:
    ...

{
    'tagName': use_case
}
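The formats accepted by fields, date_fields, and range_fields are interchangeable ways of listing field names with boosts. As a rough illustration, the sketch below normalizes each format into QPLField-style name/boost dictionaries. normalize_fields is a hypothetical helper, not part of PyQPL, and it assumes that a field written without ^ has a boost of 1.

```python
from typing import Dict, List, Union

# One normalized entry, mirroring the QPLField example {"name": ..., "boost": ...}
Field = Dict[str, Union[str, float]]


def normalize_fields(spec: Union[str, List[Union[str, Field]], Dict[str, float]]) -> List[Field]:
    """Hypothetical helper (not part of PyQPL): turn any accepted format into name/boost dicts."""

    def parse(entry: str) -> Field:
        # "field2^2" -> name "field2", boost 2.0; assumed boost 1.0 when "^" is absent
        name, sep, boost = entry.partition("^")
        return {"name": name, "boost": float(boost) if sep else 1.0}

    if isinstance(spec, str):  # "field"
        return [parse(spec)]
    if isinstance(spec, dict):  # {"field1": 1, "field2": 2}
        return [{"name": name, "boost": float(boost)} for name, boost in spec.items()]

    fields: List[Field] = []
    for item in spec:  # list of "name^boost" strings and/or QPLField-style dicts
        if isinstance(item, str):
            fields.append(parse(item))
        else:
            fields.append({"name": item["name"], "boost": float(item.get("boost", 1.0))})
    return fields
```

With this sketch, the list, dictionary, and QPLField examples above all normalize to the same three entries, and normalize_fields("title^0.5") returns [{"name": "title", "boost": 0.5}].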

Keyword Replacement

Keyword replacement can be used as an alternative to modifying the grammar. With saga_keywords you specify tags whose matches are normalized to their display value, which should be a QPL keyword. This way you can transform tokens such as "and", "y", "und", "et", or "&&" into the keyword AND, or "not", "no", "nicht", "pas", or "!" into the keyword NOT.

In Saga, you need a tag whose match produces a LexItem like the one below. The important parts are:

tags: where Saga QPL looks for the tags to transform
display: stored in the entities; this is the keyword Saga QPL uses
Note
It does not matter if the display is in lower case: since the tag is listed in saga_keywords, all displays are transformed to upper case.

Code Block

language	js
theme	DJango
title	Example Keyword entity

{
  "stage": "DictionaryTagger",
  "confidence": 0.5,
  "match": "nicht",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{unaryOperator}",
  "startPos": 7,
  "endPos": 12,
  "metadata": {
    "display": "not",
    "id": "A0006"
  },
  "entities": [
    {
      "display": "not",
      "patterns": [
        "not",
        "non",
        "nicht",
        "no",
        "pas"
      ],
      "id": "A0006",
      "fields": {},
      "tags": [
        "unaryOperator"
      ]
    }
  ],
  "tags": [
    "unaryOperator"
  ]
}


Info
This is the lexical item obtained for the token "nicht" from the query "cancer nicht headache".
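To make the keyword rule concrete, the sketch below takes a LexItem shaped like the JSON above (as a plain dict) and returns the keyword it would be normalized to. keyword_for is a hypothetical helper written for this page; it only mirrors the documented behavior (the entity's display, upper-cased, for tags listed in saga_keywords) and is not PyQPL's implementation.

```python
from typing import Any, Dict, Iterable, Optional


def keyword_for(lex_item: Dict[str, Any], saga_keywords: Iterable[str]) -> Optional[str]:
    """Hypothetical helper (not PyQPL's code): QPL keyword for a LexItem, or None."""
    keyword_tags = set(saga_keywords)
    # Only LexItems carrying one of the configured tags are normalized
    if not keyword_tags.intersection(lex_item.get("tags", [])):
        return None
    # The keyword is the display of the entity with the matching tag, upper-cased
    for entity in lex_item.get("entities", []):
        if keyword_tags.intersection(entity.get("tags", [])):
            return entity["display"].upper()
    return None


lex_item = {
    "match": "nicht",
    "tags": ["unaryOperator"],
    "entities": [{"display": "not", "patterns": ["not", "nicht"], "tags": ["unaryOperator"]}],
}
print(keyword_for(lex_item, ["unaryOperator"]))  # NOT
```

Tags that are not listed in saga_keywords return None, so the token stays a plain term.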

Example Implementation

Use SagaQPLOptions instead of QPLOptions, and set saga_keywords to the list of tags whose matches should be normalized to QPL keywords.

Code Block

language	py
theme	DJango
title	Provide keyword tags

options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_keywords=['unaryOperator']
)

parser = SagaParser(options=options)

Get the Saga response by whatever method you prefer; the simplest is an HTTP request to the Saga Client API.

Code Block

title	Generate Saga Request

import requests
import json

# Send the query text to the Saga Client API for the given unit
response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer nicht headache'
}))

saga_response = response.json()

Parse the highestRoute of the Saga response and print the resulting qpl_tree.

Code Block

language	py
theme	DJango
title	Execute Query

qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

Info
The .pretty() method of the QPL tree shows a visual representation of how the query is formed.

As shown below, the tagged token "nicht" has been normalized to the NOT keyword. The first tree shows the query without keyword replacement, and the second shows the same query with keyword replacement applied.

Code Block

language	text
title	Query without keyword replacement

and
  term	cancer
  term	nicht
  term	headache

Code Block

language	text
title	Query with keyword replacement

and
  term	cancer
  not
    term	headache

Synonym Expansion

Synonym expansion uses Saga's entity extraction to expand each match with the patterns provided in its entities. To use this functionality, add the tags you want to expand to saga_synonyms; Saga QPL looks for LexItems with these tags and expands them.

Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost; the original token is left untouched.

In Saga, you need a tag whose match produces a LexItem like the one below. The important part is:

entities/patterns: where Saga QPL looks for the patterns to use as synonyms for the expansion

Code Block

language	js
theme	DJango
title	Example Synonym entity

{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "cancer",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{synonyms}",
  "startPos": 0,
  "endPos": 6,
  "metadata": {
    "display": "cancer",
    "id": "syn0000000141"
  },
  "entities": [
    {
      "display": "cancer",
      "patterns": [
        "malignancy",
        "cancer"
      ],
      "id": "syn0000000141",
      "fields": {},
      "tags": [
        "synonym",
        "synonyms"
      ]
    },
    {
      "display": "cancer",
      "patterns": [
        "363346000",
        "cancer",
        "cancers",
        "malignancies",
        "malignancy",
        "malignant growth",
        "malignant neoplasm",
        "malignant neoplasms",
        "malignant neoplastic disease",
        "malignant tumor",
        "malignant tumors",
        "neoplasm malignant",
        "neoplasm/cancer",
        "tumor, malignant"
      ],
      "id": "363346000",
      "tags": [
        "snomed",
        "synonyms"
      ]
    }
  ],
  "tags": [
    "snomed",
    "synonym",
    "synonyms"
  ]
}

Info
This is the lexical item obtained for the token "cancer" from the query "cancer not headache".
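The expansion rule can be sketched in a few lines. The hypothetical expand_synonyms function below (not PyQPL's code) works on a LexItem as a plain dict: it keeps the original match with a boost of 1, then adds every entity pattern once, in order, with the synonym boost. Treating patterns with more than one word token as phrases is an assumption inferred from the expanded tree shown later in this section.

```python
import re
from typing import Any, Dict, List, Tuple

# (kind, text, boost), where kind is "term" or "phrase", as in the pretty() output
Clause = Tuple[str, str, float]


def _kind(text: str) -> str:
    # Assumption: more than one word token ("malignant growth", "neoplasm/cancer") is a phrase
    return "phrase" if len(re.findall(r"\w+", text)) > 1 else "term"


def expand_synonyms(lex_item: Dict[str, Any], boost: float = 0.8) -> List[Clause]:
    """Hypothetical sketch of saga_synonyms expansion for one LexItem (not PyQPL's code)."""
    original = lex_item["match"]
    clauses: List[Clause] = [(_kind(original), original, 1.0)]  # original token keeps boost 1
    seen = {original}
    for entity in lex_item.get("entities", []):
        for pattern in entity.get("patterns", []):
            if pattern in seen:  # skip the original text and duplicates across entities
                continue
            seen.add(pattern)
            clauses.append((_kind(pattern), pattern, boost))
    return clauses


cancer = {
    "match": "cancer",
    "entities": [
        {"patterns": ["malignancy", "cancer"]},
        {"patterns": ["363346000", "cancer", "cancers", "malignancy", "malignant growth"]},
    ],
}
for clause in expand_synonyms(cancer):
    print(clause)
```

Run on the full LexItem above, this sketch yields the same clauses, in the same order, as the cancer branch of the expanded tree.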

Example Implementation

Use SagaQPLOptions instead of QPLOptions, and set saga_synonyms to the list of tags to expand as synonyms. Optionally, you can specify a custom boost for the synonyms.

Code Block

language	py
theme	DJango
title	Provide synonym tags

options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)

parser = SagaParser(options=options)

Get the Saga response by whatever method you prefer; the simplest is an HTTP request to the Saga Client API.

Code Block

title	Generate Saga Request

import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer not headache'
}))

saga_response = response.json()

Parse the highestRoute of the Saga response and print the resulting qpl_tree.

Code Block

language	py
theme	DJango
title	Execute Query

qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

Info
The .pretty() method of the QPL tree shows a visual representation of how the query is formed.

As shown below, the query has now been expanded with the synonyms found for each token. The first tree shows the original query without expansion, and the second shows the same query with synonym expansion applied.

Code Block

language	text
title	Query without synonym expansion

and
  term	cancer
  term	not
  term	headache

Code Block

language	text
title	Query with synonym expansion

and
  or
    boost
      term	cancer
      1
    boost
      term	malignancy
      0.8
    boost
      term	363346000
      0.8
    boost
      term	cancers
      0.8
    boost
      term	malignancies
      0.8
    boost
      phrase	"malignant growth"
      0.8
    boost
      phrase	"malignant neoplasm"
      0.8
    boost
      phrase	"malignant neoplasms"
      0.8
    boost
      phrase	"malignant neoplastic disease"
      0.8
    boost
      phrase	"malignant tumor"
      0.8
    boost
      phrase	"malignant tumors"
      0.8
    boost
      phrase	"neoplasm malignant"
      0.8
    boost
      phrase	"neoplasm/cancer"
      0.8
    boost
      phrase	"tumor, malignant"
      0.8
  term	not
  or
    boost
      term	headache
      1
    boost
      term	25064002
      0.8
    boost
      term	cephalalgia
      0.8
    boost
      term	cephalgia
      0.8
    boost
      term	cephalgias
      0.8
    boost
      phrase	"cranial pain"
      0.8
    boost
      phrase	"have headaches"
      0.8
    boost
      phrase	"head ache"
      0.8
    boost
      phrase	"head pain"
      0.8
    boost
      phrase	"head pain cephalgia"
      0.8
    boost
      phrase	"head pains"
      0.8
    boost
      term	headaches
      0.8
    boost
      phrase	"mild global headache"
      0.8
    boost
      phrase	"mild headache"
      0.8
    boost
      phrase	"pain head"
      0.8
    boost
      phrase	"pain in head"
      0.8
    boost
      phrase	"pain, head"
      0.8

Note
In this case, because keyword replacement is not enabled and "not" is not in upper case, it remains a term.

Special Cases

Sometimes there are cases too specific to be coded into PyQPL; this is where Saga special cases come into action. Saga identifies the case to be treated with a tag. The tagged text is then transformed into a string representation of a QPL query, which replaces the original content and is finally parsed by PyQPL.

The saga_special_case parameter accepts a dictionary with tag names as its keys; the value of each key is a callable that receives a LexItem and returns a string (the string representation of the query).

Example Implementation

For this example, consider a user who wants to search for "small cell", referring to small cell cancer. Because of how the query is matched, results such as "non small cell cancer" are also returned; these refer to a different type of cancer, which is the opposite of what the user wants. If the user typed "small cell -non", any result containing the word "non" would be removed, even when it has no relationship with "small cell". Instead, we want to exclude results where the token "non" appears near the token "small".

We will assume this LexItem is returned in the Saga response:

Code Block

language	py
theme	DJango

{
    "stage": "DictionaryTagger",
    "confidence": 1,
    "match": "small",
    "flags": [
        "ENTITY",
        "SEMANTIC_TAG"
    ],
    "text": "{specialCase}",
    "startPos": 0,
    "endPos": 5,
    "metadata": {
        "not": "non small",
        "display": "small",
        "id": "A0000"
    },
    "entities": [
        {
            "display": "small",
            "patterns": [
                "small"
            ],
            "id": "A0000",
            "fields": {
                "not": "non small"
            },
            "tags": [
                "specialCase"
            ]
        }
    ],
    "tags": [
        "specialCase"
    ]
}

Info
This is the lexical item obtained for the token "small" from the query "small cell". Note that this token includes metadata.

Start by creating the function that receives the LexItem and transforms it into a query string. As shown below, it uses the match and the information added in the metadata, but you can use anything available in the LexItem.

Code Block

language	py
theme	DJango

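def span_not(token: LexItem) -> str:
    # Only rewrite tokens that carry a "not" value in their metadata
    if token.metadata and 'not' in token.metadata:
        # "non small" -> "non NEAR small"
        exclude_value = ' NEAR '.join(token.metadata['not'].split())

        # e.g. "small SPAN_NOT (non NEAR small)"
        return f'{token.match} SPAN_NOT ({exclude_value})'
    else:
        # Any other token is returned unchanged
        return token.match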

Use SagaQPLOptions instead of QPLOptions, and set saga_special_case to a dictionary with the tags to handle as keys and the functions performing the transformation as values.

Code Block

title	Provide special cases

options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_special_case={
        'specialCase': span_not
    }
)

parser = SagaParser(options=options)

Get the Saga response by whatever method you prefer; the simplest is an HTTP request to the Saga Client API.

Code Block

title	Generate Saga Request

import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'small cell'
}))

saga_response = response.json()

Code Block

title	Execute Query

qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

Info
The .pretty() method of the QPL tree shows a visual representation of how the query is formed.

As shown below, the tagged text has now been transformed into the new structure, which is then parsed. The first tree shows the query without the special case, and the second shows the same query with the special case applied.

Code Block

title	Query without special case

and
  term	small
  term	cell

Code Block

title	Query with special case

and
  span_not
    term	small
    near
      term	non
      term	small
  term	cell

Info
This is equivalent to the user typing "small SPAN_NOT (non NEAR small) cell", but since regular users are unlikely to type that, this implementation works better.
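The sketch below reproduces the special-case substitution end to end without Saga. The LexItem dataclass is a hypothetical stand-in holding only the fields span_not reads, apply_special_cases is a hypothetical helper (not part of PyQPL), and joining tokens with spaces is a simplification; it does not show how SagaParser actually rebuilds the query text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LexItem:
    """Hypothetical stand-in for PyQPL's LexItem with only the fields used here."""
    match: str
    tags: List[str] = field(default_factory=list)
    metadata: Optional[Dict[str, str]] = None


def span_not(token: LexItem) -> str:
    # Same logic as the span_not function above
    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    return token.match


def apply_special_cases(tokens: List[LexItem],
                        special_cases: Dict[str, Callable[[LexItem], str]]) -> str:
    """Replace each tagged token with its callable's output and join the query text."""
    parts = []
    for token in tokens:
        handler = next((special_cases[t] for t in token.tags if t in special_cases), None)
        parts.append(handler(token) if handler else token.match)
    return ' '.join(parts)


query = apply_special_cases(
    [LexItem('small', ['specialCase'], {'not': 'non small'}), LexItem('cell')],
    {'specialCase': span_not},
)
print(query)  # small SPAN_NOT (non NEAR small) cell
```

The resulting string is the same text a user would have had to type by hand to get the special-case tree.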

