| Field | Type | Default | Description | Examples |
|---|---|---|---|---|
| implicit_operator | Literal['or', 'and'] | 'or' | Default operator to use when the relationship between two operands is ambiguous. Applicable only to the parser. | "or", "and" |
| fields | List[str], Dict[str, float], List[QPLField], or str | | Fields to use when matching terms, phrases, spans, etc. In the string formats you can add a boost by appending ^ and the boost amount, e.g. ^2 or ^0.5. | Given as a string, a list, a dictionary, or a list of QPLFields |
| date_fields | List[str], Dict[str, float], List[QPLField], or str | [] | Fields to use for date ranges. The string formats accept the same ^ boost syntax as fields, e.g. ^2 or ^0.5. | Given as a string, a list, a dictionary, or a list of QPLFields |
| range_fields | List[str], Dict[str, float], List[QPLField], or str | [] | Fields to use for ranges. The string formats accept the same ^ boost syntax as fields, e.g. ^2 or ^0.5. | Given as a string, a list, a dictionary, or a list of QPLFields |
| date_format | str | | Date format used to convert date values in the query. The format must be compatible with the engine-specific query. | |
| timezone | str | | Coordinated Universal Time (UTC) offset or IANA time zone used to convert date values in the query to UTC. The time zone must be compatible with the engine-specific query. | |
| slop_near | number | 10 | Slop value used for the NEAR operator. | |
| slop_before | number | 2 | Slop value used for the BEFORE operator. | |
| slop_adj | number | 0 | Slop value used for the ADJ operator. | |
| slop_span_not | number | 0 | Slop value used for the SPAN NOT operator. | |
| wildcard | bool | False | Use wildcard operators. Applicable only to the parser. | |
| grammar | file path or str | | File path to the grammar, or the raw grammar as a string, in case you need to parse text to QPL with custom operators. The QPL parser uses a LALR parser implemented with the Lark library; for more information see https://lark-parser.readthedocs.io/en/latest/grammar.html. We recommend consulting the development team before writing a new grammar. Applicable only to the parser. | For details on how to build a grammar, see "Grammar Composition" in the Lark documentation |
| custom_operators | Dict[str, Operand] | {} | Dictionary with the custom operator's type name as key and the class implementing its logic as value. All classes must inherit from Operator. Applicable only to the parser. | |
| synonyms_call | Callable | | Function returning the requested synonyms for a given string. Applicable only to the parser. | |
| saga_keywords | List[str] | [] | Saga tags to be normalized as QPL keywords. The tag's display value is used to replace the matched text. | ["tag1", "tag2"] |
| saga_synonyms | List[str] | [] | Saga tags marked as synonyms. For each entity in the matched tag, the entity's list of patterns is used as the synonyms to replace the matched text. | ["tag1", "tag2"] |
| saga_synonyms_boost | float | 0.8 | Boost applied to each synonym added to the query. | 0.8 |
| saga_special_case | Dict[str, Callable[[LexItem], str]] | {} | Dictionary with tags as keys; each key is assigned a function that receives a LexItem and transforms it into a suitable query statement. | def use_case(token: LexItem) -> str: ... |
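The `^` boost notation accepted by the string forms of `fields`, `date_fields`, and `range_fields` can be sketched as a small parser. This is a hypothetical helper for illustration only; it is not part of PyQPL:

```python
# Hypothetical helper illustrating the "field^boost" string notation
# described above; not part of the PyQPL API.
def parse_boosted_field(spec: str) -> tuple[str, float]:
    """Split 'title^2' into ('title', 2.0); a bare name gets boost 1.0."""
    name, sep, boost = spec.partition('^')
    return name, float(boost) if sep else 1.0

print(parse_boosted_field('title^2'))   # ('title', 2.0)
print(parse_boosted_field('content'))   # ('content', 1.0)
```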
Keyword replacement can be used as an alternative to manipulating the grammar. With saga_keywords you can specify tags that will be normalized to their display value, which should be a QPL keyword. That way you can transform tokens such as "and", "y", "und", "et", "&&" into the keyword AND, or "not", "no", "nicht", "pas", "!" into the keyword NOT.
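The normalization idea can be sketched with a plain lookup table. This is illustrative only; in PyQPL the mapping comes from Saga tags and their display values, not a hard-coded dictionary:

```python
# Illustrative sketch: map surface tokens to QPL keywords via a
# display-value lookup (in PyQPL this mapping comes from Saga tags).
KEYWORD_DISPLAYS = {
    'and': 'AND', 'y': 'AND', 'und': 'AND', 'et': 'AND', '&&': 'AND',
    'not': 'NOT', 'no': 'NOT', 'nicht': 'NOT', 'pas': 'NOT', '!': 'NOT',
}

def normalize_keywords(tokens: list[str]) -> list[str]:
    # Displays are upper-cased, mirroring how saga_keywords treats them.
    return [KEYWORD_DISPLAYS.get(t.lower(), t) for t in tokens]

print(normalize_keywords(['cancer', 'nicht', 'headache']))
# ['cancer', 'NOT', 'headache']
```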
In Saga you must have a tag that, once matched, produces a result like the one below. The important bits are:

- display: stored in the entities, this is the keyword Saga QPL will use
- it doesn't matter if the display is in lower case; since the tag is used in saga_keywords, all displays are transformed to upper case
```json
{
  "stage": "DictionaryTagger",
  "confidence": 0.5,
  "match": "nicht",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{unaryOperator}",
  "startPos": 7,
  "endPos": 10,
  "metadata": {
    "display": "not",
    "id": "A0006"
  },
  "entities": [
    {
      "display": "not",
      "patterns": [
        "not",
        "non",
        "nicht",
        "no",
        "pass"
      ],
      "id": "A0006",
      "fields": {},
      "tags": [
        "unaryOperator"
      ]
    }
  ],
  "tags": [
    "unaryOperator"
  ]
}
```
This is the lexical item obtained for the token "nicht" from the query "cancer nicht headache".
Use SagaQPLOptions instead of QPLOptions, and assign saga_keywords the list of tags to normalize as QPL keywords:
```python
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_keywords=['unaryOperator']
)
parser = SagaParser(options=options)
```
Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API:
```python
import requests
import json

response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({'unit': 'unit_name', 'doc': 'cancer nicht headache'})
)
saga_response = response.json()
```
Parse the entire Saga response and print the qpl_tree:
```python
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
```
The .pretty() function of the QPL tree shows a visual representation of how the query is formed. The tagged token has been replaced with its keyword. Without keyword replacement the query would parse as:

```
and
  term cancer
  term nicht
  term headache
```

With keyword replacement applied:

```
and
  term cancer
  not
    term headache
```
Synonym expansion with Saga uses Saga's entity extraction to expand matches with the patterns provided in each entity. To make use of this functionality, add the tags you want to expand to saga_synonyms; Saga will look for LexItems with these tags and expand them. Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost; the original token is left untouched.

In Saga you must have a tag that, once matched, produces a result like the one below. This is the lexical item obtained for the token "cancer" from the query "cancer not headache":

```json
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "cancer",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{synonyms}",
  "startPos": 0,
  "endPos": 6,
  "metadata": {
    "display": "cancer",
    "id": "syn0000000141"
  },
  "entities": [
    {
      "display": "cancer",
      "patterns": [
        "malignancy",
        "cancer"
      ],
      "id": "syn0000000141",
      "fields": {},
      "tags": [
        "synonym",
        "synonyms"
      ]
    },
    {
      "display": "cancer",
      "patterns": [
        "363346000",
        "cancer",
        "cancers",
        "malignancies",
        "malignancy",
        "malignant growth",
        "malignant neoplasm",
        "malignant neoplasms",
        "malignant neoplastic disease",
        "malignant tumor",
        "malignant tumors",
        "neoplasm malignant",
        "neoplasm/cancer",
        "tumor, malignant"
      ],
      "id": "363346000",
      "tags": [
        "snomed",
        "synonyms"
      ]
    }
  ],
  "tags": [
    "snomed",
    "synonym",
    "synonyms"
  ]
}
```

Use SagaQPLOptions instead of QPLOptions, and assign saga_synonyms the list of tags to expand as synonyms; optionally, specify a custom boost for the synonyms.
Example Implementation
```python
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)
parser = SagaParser(options=options)
```
Get the Saga response with whatever method you see fit; the simplest is to make an HTTP request to the Saga Client API:

```python
import requests
import json

response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({'unit': 'unit_name', 'doc': 'cancer not headache'})
)
saga_response = response.json()
```
Parse the entire Saga response and print the qpl_tree; the .pretty() function shows a visual representation of how the query is formed:

```python
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
```
Each matched token is expanded with the synonyms found. Without expansion the query would parse as:

```
and
  term cancer
  term not
  term headache
```

With synonym expansion applied:

```
and
  or
    boost term cancer 1
    boost term malignancy 0.8
    boost term 363346000 0.8
    boost term cancers 0.8
    boost term malignancies 0.8
    boost phrase "malignant growth" 0.8
    boost phrase "malignant neoplasm" 0.8
    boost phrase "malignant neoplasms" 0.8
    boost phrase "malignant neoplastic disease" 0.8
    boost phrase "malignant tumor" 0.8
    boost phrase "malignant tumors" 0.8
    boost phrase "neoplasm malignant" 0.8
    boost phrase "neoplasm/cancer" 0.8
    boost phrase "tumor, malignant" 0.8
  term not
  or
    boost term headache 1
    boost term 25064002 0.8
    boost term cephalalgia 0.8
    boost term cephalgia 0.8
    boost term cephalgias 0.8
    boost phrase "cranial pain" 0.8
    boost phrase "have headaches" 0.8
    boost phrase "head ache" 0.8
    boost phrase "head pain" 0.8
    boost phrase "head pain cephalgia" 0.8
    boost phrase "head pains" 0.8
    boost term headaches 0.8
    boost phrase "mild global headache" 0.8
    boost phrase "mild headache" 0.8
    boost phrase "pain head" 0.8
    boost phrase "pain in head" 0.8
    boost phrase "pain, head" 0.8
```
In this case, since we are not doing keyword replacement and "not" is not in upper case, it remains as a term.
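The expansion logic can be sketched as a function that rewrites one token into an OR of boosted alternatives. This is a simplified illustration using a query-string form; PyQPL builds an actual query tree, and the function name here is hypothetical:

```python
# Simplified sketch of synonym expansion: rewrite one token into an
# OR of boosted alternatives, leaving the original token untouched.
def expand_token(token: str, patterns: list[str], boost: float = 0.8) -> str:
    parts = [f'{token}^1']  # the original token keeps full weight
    for p in patterns:
        if p == token:
            continue  # skip the pattern that duplicates the original
        alt = f'"{p}"' if ' ' in p else p  # multi-word patterns become phrases
        parts.append(f'{alt}^{boost}')
    return '(' + ' OR '.join(parts) + ')'

print(expand_token('cancer', ['malignancy', 'cancer', 'malignant tumor']))
# (cancer^1 OR malignancy^0.8 OR "malignant tumor"^0.8)
```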
Sometimes a case is too specific to be coded into PyQPL; this is where Saga special cases come into action. Saga identifies the case with a tag; the tag is then transformed into a string representation of a QPL query, replacing the original content, and finally parsed by PyQPL.
The saga_special_case parameter accepts a dictionary with tag names as its keys; the value of each key is a callable that receives a LexItem and returns a string (the string representation of the query).
For this example, consider a scenario where the user searches for "small cell", referring to cases of cancer. Due to the nature of the query, results like "non small cell cancer" are returned, which refer to non-cancer cases, the opposite of what the user wants. If the user types "small cancer -non", any result containing the word "non" would be removed, even where it has no relationship to "small cell". Instead, we want results where the token "non" is not near the token "small".
We will assume this LexItem is returned in the Saga response:
```json
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "small",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{specialCase}",
  "startPos": 0,
  "endPos": 5,
  "metadata": {
    "not": "non small",
    "display": "small",
    "id": "A0000"
  },
  "entities": [
    {
      "display": "small",
      "patterns": [
        "small"
      ],
      "id": "A0000",
      "fields": {
        "not": "non small"
      },
      "tags": [
        "specialCase"
      ]
    }
  ],
  "tags": [
    "specialCase"
  ]
}
```
This is the lexical item obtained for the token "small" from the query "small cell". Notice this token has metadata included.
We start by creating the function that will receive the LexItem and transform it into a string query. As you see below, we make use of the information added in the metadata and the match, but you can use anything available in the LexItem:
```python
def span_not(token: LexItem) -> str:
    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    else:
        return token.match
```
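As a quick sanity check, the transformation can be exercised with a minimal stand-in for LexItem. The dataclass below is hypothetical, just enough to run the function outside a live Saga instance; the real LexItem comes from Saga and carries more fields:

```python
from dataclasses import dataclass, field

# Minimal stand-in for Saga's LexItem (hypothetical; only the fields
# span_not reads), so the transformation can be tested in isolation.
@dataclass
class LexItem:
    match: str
    metadata: dict = field(default_factory=dict)

def span_not(token: LexItem) -> str:
    # Same logic as above, repeated so this snippet is self-contained.
    if token.metadata and 'not' in token.metadata:
        exclude_value = ' NEAR '.join(token.metadata['not'].split())
        return f'{token.match} SPAN_NOT ({exclude_value})'
    return token.match

print(span_not(LexItem('small', {'not': 'non small'})))
# small SPAN_NOT (non NEAR small)
print(span_not(LexItem('cell')))
# cell
```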
Use SagaQPLOptions instead of QPLOptions, and assign saga_special_case a dictionary with the tags we want to handle as keys, and the functions making the transformation as values:
```python
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_special_case={'specialCase': span_not}
)
parser = SagaParser(options=options)
```
```python
import requests
import json

response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({'unit': 'unit_name', 'doc': 'small cell'})
)
saga_response = response.json()
```
```python
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())
```
The .pretty() function of the QPL tree shows a visual representation of how the query is formed. The tagged text has been transformed into the new structure, which is then parsed. Without the special case the query would parse as:

```
and
  term small
  term cell
```

With the special case applied:

```
and
  span_not
    term small
    near
      term non
      term small
  term cell
```
This is equivalent to the user typing "small SPAN_NOT (non NEAR small) cell", but since no regular user is going to type that, this implementation works best.