PyQPL offers two options for implementing synonym expansion: an embedded one, which uses a function to provide the synonyms for a token, and a second one based on Saga, which translates the matches into QPL string representations.
The embedded synonym expansion feature uses a function set on QPLOptions called synonyms_call. This function receives a string (either a token or a phrase) and returns either a list of strings (tokens and/or phrases) or None if no synonyms are found.
You are free to define synonyms_call however you like, as long as it adheres to this signature. It can read from a document or a database, call a third-party API, or look the synonyms up in a manually maintained list.
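As a minimal sketch of a file-backed lookup, the function below builds a callable with the signature described above from a JSON file. The helper name `make_synonyms_call` and the file layout (an object mapping each text to its list of synonyms) are assumptions made for illustration; neither is part of PyQPL.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional


def make_synonyms_call(path: str) -> Callable[[str], Optional[List[str]]]:
    """Build a lookup function from a JSON file (hypothetical helper, not part of PyQPL).

    Assumed file layout: {"text": ["synonym", ...], ...}
    """
    with open(path, encoding="utf-8") as handle:
        table = json.load(handle)

    def synonyms_call(text: str) -> Optional[List[str]]:
        # Return the list of synonyms, or None when the text is unknown.
        return table.get(text)

    return synonyms_call


# Example: write a tiny synonym file, load it, then look two tokens up.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "synonyms.json")
    with open(sample_path, "w", encoding="utf-8") as handle:
        json.dump({"headache": ["headache", "cephalalgia", '"head pain"']}, handle)
    lookup = make_synonyms_call(sample_path)

print(lookup("headache"))
print(lookup("fever"))
```

The returned function can be assigned to `synonyms_call` in the same way as any other callable with this signature.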
First, you need a source from which to look up the synonyms. The simplest way is to provide a dictionary in which each key is the text to look for and each value is the list of synonyms.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
synonyms = {
    'cancer': [
        'cancer', 'malignancy', '363346000', 'cancers', 'malignancies',
        '"malignant growth"', '"malignant neoplasm"', '"malignant neoplasms"',
        '"malignant neoplastic disease"', '"malignant tumor"', '"malignant tumors"',
        '"neoplasm malignant"', '"neoplasm/cancer"', '"tumor, malignant"',
    ],
    'headache': [
        'headache', '25064002', 'cephalalgia', 'cephalgia', 'cephalgias',
        '"cranial pain"', '"have headaches"', '"head ache"', '"head pain"',
        '"head pain cephalgia"', '"head pains"', 'headaches',
        '"mild global headache"', '"mild headache"', '"pain head"',
        '"pain in head"', '"pain, head"',
    ],
} |
Then create a function and assign it to synonyms_call. For the simplest example, we declare a lambda function that receives one argument (the string) and uses it as the key into the dictionary, returning None by default.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
options = QPLOptions(fields='content', implicit_operator='or')
# Look the text up in the dictionary; None is returned when there is no entry
options.synonyms_call = lambda x: synonyms.get(x, None)
parser = QPLParser(options=options) |
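A lambda is enough for an exact-match lookup. If queries may arrive with different casing or stray whitespace, a named function can normalize the text first. The sketch below is plain Python; `normalized_synonyms` and `sample_synonyms` are hypothetical names, and whether PyQPL already normalizes the text before calling the function is not stated here, so treat the normalization step as an assumption.

```python
from typing import List, Optional

# A shortened copy of the dictionary, for illustration only.
sample_synonyms = {
    'cancer': ['cancer', 'malignancy', '"malignant tumor"'],
    'headache': ['headache', 'cephalalgia', '"head pain"'],
}


def normalized_synonyms(text: str) -> Optional[List[str]]:
    """Look up synonyms after trimming whitespace and lowercasing the text."""
    key = text.strip().lower()
    # dict.get returns None when the key is missing, matching the signature.
    return sample_synonyms.get(key)


print(normalized_synonyms(' Cancer '))
print(normalized_synonyms('fever'))
```

Such a function is assigned exactly like the lambda, for example `options.synonyms_call = normalized_synonyms`.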
Parse the query and print the resulting qpl_tree.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
qpl_tree = parser._parse(data='cancer NOT headache')
print(qpl_tree.pretty()) |
Info |
---|
The .pretty() function of the QPL tree shows a visual representation of how the query is formed |
As shown below, the query has now been expanded with the synonyms found for each token. The first block shows how the original query would look without expansion, and the second shows the same query with synonym expansion applied.
Code Block | ||||
---|---|---|---|---|
| ||||
or
  term cancer
  not
    term headache |
Code Block | ||||
---|---|---|---|---|
| ||||
or
  phrase "malignant growth"
  phrase "malignant neoplasm"
  phrase "malignant neoplasms"
  phrase "malignant neoplastic disease"
  phrase "malignant tumor"
  phrase "malignant tumors"
  phrase "neoplasm malignant"
  phrase "neoplasm/cancer"
  phrase "tumor, malignant"
  term 363346000
  term cancer
  term cancers
  term malignancies
  term malignancy
  not
    or
      phrase "cranial pain"
      phrase "have headaches"
      phrase "head ache"
      phrase "head pain cephalgia"
      phrase "head pain"
      phrase "head pains"
      phrase "mild global headache"
      phrase "mild headache"
      phrase "pain head"
      phrase "pain in head"
      phrase "pain, head"
      term 25064002
      term cephalalgia
      term cephalgia
      term cephalgias
      term headache
      term headaches |
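In the output above, synonyms written with surrounding double quotes in the dictionary appear as phrase nodes, while unquoted ones appear as term nodes. Assuming that convention holds in general (it is inferred from this example only), a small helper can add the quotes to multi-word synonyms when the dictionary is built. `quote_phrases` is a hypothetical name, not a PyQPL function, and it only handles whitespace; entries such as "neoplasm/cancer", which are quoted for other reasons, still need to be quoted by hand.

```python
from typing import Iterable, List


def quote_phrases(values: Iterable[str]) -> List[str]:
    """Wrap multi-word synonyms in double quotes; leave everything else as is."""
    quoted = []
    for value in values:
        text = value.strip()
        already_quoted = len(text) >= 2 and text.startswith('"') and text.endswith('"')
        has_whitespace = any(ch.isspace() for ch in text)
        if has_whitespace and not already_quoted:
            text = f'"{text}"'
        quoted.append(text)
    return quoted


print(quote_phrases(['headache', 'head pain', '"cranial pain"', '25064002']))
```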
Synonym expansion with Saga uses Saga's entity extraction to expand the matches with the patterns provided in each entity. To use this functionality, add the tags you want to expand to saga_synonyms; Saga will look for LexItems with these tags and expand them.
Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost; the original token is left untouched.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)
parser = SagaParser(options=options) |
Get the Saga response with whatever method you see fit. The simplest is to make an HTTP request to the Saga Client API.
Code Block |
---|
import json
import requests
# Send the query text to the Saga Client API ('unit_name' is a placeholder)
response = requests.get(
    'http://localhost:8080/saga/api/client/process/text',
    data=json.dumps({
        'unit': 'unit_name',
        'doc': 'cancer not headache'
    })
)
saga_response = response.json() |
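Where adding the `requests` dependency is not desirable, the same call can be sketched with the standard library. The helper names are hypothetical. The endpoint path, the payload keys (`unit`, `doc`) and the GET method are copied from the example above; the `Content-Type` header and the timeout are assumptions that should be checked against your Saga server.

```python
import json
import urllib.request


def build_saga_request(text, unit, base_url='http://localhost:8080'):
    """Prepare the request without sending it (hypothetical helper)."""
    payload = json.dumps({'unit': unit, 'doc': text}).encode('utf-8')
    return urllib.request.Request(
        url=base_url + '/saga/api/client/process/text',
        data=payload,
        headers={'Content-Type': 'application/json'},  # assumption
        method='GET',  # mirrors requests.get(...) in the example above
    )


def fetch_saga_response(request, timeout=10.0):
    """Send the request and decode the JSON body (needs a running Saga server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Building the request does not touch the network, so it is safe to inspect.
request = build_saga_request('cancer not headache', unit='unit_name')
print(request.get_method(), request.full_url)
```

With a Saga server running, `saga_response = fetch_saga_response(request)` would play the role of `response.json()` in the example above.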
Parse the entire Saga response and print the qpl_tree.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty()) |
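Because the parser is given `saga_response['highestRoute']`, a response without that key would surface as a bare `KeyError`. A small guard can produce a clearer message instead. This is only a sketch: `extract_highest_route` is a hypothetical helper, and nothing is assumed about the shape of the route beyond the key name used above.

```python
def extract_highest_route(saga_response):
    """Return the 'highestRoute' entry, failing with a descriptive error if absent."""
    if not isinstance(saga_response, dict):
        raise TypeError(
            f'expected a dict decoded from JSON, got {type(saga_response).__name__}'
        )
    if 'highestRoute' not in saga_response:
        raise ValueError(
            f"Saga response has no 'highestRoute' key; keys found: {sorted(saga_response)}"
        )
    return saga_response['highestRoute']


# Placeholder payload; a real Saga response carries actual route data.
sample_response = {'highestRoute': ['placeholder']}
print(extract_highest_route(sample_response))
```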
Info |
---|
The .pretty() function of the QPL tree shows a visual representation of how the query is formed |
As shown below, the query has now been expanded with the synonyms found for each token. The first block shows how the original query would look without expansion, and the second shows the same query with synonym expansion applied.
Code Block | ||||
---|---|---|---|---|
| ||||
and
  term cancer
  term not
  term headache |
Code Block | ||||
---|---|---|---|---|
| ||||
and
  or
    boost term cancer 1
    boost term malignancy 0.8
    boost term 363346000 0.8
    boost term cancers 0.8
    boost term malignancies 0.8
    boost phrase "malignant growth" 0.8
    boost phrase "malignant neoplasm" 0.8
    boost phrase "malignant neoplasms" 0.8
    boost phrase "malignant neoplastic disease" 0.8
    boost phrase "malignant tumor" 0.8
    boost phrase "malignant tumors" 0.8
    boost phrase "neoplasm malignant" 0.8
    boost phrase "neoplasm/cancer" 0.8
    boost phrase "tumor, malignant" 0.8
  term not
  or
    boost term headache 1
    boost term 25064002 0.8
    boost term cephalalgia 0.8
    boost term cephalgia 0.8
    boost term cephalgias 0.8
    boost phrase "cranial pain" 0.8
    boost phrase "have headaches" 0.8
    boost phrase "head ache" 0.8
    boost phrase "head pain" 0.8
    boost phrase "head pain cephalgia" 0.8
    boost phrase "head pains" 0.8
    boost term headaches 0.8
    boost phrase "mild global headache" 0.8
    boost phrase "mild headache" 0.8
    boost phrase "pain head" 0.8
    boost phrase "pain in head" 0.8
    boost phrase "pain, head" 0.8 |
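The boosts in the expanded tree follow a simple pattern: the original token keeps a boost of 1 and every synonym receives the value of saga_synonyms_boost. The sketch below reproduces that pattern in plain Python to make it explicit. `boosted_alternatives` is a hypothetical helper written for illustration; it is not how SagaParser is implemented.

```python
from typing import List, Tuple, Union

Number = Union[int, float]


def boosted_alternatives(
    original: str, synonyms: List[str], synonyms_boost: float = 0.8
) -> List[Tuple[str, Number]]:
    """Pair the original token with boost 1 and each other synonym with the custom boost."""
    pairs: List[Tuple[str, Number]] = [(original, 1)]
    for synonym in synonyms:
        # The original token is left untouched, so it is not boosted a second time.
        if synonym != original:
            pairs.append((synonym, synonyms_boost))
    return pairs


print(boosted_alternatives('cancer', ['cancer', 'malignancy', '363346000']))
```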
Note |
---|
You may have noticed that in this example "not" is treated as a term rather than as the keyword NOT. This is for two reasons:
|