PyQPL offers two options for implementing synonym expansion: an embedded one, which uses a function to provide the synonyms for each token, and one based on Saga, which translates the entity matches into QPL string representations.
The embedded synonym expansion feature uses a function supplied through QPLOptions called synonyms_call. This function receives a string (either a token or a phrase) and returns either a list of strings (tokens and/or phrases) or None if no synonyms are found.
You are free to define synonyms_call however you like, as long as it adheres to this signature. It can read from a document or a database, call a third-party API, or look the synonyms up in a manually maintained list.
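As one illustration of the options above, here is a minimal sketch of a file-backed lookup. The helper name `make_synonyms_call` and the JSON layout (an object mapping each token or phrase to a list of synonyms) are assumptions made for this example and are not part of PyQPL; only the signature (a string in, a list of strings or None out) comes from the description above.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional


def make_synonyms_call(path: str) -> Callable[[str], Optional[List[str]]]:
    """Build a lookup function backed by a JSON file (hypothetical helper)."""
    # Assumed file layout: {"cancer": ["cancer", "malignancy"], ...}
    table: Dict[str, List[str]] = json.loads(Path(path).read_text(encoding="utf-8"))

    def synonyms_call(text: str) -> Optional[List[str]]:
        # Return the synonym list, or None when the text has no entry.
        return table.get(text)

    return synonyms_call
```

The file is read once when the function is built, so each lookup is a plain dictionary access. The returned function could then be assigned to options.synonyms_call.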
First, you need a source from which to look up the synonyms. The simplest approach is a dictionary in which each key is the text to look for and each value is the list of synonyms:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
synonyms = {
'cancer': ['cancer', 'malignancy', '363346000', 'cancers', 'malignancies', '"malignant growth"',
'"malignant neoplasm"', '"malignant neoplasms"', '"malignant neoplastic disease"',
'"malignant tumor"', '"malignant tumors"', '"neoplasm malignant"', '"neoplasm/cancer"',
'"tumor, malignant"'],
'headache': ['headache', '25064002', 'cephalalgia', 'cephalgia', 'cephalgias', '"cranial pain"',
'"have headaches"', '"head ache"', '"head pain"', '"head pain cephalgia"', '"head pains"',
'headaches', '"mild global headache"', '"mild headache"', '"pain head"', '"pain in head"',
'"pain, head"']
} |
Then create a function and assign it to synonyms_call. In the simplest case, we declare a lambda function that receives one argument (the string) and uses it as the key into the dictionary, returning None by default:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
options = QPLOptions(fields='content', implicit_operator='or')
options.synonyms_call = lambda x: synonyms.get(x, None)
parser = QPLParser(options=options) |
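The lambda above only matches keys exactly as they are written in the dictionary. If the lookup should ignore case and surrounding whitespace, a named function can take its place. This is a sketch under that assumption: `lookup_synonyms` and the shortened `synonyms` dictionary are illustrative, and whether PyQPL normalizes tokens before calling the function is not covered here.

```python
from typing import List, Optional

# Shortened version of the dictionary used in this section.
synonyms = {
    'cancer': ['cancer', 'malignancy', '"malignant tumor"'],
    'headache': ['headache', 'cephalalgia', '"head pain"'],
}


def lookup_synonyms(text: str) -> Optional[List[str]]:
    """Return the synonyms for text, ignoring case and outer whitespace."""
    key = text.strip().lower()
    # dict.get already returns None when the key is missing.
    return synonyms.get(key)
```

It would be assigned in the same way as the lambda: options.synonyms_call = lookup_synonyms.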
Parse the query and print the resulting qpl_tree:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
qpl_tree = parser._parse(data='cancer NOT headache')
print(qpl_tree.pretty()) |
Info |
---|
The .pretty() method of the QPL tree shows a visual representation of how the query is formed |
As shown below, the query has now been expanded with the synonyms found for each token. The first block shows how the original query would look without expansion, and the second shows the same query with synonym expansion applied.
Code Block | ||||
---|---|---|---|---|
| ||||
or
term cancer
not
term headache |
Code Block | ||||
---|---|---|---|---|
| ||||
or
phrase "malignant growth"
phrase "malignant neoplasm"
phrase "malignant neoplasms"
phrase "malignant neoplastic disease"
phrase "malignant tumor"
phrase "malignant tumors"
phrase "neoplasm malignant"
phrase "neoplasm/cancer"
phrase "tumor, malignant"
term 363346000
term cancer
term cancers
term malignancies
term malignancy
not
or
phrase "cranial pain"
phrase "have headaches"
phrase "head ache"
phrase "head pain cephalgia"
phrase "head pain"
phrase "head pains"
phrase "mild global headache"
phrase "mild headache"
phrase "pain head"
phrase "pain in head"
phrase "pain, head"
term 25064002
term cephalalgia
term cephalgia
term cephalgias
term headache
term headaches |
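Judging from the expanded tree above, synonyms wrapped in double quotes become phrase nodes, the rest become term nodes, duplicates collapse, and the children appear in plain string sort order (quoted phrases first, then digits, then letters). The sketch below reproduces that observation with a hypothetical `classify_synonyms` helper. It is inferred from the output shown here and is not PyQPL's actual implementation.

```python
from typing import List, Tuple


def classify_synonyms(synonyms: List[str]) -> List[Tuple[str, str]]:
    """Label each unique synonym as a 'phrase' or a 'term' (illustrative only)."""
    labelled = []
    # set() drops duplicates; sorted() puts '"' before digits before letters.
    for text in sorted(set(synonyms)):
        is_phrase = len(text) >= 2 and text.startswith('"') and text.endswith('"')
        labelled.append(('phrase' if is_phrase else 'term', text))
    return labelled


sample = ['cancer', 'malignancy', '363346000', '"malignant growth"', 'cancer']
for kind, text in classify_synonyms(sample):
    print(kind, text)
```

This also shows why a multi-word synonym should be quoted inside the list returned by synonyms_call: without the quotes it would not be recognizable as a phrase.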
Synonym expansion with Saga uses Saga's entity extraction to expand the matches with the patterns provided in each entity.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
options = SagaQPLOptions(
implicit_operator='and',
fields={'content': 6, 'title': 3},
saga_synonyms=['synonyms']
)
parser = SagaParser(options=options) |
Code Block | ||||
---|---|---|---|---|
| ||||
and
term cancer
term not
term headache |
Code Block | ||||
---|---|---|---|---|
| ||||
and
or
boost
term cancer
1
boost
term malignancy
0.8
boost
term 363346000
0.8
boost
term cancers
0.8
boost
term malignancies
0.8
boost
phrase "malignant growth"
0.8
boost
phrase "malignant neoplasm"
0.8
boost
phrase "malignant neoplasms"
0.8
boost
phrase "malignant neoplastic disease"
0.8
boost
phrase "malignant tumor"
0.8
boost
phrase "malignant tumors"
0.8
boost
phrase "neoplasm malignant"
0.8
boost
phrase "neoplasm/cancer"
0.8
boost
phrase "tumor, malignant"
0.8
term not
or
boost
term headache
1
boost
term 25064002
0.8
boost
term cephalalgia
0.8
boost
term cephalgia
0.8
boost
term cephalgias
0.8
boost
phrase "cranial pain"
0.8
boost
phrase "have headaches"
0.8
boost
phrase "head ache"
0.8
boost
phrase "head pain"
0.8
boost
phrase "head pain cephalgia"
0.8
boost
phrase "head pains"
0.8
boost
term headaches
0.8
boost
phrase "mild global headache"
0.8
boost
phrase "mild headache"
0.8
boost
phrase "pain head"
0.8
boost
phrase "pain in head"
0.8
boost
phrase "pain, head"
0.8 |
Note |
---|
You may have noticed that in this example not is treated as a term rather than as the keyword NOT. There are two reasons for this:
|