PyQPL offers two options for implementing synonym expansion: an embedded one, which uses a function that provides the synonyms for a token, and one based on Saga, which translates the matches into string representations of QPL.

Embedded Synonym Expansion

The embedded synonym expansion feature uses a function set on QPLOptions called synonyms_call. This function receives a string (either a token or a phrase) and returns either a list of strings (tokens and/or phrases) or None if no synonyms are found.

You are free to define synonyms_call however you like, as long as it adheres to this signature. It can read from a document, query a database, call a third-party API, or look the synonyms up in a manually maintained list.
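As a minimal sketch of a function with that shape (string in, list of strings or None out), the factory below wraps any dictionary-like source. The names make_synonyms_call and load_synonyms, the JSON file layout, and the case-insensitive matching are assumptions made for illustration and are not part of PyQPL.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional


def load_synonyms(path: Path) -> Dict[str, List[str]]:
    """Read a JSON object that maps each token or phrase to its list of synonyms."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def make_synonyms_call(source: Dict[str, List[str]]) -> Callable[[str], Optional[List[str]]]:
    """Build a lookup with the shape described above: str -> list of str, or None."""
    # Lowercase the keys once so lookups ignore case.
    # This is an illustrative choice, not something PyQPL requires.
    normalized = {key.lower(): values for key, values in source.items()}

    def synonyms_call(text: str) -> Optional[List[str]]:
        # dict.get returns None when the text has no entry.
        return normalized.get(text.strip().lower())

    return synonyms_call
```

The returned function could then be assigned to synonyms_call on the options object, for example with a source loaded from a hypothetical synonyms.json file.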

Example Implementation

First, you need a source in which to look up the synonyms. The simplest is a dictionary in which each key is the text to look for and each value is the list of synonyms.

Code Block

language	py
theme	DJango
title	Provided source for synonyms

synonyms = {
    'cancer': ['cancer', 'malignancy', '363346000', 'cancers', 'malignancies', '"malignant growth"',
               '"malignant neoplasm"', '"malignant neoplasms"', '"malignant neoplastic disease"',
               '"malignant tumor"', '"malignant tumors"', '"neoplasm malignant"', '"neoplasm/cancer"',
               '"tumor, malignant"'],
    'headache': ['headache', '25064002', 'cephalalgia', 'cephalgia', 'cephalgias', '"cranial pain"',
                 '"have headaches"', '"head ache"', '"head pain"', '"head pain cephalgia"', '"head pains"',
                 'headaches', '"mild global headache"', '"mild headache"', '"pain head"', '"pain in head"',
                 '"pain, head"']
}

Then create a function and assign it to synonyms_call. In this simplest example, a lambda function receives one argument (the string) and uses it as the key into the dictionary, returning None by default.

Code Block

language	py
theme	DJango
title	QPLOptions initialization and synonyms_call assignment

options = QPLOptions(fields='content', implicit_operator='or')
options.synonyms_call = lambda x: synonyms.get(x, None)

parser = QPLParser(options=options)

Parse the query and print the qpl_tree.

Code Block

language	py
theme	DJango
title	Execute Query

qpl_tree = parser._parse(data='cancer NOT headache')
print(qpl_tree.pretty())

Info
The .pretty() function of the QPL tree shows a visual representation of how the query is formed.

As shown below, the query has now been expanded with the synonyms found for each token. The first block shows the original query without expansion, and the second shows the same query with synonym expansion applied.

Code Block

language	text
title	Query without synonym expansion

or
  term	cancer
  not
    term	headache

Code Block

language	text
title	Query with synonym expansion

or
  phrase	"malignant growth"
  phrase	"malignant neoplasm"
  phrase	"malignant neoplasms"
  phrase	"malignant neoplastic disease"
  phrase	"malignant tumor"
  phrase	"malignant tumors"
  phrase	"neoplasm malignant"
  phrase	"neoplasm/cancer"
  phrase	"tumor, malignant"
  term	363346000
  term	cancer
  term	cancers
  term	malignancies
  term	malignancy
  not
    or
      phrase	"cranial pain"
      phrase	"have headaches"
      phrase	"head ache"
      phrase	"head pain cephalgia"
      phrase	"head pain"
      phrase	"head pains"
      phrase	"mild global headache"
      phrase	"mild headache"
      phrase	"pain head"
      phrase	"pain in head"
      phrase	"pain, head"
      term	25064002
      term	cephalalgia
      term	cephalgia
      term	cephalgias
      term	headache
      term	headaches
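The ordering in the expanded tree above is consistent with a plain lexicographic sort of the synonym strings, with double-quoted entries rendered as phrase nodes and the rest as term nodes. The helper below is a hypothetical sketch for previewing how a synonym list would be classified under that assumption. describe_synonyms is not part of PyQPL, and the library's actual ordering logic may differ.

```python
from typing import List, Tuple


def describe_synonyms(synonyms: List[str]) -> List[Tuple[str, str]]:
    """Classify each synonym as 'phrase' (double-quoted) or 'term', in sorted order."""
    described = []
    # set() drops duplicates; quoted strings sort first because '"' precedes
    # digits and letters in code-point order.
    for text in sorted(set(synonyms)):
        is_phrase = len(text) >= 2 and text.startswith('"') and text.endswith('"')
        described.append(("phrase" if is_phrase else "term", text))
    return described


# Preview a shortened version of the 'cancer' entry from the dictionary above.
for kind, text in describe_synonyms(['cancer', 'malignancy', '363346000', '"malignant tumor"']):
    print(f"{kind}\t{text}")
```

Running this preview on a synonym list before parsing is a quick way to spot entries that were meant to be phrases but are missing their double quotes.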

Saga Synonym Expansion

Synonym expansion with Saga uses Saga's entity extraction to expand the matches with the patterns provided in each entity. To use this functionality, add the tags you want to expand to saga_synonyms. Saga will look for LexItems with these tags and expand them.

Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost. The original token is left untouched.

Code Block

language	py
theme	DJango
title	SagaQPLOptions initialization with saga_synonyms

options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],
    saga_synonyms_boost=0.8
)

parser = SagaParser(options=options)

Get the Saga response with whatever method you see fit. The simplest is to make an HTTP request to the Saga Client API.

Code Block

import requests
import json


response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer not headache'
}))

saga_response = response.json()

Parse the entire Saga response and print the qpl_tree.

Code Block

language	py
theme	DJango
title	Execute Query

qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

Info
The .pretty() function of the QPL tree shows a visual representation of how the query is formed.

As shown below, the query has now been expanded with the synonyms found for each token. The first block shows the original query without expansion, and the second shows the same query with synonym expansion applied.

Code Block

language	text
title	Query without synonym expansion

and
  term	cancer
  term	not
  term	headache

Code Block

language	text
title	Query with synonym expansion

and
  or
    boost
      term	cancer
      1
    boost
      term	malignancy
      0.8
    boost
      term	363346000
      0.8
    boost
      term	cancers
      0.8
    boost
      term	malignancies
      0.8
    boost
      phrase	"malignant growth"
      0.8
    boost
      phrase	"malignant neoplasm"
      0.8
    boost
      phrase	"malignant neoplasms"
      0.8
    boost
      phrase	"malignant neoplastic disease"
      0.8
    boost
      phrase	"malignant tumor"
      0.8
    boost
      phrase	"malignant tumors"
      0.8
    boost
      phrase	"neoplasm malignant"
      0.8
    boost
      phrase	"neoplasm/cancer"
      0.8
    boost
      phrase	"tumor, malignant"
      0.8
  term	not
  or
    boost
      term	headache
      1
    boost
      term	25064002
      0.8
    boost
      term	cephalalgia
      0.8
    boost
      term	cephalgia
      0.8
    boost
      term	cephalgias
      0.8
    boost
      phrase	"cranial pain"
      0.8
    boost
      phrase	"have headaches"
      0.8
    boost
      phrase	"head ache"
      0.8
    boost
      phrase	"head pain"
      0.8
    boost
      phrase	"head pain cephalgia"
      0.8
    boost
      phrase	"head pains"
      0.8
    boost
      term	headaches
      0.8
    boost
      phrase	"mild global headache"
      0.8
    boost
      phrase	"mild headache"
      0.8
    boost
      phrase	"pain head"
      0.8
    boost
      phrase	"pain in head"
      0.8
    boost
      phrase	"pain, head"
      0.8
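The boost pattern in the tree above (the original token at 1, every synonym at the saga_synonyms_boost value) can be sketched in plain Python. This is only an illustration of the resulting weights: boosted_alternatives is a hypothetical helper, not how PyQPL or Saga builds the tree internally.

```python
from typing import List, Tuple


def boosted_alternatives(original: str, synonyms: List[str],
                         synonym_boost: float) -> List[Tuple[str, float]]:
    """Pair the original token with boost 1.0 and each other synonym with synonym_boost."""
    pairs = [(original, 1.0)]
    for text in synonyms:
        # The original may also appear in its own synonym list; keep it only once.
        if text != original:
            pairs.append((text, synonym_boost))
    return pairs


# Shortened version of the 'cancer' entity, using the 0.8 boost configured above.
for text, boost in boosted_alternatives('cancer', ['cancer', 'malignancy', '"malignant growth"'], 0.8):
    print(f"{text}\t{boost}")
```

Keeping the original token at full weight means an exact match still ranks above a match on one of its synonyms.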

Note

You may have noticed that in this example not is treated as a term and not as the keyword NOT. There are two reasons for this:

Lowercase not is not the same as the uppercase keyword NOT, so PyQPL recognizes it as a term.
Saga can interpret not as a keyword only if saga_keywords are provided. For more information, see Saga Coupling.
