PyQPL offers two ways to implement synonym expansion: an embedded option, which uses a function to provide the synonyms for each token, and a Saga-based option, which translates the Saga matches into QPL string representations.

Embedded Synonym Expansion

The embedded synonym expansion feature uses a function supplied through QPLOptions called synonyms_call. This function receives a string (either a token or a phrase) and returns either a list of strings (tokens and/or phrases) or None if no synonyms are found.

You are free to define synonyms_call however you like, as long as it adheres to that signature. It can read from a document or a database, call a third-party API, or look the synonyms up in a manually maintained list.
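As a sketch of the "read from a document" case, the snippet below builds a lookup function backed by a JSON file. The helper name make_synonyms_call and the file layout (a JSON object mapping each text to its list of synonyms) are assumptions made for illustration; they are not part of PyQPL.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional


def make_synonyms_call(path: str) -> Callable[[str], Optional[List[str]]]:
    """Build a lookup function backed by a JSON file of {text: [synonyms]}.

    Hypothetical helper: the name and the file layout are assumptions.
    """
    table: Dict[str, List[str]] = json.loads(Path(path).read_text(encoding="utf-8"))

    def synonyms_call(text: str) -> Optional[List[str]]:
        # Return the synonym list, or None when the text has no entry.
        return table.get(text)

    return synonyms_call


# Demo with a throwaway file; a real setup would point at its own source.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "synonyms.json"
    source.write_text(json.dumps({"cancer": ["cancer", "malignancy"]}), encoding="utf-8")
    synonyms_call = make_synonyms_call(str(source))

print(synonyms_call("cancer"))   # ['cancer', 'malignancy']
print(synonyms_call("unknown"))  # None
```

The file is read once, when the function is built, so each lookup is a plain dictionary access. The resulting function has the same shape as the lambda used in the example implementation that follows.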


Example Implementation

First, set up a source in which to look up the synonyms. The simplest approach is a dictionary in which each key is the text to look for and each value is the list of its synonyms.

Provided source for synonyms
synonyms = {
    'cancer': ['cancer', 'malignancy', '363346000', 'cancers', 'malignancies', '"malignant growth"',
               '"malignant neoplasm"', '"malignant neoplasms"', '"malignant neoplastic disease"',
               '"malignant tumor"', '"malignant tumors"', '"neoplasm malignant"', '"neoplasm/cancer"',
               '"tumor, malignant"'],
    'headache': ['headache', '25064002', 'cephalalgia', 'cephalgia', 'cephalgias', '"cranial pain"',
                 '"have headaches"', '"head ache"', '"head pain"', '"head pain cephalgia"', '"head pains"',
                 'headaches', '"mild global headache"', '"mild headache"', '"pain head"', '"pain in head"',
                 '"pain, head"']
}


Then create a function and assign it to synonyms_call. In this simplest example we declare a lambda function that receives one argument (the string) and uses it as the key into the dictionary, returning None by default.

options = QPLOptions(fields='content', implicit_operator='or')
options.synonyms_call = lambda x: synonyms.get(x, None)

parser = QPLParser(options=options)


Execute the parsing of our query and print the qpl_tree

Execute Query
qpl_tree = parser._parse(data='cancer NOT headache')
print(qpl_tree.pretty())

The .pretty() function of the QPL tree shows a visual representation of how the query is formed.


As shown below, the query has now been expanded with the synonyms found for each of the tokens. The first tree shows how the original query would look without expansion, and the second shows the same query with the synonym expansion applied.

Query without synonym expansion
or
  term	cancer
  not
    term	headache
Query with synonym expansion
or
  phrase	"malignant growth"
  phrase	"malignant neoplasm"
  phrase	"malignant neoplasms"
  phrase	"malignant neoplastic disease"
  phrase	"malignant tumor"
  phrase	"malignant tumors"
  phrase	"neoplasm malignant"
  phrase	"neoplasm/cancer"
  phrase	"tumor, malignant"
  term	363346000
  term	cancer
  term	cancers
  term	malignancies
  term	malignancy
  not
    or
      phrase	"cranial pain"
      phrase	"have headaches"
      phrase	"head ache"
      phrase	"head pain cephalgia"
      phrase	"head pain"
      phrase	"head pains"
      phrase	"mild global headache"
      phrase	"mild headache"
      phrase	"pain head"
      phrase	"pain in head"
      phrase	"pain, head"
      term	25064002
      term	cephalalgia
      term	cephalgia
      term	cephalgias
      term	headache
      term	headaches
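Judging only from the two trees above, synonyms wrapped in double quotes appear to become phrase nodes, unquoted synonyms become term nodes, and the nodes are listed in plain string sort order. The sketch below reproduces that observation. classify_synonyms is a hypothetical helper written for this page, not a PyQPL function, and the real parser may apply other rules.

```python
from typing import List, Tuple


def classify_synonyms(synonyms: List[str]) -> List[Tuple[str, str]]:
    """Label each synonym as 'phrase' (double-quoted) or 'term' (unquoted).

    Illustration only: mirrors what the example output suggests.
    """
    labelled = []
    for text in sorted(synonyms):
        quoted = len(text) >= 2 and text.startswith('"') and text.endswith('"')
        labelled.append(("phrase" if quoted else "term", text))
    return labelled


sample = ['cancer', 'malignancy', '"malignant growth"']
for kind, text in classify_synonyms(sample):
    print(f"{kind}\t{text}")
```

If that reading is right, a multi-word synonym has to be quoted inside the source dictionary, as in '"malignant growth"', to be treated as a single phrase.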





Saga Synonym Expansion

Synonym expansion with Saga uses Saga's entity extraction to expand the matches with the patterns provided in each entity. To use this functionality, add the tags you want to expand to saga_synonyms. Saga will look for LexItems with these tags and perform the expansion on them.

Additionally, you can specify a custom boost for the synonyms using saga_synonyms_boost. The original token is left untouched.


In Saga you must have a tag that, once matched, produces a result similar to the one below. The important parts of it are:

  • entities/patterns: This is where Saga QPL will look for the patterns and use them as synonyms for the expansion


Example Keyword entity
{
  "stage": "DictionaryTagger",
  "confidence": 1,
  "match": "cancer",
  "flags": [
    "ENTITY",
    "SEMANTIC_TAG"
  ],
  "text": "{synonyms}",
  "startPos": 0,
  "endPos": 6,
  "metadata": {
    "display": "cancer",
    "id": "syn0000000141"
  },
  "entities": [
    {
      "display": "cancer",
      "patterns": [
        "malignancy",
        "cancer"
      ],
      "id": "syn0000000141",
      "fields": {},
      "tags": [
        "synonym",
        "synonyms"
      ]
    },
    {
      "display": "cancer",
      "patterns": [
        "363346000",
        "cancer",
        "cancers",
        "malignancies",
        "malignancy",
        "malignant growth",
        "malignant neoplasm",
        "malignant neoplasms",
        "malignant neoplastic disease",
        "malignant tumor",
        "malignant tumors",
        "neoplasm malignant",
        "neoplasm/cancer",
        "tumor, malignant"
      ],
      "id": "363346000",
      "tags": [
        "snomed",
        "synonyms"
      ]
    }
  ],
  "tags": [
    "snomed",
    "synonym",
    "synonyms"
  ]
}

This is the lexical item obtained for the token "cancer" from the query "cancer not headache".
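To make the role of entities/patterns concrete, here is a minimal sketch of how the patterns of a lexical item could be gathered for a given list of tags. It is not PyQPL's implementation: expand_lex_item is a hypothetical helper, and the rules it applies (matched token first with a boost of 1, duplicates skipped, 0.8 as the synonym boost) are assumptions inferred from the examples on this page.

```python
from typing import Any, Dict, List, Tuple


def expand_lex_item(lex_item: Dict[str, Any], tags: List[str],
                    synonyms_boost: float = 0.8) -> List[Tuple[str, float]]:
    """Return (text, boost) pairs: the matched token first, then its synonyms."""
    original = lex_item["match"]
    expanded = [(original, 1.0)]      # assumed: original token keeps boost 1
    seen = {original}
    for entity in lex_item.get("entities", []):
        # Only entities carrying one of the requested tags contribute patterns.
        if not set(tags) & set(entity.get("tags", [])):
            continue
        for pattern in entity.get("patterns", []):
            if pattern not in seen:   # skip duplicates across entities
                seen.add(pattern)
                expanded.append((pattern, synonyms_boost))
    return expanded


# Trimmed-down version of the lexical item shown above.
lex_item = {
    "match": "cancer",
    "entities": [
        {"patterns": ["malignancy", "cancer"], "tags": ["synonym", "synonyms"]},
        {"patterns": ["363346000", "cancer", "malignant growth"],
         "tags": ["snomed", "synonyms"]},
    ],
}
for text, boost in expand_lex_item(lex_item, tags=["synonyms"]):
    print(text, boost)
```

Passing a tag that no entity carries returns only the original token, which corresponds to a match that is not expanded.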

Example Implementation

Use SagaQPLOptions instead of QPLOptions, and assign to saga_synonyms the list of tags to expand as synonyms. Optionally, you can specify a custom boost for the synonyms.

Provided source for synonyms
options = SagaQPLOptions(
    implicit_operator='and',
    fields={'content': 6, 'title': 3},
    saga_synonyms=['synonyms'],   # tags whose entities are expanded as synonyms
    saga_synonyms_boost=0.8       # boost applied to the expanded synonyms
)

parser = SagaParser(options=options)


Get the Saga response with whatever method you see fit. The simplest is to make an HTTP request to the Saga Client API.

Generate Saga Request
import requests
import json

response = requests.get('http://localhost:8080/saga/api/client/process/text', data=json.dumps({
    'unit': 'unit_name',
    'doc': 'cancer not headache'
}))

saga_response = response.json()


Execute the parsing of the entire Saga response and print the qpl_tree

Execute Query
qpl_tree = parser._parse(data=saga_response['highestRoute'])
print(qpl_tree.pretty())

The .pretty() function of the QPL tree shows a visual representation of how the query is formed.


As shown below, the query has now been expanded with the synonyms found for each of the tokens. The first tree shows how the original query would look without expansion, and the second shows the same query with the synonym expansion applied.

Query without synonym expansion
and
  term	cancer
  term	not
  term	headache
Query with synonym expansion
and
  or
    boost
      term	cancer
      1
    boost
      term	malignancy
      0.8
    boost
      term	363346000
      0.8
    boost
      term	cancers
      0.8
    boost
      term	malignancies
      0.8
    boost
      phrase	"malignant growth"
      0.8
    boost
      phrase	"malignant neoplasm"
      0.8
    boost
      phrase	"malignant neoplasms"
      0.8
    boost
      phrase	"malignant neoplastic disease"
      0.8
    boost
      phrase	"malignant tumor"
      0.8
    boost
      phrase	"malignant tumors"
      0.8
    boost
      phrase	"neoplasm malignant"
      0.8
    boost
      phrase	"neoplasm/cancer"
      0.8
    boost
      phrase	"tumor, malignant"
      0.8
  term	not
  or
    boost
      term	headache
      1
    boost
      term	25064002
      0.8
    boost
      term	cephalalgia
      0.8
    boost
      term	cephalgia
      0.8
    boost
      term	cephalgias
      0.8
    boost
      phrase	"cranial pain"
      0.8
    boost
      phrase	"have headaches"
      0.8
    boost
      phrase	"head ache"
      0.8
    boost
      phrase	"head pain"
      0.8
    boost
      phrase	"head pain cephalgia"
      0.8
    boost
      phrase	"head pains"
      0.8
    boost
      term	headaches
      0.8
    boost
      phrase	"mild global headache"
      0.8
    boost
      phrase	"mild headache"
      0.8
    boost
      phrase	"pain head"
      0.8
    boost
      phrase	"pain in head"
      0.8
    boost
      phrase	"pain, head"
      0.8

You may have noticed that in this example not is treated as a term rather than as the keyword NOT. There are two reasons for this:

  1. Lowercase not is not the same as the uppercase keyword NOT, so PyQPL recognizes it as a term.
  2. Saga can interpret not as a keyword only if saga_keywords are provided. For more information, see Saga Coupling.
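The first point can be illustrated with a small stand-alone sketch. classify_token is a hypothetical function that only mimics the case-sensitive comparison described above; it does not call PyQPL or Saga.

```python
def classify_token(token: str) -> str:
    """Return 'not' for the upper-case keyword, 'term' for anything else."""
    # The comparison is deliberately case-sensitive.
    return "not" if token == "NOT" else "term"


print([classify_token(t) for t in "cancer not headache".split()])
print([classify_token(t) for t in "cancer NOT headache".split()])
```

The first call labels all three tokens as terms, as in the Saga example, while the second recognizes NOT as the operator, as in the embedded example.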