
The API server uses a flexible model for processing scripts when a request to an endpoint is received. The model combines sequential and parallel processing: it performs a sequence of steps, and within each step it performs a number of executions in parallel. Each of these executions can in turn run a number of smaller scripts in sequence. This allows, for instance, a single call to a search engine, or multiple parallel calls to one or more search engines with the results merged afterwards.

Endpoint.json processing section

The form of the processing section of the endpoint.json file is an array of arrays of processing units to execute. Entries in the outer array are executed in sequence, while those in the inner array are executed in parallel. The section can be thought of as below:

"processing": [
  ArrayOfProcessingUnitsToExecuteFirst,
  ArrayOfProcessingUnitsToExecuteSecond,
  ...
]
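
The execution order described above can be illustrated with a small sketch. This is not the server's implementation: runProcessing and the runUnit helper are hypothetical names, and the sketch only assumes that each unit reports back through a callback and that its results are stored under its label.

```javascript
// Hypothetical runner: entries of the outer array run in sequence,
// entries of each inner array are started together.
// runUnit(unit, previous, cb) is an assumed helper that executes one unit.
function runProcessing(processing, runUnit, done) {
  const results = {}; // results keyed by label

  function runStep(index) {
    if (index >= processing.length) return done(null, results);

    const units = processing[index];
    if (units.length === 0) return runStep(index + 1);

    // Units in this step only see results from earlier steps
    const previous = Object.assign({}, results);
    let pending = units.length;
    let failed = false;

    units.forEach((unit) => {
      runUnit(unit, previous, (err, unitResults) => {
        if (failed) return;
        if (err) { failed = true; return done(err); }
        results[unit.label] = unitResults;
        pending -= 1;
        // The next step starts only when every unit in this step has reported back
        if (pending === 0) runStep(index + 1);
      });
    });
  }

  runStep(0);
}

// Usage with stand-in units: two in the first step, one in the second
const exampleSteps = [
  [{ label: 'a' }, { label: 'b' }],
  [{ label: 'merged' }]
];
const order = [];
let finalResults;

runProcessing(exampleSteps, (unit, previous, cb) => {
  order.push(unit.label);
  cb(null, unit.label === 'merged' ? Object.keys(previous).sort() : [unit.label]);
}, (err, results) => { finalResults = results; });
```

The 'merged' unit only runs once both 'a' and 'b' have called back, which is the property the array-of-arrays layout is meant to express.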

Processing Unit

Each inner array defines one or more processing units, each consisting of one or more scripts, that will be executed in parallel. In the case of processing units that connect to search engines, each processing unit is expected to connect to a single search engine, and therefore specifies the engine id of the engine configuration to use. If the processing unit does not connect to a search engine, the engine id can be omitted.

When the processing unit returns results, they are attached to a label. This allows different processing units to add results to different labels. Subsequent processing units will have access to these results (and the corresponding label) and will be able to process them further (for example merge them).

Thus each processing unit can be thought of as below:

{
  "label": "<label>",
  "engine": "<engine id>",
  "scripts": ArrayOfScriptsToExecute
}

Scripts

The array of scripts defines a number of scripts to execute in order. Any results will be attached to the label and available in the next script, allowing (in the case of search engine connections) a script to query the engine, and a subsequent script to perform a further query if the first returned no results. The scripts specify the name of the JavaScript file to execute and a configuration to be passed when the script is executed.

Thus, each script definition looks like:

{
  "script": "<JavaScript file>",
  "config": {
      .
      .
      .
  }
}
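
Putting the three levels together, a section with two parallel engine calls followed by a merge step might look like the sketch below. It is written as a JavaScript object so that its shape can be checked. The engine ids, labels and the scripts/merge path are illustrative assumptions, and the checks in checkProcessing (a label and at least one script per unit) are assumptions about what the server expects rather than documented rules.

```javascript
// Hypothetical configuration: engine ids, labels and script paths are illustrative only.
const processing = [
  // Step 1: two units executed in parallel, one per engine
  [
    { label: 'documents', engine: 'engineA', scripts: [{ script: 'scripts/search', config: {} }] },
    { label: 'people', engine: 'engineB', scripts: [{ script: 'scripts/search', config: {} }] }
  ],
  // Step 2: a unit without an engine that post-processes the earlier results
  [
    { label: 'merged', scripts: [{ script: 'scripts/merge', config: {} }] }
  ]
];

// Returns a list of structural problems; an empty list means the section
// has the array-of-arrays shape described above.
function checkProcessing(section) {
  if (!Array.isArray(section)) return ['processing must be an array'];
  const problems = [];
  section.forEach((step, i) => {
    if (!Array.isArray(step)) {
      problems.push(`step ${i} must be an array`);
      return;
    }
    step.forEach((unit, j) => {
      if (typeof unit.label !== 'string') {
        problems.push(`step ${i} unit ${j}: missing label`);
      }
      if (!Array.isArray(unit.scripts) || unit.scripts.length === 0) {
        problems.push(`step ${i} unit ${j}: scripts must be a non-empty array`);
        return;
      }
      unit.scripts.forEach((s, k) => {
        if (typeof s.script !== 'string') {
          problems.push(`step ${i} unit ${j} script ${k}: missing script`);
        }
      });
    });
  });
  return problems;
}
```

Note that the merge unit omits "engine", since it does not connect to a search engine.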

Processing Unit Scripts

Processing unit scripts must conform to a particular format shown below:

module.exports = function(config, scriptConfig) {

    /**
     * Main function required by the system to process with this endpoint
     * @param payload
     * @param opts
     * @param callback
     */
    this.process = (payload, opts, callback) => {
        // Perform some processing
        // ...

        // Perform the query at the engine, passing the callback so the results can be returned to the user
        opts.engine.execute(payload, (err, results) => {
            callback(err, opts.qpl, results)
        })
    }
};


When the Enterprise Search server is started up, an instance of the script is created. The constructor is passed the configuration of the Enterprise Search server and the configuration defined in the script block of the processing unit. At this time, any one-off initialization may be performed.

  • When the endpoint is called, the process method of the script instance is called with the payload passed to the endpoint, which also holds the qpl.
  • It is also passed opts, which holds the engine (if configured), the groups (if group expansion is configured) and the results from previous script executions.
  • A callback function is also supplied.
    • This callback must be called when the script completes.
    • The callback allows errors, the qpl, and the results from the script execution to be passed out to subsequent scripts.
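
The lifecycle above (construct once at start-up, call process for each request, always finish through the callback) can be sketched as follows. Everything outside the documented format is an assumption made so the sketch runs on its own: the ExampleScript name, the maxLength setting, the empty result returned when no engine is configured, and the stub engine standing in for opts.engine.

```javascript
// Illustrative script in the required format. Names and values are hypothetical.
function ExampleScript(config, scriptConfig) {
  // One-off initialization happens here, once, when the instance is created
  const maxLength = (scriptConfig && scriptConfig.maxLength) || 256; // hypothetical setting

  this.process = (payload, opts, callback) => {
    if (typeof payload.q === 'string' && payload.q.length > maxLength) {
      // Errors are reported through the callback rather than thrown
      return callback(new Error('Query too long'), opts.qpl, null);
    }
    if (!opts.engine) {
      // Assumption: with no engine configured, return an empty result
      return callback(null, opts.qpl, []);
    }
    opts.engine.execute(payload, (err, results) => {
      callback(err, opts.qpl, results);
    });
  };
}
module.exports = ExampleScript;

// Stand-in for the engine the server would supply in opts.engine
const stubEngine = {
  execute: (payload, cb) => cb(null, [{ title: 'hit for ' + payload.q }])
};

// Start-up: one instance, given the server config and the script config
const script = new ExampleScript({}, { maxLength: 10 });

// Per request: process() is called and must always invoke the callback
const outcome = {};
script.process({ q: 'budget' }, { engine: stubEngine }, (err, qpl, results) => {
  outcome.ok = { err, results };
});
script.process({ q: 'a query that is far too long' }, { engine: stubEngine }, (err) => {
  outcome.tooLong = err;
});
script.process({ q: 'budget' }, {}, (err, qpl, results) => {
  outcome.noEngine = { err, results };
});
```

Calling the callback on every path, including the error and no-engine paths, matters because subsequent scripts cannot run until it has been called.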

Simple (Single call) Example

endpoint.json

The shipped configuration for the search endpoint is configured as below:

"processing": [
  [
    {
      "label": "search",
      "engine": "elastic",
      "scripts": [
        {
          "script": "scripts/search",
          "config": {
            "compositeSearchFields": {
              "grank1": 4.0,
              "grank2": 2.0,
              "content": 1.0,
              "title": 1.0,
              "url": 1.0,
              "description": 1.0
            },
            "security": {
              "aclField": "acls"
            },
            "parsed_options": {
              "extendedOperators": true,
              "customOperators": true,
              "wildcards": true
            }
          }
        }
      ]
    }
  ]
]

This causes a single call to the search.js script when the endpoint is called. The results are written under the search label.

Script

The shipped script for the search endpoint is shown below:

const Qpl = require('qpl').Core
const Parser = require('qpl').Parser
const TokenList = require('qpl').TokenList
const FTT = require('qpl').FTT

const isNil = require('lodash.isnil')

module.exports = function(config, scriptConfig) {
    
    this.process = (payload, opts, callback) => {
        const qry = payload.q;
        
        __logger.debug('Processing: ' + qry);
        
        if (opts.groups !== undefined)
            __logger.debug('Groups: ' + opts.groups);
        
        // Handle an 'everything' query (including Solr style)
        if ((Qpl.isEmpty(qry) || qry === '*' || qry === '*:*')) {
            // Just query a wildcard of everything
            if (isNil(payload.qpl)) {
                payload.qpl = Qpl.wildcard('*')
            }
        } else {
            const fields = scriptConfig.compositeSearchFields
            const poptions = scriptConfig.parsed_options
            
            poptions.tokenizer = {
                type: (field) => {
                    if (Qpl.isEmpty(field)) { return FTT.FieldType.STRING }
            
                    // _fields is not defined in this listing; it is expected to map field names to QPL field types
                    const ft = _fields[field]
            
                    if (isNil(ft)) { return FTT.FieldType.UNKNOWN }
            
                    return FTT.validType(ft) ? ft : FTT.FieldType.UNKNOWN
                },
        
                tokenize: (str, field) => {
                    return TokenList.tokenizeOnWhitespace(str).getStrings()
                },
        
                valid: (field) => {
                    return true
                }
            }
            const qp = new Parser(poptions)
    
            const parsed = qp.parse(qry)
    
            payload.qpl = Qpl.compositeOr(fields, parsed)
            payload.suggest = scriptConfig.suggest
        }
    
        // Add the security if required
        if (!isNil(opts.groups) && !isNil(scriptConfig.security)) {
            __logger.debug('Groups: ' + opts.groups)
            payload.securityFilter = Qpl.securityFilter(scriptConfig.security.aclField, opts.groups)
        }
        // Perform the query at the engine, passing the callback so the results can be returned to the user
        opts.engine.execute(payload, function(err, results){
            callback(err, payload.qpl, results);
        });
    }
};

The script requires QPL and lodash.isnil. The listing above performs no one-off initialization: it does not use the server configuration (config), and it reads the composite search fields, the parser options and the security settings from the script configuration (scriptConfig) each time it is called.

When the endpoint is called, the process() function is called. It receives the body of the request to the endpoint (payload), the opts object holding the reference to the engine, the groups and the results from previous scripts (none in this case, since this is the only script), and the callback. The query is extracted from the request, the desired QPL is built, and then the request is passed to the engine (via the execute() function). Once the engine completes, the callback() function is used to pass the results back to the caller.
