The API server uses a flexible model for processing scripts when an endpoint receives a request. Processing is a sequence of steps; within each step, a number of executions run in parallel, and each execution can itself run several smaller scripts in sequence. This allows, for instance, a single call to a search engine, or multiple parallel calls to one or more search engines with the results merged afterwards.

Endpoint.json processing section

The form of the processing section of the endpoint.json file is an array of arrays of processing units to execute. Entries in the outer array are executed in sequence, while those in the inner array are executed in parallel. The section can be thought of as below:

Code Block
languagejs
themeMidnight
"processing": [
  ArrayOfProcessingUnitsToExecuteFirst,
  ArrayOfProcessingUnitsToExecuteSecond,
  ...
]
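The ordering described above can be sketched as a small executor. This is only an illustration under stated assumptions, not the server's implementation: `runProcessing`, `wait`, and the unit shape `{ name, run }` are hypothetical names introduced here, and the timings are simulated.

```javascript
// Hypothetical executor: entries of the outer array run in sequence,
// entries of each inner array run in parallel.
async function runProcessing(steps) {
  const trace = [];
  for (const step of steps) {                      // one step at a time
    await Promise.all(step.map(async (unit) => {   // all units of a step together
      trace.push('start ' + unit.name);
      await unit.run();
      trace.push('end ' + unit.name);
    }));
  }
  return trace;
}

// Simulated work that takes `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const steps = [
  [
    { name: 'searchA', run: () => wait(20) },
    { name: 'searchB', run: () => wait(5) }
  ],
  [
    { name: 'merge', run: () => wait(1) }
  ]
];

runProcessing(steps).then((trace) => console.log(trace.join(', ')));
```

In the trace, both searches start before either one finishes, and the merge unit starts only after both searches have ended.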

Processing Unit

Each array of processing units to execute defines a number of processing units (which may be one or more scripts) that will be executed in parallel. In the case of processing units that connect to search engines, each processing unit is expected to connect to a single search engine, and therefore specifies the engine id of the engine configuration to use. If the processing unit is not connecting to a search engine, the engine id can be omitted.

When the processing unit returns results, they are attached to a label. This allows different processing units to add results to different labels. Subsequent processing units will have access to these results (and the corresponding label) and will be able to process them further (for example merge them).

Thus each processing unit can be thought of as below:

Code Block
languagejs
themeMidnight
processingUnit = {
  "label": "<label>",
  "engine": "<engine id>",
  "scripts": ArrayOfScriptsToExecute
}
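To make the role of labels concrete, the following sketch files the results of each unit in a step under its label and hands the labelled results to the next step. It is a hedged illustration only: `runStep`, `demo`, and the `run` property are hypothetical, and the real server may store and pass results differently.

```javascript
// Hypothetical sketch: run one parallel step and store each unit's
// results under that unit's label.
async function runStep(units, previousResults) {
  const labelled = Object.assign({}, previousResults);
  await Promise.all(units.map(async (unit) => {
    labelled[unit.label] = await unit.run(previousResults);
  }));
  return labelled;
}

async function demo() {
  // First step: two units write to two different labels.
  const first = await runStep([
    { label: 'search', run: async () => ({ hits: ['a1', 'a2'] }) },
    { label: 'search1', run: async () => ({ hits: ['b1'] }) }
  ], {});

  // Second step: a later unit reads the earlier labels and adds its own.
  return runStep([
    {
      label: 'merged',
      run: async (prev) => ({ hits: prev.search.hits.concat(prev.search1.hits) })
    }
  ], first);
}

demo().then((all) => console.log(all.merged.hits));
```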

Scripts

The array of scripts defines a number of scripts to execute in order. Any results will be attached to the label and available in the next script, allowing (in the case of search engine connections) a script to query the engine, and a subsequent script to perform a further query if the first returned no results. The scripts specify the name of the JavaScript file to execute and a configuration to be passed when the script is executed.

Thus, each script definition looks like:

Code Block
languagejs
themeMidnight
script = {
  "script": "<JavaScript file>",
  "config": {
      .
      .
      .
  }
}
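The query-then-fallback pattern described for the scripts array might look like the sketch below. The script functions here are simplified stand-ins with a reduced argument list, not the server's `process` signature, and `runScripts`, `strict`, and `relaxed` are hypothetical names.

```javascript
// Hypothetical runner: call each script in order, feeding the previous
// script's qpl and results into the next one.
function runScripts(scripts, request, done) {
  let index = 0;
  function next(err, qpl, results) {
    if (err || index === scripts.length) {
      return done(err, qpl, results);
    }
    const script = scripts[index++];
    script(request, qpl, results, next);
  }
  next(null, undefined, undefined);
}

// First script: a strict lookup that finds nothing for this request.
const strict = (request, qpl, results, callback) =>
  callback(null, 'strict:' + request.q, { hits: [] });

// Second script: searches again only if the first returned no hits.
const relaxed = (request, qpl, results, callback) => {
  if (results && results.hits.length > 0) {
    return callback(null, qpl, results);
  }
  callback(null, 'relaxed:' + request.q, { hits: ['doc-1'] });
};

runScripts([strict, relaxed], { q: 'spring' }, (err, qpl, results) => {
  console.log(qpl, results.hits);
});
```

Because the strict script returns no hits, the relaxed script issues its own query and its results are the ones handed to the final callback.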

Processing Unit Scripts

Processing unit scripts must conform to a particular format shown below:

Code Block
languagejs
themeMidnight
/**
 * Copyright Search Technologies 2016
 * Processing unit script template
 */
'use strict';

/**
 * Add any requires/constants here
 */

/**
 * @param config the main (SEIA server) configuration
 * @param scriptConfig the script configuration from the processing unit
 */
module.exports = function(config, scriptConfig) {
  /**
   * Any instance initialisation here
   */

  /**
   * @param engine the engine instance (if an engine was configured)
   * @param request the body of the incoming request to the endpoint
   * @param pResults the results from the previous parallel executions
   * @param qpl qpl from the previous script execution
   * @param results results from the previous script execution
   * @param callback the callback function to return results. The callback is of the form
   *        callback(err, qpl, results)
   * @param groups the groups from group expansion (if configured)
   */
  this.process = function(engine, request, pResults, qpl, results, callback, groups) {
    // Perform some processing
    // ...

    // Perform the query at the engine, passing the callback so the results can be returned to the user
    engine.execute(request, function(err, results) {
      callback(err, qpl, results);
    });
  };
};

When the SEIA server starts up, an instance of each script is created. The constructor is passed the SEIA server configuration and the configuration defined in the script block of the processing unit. Any one-off initialization may be performed at this time.

  • When the endpoint is called, the process method of the script instance is called, passing the engine (if configured), the request passed to the endpoint, the results from the previous parallel executions (pResults), and the qpl and results from previous script executions.
  • If group expansion is configured, the groups returned are passed in.
  • A callback function is also supplied.
    • This callback must be called when the script completes.
    • The callback allows errors, the qpl, and the results from the script execution to be passed out to subsequent scripts.
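The lifecycle above (instantiate once at start-up, call process per request, always invoke the callback) can be exercised with a stub engine. This is a minimal sketch under assumptions: it assumes the server creates the instance with `new`, and `ExampleScript`, `stubEngine`, and the `prefix` setting are hypothetical names that do not come from the shipped scripts.

```javascript
// Script in the documented shape: a constructor taking (config, scriptConfig)
// that assigns this.process.
function ExampleScript(config, scriptConfig) {
  // One-off initialisation happens here, when the instance is created.
  const prefix = scriptConfig.prefix; // hypothetical setting

  this.process = function(engine, request, pResults, qpl, results, callback, groups) {
    const query = Object.assign({}, request, { q: prefix + request.q });
    engine.execute(query, function(err, engineResults) {
      // The callback must be called so that later scripts can run.
      callback(err, qpl, engineResults);
    });
  };
}

// Stub standing in for a configured search engine.
const stubEngine = {
  execute: (request, cb) => cb(null, { count: 1, hits: ['hit for ' + request.q] })
};

// Start-up: create the instance once. Per request: call process.
const script = new ExampleScript({ /* server configuration */ }, { prefix: 'title:' });
script.process(stubEngine, { q: 'report' }, {}, undefined, undefined,
  (err, qpl, results) => console.log(err, results.hits));
```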

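Scripts written for the Enterprise Search server use the same module shape, but the process method takes (payload, opts, callback):

Code Block
languagejs
themeDJango
module.exports = function(config, scriptConfig) {

    /**
     * Main function required by the system to process with this endpoint
     * @param payload the payload passed to the endpoint
     * @param opts the engine (if configured), the groups (if group expansion is configured) and the results from previous script executions
     * @param callback the callback function to return results, of the form callback(err, qpl, results)
     */
    this.process = (payload, opts, callback) => {
        // Perform some processing
        // ...

        // Perform the query at the engine, passing the callback so the results can be returned to the user
        opts.engine.execute(payload, (err, results) => {
            callback(err, opts.qpl, results)
        })
    }
};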


When the Enterprise Search server starts up, an instance of each script is created. The constructor is passed the Enterprise Search server configuration and the configuration defined in the script block of the processing unit. Any one-off initialization may be performed at this time.

  • When the endpoint is called, the process method of the script instance is called with the payload passed to the endpoint, which also holds the qpl.
  • The opts argument holds the engine (if configured), the groups (if group expansion is configured) and the results from previous script executions.
  • A callback function is also supplied.
    • This callback must be called when the script completes.
    • The callback allows errors, the qpl, and the results from the script execution to be passed out to subsequent scripts.
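For the (payload, opts, callback) form, a comparable sketch with a stub engine is shown below. It assumes the instance is created with `new`; `OptsScript`, `optsEngine`, and the `payload.groups` field are hypothetical and only illustrate how values might be read from opts.

```javascript
// Script in the (payload, opts, callback) shape.
function OptsScript(config, scriptConfig) {
  this.process = (payload, opts, callback) => {
    // Hypothetical step: copy the caller's groups onto the payload, if any.
    if (opts.groups !== undefined) {
      payload.groups = opts.groups;
    }
    opts.engine.execute(payload, (err, results) => {
      callback(err, opts.qpl, results);
    });
  };
}

// Stub engine that simply reports what it was asked for.
const optsEngine = {
  execute: (payload, cb) => cb(null, { hits: [], query: payload.q, groups: payload.groups })
};

const optsScript = new OptsScript({}, {});
optsScript.process({ q: 'report' }, { engine: optsEngine, groups: ['staff'] },
  (err, qpl, results) => console.log(err, results));
```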

Simple (Single call) Example

endpoint.json

The shipped configuration for the search endpoint is configured as below:

Code Block
languagejs
themeMidnight
"processing": [
  [
    {
      "label": "search",
      "engine": "elastic",
      "scripts": [
        {
          "script": "scripts/search",
          "config": {
            "expansion" : {
              "thesaurus": "thesaurus",
              "stripPunct": false,
              "reload": 600000
            },
            "compositeSearchFields": {
              "grank1": 4.0,
              "grank2": 2.0,
              "content": 1.0,
              "title": 1.0,
              "url": 1.0,
              "description": 1.0
            },
            "security": {
              "aclField": "acls"
            }
          }
        }
      ]
    }
  ]
]

The configuration for the Enterprise Search server version of the script drops the expansion block and adds a parsed_options block for the query parser:

Code Block
languagejs
themeDJango
"processing": [
  [
    {
      "label": "search",
      "engine": "elastic",
      "scripts": [
        {
          "script": "scripts/search",
          "config": {
            "compositeSearchFields": {
              "grank1": 4.0,
              "grank2": 2.0,
              "content": 1.0,
              "title": 1.0,
              "url": 1.0,
              "description": 1.0
            },
            "security": {
              "aclField": "acls"
            },
            "parsed_options": {
              "extendedOperators": true,
              "customOperators": true,
              "wildcards": true
            }
          }
        }
      ]
    }
  ]
]

This causes a single call to the search.js script when the endpoint is called. The results are written under the search label.

Script

The shipped script for the search endpoint is shown below:

Code Block
languagejs
themeMidnight
/**
 * Copyright Search Technologies 2016
 */
'use strict';

var Qpl = require('qpl').Core,
    logger = require('../../../app/utilities/log.server.utilities').get('server');

module.exports = function(config, scriptConfig) {
  var thesaurus = require('qpl').MongoThesaurus.load(config.db.uri, scriptConfig.expansion.stripPunct, scriptConfig.expansion.thesaurus, scriptConfig.expansion.reload);

  this.process = function(engine, request, pResults, qpl, results, callback, groups) {
    var qry = request.q;
    logger.debug('Processing: ' + qry);
    if (groups !== undefined)
      logger.debug('Groups: ' + groups);

    // Handle an 'everything' query (including Solr style)
    if (Qpl.isEmpty(qry) || qry === '*' || qry === '*:*') {
      // Just query a wildcard of everything
      request.qpl = Qpl.wildcard('*');
    }
    else {
      // A real query - tokenise it
      var tokens = Qpl.tokenize(qry);

      // Use the thesaurus
      var singleExpansions = thesaurus.expandSingle(tokens);
      var multipleExpansions = thesaurus.expandMultiple(tokens);

      // And create a composite
      var fields = scriptConfig.compositeSearchFields;
      request.qpl = Qpl.or([
        Qpl.and(Qpl.compositeOr(fields, singleExpansions)),
        Qpl.and(Qpl.compositeOr(fields, multipleExpansions), 5)
      ]);
    }

    // Add the security if required
    if (groups !== undefined)
      request.securityFilter = Qpl.securityFilter(scriptConfig.security.aclField, groups);

    // Perform the query at the engine, passing the callback so the results can be returned to the user
    engine.execute(request, function(err, results){
      callback(err, qpl, results);
    });
  }
};

The script requires QPL and, on initialization, loads a MongoThesaurus from QPL, using the Mongo URI passed in from the SEIA server configuration and the thesaurus parameters passed in from the script configuration.

When the endpoint is called, the process() function is called. This receives a reference to the engine, the body of the request to the endpoint, the qpl and results (both undefined) from previous scripts, and the callback. The query is extracted from the request, the desired QPL is built, and then the request is passed to the engine (via the execute() function). Once the engine completes, the callback() function is used to pass the results back to the caller.

The version of the script for the Enterprise Search server is shown below:

Code Block
languagejs
themeDJango
const Qpl = require('qpl').Core
const Parser = require('qpl').Parser
const TokenList = require('qpl').TokenList
const FTT = require('qpl').FTT

const isNil = require('lodash.isnil')

module.exports = function(config, scriptConfig) {

    this.process = (payload, opts, callback) => {
        const qry = payload.q;

        __logger.debug('Processing: ' + qry);

        if (opts.groups !== undefined)
            __logger.debug('Groups: ' + opts.groups);

        // Handle an 'everything' query (including Solr style)
        if ((Qpl.isEmpty(qry) || qry === '*' || qry === '*:*')) {
            // Just query a wildcard of everything
            if (isNil(payload.qpl)) {
                payload.qpl = Qpl.wildcard('*')
            }
        } else {
            const fields = scriptConfig.compositeSearchFields
            const poptions = scriptConfig.parsed_options

            poptions.tokenizer = {
                type: (field) => {
                    if (Qpl.isEmpty(field)) { return FTT.FieldType.STRING }

                    // _fields maps field names to field types; it is not defined in this listing
                    const ft = _fields[field]

                    if (isNil(ft)) { return FTT.FieldType.UNKNOWN }

                    return FTT.validType(ft) ? ft : FTT.FieldType.UNKNOWN
                },
                tokenize: (str, field) => {
                    return TokenList.tokenizeOnWhitespace(str).getStrings()
                },
                valid: (field) => {
                    return true
                }
            }
            const qp = new Parser(poptions)

            const parsed = qp.parse(qry)

            payload.qpl = Qpl.compositeOr(fields, parsed)
            payload.suggest = scriptConfig.suggest
        }

        // Add the security if required
        if (!isNil(opts.groups) && !isNil(scriptConfig.security)) {
            __logger.debug('Groups: ' + opts.groups)
            payload.securityFilter = Qpl.securityFilter(scriptConfig.security.aclField, opts.groups)
        }

        // Perform the query at the engine, passing the callback so the results can be returned to the user
        // (engine and qpl are read from opts, as in the script template)
        opts.engine.execute(payload, function(err, results){
            callback(err, opts.qpl, results);
        });
    }
}

This version requires the QPL Parser, TokenList and FTT modules instead of the thesaurus. It builds a parser from parsed_options with a whitespace tokenizer, parses the query, and combines the parsed query across the composite search fields before passing the payload to the engine.

Federation (Parallel & Sequential Call) Example

endpoint.json

An endpoint for federation might be configured as below:

Code Block
languagejs
themeMidnight
{
  "endpoint": "federated",
  "enabled": true,
  "description": "Default federated search endpoint",
  "groupExpansion": {
    "enabled": false,
    "url": "http://localhost:50505/groupExpansion?username=${user}&json=1"
  },
  "processing": [
    [
      // Execute the searches in parallel
      {
        "label": "search",
        "engine": "elastic",
        "scripts": [
          {
            "script": "../search/scripts/search",
            "config": {
              "expansion" : {
                "thesaurus": "thesaurus",
                "stripPunct": false,
                "reload": 600000
              },
              "compositeSearchFields": {
                "grank1": 4.0,
                "grank2": 2.0,
                "content": 1.0,
                "title": 1.0,
                "url": 1.0,
                "description": 1.0
              },
              "security": {
                "aclField": "acls"
              }
            }
          }
        ]
      },
      {
        "label": "search1",
        "engine": "elastic1",
        "scripts": [
          {
            "script": "../search/scripts/search",
            "config": {
              "expansion" : {
                "thesaurus": "thesaurus",
                "stripPunct": false,
                "reload": 600000
              },
              "compositeSearchFields": {
                "grank1": 4.0,
                "grank2": 2.0,
                "content": 1.0,
                "title": 1.0,
                "url": 1.0,
                "description": 1.0
              },
              "security": {
                "aclField": "acls"
              }
            }
          }
        ]
      }
    ],
    [
      // Then merge the results
      {
        "label": "merged",
        "scripts": [
          {
            "script": "scripts/merge"
          }
        ]
      }
    ]
  ]
}

Located in its own directory, this configuration makes two parallel calls to two different engines (elastic and elastic1, as set in each unit's engine field), using the search.js script from the simple example above.

  • The results are placed under two different labels (search and search1, as set in each unit's label field).
  • Once both searches have completed, the merge.js script is executed and its results are added under the merged label.

Script

An example results merging script is shown below:

Code Block
languagejs
themeMidnight
/**
 * Copyright Search Technologies 2017
 * Created by Steve Denny on 27/02/2017.
 *
 * Example federation merge script
 */
'use strict';

module.exports = function(/*config, scriptConfig*/) {

  this.process = function(engine, request, pResults, qpl, results, callback /*, groups*/) {
    console.log(JSON.stringify(pResults, null, 2));

    // Get the two sets of results from the previous queries
    var search = pResults.search;
    var search1 = pResults.search1;

    // Perform a very simple merge, interleaving the results.
    // TODO: A proper implementation should consider (at the very least) the sort order
    var merged = {
      offset: search.offset,
      len: Math.max(search.len, search1.len),
      count: search.count + search1.count,
      hits: []
    };

    // copy the hits to the merged data, stopping once both result sets are exhausted
    var i = 0;
    var longest = Math.max(search.hits.length, search1.hits.length);
    while (merged.hits.length < merged.len && i < longest) {
      if (i<search.hits.length)
        merged.hits.push(search.hits[i]);
      if (i<search1.hits.length)
        merged.hits.push(search1.hits[i]);
      i++;
    }

    // Remove the unmerged data
    delete pResults.search;
    delete pResults.search1;

    // Send the results back
    callback(undefined /*error*/, qpl, merged);
  }
};

The script retrieves the search results of the previous parallel executions (pResults) via their assigned labels (search and search1).

It performs a very simple merge, interleaving the results, and then returns the merged results, which are added under the merged label (as configured in the endpoint.json).
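The interleaving idea can be tried in isolation with a standalone function. This is a hedged variant rather than the shipped script: `interleave` is a hypothetical helper, the result fields (offset, len, count, hits) are taken from the merge script, and this version also stops as soon as the page length is reached.

```javascript
// Interleave two result sets, stopping at the page length or when both
// sets have run out of hits.
function interleave(search, search1) {
  const merged = {
    offset: search.offset,
    len: Math.max(search.len, search1.len),
    count: search.count + search1.count,
    hits: []
  };
  const longest = Math.max(search.hits.length, search1.hits.length);
  for (let i = 0; i < longest && merged.hits.length < merged.len; i++) {
    if (i < search.hits.length) {
      merged.hits.push(search.hits[i]);
    }
    if (i < search1.hits.length && merged.hits.length < merged.len) {
      merged.hits.push(search1.hits[i]);
    }
  }
  return merged;
}

// Sample data made up for this illustration.
const a = { offset: 0, len: 3, count: 10, hits: ['a1', 'a2', 'a3'] };
const b = { offset: 0, len: 3, count: 4, hits: ['b1', 'b2', 'b3'] };
console.log(interleave(a, b).hits);
```

As the TODO in the merge script notes, a fuller implementation would also need to take the sort order of the two result sets into account.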