Per Document Event Triggers
Per document events are triggered for each content record that is added, updated, deleted or fetched.
- PreAdd: this event is triggered before the content scope is stored (added or updated) in the Storage Unit.
- Process: this event is triggered after the content scope is stored (added or updated) in the Storage Unit. This event is also triggered when a record is reprocessed (see Reprocess API).
- PreDelete: this event is triggered before the content scope is deleted from the Storage Unit.
- PostDelete: this event is triggered after the content scope is deleted from the Storage Unit.
- Fetch: this event is triggered after the content scope is fetched from the Storage Unit.
- User Defined Document Events: Users can define other types of document events by invoking a transaction/execute or transaction/batch call with a custom action (see Transaction API) and a reference to the record key.
General Event Triggers
General events are triggered directly by the Storage Unit when specific operations occur.
- BatchStart: this event is triggered when a new batch is created during content ingestion or content reprocessing. When a batch is created, a batch variable is added to the execution context which can be accessed by records on document events.
- BatchEnd: this event is triggered when a batch is completed during content ingestion or content reprocessing.
- User Defined General Events: Users can define other types of general events by invoking a transaction/execute or transaction/batch call with a custom action (see Transaction API) and no record key.
Implementing Event Functions
The content processing JavaScript modules are placed inside the processing_modules folder of the StageR server. Each module can implement one or more of the event functions. The name of the function must match the name of the event it implements. A module can implement functions for both per document and general events.
Per Document Event Functions
Per document event functions receive five parameters:
- key: The id of the record.
- content: A JavaScript Object with the content of the record scope being processed.
- context: A JavaScript Object with configuration variables and references to utility functions.
- settings: A JavaScript Object with the available configuration properties for the module.
- callback: A callback function that must be called when the per document event function completes its execution. Callback parameters are: err, content, context.
```javascript
exports.Process = function (key, content, context, settings, callback) {
    if (context.isBatch !== undefined && context.isBatch === true) {
        if (content) {
            context.batchArray.push({
                index: {
                    _index: context.elasticSearch.index,
                    _type: context.elasticSearch.type,
                    _id: key
                }
            });
            context.batchArray.push(content);
        }
        callback(null, content, context);
    } else {
        initialize(settings, function (client, index, type) {
            client.index({
                index: index,
                type: type,
                id: key,
                body: content
            }, function (err) {
                client.close();
                callback(err, content, context);
            });
        });
    }
};
```
General Event Functions
General event functions receive three parameters:
- context: A JavaScript Object with configuration variables and references to utility functions.
- settings: A JavaScript Object with the available configuration properties for the module.
- callback: A callback function that must be called when the general event function completes its execution. Callback parameters are: err, context.
```javascript
exports.BatchStart = function (context, settings, callback) {
    initialize(settings, function (client, index, type) {
        context.batchArray = [];
        context.elasticSearch = {};
        context.elasticSearch.client = client;
        context.elasticSearch.index = index;
        context.elasticSearch.type = type;
        callback(null, context);
    });
};
```
The admin/setContentProcessingModules API call configures content processing modules and module settings for a storage unit. Content processing modules are configured per scope. A default list of modules can be configured for scopes that are not explicitly defined. Each module can define its own list of settings. When events are executed, the modules run in the order in which they appear in the configuration array.
Content processing module configuration consists of lists of modules for each scope and general settings.
```json
{
    "modules": {
        "connector": [
            { "module": "FieldMapping" },
            {
                "settings": {
                    "elasticsearch-index": "aspiredocs",
                    "elasticsearch-type": "aspiredoc"
                },
                "module": "ESPublisher"
            }
        ],
        "index": [
            { "module": "FieldMapping" },
            {
                "settings": {
                    "elasticsearch-index": "researchdocs",
                    "elasticsearch-type": "researchdoc"
                },
                "module": "ESPublisher"
            }
        ],
        "research": [
            { "module": "NormalizeCategory" }
        ]
    },
    "settings": {
        "elasticsearch-port": 9200,
        "elasticsearch-server": "localhost"
    }
}
```
In this example, the connector scope will execute events from FieldMapping and ESPublisher, in that order; the index scope will execute FieldMapping and ESPublisher; and the research scope will execute NormalizeCategory. For each event, the application looks for an implementation of that event in each content processing module; if one is available, it executes the event, then moves to the next module to look for the same event function, and so on.
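The lookup-and-execute loop described above can be sketched as follows (a simplified illustration, not StageR's actual implementation; dispatchEvent and the modules array are hypothetical names):

```javascript
// Simplified sketch of per document event dispatch: walk the configured
// modules in order, invoking the event function on each module that
// implements it, threading content and context through the callbacks.
function dispatchEvent(modules, eventName, key, content, context, settings, done) {
    var i = 0;
    function next(err, currentContent, currentContext) {
        if (err) { return done(err, currentContent, currentContext); }
        // Skip modules that do not implement this event.
        while (i < modules.length && typeof modules[i][eventName] !== "function") {
            i++;
        }
        if (i >= modules.length) {
            return done(null, currentContent, currentContext);
        }
        modules[i++][eventName](key, currentContent, currentContext, settings, next);
    }
    next(null, content, context);
}
```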
General events are typically used to initialize common configuration to be consumed by document events. This configuration can be set in the context variable, which is shared with all document events that belong to the same general event. For example, all documents that belong to a batch receive the same context variable that BatchStart returns, and BatchEnd receives that same context variable, including any modifications that other events may have made to its data/configuration.
Foreign Key Joins
StageR provides a content processing module, ForeignKeyJoin, for automatic merging of records from different storage units based on record keys. 1-to-N relations can be specified between storage unit records. If configured, ForeignKeyJoin will run on the Process and Fetch events of each document for the specified scope.
The content of the record needs to define a field with the following format:
```
"foreignKeys": {
    "FOREIGN_KEY_NAME_1": {
        "storageUnit": "FOREIGN_STORAGE_UNIT_NAME",
        "scope": "FOREIGN_SCOPE",
        "ids": [ "RECORD_ID_1", "RECORD_ID_2", ... "RECORD_ID_N" ]
    },
    "FOREIGN_KEY_NAME_2": {
        "storageUnit": "FOREIGN_STORAGE_UNIT_NAME",
        "scope": "FOREIGN_SCOPE",
        "ids": [ "RECORD_ID_1", "RECORD_ID_2", ... "RECORD_ID_N" ]
    },
    ...
    "FOREIGN_KEY_NAME_N": {
        "storageUnit": "FOREIGN_STORAGE_UNIT_NAME",
        "scope": "FOREIGN_SCOPE",
        "ids": [ "RECORD_ID_1", "RECORD_ID_2", ... "RECORD_ID_N" ]
    }
}
```
Output:
```
"PRIMARY_RECORD_SCOPE": {
    ...
    "FOREIGN_KEY_NAME_1": [
        { RECORD_ID_1_FOREIGN_SCOPE_DATA },
        { RECORD_ID_2_FOREIGN_SCOPE_DATA },
        ...
        { RECORD_ID_N_FOREIGN_SCOPE_DATA }
    ]
}
```
Example:
- Record with foreign key references:
```json
"connector": {
    "url": "file:///server/myfolder/file1.txt",
    "content": "test content",
    "foreignKeys": {
        "acls": {
            "storageUnit": "DocAcls",
            "scope": "acls",
            "ids": [ "1", "3", "7" ]
        }
    }
}
```
- Foreign Key Records:
```javascript
{key: "1", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "user1", "type": "user"}}},
{key: "2", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "group2", "type": "group"}}},
{key: "3", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "group3", "type": "group"}}},
{key: "4", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "group4", "type": "group"}}},
{key: "5", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "group5", "type": "group"}}},
{key: "6", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "group6", "type": "group"}}},
{key: "7", content: {acls: {"access": "allow", "domain": "search", "scope": "global", "name": "group7", "type": "group"}}}
```
- Output:
```json
"connector": {
    "url": "file:///server/myfolder/file1.txt",
    "content": "test content",
    "acls": [
        { "access": "allow", "domain": "search", "scope": "global", "name": "user1", "type": "user" },
        { "access": "allow", "domain": "search", "scope": "global", "name": "group3", "type": "group" },
        { "access": "allow", "domain": "search", "scope": "global", "name": "group7", "type": "group" }
    ]
}
```
To configure the ForeignKeyJoin module, add it to the scope's content processing configuration and make sure the scope content contains the foreignKeys field.
```
POST admin/setContentProcessingModules/STORAGE_UNIT
{
    "modules": {
        "connector": [
            { "module": "ForeignKeyJoin" },
            ...
        ]
    }
}
```
With this configuration, foreign key merges will happen for the connector scope on any Process or Fetch event.
Publishers
Elasticsearch Publisher
StageR provides a content processing module to publish content from a storage unit to Elasticsearch. This publisher triggers on Process and PostDelete events and publishes each content record as a document to Elasticsearch, using the record key as the document id in the search engine.
Configure the Elasticsearch publisher by adding the module called HTTPESPublisher through the admin/setContentProcessingModules API call.
```
POST admin/setContentProcessingModules/STORAGE_UNIT
{
    "modules": {
        "connector": [
            {
                "settings": {
                    "elasticsearch-index": "aspiredocs",
                    "elasticsearch-type": "aspiredoc"
                },
                "module": "HTTPESPublisher"
            }
        ],
        ...
    },
    "settings": {
        "elasticsearch-hosts": "localhost:9200"
    }
}
```
The configuration above publishes all documents from the connector scope to Elasticsearch (at localhost:9200), into the aspiredocs index with the aspiredoc document type.
Solr Publisher
StageR provides a content processing module to publish content from a storage unit to Solr. This publisher triggers on Process and PostDelete events and publishes each content record as a document to Solr, using the record key as the document id in the search engine.
Configure the Solr publisher by adding the module called SolrPublisher through the admin/setContentProcessingModules API call.
```
POST admin/setContentProcessingModules/STORAGE_UNIT
{
    "modules": {
        "connector": [
            {
                "settings": {
                    "solr-collection": "testcollection"
                },
                "module": "SolrPublisher"
            }
        ],
        ...
    },
    "settings": {
        "solr-hosts": "localhost:8983"
    }
}
```
The configuration above publishes all documents from the connector scope to Solr (at localhost:8983), into the testcollection collection.