...
Configure Storage Unit
Set Content Processing Modules
A storage unit has a set of events that are triggered during different actions performed on the storage unit. There are two different types of events: per document events and general events.
...
- PreAdd: this event is triggered before a content record is stored (added or updated) in the storage unit.
- Process: this event is triggered after a content record is stored (added or updated) in the storage unit.
- PreDelete: this event is triggered before a content record is deleted from the storage unit.
- PostDelete: this event is triggered after a content record is deleted from the storage unit.
- Fetch: this event is triggered after a content record is fetched from the storage unit.
- User Defined Document Events: A custom action can be sent along with a record key to trigger an event linked to the record. Content will not be modified in the database storage. Any number of processing modules can be attached to these events. This transaction can be received through execute or batch calls of the transaction API.
General events are triggered when specific transactions are submitted to the storage unit:
- BatchStart: this event is triggered when a new batch is created during content ingestion. When a batch is created, a batch variable is added to the execution context, which can be accessed by records on per document events.
- BatchEnd: this event is triggered when a batch is completed during content ingestion.
- User Defined General Events: A custom action can be sent without a key to indicate a general event. Any number of processing modules can be attached to these events. This kind of transaction can be received through execute or batch calls of the transaction API.
- StartFullScan (sent by Aspire publisher): this event is triggered when a start full scan transaction is received.
- EndFullScan (sent by Aspire publisher): this event is triggered when an end full scan transaction is received.
These transactions can be received through execute or batch calls of the transaction API.
Event actions are configured in external JavaScript modules placed inside the processing_modules folder of the Staging Repository server. Each module can implement one or more of the event functions. The name of each function must match the name of the event it implements. A module can contain functions for both general and per document events.
Per document events receive five parameters:
- key: The id of the record.
- content: A JavaScript Object with the content of the scope of the record that is being processed.
- context: A JavaScript Object with configuration variables and access to utility/API functions.
- settings: A JavaScript Object with available configuration properties for the module (from the content processing modules configuration).
- callback: Function to call when the execution has completed. Function parameters: error (pass null if no error occurred), content, context.
Code Block |
---|
|
/**
* Creates a new content object with only url, acls and content fields, if they exist.
*/
exports.Process = function (key, content, context, settings, callback) {
  var newContent = {}
  if (content.url) {
    newContent.url = content.url
  }
  if (content.acls) {
    newContent.acls = content.acls
  }
  if (content.content && content.content.$) {
    newContent.content = content.content.$
  }
  // Signal completion: no error, the new content and the (unchanged) context.
  callback(null, newContent, context)
}; |
General events receive three parameters:
- context: A JavaScript Object with configuration variables and access to utility/API functions.
- settings: A JavaScript Object with available configuration properties for the module (from the content processing modules configuration).
- callback: Function to call when the execution has completed. Function parameters: error (pass null if no error occurred), context.
Code Block |
---|
|
exports.BatchStart = function (context, settings, callback) {
context.size = 0
context.keysInRequest = []
callback(null, context)
} |
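Since a module can implement both general and per document events, the two signatures can live side by side in one file. The following is only a sketch: it reuses the `size` and `keysInRequest` context fields from the example above, and it assumes that the same context object is handed to the per document events of the batch, which should be verified against your Staging Repository version.

```javascript
// Hypothetical module that tracks which records were seen in a batch.
// `size` and `keysInRequest` are our own field names, not a documented API.

// General event: (context, settings, callback)
function BatchStart(context, settings, callback) {
  context.size = 0
  context.keysInRequest = []
  callback(null, context)
}

// Per document event: (key, content, context, settings, callback)
function Process(key, content, context, settings, callback) {
  // Guard in case the record is processed outside of a batch.
  if (Array.isArray(context.keysInRequest)) {
    context.keysInRequest.push(key)
    context.size += 1
  }
  // Content is passed through unchanged.
  callback(null, content, context)
}

// General event: drop the per-batch bookkeeping once the batch is done.
function BatchEnd(context, settings, callback) {
  delete context.size
  delete context.keysInRequest
  callback(null, context)
}

// Function names must match the event names.
exports.BatchStart = BatchStart
exports.Process = Process
exports.BatchEnd = BatchEnd
```

Each function reports completion through its callback, passing null as the error when nothing went wrong.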
The set content processing modules API call defines the content processing pipelines and settings that will be executed per scope when an event is triggered. A default list of modules can be configured for scopes that are not explicitly defined.
Each module can define its own collection of settings. If a module has a setting with the same name as a global setting, the global setting is overridden for that module only.
When the events are executed, the modules run in the order in which they appear in the configuration array.
Storage unit configuration consists of a list of modules for each scope plus general settings. Each module may also contain its own specific settings.
Code Block |
---|
|
{
  "modules" : {
    "connector": [
      {
        "module" : "AspireFieldMapping"
      },
      {
        "settings" : {
          "elasticsearch-index" : "aspiredocs",
          "elasticsearch-type" : "aspiredoc"
        },
        "module" : "ESPublisher"
      }
    ],
    "index" : [
      {
        "module" : "FieldMapping"
      },
      {
        "settings" : {
          "elasticsearch-index" : "researchdocs",
          "elasticsearch-type" : "researchdoc"
        },
        "module" : "ESPublisher"
      }
    ],
    "research" : [
      {
        "module" : "NormalizeCategory"
      }
    ]
  },
  "settings" : {
    "elasticsearch-port" : 9200,
    "elasticsearch-server" : "localhost"
  }
} |
This configuration defines a content processing pipeline for three different scopes of the storage unit: connector (AspireFieldMapping → ESPublisher), index (FieldMapping → ESPublisher) and research (NormalizeCategory).
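To make the override and ordering rules concrete, here is a minimal sketch of how a pipeline could be resolved from such a configuration. It is not the server's implementation: `effectiveSettings` and `pipelineFor` are hypothetical helpers, the name of the fallback scope is passed in by the caller because the source does not state it, and the `elasticsearch-port` override of 9300 is made up for illustration.

```javascript
// Module settings win over global settings, for that module only.
function effectiveSettings(globalSettings, moduleSettings) {
  return Object.assign({}, globalSettings, moduleSettings)
}

// Returns the modules of a scope in configuration order, each with its
// resolved settings. Falls back to `defaultScope` (an assumed key name).
function pipelineFor(config, scope, defaultScope) {
  var modules = config.modules[scope] || config.modules[defaultScope] || []
  return modules.map(function (entry) {
    return {
      module: entry.module,
      settings: effectiveSettings(config.settings, entry.settings)
    }
  })
}

var config = {
  modules: {
    connector: [
      { module: 'AspireFieldMapping' },
      {
        module: 'ESPublisher',
        // Illustrative override of a global setting.
        settings: { 'elasticsearch-index': 'aspiredocs', 'elasticsearch-port': 9300 }
      }
    ]
  },
  settings: { 'elasticsearch-port': 9200, 'elasticsearch-server': 'localhost' }
}

var pipeline = pipelineFor(config, 'connector', 'default')
```

In this sketch AspireFieldMapping sees the global port, ESPublisher sees its own value, and the global settings object itself is left untouched.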
Request
The storage unit PUT/POST set content processing modules request requires the name of the storage unit in the URL and receives a JSON in the body with the content processing modules configuration.
Warning: This will replace any previously stored configuration for the storage unit.
Code Block |
---|
|
POST admin/setContentProcessingModules/<storage-unit-name>
{
  "modules" : {
    "connector": [
      {
        "module" : "FieldMapping"
      },
      {
        "settings" : {
          "elasticsearch-index" : "aspiredocs",
          "elasticsearch-type" : "aspiredoc"
        },
        "module" : "ESPublisher"
      }
    ],
    ...
  },
  "settings" : {
    "elasticsearch-port" : 9200,
    "elasticsearch-server" : "localhost"
  }
} |
Response
If the operation is successful, a 200 response code and an OK message are returned.
Code Block |
---|
|
{"message":"OK"} |
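As a rough illustration of the request above, the sketch below assembles the method, URL and JSON body without sending anything. `buildSetModulesRequest` is a hypothetical helper, the base URL is an assumption (the source only gives the relative path), and the `Content-Type` header is assumed from the fact that the body is JSON.

```javascript
// Builds (but does not send) a set content processing modules request.
function buildSetModulesRequest(baseUrl, storageUnitName, config) {
  return {
    method: 'POST',
    // Trim trailing slashes so the path is joined with exactly one "/".
    url: baseUrl.replace(/\/+$/, '') +
      '/admin/setContentProcessingModules/' +
      encodeURIComponent(storageUnitName),
    headers: { 'Content-Type': 'application/json' }, // assumed header
    body: JSON.stringify(config)
  }
}

// Example with a made-up server address and storage unit name.
var request = buildSetModulesRequest('http://localhost:3000/', 'my-unit', {
  modules: { connector: [{ module: 'FieldMapping' }] },
  settings: { 'elasticsearch-port': 9200 }
})
```

The resulting object can be handed to whichever HTTP client you use. Keep in mind that a successful call replaces the whole stored configuration, so the body should always contain the complete set of scopes.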
Enable Content Processing
Enables or disables content processing for a storage unit.
Request
The storage unit PUT/GET/POST enable content processing request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) content processing. When a storage unit is created, content processing is enabled by default.
Enable
Code Block |
---|
|
PUT admin/enableContentProcessing/<storage-unit-name>/true |
Disable
Code Block |
---|
|
PUT admin/enableContentProcessing/<storage-unit-name>/false |
Response
If the operation is successful, a 200 response code and an OK message are returned.
Code Block |
---|
|
{"message":"OK"} |
Enable Content Encryption
Enables or disables content encryption for a storage unit. When enabled, the JSON documents are encrypted with the configured key manager before being stored in the database.
Request
The storage unit PUT/GET/POST enable content encryption request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) content encryption. By default, encryption is enabled.
Enable
Code Block |
---|
|
PUT admin/enableContentEncryption/<storage-unit-name>/true |
Disable
Code Block |
---|
|
PUT admin/enableContentEncryption/<storage-unit-name>/false |
Response
If the operation is successful, a 200 response code and an OK message are returned.
Code Block |
---|
|
{"message":"OK"} |
Enable Content Compression
Enables or disables content compression for a storage unit. When enabled, the JSON documents are compressed before they are encrypted for storage.
Request
The storage unit PUT/GET/POST enable content compression request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) content compression. By default, compression is disabled.
Enable
Code Block |
---|
|
PUT admin/enableContentCompression/<storage-unit-name>/true |
Disable
Code Block |
---|
|
PUT admin/enableContentCompression/<storage-unit-name>/false |
Response
If the operation is successful, a 200 response code and an OK message are returned.
Code Block |
---|
|
{"message":"OK"} |
Enable Reprocessing Queue
Enables or disables the background reprocessing queue of a storage unit. When enabled, automatic and manual reprocess requests will execute the Process document events configured for documents being reprocessed.
Request
The storage unit PUT/GET/POST enable reprocessing queue request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) the reprocessing queue execution. By default, the reprocessing queue is enabled.
Enable
Code Block |
---|
|
PUT admin/enableReprocessingQueue/<storage-unit-name>/true |
Disable
Code Block |
---|
|
PUT admin/enableReprocessingQueue/<storage-unit-name>/false |
Response
If the operation is successful, a 200 response code and an OK message are returned.
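The four enable/disable requests share one URL shape, so a client could build them from a small lookup table. The sketch below is illustrative only: `enableFlagRequest` and `FLAG_ENDPOINTS` are hypothetical names, and the table is keyed by the flag names that appear in the get configuration response.

```javascript
// Maps each configuration flag to its admin endpoint.
var FLAG_ENDPOINTS = {
  contentProcessing: 'enableContentProcessing',
  encryptContent: 'enableContentEncryption',
  compressContent: 'enableContentCompression',
  reprocessingQueue: 'enableReprocessingQueue'
}

// Builds (but does not send) the PUT request for one flag.
function enableFlagRequest(flag, storageUnitName, enabled) {
  if (!Object.prototype.hasOwnProperty.call(FLAG_ENDPOINTS, flag)) {
    throw new Error('Unknown flag: ' + flag)
  }
  return {
    method: 'PUT',
    path: 'admin/' + FLAG_ENDPOINTS[flag] + '/' +
      encodeURIComponent(storageUnitName) + '/' +
      (enabled ? 'true' : 'false')
  }
}

// Example: turn compression on for a storage unit named "my-unit".
var compressionOn = enableFlagRequest('compressContent', 'my-unit', true)
```

Rejecting unknown flag names up front avoids sending a request to an endpoint that does not exist.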
...
Gets the current storage unit configuration. This includes the content processing modules (settings field) and the configuration flags for encryption, compression, content processing and the reprocessing queue.
Request
The storage unit GET/POST get configuration request requires the name of the storage unit in the URL.
Code Block |
---|
|
GET admin/getConfiguration/<storage-unit-name> |
Response
If the operation is successful, a 200 response code and the storage unit JSON configuration are returned.
Code Block |
---|
|
{
"settings":{
"modules" : {
"connector": [
{
"module" : "FieldMapping"
},
{
"settings" : {
"elasticsearch-index" : "aspiredocs",
"elasticsearch-type" : "aspiredoc"
},
"module" : "ESPublisher"
}
],
...
},
"settings" : {
"elasticsearch-port" : 9200,
"elasticsearch-server" : "localhost"
}
},
"contentProcessing" : true,
"compressContent" : false,
"encryptContent": true,
"reprocessingQueue" : true
} |
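A client that receives this response may want a compact view of it. The following sketch, which assumes the response has already been parsed into an object with the fields shown above, lists the module names per scope and collects the four flags; `summarizeConfiguration` is a hypothetical helper name.

```javascript
// Reduces a parsed get configuration response to pipelines and flags.
function summarizeConfiguration(response) {
  var modules = (response.settings && response.settings.modules) || {}
  var pipelines = {}
  Object.keys(modules).forEach(function (scope) {
    // Keep only the module names, in configuration (execution) order.
    pipelines[scope] = modules[scope].map(function (entry) {
      return entry.module
    })
  })
  return {
    pipelines: pipelines,
    flags: {
      contentProcessing: response.contentProcessing,
      compressContent: response.compressContent,
      encryptContent: response.encryptContent,
      reprocessingQueue: response.reprocessingQueue
    }
  }
}

// Example using the connector scope from the response above.
var summary = summarizeConfiguration({
  settings: {
    modules: {
      connector: [
        { module: 'FieldMapping' },
        {
          settings: { 'elasticsearch-index': 'aspiredocs', 'elasticsearch-type': 'aspiredoc' },
          module: 'ESPublisher'
        }
      ]
    },
    settings: { 'elasticsearch-port': 9200, 'elasticsearch-server': 'localhost' }
  },
  contentProcessing: true,
  compressContent: false,
  encryptContent: true,
  reprocessingQueue: true
})
```

Here `summary.pipelines.connector` holds the module names in execution order, which is a quick way to check that a previous set content processing modules call stored what was intended.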
...