Provides the functionality to manage the staging repository storage units (create, edit, delete, report, etc.).

Create Storage Unit


Creates a new storage unit in the staging repository.

Request

The create storage unit GET/PUT/POST request requires the storage unit name.

PUT admin/create/<storage-unit-name>

Response

If the storage unit was created successfully, a 200 response code with an OK message is returned.

{"message": "OK"}

If a storage unit with the given name already exists, a 400 response code with a STORAGE_UNIT_EXISTS message is returned.

{"message": "STORAGE_UNIT_EXISTS"}
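As a minimal sketch, a client might build the create request and interpret the two documented responses as shown here. The helper names createStorageUnitRequest and describeCreateResult are hypothetical and not part of the staging repository API, and URL-encoding the name is an assumption; only the path, status codes and messages come from the description above.

```javascript
// Hypothetical client-side helpers; not part of the staging repository API.

// Builds the method and relative path for the create request.
function createStorageUnitRequest(name) {
  if (!name) {
    throw new Error('A storage unit name is required');
  }
  // Encoding the name is an assumption, added so unusual characters survive in the URL.
  return { method: 'PUT', path: 'admin/create/' + encodeURIComponent(name) };
}

// Maps a status code and parsed JSON body to a short outcome string.
function describeCreateResult(status, body) {
  const message = body && body.message;
  if (status === 200 && message === 'OK') {
    return 'created';
  }
  if (status === 400 && message === 'STORAGE_UNIT_EXISTS') {
    return 'already exists';
  }
  return 'unexpected response';
}
```

The same pattern would apply to the drop request, using DELETE, admin/drop/ and the STORAGE_UNIT_DOESNT_EXIST message.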

Drop Storage Unit


Deletes a storage unit from the staging repository server. All content and transactions for the storage unit are deleted.

Request

The drop storage unit GET/DELETE/POST request requires the name of the storage unit to delete.

DELETE admin/drop/<storage-unit-name>

Response

If the storage unit was deleted successfully, a 200 response code with an OK message is returned.

{"message": "OK"}

If the storage unit doesn't exist, a 400 response code with a STORAGE_UNIT_DOESNT_EXIST message is returned.

{"message": "STORAGE_UNIT_DOESNT_EXIST"}

List Storage Units


Lists all available storage units with the option of returning content and transaction statistics.

Request

The list storage units GET/POST request can receive a withStats parameter to return content and transaction statistics for each storage unit.

GET admin/list?withStats

Response

A 200 response code with the list of storage unit names is returned.

[
    {"name": "Management"},
    {"name": "Management10"},
    {"name": "Management4"},
    {"name": "Management5"},
    {"name": "testStorageUnit"}
]

When the withStats parameter is used, a 200 response code with the list of storage unit names and their statistics is returned.

[
    {
        "name": "Sales",
        "records": 0,
        "transactions": 0
    },
    {
        "name": "Products",
        "records": 32,
        "transactions": 32,
        "latestTransactionTime": "2015-11-12T20:13:13.737Z",
        "latestTransactionNum": "5644f2d997cc11144081d14f"
    }
]
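The statistics in this response lend themselves to simple client-side checks. The sketch below picks the storage unit with the most recent transaction. The helper name mostRecentlyUpdated is hypothetical, and the code assumes that units without transactions omit latestTransactionTime, as in the sample above.

```javascript
// Hypothetical helper: picks the storage unit with the newest latestTransactionTime.
// Units without that field (such as the empty one in the sample response) are skipped.
function mostRecentlyUpdated(units) {
  let latest = null;
  for (const unit of units) {
    if (!unit.latestTransactionTime) {
      continue;
    }
    const time = Date.parse(unit.latestTransactionTime);
    if (latest === null || time > latest.time) {
      latest = { name: unit.name, time: time };
    }
  }
  return latest ? latest.name : null;
}

// Sample data copied from the withStats response above.
const sampleUnits = [
  { name: 'Sales', records: 0, transactions: 0 },
  {
    name: 'Products',
    records: 32,
    transactions: 32,
    latestTransactionTime: '2015-11-12T20:13:13.737Z',
    latestTransactionNum: '5644f2d997cc11144081d14f'
  }
];

console.log(mostRecentlyUpdated(sampleUnits)); // prints: Products
```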


List Scopes


Lists all available scopes for a given storage unit.

Request

The list scopes GET request requires the name of the storage unit.

GET admin/listScopes/<storage-unit-name>

Response

A 200 response code with the list of scopes for the storage unit is returned.

[
  "connector",
  "index"
]


Get Storage Unit Statistics


Gets the statistics for a storage unit.

Request

The get storage unit statistics GET/POST request requires the name of the storage unit.

GET admin/getStatistics/<storage-unit-name>

Response

If the storage unit exists, a 200 response code and a JSON object with the storage unit statistics are returned.

{
	"name": "Management5",
	"records": 32,
	"transactions": 32,
	"latestTransactionTime": "2015-11-12T20:13:13.737Z",
	"latestTransactionNum": "5644f2d997cc11144081d14f"
}

If the storage unit doesn't exist, a 400 response code with a STORAGE_UNIT_DOESNT_EXIST message is returned.

{"message": "STORAGE_UNIT_DOESNT_EXIST"}


Dump Storage Unit


Dumps all transactions and content records of a storage unit. Dumping is a two-step process: create the dump file, then download it.

Create Dump

Request

The create dump GET request requires the name of the storage unit to dump.

GET admin/createDump/<storage-unit-name>

Response

If the storage unit exists, a 200 response code and a fileId are returned. The fileId is used to call the download dump request.

If the storage unit doesn't exist, a 400 response code with a STORAGE_UNIT_DOESNT_EXIST message is returned.

{"message": "STORAGE_UNIT_DOESNT_EXIST"}

Download Dump

Request

The download dump GET request requires the fileId returned by the create dump request.

GET admin/downloadDump/<file-id>

Response

If the file id exists but the dump file is still being created, a 200 response code with an IN_PROGRESS message is returned.

{"message": "IN_PROGRESS"}

If the file id exists and the dump file is ready, a 200 response code and the file are returned.

If the file id doesn't exist, a 400 response code with an INVALID_DUMP_FILE_ID message is returned.

{"message": "INVALID_DUMP_FILE_ID"}
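Because the dump file may not be ready on the first download attempt, a client typically has to poll. The following is a minimal sketch of that loop, not the server's behavior. The function names, the injected download function and the { status, body } shape it resolves to are all assumptions; only the status codes and messages are taken from the responses above.

```javascript
// Hypothetical helper: classifies a download dump response.
// `body` is the parsed JSON body when the response is a JSON message, otherwise null
// (how the dump file itself is detected is left to the caller).
function classifyDumpResponse(status, body) {
  const message = body && body.message;
  if (status === 400 && message === 'INVALID_DUMP_FILE_ID') {
    return 'invalid';
  }
  if (status === 200 && message === 'IN_PROGRESS') {
    return 'in-progress';
  }
  if (status === 200) {
    return 'ready';
  }
  return 'unexpected';
}

// Polls until the dump is ready. `download` is a caller-supplied function (an assumption)
// that issues GET admin/downloadDump/<file-id> and resolves to { status, body }.
async function waitForDump(fileId, download, delayMs, maxAttempts) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await download(fileId);
    const state = classifyDumpResponse(response.status, response.body);
    if (state === 'ready') {
      return response;
    }
    if (state !== 'in-progress') {
      throw new Error('Dump download failed: ' + state);
    }
    // Wait before asking again.
    await new Promise(function (resolve) { setTimeout(resolve, delayMs); });
  }
  throw new Error('Dump was not ready after ' + maxAttempts + ' attempts');
}
```

Injecting download keeps the sketch independent of any particular HTTP library and makes the polling logic easy to exercise with a stub.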

Configure Storage Unit


Set Content Processing Modules

A storage unit has a set of events that are triggered during different actions performed on the storage unit. There are two different types of events: per document events and general events.

Per document events are triggered for each document when it is added, updated, deleted or fetched.

  • PreAdd: this event is triggered before a content record is stored (added or updated) in the storage unit.
  • Process: this event is triggered after a content record is stored (added or updated) in the storage unit.
  • PreDelete: this event is triggered before a content record is deleted from the storage unit.
  • PostDelete: this event is triggered after a content record is deleted from the storage unit.
  • Fetch: this event is triggered after a content record is fetched from the storage unit.
  • User Defined Document Events: A custom action can be sent along with a record key to trigger an event linked to the record. Content will not be modified in the database storage. Any number of processing modules can be attached to these events. This transaction can be received through execute or batch calls of the transaction API.

General events are triggered when specific transactions are submitted to the storage unit:

  • BatchStart: this event is triggered when a new batch is created during content ingestion. When a batch is created, a batch variable is added to the execution context which can be accessed by records on per document events.
  • BatchEnd: this event is triggered when a batch is completed during content ingestion.
  • User Defined General Events: A custom action can be sent without a key to indicate a general event. Any number of processing modules can be attached to these events. This kind of transaction can be received through execute or batch calls of the transaction API.
    • StartFullScan (sent by Aspire publisher): this event is triggered when a start full scan transaction is received. 
    • EndFullScan (sent by Aspire publisher): this event is triggered when an end full scan transaction is received. 

Event actions are configured on external JavaScript modules placed inside the processing_modules folder of the Staging Repository server. Each module can implement one or more of the event functions. The name of the function needs to be the name of the implemented event. A module can have functions of both general and per document events.

Per document events receive five parameters:

  • key: The id of the record.
  • content: A JavaScript Object with the content of the scope of the record that is being processed.
  • context: A JavaScript Object with configuration variables and access to utility/API functions.
  • settings: A JavaScript Object with available configuration properties for the module (from the content processing modules configuration).
  • callback: Function to call when the execution has completed. Function parameters: error (pass null if no error occurred), content, context.

/**
 * Creates a new content object with only url, acls and content fields, if they exist.
 */
exports.Process = function (key, content, context, settings, callback) {
  var newContent = {}
  if (content.url) {
    newContent.url = content.url
  }
  if (content.acls) {
    newContent.acls = content.acls
  }
  if (content.content && content.content.$) {
    newContent.content = content.content.$
  }
  callback(null, newContent, context)
};

General events receive three parameters: 

  • context: A JavaScript Object with configuration variables and access to utility/API functions.
  • settings: A JavaScript Object with available configuration properties for the module (from the content processing modules configuration).
  • callback: Function to call when the execution has completed. Function parameters: error (pass null if no error occurred), context.

exports.BatchStart = function (context, settings, callback) {
  context.size = 0
  context.keysInRequest = []
  callback(null, context)
}

The set content processing modules API call defines the content processing pipelines and settings that will be executed per scope when an event is triggered. A default list of modules can be configured for scopes that are not explicitly defined.

Each module can define its own collection of settings. If a module has a setting with the same name as a global setting, the module setting overrides the global setting for that module only.

For each event, the modules are executed in the order in which they appear in the configuration array.

{
    "modules" : {
        "connector": [ 
            {
                "module" : "AspireFieldMapping"
            },
            {
                "settings" : {
                    "elasticsearch-index" : "aspiredocs",
                    "elasticsearch-type" : "aspiredoc"
                },
                "module" : "ESPublisher"
            }
        ], 
        "index" : [ 
            {
                "module" : "FieldMapping"
            }, 
            {
                "settings" : {
                    "elasticsearch-index" : "researchdocs",
                    "elasticsearch-type" : "researchdoc"
                },
                "module" : "ESPublisher"
            }
        ],
        "research" : [ 
            {
                "module" : "NormalizeCategory"
            }
        ]
    },
    "settings" : {
        "elasticsearch-port" : 9200,
        "elasticsearch-server" : "localhost"
    }
}

This configuration defines a content processing pipeline for three different scopes of the storage unit: connector (AspireFieldMapping → ESPublisher), index (FieldMapping → ESPublisher) and research (NormalizeCategory).
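To illustrate how module settings and global settings combine, the sketch below resolves the pipeline for one scope and merges the settings each module would see. This is only a model of the rules described above, not the server's implementation: resolvePipeline is a hypothetical helper, and the default module list for undefined scopes is not modeled because its configuration key is not shown here.

```javascript
// Hypothetical helper: lists the modules configured for a scope, in execution order,
// with the settings each one would see. Module-level settings shadow global ones.
function resolvePipeline(config, scope) {
  const globalSettings = config.settings || {};
  const modules = (config.modules && config.modules[scope]) || [];
  return modules.map(function (entry) {
    return {
      module: entry.module,
      // Later sources win, so a module setting overrides a global one of the same name.
      settings: Object.assign({}, globalSettings, entry.settings || {})
    };
  });
}

// Trimmed copy of the configuration above (connector scope only).
const config = {
  modules: {
    connector: [
      { module: 'AspireFieldMapping' },
      {
        module: 'ESPublisher',
        settings: { 'elasticsearch-index': 'aspiredocs', 'elasticsearch-type': 'aspiredoc' }
      }
    ]
  },
  settings: { 'elasticsearch-port': 9200, 'elasticsearch-server': 'localhost' }
};

const pipeline = resolvePipeline(config, 'connector');
console.log(pipeline.map(function (step) { return step.module; }).join(' -> '));
// prints: AspireFieldMapping -> ESPublisher
```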

Request

The set content processing modules PUT/POST request requires the name of the storage unit in the URL and a JSON body with the content processing modules configuration.

Warning: This will replace any previously stored configuration for the storage unit.

POST admin/setContentProcessingModules/<storage-unit-name>
{
    "modules" : {
        "connector": [ 
            {
                "module" : "FieldMapping"
            },
            {
                "settings" : {
                    "elasticsearch-index" : "aspiredocs",
                    "elasticsearch-type" : "aspiredoc"
                },
                "module" : "ESPublisher"
            }
        ], 
        ...
    },
    "settings" : {
        "elasticsearch-port" : 9200,
        "elasticsearch-server" : "localhost"
    }
}

Response

If the operation is successful, a 200 response code and an OK message are returned.

{"message":"OK"}

Enable Content Processing

Enables or disables content processing for a storage unit.

Request

The storage unit PUT/GET/POST enable content processing request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) content processing. When a storage unit is created, content processing is enabled by default.

Enable

PUT admin/enableContentProcessing/<storage-unit-name>/true

Disable

PUT admin/enableContentProcessing/<storage-unit-name>/false

Response

If the operation is successful, a 200 response code and an OK message are returned.

{"message":"OK"}

Enable Content Encryption

Enables or disables content encryption for a storage unit. When enabled, the JSON documents are encrypted, with the configured key manager, before being stored in the database.

Request

The storage unit PUT/GET/POST enable content encryption request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) content encryption. By default encryption is enabled.

Enable

PUT admin/enableContentEncryption/<storage-unit-name>/true

Disable

PUT admin/enableContentEncryption/<storage-unit-name>/false

Response

If the operation is successful, a 200 response code and an OK message are returned.

{"message":"OK"}

Enable Content Compression

Enables or disables content compression for a storage unit. When enabled, the JSON documents are compressed before encryption happens for storage.

Request

The storage unit PUT/GET/POST enable content compression request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) content compression. By default compression is disabled.

Enable

PUT admin/enableContentCompression/<storage-unit-name>/true

Disable

PUT admin/enableContentCompression/<storage-unit-name>/false

Response

If the operation is successful, a 200 response code and an OK message are returned.

{"message":"OK"}

Enable Reprocessing Queue

Enables or disables the background reprocessing queue of a storage unit. When enabled, automatic and manual reprocess requests will execute the Process document events configured for documents being reprocessed.

Request

The storage unit PUT/GET/POST enable reprocessing queue request requires the name of the storage unit and a boolean value to indicate whether to enable (true) or disable (false) the reprocessing queue execution. By default the reprocessing queue is enabled.

Enable

PUT admin/enableReprocessingQueue/<storage-unit-name>/true

Disable

PUT admin/enableReprocessingQueue/<storage-unit-name>/false

Response

If the operation is successful, a 200 response code and an OK message are returned.

{"message":"OK"}
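The four enable/disable requests above share the same URL shape, so a client could build them with one small helper. The sketch below is hypothetical: toggleRequest and TOGGLE_ACTIONS are illustrative names, PUT is chosen because it is the method used in the examples, and URL-encoding the name is an assumption.

```javascript
// Hypothetical helper covering the four enable/disable requests described above.
const TOGGLE_ACTIONS = [
  'enableContentProcessing',
  'enableContentEncryption',
  'enableContentCompression',
  'enableReprocessingQueue'
];

function toggleRequest(action, storageUnitName, enabled) {
  if (TOGGLE_ACTIONS.indexOf(action) === -1) {
    throw new Error('Unknown toggle action: ' + action);
  }
  if (typeof enabled !== 'boolean') {
    throw new Error('enabled must be true or false');
  }
  // Encoding the name is an assumption, added to keep the URL valid.
  const path = 'admin/' + action + '/' + encodeURIComponent(storageUnitName) + '/' + enabled;
  return { method: 'PUT', path: path };
}

console.log(toggleRequest('enableContentCompression', 'Sales', true).path);
// prints: admin/enableContentCompression/Sales/true
```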


Get Configuration


Gets the current storage unit configuration. This includes the content processing modules (settings field) and the configuration flags for encryption, compression, content processing and the reprocessing queue.

Request

The get configuration GET request requires the name of the storage unit in the URL.

GET admin/getConfiguration/<storage-unit-name>

Response

If the operation is successful, a 200 response code and the storage unit configuration in JSON format are returned.

{
    "settings":{
        "modules" : {
            "connector": [ 
                {
                    "module" : "FieldMapping"
                },
                {
                    "settings" : {
                        "elasticsearch-index" : "aspiredocs",
                        "elasticsearch-type" : "aspiredoc"
                    },
                    "module" : "ESPublisher"
                }
            ], 
            ...
        },
        "settings" : {
            "elasticsearch-port" : 9200,
            "elasticsearch-server" : "localhost"
        }
    },
    "contentProcessing" : true,
    "compressContent" : false,
    "encryptContent" : true,
    "reprocessingQueue" : true
}
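A client can condense this response into the module names per scope plus the four flags. The following sketch shows one way to do that; summarizeConfiguration is a hypothetical helper, and the sample object is a trimmed copy of the response above with only the connector scope.

```javascript
// Hypothetical helper: condenses a getConfiguration response into scope pipelines and flags.
function summarizeConfiguration(config) {
  const modules = (config.settings && config.settings.modules) || {};
  const pipelines = {};
  Object.keys(modules).forEach(function (scope) {
    // Keep only the module names, in their configured order.
    pipelines[scope] = modules[scope].map(function (entry) { return entry.module; });
  });
  return {
    pipelines: pipelines,
    flags: {
      contentProcessing: config.contentProcessing,
      compressContent: config.compressContent,
      encryptContent: config.encryptContent,
      reprocessingQueue: config.reprocessingQueue
    }
  };
}

// Trimmed copy of the sample response above.
const sampleConfiguration = {
  settings: {
    modules: {
      connector: [
        { module: 'FieldMapping' },
        {
          module: 'ESPublisher',
          settings: { 'elasticsearch-index': 'aspiredocs', 'elasticsearch-type': 'aspiredoc' }
        }
      ]
    },
    settings: { 'elasticsearch-port': 9200, 'elasticsearch-server': 'localhost' }
  },
  contentProcessing: true,
  compressContent: false,
  encryptContent: true,
  reprocessingQueue: true
};

console.log(JSON.stringify(summarizeConfiguration(sampleConfiguration).pipelines));
// prints: {"connector":["FieldMapping","ESPublisher"]}
```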
 