Document Queues and Metadata

processQueue

Manages the items that need to be processed by the workflow. These items may or may not also be sent for scanning.

Field Name	Example	Description
_id	C:\test-folder\folderA\testDocument.txt	The unique id of the document
metadata	[depends on each connector]	The metadata fields the connector needs to fetch or populate this document
type	[depends on each connector]	The serialized version of the ItemType of the document
status	C, P or A	The document processing status. C (Completed): it has already been processed. P (in Progress): it is currently being processed. A (Available): it is available to be processed
action	add, update, delete	The action to be performed on the search engine for the document
timestamp	1465334398471	The timestamp when this document was added to the queue
signature	CBEC1210FE2D51A8166C3E70D38F8A07	An MD5 signature; when a document changes, this signature should also change
parentId	C:\test-folder\folderA	The id of the parent document, in other words the document that scanned the current document
processor	File_System-192.168.1.15:50505	The identifier of the Aspire server that processed or is processing the current document
shouldScan	false	Determines whether or not this document should be considered for scanning
shouldProcess	true	Determines whether or not this document should be considered for being processed by the workflow
retries	0	The number of times this document has been retried
name	testDocument.txt	The name of this document
isCrawlRootItem	false	Indicates if this is one of the root crawl items (for internal control)
hierarchyId	C:\test-folder\folderA\testDocument.txt	Unique id used to generate the hierarchy for this document; it may differ from the _id field

Example:

{
    "_id" : "C:\\test-folder\\folderA\\testDocument.txt",
    "metadata" : {
        "fetchUrl" : "file://C:/test-folder/folderA/testDocument.txt",
        "url" : "file://C:/test-folder/folderA/testDocument.txt"
    },
    "type" : "vtwqabl6oiadwy3pnuxhgzlbojrwq5dfmnug433mn5twszltfzqxg4djojss4y3pnvyg63tfnz2hglsgnfwgk43zon2gk3kjorsw2vdzobsqaaaaaaaaaaaaciaaa6dsaahguylwmexgyylom4xek3tvnuaaaaaaaaaaaaasaaahq4duaacgm2lmmu",
    "status" : "C",
    "action" : "add",
    "timestamp" : NumberLong(1465334398471),
    "signature" : "CBEC1210FE2D51A8166C3E70D38F8A07",
    "parentId" : "C:\\test-folder\\folderA",
    "processor" : "File_System_Source-192.168.56.1:50505",
    "shouldScan" : false,
    "shouldProcess" : true,
    "retries" : 0,
    "name" : "0.txt",
    "isCrawlRootItem" : false,
    "hierarchyId" : "C:\\test-folder\\folderA\\testDocument.txt"
}
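
The status lifecycle described in the table (A, then P, then C) can be illustrated with a minimal in-memory sketch. This is not Aspire's implementation: `claim_next` and `complete` are hypothetical helper names, the id ending in `done.txt` is invented sample data, and a plain Python list stands in for the MongoDB collection. A real deployment would need an atomic update in the database so that two servers cannot claim the same item.

```python
from typing import Optional

def claim_next(queue: list, processor: str) -> Optional[dict]:
    """Mark the first available item as in progress and return it.

    Hypothetical helper: only items with status "A" and shouldProcess
    set to True are considered. Returns None when nothing is available.
    """
    for item in queue:
        if item["status"] == "A" and item["shouldProcess"]:
            item["status"] = "P"            # A -> P: now being processed
            item["processor"] = processor   # record which server took it
            return item
    return None

def complete(item: dict) -> None:
    """Mark an in-progress item as completed (P -> C)."""
    item["status"] = "C"

# Sample queue: one item already completed, one still available.
queue = [
    {"_id": "C:\\test-folder\\folderA\\done.txt",
     "status": "C", "shouldProcess": True, "processor": None},
    {"_id": "C:\\test-folder\\folderA\\testDocument.txt",
     "status": "A", "shouldProcess": True, "processor": None},
]

claimed = claim_next(queue, "File_System-192.168.1.15:50505")
```

After the call, `claimed` refers to the `testDocument.txt` entry with status `P`; calling `complete(claimed)` moves it to `C`.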

scanQueue

Manages the items that need to be scanned by the connector. These items may or may not have been sent for processing previously.

Field Name	Example	Description
_id	C:\test-folder\folderA	The unique id of the document
metadata	[depends on each connector]	The metadata fields the connector needs to fetch or populate this document
type	[depends on each connector]	The serialized version of the ItemType of the document
status	C, P or A	The document processing status. C (Completed): it has already been processed. P (in Progress): it is currently being processed. A (Available): it is available to be processed
action	add, update, delete	The action to be performed on the search engine for the document
timestamp	1465334398471	The timestamp when this document was added to the queue
signature	CBEC1210FE2D51A8166C3E70D38F8A07	An MD5 signature; when a document changes, this signature should also change
parentId	C:\test-folder	The id of the parent document, in other words the document that scanned the current document
processor	File_System-192.168.1.15:50505	The identifier of the Aspire server that processed or is processing the current document
shouldScan	false	Determines whether or not this document should be considered for scanning
shouldProcess	true	Determines whether or not this document was considered for being processed by the workflow
retries	0	The number of times this document has been retried
name	folderA	The name of this document
isCrawlRootItem	false	Indicates if this is one of the root crawl items (for internal control)
hierarchyId	C:\test-folder\folderA	Unique id used to generate the hierarchy for this document; it may differ from the _id field

Example:

{
    "_id" : "C:\\test-folder\\folderA",
    "metadata" : {
        "fetchUrl" : "file://C:/test-folder/folderA",
        "url" : "file://C:/test-folder/folderA",
        "displayUrl" : "C:\\test-folder\\folderA",
        "lastModified" : "2016-02-23T17:08:55Z",
        "dataSize" : 0,
        "acls" : null
    },
    "type" : "vtwqabl6oiadwy3pnuxhgzlbojrwq5dfmnug433mn5twszltfzqxg4djojss4y3pnvyg63tfnz2hglsgnfwgk43zon2gk3kjorsw2vdzobsqaaaaaaaaaaaaciaaa6dsaahguylwmexgyylom4xek3tvnuaaaaaaaaaaaaasaaahq4duaadgm33mmrsxe",
    "status" : "C",
    "action" : "add",
    "timestamp" : NumberLong(1465334398103),
    "signature" : "CD2C65824E45BFE94C71970EEEA18A8C",
    "parentId" : "C:\\test-folder",
    "processor" : "File_System_Source-192.168.56.1:50505",
    "shouldScan" : true,
    "shouldProcess" : true,
    "retries" : 0,
    "name" : "folderA",
    "isCrawlRootItem" : false,
    "hierarchyId" : "C:\\test-folder\\folderA"
}
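
The signature field exists so that a change in a document can be detected by comparing signatures. The sketch below shows one way this could work. It rests on assumptions: this page does not say which inputs Aspire hashes, so hashing a canonical JSON form of the metadata is an illustrative choice, and `compute_signature` and `has_changed` are hypothetical names.

```python
import hashlib
import json
from typing import Optional

def compute_signature(metadata: dict) -> str:
    """Return an uppercase hex MD5 digest of the metadata.

    Assumption: the metadata is serialized as JSON with sorted keys so
    that the same content always yields the same signature.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest().upper()

def has_changed(previous_signature: Optional[str], metadata: dict) -> bool:
    """Report whether the metadata no longer matches the stored signature."""
    return previous_signature != compute_signature(metadata)

# Metadata as seen on a first crawl and on a later one.
before = {"lastModified": "2016-02-23T17:08:55Z", "dataSize": 0}
after = dict(before, dataSize=10)

stored = compute_signature(before)
unchanged = has_changed(stored, before)   # same content, same signature
modified = has_changed(stored, after)     # dataSize differs, signature differs
```

Sorting the keys before hashing keeps the signature independent of field order, so only a real change in content produces a different value.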

hierarchy

Holds the hierarchy information for every parent document scanned by the connector. Each entry contains the information about all of its ancestors, all the way up to the root document.

Field Name	Example	Description
_id	C:\test-folder\folderA	Unique id of the parent document
name	folderA	Name to be used in the hierarchy metadata
ancestors	[parent hierarchy info]	Holds the same information for the parent of this document, or null if this is a root document

Example:

{
    "_id" : "C:\\test-folder\\folderA",
    "name" : "folderA",
    "ancestors" : {
        "_id" : "C:\\test-folder",
        "name" : "test-folder",
        "ancestors" : null
    }
}
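
Because each entry nests its parent under `ancestors`, the full path to the root can be recovered by following that field until it is null. A minimal sketch, assuming the entry has been loaded as a Python dictionary (`ancestor_names` is a hypothetical helper, not part of Aspire):

```python
def ancestor_names(node: dict) -> list:
    """Return the names from the root document down to the given node."""
    names = []
    current = node
    while current is not None:
        names.append(current["name"])
        current = current["ancestors"]   # None once the root is reached
    names.reverse()                      # collected leaf-first, return root-first
    return names

# The hierarchy entry from the example, as a Python dictionary.
entry = {
    "_id": "C:\\test-folder\\folderA",
    "name": "folderA",
    "ancestors": {
        "_id": "C:\\test-folder",
        "name": "test-folder",
        "ancestors": None,
    },
}

path = ancestor_names(entry)
```

For the example entry, `path` is `["test-folder", "folderA"]`.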

Statistics and Logging

audit
errors
statistics

Controlling and Incremental

status
snapshots
