Available from the 2.1 release. Aspire Index Auditing is a feature that helps administrators keep track of all content source actions and search engine indexes, in order to identify differences and problems between the two. Index Auditing consists of three new additions:
If you're interested in learning more, see the recording of the Performance and Auditing Tech Talk and the accompanying presentation.
By default, every content source logs every action performed on each crawled document. To disable this, go to the Advanced Properties in the content source configuration and uncheck the Enable Auditing option.
The following actions are logged:
The document was sent to the workflow to be processed as an add
The document was sent to the workflow to be processed as an update
The document was sent to the workflow to be processed as a delete
The document was found to be unchanged; no further processing was done for it
The document was excluded by the index pattern configuration; no further processing was done for it
The document finished the workflow successfully
The document finished the workflow with an error
The document was terminated by a workflow rule
The processing batch finished successfully
The processing batch finished with an error
To see the audit logs of any crawl from the Aspire UI, open the content source statistics and click View Audit Logs.
After you click View Audit Logs, the audit log page is displayed.
The audit log files for a search engine are generated via index dumps. At the moment, only the following publishers can create index dumps:
Solr: see the Solr Tutorial for how to configure the index dump.
ElasticSearch: see the ElasticSearch Tutorial for how to configure the index dump.
GSA: see the GSA Tutorial for how to configure the index dump.
To dump an index and compare it, you have to:
All publisher and content source audit files are located at {aspire-distribution-home}/audit.
The reconciliation audit files, once created, are located in a folder named {content-source-name}_{publisher-name}_diff.
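As an illustration of the naming convention above, the following minimal sketch builds the path where a reconciliation folder would be expected. The function name `reconciliation_dir` and the example values are hypothetical and not part of Aspire; only the `audit` folder and the `_diff` suffix come from the description above.

```python
from pathlib import PurePosixPath


def reconciliation_dir(aspire_home, content_source, publisher):
    """Return the expected reconciliation folder for a content source/publisher pair.

    Hypothetical helper: it only applies the
    {content-source-name}_{publisher-name}_diff convention under
    {aspire-distribution-home}/audit. It does not check that the folder exists.
    """
    return PurePosixPath(aspire_home) / "audit" / f"{content_source}_{publisher}_diff"


# Example values are made up for illustration.
print(reconciliation_dir("/opt/aspire", "MyFileSystem", "PublishToSolr"))
```

`PurePosixPath` is used so the sketch produces the same string on any platform; on a real installation, `pathlib.Path` would follow the local path conventions instead.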
Each content source, publisher, or reconciliation folder under the audit folder can contain multiple audit files, each identified by a timestamp.
For content source audit files, the timestamp is the crawl start time. For publisher and reconciliation audit files, the timestamp is the time the audit file was created.
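To get an overview of what the audit folder currently holds, a small script along the following lines could be used. This is only a sketch: it assumes the audit files sit directly inside each folder, and because the file-name and timestamp formats are not specified here, it sorts names alphabetically rather than parsing them. The function names are hypothetical.

```python
from pathlib import Path


def list_audit_files(audit_dir):
    """Map each folder under the audit directory to its sorted file names.

    Assumption: each content source, publisher, or reconciliation folder
    holds its audit files directly (no nested folders are inspected).
    """
    result = {}
    for folder in sorted(Path(audit_dir).iterdir()):
        if folder.is_dir():
            result[folder.name] = sorted(
                f.name for f in folder.iterdir() if f.is_file()
            )
    return result


def is_reconciliation_folder(folder_name):
    """Heuristic based on the _diff naming convention for reconciliation folders.

    A content source or publisher whose own name ends in _diff would be
    misclassified, so treat the result as a hint only.
    """
    return folder_name.endswith("_diff")
```

Calling `list_audit_files` with the path to {aspire-distribution-home}/audit returns a dictionary keyed by folder name, and `is_reconciliation_folder` can then separate the reconciliation folders from the content source and publisher folders.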