Created by Unknown User (nnavarro), last modified by user-1b188 on Oct 03, 2018
Aspire Index Auditing is a feature that helps administrators keep track of all content source actions and search engine indexes in order to identify differences and problems between them.
Index Auditing consists of three components:
- Content Source auditing
- Search Engine index auditing (publisher index dumps)
- Reconciliation file between the content source and search engine auditing files
By default, every content source logs every action performed on each crawled document. To disable auditing, go to the Advanced Connector Properties in the content source configuration and uncheck the Enable Auditing option.
![](/download/attachments/707312047/AUTEnable.png?version=1&modificationDate=1464918196000&api=v2)
There are 2 types of events logged:
Job:
- Indicates an event for a single job
Batch:
- Indicates an event for a batch of jobs
The actions logged by the auditing are:
Add:
- The document was sent to the workflow to be processed as an add
BatchCompleted:
- The processing batch finished successfully
BatchError:
- The processing batch finished with an error
Excluded:
- The document was excluded by the index pattern configuration; no further processing was done for it
Update:
- The document was sent to the workflow to be processed as an update
Delete:
- The document was sent to the workflow to be processed as a delete
NoChange:
- The document was found to be unchanged; no further processing was done for it
WorkflowComplete:
- The document finished the workflow successfully
WorkflowError:
- The document finished the workflow with an error
WorkflowTerminated:
- The document was terminated by a workflow rule
CrawlBegin:
- Indicates the beginning of the crawl process
CrawlEnd:
- Indicates the ending of the crawl process
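As a rough illustration of how the actions listed above might be handled when post-processing audit data, here is a minimal Python sketch. Only the twelve action names come from this page. The `AuditAction` enum, the `count_actions` helper, and the sample sequence are hypothetical, and the sketch assumes the action names have already been extracted from an audit file, whose on-disk format is not described here.

```python
from collections import Counter
from enum import Enum


class AuditAction(Enum):
    """Action names as listed on this page; the enum itself is illustrative."""
    ADD = "Add"
    BATCH_COMPLETED = "BatchCompleted"
    BATCH_ERROR = "BatchError"
    EXCLUDED = "Excluded"
    UPDATE = "Update"
    DELETE = "Delete"
    NO_CHANGE = "NoChange"
    WORKFLOW_COMPLETE = "WorkflowComplete"
    WORKFLOW_ERROR = "WorkflowError"
    WORKFLOW_TERMINATED = "WorkflowTerminated"
    CRAWL_BEGIN = "CrawlBegin"
    CRAWL_END = "CrawlEnd"


def count_actions(action_names):
    """Count how often each action occurs; raises ValueError on an unknown name."""
    return Counter(AuditAction(name) for name in action_names)


# Hypothetical sequence of actions taken from one crawl's audit log.
sample = ["CrawlBegin", "Add", "Add", "WorkflowComplete", "WorkflowError", "CrawlEnd"]
counts = count_actions(sample)
print(counts[AuditAction.ADD])             # documents sent to the workflow as adds
print(counts[AuditAction.WORKFLOW_ERROR])  # documents that finished with an error
```

Looking up the enum by value makes a misspelled or unexpected action name fail loudly instead of being counted silently.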
How to Access the Auditing
Step 1
To see the audit logs of any crawls from the Aspire UI, open the content source statistics, and click View Audit Logs:
![](/download/attachments/707312047/STSAuditAccess.png?version=1&modificationDate=1464919733000&api=v2)
Step 2
After you click on View Audit Logs, the following page will be displayed:
![](/download/attachments/707312047/AUTPlainAuditing.png?version=1&modificationDate=1464920881000&api=v2)
Step 3
You can filter the audit logs by type and/or action.
![](/download/attachments/707312047/AUTType.png?version=1&modificationDate=1464921434000&api=v2)
By Type
![](/download/attachments/707312047/AUTAction.png?version=1&modificationDate=1464921344000&api=v2)
By Action
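The same type and action filtering can be approximated outside the UI. The following Python sketch assumes audit entries have already been parsed into dictionaries with `type`, `action`, and `id` keys. That record shape, the `filter_audit` function, and the sample data are assumptions made for illustration, not the format Aspire actually writes.

```python
def filter_audit(records, event_type=None, action=None):
    """Return the records matching the given type and/or action.

    Passing None for a criterion means "do not filter on it",
    mirroring the optional type and action filters in the UI.
    """
    return [
        r for r in records
        if (event_type is None or r["type"] == event_type)
        and (action is None or r["action"] == action)
    ]


# Hypothetical, already-parsed audit entries.
records = [
    {"type": "Job", "action": "Add", "id": "doc-1"},
    {"type": "Job", "action": "WorkflowError", "id": "doc-2"},
    {"type": "Batch", "action": "BatchCompleted", "id": "batch-1"},
]

errors = filter_audit(records, event_type="Job", action="WorkflowError")
batches = filter_audit(records, event_type="Batch")
print([r["id"] for r in errors])
print([r["id"] for r in batches])
```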
Search Engine Index Auditing
The auditing log files for a search engine are generated via index dumps. Only these publishers are able to create index dumps:
- Publish to Solr
See the tutorial on how to configure the index dump at Solr Tutorial.
- Publish to ElasticSearch
See the tutorial on how to configure the index dump at ElasticSearch Tutorial.
- Publish to GSA
See the tutorial on how to configure the index dump at GSA Tutorial.
To dump an index and compare it:
- Select the publisher (it must be configured to create index dumps) and click on Compare to.
- Go to any auditing page of a content source and click the button that opens the pop-up window for generating index dumps and reconciliation files.
- Click on Start Dump to start an index dump. Note that the index dump runs only against the documents indexed by this content source.
- Click the refresh button to update the dump state. Once the dump finishes, it appears in the dumps dropdown list.
- Select the index dump you have just created (notice that the button action changes when you select an index dump) and click on Start Comparison.
- When the comparison finishes, you can view the reconciliation file.
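Conceptually, the comparison looks for documents that appear on one side but not the other. The Python sketch below illustrates that idea with a plain set difference over document identifiers. It is not Aspire's reconciliation logic, and the `reconcile` function, its result keys, and the sample identifiers are all hypothetical.

```python
def reconcile(source_ids, index_ids):
    """Compare document ids from a content source audit with an index dump.

    Returns the ids found on only one side, sorted for stable output.
    """
    source_ids, index_ids = set(source_ids), set(index_ids)
    return {
        "missing_in_index": sorted(source_ids - index_ids),
        "missing_in_source": sorted(index_ids - source_ids),
    }


# Hypothetical ids: what the crawl audited vs. what the index dump contains.
audited = ["doc-1", "doc-2", "doc-3"]
indexed = ["doc-2", "doc-3", "doc-9"]

diff = reconcile(audited, indexed)
print(diff["missing_in_index"])   # audited by the crawl but absent from the index
print(diff["missing_in_source"])  # present in the index but not audited
```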
Auditing Files
All publisher and content source audit files are located at {aspire-distribution-home}/audit.
- The reconciliation audit files, once created, will be located under the folder named: {content-source-name}_{publisher-name}_diff
- Each content source, publisher, or reconciliation folder under the audit folder can contain multiple audit files each identified by a timestamp:
- For content source audit files, the timestamp is the crawl start time.
- For publisher and reconciliation audit files, the timestamp is the time the audit file was created.
![](/download/attachments/707312047/image2016-6-2%2010%3A4%3A8.png?version=1&modificationDate=1464923049000&api=v2)
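The folder layout described above can be navigated with a few lines of Python. This is a minimal sketch: the `audit` folder and the `{content-source-name}_{publisher-name}_diff` naming come from this page, while the helper names, the example home directory `/opt/aspire`, and the example names `myfiles` and `solr` are hypothetical. The audit file names and their timestamp format are not specified here, so the sketch only lists files and does not parse them.

```python
from pathlib import Path


def reconciliation_dir(aspire_home, content_source, publisher):
    """Build the reconciliation folder path under {aspire-distribution-home}/audit."""
    return Path(aspire_home) / "audit" / f"{content_source}_{publisher}_diff"


def list_audit_files(folder):
    """Return the files in an audit folder sorted by name (empty if the folder is absent)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())


# Hypothetical installation path and names, used only for illustration.
diff_dir = reconciliation_dir("/opt/aspire", "myfiles", "solr")
print(diff_dir)
for audit_file in list_audit_files(diff_dir):
    print(audit_file.name)
```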