Available from the 2.1 Release.
Aspire Index Auditing is a feature that helps administrators keep track of all content source actions and search engine indexes in order to identify differences and problems between them.
The new components:
- Content Source auditing
- Search Engine index auditing (publisher index dumps)
- Reconciliation file between the content source and search engine auditing files
Content Sources Auditing
By default, every content source logs every action performed on each crawled document. To disable this, go to the Advanced Connector Properties in the content source configuration and uncheck the Enable Auditing option.
There are two types of events logged:
- Job
  - Indicates an event for a single job
- Batch
  - Indicates an event for a batch of jobs
The actions that are logged are:
- Add:
  - The document was sent to the workflow to be processed as an add
- Batch Completed:
  - The processing batch finished successfully
- Batch Error:
  - The processing batch finished with an error
- Excluded:
  - The document was excluded by the index pattern configuration; no further processing was done for it
- Update:
  - The document was sent to the workflow to be processed as an update
- Delete:
  - The document was sent to the workflow to be processed as a delete
- No Change:
  - The document was found to be unchanged; no further processing was done for it
- Workflow Completed:
  - The document finished the workflow successfully
- Workflow Error:
  - The document finished the workflow with an error
- Workflow Terminated:
  - The document was terminated by a workflow rule
- Crawl Begin:
  - Indicates the beginning of the crawl process
- Crawl End:
  - Indicates the end of the crawl process
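To make the event types and actions more concrete, the following minimal Python sketch tallies and filters a handful of audit records. The record shape (a tuple of type, action, and document id), the sample values, and the helper names `count_by_action` and `filter_records` are assumptions made for illustration only; this page does not describe the actual audit log layout.

```python
from collections import Counter

# Hypothetical audit records as (type, action, document id) tuples.
# The real audit log layout is not described on this page.
records = [
    ("Job", "Add", "doc-1"),
    ("Job", "Update", "doc-2"),
    ("Job", "No Change", "doc-3"),
    ("Job", "Workflow Completed", "doc-1"),
    ("Job", "Workflow Error", "doc-2"),
    ("Batch", "Batch Completed", None),
]


def count_by_action(records):
    """Return how many times each action appears in the records."""
    return Counter(action for _type, action, _doc in records)


def filter_records(records, event_type=None, action=None):
    """Keep records matching the given type and/or action (None matches all)."""
    return [
        r for r in records
        if (event_type is None or r[0] == event_type)
        and (action is None or r[1] == action)
    ]


action_counts = count_by_action(records)
failed = filter_records(records, event_type="Job", action="Workflow Error")
print(action_counts["Add"], [doc for _t, _a, doc in failed])
```

A tally like this can give a quick view of how many documents were added, left unchanged, or failed in the workflow during a crawl.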
How to access the Auditing
Step 1
To see the audit logs of any crawl from the Aspire UI, open the content source statistics and click View Audit Logs.
Step 2
After you click View Audit Logs, the audit logs page is displayed.
Step 3
You can also filter the audit logs by type, by action, or by both.
Search Engine Index Auditing
The audit log files for a search engine are generated via index dumps. At the moment, only these publishers are able to create index dumps:
To dump an index and compare it:
- Go to any auditing page of a content source and click Index Compare.
- Select the publisher (it should be configured to create index dumps) and click Compare to. A pop-up window opens for generating both index dumps and reconciliation files.
- Select New Index Dump and then click the button to start an index dump. Take into account that the index dump will be executed against the documents indexed by this content source only.
- Refresh the dump state by clicking the refresh button. Once the dump finishes, it appears in the dumps dropdown list.
- Select the index dump you have just created (notice that the button action changes when you select an index dump) and click Start Comparison.
- When the comparison finishes, you can see the reconciliation file.
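Conceptually, a reconciliation compares the documents seen by the content source with the documents present in the index dump. The Python sketch below illustrates that idea with plain set differences. It is not Aspire's implementation: the function name `reconcile`, the result keys, and the sample document ids are hypothetical, and the actual reconciliation file format is not described on this page.

```python
def reconcile(source_ids, index_ids):
    """Compare two collections of document ids and report the differences."""
    source, index = set(source_ids), set(index_ids)
    return {
        # Documents the content source processed but the index does not hold.
        "missing_from_index": sorted(source - index),
        # Documents the index holds but the content source did not report.
        "unexpected_in_index": sorted(index - source),
    }


# Hypothetical ids taken from a content source audit and an index dump.
result = reconcile(["doc-1", "doc-2", "doc-3"], ["doc-2", "doc-3", "doc-4"])
print(result)
```

Either non-empty list points to a discrepancy that is worth investigating in the corresponding audit files.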
Auditing files
All publisher and content source audit files are located at {aspire-distribution-home}/audit.
- The reconciliation audit files, once created, will be located under the folder named: {content-source-name}_{publisher-name}_diff
- Each content source, publisher, or reconciliation folder under the audit folder can contain multiple audit files each identified by a timestamp:
- For content source audit files, the timestamp is the crawl start time.
- For publisher and reconciliation audit files, the timestamp is the time the audit file was created.
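The folder layout described above can be sketched in Python as follows. The helper names (`audit_root`, `reconciliation_dir`, `list_audit_files`) and the sample install path and names are hypothetical. The file name and timestamp format of the audit files is not specified here, so the listing simply sorts by name, which is not guaranteed to be chronological.

```python
from pathlib import Path


def audit_root(aspire_home):
    """Return {aspire-distribution-home}/audit."""
    return Path(aspire_home) / "audit"


def reconciliation_dir(aspire_home, content_source, publisher):
    """Return the {content-source-name}_{publisher-name}_diff folder."""
    return audit_root(aspire_home) / f"{content_source}_{publisher}_diff"


def list_audit_files(folder):
    """List the audit file names in a folder, or [] if it does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


# Hypothetical install path, content source name, and publisher name.
diff_dir = reconciliation_dir("/opt/aspire", "my-source", "my-publisher")
print(diff_dir.as_posix())
print(list_audit_files(diff_dir))
```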