Welcome to Heritrix for Aspire! This page is the central location for all information on crawling and processing web content using the Aspire Heritrix (web crawler) connector and associated components.
This feature is open source under the Apache 2.0 license. The source code for this feature can be found on GitHub.
About the Heritrix connector for Aspire: how it works, its features, and more
Supported Heritrix versions, user access requirements, and other requirements
Step-by-step tutorial to crawl your first web site with Heritrix
Questions and answers, including troubleshooting techniques for Administrators
Questions and answers, including troubleshooting techniques for Developers