Heritrix Connector (Aspire 2)

Welcome to Heritrix for Aspire! This page is the central location for information on crawling and processing web content using the Aspire Heritrix (web crawler) connector and its associated components.

This feature is open source under the Apache 2.0 license. The source code is available on GitHub.

Introduction

About the Heritrix connector for Aspire: how it works, its features, and more.

Prerequisites

Supported Heritrix versions, user access requirements, and other requirements.

Configuration Tutorial

Step-by-step tutorial for crawling your first website with Heritrix.

Administration FAQ

Questions and answers, including troubleshooting techniques for Administrators

Developer Information

Questions and answers, including troubleshooting techniques for Developers
