The Aspider Web Crawler connector crawls content from any given website.
Aspider uses the Heritrix HTML parser for link discovery, but relies on the Aspire 3 Connector Framework to handle connections and distributed crawls.
Aspider is highly configurable and is better suited to intranet crawls than the Heritrix crawler.
Some of the features of the Aspider Web Crawler connector include:
The Aspider Web Crawler connector retrieves several types of documents. Listed below are some examples of documents retrieved by this crawler.
Info |
---|
This crawler will retrieve any document linked from the HTML markup (e.g. PDF, MS Word, and MS PowerPoint files). |
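
To illustrate what "linked in the HTML markup" means in practice, the following is a minimal sketch of link discovery using only the Python standard library. It is not Aspider's or Heritrix's implementation. The names `LinkCollector` and `discover_links` and the `intranet.example.com` URLs are hypothetical, and only plain `<a href="...">` links are handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are considered in this sketch.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def discover_links(html, base_url):
    """Return every link found in the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Example page with one root-relative and one page-relative link (assumed data).
page = '<p><a href="/docs/report.pdf">Report</a> <a href="slides.pptx">Slides</a></p>'
links = discover_links(page, "http://intranet.example.com/home/index.html")
```

Each discovered URL, whatever its file type, would then be a candidate for retrieval. A real crawler also has to handle other link sources, URL filtering, and duplicate detection, all of which this sketch leaves out.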
Due to its design, the Aspider Web Crawler has the following limitations:
Anything we should add? Please let us know.