Aspider Introduction

Created by Andres Aguilar on Dec 27, 2016


The Aspider Web Crawler connector crawls content from any given website.

Features

Some of the features of the Aspider Web Crawler connector include:

HTTP Authentication
- Basic/Digest
- NTLM
- Negotiate/Kerberos
- HTML Forms (Cookie based)
Connection throttling
Incremental crawl
Ignore/Respect robots.txt and robots meta-tags
Heritrix HTML parser for link extraction
Connection proxy
Configurable User-Agent
Max Crawl Depth
Distributed Crawling
Include/Exclude patterns
HTTPS crawling
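
As an illustration of how the "Max Crawl Depth" and "Include/Exclude patterns" features typically interact, here is a minimal Python sketch. It is not Aspider's implementation. The setting names, the example patterns, and the rule that an exclude match wins over an include match are all assumptions made for the example.

```python
import re

# Hypothetical settings: these names are illustrative, not Aspider configuration keys.
INCLUDE_PATTERNS = [re.compile(r"^https://example\.com/docs/")]
EXCLUDE_PATTERNS = [re.compile(r"\.(zip|exe)$")]
MAX_CRAWL_DEPTH = 2


def should_crawl(url, depth):
    """Return True when a URL passes the depth limit and the include/exclude patterns."""
    # Depth is the number of link hops from the seed URL (the seed itself is depth 0).
    if depth > MAX_CRAWL_DEPTH:
        return False
    # Assumption: an exclude match always rejects the URL.
    if any(p.search(url) for p in EXCLUDE_PATTERNS):
        return False
    # With no include patterns configured, everything not excluded is accepted.
    if not INCLUDE_PATTERNS:
        return True
    return any(p.search(url) for p in INCLUDE_PATTERNS)
```

With these example settings, `should_crawl("https://example.com/docs/intro.html", 1)` is accepted, while a `.zip` file under the same path, a URL outside `/docs/`, or any URL at depth 3 is rejected.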

Content Retrieved

The Aspider Web Crawler connector retrieves several types of documents. Some examples of the documents retrieved by this crawler are listed below.

Include

HTML pages
- .html
- .aspx
- .php
- etc
Scripts and stylesheets
- .js
- .css
- etc
Images
- .jpg
- .gif
- .png
- etc

The crawler retrieves any document that is linked from the HTML markup.
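
To make "linked from the HTML markup" concrete, the following Python sketch collects every `href` and `src` attribute from a static page and resolves it against the page URL. Aspider uses the Heritrix HTML parser for link extraction; this sketch swaps in Python's standard `html.parser` purely for illustration. The `LinkExtractor` class, the sample markup, and the `example.com` URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from href and src attributes in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Anchors and stylesheets use href; scripts and images use src.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


# Hypothetical page containing one link of each kind listed above.
sample_html = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
</head><body>
<a href="about.aspx">About</a>
<img src="/img/logo.png">
</body></html>
"""

extractor = LinkExtractor("https://example.com/home/index.html")
extractor.feed(sample_html)
for link in extractor.links:
    print(link)
```

Relative references such as `about.aspx` are resolved against the page URL, so the extractor ends up with four absolute URLs: one stylesheet, one script, one page, and one image.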

Limitations

By design, the Aspider Web Crawler has the following limitations:

Dynamically generated markup
- Any markup that the browser generates by executing a site's JavaScript will NOT be detected by the crawler, so dynamically generated links will not be discovered.
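
The limitation can be reproduced with any parser that reads markup without executing scripts. The Python sketch below uses the standard `html.parser` as a stand-in for the crawler's parser. The `AnchorCollector` class and the sample page are hypothetical. Only the link present in the static markup is found, because the anchor created by the script never exists in the HTML the parser sees.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


# Hypothetical page: one static link, one link that only exists after the script runs.
page = """
<body>
<a href="/static.html">Static link</a>
<script>
  var a = document.createElement("a");
  a.href = "/dynamic.html";
  document.body.appendChild(a);
</script>
</body>
"""

collector = AnchorCollector()
collector.feed(page)
print(collector.hrefs)  # ['/static.html']
```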
