The Aspider Web Crawler connector crawls content from any given website.
Features
Some of the features of the Aspider Web Crawler connector include:
- HTTP Authentication
  - Basic/Digest
  - NTLM
  - Negotiate/Kerberos
- HTML Forms (Cookie based)
- Connection throttling
- Incremental crawl
- Ignore/Respect robots.txt and robots meta-tags
- Heritrix HTML parser for link extraction
- Connection proxy
- Configurable User-Agent
- Max Crawl Depth
- Distributed Crawling
- Include/Exclude patterns
- HTTPS crawling
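As an illustration of how the Max Crawl Depth and Include/Exclude patterns features typically interact, here is a minimal Python sketch. It is not Aspider's implementation: the function `should_crawl`, its parameters, the rule that exclude patterns take precedence, and the `example.com` URLs are all assumptions made for the example.

```python
import re

def should_crawl(url, depth, include_patterns, exclude_patterns, max_depth):
    """Return True if a discovered URL passes the depth and pattern filters.

    Hypothetical helper: the precedence rules here are an assumption,
    not the connector's documented behavior.
    """
    # Links found deeper than the configured limit are skipped.
    if depth > max_depth:
        return False
    # In this sketch, exclude patterns win over include patterns.
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    # An empty include list means "accept everything not excluded".
    if not include_patterns:
        return True
    return any(re.search(p, url) for p in include_patterns)

# Example configuration (illustrative values only).
include = [r"^https://example\.com/docs/"]
exclude = [r"\.pdf$"]

print(should_crawl("https://example.com/docs/intro.html", 1, include, exclude, 3))  # True
print(should_crawl("https://example.com/docs/manual.pdf", 1, include, exclude, 3))  # False
print(should_crawl("https://example.com/docs/deep.html", 4, include, exclude, 3))   # False
```

Checking depth first keeps the pattern matching off URLs that would be discarded anyway; a real crawler may order or combine these checks differently.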
Content Retrieved
The Aspider Web Crawler connector retrieves several types of documents. Examples include:
- HTML pages
- Scripts and stylesheets
- Images
Limitations
Due to its design, the Aspider Web Crawler has the following limitations:
- Dynamically generated markup
  - Any markup that the browser generates by executing a site's JavaScript will NOT be detected by the crawler, so links created dynamically will not be discovered.
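The limitation above can be demonstrated with a small Python sketch. It uses the standard library's `html.parser` as a stand-in for a static link extractor (it is not the Heritrix parser that Aspider uses), and the sample page, the `LinkExtractor` class, and the `extract_links` helper are hypothetical names invented for the example.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A page with one static link and one link that only exists
# after a browser runs the script.
PAGE = """
<html><body>
  <a href="/static.html">Static link</a>
  <div id="menu"></div>
  <script>
    document.getElementById("menu").innerHTML =
      '<a href="/dynamic.html">Dynamic link</a>';
  </script>
</body></html>
"""

def extract_links(html):
    """Parse the HTML text without executing any script and return the links."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links

print(extract_links(PAGE))  # ['/static.html']
```

Because the script body is treated as plain text and never executed, `/dynamic.html` is never seen as a link. Content that must be crawled should therefore be reachable through links present in the HTML as served.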