
The Twitter connector crawls content from any Twitter account.

The Twitter connector is a crawler built on the Twitter Developer Platform for tweet discovery. It relies on the Aspire 3 Connector Framework to handle connections and distributed crawls.






Features


Some of the features of the Twitter connector include:

  • Authentication using a Twitter user, consumer key, and consumer secret key
  • Incremental crawl
  • Full crawl
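
The difference between the two crawl modes listed above can be illustrated with a minimal sketch. It is not the connector's actual implementation: the `CrawlState` class, the `full_crawl` and `incremental_crawl` functions, and the in-memory tweet dictionaries are all hypothetical, and the sketch assumes that newer tweets carry larger numeric ids.

```python
from dataclasses import dataclass


@dataclass
class CrawlState:
    """Hypothetical crawl state: the highest tweet id seen so far."""
    last_seen_id: int = 0


def full_crawl(tweets, state):
    """Return every tweet and reset the state to the newest id."""
    state.last_seen_id = max((t["id"] for t in tweets), default=0)
    return list(tweets)


def incremental_crawl(tweets, state):
    """Return only tweets newer than the last crawl, then advance the state."""
    new = [t for t in tweets if t["id"] > state.last_seen_id]
    if new:
        state.last_seen_id = max(t["id"] for t in new)
    return new


# Assumed sample data standing in for an account's timeline.
timeline = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
state = CrawlState()

initial = full_crawl(timeline, state)          # both tweets
timeline.append({"id": 3, "text": "third"})
delta = incremental_crawl(timeline, state)     # only the tweet with id 3
```

In this sketch a full crawl always re-reads everything, while an incremental crawl returns only what appeared since the previous run.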


Content Retrieved


The Aspider Web Crawler connector retrieves several types of documents. Some examples are listed below.

  • HTML pages
    • html, aspx, php, etc.
  • Scripts and stylesheets
    • js, css, etc.
  • Images
    • jpg, gif, png, etc.


This crawler also retrieves any document linked from the HTML markup (such as PDF, MS Word, and MS PowerPoint files).

Limitations 


Due to its design, the Aspider Web Crawler has the following limitations:

  • Dynamically generated markup
    • Markup that the browser generates by executing a site's JavaScript is NOT detected by the crawler, so dynamic links are not discovered.
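
The limitation above can be reproduced with any parser that reads only the static markup. The following is a minimal sketch using Python's standard `html.parser` module. It is not the crawler's own code, and the `LinkExtractor` class, the `extract_static_links` function, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_static_links(html):
    """Parse the markup without executing any script it contains."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed sample page: one static link and one link created by JavaScript.
PAGE = """
<html><body>
<a href="/static.pdf">Static</a>
<script>
  var a = document.createElement("a");
  a.href = "/dynamic.pdf";
  document.body.appendChild(a);
</script>
</body></html>
"""

found = extract_static_links(PAGE)
```

Because the script is never executed, `found` contains `/static.pdf` but not `/dynamic.pdf`, which exists only after a browser runs the JavaScript.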



