The Twitter connector crawls content from any Twitter account.

The Twitter connector is a crawler built on the Twitter Developer Platform for tweet discovery. It relies on the Aspire 3 Connector Framework to handle connections and distributed crawls.

Features

Some of the features of the Twitter connector include:

Authentication using a Twitter user, consumer key, and consumer secret key

Incremental crawl

Full crawl
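
To illustrate the authentication feature, the sketch below builds the HTTP Basic credential that Twitter's application-only (OAuth 2) flow expects when a consumer key and consumer secret are exchanged for a bearer token. The function name and the sample keys are hypothetical, and this is not the connector's own code, which this page does not show.

```python
import base64
from urllib.parse import quote


def build_basic_credential(consumer_key: str, consumer_secret: str) -> str:
    """Return the Authorization header value for the bearer-token request."""
    # URL-encode each part, join them with a colon, then Base64-encode the pair.
    pair = f"{quote(consumer_key, safe='')}:{quote(consumer_secret, safe='')}"
    encoded = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


# Placeholder values for illustration only; real keys come from the
# Twitter developer portal.
header = build_basic_credential("my-key", "my-secret")
```

In the application-only flow, this header is sent in a POST to https://api.twitter.com/oauth2/token with the body `grant_type=client_credentials`, and the response carries the bearer token used for later requests.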

Content Retrieved

The Twitter connector retrieves all tweets related to the specified Twitter user. Listed below are some examples of the tweet content that this crawler can retrieve.

Text tweet
URL links
Geo location
Hashtags
User mentions entities
Media entities
Retweet count
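
As a rough illustration of where these items live, the sketch below pulls them out of a tweet object shaped like the JSON returned by the Twitter REST API v1.1. The function name and the output keys are hypothetical; they are not the metadata field names that the connector produces.

```python
from typing import Any, Dict


def extract_tweet_fields(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the content items listed above from a v1.1-style tweet object."""
    entities = tweet.get("entities", {})
    return {
        "text": tweet.get("text", ""),
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "geo": tweet.get("coordinates"),  # None when the tweet has no location
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "user_mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "media": [m["media_url_https"] for m in entities.get("media", [])],
        "retweet_count": tweet.get("retweet_count", 0),
    }


# Hand-written sample tweet, not real data.
sample = {
    "text": "Hello #aspire from @example https://t.co/abc",
    "coordinates": None,
    "retweet_count": 3,
    "entities": {
        "urls": [{"expanded_url": "https://example.com/page"}],
        "hashtags": [{"text": "aspire"}],
        "user_mentions": [{"screen_name": "example"}],
    },
}
fields = extract_tweet_fields(sample)
```

Missing entity lists, such as `media` in the sample, simply come back as empty lists, so text-only tweets need no special handling.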

Limitations

Due to its design, the Twitter connector has the following limitations:

To retrieve tweets from Twitter, the connector uses application-only authentication and the getUserTimeline method to extract a user's tweets. This approach returns at most 3,200 tweets. For reference, see https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html
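
The effect of this limit can be sketched with a simple paging loop. The example assumes the usual way of walking a user timeline, requesting pages of up to 200 tweets and moving `max_id` backwards. `fetch_page` and `fake_fetch` are hypothetical stand-ins for the real API call, and the cap is applied in the client here only to make it visible; the real service enforces it on its side.

```python
from typing import Callable, Dict, List, Optional

PAGE_SIZE = 200      # largest page size the timeline endpoint accepts
TIMELINE_CAP = 3200  # maximum number of tweets the timeline will return

FetchPage = Callable[[Optional[int], int], List[Dict]]


def collect_timeline(fetch_page: FetchPage) -> List[Dict]:
    """Walk backwards through a timeline until it runs out or the cap is hit."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None
    while len(tweets) < TIMELINE_CAP:
        page = fetch_page(max_id, PAGE_SIZE)
        if not page:
            break
        tweets.extend(page)
        # max_id is inclusive, so step just below the oldest tweet seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets[:TIMELINE_CAP]


def fake_fetch(max_id: Optional[int], count: int) -> List[Dict]:
    """Stand-in for the API: an account with 5000 tweets, ids 5000 down to 1."""
    newest = 5000 if max_id is None else min(max_id, 5000)
    return [{"id": i} for i in range(newest, max(newest - count, 0), -1)]


result = collect_timeline(fake_fetch)
```

With the simulated account of 5,000 tweets, only the 3,200 most recent are collected, so older tweets are not available to a crawl that relies on this method.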

Anything we should add? Please let us know.
