The Selenium connector crawls content from websites, using a web browser to retrieve the pages.
The Aspire Selenium connector requires the latest version of the web browser and its matching web driver. Each web driver supports only a range of browser versions; if the installed browser falls outside that range, the connector throws an exception when it tries to start the browser.
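The version constraint described above can be checked before a crawl is started. The following is a minimal sketch, not part of the connector: `DriverRange`, `browser_major`, and `is_supported` are hypothetical names, and the version numbers are illustrative values rather than a real compatibility matrix.

```python
from typing import NamedTuple


class DriverRange(NamedTuple):
    """Inclusive range of browser major versions that a driver build supports."""
    min_major: int
    max_major: int


def browser_major(version: str) -> int:
    """Return the major component of a dotted version such as '120.0.6099.109'."""
    return int(version.strip().split(".")[0])


def is_supported(browser_version: str, supported: DriverRange) -> bool:
    """Return True when the browser's major version falls inside the driver's range."""
    major = browser_major(browser_version)
    return supported.min_major <= major <= supported.max_major


# Hypothetical values, for illustration only.
driver_range = DriverRange(min_major=119, max_major=121)
print(is_supported("120.0.6099.109", driver_range))  # True
print(is_supported("118.0.5993.70", driver_range))   # False
```

A check like this only compares major versions; the actual range supported by a given web driver should be taken from that driver's release notes.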
Before installing the Selenium connector, make sure that:
The Selenium connector runs on either Windows or Linux. The web drivers include a version appropriate for each operating system.
Name | Supported |
---|---|
Content Crawling | Yes |
Identity Crawling | No |
Snapshot-based Incrementals | Yes |
Non-snapshot-based Incrementals | No |
Document Hierarchy | Yes |
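As a rough illustration of what a snapshot-based incremental crawl involves, the sketch below compares two snapshots and reports what was added, updated, or deleted. It is an assumption-laden sketch, not Aspire's implementation: the snapshot is modelled as a plain mapping from URL to a content signature, and `Changes` and `diff_snapshots` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Changes(NamedTuple):
    """URLs grouped by what happened to them between two crawls."""
    added: List[str]
    updated: List[str]
    deleted: List[str]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Changes:
    """Compare two {url: signature} snapshots taken by consecutive crawls."""
    # Present now but not in the previous snapshot.
    added = sorted(url for url in current if url not in previous)
    # Present before but missing from the current snapshot.
    deleted = sorted(url for url in previous if url not in current)
    # Present in both, but the content signature changed.
    updated = sorted(
        url for url in current
        if url in previous and current[url] != previous[url]
    )
    return Changes(added, updated, deleted)


# Hypothetical snapshots, for illustration only.
old = {"https://example.com/": "a1", "https://example.com/about": "b2"}
new = {"https://example.com/": "a9", "https://example.com/news": "c3"}
print(diff_snapshots(old, new))
```

Only the pages reported as added or updated would need to be fetched again; deleted pages would be removed from the index.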
Some features of the Selenium connector include:
The Selenium connector retrieves several types of documents, such as:
Name | Type | Relevant Metadata | Content Fetch and Extraction | Description |
---|---|---|---|---|
Web Page | document | HTML Meta tags, HTTP headers | Yes | Pages discovered on the target website |
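To show what "HTML Meta tags" metadata can look like once extracted, here is a minimal sketch that uses Python's standard `html.parser` module. It is not the connector's extraction code; `MetaTagCollector` and `extract_meta` are hypothetical names, and keying the result by the lower-cased `name` or `property` attribute is an assumption made for this example.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Accept both name="..." and property="..." (used by Open Graph tags).
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content


def extract_meta(html: str) -> Dict[str, str]:
    """Return the meta tags found in the given HTML, keyed by lower-cased name."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


# Hypothetical page, for illustration only.
page = (
    '<html><head><meta name="Description" content="Demo page">'
    '<meta property="og:title" content="Demo" /></head></html>'
)
print(extract_meta(page))
```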
Due to Selenium's own limitations, the connector doesn't support:
Due to API limitations, the Selenium connector is only compatible with browsers that have a WebDriver implementation, for example:
Other features, such as headless mode, also depend on browser support.
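Because headless mode depends on the browser, a configuration layer may need to fail early when it is requested for a browser that cannot provide it. The sketch below is one way to express that, under stated assumptions: the browser names and command-line flags in `HEADLESS_FLAGS` are commonly documented values chosen for this example, not a list taken from the connector, and `launch_arguments` is a hypothetical helper. Verify the flags against the browser versions actually in use.

```python
from typing import Dict, List

# Assumption: commonly documented headless flags; not taken from the connector.
HEADLESS_FLAGS: Dict[str, str] = {
    "chrome": "--headless",
    "edge": "--headless",
    "firefox": "-headless",
}


def launch_arguments(browser: str, want_headless: bool) -> List[str]:
    """Return extra browser arguments, or raise if headless mode is unavailable."""
    if not want_headless:
        return []
    flag = HEADLESS_FLAGS.get(browser.strip().lower())
    if flag is None:
        raise ValueError(f"Headless mode is not available for browser: {browser}")
    return [flag]


print(launch_arguments("Firefox", want_headless=True))   # ['-headless']
print(launch_arguments("Chrome", want_headless=False))   # []
```

Raising an error instead of silently starting a visible browser keeps the mismatch obvious on servers that have no display.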