The HTML Metadata Extractor component extracts metadata from HTML pages, which will most likely come from the Aspider Web Crawler. The extracted metadata is stored in the document being processed.

Features

Features of the HTML Metadata Extractor component include:

  • Extracts metadata from documents with HTML content.
  • Supports Jsoup selectors for extracting the metadata.
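
To give a feel for what Jsoup selectors look like, here is a minimal sketch that uses the Jsoup library directly. It does not show the component's own configuration, which this page does not describe. The class name, the `extract` helper, the field names, and the sample HTML are all hypothetical, and the rule of preferring a `content` attribute over element text is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetadataSelectorSketch {

    // Hypothetical helper: maps a metadata field name to a Jsoup (CSS-style)
    // selector and keeps the first match for each field.
    static Map<String, String> extract(String html, Map<String, String> selectors) {
        Document page = Jsoup.parse(html);
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : selectors.entrySet()) {
            Element match = page.select(entry.getValue()).first();
            if (match == null) {
                continue; // selector matched nothing, so the field is skipped
            }
            // Assumption: <meta> tags carry their value in the "content"
            // attribute; for other elements the visible text is used.
            String value = match.hasAttr("content") ? match.attr("content") : match.text();
            metadata.put(entry.getKey(), value);
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Sample page invented for this sketch.
        String html = "<html><head><title>Sample page</title>"
                + "<meta name=\"author\" content=\"Jane Doe\">"
                + "</head><body><h1 class=\"headline\">Hello</h1></body></html>";

        Map<String, String> selectors = new LinkedHashMap<>();
        selectors.put("title", "title");
        selectors.put("author", "meta[name=author]");
        selectors.put("headline", "h1.headline");

        System.out.println(extract(html, selectors));
    }
}
```

Selectors such as `meta[name=author]` (attribute match) and `h1.headline` (tag plus class) follow the same syntax as CSS selectors, so a selector tested in a browser's developer tools will usually carry over.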