The HTML Metadata Extractor component extracts metadata from HTML pages. These pages will most likely come from the Aspider Web Crawler. The extracted metadata is stored inside the document being processed.
Some of the features of the HTML Metadata Extractor component include:
- Extract metadata from documents with HTML content.
- Support for Jsoup selectors to extract the metadata.
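
To make the idea concrete, here is a minimal sketch of metadata extraction from an HTML page. It is not the component's implementation and does not use Jsoup: it relies on Python's standard `html.parser` module instead of selectors, and the names `MetaTagExtractor`, `extract_metadata`, and the `document` dictionary layout are assumptions made purely for illustration. It collects the page title and each `<meta name="..." content="...">` pair, then stores the result on a stand-in document object.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Keep one value per metadata name; later tags overwrite earlier ones.
            self.metadata[attrs["name"]] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Hypothetical helper: parse an HTML string and return its metadata."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata


sample = """<html><head>
<title>Example page</title>
<meta name="author" content="Jane Doe">
<meta name="keywords" content="crawler, metadata">
</head><body><p>Hello</p></body></html>"""

# Stand-in for the processed document: the extracted metadata is stored on it.
document = {"url": "http://example.com/", "metadata": extract_metadata(sample)}
print(document["metadata"])
```

With Jsoup, the equivalent lookup would be expressed as a CSS-style selector such as `meta[name=author]` rather than as parser callbacks.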