
The biggest change in Aspire 4.0 is in the way the connectors work. They now use an external database (MongoDB, HBase, or Elasticsearch) to hold all of the crawling information, such as document URLs, statuses, statistics, snapshots (for incremental crawls), and logs. This allows the connectors to work in a distributed fashion by architectural design.
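As a concrete illustration, here is a minimal sketch of what one externalized crawl-state record might look like, written with the MongoDB Java driver. The database, collection, and field names (aspire, crawlState, url, status, snapshotSignature) are hypothetical assumptions for illustration, not Aspire's actual schema.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class CrawlStateSketch {
    public static void main(String[] args) {
        // Connect to the external database that holds the crawl state.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> crawlState =
                    client.getDatabase("aspire").getCollection("crawlState");

            // One record per discovered document: its URL, processing status,
            // and a signature kept for snapshot comparison on incremental crawls.
            crawlState.insertOne(new Document("url", "http://repo.example.com/docs/42")
                    .append("status", "QUEUED")
                    .append("snapshotSignature", "a1b2c3"));
        }
    }
}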

All of the connectors run under the same principles, using the same logic, so each connector is more like a Repository Access Provider. We keep them as simple as possible, rather than building each one as a complex (multi-threaded) crawling application. The complexity of distributed crawling and multi-threading lives in the Connector Framework.


Responsibilities that the Connector developers have to implement (a minimal sketch follows this list):

  • Scan the repository document containers to discover new documents to process
  • Populate document metadata
  • Fetch document content
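To make these three responsibilities concrete, here is a minimal Java sketch of the shape such a Repository Access Provider could take. The interface and the DocumentInfo type are illustrative assumptions, not Aspire's actual connector API.

import java.io.InputStream;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only -- not the real Aspire API. */
interface RepositoryAccessProvider {

    /** Scan a container (folder, site, table, ...) and report the items found in it. */
    List<DocumentInfo> scan(DocumentInfo container);

    /** Populate the metadata of a single document. */
    Map<String, Object> populateMetadata(DocumentInfo doc);

    /** Fetch the raw content stream of a single document. */
    InputStream fetchContent(DocumentInfo doc);
}

/** Minimal descriptor for a repository item (hypothetical). */
class DocumentInfo {
    final String url;
    final boolean isContainer;

    DocumentInfo(String url, boolean isContainer) {
        this.url = url;
        this.isContainer = isContainer;
    }
}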

If you want to create your connector right away, go to Write your own from Scratch.

Responsibilities of the Connector Framework (you don't have to worry about these; a sketch of how the framework might drive a connector follows this list):

  • Handle multi-threaded processing
  • Distribute the crawl processing
  • Store and fetch documents from the database
  • Maintain a snapshot for incremental crawling (adding, updating, or deleting documents)
  • Handle statistics
  • Start, pause, stop, and resume the crawl
  • Send the documents to the respective workflows for processing and search engine indexing
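To show how that complexity stays on the framework side, here is a hedged sketch of a framework-style driver that runs the connector callbacks on a thread pool, reusing the RepositoryAccessProvider and DocumentInfo types from the sketch above. The class name and logic are simplified assumptions, not the actual Connector Framework.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch of the framework side -- not the real Connector Framework. */
class CrawlDriver {
    private final RepositoryAccessProvider connector;
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    CrawlDriver(RepositoryAccessProvider connector) {
        this.connector = connector;
    }

    /** Recursively scan containers; process leaf documents on worker threads. */
    void crawl(DocumentInfo root) {
        for (DocumentInfo child : connector.scan(root)) {
            if (child.isContainer) {
                crawl(child); // keep discovering nested containers
            } else {
                pool.submit(() -> {
                    // The framework, not the connector, decides when these run,
                    // compares snapshots, records statistics, and routes the
                    // result to the workflow and the search engine index.
                    connector.populateMetadata(child);
                    connector.fetchContent(child);
                });
            }
        }
    }

    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}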

 

 

The following diagram illustrates how the Connector Framework interacts with the connector implementation in order to run a crawl:

[Diagram: the Connector Framework interacting with the connector implementation during a crawl]

Tip

If you want to learn more about the Connector Framework, check out:

What's next?

[Child pages of this section]

