
The biggest change in Aspire 3.1 is the way connectors work. They now use an external database (MongoDB) to hold all of the crawl information, such as document URLs, status, statistics, snapshots (for incremental crawls), and logs.

This makes distributed crawling part of the connectors' architectural design.

All of the connectors run under the same principles and use the same logic, so each connector is more like a Repository Access Provider. We keep them as simple as possible, rather than making each one a complex (multi-threaded) crawling application. The complexity of distributed crawling and multi-threading is handled by the Connector Framework.


Responsibilities that the Connector developers implement:

Scan the repository document containers to discover new documents to process
Populate document metadata
Fetch document content
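
The three responsibilities above can be pictured as a small contract that the framework calls into. The following is only a minimal sketch under stated assumptions: `RepositoryAccessProvider`, `InMemoryProvider`, `CrawlSketch`, and every method name are hypothetical and are not the Aspire API, and the extra `isContainer` method is added purely so the toy traversal can tell containers from documents.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract; these names are illustrative, not the Aspire API.
interface RepositoryAccessProvider {
    // Scan a container and return the ids of its direct children.
    List<String> scan(String containerId);
    // Assumption: lets the caller know whether an id must be scanned further.
    boolean isContainer(String id);
    // Populate the metadata of one document.
    Map<String, String> populate(String documentId);
    // Fetch the raw content of one document.
    byte[] fetch(String documentId);
}

// Toy in-memory repository, used only to exercise the contract.
class InMemoryProvider implements RepositoryAccessProvider {
    private final Map<String, List<String>> containers = new LinkedHashMap<>();
    private final Map<String, String> documents = new LinkedHashMap<>();

    void addContainer(String id, List<String> children) { containers.put(id, children); }
    void addDocument(String id, String text) { documents.put(id, text); }

    @Override public List<String> scan(String containerId) {
        return containers.getOrDefault(containerId, Collections.emptyList());
    }
    @Override public boolean isContainer(String id) { return containers.containsKey(id); }
    @Override public Map<String, String> populate(String documentId) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("id", documentId);
        metadata.put("length", String.valueOf(documents.get(documentId).length()));
        return metadata;
    }
    @Override public byte[] fetch(String documentId) {
        return documents.get(documentId).getBytes(StandardCharsets.UTF_8);
    }
}

// Single-threaded stand-in for the framework side: it owns the traversal,
// while the provider only answers scan/populate/fetch calls.
class CrawlSketch {
    static List<String> discover(RepositoryAccessProvider provider, String rootId) {
        List<String> found = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(rootId);
        while (!pending.isEmpty()) {
            for (String child : provider.scan(pending.poll())) {
                if (provider.isContainer(child)) pending.add(child);
                else found.add(child);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        InMemoryProvider provider = new InMemoryProvider();
        provider.addContainer("root", Arrays.asList("a.txt", "sub"));
        provider.addContainer("sub", Arrays.asList("b.txt"));
        provider.addDocument("a.txt", "hello");
        provider.addDocument("b.txt", "world!");
        for (String id : discover(provider, "root")) {
            System.out.println(provider.populate(id) + " " + provider.fetch(id).length + " bytes");
        }
    }
}
```

The point of the split is that the provider holds no threads, queues, or state about the crawl; everything in `CrawlSketch` stands in for work that the Connector Framework takes over.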

If you want to create your connector right away, go to Write Your Own Connector from Scratch.


Responsibilities of the Connector Framework (you don't have to worry about this):

Multi-threaded processing
Distribute the crawl processing
Store and fetch documents from the database
Maintain a snapshot for incremental crawling (adding, updating or deleting documents)
Handle statistics
Start, Pause, Stop, Resume the crawl
Send the documents to the respective workflows for processing and search engine indexing
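
To make the snapshot item in the list above more concrete, here is a minimal sketch of how a previous snapshot could be compared with the current scan to decide whether a document is added, updated, or deleted. It rests on assumptions: the class `SnapshotDiff`, the `Action` enum, and the idea of a per-document "signature" string (for example a hash or modification stamp) are hypothetical, and plain in-memory maps stand in for the snapshot that the framework keeps in the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of incremental crawling; not the Aspire implementation.
class SnapshotDiff {
    enum Action { ADD, UPDATE, DELETE }

    // Both maps go from document id to a signature string (an assumption:
    // any value that changes when the document changes would do).
    static Map<String, Action> compare(Map<String, String> previous,
                                       Map<String, String> current) {
        Map<String, Action> actions = new LinkedHashMap<>();
        // Documents seen in this crawl: new ones are added, changed ones updated.
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldSignature = previous.get(entry.getKey());
            if (oldSignature == null) {
                actions.put(entry.getKey(), Action.ADD);
            } else if (!oldSignature.equals(entry.getValue())) {
                actions.put(entry.getKey(), Action.UPDATE);
            }
        }
        // Documents in the old snapshot that were not seen again are deleted.
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                actions.put(id, Action.DELETE);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("a.txt", "v1");
        previous.put("b.txt", "v1");
        previous.put("d.txt", "v1");
        Map<String, String> current = new LinkedHashMap<>();
        current.put("a.txt", "v1");
        current.put("b.txt", "v2");
        current.put("c.txt", "v1");
        System.out.println(compare(previous, current));
    }
}
```

Unchanged documents produce no action, so an incremental crawl only sends work downstream for documents that actually differ from the last snapshot.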

The following diagram illustrates how the Connector Framework interacts with the connector implementation in order to run a crawl:

[Diagram: interaction between the Connector Framework and the connector implementation during a crawl]

If you want to learn more about the Connector Framework, check out NoSQL Connector Framework Overview.
