A Manager is responsible for coordinating the execution of "jobs" from any given crawl and for tracking the crawl state. It prepares batches of jobs for eventual assignment to Worker nodes.
- An automatically elected main Manager decides which Manager manages each crawl.
- The main Manager also takes appropriate action when a Manager or Worker node is detected to be down.
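
The Manager responsibilities above can be illustrated with a minimal sketch. It is not the product's implementation: the function names (`make_batches`, `assign_crawls`, `reassign_on_failure`), the round-robin assignment policy, and the batch size are all assumptions chosen for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def make_batches(jobs: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size jobs (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(jobs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def assign_crawls(crawls: List[str], managers: List[str]) -> Dict[str, str]:
    """Main Manager role: map each crawl to a live Manager.

    Round-robin is an assumed policy, used here only to keep the sketch simple.
    """
    if not managers:
        raise RuntimeError("no live Manager nodes")
    return {crawl: managers[i % len(managers)] for i, crawl in enumerate(crawls)}


def reassign_on_failure(
    assignment: Dict[str, str], down: str, managers: List[str]
) -> Dict[str, str]:
    """Move crawls owned by a Manager detected as down to the remaining Managers."""
    orphaned = [crawl for crawl, owner in assignment.items() if owner == down]
    if not orphaned:
        return dict(assignment)
    live = [m for m in managers if m != down]
    updated = dict(assignment)
    updated.update(assign_crawls(orphaned, live))
    return updated
```

For example, `assign_crawls(["c1", "c2", "c3"], ["m1", "m2"])` gives `c1` and `c3` to `m1` and `c2` to `m2`. If `m1` is then reported down, `reassign_on_failure` moves both of its crawls to `m2`.
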
A Worker is responsible for processing batches of "jobs" obtained from the Manager nodes.
- It also executes all rules in any workflows configured for the crawls associated with those jobs.
- Fetching of content from repositories
- Content and metadata modification/extraction
- Indexing of documents with Publishers
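
As a rough illustration of the Worker side, the sketch below runs each job in a batch through an ordered list of workflow rules and then hands the result to a publisher. Everything here is hypothetical: the `Document` shape, the `fetch` and `extract_metadata` rules, and the `process_batch` function are stand-ins, not the product's API.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Rule = Callable[[Document], Document]


def fetch(job: Document) -> Document:
    """Stand-in for fetching content from a repository (no real I/O)."""
    return {**job, "content": f"content of {job['id']}"}


def extract_metadata(doc: Document) -> Document:
    """Stand-in for a metadata extraction rule: records the content length."""
    return {**doc, "length": str(len(doc["content"]))}


def process_batch(
    batch: List[Document],
    rules: List[Rule],
    publish: Callable[[Document], None],
) -> int:
    """Apply every rule to every job in order, then publish the result.

    Returns the number of jobs processed.
    """
    for job in batch:
        doc = job
        for rule in rules:
            doc = rule(doc)
        publish(doc)
    return len(batch)


# Usage: an in-memory list plays the role of a Publisher's index.
index: List[Document] = []
processed = process_batch([{"id": "a"}, {"id": "b"}], [fetch, extract_metadata], index.append)
```

Keeping the rules as plain functions in a list mirrors the idea of a configurable workflow: the stages can be reordered or replaced without touching the batch loop.
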