Why does an incremental crawl last as long as a full crawl?
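Some connectors perform incremental crawls based on snapshot entries, which are meant to match exactly the documents that the connector has sent to the search engine for indexing. On an incremental crawl, the connector traverses the entire repository just as it does on a full crawl and compares each document with its snapshot entry, but it only sends new, modified, or deleted documents to be indexed. Because the traversal itself is the same, an incremental crawl can take about as long as a full crawl; the savings are in indexing, not in crawling.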
For a discussion on crawling, see Full & Incremental Crawls.
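The snapshot comparison can be sketched as follows. This is a minimal illustration, not any connector's actual code: it assumes documents are identified by an id and that changes are detected with a SHA-256 hash of the content (a real connector may use modification dates or other metadata instead). The point it shows is that every document is still visited; only the output sent to the search engine shrinks.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # Change detector; SHA-256 is an assumption, a connector may use dates or other metadata.
    return hashlib.sha256(content).hexdigest()


def incremental_crawl(repository, snapshot):
    """Walk the whole repository and compare every document with the previous snapshot.

    repository: dict of document id -> content (everything the crawl can see)
    snapshot:   dict of document id -> fingerprint from the previous crawl
    Returns (changes, new_snapshot).
    """
    changes = {"added": [], "updated": [], "deleted": []}
    new_snapshot = {}
    # Every document is visited, exactly as in a full crawl: this is where the time goes.
    for doc_id, content in repository.items():
        digest = fingerprint(content)
        new_snapshot[doc_id] = digest
        if doc_id not in snapshot:
            changes["added"].append(doc_id)
        elif snapshot[doc_id] != digest:
            changes["updated"].append(doc_id)
        # Unchanged documents are not sent to the search engine again.
    # Snapshot entries that were not seen during this crawl belong to deleted documents.
    changes["deleted"] = sorted(set(snapshot) - set(new_snapshot))
    return changes, new_snapshot


previous = {doc: fingerprint(b"v1") for doc in ("a", "b", "c")}
current_repository = {"a": b"v1", "b": b"v2", "d": b"v1"}
changes, snapshot = incremental_crawl(current_repository, previous)
print(changes)  # {'added': ['d'], 'updated': ['b'], 'deleted': ['c']}
```

In this example four documents are read, but only three changes are reported; in a large repository with few changes, the reading dominates the crawl time.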
Save your content source before creating or editing another one
Failing to save a content source before creating or editing another content source can result in an error.
Save the initial content source before creating or working on another.
My connector stays in "Running" status but is not doing anything
After a crawl has finished, the connector status may not be updated correctly.
To check and correct the status, do the following:
1. In RoboMongo, open your connector database (for example, aspire-nameOfYourConnector).
2. Open the "Status" collection and perform the following query:
3. Edit the entry and set the status to "S" (Completed).
Note: For the full list of "Status" values, see MongoDB Collection Status.
My connector is not providing group expansion results
Make sure your connector has a manual scheduler configured for Group Expansion.
1. Go to the Aspire debug console and look for the corresponding scheduler (in the fourth table, Aspire Application Scheduler).
2. If you are unsure which scheduler is for Group Expansion, check the Schedule Detail.
- The Group Expansion scheduler is identified by the value: cacheGroups
3. To run the Group Expansion process, click Run.
(Cookie Based) I have added the "username" and "password" fields, but I still can't authenticate.
Sometimes the username and password fields alone are not enough to authenticate to a site. Some sites also require custom fields, or even the "submit" button, to authenticate you successfully. In that case, add them as custom fields in the Aspider Configuration.
You can also open your browser's inspect mode (developer tools) to break down the authentication request and make sure you are not missing any fields.
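As a complement to the browser's inspect mode, a small diagnostic script can list every named field of a saved login page and report which ones are not configured yet. This is a sketch that runs outside Aspider using only the Python standard library; the sample page and the field names domain and login_button are hypothetical stand-ins for your own login form. Hidden fields are skipped because Aspider adds them automatically.

```python
from html.parser import HTMLParser

# Type reported when a field has no explicit type attribute (HTML defaults).
DEFAULT_TYPES = {"input": "text", "button": "submit", "select": "select", "textarea": "textarea"}


class FormFieldCollector(HTMLParser):
    """Collects (name, type) for every named field inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in DEFAULT_TYPES and attributes.get("name"):
            # Unnamed fields are never submitted, so only named ones are recorded.
            field_type = (attributes.get("type") or DEFAULT_TYPES[tag]).lower()
            self.fields.append((attributes["name"], field_type))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def missing_fields(login_page_html, configured_names):
    """Named, non-hidden form fields that are not in the configuration yet."""
    collector = FormFieldCollector()
    collector.feed(login_page_html)
    collector.close()
    # Hidden fields are skipped: Aspider adds those to the request automatically.
    return [name for name, field_type in collector.fields
            if field_type != "hidden" and name not in configured_names]


page = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="text" name="domain">
  <input type="submit" name="login_button" value="Sign in">
</form>
"""
print(missing_fields(page, {"username", "password"}))  # ['domain', 'login_button']
```

Here the named submit button shows up as a missing field, which matches the case described above where the "submit" button itself must be added as a custom field.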
(Cookie Based) How do I include dynamic "hidden" elements in the Aspider authentication request?
You don't need to do anything about hidden fields inside the form: Aspider automatically includes them in the request.
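Hidden fields are often dynamic (anti-CSRF tokens, for example, commonly change on every visit), so they cannot be configured as fixed values; they have to be read from the login page at login time. The sketch below illustrates that general idea and is not Aspider's implementation: it reads the hidden inputs from a login page and merges them with the configured fields into a form-encoded request body. All field names and values are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Records name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attributes.get("name"):
            self.hidden[attributes["name"]] = attributes.get("value") or ""


def build_login_body(login_page_html, configured_fields):
    collector = HiddenFieldCollector()
    collector.feed(login_page_html)
    collector.close()
    body = dict(collector.hidden)    # values freshly read from this copy of the page
    body.update(configured_fields)   # configured fields win on a name clash
    return urlencode(body)           # application/x-www-form-urlencoded body


page = ('<form method="post"><input type="hidden" name="csrf_token" value="abc123">'
        '<input type="text" name="username"><input type="password" name="password"></form>')
print(build_login_body(page, {"username": "jdoe", "password": "secret"}))
# csrf_token=abc123&username=jdoe&password=secret
```

Because the token is read from the page each time, a new value on the next visit is picked up without any configuration change.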
(Cookie Based) My initial login succeeds, but shortly afterward Aspider can't connect to URLs it has already discovered.
Watch out for "logout" pages, which usually tell the browser to clear its cookies. If a logout page tells Aspider to clear its cookies, Aspider does so and does not try to log in again.
Suggestion: Add an exclusion pattern for the "logout" pages.
- If "Scan Excluded Items" is selected, make sure "Do not follow patterns" also contains a pattern that matches the "logout" page.
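Before adding an exclusion pattern, it can help to test it against real URLs from your site. The sketch below assumes the Aspider exclusion patterns are regular expressions; the pattern itself is hypothetical and only uses constructs that Java regular expressions also support ((?i), (?:...), \b). It is written to match the whole URL, so it behaves the same whether the crawler applies full-match or search semantics.

```python
import re

# Hypothetical exclusion pattern for logout pages; adjust it to your site's logout URLs.
LOGOUT_PATTERN = re.compile(r"(?i).*/(?:log-?out|sign-?out)\b.*")


def is_logout_url(url: str) -> bool:
    # fullmatch: the pattern must account for the entire URL.
    return LOGOUT_PATTERN.fullmatch(url) is not None


for url in (
    "https://intranet.example.com/account/logout",
    "https://intranet.example.com/Account/LogOut.aspx",
    "https://intranet.example.com/sign-out?next=/",
    "https://intranet.example.com/blog/logouts-explained",
):
    print(url, is_logout_url(url))
```

The \b word boundary keeps the pattern from excluding ordinary pages whose path merely starts with "logout", such as the last URL in the list.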
(Cookie Based) My site requires two different HTTP requests in order to authenticate, can Aspider handle that?
Unfortunately, not at the moment. Aspider Cookie Based Authentication is built to send only one request, but we are already considering improvements to it.
(Cookie Based) Which versions of SSL are supported by the Aspider web crawler?
As of Java 8, Aspider supports, and has been tested with, the following protocols:
Note: SSLv2 and SSLv3 are not supported by Aspider.