If you want to crawl a web site with Aspider, you need to make sure you have the following points covered:
There are two types of authentication: HTTP-based authentication and cookie-based authentication.
The first thing you should do is run curl against the seed URL:
$ curl -i http://mysitehost/mysite
If the first line of the result is a redirection, for example "HTTP/1.1 302 Found", you are probably facing a site with cookie-based authentication.
If the first line of the result is "HTTP/1.1 401 Unauthorized", you are probably facing HTTP-based authentication.
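The two rules above can be sketched as a small Python helper. This is only an illustration of the heuristic, not part of Aspider: the function name guess_auth_type and the returned labels "cookie", "http" and "unknown" are hypothetical, and the input is assumed to be the first line printed by curl -i.

```python
def guess_auth_type(status_line: str) -> str:
    """Guess the authentication type from the first line of an HTTP response.

    The labels returned here are illustrative only; they are not Aspider
    configuration values.
    """
    parts = status_line.split()
    # A status line looks like "HTTP/1.1 302 Found"; the code is the second token.
    if len(parts) < 2 or not parts[1].isdigit():
        return "unknown"
    code = int(parts[1])
    if 300 <= code < 400:
        return "cookie"  # redirection: probably cookie-based authentication
    if code == 401:
        return "http"    # 401: probably HTTP-based authentication
    return "unknown"


print(guess_auth_type("HTTP/1.1 302 Found"))
print(guess_auth_type("HTTP/1.1 401 Unauthorized"))
```

Keep in mind that a redirection is only a hint. Sites also redirect for other reasons, so it is worth following the redirect and checking that the target really is a login form.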
If your site requires cookie-based authentication, you need to know certain details about it in order to configure Aspider to crawl it:
For example, if your login form page consists of the following HTML:
<html>
  <head>....</head>
  <body>
    <div ....>
      <form method="post" action="...." ... ...
      </form>
    </div>
  </body>
</html>
your path would be /html/body/div/form
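If the login page is large, finding the path by eye can be tedious. The following is a minimal sketch that uses Python's standard html.parser module to print the path of every form element in a page. The class name FormPathFinder, the function find_form_paths and the sample page (including its /login action and field name) are assumptions made for illustration. The sketch emits plain tag names only, with no positional indexes, so it fits pages where the path is unambiguous.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FormPathFinder(HTMLParser):
    """Tracks the currently open elements and records the path of each <form>."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open elements, outermost first
        self.form_paths = []  # one path per <form> found

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "form":
            self.form_paths.append("/" + "/".join(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_form_paths(html_text):
    finder = FormPathFinder()
    finder.feed(html_text)
    finder.close()
    return finder.form_paths


# Hypothetical login page used only to exercise the sketch.
sample_page = """<html><head><title>Login</title></head><body><div class="box">
<form method="post" action="/login"><input type="text" name="user"></form>
</div></body></html>"""

print(find_form_paths(sample_page))
```

To try it on a real page, save the HTML returned by curl to a file and pass its contents to find_form_paths.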
If your site requires HTTP-based authentication, you need to determine which authentication scheme to use.
If you executed the curl command from above, you can determine the authentication scheme by looking at the "WWW-Authenticate" headers.
Aspider supports the following schemes:
Some sites have two "WWW-Authenticate" headers in their response. The first one corresponds to the preferred scheme and the second one is a fallback, because some browsers don't support the preferred authentication scheme. As far as Aspider is concerned, you can use either of the two, as long as it is one of the supported schemes mentioned above.
Some schemes require a realm to work. If you see a realm in the response headers, use it in the configuration.
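Reading the scheme and realm out of the curl output can also be sketched in Python. The function name parse_challenges and the sample response are assumptions for illustration. The scheme names and realm in the sample are placeholders and say nothing about which schemes Aspider supports. The sketch also assumes one challenge per "WWW-Authenticate" header line, which is a simplification of what HTTP allows.

```python
import re


def parse_challenges(response_head: str):
    """Return a list of (scheme, realm) pairs, one per WWW-Authenticate header.

    realm is None when the header does not carry one.
    """
    challenges = []
    for line in response_head.splitlines():
        name, sep, value = line.partition(":")
        # Skip the status line and every header other than WWW-Authenticate.
        if not sep or name.strip().lower() != "www-authenticate":
            continue
        value = value.strip()
        if not value:
            continue
        scheme = value.split(None, 1)[0]  # first token is the scheme name
        match = re.search(r'realm="([^"]*)"', value, re.IGNORECASE)
        challenges.append((scheme, match.group(1) if match else None))
    return challenges


# Hypothetical response head, shaped like the output of curl -i.
sample_response = (
    "HTTP/1.1 401 Unauthorized\r\n"
    "WWW-Authenticate: Negotiate\r\n"
    'WWW-Authenticate: Basic realm="example"\r\n'
    "Content-Length: 0\r\n"
)

for scheme, realm in parse_challenges(sample_response):
    print(scheme, realm)
```

The pairs come back in header order, so the first entry is the site's preferred scheme and any later entries are the fallbacks.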