If you want to crawl a website with Aspider, you need to make sure you have the following points covered:
...
...
...
There are two types of authentication: HTTP-based authentication and cookie-based authentication.
...
HTTP-based authentication.
...
The first thing you should do is execute a curl over the seed URL:
Code Block |
---|
$ curl -i http://mysitehost/mysite |
...
...
...
...
...
...
The first thing you should do is execute a curl over the seed URL:
Code Block |
---|
$ curl -i http://mysitehost/mysite |
...
If your site requires this kind of authentication, then you need to know certain details about it in order to configure Aspider to crawl it:
The login form page. This is the page where the login form is located. Some sites redirect you here if you are not authenticated; in that case, if you execute the curl command from above, it will most likely be the URL specified in the "Location" response header.
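To read the redirect target without scanning the whole response by eye, the headers can be filtered. This is a minimal sketch, assuming a POSIX shell with awk; the function name `location_header` is hypothetical, and the URL in the usage comment is the placeholder from the curl example.

```shell
# Print the value of the Location header from an HTTP response read on stdin.
# Header names are case-insensitive, so the line is lowercased before matching.
location_header() {
  awk '
    { sub(/\r$/, "") }                 # strip the CR that ends HTTP header lines
    /^$/ { exit }                      # blank line: end of the headers
    tolower($0) ~ /^location:/ {
      sub(/^[^:]*:[ \t]*/, "")         # drop the header name and leading spaces
      print
    }
  '
}

# Usage (placeholder URL):
#   curl -s -i http://mysitehost/mysite | location_header
```

If nothing is printed, the response did not contain a redirect, and the login page has to be located another way.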
Code Block |
---|
<html>
<head>....</head>
<body>
<div ....>
...
<form method="post" action="...." ...> ... </form>
</div>
</body>
</html> |
Your path would be /html/body/div/form
b. Also identify the
...
name attribute of the user and password input elements.
Aspider supports the following versions of SSL:
Note: SSLv2 and SSLv3 are not supported.
If your site requires this type of authentication, then you need to determine which authentication scheme to use.
If you executed the curl command from above, you can determine the authentication scheme by looking at the "WWW-Authenticate" headers.
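The schemes offered by the server can be listed directly from that response. This is a minimal sketch, assuming a POSIX shell with awk and headers written as `WWW-Authenticate: <scheme> ...` (a space after the colon); the function name `auth_schemes` is hypothetical.

```shell
# List the scheme named in each WWW-Authenticate header of an HTTP response
# read on stdin, one scheme per line, in the order the server sent them.
auth_schemes() {
  awk '
    { sub(/\r$/, "") }                 # strip the CR that ends HTTP header lines
    /^$/ { exit }                      # blank line: end of the headers
    tolower($1) == "www-authenticate:" { print $2 }
  '
}

# Usage (placeholder URL):
#   curl -s -i http://mysitehost/mysite | auth_schemes
```

The order of the output matches the order of the headers in the response, which is useful when a site sends more than one.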
Aspider supports the following schemes:
Note |
---|
Some sites have two "WWW-Authenticate" headers in their response. The first one corresponds to the preferred scheme and the second one is used for fail-over. This is done because some browsers don't support the preferred authentication scheme. As far as Aspider is concerned, you can use either of the two mechanisms, provided it is one of the supported schemes (mentioned above). |
Info |
---|
Some schemes require a realm to work. If you see a realm in the response headers, then use that in the configuration. |
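When present, the realm appears as a `realm="..."` parameter on the "WWW-Authenticate" header. This is a minimal sketch for pulling it out of a response, assuming a POSIX shell with tr and sed and a double-quoted realm value; the function name `auth_realms` is hypothetical.

```shell
# Print the realm from each WWW-Authenticate header that carries one.
# Reads an HTTP response on stdin; headers without a realm print nothing.
auth_realms() {
  tr -d '\r' |
    sed -n 's/^[Ww][Ww][Ww]-[Aa]uthenticate:.*realm="\([^"]*\)".*/\1/p'
}

# Usage (placeholder URL):
#   curl -s -i http://mysitehost/mysite | auth_realms
```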
...
The command will prompt you for the password. If the response headers display as shown below, you have the correct credentials.
...
If you want to use the Negotiate/Kerberos authentication scheme, then you need to find out the "Key Distribution Center" (KDC). This is a service that supplies session tickets and temporary session keys to users and computers within an Active Directory domain. If you don't know your KDC address, do as follows:
Code Block |
---|
> nltest /dsgetdc:<domain.name> |
The KDC address will appear in the first line as "DC: <the KDC address>".
Code Block |
---|
$ cat /etc/krb5.conf |
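In a krb5.conf file, the KDC is normally given as a `kdc = <host>` entry inside the `[realms]` section. This is a minimal sketch that prints only those entries instead of the whole file, assuming a POSIX shell with awk; the function name `kdc_entries` is hypothetical, and the path defaults to the one used above.

```shell
# Print the value of every "kdc = ..." entry in a krb5.conf-style file.
# The file path is the first argument and defaults to /etc/krb5.conf.
kdc_entries() {
  awk -F'=' '
    /^[ \t]*#/ { next }                      # skip comment lines
    $1 ~ /^[ \t]*kdc[ \t]*$/ {               # key is exactly "kdc"
      gsub(/^[ \t]+|[ \t]+$/, "", $2)        # trim whitespace around the value
      print $2
    }
  ' "${1:-/etc/krb5.conf}"
}

# Usage:
#   kdc_entries
#   kdc_entries /path/to/another/krb5.conf
```

If the file lists several realms, one line is printed per `kdc` entry, so the output still has to be matched to the realm of the site being crawled.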
...