// Available variables:
// ALogger logger, aspire logger, for debug purposes
// WebDriver driver, selenium driver instance, can manipulate the page at will
// List<String> discoveredUrls, insert all the URLs to process here

// Get a list of all the <a /> elements
driver.findElements(By.tagName("a")).each { item ->
    String link = item.getAttribute("href");
    if (link == null || link == "")
        link = item.getAttribute("src"); // fall back to the src attribute when there is no href
    logger.info("Current url %s, discovered %s", driver.getCurrentUrl(), link);
    discoveredUrls.add(link);
}
logger.info("Current url %s, discovery complete", driver.getCurrentUrl());
// Available variables:
// ALogger logger, aspire logger, for debug purposes
// WebDriver driver, selenium driver instance, can manipulate the page at will
// List<String> discoveredUrls, insert all the URLs to process here

driver.get(loginUrl); // Load the login url

WebDriverWait waitCondition = new WebDriverWait(driver, 5, 500);

// WebElement, look for certain items on the page
def loginField = waitCondition.until(ExpectedConditions.presenceOfElementLocated(By.id("login")));
def passwordField = waitCondition.until(ExpectedConditions.presenceOfElementLocated(By.id("password")));
def submitButton = waitCondition.until(ExpectedConditions.presenceOfElementLocated(By.id("submit")));

loginField.sendKeys(username);    // Set the username field
passwordField.sendKeys(password); // Set the password
submitButton.click();             // Click on the login button

logger.info("Login successful on %s", loginUrl);
// Available variables:
// ALogger logger, aspire logger, for debug purposes
// WebDriver driver, selenium driver instance, can manipulate the page at will

def element = null;
try {
    element = driver.findElement(By.className("user-profile"));
} catch (NoSuchElementException nsee) {
    logger.warn(nsee, "Missing field on session validation");
}
// If the element doesn't exist on the page, the user is not logged in anymore
return element != null;
Some connectors perform incremental crawls based on snapshot files, which are meant to reflect exactly the documents the connector has already indexed into the search engine. On an incremental crawl, the connector scans the content source the same way it does during a full crawl, but it only indexes the documents that are new, modified, or deleted since the previous crawl.
For a discussion on crawling, see Full & Incremental Crawls.
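Conceptually, the snapshot comparison works like this: items not present in the previous snapshot are new, items whose signature has changed are updates, and snapshot entries that are never seen again are deletes. The Groovy sketch below only illustrates that comparison under assumed names; loadSnapshot, saveSnapshot, index, delete, crawledItems and the signature field are hypothetical and are not part of the connector's actual snapshot API.

// Illustrative sketch of snapshot-based change detection (hypothetical names only)
Map<String, String> previousSnapshot = loadSnapshot() // url -> signature recorded by the last crawl
Map<String, String> currentSnapshot  = [:]

crawledItems.each { item ->                           // item.url / item.signature are assumed fields
    currentSnapshot[item.url] = item.signature
    String oldSignature = previousSnapshot[item.url]
    if (oldSignature == null) {
        index(item)                                   // never seen before -> add
    } else if (oldSignature != item.signature) {
        index(item)                                   // signature changed -> update
    }                                                 // unchanged -> nothing is sent to the engine
}

// Anything left only in the previous snapshot was removed from the source
(previousSnapshot.keySet() - currentSnapshot.keySet()).each { url ->
    delete(url)                                       // issue a delete to the search engine
}

saveSnapshot(currentSnapshot)                         // baseline for the next incremental crawl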
Failing to save a content source before creating or editing another content source can result in an error.
ERROR [aspire]: Exception received attempting to get execute component command com.searchtechnologies.aspire.services.AspireException: Unable to find content source
Save the initial content source before creating or working on another.
No troubleshooting information is available at this time.