FAQs

Specific

Link Extraction script example

Link Extraction

// Available variables:
// WebDriver driver, selenium driver instance, can manipulate the page at will
// List<String> discoveredUrls, insert all the URLs to process here
// ALogger logger, aspire logger, for debug purposes

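// Get a list of all the <a /> elements
driver.findElements(By.tagName("a")).each { item ->
    String link = item.getAttribute("href");

    // Fall back to the "src" attribute when the element has no "href"
    if (link == null || link == "")
        link = item.getAttribute("src");

    // Skip elements that provide neither attribute ("return" inside "each" moves on to the next element)
    if (link == null || link == "")
        return;

    logger.info("Current url %s, discovered %s", driver.getCurrentUrl(), link);

    discoveredUrls.add(link);
}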

logger.info("Current url %s, discovery complete", driver.getCurrentUrl());
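The script above adds every discovered value as-is. As a minimal sketch of one way to tidy the list first, the Java helper below drops blank values, non-HTTP schemes such as mailto: or javascript:, fragments and duplicates. The class name LinkFilter and its clean method are hypothetical and not part of Aspire or Selenium, and whether the connector already applies similar filtering is not stated here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not an Aspire or Selenium API.
final class LinkFilter {

    // Returns the http/https links in first-seen order, without fragments or duplicates.
    static Set<String> clean(List<String> links) {
        Set<String> kept = new LinkedHashSet<>();
        for (String link : links) {
            if (link == null || link.trim().isEmpty()) {
                continue; // element had neither href nor src
            }
            String candidate = link.trim();
            int hash = candidate.indexOf('#');
            if (hash >= 0) {
                candidate = candidate.substring(0, hash); // same page, different anchor
            }
            if (candidate.isEmpty()) {
                continue; // the link was only a fragment
            }
            try {
                String scheme = new URI(candidate).getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    kept.add(candidate);
                }
            } catch (URISyntaxException e) {
                // Malformed link: skip it rather than fail the whole page.
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "https://example.com/a", "https://example.com/a#top",
                "mailto:someone@example.com", "javascript:void(0)", null, "");
        System.out.println(clean(raw));
    }
}
```

Because Groovy can call Java classes directly, the same checks could also be written inline in the script before the call to discoveredUrls.add(link).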

General

Why does an incremental crawl last as long as a full crawl?

Some connectors perform incremental crawls based on snapshot files, which are meant to record exactly which documents the connector has sent to the search engine for indexing. On an incremental crawl, the connector still scans the entire content source the same way a full crawl does and compares each item against the snapshot. Only the new, modified or deleted documents are indexed, but because every item must still be visited, the crawl can take as long as a full crawl.

For a discussion on crawling, see Full & Incremental Crawls.
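The snapshot comparison described above can be sketched as follows. This is a simplified illustration, not Aspire's actual snapshot format or code: the class SnapshotDiff, the map of document id to signature, and the action strings are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a snapshot-based incremental crawl.
final class SnapshotDiff {

    // previous: id -> signature stored by the last crawl (the "snapshot").
    // current:  id -> signature gathered by scanning everything again.
    // Returns only the actions that need to be sent to the search engine.
    static List<String> diff(Map<String, String> previous, Map<String, String> current) {
        List<String> actions = new ArrayList<>();
        // Every current item is visited, which is why the scan costs as much as a full crawl.
        for (Map.Entry<String, String> entry : new TreeMap<>(current).entrySet()) {
            String oldSignature = previous.get(entry.getKey());
            if (oldSignature == null) {
                actions.add("add " + entry.getKey());
            } else if (!oldSignature.equals(entry.getValue())) {
                actions.add("update " + entry.getKey());
            }
            // Unchanged items produce no action.
        }
        // Anything in the snapshot that was not seen this time has been deleted.
        for (String id : new TreeMap<>(previous).keySet()) {
            if (!current.containsKey(id)) {
                actions.add("delete " + id);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> previous = Map.of("a.html", "sig1", "b.html", "sig2", "c.html", "sig3");
        Map<String, String> current = Map.of("a.html", "sig1", "b.html", "sig2-changed", "d.html", "sig4");
        System.out.println(diff(previous, current));
    }
}
```

In this sketch the scan touches all items in the current crawl, while only three of them result in indexing work, which mirrors why the elapsed time is similar to a full crawl even when few documents changed.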

Save your content source before creating or editing another one

If you create or edit another content source before saving the current one, the unsaved content source cannot be found and an error such as the following can be logged:

ERROR [aspire]: Exception received attempting to get execute component command com.searchtechnologies.aspire.services.AspireException: Unable to find content source
