This tutorial walks through the steps necessary to crawl an Oracle RightNow CX site, using the Aspire RightNow connector.


Before Beginning: Create User Account

As stated in the [[RightNow Connector Prerequisites (Aspire 2)|prerequisites]], you must configure a user account with sufficient rights to access the data contained in your Oracle RightNow CX site through the Connect Web Services. The recommended name for this account is "aspire_crawl_account" or something similarly descriptive.

Step 1: Launch Aspire and open the Content Source Management Page


Aspire Content Source Management Page

Step 2: Install and Configure the Oracle RightNow Connector



Add new Oracle RightNow Content Source

To specify what should be crawled from your Oracle RightNow CX site, you must create one Content Source for each object type (available in the RightNow CX platform) that you want to crawl. For example, if you want to crawl both Answers and Incidents, you will need two Content Sources: one for Answers and one for Incidents.

Step 2a: Add a Content Source

To create a new content source:

  1. From the Aspire 2 Home page, click on the "Add Source" button.
  2. Click on "Oracle RightNow".

Step 2b: Specify Basic Information



General Configuration Tab

In the "General" tab, specify basic information for the Content Source:

  1. Enter a name for the Content Source, in the "Name" field.

    You can choose any name for the Content Source, but a descriptive name is recommended, since it will appear on the Content Source page, in error messages, etc.

  2. Click on the "Active" checkbox to enable the Content Source.

    Unchecking the "Active" option allows you to configure a Content Source without enabling it. This is useful whenever you do not want the Content Source to be accessed, for example, if your Oracle RightNow site will be under maintenance and you want to avoid crawls during that period.

  3. Click on the "Schedule" drop-down list and select one of the following: Manually, Periodically, Daily, Weekly, or Advanced.

    Aspire can automatically schedule Content Sources to be crawled on a set schedule, such as once a day, several times a week, or even every X hours or minutes. For this tutorial, select Manually. Later, if you need to, you can set up a regular crawling schedule.

  4. After selecting a Schedule option, specify the details, if applicable:
    1. Manually: No additional options.
    2. Periodically: Specify the "Run every:" options by entering the number of "hours" and "minutes."
    3. Daily: Specify the "Start time:" by clicking on the hours and minutes drop-down lists and selecting options.
    4. Weekly: Specify the "Start time:" by clicking on the hours and minutes drop-down lists and selecting options, then clicking on the day checkboxes to specify days of the week to run the crawl.
    5. Advanced: Enter a custom CRON Expression (e.g. 0 0 0 ? * *)
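
The Advanced option takes a Quartz-style CRON expression, which has six or seven space-separated fields (seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year). As a rough illustration only (the field names and function below are not part of Aspire), a minimal sketch that splits an expression into its fields:

```python
# Minimal sketch: split a Quartz-style CRON expression into its fields.
# Quartz expressions have 6 or 7 space-separated fields:
# seconds, minutes, hours, day-of-month, month, day-of-week[, year]
FIELD_NAMES = ["seconds", "minutes", "hours", "day-of-month",
               "month", "day-of-week", "year"]

def parse_cron_fields(expression):
    """Return a dict mapping field names to the tokens of the expression."""
    tokens = expression.split()
    if len(tokens) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(tokens))
    return dict(zip(FIELD_NAMES, tokens))

# The example expression above: fire at midnight (0 sec, 0 min, 0 hr) every day.
fields = parse_cron_fields("0 0 0 ? * *")
print(fields["hours"])  # 0 -> midnight
```

The "?" in the day-of-month field means "no specific value", which Quartz requires when the day-of-week field is used instead.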

Step 2c: Specify Connector Properties



Connector Properties

In the "Connector" tab, specify the properties that allow the RightNow Connector to access your Oracle RightNow CX site and configure its behavior.

Starting Point

  1. In the "WSDL Location" field, enter the URL of your Connect Services (SOAP-based services). It should be formatted as https://<yourdomain>.custhelp.com/cgi-bin/<yourinterface>.cfg/services/soap or https://<yourdomain>/cgi-bin/<yourinterface>.cfg/services/soap
  2. Select the "RightNow Object Type" to be crawled from the RightNow instance. Currently there are three options: Answer, Incident, and Other. Answer and Incident are special cases, because the connector can crawl limited data that is not available through the ROQL query language (see the ROQL Configuration section), specifically Answer attachments and Incident attachments.

    Select the "Crawl Attachments" option if you want the connector to try to crawl attachment data (currently only available for Answer and Incident).

  3. Set the "RightNow object URL". This URL is a template used to generate a link to each crawled resource. To build a valid template, first navigate through your RightNow instance to locate the base URL for the object type that you want to crawl; each object type has a different base URL. Then append the wildcard string {id} to the end of the base URL; the connector will replace it with each crawled object's id. For instance, the base URL for the Answer object type should look like http://<yourdomain>.custhelp.com/app/answers/detail/a_id/, and appending the wildcard string to the end creates a valid template: http://<yourdomain>.custhelp.com/app/answers/detail/a_id/{id}
  4. Specify the username and password of the account created in the prerequisites section.

    Note: The password will be automatically encrypted by Aspire.
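
The {id} wildcard in the object URL template is replaced with each crawled object's id. A minimal sketch of that substitution (the domain and id below are hypothetical placeholders, not a real instance):

```python
# Minimal sketch: expand the RightNow object URL template for each object id.
# The domain below is a hypothetical placeholder, not a real instance.
TEMPLATE = "http://example.custhelp.com/app/answers/detail/a_id/{id}"

def object_url(template, object_id):
    """Replace the {id} wildcard with the crawled object's id."""
    return template.replace("{id}", str(object_id))

print(object_url(TEMPLATE, 42))
# http://example.custhelp.com/app/answers/detail/a_id/42
```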

ROQL Configuration

To access the data contained in a RightNow CX platform, the RightNow connector issues queries in a language called RightNow Object Query Language (ROQL). For more information about ROQL, see the "ROQL" section of the Oracle RightNow CX Developer Guide. As explained in more detail below, the connector gives you the flexibility to include complete ROQL queries, thus specifying exactly what you want to crawl. There are, however, some important limitations that you should consider:

  1. As stated previously, any data that cannot be retrieved using ROQL queries (except attachments for Incidents and Answers) cannot be crawled by the out-of-the-box RightNow connector.
  2. Any RightNow object that you want to crawl must have an Id field, to uniquely identify the object, and an UpdatedTime field, which the connector uses when running incremental crawls. To check which objects have those fields, refer to the "Connect Web Services" -> "Object Model" -> "Object Model Overview" -> "Primary Objects" section of the Oracle RightNow CX Developer Guide.

The RightNow Connector requires at least two different ROQL queries: a Crawl Query and an Index Query.

Crawl Query

The Crawl Query indicates, as its name suggests, which objects are going to be "crawled". In other words, the Crawl Query tells the connector which objects must be checked in the RightNow instance, so it can determine whether a particular object must be added, deleted, or updated. At this point you are not defining what data is going to be indexed from the RightNow objects, just which objects are going to be retrieved.

When creating your Crawl query, you must comply with a set of requirements:

  1. As mentioned above, you must include the object id and updatedTime as part of the SELECT clause.
  2. You must add an ORDER BY clause, applied to the object id, to ensure that the objects (documents) are always returned in the same order, which is necessary when running incremental crawls and paging (see below).
  3. You must add a LIMIT clause and always set it to 10000. This value is a hard limit implemented by the Connect Web Services API on the maximum amount of records returned per query.
  4. You must add an OFFSET clause and append the %d string to the end of it. The %d is a wildcard that allows the RightNow connector to paginate and therefore work around the 10000-record limit.
  5. Put a semicolon at the end of the query.
  6. The total length of the Crawl query string must not exceed 4000 characters (Connect Web Services API limitation).

The following is an example of a basic, well formatted Crawl Query:

SELECT Answer.id, Answer.updatedTime FROM Answer ORDER BY Answer.Id LIMIT 10000 OFFSET %d;
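
The %d wildcard in the OFFSET clause is what lets the connector page past the 10000-record limit: it substitutes 0, then 10000, then 20000, and so on, until a page comes back with fewer than 10000 rows. A rough sketch of that loop, where fetch_page is a hypothetical stand-in for the actual Connect Web Services call:

```python
# Rough sketch of how a crawl query with "OFFSET %d" can be paginated.
# fetch_page is a hypothetical stand-in for the actual SOAP call.
PAGE_SIZE = 10000  # hard limit enforced by the Connect Web Services API

CRAWL_QUERY = ("SELECT Answer.id, Answer.updatedTime FROM Answer "
               "ORDER BY Answer.Id LIMIT 10000 OFFSET %d;")

def crawl(fetch_page):
    """Yield every object by substituting increasing offsets into %d."""
    offset = 0
    while True:
        rows = fetch_page(CRAWL_QUERY % offset)
        for row in rows:
            yield row
        if len(rows) < PAGE_SIZE:
            break  # last (partial) page reached
        offset += PAGE_SIZE
```

The ORDER BY clause required above is what makes this pagination reliable: without a stable ordering, objects could move between pages while the crawl is running.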
Index Query

The Index Query specifies what data is going to be extracted from each object retrieved by the Crawl Query. It must comply with the following requirements:

  1. You must create aliases for each field in the SELECT clause, using the keyword as. These aliases will be used as field names by Aspire.
  2. You must include the object id and object updatedTime (again!) as part of the SELECT clause. The aliases for these fields must be id and lastmodified, respectively.
  3. Since the Index Query is called for each specific object, you must include a WHERE clause set to the wildcard string, i.e. {id}. This wildcard value will be replaced internally by each RightNow object id.
  4. Do not include any multi-valued fields. See the Index Sub-Queries section for more details.
  5. Put a semicolon at the end of the query.
  6. The total length of the Index query string must not exceed 4000 characters (Connect Web Services API limitation).

The following is an example of a basic, well formatted Index Query:

SELECT Answer.id as id, Answer.updatedTime as lastmodified, Answer.Name as name, Answer.Question as question,
	Answer.Solution as solution, Answer.Summary as summary, Answer.AssignedTo.Account.name as AssignedTo, Answer.StatusWithType.Status.name as type,
	Answer.StatusWithType.StatusType.name as statustype, Answer.URL as url, Answer.Keywords as keywords
	FROM Answer WHERE id={id};
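
Since the aliases in the Index Query become Aspire field names, it can be handy to list them before running a crawl. A small regex-based sketch (not part of Aspire, and only adequate for simple queries like the example above):

```python
import re

# Small sketch: extract the "as <alias>" field names from an Index Query.
# Regex-based, so it only handles simple queries like the example above.
def select_aliases(query):
    """Return the aliases declared with the 'as' keyword, in order."""
    return re.findall(r"\bas\s+(\w+)", query, flags=re.IGNORECASE)

query = ("SELECT Answer.id as id, Answer.updatedTime as lastmodified, "
         "Answer.Name as name FROM Answer WHERE id={id};")
print(select_aliases(query))  # ['id', 'lastmodified', 'name']
```

A quick check like this makes it easy to confirm that the required id and lastmodified aliases are present.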
Index Sub-Queries

ROQL allows you to query for fields that are multi-valued, i.e. lists of values. For example, the field Answer.AccessLevels.namedidlist.name corresponds to a list of the Answer's access level names. If fields of this type are included in the Index Query, data duplication may occur. To avoid this issue, you must create a Sub-Query for each multi-valued field.

Sub-Queries must comply with the following requirements:

  1. Only one field in the SELECT clause per Sub-Query.
  2. You must include a WHERE clause set to the wildcard string, i.e. {id}. This wildcard value will be replaced internally by the RightNow object id.
  3. Put a semicolon at the end of the query.

An example of a well-formatted Sub-Query:

SELECT Answer.AccessLevels.namedidlist.name as accessLevels FROM Answer WHERE id={id};

Step 2d: Advanced Connector Properties

For this tutorial, you can leave every option at its default. There is, however, one important consideration: if you are indexing objects of type Answer or Incident, you must leave the "Disable text extraction" option unchecked, as text extraction is necessary to process the contents of attachments and other data. If you are indexing any other type of object, you must leave the "Disable text extraction" option checked; otherwise, errors may occur.

Step 3: Specify Workflow Information



Workflow Configuration Tab

In the "Workflow" tab, specify the workflow steps for the jobs that come out of the crawl. Drag and drop rules to determine which steps an item should follow after being crawled. These rules can specify where to publish the document or which transformations are needed on the data before sending it to a search engine. See Workflow for more information.

  1. For the purpose of this tutorial, drag and drop the Publish To File rule found under the Publishers tab to the onPublish Workflow tree.
    1. Specify a Name and Description for the Publisher.
    2. Click Add.

After completing these steps, click on the Save button and you'll be sent back to the Home Page.


Step 4: Initiate the Full Crawl

Now that the content source is set up, initiating the crawl is easy.

  1. Click on the crawl type option to set it to "Full" (it is set to "Incremental" by default; the first time it runs, an incremental crawl behaves like a full crawl. After the first crawl, set it to "Incremental" to pick up any changes made in the RightNow instance).
  2. Click on the Start button.

Note that content sources will be automatically initiated by the scheduler based on the schedule you specified for the content source, be it once a day, once a week, every hour, etc. But you can always start a crawl at any time by clicking on the "Full" button.

Be aware that Aspire will never initiate multiple simultaneous crawls on the same content source. Of course, multiple jobs may be crawling different content sources at the same time.

This means that you can click on "Full" or "Update" without worrying that the scheduler might start a crawl on the same content source. The scheduler always checks whether the content source is actively crawling before starting its own crawl of that content source.

During the Crawl

During the crawl, you can do the following:

  • Click on the "Refresh" button on the Content Sources page to view the latest status of the crawl.

    The status will show RUNNING while the crawl is going, and CRAWLED when it is finished.

  • Click on "Complete" to view the number of documents crawled so far, the number of documents submitted, and the number of documents with errors.

If there are errors, you will get a clickable "Error" flag that will take you to a detailed error message page.


Step 5: Initiate an Incremental Crawl

If you only want to process content updates from the RightNow Instance (objects which are added, modified, or removed), then select the "Incremental" option instead of the "Full" option. The RightNow Connector will automatically identify only changes which have occurred since the last crawl.

If this is the first time that the connector has crawled, the action of the "Incremental" option will perform the same action as a "Full" crawl, crawling everything specified by the Crawl query. Thereafter, the "Incremental" option will only look for updates to the objects that were previously crawled.

Scheduled crawls are always "Incremental" crawls. This means that you may need to manually perform a "Full" crawl initially, then use scheduled jobs after that to perform "update" crawls.

Statistics are reset for every crawl.


Step 6 (Optional): Initiate a Test Crawl

If you want to test your configuration, you may run a "Test" crawl. When running a test crawl, you can specify the amount of documents you want to crawl and the amount of documents you want to skip. There are two things to consider if you want to run a test crawl:

  1. If you enabled the "Crawl Attachments" option (i.e., you are crawling Answers or Incidents), every attachment counts as a document by itself.
  2. Always set the number of items to skip to the number you really want, plus one. This is necessary because the connector counts the Crawl Query itself as the root document, and in this case it should be omitted.

For instance, if you want to skip 0 items, you should set the skip value to 1. If you want to skip 10 items, you should set the skip value to 11, and so on.

