Aspire 3.3 (the first Aspire release as part of Accenture!) includes the successful integration of Aspire into the Hadoop ecosystem as a Cloudera parcel with Accenture AIP. However, you can still use Aspire in stand-alone mode or as a parcel inside of Cloudera.

Find more information on the Cloudera parcel configuration at Aspire Parcel and Service for Cloudera.


A logical consequence of the integration into the Hadoop ecosystem is the support for HBase for crawl metadata and statistics (previously only MongoDB was supported). This facilitates the use of Aspire as part of Big Data solutions.

Review a relevant use case here. This extended support required some refactoring of the connector framework and brought several improvements over the previous Aspire version (3.1.1).

All steps needed to configure HBase for crawl metadata can be found at HBase Settings.

Other new features of interest are Licensing and User Roles. User roles improve security control by separating users into "Developers" and "Administrators" with different roles and permissions over the Aspire configuration.

New connectors:

New publishers:

See the Aspire 3.3 Release Notes for more technical information about this release.


Migrating from Aspire 3.x

When importing a content source from 3.x into 3.3, the following error may occur, and the content source may show up with a red "Failed" status:

Error message: Unable to start appBundle: com.searchtechnologies.aspire:app-rap-connector
Caused by: com.searchtechnologies.aspire.services.AspireException: Failed to register components from appBundle: CONTENT_SOURCE_NAME (Parent: <null>)
	at com.searchtechnologies.aspire.application.AspireApplicationImpl.registerAppBundleComponents(AspireApplicationImpl.java:945)
	at com.searchtechnologies.aspire.application.AspireApplicationImpl.registerAppBundle(AspireApplicationImpl.java:980)
	at com.searchtechnologies.aspire.application.AspireApplicationComponent.loadApplication(AspireApplicationComponent.java:696)
	at com.searchtechnologies.aspire.application.AspireApplicationComponent.loadApplication(AspireApplicationComponent.java:692)
	at com.searchtechnologies.aspire.configuration.ConfigurationManager.reloadApplication(ConfigurationManager.java:697)
	at com.searchtechnologies.aspire.configuration.ContentSourcesModule.processSyncUnitUpdate(ContentSourcesModule.java:309)
	at com.searchtechnologies.aspire.configuration.SynchronizedModule.run(SynchronizedModule.java:289)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.searchtechnologies.aspire.services.AspireException: The value ("${waitForWfApps}") of element <waitForWfApps> is improperly formatted for a boolean - must be either "true" or "false"
	at com.searchtechnologies.aspire.framework.ComponentImpl.getBooleanFromConfig(ComponentImpl.java:2634)
	at com.searchtechnologies.aspire.connector.framework.controller.CrawlControllerImpl.initialize(CrawlControllerImpl.java:260)
	at com.searchtechnologies.aspire.framework.ComponentFactoryImpl.registerComponent(ComponentFactoryImpl.java:446)
	at com.searchtechnologies.aspire.application.ComponentManagerImpl.registerComponents(ComponentManagerImpl.java:328)
	at com.searchtechnologies.aspire.application.ComponentManagerImpl.initialize(ComponentManagerImpl.java:93)
	at com.searchtechnologies.aspire.application.PipelineManagerImpl.initialize(PipelineManagerImpl.java:75)
	at com.searchtechnologies.aspire.framework.ComponentFactoryImpl.registerComponent(ComponentFactoryImpl.java:446)
	at com.searchtechnologies.aspire.application.ComponentManagerImpl.registerComponents(ComponentManagerImpl.java:328)
	at com.searchtechnologies.aspire.application.ComponentManagerImpl.initialize(ComponentManagerImpl.java:93)
	at com.searchtechnologies.aspire.framework.ComponentFactoryImpl.registerComponent(ComponentFactoryImpl.java:446)
	at com.searchtechnologies.aspire.application.AspireApplicationImpl.registerAppBundleComponents(AspireApplicationImpl.java:941)


This can happen because the Aspire 3.3 connectors contain several new configuration options that the content source to import lacks. To fix this error:

  1. Click on the content source to access the Configuration page.
  2. Click Save and Done.

Aspire generates the new options and saves them into the configuration files.
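
The stack trace above points at an element whose value is still a literal ${...} placeholder. As a rough aid for spotting such values before importing, the following minimal sketch scans an XML fragment for elements whose whole text is an unresolved placeholder. The find_unresolved helper and the sample fragment are hypothetical; only the <waitForWfApps> element name is taken from the error message, and the actual layout of an exported content source may differ.

```python
import re
import xml.etree.ElementTree as ET

# Matches a value that consists solely of an unresolved ${name} placeholder.
PLACEHOLDER = re.compile(r"^\$\{([^}]+)\}$")


def find_unresolved(xml_text):
    """Return (element tag, placeholder name) pairs for unresolved values."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        match = PLACEHOLDER.match(text)
        if match:
            hits.append((elem.tag, match.group(1)))
    return hits


# Hypothetical fragment (assumption): only <waitForWfApps> appears in the
# error message above; the other names are made up for illustration.
SAMPLE = """
<component name="Main">
  <waitForWfApps>${waitForWfApps}</waitForWfApps>
  <exampleOption>true</exampleOption>
</component>
"""

unresolved = find_unresolved(SAMPLE)
for tag, name in unresolved:
    print(f"<{tag}> still holds the placeholder ${{{name}}}")
```

Any element reported this way is a candidate for the fix described above: opening the content source and saving it lets Aspire write the missing options.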

MongoDB Changes

Any migration from Aspire 3.x requires a Full Crawl of all content sources, since the MongoDB provider component underwent a major refactor. Specifically, the following collections changed. Each entry lists the collection name, its fields in 3.x, its fields in 3.3, and whether the two are compatible:

audit
Fields in 3.x:
  • _id (ObjectId)
  • id (String)
  • crawlStart (Int64)
  • url (String)
  • type (String)
  • action (String)
  • batch (String or Null)
  • ts (Int64)
Fields in 3.3:
  • _id (ObjectId)
  • id (String)
  • ts (Int64)
  • action (String)
Compatible: Yes
errors
Fields in 3.x:
  • _id (ObjectId)
  • error (Object)
    • .@time (Int64)
    • .@crawlTime (Int64)
    • .@cs (String)
    • .@processor (String)
    • .@type (String)
    • ._$ (String)
Fields in 3.3:
  • _id (ObjectId)
  • message (String)
  • type (String)
  • crawlId (String)
  • time (Int64)
Compatible: No
hierarchy
Fields in 3.x:
  • _id (String)
  • itemType (String)
  • name (String)
  • ancestors (Object or Null)
    • ._id (String)
    • .name (String)
    • .ancestors (Object or Null)
Fields in 3.3:
  • _id (String)
  • itemType (String)
  • name (String)
  • ancestors (Object or Null)
    • ._id (String)
    • .itemType (String)
    • .name (String)
    • .ancestors (Object or Null)
Compatible: No

processQueue and scanQueue
Fields in 3.x:
  • _id (String)
  • metadata (Object)
  • type (String)
  • status (String)
  • action (String)
  • timestamp (Int64)
  • signature (String)
  • processor (String)
  • shouldScan (Boolean)
  • shouldProcess (Boolean)
  • crawlRetries (Int32) *
  • name (String)
  • isCrawlRootItem (Boolean)
  • hierarchyId (String)
  • inCrawlRetries (Int32) *
Fields in 3.3:
  • _id (String)
  • metadata (Object)
  • url (String)
  • type (String)
  • status (String)
  • action (String)
  • timestamp (Int64)
  • signature (String)
  • processor (String)
  • shouldScan (Boolean)
  • shouldProcess (Boolean)
  • crawlRetries (Int32)
  • name (String)
  • isCrawlRootItem (Boolean)
  • hierarchyId (String)
  • inCrawlRetries (Int32)
Compatible: Yes
snapshot
Fields in 3.x:
  • _id (String)
  • container (Boolean)
  • crawlId (Int64)
  • signature (String)
  • timestamp (Int64)
  • error (Boolean)
  • notFoundCount (Int32) *
Fields in 3.3:
  • _id (String)
  • id (String)
  • url (String)
  • fetchUrl (String)
  • itemType (String)
  • displayUrl (String)
  • container (Boolean)
  • crawlId (Int64)
  • signature (String)
  • timestamp (String)
  • error (Boolean)
  • notFoundCount (Int32)
Compatible: No
statistics
Fields in 3.x:
  • _id (String)
  • statistics (Object)
    • .@processor (String)
    • .@server (String)
    • .@status (String)
    • .@mode (String)
    • .@startTime (Int64)
    • .@endTime (Int64)
    • .@currentTime (Int64)
    • .@cs (String)
    • .queue (Object)
      • .scan (Object)
        • .@toScan (Int32)
        • .@scanning (Int32)
        • .@scanned (Int32)
        • .@total (Int32)
      • .process (Object)
        • .@toProcess (Int32)
        • .@processing (Int32)
        • .@processed (Int32)
        • .@total (Int32)
    • .inProgress (Object)
      • .@adding (Int32)
      • .@updating (Int32)
      • .@deleting (Int32)
      • .@total (Int32)
    • .processed (Object)
      • .@added (Int32)
      • .@updated (Int32)
      • .@deleting (Int32)
      • .@unchanged (Int32)
      • .@excluded (Int32)
      • .@terminated (Int32)
      • .@errored (Int32)
      • .@total (Int32)
    • .errors (Object)
      • .@batch (Int32)
      • .@scan (Int32)
      • .@document (Int32)
      • .@total (Int32)
Fields in 3.3:
  • _id (String)
  • @processor (String)
  • @server (String)
  • @status (String)
  • @mode (String)
  • @startTime (Int64)
  • @endTime (Int64)
  • @currentTime (Int64)
  • @cs (String)
  • queue (Object)
    • .scan (Object)
      • .@toScan (Int32)
      • .@scanning (Int32)
      • .@scanned (Int32)
      • .@total (Int32)
    • .process (Object)
      • .@toProcess (Int32)
      • .@processing (Int32)
      • .@processed (Int32)
      • .@total (Int32)
  • inProgress (Object)
    • .@adding (Int32)
    • .@updating (Int32)
    • .@deleting (Int32)
    • .@total (Int32)
  • processed (Object)
    • .@added (Int32)
    • .@updated (Int32)
    • .@deleting (Int32)
    • .@unchanged (Int32)
    • .@excluded (Int32)
    • .@terminated (Int32)
    • .@errored (Int32)
    • .@total (Int32)
  • errors (Object)
    • .@batch (Int32)
    • .@scan (Int32)
    • .@document (Int32)
    • .@total (Int32)
Compatible: No
status
Fields in 3.x:
  • _id (ObjectId)
  • connectorSource (Object)
  • @action (String)
  • @actionProperties (String)
  • @crawlId (String)
  • @normalizedCSName (String)
  • displayName (String)
  • @scheduler (String)
  • @scheduleId (String)
  • @jobNumber (String)
  • @sourceId (String)
  • @actionType (String)
  • @dbId (String)
  • crawlStart (Int64)
  • crawlStatus (String)
  • processDeletes (String)
  • processingDeletesStatus (String)
  • crawlEnd (Int64)
Fields in 3.3:
  • _id (String)
  • connectorSource (Object)
  • @action (String)
  • @actionProperties (String)
  • @crawlId (String)
  • @normalizedCSName (String)
  • displayName (String)
  • @scheduler (String)
  • @scheduleId (String)
  • @jobNumber (String)
  • @sourceId (String)
  • @actionType (String)
  • @dbId (String)
  • crawlStart (Int64)
  • crawlStatus (String)
  • processDeletes (String)
  • processingDeletesStatus (String)
  • crawlEnd (Int64)
Compatible: No

* These fields were available in Aspire 3.1.
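
To make the shape change in the statistics collection concrete, the sketch below lifts the fields nested under statistics in a 3.x-style document to the top level, which is how they are listed for 3.3. It is an illustration only, under stated assumptions: the flatten_statistics helper and all sample values are hypothetical, and it is not a migration procedure, since a Full Crawl is still required.

```python
import copy


def flatten_statistics(doc_3x):
    """Return a 3.3-shaped copy: fields under 'statistics' move to the top level."""
    flat = {"_id": doc_3x["_id"]}
    # Deep copy so the nested objects of the original document are not shared.
    flat.update(copy.deepcopy(doc_3x.get("statistics", {})))
    return flat


# Hypothetical 3.x-shaped document (assumption): field names follow the
# listing above, but every value here is made up for illustration.
sample_3x = {
    "_id": "example-crawl",
    "statistics": {
        "@status": "example-status",
        "@mode": "example-mode",
        "queue": {"scan": {"@toScan": 0, "@scanned": 3, "@total": 3}},
        "errors": {"@batch": 0, "@scan": 0, "@document": 0, "@total": 0},
    },
}

sample_33 = flatten_statistics(sample_3x)
print(sorted(sample_33))
```

In the flattened result the statistics wrapper is gone, while nested objects such as queue keep their internal structure, matching the two field listings for this collection.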