
Info

For version 3.3, Aspire requires a license file to run.

See Aspire Licensing for information on obtaining a license.


The following are the NoSQL DB providers supported by the Aspire 3.3 release:

  • MongoDB version 3.6
  • HBase version 1.2.4

The supported version of Elasticsearch is 6.3.0.

The supported version of StageR is 1.2. Note: StageR 1.2 is the latest version, and it supports MongoDB 3.4.10.




This page lists all of the updates for version 3.3 of Aspire.

On this page:

  • New Features
  • Bug Fixes
  • Aspire Core
  • Applications
  • Services
  • Known Issues
  • Core
  • Applications
  • UI
  • External Technical Limitations  
  • To Be Released
  • Items to Deprecate on Aspire 3.3

    New Features

    • Web Crawler named Aspider Connector replaces the Legacy Heritrix Connector.
    • Salesforce Connector has been refactored to include the following features:
      • Runs in the new connector framework.
      • Supports execution in a distributed environment.
      • Allows concurrent crawling of multiple endpoints.
      • Provides faster incremental crawls.
      • Uses snapshots.
    • New way to manage Failed Documents for all of the Source Connectors:
      • Allows reprocessing of documents that previously failed in the processing or publishing stages.
    • Avro Reader Extractor Application and Avro Publisher (Alpha version).
    • Parquet Extractor Application (Alpha version).
    • SMTP Connector (Alpha version).
    • HDFS Connector and Web HDFS Publisher (Alpha version).
    • Documentum DQL
      • Error-tolerant option to index metadata when fetching the document fails.
      • New RenditionType option for indexing.
    • Support of Azure Authentication on the SharePoint Online Connector.
    • New features for the SharePoint Connector (2007/2010):
      • Supports default snapshots on incremental crawls.
      • Supports crawling specific views on lists.
    • Implemented a single security key-store throughout all of Aspire.
    • Updated SharePoint 2007/2010 Web Service Extensions.

    Bug Fixes

    Aspire Core

    • Incorrect Historical Statistics for a new connector after another was executed.
    • Negative DPS appeared on the Statistics.
    • Link related to Aspire Authentication was updated in the config/settings.xml file.
    • Crawl Begin/End Auditing actions displayed different elements on Auditing.
    • NPE when a crawl stopped without pausing.
    • NPE when a crawl stopped in ReleaseController after pausing.
    • UI not correctly showing the error for an invalid Groovy script.
    • Content source name could be updated with blank spaces.
    • Historical Crawl Statistics for one connector appearing in another one.
    • Incorrect logs shown in the QueueLoader component.
    • Exception loading workflow.xml file when an empty custom groovy script was added.
    • Advanced scheduler option was not working.
    • Server error was encountered when displaying Audit Log.
    • Group Expansion and Advance properties check boxes were misplaced.
    • Page was stuck when adding a Custom Publisher.
    • Service start and stop buttons displayed wrong tooltips.
    • Crawl stalling in Linux after publishing end job.
    • Couldn't save content source when using multiple check box selectors with one option checked.
    • OpenDXF was not escaping characters in JSON inputs.
    • Any exception thrown produced an NPE in the app-rap-connector.
    • Failover: Dual instance full test (interrupted) - Recovery Option Full - Not all items Crawled
    • Failover: Dual instance full test (interrupted) - Recovery Option Incremental - Never ended crawling
    • Allowed relaxing of the deletes policy in the connector framework.
    • Aspire not detecting changes in the connector settings and not asking to save them.
    • Stop crawl option not working correctly.
    • Crawl showing wrong time when crawling in Linux.
    • Statistics showing items In Progress when paused.
    • Docs not crawled were reported as Adds on Auditing.
    • Aspire.sh -create_master option not working properly on Linux.
    • Hierarchy extractor needed to have default values selected on Workflow jobs to work properly.
    • Minor improvements and fixes in Aspire UI.
    • Validations improvements and fixes for several components.

    Applications

    • Archive Extractor
      • Incremental on archive files was not working using the Lotus connector.
      • Nested archive threw an "Archive not recognized" error.

    Connectors

    • CIFS
      • Malformed URL was not being validated.
      • Removed slash character at the end of name attribute on Hierarchy.
    • Confluence
      • Name attribute for the level 1 hierarchy showed the name of the content source.
    • Documentum DQL
      • Fixed NPE when running an incremental crawl.
      • DisplayUrl field was not separating webtop from document id.
      • Include/Exclude fields appeared as part of the configurations.
    • eRoom
      • Updates and Deletes were not picked up by Incremental crawl over certain items (Comments and Votes for polls).
      • UI validation when using wrong URL.
      • No error was reported when setting a wrong username/password.
    • FileSystem
      • Improved the wording in some tooltips.
    • Heritrix
      • Implemented deletes handling feature from Heritrix in the connector framework.
    • Jive
      • Changes on Document ACLs were reflected incorrectly for both Activity Incremental and Normal Incremental.
      • Non-text Document filtering reported Add instead of Update for the documents filtered.
      • Page Size value was not using the UI parameter.
    • Lotus
      • Exclude pattern was not working as expected for items that were not attachments.
      • Incremental on archive files was not working.
      • Incremental crawl with index containers was not working.
      • No error showed if the database and view were the same.
    • RDB Snapshot
      • Crawl was not finishing when the Use Slices option was set together with a bad Extract SQL.
      • No error was reported when setting a wrong ACL SQL.
      • Wrong SQL statement in Full crawl was not showing errors.
    • RDB Tables
      • Action column was ignored for the incremental crawl.
    • Service Now
      • Displayed incorrect URL field in Knowledge Articles (XML representation).
      • Inclusion/Exclusion pattern was not working for attachments.
      • Aspire error when two image files were attached and a full crawl was run.
    • Social Cast
      • Tag nonTextDocument was missing in the Aspire Object.
    • SharePoint 2007
      • Error on console and UI while crawling an item updated on root using both Index Containers and Scan Recursively disabled.
      • NPE when processing a container after changing the ACL on an Incremental crawl.
      • ACLs showed the same item as group and user.
    • SharePoint 2010
      • Minor fixes to the tooltips.
    • SharePoint 2013
      • Incremental reported duplicate jobs when adding a subsite.
      • Delete job had the incorrect displayUrl and fetchUrl after renaming a file.
    • SharePoint Online
      • Adding specific site collections made the incremental crawl process everything.
      • Renaming an item returned an add, update and delete on the same crawl.
      • Error when crawling site URL with encoded blank spaces.

    Publishers

    • Publish to Solr
      • Deletes were not working correctly.

    Services

    • Add Service button was not working.
    • Azure Group Expander
      • Azure GE and SharePoint Online GE were not deleting users.
    • CEWS Listener
      • PropertyOflong and PropertyOfArrayOflong were not working.
    • Fast Content API
      • Missing validations.
    • Group Expansion Manager
      • Fixed 'Missing version number' error when service was loaded.
      • Some validations were missed.
    • LDAP Cache
      • Some validations were missed.
      • Problems with tooltips for LDAP Attribute in Cache user and Cache group options.

    External Technical Limitations

    • Zip files are not crawled with the Activity Incrementals when they are created inside Jive Documents.

    To Be Released

    • Amazon S3
    • Box
    • CEWS Listener
    • FTP
    • GSA Publisher
    • IBM Connections
    • PST Extractor
    • Publish to HDFS
    • Publish to SharePoint 2013
    • Publish to SharePoint 2013 (Install & Setup)
    • Salesforce
    • Subversion
    • Teamforge

    Items to Deprecate on Aspire 3.3

    The following items are marked for deprecation in the next Aspire version:

    • Elasticsearch bootloader
      • aspire-elastic-bootloader
    • DCM
      • aspire-dcm-enterprise
      • aspire-amazonec2-dm
      • aspire-zk-dm
    • The old Admin UI(s)
      • Parts of aspire-application
    • Big Data
      • app-semantic-co-occurrence-hadoop
      • app-semantic-co-occurrence-hadoop-soln
      • aspire-hadoop-job-launcher
      • aspire-hadoop-hdfs
      • aspire-hadoop-wiki-dict-generator
      • aspire-load-hdfs
    • Connectors
      • Staging Repo Connector
    • Solutions
      • OCR
      • Semantic Co-occurrence
    • Publishers
      • Cloudsearch
      • Staging Repo Publisher

    Known Issues

    Aspire Core

    • Importing connector with special characters in the path fields not loading correctly.
    • Auditing
      • Dump option not working with Solr 6.2.0 and 6.3.0.
      • Dump option not working with Elasticsearch 5.0.2.
      • Incremental Crawl - Unchanged documents not displayed in Audit Log.
    • Aspire Shell
      • The option load-content-sources not working.
      • Relative paths were not working for the commands that create jobs.
      • Sometimes it was possible to delete the Aspire Shell prompt.
    • Failed Documents
      • Connector getting stuck when stopping the crawl.
    • Failover
      • Single instance, full test, interrupted, incremental recovery: Error Processing some files after resuming crawl.
      • During full crawls, some documents were left out if an instance was killed.
      • File System connector resumed crawling after restart.

    Applications

    • Archive Extractor
      • Routing options not working with OnError.
      • Delete by Query not working as expected using Elasticsearch 5.0.1.

    Connectors

    • Include and Exclude pattern trimming empty spaces.
    • Aspider
      • Crawl statistics not displayed on the UI when version was used.
    • Documentum
      • GroupExpansion marking groups as users.
      • Error on console was displayed while crawling a folder: "Stream handler unavailable due to: null"
    • Heritrix
      • Reject Images/Videos/Javascript/CSS not working for external site out of domain.
    • Jira Issues
      • Multiple connection timeouts occurring.
    • Jive
      • Crawl generated unnecessary deletes on Normal Incremental with the Use Progressive Retries option.
    • RDB Snapshot
      • bigINT SQL Server database type not supported for SQL Slices.
      • Inconsistency when crawling ACL information using ACL fetching options.
      • Crawl not working with a specific column when the Use column from Extraction SQL option was specified.
    • RDB Tables
      • Wrong value in "sequence column" parameter not showing UI error.
      • Inconsistent crawling ACL information when using ACL fetching options.
    • SharePoint Connectors
      • Renaming one document included by a pattern not generating a deletion.
      • No descriptive error with non-existent URL. 
    • SharePoint 2007
      • Connector generating wrong hierarchy on updates.
      • Custom headers not being added to the request.
    • SharePoint 2010
      • Documents inside a folder not being picked up by incrementals if the parent ACL changed.

    Services

    • Import Service option not working when the service under the same name already existed.
    • Group Expansion Service
      • Removing GE collection from mongo causing unusable GE for connectors.
    • Group Expansion Manager
      • More than one GE Service using the same servlet name generating an error.

    UI

    • Add Source List not showing until clicking refresh sources.

    New and Enhanced Features


    Aspire Core and Framework Components

    • Salted Challenge Response Authentication Mechanism (SCRAM) support has been added to the MongoDB used with Aspire.
    • The ability to dynamically load jar files has been added to Aspire with Java 9.
    • When starting Aspire either normally or in debug mode, the debug line in the settings.xml file is handled appropriately. 
    • A section has been added to the settings.xml file for HBase information.

    • Remote IP addresses are now logged for successful and failed logins.

    • The Mongo provider now encrypts/hashes IDs.
    • Record fields have been improved.
    • Entitlements checking no longer checks missing components at every restart.

    • Time zones have been normalized for Aspire, including logs and statistics.

    • The documentation has been updated for Keytab/Kerberos.
    • Improvements have been added to Job usage.
    • Updates have been made to the ExtractText default configuration limit for text extracted from a stream.
    • List page retrieval and metadata extraction have been improved in SharePoint Commons.
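
The SCRAM mechanism named in the list above is defined in RFC 5802. As a rough illustration of what it computes, the following Python sketch derives the stored keys and a client proof using only the standard library. The function names and sample values are illustrative assumptions. This is not Aspire or MongoDB driver code, and any driver-specific password preparation is omitted.

```python
import hashlib
import hmac


def scram_keys(password: bytes, salt: bytes, iterations: int):
    """Derive (client_key, stored_key, server_key) as described in RFC 5802."""
    # SaltedPassword := Hi(password, salt, i), which is PBKDF2 with HMAC-SHA-1.
    salted = hashlib.pbkdf2_hmac("sha1", password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha1).digest()
    stored_key = hashlib.sha1(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha1).digest()
    return client_key, stored_key, server_key


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def client_proof(client_key: bytes, stored_key: bytes, auth_message: bytes) -> bytes:
    # ClientProof := ClientKey XOR HMAC(StoredKey, AuthMessage)
    signature = hmac.new(stored_key, auth_message, hashlib.sha1).digest()
    return xor_bytes(client_key, signature)


def server_verifies(stored_key: bytes, auth_message: bytes, proof: bytes) -> bool:
    # The server recovers ClientKey from the proof and compares its hash
    # with the StoredKey it kept; it never needs the password itself.
    signature = hmac.new(stored_key, auth_message, hashlib.sha1).digest()
    recovered = xor_bytes(proof, signature)
    return hmac.compare_digest(hashlib.sha1(recovered).digest(), stored_key)
```

The point of the design is that the server stores only StoredKey and ServerKey, so a leaked credential record does not directly reveal the password.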

    Aspire UI

    • To re-fetch entitled components (after deleting the Resources folder), an "Allow Refresh" button has been added.
    • The ability to show Provider Information has been added.

    Connectors

    • Aspider
      • A headless browser has been added for rendering dynamically generated pages (client-side JavaScript pages).
    • IBM Connections
    • Elastic
    • SharePoint 2010

      • On the Multiple URLs drop-down, when the 'Site Discovery' option is set, the 'Set List View' option is removed.

    • SharePoint Online
      • An NPE at crawl end error could occur if bad credentials were used.
      • Incremental crawls no longer detect containers as updated items.
      • Scan recursively was not working as expected.

    • SMB

      • Added DFS support and an override for the last access date of documents.
    • Twitter

    Publishers

    • Elasticsearch
      • Case-sensitive index names are now handled properly.
    • Google Cloud Search
      • A new Google Cloud Search (GCS) publisher receives content from Aspire connectors and uses the Java Client library to index the content into Cloud Search.
    • HBase
      • Content can now be deleted.
      • During a full crawl, the publisher now defaults to clean.
      • When not in file configuration mode, the publisher can now be used without security. 

    • Publish to StageR
      • Field level help has been added for the special scope $record.

    Applications

    • The Entitlements Admin application has been updated.


    Bug Fixes


    Aspire Core and Framework Components

    • Admin UI
      • The ability to configure a weekly schedule could cause an error when saving
    • Aspire Application
      • ConfigManager could log a debug message into {aspire.home}/logs/configmanager.log

      • A problem could occur when editing a custom application in the Admin UI

      • Startup problems could occur using the Staging Publisher
    • Connector Framework
      • When stopping and restarting Aspire while the GroupDownload process was running, the group download did not start again

    • MongoDB Provider
      • The LDAP Cache could report a MongoDB Duplicate key error

      • Aspider could stop with a MongoDB Duplicate key error

    • SharePoint Commons

      • An out of memory (OOM) exception could occur during large crawls

      • Added support for incrementals using Aspire Snapshots on SP
    • The Aspire Archetype had "http" rather than "https" repository and entitlement URLs
    • Failed to connect to Artifactory with custom keystore. Artifactory certificates were added to the distribution. See: https://contentanalytics.digital.accenture.com/pages/viewpage.action?spaceKey=aspire33&title=Crawling+via+HTTPs
    • AspireObject was casting an incorrect numeric type when created from JSON
    • The AspireObject isEmpty method returned true even if the object had children

    • The processDeletes (String) was missing a Status page
    • The Aspire Connector Framework was not using shouldScan during incremental crawls
    • When running a full crawl, a "Provider 'encrypted' not installed" message could occur
    • The Mongo provider generated an invalid JSON object during document conversion

    • Audit logs were incomplete
    • For AIP integration, the logout action was not being logged

    • Publisher framework retryDelay, retryDelayMultiplier and maxRetryDelay properties were not supported by Dynamic XML Forms (DXF)

    • The Aspire-Services jar file was missing a noSQL package

    • The "Loading Application" message could display whether a connector was loading or not

    • Extract Text
      • Use the Apache Tika SAX Parser for Microsoft documents
    • Scheduler
      • The option to create a Cache Groups scheduler was not being displayed
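
The retryDelay, retryDelayMultiplier and maxRetryDelay properties named in the list above suggest a capped, multiplicative backoff. The sketch below shows one common way such settings interact. The exact semantics in the Aspire publisher framework are an assumption here, as are the function name and the sample values.

```python
def retry_delays(retry_delay, retry_delay_multiplier, max_retry_delay, attempts):
    """Return the wait before each retry, growing geometrically up to a cap."""
    delays = []
    delay = retry_delay
    for _ in range(attempts):
        # Never wait longer than the configured maximum.
        delays.append(min(delay, max_retry_delay))
        delay *= retry_delay_multiplier
    return delays


# Hypothetical values in milliseconds: start at 1 s, double each time, cap at 10 s.
print(retry_delays(1000, 2, 10000, 6))  # [1000, 2000, 4000, 8000, 10000, 10000]
```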

    Aspire UI

    • A Connector component might not show the actual state of a crawl
    • The link that points to the Confluence wiki has been updated

    Connectors

    • Aspider
      • An authentication form error could occur indicating "Target host is not specified while crawling"
      • Neither NTLM nor ADFS authentication was occurring when a host was specified in the Credentials
      • On any port, the Port field was not working correctly with any value except "-1"

      • A crawl could cause a warning about duplicate IDs in MongoDB

      • To indicate that the Gateway was not working, the exception message in ADFS needed updating 

    • Confluence

      • ACLs info appeared inside the hierarchy section

      • A batch error could display while publishing to Elasticsearch 6.3.0
    • Documentum
      • Exception was being thrown during Group Expansion
    • File System
      • Starting Directories in the File option was not working as expected
    • IBM Connections

      • The connector needed to use the Aspire GroupExpansion instead of SharePoint Integrated security with an optimized IBM Connections Group Downloader
      • Memory leaks could occur
      • During an incremental crawl, the deletes of Blogs, Wikis and Files were not working
      • Content crawled from IBM Connections did not contain a last-modified date because of a problem with the date format
    • Kafka
      • A "NO-NAME" field could occur

    • SharePoint 2010

      • A problem could occur when identifying the site-collections for a WEB-Application
      • When adding a link on a site collection to crawl, [NO-NAME] should not be part of the name attribute in the hierarchy section
      • No error should occur during the incremental crawl for the Blog site collection
      • No errors should occur when crawling a specified list (views included)
    • SharePoint 2013

      • When crawling incrementals for an External list, the connector was not picking up the changes
      • When crawling SP2013, errors such as "HTTP Error 400. The size of the request headers is too Long" might occur
      • KeyNotFoundException while trying to check attachments for list with lookup references deleted

      • When crawling a list, the name in the hierarchy of the documents was displayed as NO-NAME even though the items had a title field
      • The placeholder needed to be changed for the 'Seeds file' field

      • The connector was unable to crawl large lists

    • SharePoint 2016

      • An error could occur while crawling a site

    • SharePoint Online

      • A random NPE could occur in the item complete callback when crawling in distributed mode

      • String index out of range while getting a List display url

      • Error while crawling after a crawl was stopped: Item parent wasn't assigned during crawl

    • Standalone Mode

      • When a user added a custom connector, feedback needed to be provided by the Aspire UI
    • Staging Repository
      • A global variable was not working when configuring the server in the Staging Repository connector

      • When crawling over multiple documents and publishing at two different scopes, the items published could be duplicated

      • The Stager connection could be broken when running a full crawl

    Publishers

    • Stager BDC Plugin could randomly fail during the crawls after setup

    • Elasticsearch
      • DeleteByQuery was not being used with Elasticsearch 6.1.1
    • GCS Publisher

      • A resource config/application.xml was missing from the jar file

      • A relative path was not working in the Credentials Key File field
      • An error could occur when crawling and publishing to GCS

    • Kafka
      • An error could be masked when running a non-batched job

    • Publish to Avro
      • Validation needed to be added to the Time Rollover Threshold field

    • TLS 1.2 support was needed for the SharePoint Security Pre Trimmer

    Services

    • Azure Group Expander could refuse to start.

    • Group Expansion failed if user data exceeded the Mongo Max Document Limit (16MB)

    • For Aspire Distributed mode, Services in the master node were not starting automatically after saving changes
    • Errors to reflect failed Services were not being generated
    • Services that were set up in an Aspire cluster were not synced up correctly

    • Azure Active Directory Group Expander
      • Users were not being removed

    • Group Expansion Service
      • The userGroupCache map was accessed when the Group Expansion Service was running
    • LDAP Cache Service

      • The controls did not display and the Schedule was set to Advanced even if Minutes or Hourly were set

      • Problems with the LDAP-Cache component could include: reporting a duplicate key error twice, stopping with a duplicate error, taking too long, and the refresh refusing to start. The connector could not look up ACL information in the LDAP-Cache component

    • You are now able to check the user’s cache for the Azure Group Expander via the Debug console
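
The 16MB limit mentioned in the list above is MongoDB's maximum BSON document size. One common way to stay under such a limit is to split a large membership list across several smaller records. The following sketch is an illustration only: it uses JSON length as a stand-in for BSON size, and the record layout, field names, and function name are hypothetical rather than how Aspire stores group expansion data.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's documented BSON document size limit


def split_groups(user, groups, max_bytes=MAX_DOC_BYTES):
    """Split a user's group list into records whose encoded size fits max_bytes.

    A single group entry larger than max_bytes still gets its own record;
    handling that case is left out of this sketch.
    """
    # Size of a record with an empty group list.
    base = len(json.dumps({"user": user, "groups": []}).encode("utf-8"))
    records, current, size = [], [], base
    for group in groups:
        # Encoded entry plus the ", " separator (slightly conservative).
        cost = len(json.dumps(group).encode("utf-8")) + 2
        if current and size + cost > max_bytes:
            records.append({"user": user, "groups": current})
            current, size = [], base
        current.append(group)
        size += cost
    if current:
        records.append({"user": user, "groups": current})
    return records
```

Reading the data back then means fetching every record for the user and concatenating the group lists in order.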

    Applications

    • Archive Extractor
      • The "Send delete by query first" option could throw an exception

      • Deleting files inside of an archive file was not handled properly for incremental crawls

    • AVRO Extractor

      • During an incremental crawl, a "duplicate key error" message could display


    Updating a service name to a string of spaces was not always validated.

    Known Issues


    Connectors 

    • FTP 

      • The FTP connector works only on Unix systems, not on Windows
    • Twitter
      • Full/Incremental crawls for retweets are not working

    Publishers

    • Google Cloud Search
      • Bundle location error loading the publisher for the first time
      • NullPointerException publishing with Batch and Content Type Raw options
      • ItemUploadRequest exception
      • Pending required field validation for the 'Indexer Type' field