This page maintains a list of all of the updates for version 3.3 of Aspire.
New Features
- New Web Crawler named Aspider Connector that replaces the Legacy Heritrix Connector.
- New way to manage Failed Documents for all the Source Connectors.
- It allows reprocessing of documents that previously failed in the processing or publishing stages.
- New Avro Reader Extractor Application and Avro Publisher.
- New Parquet Extractor Application.
- New SMTP Extractor.
- New support for Azure Authentication in the SharePoint Online Connector.
- New features for the SharePoint Connector (2007/2010):
- Support for default snapshots on incremental crawls.
- Support crawling specific views on lists.
- New StageR Repository Lister.
- Implemented a single security key-store throughout Aspire.
New and Enhanced Features
Aspire Core and Framework Components
- Salted Challenge Response Authentication Mechanism (SCRAM) support has been added to the MongoDB used with Aspire.
- The ability to dynamically load jar files has been added to Aspire with Java 9.
- When starting Aspire either normally or in debug mode, the debug line in the settings.xml file is handled appropriately.
- A section has been added to the settings.xml file for HBase information.
- Remote IP addresses are now logged for successful and failed logins.
- The Mongo provider now encrypts/hashes IDs.
- Record fields have been improved.
- Entitlements checking no longer checks missing components at every restart.
- Time zones have been normalized for Aspire, including logs and statistics.
- The documentation has been updated for Keytab/Kerberos.
- Improvements have been added to Job usage.
- Updates have been made to the ExtractText default configuration limit for text extracted from a stream.
- List page retrieval and metadata extraction have been improved in SharePoint Commons.
Aspire UI
Connectors
- Aspider
- A headless browser has been added for rendering dynamically generated pages (client-side JavaScript pages).
- IBM Connections
- Elastic
- SharePoint 2010
- SharePoint Online
- SMB
- Added DFS support and overriding of the last access date of documents.
- Twitter
Publishers
- Elasticsearch
- Case-sensitive index names are now handled properly.
- Google Cloud Search
- A new Google Cloud Search (GCS) publisher receives content from Aspire connectors and uses the Java Client library to index the content into Cloud Search.
- HBase
- Publish to StageR
Applications
- The Entitlements Admin application has been updated.
Bug Fixes
Aspire Core and Framework Components
- Admin UI
- The ability to configure a weekly schedule could cause an error when saving
- Aspire Application
- Connector Framework
- MongoDB Provider
- SharePoint Commons
- The Aspire Archetype had "http" rather than "https" repository and entitlement URLs
- Failed to connect to Artifactory with custom keystore. Artifactory certificates were added to the distribution. See: https://contentanalytics.digital.accenture.com/pages/viewpage.action?spaceKey=aspire33&title=Crawling+via+HTTPs
- AspireObject was casting an incorrect numeric type when created from JSON
- The AspireObject isEmpty method returned true even if the object had children
- The processDeletes (String) was missing a Status page
- The Aspire Connector Framework was not using shouldScan during incremental crawls
- When running a full crawl, a "Provider 'encrypted' not installed" message could occur
- The Mongo provider generated an invalid JSON object during document conversion
- Audit logs were incomplete
- For AIP integration, the logout action was not being logged
- The Publisher framework retryDelay, retryDelayMultiplier and maxRetryDelay properties were not supported by Dynamic XML Forms (DXF)
- The Aspire-Services jar file was missing a noSQL package
- The "Loading Application" message could be displayed regardless of whether a connector was loading
- Extract Text
- Use the Apache Tika SAX Parser for Microsoft documents
- Scheduler
- The option to create a Cache Groups scheduler was not being displayed
Aspire UI
- A Connector component might not show the actual state of a crawl
- The link that points to the Confluence wiki has been updated
Connectors
- Aspider
- An authentication form error could occur indicating "Target host is not specified while crawling"
- Neither NTLM nor ADFS authentication was occurring when a host was specified in the Credentials
- The Port field was not working correctly with any value except "-1"
- A crawl could cause a warning about duplicate IDs in MongoDB
- To indicate that the Gateway was not working, the exception message in ADFS needed updating
- Confluence
- Documentum
- Exception was being thrown during Group Expansion
- File System
- Starting Directories in the File option was not working as expected
- IBM Connections
- The connector needed to use the Aspire GroupExpansion instead of SharePoint Integrated security with an optimized IBM Connections Group Downloader
- Memory leaks could occur
- During an incremental crawl, the deletes of Blogs, Wikis and Files were not working
- Content crawled from IBM Connections did not contain a last-modified date because of a problem with the date format
- Kafka
- SharePoint 2010
- A problem could occur when identifying the site-collections for a WEB-Application
- When adding a link on a site collection to crawl, [NO-NAME] no longer appears in the name attribute in the hierarchy section
- Errors no longer occur during the incremental crawl for the Blog site collection
- Errors no longer occur when crawling a specified list (views included)
- SharePoint 2013
- When crawling incrementals for an External list, the connector was not picking up the changes
- When crawling SP2013, errors such as "HTTP Error 400. The size of the request headers is too Long" might occur
- A KeyNotFoundException could occur when checking attachments for a list with deleted lookup references
- When crawling a list, the name in the hierarchy of the documents was displayed as NO-NAME even though the items had a title field
- The placeholder needed to be changed for the 'Seeds file' field
- The connector was unable to crawl large lists
- SharePoint 2016
- SharePoint Online
- A NullPointerException (NPE) could occur randomly in the item complete callback when crawling in distributed mode
- A String index out of range error could occur while getting a list display URL
- Error while crawling after a crawl was stopped: Item parent wasn't assigned during crawl
- Standalone Mode
- When a user added a custom connector, feedback needed to be provided by the Aspire UI
- Staging Repository
- A global variable was not working when configuring the server in the Staging Repository connector
- When crawling over multiple documents and publishing at two different scopes, the items published could be duplicated
- The Stager connection could be broken when running a full crawl
Publishers
Services
- The Azure Group Expander could refuse to start
- Group Expansion failed if user data exceeded the Mongo Max Document Limit (16MB)
- For Aspire Distributed mode, Services in the master node were not starting automatically after saving changes
- Errors to reflect failed Services were not being generated
- Services that were set up in an Aspire cluster were not synced up correctly
- Azure Active Directory Group Expander
- Group Expansion Service
- The userGroupCache map was accessed when the Group Expansion Service was running
- LDAP Cache Service
- The controls were not displayed and the Schedule was set to Advanced even if Minutes or Hourly had been set
- Problems with the LDAP-Cache component could include: reporting a duplicate key error twice, stopping with a duplicate error, taking too long, and the refresh refusing to start. The connector could not look up ACL information in the LDAP-Cache component
- You are now able to check the user’s cache for the Azure Group Expander via the Debug console
Applications
- Archive Extractor
- AVRO Extractor
To Be Released
- Amazon S3
- Box
- CEWS Listener
- FTP
- GSA Publisher
- IBM Connections
- PST Extractor
- Publish to HDFS
- Publish to SharePoint 2013
- Publish to SharePoint 2013 (Install & Setup)
- Salesforce
- Subversion
- Teamforge
External Technical Limitations
- Changes in Box notes content are not considered for incremental crawls.
- Changes made to the attachments of the item type Opportunity in Salesforce are not considered for incremental crawls.
Items to deprecate on Aspire 3.3
Elasticsearch bootloader
- aspire-elastic-bootloader
DCM
- aspire-dcm-enterprise
- aspire-amazonec2-dm
- aspire-zk-dm
The old Admin UI(s)
- Parts of aspire-application
Big Data
- app-semantic-co-occurrence-hadoop
- app-semantic-co-occurrence-hadoop-soln
- aspire-hadoop-job-launcher
- aspire-hadoop-hdfs
- aspire-hadoop-wiki-dict-generator
- aspire-load-hdfs
Connectors
Solutions
Publishers
- Cloudsearch
- GSA
- Staging Repo Publisher
Known Issues
Connectors
- FTP
- The FTP connector only works with Unix systems and not with Windows
- Twitter
- Full/Incremental crawls for retweets are not working
Publishers