From .10HBase version 1.2.4 the

NoSQL DB Provider
- HBase supported.
AIP Integration related features (Aspire Cloudera Parcel mode)
- Audit Logs
  - Log Aspire actions via Jetty.
- Cloudera Parcel updated.
- Licensing
  - Licensing to build entitlements for AIP.
  - Licensing to access the Aspire application: Annual, Perpetual and Trial.
- Log Tika errors as warnings.
- Logging improvements.
  - Added processes and scripts to delete old logs.
- Security Access Control Configuration.
  - User Role Authentication with LDAP.
- Security
  - Access to file system/APIs via Groovy.

New Features

Applications
- HDFS Binary File Writer.
Connectors
- HBase
- Jive
  - Simplified way to bring Security ACLs.
- Kafka
- Salesforce
  - Chatter Feeds Endpoint updated to use the SOAP API and Accounts management.
- Service Now
  - New option to choose what the Include/Exclude patterns apply to: either URLs or Knowledge Article Id.
- SharePoint 2016
- SMB2
Framework
- Updated start scripts to fail with versions of Java lower than 1.8.
- Normalized time zone for Aspire, including logs and statistics.
- Removed Recovery Policy options from the UI, since they are no longer used.
- Aspire Archetype: added required Felix properties for Kerberos authentication in the Aspire distribution.
- Jetty web server modifications for Aspire: HTTP TRACE method and server header info.
- Option to change Aspire's port when the distribution is created with the Maven command.
- LDAP Cache collection names changed to something more meaningful.
Services
- HAR Compactor
Publishers
- HBase
- Kafka
- Solr Cloud (SolrJ library)
UI
- New grid/list view on Content Source page.

Items Released

Connectors
- Aspider
- Amazon S3
- File System
- CIFS
- Documentum
- Documentum DQL
- SMB2
- HBase
- HDFS
- Kafka (Security pending)
- RDB Snapshot
- RDB via Table
- Salesforce
- Service Now
- SharePoint 2013
- SharePoint 2016
- SharePoint Online
- Jive
- StageR
- FTP
- RSS

Publishers
- HBase
- Elasticsearch
- SolrCloud
- Solr
- Kafka
- StageR

Bug Fixes

Aspire Core

NPE killing Aspire distribution
Error when second server joins a distributed crawl.

Auditing

Page Navigation not showing all the jobs.

Connectors

Aspider
- Crawl getting stuck (Error claiming the entry from the queue).
- Decoded characters in URLs when not needed.
- Case-sensitive option not considering robots.txt file.
- NPE when running an incremental crawl using an invalid URL in the seed file.
- NPE when running a full crawl using default options.
- Minor UI issues.
CIFS
- Connector not allowing special characters as paths to crawl.
- Missing placeholder on the "Path to URLs" field.
Jive
- ACLs not getting updated when the privacy of a group changed.
- Hierarchy for documents with attachments not constructed correctly.
- Extract Text stage is not being executed with default settings.
- Activity Incremental + Creation Date Filter not catching updates after the first one.
- Activity Incremental crawl generating duplicate key error collection.
- Unicode characters not handled at all in several Jive contents.
- Hierarchy for Social Group's blogs being generated incorrectly.
- Error not being displayed retrieving ACLs from Entitlement API using on-premise Jive.
- No UI error when Aspire Security Plugin is disabled.
- The first level of the hierarchy populating the content source name in "name" field.
RDB Snapshot
- Minor UI validations.
Salesforce
- ACLs error with default connector configuration.
- ACLs for Groups/Users not reflected correctly after updates.
- Error crawling a deleted Feed.
- Error on fetching Attachments.
- Error getting deletes on Incremental for Tasks and Profiles.
- Error on invalid session.
- Error ending crawl.
Service Now
- Deletes being reported over attachments not updated.
StageR
- Connector doing reprocessing for incremental crawl.

Publishers

StageR
- Validation for Document Key field when field does not exist.

UI

Importing connector with special characters in the path fields not loading correctly.

To Be Released

Applications
- Archive Extractor
- AVRO Extractor
- Parquet Extractor
Connectors
- Box
- Confluence
- eRoom
- IBM Connections
- Jira
- Jive (support ver. 9)
- Kinesis
- Lotus
- RightNow
- SharePoint 2010
- SMTP
- Socialcast
- TeamForge
Publishers
- CDH Hadoop
- Elasticsearch in Azure
- Kinesis
- Web HDFS
Services
- Azure Group Expansion
- HTTP Listener
- HTTP Service
- SharePoint 2013 CEWS Listener

External Technical Limitations

HBase: When running Aspire with long-term, large ingestion (with HBase as the NoSQL DB provider), the underlying HBase libraries may eventually stop returning results without throwing any error, degrading crawl performance until the crawl stalls completely. When this happens, the only solution is to restart the affected Aspire servers so that the underlying HBase library threads reconnect from scratch.
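
Because the HBase stall described above raises no error, it can only be noticed by watching crawl progress. The following is a minimal sketch of such a check, not part of Aspire: the `StallDetector` class, the idle threshold, and the way processed-item counts are obtained are all hypothetical and would need to be adapted to a real deployment.

```python
import time


class StallDetector:
    """Flags a crawl as stalled when the processed-item count stops growing.

    Hypothetical helper for illustration only; this is not an Aspire API.
    """

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock
        self._last_count = None
        self._last_progress_at = clock()

    def observe(self, processed_count):
        """Record the latest count; return True if the crawl looks stalled."""
        now = self._clock()
        if self._last_count is None or processed_count > self._last_count:
            # Progress was made, so reset the idle timer.
            self._last_count = processed_count
            self._last_progress_at = now
            return False
        # No progress: stalled once the idle time reaches the threshold.
        return (now - self._last_progress_at) >= self.max_idle_seconds
```

When `observe` returns True, an operator would restart the affected Aspire servers as described above.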

New and Enhanced Features

Aspire Core and Framework Components

Salted Challenge Response Authentication Mechanism (SCRAM) support has been added to the MongoDB used with Aspire.
The ability to dynamically load jar files has been added to Aspire with Java 9.
When starting Aspire either normally or in debug mode, the debug line in the settings.xml file is handled appropriately.
A section has been added to the settings.xml file for HBase information.
Logging of remote IP addresses for successful or failed logins will now occur.
The Mongo provider now encrypts/hashes IDs.
Record fields have been improved.
Entitlements checking no longer checks missing components at every restart.
Time zones have been normalized for Aspire, including logs and statistics.
The documentation has been updated for Keytab/Kerberos.
Improvements have been added to Job usage.
Updates have been made to the ExtractText default configuration limit for text extracted from a stream.
List page retrieval and metadata extraction have been improved in SharePoint Commons.
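
For the SCRAM support noted above, a MongoDB connection string can request the mechanism through the standard `authMechanism` and `authSource` URI options. The sketch below only builds such a string. The function name, the default host values, and the choice of `SCRAM-SHA-1` are assumptions for illustration, and how Aspire itself passes these settings to MongoDB is not covered here.

```python
from urllib.parse import quote_plus


def scram_mongo_uri(user, password, host, port=27017,
                    auth_source="admin", mechanism="SCRAM-SHA-1"):
    """Build a MongoDB connection string that requests SCRAM authentication.

    Illustrative helper (not an Aspire API). User name and password are
    percent-encoded so that characters such as '@' or '/' stay valid.
    """
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
        f"?authSource={auth_source}&authMechanism={mechanism}"
    )
```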

Aspire UI

To re-fetch entitled components (after deleting the Resources folder), an "Allow Refresh" button has been added.
The ability to show Provider Information has been added.

Connectors

Aspider
- A Headless browser has been added for rendering dynamically generated pages (client-side JavaScript pages).
IBM Connections
Elastic
SharePoint 2010
- On the Multiple URLs drop-down, when the 'Site Discovery' option is set, the 'Set List View' option is removed.
SharePoint Online
- An NPE at crawl end error could occur if bad credentials were used.
- Incremental crawls no longer detect containers as updated items.
- Scan recursively was not working as expected.
SMB
- Added DFS support and override of the last access date of documents.
Twitter

Publishers

Elasticsearch
- Case-sensitive index names are now handled properly.
Google Cloud Search
- A new Google Cloud Search (GCS) publisher receives content from Aspire connectors and uses the Java Client library to index the content into Cloud Search.
HBase
- Content can now be deleted.
- During a full crawl, the publisher now defaults to clean.
- When not in file configuration mode, the publisher can now be used without security.
Publish to StageR
- Field-level help has been added for the special scope $record.

Applications

The Entitlements Admin application has been updated.

Bug Fixes

Aspire Core and Framework Components

Admin UI
- The ability to configure a weekly schedule could cause an error when saving
Aspire Application
- ConfigManager could log a debug message into {aspire.home}/logs/configmanager.log
- A problem could occur when editing a custom application in the Admin UI
- Startup problems could occur using the Staging Publisher
Connector Framework
- When stopping and restarting Aspire while the GroupDownload process was running, the group download did not start again
MongoDB Provider
- The LDAP Cache could report a MongoDB Duplicate key error
- Aspider could stop with a MongoDB Duplicate key error
SharePoint Commons
- An out of memory (OOM) exception could occur during large crawls
- Added support for incrementals using Aspire Snapshots on SP
The Aspire Archetype had "http" rather than "https" repository and entitlement URLs
Failed to connect to Artifactory with custom keystore. Artifactory certificates were added to the distribution. See: https://contentanalytics.digital.accenture.com/pages/viewpage.action?spaceKey=aspire33&title=Crawling+via+HTTPs
AspireObject was casting an incorrect numeric type when created from JSON
The AspireObject isEmpty method returned true even if the object had children
The processDeletes (String) was missing a Status page
The Aspire Connector Framework was not using shouldScan during incremental crawls
When running a full crawl, a "Provider 'encrypted' not installed" message could occur
The Mongo provider generated an invalid JSON object during document conversion
Audit logs were incomplete
For AIP integration, the logout action was not being logged
Publisher framework retryDelay, retryDelayMultiplier and maxRetryDelay properties were not supported by Dynamic XML Forms (DXF)
The Aspire-Services jar file was missing a noSQL package
The "Loading Application" message could display whether a connector was loading or not
Extract Text
- Use the Apache Tika SAX Parser for Microsoft documents
Scheduler
- The option to create a Cache Groups scheduler was not being displayed
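
The publisher framework properties retryDelay, retryDelayMultiplier and maxRetryDelay mentioned above suggest an exponential backoff between retries. The sketch below shows how such a schedule is commonly computed. It assumes the delay starts at retryDelay, is multiplied by retryDelayMultiplier after each failed attempt, and is capped at maxRetryDelay. These semantics and the function name are assumptions, not confirmed Aspire behavior.

```python
def retry_delays(retry_delay, retry_delay_multiplier, max_retry_delay, attempts):
    """Return the wait before each retry, growing geometrically up to a cap.

    Illustrative only: the parameter names mirror the property names, but
    the exact semantics in the publisher framework are assumed.
    """
    delays = []
    delay = retry_delay
    for _ in range(attempts):
        # Never wait longer than the configured maximum.
        delays.append(min(delay, max_retry_delay))
        delay *= retry_delay_multiplier
    return delays
```

For example, a 1000 ms initial delay with a multiplier of 2 and a 5000 ms cap would give waits of 1000, 2000, 4000 and 5000 ms over four retries.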

Aspire UI

A Connector component might not show the actual state of a crawl
The link that points to the Confluence wiki has been updated

Connectors

Aspider
- An authentication form error could occur indicating "Target host is not specified while crawling"
- Neither NTLM nor ADFS authentication was occurring when a host was specified in the Credentials
- On any port, the Port field was not working correctly with any value except "-1"
- A crawl could cause a warning about duplicate IDs in MongoDB
- To indicate that the Gateway was not working, the exception message in ADFS needed updating
Confluence
- ACLs info appeared inside the hierarchy section
- A batch error could display while publishing to Elasticsearch 6.3.0
Documentum
- Exception was being thrown during Group Expansion
File System
- Starting Directories in the File option was not working as expected
IBM Connections
- The connector needed to use the Aspire GroupExpansion instead of SharePoint Integrated security with an optimized IBM Connections Group Downloader
- Memory leaks could occur
- During an incremental crawl, the deletes of Blogs, Wikis and Files were not working
- Content crawled from IBM Connections did not contain a last-modified date because of a problem with the date format
Kafka
- A "NO-NAME" field could occur
SharePoint 2010
- A problem could occur when identifying the site-collections for a WEB-Application
- When adding a link on a site collection to crawl, [NO-NAME] should not be part of the name attribute in the hierarchy section
- No error should occur during the incremental crawl for the Blog site collection
- No errors should occur when crawling a specified list (views included)
SharePoint 2013
- When crawling incrementals for an External list, the connector was not picking up the changes
- When crawling SP2013, errors such as "HTTP Error 400. The size of the request headers is too Long" might occur
- KeyNotFoundException while trying to check attachments for list with lookup references deleted
- When crawling a list, the name in the hierarchy of the documents was displayed as NO-NAME even though the items had a title field
- The placeholder needed to be changed for the 'Seeds file' field
- The connector was unable to crawl large lists
SharePoint 2016
- An error could occur while crawling a site
SharePoint Online
- NPE crawling on distributed mode. Random NPE in the item complete callback
- String index out of range while getting a List display url
- Error while crawling after a crawl was stopped: Item parent wasn't assigned during crawl
Standalone Mode
- When a user added a custom connector, feedback needed to be provided by the Aspire UI
Staging Repository
- A global variable was not working when configuring the server in the Staging Repository connector
- When crawling over multiple documents and publishing at two different scopes, the items published could be duplicated
- The Stager connection could be broken when running a full crawl

Publishers

Stager BDC Plugin could randomly fail during the crawls after setup
Elasticsearch
- DeleteByQuery was not being used with Elasticsearch 6.1.1
GCS Publisher
- A resource config/application.xml was missing on the jar file
- A relative path was not working in the Credentials Key File field
- An error could occur when crawling and publishing to GCS
Kafka
- An error could be masked when running a non-batched job
Publish to Avro
- Validation needed to be added to the Time Rollover Threshold field
TLS 1.2 support was needed for the SharePoint Security Pre Trimmer

Services

Azure Group Expander could refuse to start.
Group Expansion failed if user data exceeded the Mongo Max Document Limit (16MB)
For Aspire Distributed mode, Services in the master node were not starting automatically after saving changes
Errors to reflect failed Services were not being generated
Services that were set up in an Aspire cluster were not synced up correctly
Azure Active Directory Group Expander
- Users were not being removed
Group Expansion Service
- The userGroupCache map was accessed when the Group Expansion Service was running
LDAP Cache Service
- The controls did not display and the Schedule was set to Advanced even if Minutes or Hourly were set
- Problems with LDAP-Cache component could include: reporting a duplicate key error twice, stopping with a duplicate error, taking too long, and refresh refusing to start. The connector could not look up ACL information in the LDAP-Cache component
You are now able to check the user’s cache for the Azure Group Expander via the Debug console

Applications

Archive Extractor
- The "Send delete by query first" option could throw an exception
- Deleting files inside of an archive file was not handled properly for incremental crawls
AVRO Extractor
- During an incremental crawl, a "duplicate key error" message could display

Salesforce connector: Due to Salesforce API limitations, the connector has the following limitations:

For incremental crawls, the getUpdated and getDeleted methods are used, but when an attachment of any item is updated, that action is not reported by those methods and therefore is not processed.
Security and incremental related limitations:
- In security, only 'Supported elements' are supported.
- For incremental crawling of Salesforce task items, only tasks based on accounts are supported.
- If sharing is removed from an item (e.g., removing sharing of an account), the change is not reflected in the incremental crawl.
- Pricebook sharing ACLs are not supported.
Chatter security
- Chatter ACLs are retrieved only if the “Filter TrackedChange feeds” option is checked.
- Chatter ACLs are supported only for items that were created by a User or a Group; otherwise, no ACL is generated for the item.
- Public chatter groups have two ACLs: one for the public group and a PUBLIC:ALL ACL.
- Private and Unlisted chatter groups have one ACL for the group.
- The followers of a chatter user are treated as a private group called “<username>’s followers”. All feed items created by a user for their followers have this ACL.
- Chatter item attachments inherit the parent item ACLs.
- Reducing the user retrieval scope might lead to a loss of ACLs, since no ACLs are generated for followers of users outside that scope.
Salesforce Compatibility limitation
- Every 3 months Salesforce releases a new version of its API and sometimes changes the data structures. After each update, compatibility between the connector and Salesforce may break.
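
The Chatter security rules listed above can be summarized as a small decision function. This is a sketch of the rules as stated, not the connector's implementation: the function names, the `owner_type` and `group_visibility` values, and the exact ACL name strings are hypothetical.

```python
PUBLIC_ALL = "PUBLIC:ALL"


def chatter_acls(owner_type, owner_name, group_visibility=None):
    """Return the ACL names for a Chatter feed item (illustrative only).

    owner_type: "group", "user", or anything else for other creators.
    group_visibility: "public", "private" or "unlisted" (groups only).
    """
    if owner_type == "group":
        acls = [owner_name]
        if group_visibility == "public":
            # Public groups get the group ACL plus a PUBLIC:ALL ACL.
            acls.append(PUBLIC_ALL)
        return acls
    if owner_type == "user":
        # A user's followers are treated as a private group.
        return [f"{owner_name}'s followers"]
    # Items not created by a User or a Group get no ACL.
    return []


def attachment_acls(parent_acls):
    """Attachments inherit the ACLs of their parent item."""
    return list(parent_acls)
```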

Known Issues

Aspire Core

The "Loading Application" message displays when trying to add a connector, but the connector does not load.
Failover:
- Triple instance full crawl (double-interrupted): jobs are missing.
- Dual test full crawl interrupted: after Aspire is shut down in one instance, the other instance continues the crawl but never ends, and not all documents are published to Solr.
- Full test interrupted: after Aspire is shut down and restarted, documents are not published to Solr.

NoSQL provider:
- Configurations: encrypted fields are not working.
- The 'NoSQL provider unavailable' message is missing when the provider is down.

Connectors

Aspider
- Crawl statistics show more items than were actually processed.
Documentum
- Connector fails during group expansion with a large number of users and groups.
Jive
- Resume button

Items Deprecated on Aspire 3.3

The following items are deprecated in this Aspire version:

Elasticsearch bootloader

aspire-elastic-bootloader

DCM

aspire-amazonec2-dm
aspire-zk-dm

Big Data

app-semantic-co-occurrence-hadoop
app-semantic-co-occurrence-hadoop-soln
aspire-hadoop-job-launcher
aspire-hadoop-hdfs
aspire-hadoop-wiki-dict-generator
aspire-load-hdfs

Connectors

Staging Repo Connector (File System)
SVN

Services

Fast Components
Fast Content API
Fast Query Completion Listener
Fast Query Listener

Solutions

OCR
Semantic Co-occurrence

Publishers

Cloudsearch

SharePoint 2013

Staging Repo Publisher (File System)

Known Issues

Connectors

FTP
- The FTP connector works only on Unix systems, not on Windows
Twitter
- Full/Incremental crawls for retweets are not working

Publishers

Google Cloud Search
- Bundle location error loading the publisher for the first time
- NullPointerException publishing with Batch and Content Type Raw options
- ItemUploadRequest exception
- Pending required field validation for the 'Indexer Type' field

For version 3.3, Aspire requires a license file to run.

See Aspire Licensing for information on obtaining a license.
