Created by Unknown User (nnavarro), last modified by user-1b188 on Sep 13, 2017
This page lists all of the updates in version 3.3 of Aspire.
New Features
- A new web crawler, the Aspider Connector, replaces the legacy Heritrix Connector.
- Salesforce Connector has been refactored to include the following features:
  - Runs in the new connector framework.
  - Supports execution in a distributed environment.
  - Allows concurrent crawling of multiple endpoints.
  - Provides faster incremental crawls.
  - Uses snapshots.
- New way to manage Failed Documents for all of the Source Connectors:
  - Allows reprocessing of documents that previously failed in either the processing or the publishing stage.
- Avro Reader Extractor Application and Avro Publisher. ALPHA VERSION
- Parquet Extractor Application. ALPHA VERSION
- SMTP Connector. ALPHA VERSION
- HDFS Connector and Web HDFS Publisher. ALPHA VERSION
- Http Generic Service. ALPHA VERSION
- Documentum DQL:
  - Error-tolerant option to index metadata when fetching the document fails.
  - New RenditionType option for indexing.
- Support for Azure Authentication in the SharePoint Online Connector.
- New features for the SharePoint Connector (2007/2010):
  - Supports default snapshots on incremental crawls.
  - Supports crawling specific views on lists.
- Implemented a single security key-store throughout all of Aspire.
- Updated SharePoint 2007/2010 Web Service Extensions.
External Technical Limitations
- Zip files are not crawled with the Activity Incrementals when they are created inside Jive Documents.
- When a Salesforce SOQL statement selects several large fields (such as two or more custom fields of type long text), Salesforce may return fewer records than the configured page size in order to limit the overall response payload size. The same reduction occurs with base64-encoded fields (blob fields), such as Body. Remove these fields from the query if you want Salesforce to return the number of rows specified by the page size.
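The practical consequence of the Salesforce limitation above is that a client should not treat a short page as the end of the result set. The following is a minimal Python sketch of that idea, not a description of how the Aspire connector is implemented. It assumes a response shaped like a Salesforce REST query result, with "records", "done", and "nextRecordsUrl" fields. The names iterate_records and make_fake_fetcher are hypothetical, and the fetcher is an in-memory stand-in for a real HTTP call.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iterate_records(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every record, following the cursor until the server reports done.

    The loop never compares len(page["records"]) with the requested page
    size, because the server may legitimately return a shorter page.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        for record in page["records"]:
            yield record
        if page["done"]:
            return
        cursor = page["nextRecordsUrl"]


def make_fake_fetcher(pages: List[List[dict]]) -> Callable[[Optional[str]], Page]:
    """Hypothetical in-memory stand-in for the HTTP request (assumption).

    Each inner list is one page; pages may have different sizes.
    """
    by_cursor: Dict[Optional[str], Page] = {}
    cursor: Optional[str] = None
    for i, records in enumerate(pages):
        last = i == len(pages) - 1
        next_cursor = None if last else f"/query/fake-{i + 1}"
        by_cursor[cursor] = {
            "records": records,
            "done": last,
            "nextRecordsUrl": next_cursor,
        }
        cursor = next_cursor
    return lambda c: by_cursor[c]


if __name__ == "__main__":
    # Requested page size is 3, but the second page only holds 1 record.
    fetch = make_fake_fetcher([
        [{"Id": "1"}, {"Id": "2"}, {"Id": "3"}],
        [{"Id": "4"}],
        [{"Id": "5"}, {"Id": "6"}],
    ])
    print([r["Id"] for r in iterate_records(fetch)])
```

A loop that stopped as soon as a page came back with fewer rows than requested would end after the second page in this example and miss the last two records.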
To Be Released
Items to Deprecate in Aspire 3.3
The following items are marked for deprecation in the next Aspire version:
- Elasticsearch bootloader
- aspire-elastic-bootloader
- DCM
- aspire-dcm-enterprise
- aspire-amazonec2-dm
- aspire-zk-dm
- The old Admin UI(s)
- Parts of aspire-application
- Big Data
- app-semantic-co-occurrence-hadoop
- app-semantic-co-occurrence-hadoop-soln
- aspire-hadoop-job-launcher
- aspire-hadoop-hdfs
- aspire-hadoop-wiki-dict-generator
- aspire-load-hdfs
- Connectors
- Staging Repo Connector (File System)
- Solutions
- Publishers
- Cloudsearch
- Staging Repo Publisher (File System)
Known Issues
Aspire Core
- Connectors imported with special characters in the path fields do not load correctly.
- Auditing
- Dump option not working with Solr 6.2.0 and 6.3.0.
- Dump option not working with Elasticsearch 5.0.2.
- Incremental crawl: unchanged documents not displayed in the Audit Log.
- Aspire Shell
- The load-content-sources option does not work.
- Relative paths do not work for the commands that create jobs.
- The Aspire Shell prompt can sometimes be deleted.
- Failed Documents
- Connector gets stuck when stopping the crawl.
- Failover
- Single instance, full test, interrupted, incremental recovery: error processing some files after resuming the crawl.
- During full crawls, some documents were left out if an instance was killed.
- File System connector resumed crawling after restart.
Applications
- Archive Extractor
- Routing options not working with OnError.
- Delete by Query not working as expected when using Elasticsearch 5.0.1.
- AVRO Extractor
- Routing workflow for add/update jobs to On Error does not work.
Connectors
- Include and Exclude patterns have empty spaces trimmed.
- Aspider
- Crawl statistics not displayed on the UI when version was used.
- CIFS
- Due to a security issue observed with the protocol, Microsoft recommends disabling SMB1 on Windows servers.
- Documentum
- Group Expansion marks groups as users.
- Error displayed on the console while crawling a folder: "Stream handler unavailable due to: null"
- Heritrix
- Reject Images/Videos/Javascript/CSS not working for external sites outside the domain.
- Jira Issues
- Multiple connection timeouts occurring.
- Jive
- Crawl generated unnecessary deletes on Normal Incremental with the Use Progressive Retries option.
- RDB Snapshot
- The SQL Server bigint data type is not supported for SQL Slices.
- Inconsistency when crawling ACL information using ACL fetching options.
- Crawl not working with a specific column when the Use column from Extraction SQL option was specified.
- RDB Tables
- A wrong value in the "sequence column" parameter does not show a UI error.
- Inconsistency when crawling ACL information using ACL fetching options.
- SharePoint Connectors
- Renaming a document included by a pattern does not generate a deletion.
- No descriptive error for a non-existent URL.
- SharePoint 2007
- Connector generating wrong hierarchy on updates.
- Custom headers not being added to the request.
- SharePoint 2010
- Documents inside a folder not being picked up by incrementals if the parent ACL changed.
Publishers
- Publish to SharePoint 2013
- Due to a security issue observed with the protocol, Microsoft recommends disabling SMB1 on Windows servers.
Services
- Import Service option not working when a service with the same name already exists.
- Group Expansion Service
- Removing the GE collection from MongoDB leaves GE unusable for connectors.
- Group Expansion Manager
- More than one GE Service using the same servlet name generates an error.
UI
- Add Source list not shown until Refresh Sources is clicked.
- Updating a service name to a string of spaces was not always validated.