CEWS Component

The CEWS component allows the SharePoint Content Enrichment Web Service (CEWS) to send specific managed properties to Aspire and receive a set of modified or enriched managed properties in return. It consists of a web service wrapped around the Aspire HTTP listener and an Aspire application that lets the user configure a specific content source (usually left inactive) to manage the workflow of that application.

Workflow

Aspire is used in two places in the SharePoint content consumption process:

  1. For sources using CEWS, such as SharePoint, Aspire is a single stop within the content processing pipeline. Content is crawled by the native SharePoint connector and some managed properties are sent to Aspire for enrichment, after which SharePoint completes the content processing and indexes the resulting documents.
  2. For sources crawled by Aspire, such as the XML sources, content is harvested into the Document Store, then processed from the store and sent to the BCS repository. SharePoint then pulls content from the BCS repository and processes it as usual, skipping the CEWS step.

Below is a graphical representation of the workflow:

SPArchitecture.png


Configuration

Element | Type | Default | Description
endpoint | string | | The endpoint you are exposing with Aspire, including the server:port/WebserviceName. This value is used in the SharePoint configuration.
filterReturn | boolean | false | SharePoint throws exceptions if properties are returned that it does not expect. When set to true, only the properties listed in returnProperties are returned.
returnProperties | string | | Comma-separated list of properties that are returned if they exist in the document. All properties not in this list are ignored if "filterReturn" is set to true.
contentSource | string | | The name of the Content Source whose workflow is used for the CEWS listener. This can be any type of connector. Because it is a connector, it should be left inactive so that content does not accidentally get pushed through the workflow.
workflowReloadPeriod | string | 15s | Equivalent to the workflowReloadPeriod of any normal connector. This is the interval at which the workflow is checked for changes; if changes are found, they are loaded into memory.
workflowErrorTolerant | boolean | false | Equivalent to workflowErrorTolerant of any normal connector. When set to true, workflows continue even when they encounter an error and complete normally regardless of the document fields available.
debug | boolean | false | Prints debug information to the logs and console.
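
The interaction between filterReturn and returnProperties can be illustrated with a small sketch. It is written in Python purely for illustration and is not part of Aspire; the function name filter_return and the exact-case matching of property names are assumptions, not documented behavior.

```python
def filter_return(properties, return_properties, filter_return_enabled):
    """Illustrative model of filterReturn/returnProperties (not Aspire code)."""
    if not filter_return_enabled:
        # filterReturn=false: every property is returned unchanged.
        return dict(properties)
    # returnProperties is a comma-separated string such as "Author,Title".
    allowed = {name.strip() for name in return_properties.split(",") if name.strip()}
    # Keep only listed properties that exist in the document.
    # Assumption: names are compared case-sensitively.
    return {name: value for name, value in properties.items() if name in allowed}


doc = {"Author": "Jane Doe", "Title": "Report", "aspireFeederLabel": "ceWebService"}
print(filter_return(doc, "Author,Title", True))
```

With filtering enabled, only Author and Title survive; a listed property that is missing from the document is simply absent from the result.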

Aspire Service Example Configuration



CEWS Service Section

In Aspire we need to configure the Content Enrichment service and point it to a specific Content Source for management.

In the Service endpoint enter http://0.0.0.0:62028/CEWebService.svc to let the service listen on all available network interfaces.

Check the Filter return box and add the names of the Properties to be returned by this Content Enrichment service. These must be the same fields as configured with SharePoint's Set-SPEnterpriseSearchContentEnrichmentConfiguration cmdlet. If this is not configured, the default return values are returned with any custom properties populated in the workflow.

An application tag is added to the settings.xml file:

<application config="com.searchtechnologies.aspire:app-sp2013-content-enrichment" id="13">
  <properties>
    <property name="debug">false</property>
    <property name="endpoint">http://itsup2015-ba01:62027/CEWebService.svc</property>
    <property name="filterReturn">true</property>
    <property name="returnProperties">Author,Title</property>
    <property name="contentSource">CEWSWorkflow</property> <!-- only required in versions before 2.2 -->
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
  </properties>
</application>
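
Because the returned properties must match the output properties configured in SharePoint, it can help to compare the two lists before deploying. The following is a minimal sketch in Python (used only for illustration, not part of Aspire); the helper names read_return_properties and mismatches are hypothetical, and the XML is a shortened copy of an application tag like the one in settings.xml.

```python
import xml.etree.ElementTree as ET

# Shortened application fragment; only the two relevant properties are kept.
APPLICATION_XML = """
<application config="com.searchtechnologies.aspire:app-sp2013-content-enrichment" id="13">
  <properties>
    <property name="filterReturn">true</property>
    <property name="returnProperties">Author,Title</property>
  </properties>
</application>
"""


def read_return_properties(application_xml):
    """Return the returnProperties list, or None when filterReturn is not true."""
    root = ET.fromstring(application_xml.strip())
    props = {p.get("name"): (p.text or "").strip() for p in root.iter("property")}
    if props.get("filterReturn", "false").lower() != "true":
        return None
    names = props.get("returnProperties", "").split(",")
    return [name.strip() for name in names if name.strip()]


def mismatches(aspire_properties, sharepoint_output_properties):
    """Return (only in Aspire, only in SharePoint) as two sorted lists."""
    aspire = set(aspire_properties)
    sharepoint = set(sharepoint_output_properties)
    return sorted(aspire - sharepoint), sorted(sharepoint - aspire)


print(mismatches(read_return_properties(APPLICATION_XML), ["Author", "Title"]))
```

Two empty lists mean both sides agree; any name in the first list is returned by Aspire but not declared in SharePoint.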

Aspire Workflow Configuration

Workflow

Aspire 2.2 and later



CEWS Workflow Section

Once the SP2013 Content Enrichment service has been created, you can modify the workflow responsible for populating the properties/fields returned to SharePoint. Only the onProcess workflow is available, but all Aspire workflow functions may be used.

Aspire 2.1 and before

Once the SP2013 Content Enrichment application has been created, you will need to add a connector with the name specified in the contentSource parameter. This can be any type of connector, and all configuration except the OnAddUpdate workflow is ignored, so it is recommended that the connector is left inactive.

  1. Open the Aspire Administration UI
  2. Click "Add Source"
  3. Enter the Source Name previously specified
  4. Click on the Connector tab
  5. Populate any required fields so that the source can be saved.
    1. These parameters will not be used by the CEWS listener so they do not have to be valid for any source.
  6. Click on the Workflow tab
  7. The onProcess workflow is automatically selected (there is no other).
  8. Add any appropriate workflow steps
  9. Save

Populating Return Values

CEWS supports several types of return values. The type must be declared when the value is added to the Aspire object, for example an array of strings:

import java.util.Arrays;

def HESSource = doc.add("HESSource").setAttribute("type", "PropertyOfArrayOfstring");
HESSource.setContent(Arrays.asList("External SharePoint", "External SharePoint|External Teamsites"));


Possible value types are:

  • PropertyOfArrayOfdecimal
  • PropertyOfArrayOfdouble
  • PropertyOfArrayOflong
  • PropertyOfArrayOfstring
  • PropertyOfboolean
  • PropertyOfdateTime
  • PropertyOfdecimal
  • PropertyOfdouble
  • PropertyOflong
  • PropertyOfstring
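
To make the choice of type name concrete, here is a minimal sketch that picks one of the listed names for a given value. It is written in Python for illustration only (the workflow script itself is Groovy); the function names and the mapping from Python types to CEWS type names (for example int to PropertyOflong) are assumptions. Note that the list above contains no array variant for boolean or dateTime, so the sketch rejects those.

```python
from datetime import datetime
from decimal import Decimal

# Order matters: bool must come before int because bool is a subclass of int.
SCALAR_TYPES = [
    (bool, "PropertyOfboolean"),
    (int, "PropertyOflong"),
    (float, "PropertyOfdouble"),
    (Decimal, "PropertyOfdecimal"),
    (datetime, "PropertyOfdateTime"),
    (str, "PropertyOfstring"),
]

# Only these element types have an array variant in the list of value types.
ARRAY_TYPES = {
    "PropertyOfdecimal": "PropertyOfArrayOfdecimal",
    "PropertyOfdouble": "PropertyOfArrayOfdouble",
    "PropertyOflong": "PropertyOfArrayOflong",
    "PropertyOfstring": "PropertyOfArrayOfstring",
}


def scalar_type(value):
    """Return the scalar CEWS type name for a single value."""
    for python_type, cews_name in SCALAR_TYPES:
        if isinstance(value, python_type):
            return cews_name
    raise ValueError(f"unsupported value type: {type(value).__name__}")


def cews_type(value):
    """Return the CEWS type name for a scalar or a list/tuple of scalars."""
    if isinstance(value, (list, tuple)):
        element_types = {scalar_type(item) for item in value}
        if len(element_types) != 1:
            raise ValueError("array values must be non-empty and of a single type")
        array_name = ARRAY_TYPES.get(element_types.pop())
        if array_name is None:
            raise ValueError("no array type is listed for this element type")
        return array_name
    return scalar_type(value)


print(cews_type(["External SharePoint", "External SharePoint|External Teamsites"]))
```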

Default Return Values

The Aspire CEWS service returns the following fields by default:

Name | Type | Value
aspireCEEndpoint | PropertyOfstring | Address of the Aspire CEWS endpoint, e.g. http://sp.search.local:62028/CEWebService.svc
aspireCESchema | PropertyOfstring | http://schemas.microsoft.com/office/server/search/contentprocessing/2012/01/ContentProcessingEnrichment
aspireFeederLabel | PropertyOfstring | ceWebService

Additionally, all fields passed into the service are returned as well.

These fields must be declared as output properties when configuring SharePoint:

$cec = New-SPEnterpriseSearchContentEnrichmentConfiguration
$cec.OutputProperties = "aspireCEEndpoint", "aspireCESchema", "aspireFeederLabel", ...
...


As an alternative, configure the Aspire CEWS Service to return only certain fields.
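
When filtering is not used, the set of output properties SharePoint has to declare is the union of the default fields, the input fields, and any custom fields populated in the workflow. A minimal sketch of that bookkeeping follows, in Python for illustration only; the function name is hypothetical, and HESSource stands in for a custom workflow field.

```python
# Fields the Aspire CEWS service returns by default.
DEFAULT_FIELDS = ["aspireCEEndpoint", "aspireCESchema", "aspireFeederLabel"]


def unfiltered_output_properties(input_properties, custom_properties=()):
    """List every property name returned when filterReturn is false.

    Order is defaults, then inputs, then custom fields; duplicates are dropped.
    """
    result = []
    for name in [*DEFAULT_FIELDS, *input_properties, *custom_properties]:
        if name not in result:
            result.append(name)
    return result


print(unfiltered_output_properties(["Author", "Filename", "Title"], ["HESSource"]))
```

The resulting list is what would need to appear in OutputProperties on the SharePoint side in the unfiltered case.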

SharePoint Configuration

On the SharePoint box we need to configure CEWS to communicate with our new Aspire endpoint.

We need to specify which properties Aspire will be consuming and which properties we will be returning.

We can also configure how to handle the raw document.

Below are the PowerShell commands required to update the CEWS configuration:

# Get the Search Service Application for later
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a new CEWS config
$config = New-SPEnterpriseSearchContentEnrichmentConfiguration

# Set the endpoint value
$config.Endpoint = "<URLToWebService>"

# Set the Debug value
$config.DebugMode = $False

##############################################################
# IMPORTANT! if you set debug on then the next two parameters#
# are ignored!                                               #
# Debug sends all managed properties instead of the ones you #
# are requesting. It will also ignore any response you send! #
##############################################################

# Set all properties you will send to CEWS.
# You can also use $config.InputProperties.add("xxx")
$config.InputProperties = "Author", "Filename", "Title"

# Set the properties you will receive back from CEWS (same process as for the input properties)
$config.OutputProperties = "Author", "Title"

# Should the full document be sent via CEWS? (This has a relatively significant performance impact.)
$config.SendRawData = $True

# Set the max size of the raw document in KB (8192 KB = 8 MB), or as desired.
$config.MaxRawDataSize = 8192

# Update the configuration, passing the config and the SSA defined above
Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config


The configuration can be removed with:

# Get the Search Service Application for later
$ssa = Get-SPEnterpriseSearchServiceApplication
Remove-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa 


You can get the current configuration for review or update with:

$ssa = Get-SPEnterpriseSearchServiceApplication
Get-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa


Known Issues

  • The service cannot be browsed at http[s]://aspireserver:<port>/<endpoint> if Java 1.8 is installed. The best way to verify that the service is running is to browse to http[s]://aspireserver:<port>/<endpoint>?wsdl, which should return the CEWS WSDL. Although the service is not browsable, it works as expected.
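
The WSDL check described above can also be scripted. The following is a minimal sketch in Python (for illustration only, not part of Aspire); the function names are hypothetical, and testing the response for the word "definitions" is only a heuristic based on the root element of a WSDL 1.1 document.

```python
from urllib.request import urlopen


def wsdl_url(endpoint):
    """Return the ?wsdl URL for a CEWS endpoint address."""
    return endpoint.rstrip("/") + "?wsdl"


def looks_like_wsdl(body):
    """Heuristic: a WSDL 1.1 document has a <definitions> root element."""
    return b"definitions" in body


def service_is_up(endpoint, timeout=10):
    """Fetch the WSDL and apply the heuristic; network errors count as down."""
    try:
        with urlopen(wsdl_url(endpoint), timeout=timeout) as response:
            return response.status == 200 and looks_like_wsdl(response.read())
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False
```

Calling service_is_up with the configured endpoint address, for example the value of the endpoint property, returns True only when the WSDL could be fetched.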