Step 1. Launch Aspire and open the Content Source Management Page

Launch Aspire (if it's not already running). See:

Step 2. Add a new Content Source

  • Follow the corresponding step in the Configuration Tutorial of the connector of your choice; see the Connector list.

Step 3. Add a new Microsoft Search publisher to the Workflow

To add a Microsoft Search publisher, drag the Microsoft Search publisher rule from the Workflow Library and drop it on the Workflow Tree where you want it. This automatically opens the Microsoft Search publisher window, where you configure the publisher.

Step 3a. Specify Publisher Information

In the publisher window, specify the connection information used to publish to Microsoft Search.

  • Name: Unique name for the publisher
  • Tenant Id:  The tenant Id provided by Microsoft
  • Client Id:  The client Id generated when the Application was registered in Prerequisites
  • Client Secret:  The client secret that was generated in Prerequisites
  • Index Name:  The name of the connection/index that will be created in MS Search
  • Groovy Transform:  The path to a Groovy transformation file that processes each document so that it matches the structure expected by the MS Search connection schema: either the fixed ExternalFile data type or the custom ExternalItem data type. For the default files, see "Where are the default Groovy transformation and custom schema located?"
  • Use custom schema:  Enables the limited custom schema in MS Search.  If disabled, the ExternalFile schema is assumed; if enabled, the ExternalItem schema is used and the path to a JSON schema file is required
  • Start/end actions Clear:  Must be enabled for the connection to be created automatically on full crawls; otherwise the connection is assumed to already exist

Other configuration items are common to every publisher component.
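
The relationship between the "Use custom schema" option and the schema type can be sketched as follows. This is only an illustration of the rule described above, not the publisher's actual code; the function name resolve_schema_type and its parameters are hypothetical.

```python
from typing import Optional


def resolve_schema_type(use_custom_schema: bool,
                        schema_path: Optional[str] = None) -> str:
    """Return the schema type implied by the 'Use custom schema' option.

    Hypothetical helper: mirrors the rule in the list above, nothing more.
    """
    if not use_custom_schema:
        # Option disabled: the fixed ExternalFile schema is assumed.
        return "ExternalFile"
    if not schema_path:
        # Option enabled: a JSON schema file path must be supplied.
        raise ValueError("A JSON schema file path is required "
                         "when the custom schema is enabled")
    return "ExternalItem"
```

For example, resolve_schema_type(False) yields "ExternalFile", while resolve_schema_type(True, "schema.json") yields "ExternalItem" (the file name here is a placeholder).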


ExternalFile vs ExternalItem

Microsoft Search supports one of two schemas:  the fixed ExternalFile or the custom ExternalItem.  ExternalItem gives limited freedom to define the properties expected from crawled items.  In either case, the Groovy transformation file must produce output that matches the expected schema.

ExternalFile schema

The fixed external file schema expects the following information:

  • acl:  The list of ACLs for the document
  • createdDateTime:  A UTC date-time string (for example, 2017-11-08T19:06:17Z) that represents the creation date and time
  • modifiedDateTime:  A UTC date-time string in the same format that represents the last modification date and time
  • createdBy:  The author's name
  • lastModifiedBy:  The name of the last person that modified the document
  • title:  The document's title
  • url:  The document's URL
  • name:  The document's name
  • extension:  The document's file name extension
  • size:  The document size
  • content:  The document's content

This is a sample document in JSON format as expected by the REST API:

{
    "acl": [
        {
            "type": "user",
            "value": "d411eb08-42e2-4316-aab5-2df8e9d9c21b",
            "accessType": "grant",
            "identitySource": "Azure Active Directory"
        }
    ],
    "createdDateTime": "2017-11-08T19:06:17Z",
    "modifiedDateTime": "2017-11-08T19:06:17Z",
    "createdBy": "empty",
    "lastModifiedBy": "empty",
    "title": "sample document",
    "url": "http://the.url.com",
    "name": "name.txt",
    "extension": "txt",
    "size": 10,
    "content": "the content\n"
}
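
Before publishing, it can help to check that the output of a Groovy transformation has the shape shown above. The following is a minimal sketch in Python, assuming that every listed field is required and that the value types match the sample document; neither assumption is confirmed by the Microsoft Search documentation here, and validate_external_file is a hypothetical helper, not part of Aspire.

```python
import json
from datetime import datetime

# Field names come from the ExternalFile list above; the expected Python
# types are an assumption inferred from the sample document.
EXPECTED_FIELDS = {
    "acl": list,
    "createdDateTime": str,
    "modifiedDateTime": str,
    "createdBy": str,
    "lastModifiedBy": str,
    "title": str,
    "url": str,
    "name": str,
    "extension": str,
    "size": int,
    "content": str,
}


def validate_external_file(doc: dict) -> list:
    """Return a list of problems found in doc (empty if none)."""
    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # Check the timestamps against the format used in the sample.
    for field in ("createdDateTime", "modifiedDateTime"):
        value = doc.get(field)
        if isinstance(value, str):
            try:
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            except ValueError:
                problems.append(f"{field} is not in YYYY-MM-DDTHH:MM:SSZ form")
    return problems


sample = json.loads("""
{
    "acl": [{"type": "user",
             "value": "d411eb08-42e2-4316-aab5-2df8e9d9c21b",
             "accessType": "grant",
             "identitySource": "Azure Active Directory"}],
    "createdDateTime": "2017-11-08T19:06:17Z",
    "modifiedDateTime": "2017-11-08T19:06:17Z",
    "createdBy": "empty",
    "lastModifiedBy": "empty",
    "title": "sample document",
    "url": "http://the.url.com",
    "name": "name.txt",
    "extension": "txt",
    "size": 10,
    "content": "the content"
}
""")

print(validate_external_file(sample))
```

The function returns a list of messages rather than raising on the first problem, so a single run reports everything that is wrong with a transformed document.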



