Step 1. Launch Aspire

Launch Aspire (if it's not already running). See Launch Control.

Step 2. Open the Content Source Management Page

For details on using the Aspire Content Source Management page, please refer to Admin UI.

Step 3. Add a New Content Source

For this step, follow the procedure in the Configuration Tutorial of the connector of your choice. Refer to the Connector list.

Step 4. Add a New Publish to Azure Blobs to the Workflow

To add a Publish to Azure Blobs publisher, drag the Publish to Azure Blobs rule from the Workflow Library and drop it in the Workflow Tree where you want to add it.

This automatically opens the Publish to Azure Blobs window, where you configure the publisher.

Step 4a. Specify Publisher Information

In the Publish to Azure Blobs window, specify the connection information needed to publish to Azure Blob storage.

Storage Connection String: This is the connection string for the Azure Blob storage service. It has four parts:
- Default Endpoints Protocol, with two possible values: http or https. For example: “DefaultEndpointsProtocol=http;”
- Account Name, which is the name of your Microsoft Azure storage account. For example: “AccountName=myAccount;”
- Account Key, which is the key associated with your Azure storage account. For example: “AccountKey=myKey;”
- Blob Endpoint, which is the URL of the blob storage repository. For example: “BlobEndpoint=http://mystorageaccount.blob.core.windows.net”
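The four parts are joined into one semicolon-separated string before being entered in the field. A minimal Python sketch of that assembly follows; the helper name build_connection_string and the sample values are hypothetical and are not part of Aspire.

```python
def build_connection_string(protocol, account_name, account_key, blob_endpoint):
    """Join the four parts into one semicolon-separated connection string.

    Hypothetical helper for illustration only; Aspire just expects the
    finished string in the Storage Connection String field.
    """
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    parts = [
        ("DefaultEndpointsProtocol", protocol),
        ("AccountName", account_name),
        ("AccountKey", account_key),
        # Strip stray whitespace so the endpoint URL is not corrupted.
        ("BlobEndpoint", blob_endpoint.strip()),
    ]
    return ";".join(f"{key}={value}" for key, value in parts)


# Sample values only; substitute your own account details.
connection_string = build_connection_string(
    "https",
    "myAccount",
    "myKey",
    "https://mystorageaccount.blob.core.windows.net",
)
print(connection_string)
```

Paste the resulting single-line value into the Storage Connection String field.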
Blob Container Name: Enter the name of the container inside the Azure Blob storage where you want to publish your results.
Clean container before full crawl: Mark this option if you want to clean the blob container before a full crawl.
Upload binary objects: Mark this option if you want to upload the binary objects together with the JSON objects. For binary objects to upload successfully, you must disable the extract text option in the connector settings.
Use only one extension: Mark this option if you want all the binary objects to be uploaded using the same file extension. The file extension to use is specified in the next field, "Unique binary file extension". If the option is not marked, the uploaded files keep their original file extension.
Unique binary file extension: If "Use only one extension" is marked, this is the file extension used for all uploaded binary objects.
File extension exclude list: A comma-separated list of the file extensions you want to exclude from the binary upload.
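To illustrate how the three binary upload options interact, here is a rough Python sketch. It is not the publisher's actual code: the function names, the case-insensitive matching, the tolerance for leading dots, and the choice to check the exclude list against the original extension are all assumptions made for the example.

```python
import os


def parse_exclude_list(raw):
    """Turn a comma-separated string such as 'exe, zip' into a set of extensions."""
    return {ext.strip().lstrip(".").lower() for ext in raw.split(",") if ext.strip()}


def binary_upload_name(file_name, exclude_list="", use_one_extension=False,
                       unique_extension="bin"):
    """Return the name a binary would be uploaded under, or None if excluded.

    Assumed behavior, for illustration only.
    """
    stem, ext = os.path.splitext(file_name)
    ext = ext.lstrip(".").lower()
    # Excluded extensions are skipped entirely.
    if ext in parse_exclude_list(exclude_list):
        return None
    # "Use only one extension" replaces the original extension.
    if use_one_extension:
        return f"{stem}.{unique_extension.lstrip('.')}"
    # Otherwise the original file extension is kept.
    return file_name


print(binary_upload_name("report.pdf", "exe, zip"))
print(binary_upload_name("setup.exe", "exe, zip"))
print(binary_upload_name("report.pdf", "", True, "bin"))
```

In this sketch the first call keeps the name unchanged, the second is excluded, and the third is renamed with the single configured extension.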
Use Transformation file: Select the check box to use a Groovy script (indicated in the JSON Transformation field) to transform the content and store the results of the transformation.
Clear the check box to create the standard Aspire JSON content.
JSON Transformation: Enter the location of the file containing the JSON Transformation script. This script will be used to transform the data from Aspire as it is posted to the Azure Blob Container.
See JSON Transformation Script for details on the format of the script.
Soft Delete: Select the check box to execute a soft delete (the object is marked as deleted) rather than a physical delete from Azure Blob.
Delete Flag Name: Enter the name of the flag used to mark a document as deleted in Azure Blob. The flag is stored as part of the blob metadata (for example, Deleted=true).
Add Blob Metadata: Select the check box to add metadata to the Azure Blob.
Metadata Source: Specify where the metadata values are taken from. Select Original Document from the drop-down list if the values for the metadata fields are stored in the original input document (Aspire Doc), or select Transformed Document if they are stored in the transformed output document.
Metadata Name: Enter the name of the blob metadata field. If a name specified as a metadata field matches the delete flag name used for soft deletes, the value will be overwritten by the soft delete handler. This is a required value.
Metadata Value: Enter the field name or path (e.g. connectorSpecific/Author) inside the Aspire document or the transformed document from which the value is taken. This is a required value.
Default Value: Enter the default value to be used when the metadata value does not exist.
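The metadata settings above can be pictured with a small Python sketch. It models the document as nested dictionaries and is only an approximation of the behavior described here: the function names and the mapping structure are hypothetical, and skipping a field that has neither a value nor a default is an assumption.

```python
def lookup(document, path):
    """Follow a slash-separated path (e.g. connectorSpecific/Author) through nested dicts."""
    current = document
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_blob_metadata(document, mappings, soft_deleted=False, delete_flag="Deleted"):
    """Resolve each Metadata Name from its Metadata Value path, with a Default Value fallback."""
    metadata = {}
    for mapping in mappings:
        value = lookup(document, mapping["value"])
        if value is None:
            value = mapping.get("default")
        if value is not None:  # assumption: skip fields with no value and no default
            metadata[mapping["name"]] = str(value)
    # The soft delete handler overwrites any metadata field that uses the delete flag name.
    if soft_deleted:
        metadata[delete_flag] = "true"
    return metadata


doc = {"connectorSpecific": {"Author": "jsmith"}}
mappings = [
    {"name": "author", "value": "connectorSpecific/Author", "default": "unknown"},
    {"name": "department", "value": "connectorSpecific/Department", "default": "n/a"},
    {"name": "Deleted", "value": "connectorSpecific/Deleted", "default": "false"},
]
metadata = build_blob_metadata(doc, mappings, soft_deleted=True)
print(metadata)
```

Here the author comes from the document, the department falls back to its default, and the Deleted entry is overwritten by the soft delete flag because its name matches the delete flag name.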
Debug: Select the check box to run the publisher in Debug mode.
Click Add.

After you click Add, it takes a moment for Aspire to download the necessary components (the JAR files) from the Maven repository and load them into Aspire. Once that's done, the publisher appears in the Workflow Tree.

For details on using the Workflow section, please refer to Workflow introduction.
