Step 1. Launch Aspire and Open the Content Source Management Page

Launch Aspire (if it's not already running). See:


Step 2. Add or select a Workflow

  • Add a new workflow or open an existing workflow.
  • For this step, follow the procedure in the Configuration Tutorial of the connector of your choice. Refer to the Connector list.


Step 3. Add the Azure Blob Publisher to the Workflow

  • From the Event combo, select the event to which you want to add the Azure Blob Publisher.
  • To add an Azure Blob Publisher, drag it from the Rules section on the right side of the screen and drop it below the Workflow Event on the left side of the screen. This automatically opens the Azure Blob Publisher window, where you configure the publisher.


Step 3a. Specify a description for the publisher

In the top section of the Azure Blob Publisher configuration window, specify the description for the publisher.

Step 3b. Specify Connection Configuration

In the connection section, specify the connection information needed to publish to Azure Blob storage.

  • Storage Connection String: The connection string for the Azure Blob storage service. It contains four parts, separated by semicolons:
    • Default Endpoints Protocol, with two possible values: http or https. For example: “DefaultEndpointsProtocol=http;”
    • Account Name, which is the name of your Microsoft Azure storage account. For example: “AccountName=myAccount;”
    • Account Key, which is the key associated with your Azure storage account. For example: “AccountKey=myKey;”
    • BlobEndpoint, which indicates the URL of the blob storage repository. For example: “BlobEndpoint=http://mystorageaccount.blob.core.windows.net”
  • Blob Container Name: Enter the name of the container inside the Azure Blob storage where you want to publish your results.
  • Clean container before full crawl: Mark this option if you want to clean the blob container before a full crawl.
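As an illustration of how the four parts fit together, the following Python sketch assembles a connection string and splits it back into its parts. It is not part of Aspire; the function names build_connection_string and parse_connection_string are hypothetical, and the account values are the placeholders used in the examples above.

```python
def build_connection_string(protocol, account_name, account_key, blob_endpoint):
    """Join the four parts into one semicolon-separated connection string."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    parts = [
        ("DefaultEndpointsProtocol", protocol),
        ("AccountName", account_name),
        ("AccountKey", account_key),
        ("BlobEndpoint", blob_endpoint),
    ]
    return ";".join(f"{key}={value}" for key, value in parts)


def parse_connection_string(connection_string):
    """Split a connection string into a dict, e.g. to sanity-check it before pasting."""
    result = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys may end in '=' padding.
        key, _, value = segment.partition("=")
        result[key.strip()] = value
    return result


if __name__ == "__main__":
    cs = build_connection_string(
        "https", "myAccount", "myKey", "https://mystorageaccount.blob.core.windows.net"
    )
    print(cs)
    print(parse_connection_string(cs))
```

Parsing the string back is a quick way to confirm that all four keys are present and that no part was truncated when it was copied.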

Step 3c. Specify Binary Objects Configuration (Optional)

In the Binary Objects section, specify whether and how binary objects are uploaded to the container configured in the connection section.

  • Upload binary objects: Mark this option if you want to upload the binary objects together with the JSON objects. For the binary objects to be uploaded successfully, you need to disable the extract text option in the Connector settings.
    • Use only one extension: Mark this option if you want all the binary objects to be uploaded using the same file extension. The file extension to use is specified in the next field, "Unique binary file extension". If the option is not marked, the uploaded files keep their original file extension.
    • Unique binary file extensions: If the option "Use only one extension" is marked, this is the file extension used for all uploaded binary objects.
    • File extension exclude list: A comma-separated list of the file extensions you want to exclude from the binary upload. Files with an extension in this list are not uploaded.
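The interaction between these three options can be sketched in Python as follows. This is only a model of the behavior described above, not the publisher's code: the function resolve_upload_name and its parameters are hypothetical, and matching the exclude list against the original extension without regard to case is an assumption.

```python
import os


def resolve_upload_name(file_name, use_single_extension=False,
                        single_extension="", exclude_list=""):
    """Return the name a binary would be uploaded under, or None if it is excluded."""
    # Normalise the comma-separated exclude list: "exe, .tmp" -> {"exe", "tmp"}.
    excluded = {
        ext.strip().lstrip(".").lower()
        for ext in exclude_list.split(",")
        if ext.strip()
    }
    stem, ext = os.path.splitext(file_name)
    # Assumption: exclusion is decided on the original extension, case-insensitively.
    if ext.lstrip(".").lower() in excluded:
        return None
    if use_single_extension:
        # "Use only one extension" is marked: replace the original extension.
        return f"{stem}.{single_extension.lstrip('.')}"
    # Otherwise the file keeps its original extension.
    return file_name


if __name__ == "__main__":
    for name in ("report.pdf", "setup.exe", "notes.txt"):
        print(name, "->", resolve_upload_name(name, True, "bin", "exe, tmp"))
```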

Step 3d. Specify Transform Documents Configuration (Optional)

In the Transform Documents section, specify the location of your Groovy transform script. By default, it uses the local transform file or the Aspire Standard Transform Script.

  • Use Local Transform File: Mark this option to specify a Transform script (indicated in the JSON Transformation field) that transforms the content and stores the results of the transformation. Unmark this option to specify an uploaded resource file instead.
    • By default, it uses the local Aspire script to create the standard Aspire JSON content.
    • Groovy Transform: Enter the location of the file containing the JSON Transformation Script. This script is used to transform the data from Aspire as it is posted to the Azure Blob Server.


Step 3e. Specify Metadata Configuration (Optional)

In the Metadata section, specify the metadata configuration.

  • Soft Delete: Mark this option to execute a soft delete (the object is marked as deleted) rather than a physical delete from Azure Blob.
    • Delete Flag Name: Enter the name of the flag used to mark the document as deleted in Azure Blob. The flag is stored as part of the blob metadata.
  • Add Blob Metadata: Mark this option to add metadata to the Azure Blob.
    • Metadata Name: Enter the name of the blob metadata field. If a name specified as a metadata field matches the delete flag name used for soft deletes, the value will be overwritten by the soft delete handler. This is a required value.
    • Metadata Value: Enter the field name or path (e.g. connectorSpecific/Author) in the Aspire or transformed document from which the value is taken. This is a required value.
    • Default Value: Enter the default value to be used when the metadata value does not exist.
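The way these settings combine can be modeled with a short Python sketch. It is an illustration under stated assumptions, not the publisher's implementation: the functions lookup_path and build_blob_metadata are hypothetical, the document is represented as a nested dict, and the value "true" written for the delete flag is an assumption.

```python
def lookup_path(document, path):
    """Follow a slash-separated path such as 'connectorSpecific/Author' through nested dicts."""
    current = document
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None  # the path does not exist in this document
        current = current[key]
    return current


def build_blob_metadata(document, mappings, soft_delete_flag=None, deleted=False):
    """Build blob metadata from (metadata_name, path, default_value) mappings."""
    metadata = {}
    for name, path, default in mappings:
        value = lookup_path(document, path)
        # Fall back to the configured default when the path is missing.
        metadata[name] = str(default if value is None else value)
    if soft_delete_flag and deleted:
        # The soft delete handler overwrites any mapping that uses the same name.
        # Assumption: the flag value is the string "true".
        metadata[soft_delete_flag] = "true"
    return metadata


if __name__ == "__main__":
    doc = {"connectorSpecific": {"Author": "Ana"}}
    mappings = [
        ("author", "connectorSpecific/Author", "unknown"),
        ("lang", "language", "en"),
        ("deleted", "flags/deleted", "false"),
    ]
    print(build_blob_metadata(doc, mappings, soft_delete_flag="deleted", deleted=True))
```

The last mapping deliberately reuses the delete flag name to show why that combination should be avoided: its configured value is replaced whenever a soft delete occurs.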

Step 3f. Specify Debug Configuration (Optional)

In the Debug section of the Azure Blob Publisher configuration, specify the Debug flag.

  • Debug: Check to enable debug mode to show debug messages from the publisher.


Step 3g. Click on the Add button

Once you click the Add button, the Azure Blob Publisher settings will be saved.