
Complete the following steps to configure the Amazon S3 Publisher.


Step 1. Launch Aspire

Launch Aspire (if it's not already running). See: Launch Control.

Step 2. Open the Content Source Management Page

Browse to: http://localhost:50505.

For details on using the Aspire Content Source Management page, please refer to Admin UI.


Step 3. Add a New Content Source

For this step, please follow the procedure from the Configuration Tutorial of the connector of your choice. Refer to the Connector list.


Step 4. Add a New Publish to S3 to the Workflow

To add a Publish to S3, drag the Publish to S3 rule from the Workflow Library and drop it onto the Workflow Tree where you want to add it.

This will automatically open the Publish to S3 window for the configuration of the publisher.

Step 4a. Specify Publisher Information

In the Publish to S3 window, specify the connection information needed to publish to Amazon S3.

  1. AWS Region: The AWS Region to connect to.
  2. S3 Bucket: Enter the name of the bucket on which the documents will be stored.
  3. AWS Access Key: The access key used to authenticate against the S3 instance.
  4. AWS Access Secret: The AWS secret key used to authenticate against the S3 instance.
  5. Folder prefix: Custom prefix that will be used to create a "folder hierarchy". Aspire will publish the files with the following name: "folderPrefix/jobId/fileName".
  6. Directory name format: The number of characters from the hash of the identifier to use for each directory in the store structure. The default value of "2,3" gives 2 levels of directory, using the 5 lowest significant characters from the hash: the top level uses the highest 2 of those 5 characters and the second level uses the other 3.
  7. Metadata Field: Field, as part of the user metadata, that will be used to store the metadata generated by Aspire for this S3 object. The metadata will be stored in JSON format. The final name will be "x-amz-meta-<Metadata Field>".
  8. Process Deletes: If enabled, all files removed from the content source will also be removed from the S3 instance. If disabled, the objects will not be removed from the S3 instance, even when they are deleted from the content source.
  9. Upload binary to S3 bucket: If checked, the actual file will be uploaded to the S3 bucket. If not, only a metadata file (with a .meta suffix) will be uploaded, containing only the metadata and not the binary.
  10. Debug: Select the check box to run the publisher in Debug mode.
  11. Click Add.
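The naming conventions described above (the "folderPrefix/jobId/fileName" key pattern, the "Directory name format" hash split, and the "x-amz-meta-" metadata field name) can be sketched as follows. This is an illustration of the documented behavior, not Aspire's actual implementation; in particular, the hash function (MD5 here) is an assumption, since the guide does not specify which hash Aspire applies to the identifier.

```python
import hashlib


def directory_path(doc_id: str, fmt: str = "2,3") -> str:
    """Split the lowest significant characters of the id's hash into
    directory levels per the 'Directory name format' setting.
    Default "2,3": take the 5 lowest characters; the top level is the
    first 2 of them, the second level the remaining 3.
    NOTE: MD5 is an assumption for illustration only."""
    sizes = [int(n) for n in fmt.split(",")]
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    low = digest[-sum(sizes):]          # lowest significant characters
    parts, pos = [], 0
    for size in sizes:
        parts.append(low[pos:pos + size])
        pos += size
    return "/".join(parts)


def object_key(folder_prefix: str, job_id: str, file_name: str) -> str:
    """Keys follow the documented pattern 'folderPrefix/jobId/fileName'."""
    return f"{folder_prefix}/{job_id}/{file_name}"


def metadata_header(metadata_field: str) -> str:
    """User metadata is stored under 'x-amz-meta-<Metadata Field>'."""
    return f"x-amz-meta-{metadata_field}"
```

For example, with the default "2,3" format a document id hashing to "…a3d4e5" would be stored under the two-level path "3d/4e5", and a Metadata Field of "aspire" would surface on the S3 object as the user-metadata header "x-amz-meta-aspire".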


Once you've clicked Add, it will take a moment for Aspire to download all of the necessary components (the Jar files) from the Maven repository and load them into Aspire. Once that's done, the publisher will appear in the Workflow Tree.

Info

For details on using the Workflow section, please refer to Workflow introduction.