This section describes the configuration of the Azure Identity Connector used by the Azure Identity Seeds. Aside from the Connector Description and Type, no other configuration values are required unless customization is necessary; if nothing is changed, default values are used.


Step 1. Open the Aspire Admin UI

Browse to the Aspire Admin UI. It is typically located at http://localhost:50505.
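Before continuing, you may want to confirm that the Admin UI is reachable. The following is a minimal sketch using only the Python standard library. It assumes the default address http://localhost:50505 given above and treats any HTTP response, even an error status, as "reachable".

```python
import urllib.error
import urllib.request

# Default Admin UI address mentioned above; adjust if your installation differs.
ADMIN_UI_URL = "http://localhost:50505"


def is_reachable(url: str = ADMIN_UI_URL, timeout: float = 5.0) -> bool:
    """Return True if the server at `url` answers with any HTTP response."""
    # Ignore proxy environment variables: the Admin UI is usually local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (e.g. 401 or 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False


if __name__ == "__main__":
    print("Admin UI reachable:", is_reachable())
```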

Step 2. Select the Connector Instances option from the left-hand menu

The "Connector Instances" option, identified by a "connector" icon, is located on the left side of the application, between the "Connections" and "Policies" options. Click it to navigate to the "Connector Instances" page.

Step 3. Specify Connector Description and Type

Once on the "Connector Instances" page, click the "+New" option to create a new Connector, or select an existing one to modify it.

  • Description: specify a description for the Connector. It should be concise and meaningful.
  • Type: select "Azure Identity" as the type for the Connector.
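Since Description and Type are the only required values, a pre-check of a connector definition only needs to cover those two fields. The sketch below is purely illustrative: the dictionary keys `description` and `type` and the function `validate_connector` are hypothetical and do not reflect how Aspire actually stores connector settings.

```python
# Hypothetical representation of the two required Connector fields.
REQUIRED_TYPE = "Azure Identity"


def validate_connector(settings: dict) -> list[str]:
    """Return a list of problems found in the required fields (empty if valid)."""
    problems = []
    description = str(settings.get("description", "")).strip()
    if not description:
        problems.append("Description must not be empty.")
    if settings.get("type") != REQUIRED_TYPE:
        problems.append(f'Type must be "{REQUIRED_TYPE}".')
    return problems


# A complete definition passes; one without a description does not.
print(validate_connector({"description": "Azure AD identities", "type": "Azure Identity"}))  # []
print(validate_connector({"type": "Azure Identity"}))  # ['Description must not be empty.']
```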













Step 4. Specify Connector General configuration

Once the type has been selected, the "General" section of the "Connector Instances" page is displayed. It contains the following options for the Connector:

  • Debug: enables/disables debug messages for the system.
  • Debug Workflow: enables/disables job logging.
  • Pipeline Statistics: enables/disables pipeline job statistics for the debug console.
  • Source Info Cache Size: number of "SourceInfo" objects kept in memory per seed.
  • Storage Maps Cache Size: number of map objects kept in memory per seed.
  • Storage Sets Cache Size: number of set objects kept in memory per seed.
  • Identity Cache Size: number of identities kept in memory per seed.

This page describes the configuration elements for this section.
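The General options above fall into two kinds of values: on/off switches and per-seed cache sizes. The sketch below models them as a Python dataclass. The class and field names are hypothetical, and a cache size of `None` stands for "keep the connector's default", since the actual default values are not listed here. The validation rule, that an explicitly set cache size must be a positive integer, is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GeneralSettings:
    # On/off switches from the "General" section.
    debug: bool = False
    debug_workflow: bool = False
    pipeline_statistics: bool = False
    # Per-seed cache sizes; None means "use the connector's default value".
    source_info_cache_size: Optional[int] = None
    storage_maps_cache_size: Optional[int] = None
    storage_sets_cache_size: Optional[int] = None
    identity_cache_size: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed rule: an explicitly set cache size must be a positive integer.
        for f in fields(self):
            if f.name.endswith("_cache_size"):
                value = getattr(self, f.name)
                if value is None:
                    continue
                if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                    raise ValueError(f"{f.name} must be a positive integer, got {value!r}")

    def overrides(self) -> dict:
        """Return only the values that differ from the defaults."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != f.default
        }


# Enable debug messages and enlarge the identity cache; everything else keeps its default.
settings = GeneralSettings(debug=True, identity_cache_size=5000)
print(settings.overrides())  # {'debug': True, 'identity_cache_size': 5000}
```

Keeping unset values as `None` mirrors the behavior described at the top of this page: anything you do not change falls back to the default.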


Step 5. Specify Text Extraction configuration

The "Text Extraction" section is located right below the "General" section of the "Connector Instances" page. Text Extraction is not performed for this connector, so no configuration is necessary.



Step 6. Specify Hierarchy configuration

The "Hierarchy" section is located below the "Text Extraction" section of the "Connector Instances" page. Hierarchy is not generated for this connector, so no configuration is necessary.




Step 7. Specify Scanner configuration

The "Scanner" section is located below the "Hierarchy" section of the "Connector Instances" page. Details on the options to configure the Connector's scanner can be found on this page.


Step 8. Specify Workflow configuration

The "Workflow" section is located below the "Scanner" section of the "Connector Instances" page. Details on the options to configure the Connector's workflow can be found on this page.


Step 9. Specify Failed Documents configuration

The "Failed Documents" section is located below the "Workflow" section of the "Connector Instances" page. This connector does not perform Failed Documents Processing, so no configuration is necessary.
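The sections covered in the steps above can be summarized as a checklist: for this connector, Text Extraction, Hierarchy and Failed Documents need no configuration, while General, Scanner and Workflow offer options you may customize. A minimal sketch of that checklist, in the order the sections appear on the "Connector Instances" page (the data structure and function name are only an illustration):

```python
# Sections of the "Connector Instances" page, in display order, and whether
# they offer options for the Azure Identity Connector.
SECTIONS = [
    ("General", True),            # optional tuning; defaults apply if unchanged
    ("Text Extraction", False),   # no text extraction is performed
    ("Hierarchy", False),         # no hierarchy is generated
    ("Scanner", True),            # see the Scanner configuration page
    ("Workflow", True),           # see the Workflow configuration page
    ("Failed Documents", False),  # no failed documents processing
]


def configurable_sections() -> list[str]:
    """Return the names of the sections that offer options for this connector."""
    return [name for name, configurable in SECTIONS if configurable]


print(configurable_sections())  # ['General', 'Scanner', 'Workflow']
```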





Step 10. Save the Connector

Click the "Complete" button to save the new Connector (when updating an existing Connector, the button reads "Save" instead of "Complete").