
The Hierarchy Extractor looks for the 'hierarchy' tag in a job and, when it finds one, sends jobs to index any new parents along with their fields and ACLs.

Hierarchy Extractor


Factory Name: com.searchtechnologies.aspire:aspire-hierarchy-extractor


Inputs: AspireObject with a 'hierarchy' tag
Outputs: Jobs that index any new parents, their fields and ACLs.
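The following is a minimal sketch of the new-parent check. It is a hypothetical model, not the component's code. It assumes the 'hierarchy' tag has already been parsed into a list of ancestor IDs ordered from the root down. It also assumes a parent counts as "new" when its ID is not yet in the parent database. HierarchyTracker and newParents are illustrative names, and a HashSet stands in for the real parent database. Each ID returned would correspond to one job sent to the onAdd branch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of new-parent detection; not the Aspire implementation.
class HierarchyTracker {
    // Stand-in for the parent database: IDs of parents already indexed.
    private final Set<String> knownParents = new HashSet<>();

    // Takes the ancestor IDs from a job's 'hierarchy' tag (assumed root first),
    // records them, and returns only those not seen before, so each parent
    // produces at most one onAdd job.
    List<String> newParents(List<String> ancestorIds) {
        List<String> result = new ArrayList<>();
        for (String id : ancestorIds) {
            if (knownParents.add(id)) { // add() returns false if the ID was already known
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HierarchyTracker tracker = new HierarchyTracker();
        System.out.println(tracker.newParents(List.of("a", "a/b"))); // first child: both parents new
        System.out.println(tracker.newParents(List.of("a", "a/b"))); // sibling: nothing new
        System.out.println(tracker.newParents(List.of("a", "a/c"))); // only "a/c" is new
    }
}
```

A second child under the same folder yields no new parents, so no duplicate onAdd jobs are produced in this model.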

Aspire Enterprise


Element | Type | Default | Description
acls/acl/@usergroup | string | | The user/group name for the ACL.
acls/acl/type | string | allow | Indicates whether the user/group will have access to the crawled files. Options: allow, deny.
acls/acl/entity | string | group | Specifies whether the ACL corresponds to a group or a user. Options: group, user.

If no fixed ACLs are configured as above, the union of the parent's ACLs and its children's ACLs is used as the parent ACLs. Each time a new child adds an ACL that is not yet in this union, the parent job is reindexed.
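The rule above can be sketched as follows. This is a hypothetical model, not Aspire code. It assumes ACLs can be compared as plain strings, for example "allow:user:mycompany\aaguilar". It also assumes fixed ACLs, when present, completely replace the union. ParentAcls, addChildAcls and effectiveAcls are illustrative names. The boolean returned by addChildAcls marks when the parent job would be reindexed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the parent-ACL rule; ACLs are modeled as plain strings.
class ParentAcls {
    private final List<String> fixedAcls;                    // acls/acl entries; empty if none configured
    private final Set<String> union = new LinkedHashSet<>(); // parent ACLs plus all children's ACLs

    ParentAcls(List<String> fixedAcls, List<String> parentAcls) {
        this.fixedAcls = List.copyOf(fixedAcls);
        this.union.addAll(parentAcls);
    }

    // Records one child's ACLs. Returns true when the parent must be reindexed:
    // no fixed ACLs are configured and the child contributed at least one new ACL.
    boolean addChildAcls(List<String> childAcls) {
        if (!fixedAcls.isEmpty()) {
            return false; // fixed ACLs are used as-is; children never change them
        }
        boolean changed = false;
        for (String acl : childAcls) {
            changed |= union.add(acl); // add() is true only for an ACL not yet in the union
        }
        return changed;
    }

    // The ACLs that would be attached to the parent job.
    Set<String> effectiveAcls() {
        if (fixedAcls.isEmpty()) {
            return new LinkedHashSet<>(union);
        }
        return new LinkedHashSet<>(fixedAcls);
    }

    public static void main(String[] args) {
        ParentAcls parent = new ParentAcls(List.of(), List.of("allow:group:mycompany\\staff"));
        System.out.println(parent.addChildAcls(List.of("allow:group:mycompany\\staff")));   // false: already in union
        System.out.println(parent.addChildAcls(List.of("allow:user:mycompany\\aaguilar"))); // true: reindex parent
    }
}
```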

Branch Handler Configuration

This component publishes to the onAdd, onDelete and onUpdate events, so a branch must be configured for each of these three events.

Element | Type | Description
branches/branch/@event | string | The event to configure: onAdd, onDelete or onUpdate.
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative.
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, the branch publishes to the default pipeline of the pipeline manager.
branches/branch/@allowRemote | boolean | Indicates whether this pipeline can be found on remote servers (see Distributed Processing for details).
branches/branch/@batching | boolean | Indicates whether the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing).
branches/branch/@batchSize | int | The maximum size of the batches that the branch handler will create.
branches/branch/@batchTimeout | long | Time to wait before a batch is closed if batchSize has not been reached.
branches/branch/@simultaneousBatches | int | The maximum number of simultaneous batches that the branch handler will handle.
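The interplay of batchSize and batchTimeout can be sketched as a simple batcher. This is a hypothetical model, not Aspire's branch handler. It assumes batchTimeout is in milliseconds, measured from the moment the first job enters a batch. It also assumes a batch closes as soon as either limit is hit. simultaneousBatches is not modeled. Batcher, add and tick are illustrative names, and time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of @batchSize and @batchTimeout; not Aspire's branch handler.
class Batcher {
    private final int batchSize;
    private final long batchTimeout;          // assumed milliseconds
    private final List<String> current = new ArrayList<>();
    private final List<List<String>> closed = new ArrayList<>();
    private long openedAt;                    // time the current batch received its first job

    Batcher(int batchSize, long batchTimeout) {
        this.batchSize = batchSize;
        this.batchTimeout = batchTimeout;
    }

    // Adds a job at time 'now'; closes the batch once it reaches batchSize.
    void add(String jobId, long now) {
        tick(now);                            // close a stale batch before starting a new one
        if (current.isEmpty()) {
            openedAt = now;
        }
        current.add(jobId);
        if (current.size() >= batchSize) {
            close();
        }
    }

    // Called periodically; closes a partial batch whose timeout has expired.
    void tick(long now) {
        if (!current.isEmpty() && now - openedAt >= batchTimeout) {
            close();
        }
    }

    private void close() {
        closed.add(List.copyOf(current));
        current.clear();
    }

    List<List<String>> closedBatches() {
        return closed;
    }

    public static void main(String[] args) {
        Batcher b = new Batcher(3, 1000);
        b.add("j1", 0); b.add("j2", 10); b.add("j3", 20); // full batch closes immediately
        b.add("j4", 30);
        b.tick(1030);                                     // partial batch closes on timeout
        System.out.println(b.closedBatches());
    }
}
```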

Example Configurations


<component name="HierarchyExtractor" factoryName="aspire-hierarchy-extractor" subType="default">
  <branches>
    <branch event="onAdd" pipelineManager="." pipeline="addPipeline" batching="true"/>
    <branch event="onDelete" pipelineManager="." pipeline="deletePipeline" batching="true"/>
  </branches>
</component>

Fixed ACLs Configuration

<component name="HierarchyExtractor" factoryName="aspire-hierarchy-extractor" subType="default">
  <acls>
    <acl usergroup="mycompany\aaguilar"/>
    <acl usergroup="mycompany\stAllEmployees"/>
  </acls>
  <branches>
    <branch event="onAdd" pipelineManager="." pipeline="addPipeline" batching="true"/>
    <branch event="onDelete" pipelineManager="." pipeline="deletePipeline" batching="true"/>
  </branches>
</component>

Example Output

For every new parent found, a job is sent to the "onAdd" event of the branch handler:

<doc source="/HierarchyExtractor/Main/HierarchyExtractor">
  <item id="CDCE0D45AC20FDE62F5CEB6118643033" level="1" name="FSC" type="aspire/filesystem" url="C:\testdata\a\"/>
  <content>url:C:\testdata\a\ docId:CDCE0D45AC20FDE62F5CEB6118643033</content>
  <acls>
    <acl access="allow" domain="mycompany" entity="user" fullname="mycompany\aaguilar" name="aaguilar" scope="global"/>
    <acl access="deny" domain="mycompany" entity="group" fullname="mycompany\stAllEmployees" name="stAllEmployees" scope="global"/>
  </acls>
</doc>
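The attributes in the example output appear to line up with the ACL configuration table. A usergroup such as mycompany\aaguilar shows up as separate domain, name and fullname attributes, and the type and entity settings appear as access and entity. The sketch below illustrates that mapping under assumptions inferred from this single example, not from documented behavior. It assumes the domain is split off at the first backslash and that scope is always "global". AclAttributes and fromConfig are hypothetical names. A null type or entity falls back to the defaults listed in the table (allow, group).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps a configured ACL onto the <acl> attributes seen
// in the example output. Not Aspire's code.
class AclAttributes {
    static Map<String, String> fromConfig(String usergroup, String type, String entity) {
        String access = (type == null) ? "allow" : type;     // table default for type
        String ent = (entity == null) ? "group" : entity;    // table default for entity
        int sep = usergroup.indexOf('\\');                   // assumed "domain\name" form
        String domain = (sep >= 0) ? usergroup.substring(0, sep) : "";
        String name = (sep >= 0) ? usergroup.substring(sep + 1) : usergroup;

        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("access", access);
        attrs.put("domain", domain);
        attrs.put("entity", ent);
        attrs.put("fullname", usergroup);
        attrs.put("name", name);
        attrs.put("scope", "global");                        // assumed: matches the example only
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(fromConfig("mycompany\\aaguilar", null, "user"));
        System.out.println(fromConfig("mycompany\\stAllEmployees", "deny", null));
    }
}
```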

Parent Database Management

The following five servlet commands, available from the debug console, can be used to manage the parent database:

  • Reindex

    Resends the parent jobs to the "onAdd" event of the configured Branch Handler.

  • Dump

    Creates a dump file of the database that can be imported later.

  • Import

    Imports the data from a dump file on the file system.

  • Clear

    Deletes all content from the database. Optionally, delete jobs can also be sent to the "onDelete" branch of the configured Branch Handler.

  • Statistics

    Returns the number of parents stored in the database.
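The five commands above can be modeled with a small in-memory store. This is a hypothetical sketch, not the component's actual database. The storage (a map from parent ID to URL), the tab-separated dump format, and the method names are all assumptions for illustration. In this model, reindex and clear(true) return the parent IDs that would be sent to the onAdd and onDelete branches respectively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the parent-database commands.
// Storage, dump format and method names are assumptions for illustration.
class ParentDatabase {
    private final Map<String, String> parents = new TreeMap<>(); // parent id -> url

    void put(String id, String url) {
        parents.put(id, url);
    }

    // Reindex: every stored parent would be resent to the onAdd branch.
    List<String> reindex() {
        return new ArrayList<>(parents.keySet());
    }

    // Dump: one "id<TAB>url" line per parent (assumed format).
    List<String> dump() {
        List<String> lines = new ArrayList<>();
        parents.forEach((id, url) -> lines.add(id + "\t" + url));
        return lines;
    }

    // Import: reads lines in the format produced by dump().
    void importDump(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            parents.put(parts[0], parts[1]);
        }
    }

    // Clear: removes everything; when sendDeletes is true, returns the IDs
    // that would be sent to the onDelete branch.
    List<String> clear(boolean sendDeletes) {
        List<String> deleted = sendDeletes ? new ArrayList<>(parents.keySet()) : List.of();
        parents.clear();
        return deleted;
    }

    // Statistics: number of stored parents.
    int statistics() {
        return parents.size();
    }

    public static void main(String[] args) {
        ParentDatabase db = new ParentDatabase();
        db.put("p1", "C:\\testdata\\a\\");
        db.put("p2", "C:\\testdata\\b\\");
        List<String> dump = db.dump();
        db.clear(false);
        db.importDump(dump);
        System.out.println(db.statistics());
    }
}
```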
