The Hierarchy Extractor looks for the 'hierarchy' tag in a job and, when it finds one, sends jobs to index any new parents along with their fields and ACLs.
Hierarchy Extractor
| |
---|---|
Factory Name | com.searchtechnologies.aspire:aspire-hierarchy-extractor |
subType | default |
Inputs | AspireObject with a 'hierarchy' tag |
Outputs | Jobs to index any new parents, along with their fields and ACLs. |
Element | Type | Default | Description |
---|---|---|---|
acls/acl/@usergroup | string | | The user/group name for the ACL. |
acls/acl/type | string | Allow | Indicates whether the user/group will have access to the crawled files. Options include: allow, deny. |
acls/acl/entity | string | group | Specifies if the ACL corresponds to a group or user. Options include: group, user. |
If no fixed ACLs are configured as above, the union of the parent's and its children's ACLs is used as the ParentACLs. Each time a new child adds a new ACL to the union, the parent job is reindexed.
This component publishes to the onAdd, onDelete, and onUpdate events, so a branch must be configured for each of the three.
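
The union behaviour can be illustrated with a small hypothetical case, assuming no `<acls>` element is configured and two children of the same parent arrive one after the other. The fragments reuse the `<acl>` attribute layout that appears in this component's output jobs. The user and group names are invented, and the comments describe the expected behaviour as stated above rather than captured output.

```xml
<!-- Child 1 arrives carrying one ACL (hypothetical names) -->
<acls>
  <acl access="allow" domain="mycompany" entity="group" fullname="mycompany\engineering" name="engineering" scope="global"/>
</acls>

<!-- Child 2 arrives later carrying an ACL that is not yet in the union -->
<acls>
  <acl access="allow" domain="mycompany" entity="user" fullname="mycompany\jdoe" name="jdoe" scope="global"/>
</acls>

<!-- Parent ACLs after child 2: the union of both entries.
     Because the union grew, the parent job is sent for reindexing. -->
<acls>
  <acl access="allow" domain="mycompany" entity="group" fullname="mycompany\engineering" name="engineering" scope="global"/>
  <acl access="allow" domain="mycompany" entity="user" fullname="mycompany\jdoe" name="jdoe" scope="global"/>
</acls>
```

A third child carrying only ACLs already present in the union would not change the parent, so under this reading no further reindex would be triggered.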
Element | Type | Description |
---|---|---|
branches/branch/@event | string | The event to configure - onAdd, onDelete or onUpdate. |
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, publishes to the default pipeline for the pipeline manager. |
branches/branch/@allowRemote | boolean | Indicates if this pipeline can be found on remote servers (see Distributed Processing for details). |
branches/branch/@batching | boolean | Indicates if the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing). |
branches/branch/@batchSize | int | The maximum size of the batches that the branch handler will create. |
branches/branch/@batchTimeout | long | Time to wait before the batch is closed if the batchSize hasn't been reached. |
branches/branch/@simultaneousBatches | int | The maximum number of simultaneous batches that will be handled by the branch handler. |
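
As a minimal sketch of how these attributes fit together, the fragment below configures one branch for each of the three published events and sets the batch tuning attributes on the onAdd branch. The pipeline name `updatePipeline` and all numeric values are illustrative assumptions, not recommended settings. The unit of `batchTimeout` is not stated here, so the value shown assumes milliseconds.

```xml
<component name="HierarchyExtractor" factoryName="aspire-hierarchy-extractor" subType="default">
  <branches>
    <!-- onAdd: batching enabled with explicit (illustrative) limits -->
    <branch event="onAdd" pipelineManager="." pipeline="addPipeline"
            batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2"/>
    <!-- onUpdate: "updatePipeline" is a hypothetical pipeline name -->
    <branch event="onUpdate" pipelineManager="." pipeline="updatePipeline" batching="true"/>
    <!-- onDelete: batching enabled, batch limits left unset -->
    <branch event="onDelete" pipelineManager="." pipeline="deletePipeline" batching="true"/>
  </branches>
</component>
```

Omitting the `pipeline` attribute on a branch would publish to the default pipeline of the named pipeline manager, as described in the table.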
<component name="HierarchyExtractor" factoryName="aspire-hierarchy-extractor" subType="default">
  <branches>
    <branch event="onAdd" pipelineManager="." pipeline="addPipeline" batching="true"/>
    <branch event="onDelete" pipelineManager="." pipeline="deletePipeline" batching="true"/>
  </branches>
</component>
<component name="HierarchyExtractor" factoryName="aspire-hierarchy-extractor" subType="default">
  <acls>
    <acl usergroup="mycompany\aaguilar">
      <type>allow</type>
      <entity>user</entity>
    </acl>
    <acl usergroup="mycompany\stAllEmployees">
      <type>deny</type>
      <entity>group</entity>
    </acl>
  </acls>
  <branches>
    <branch event="onAdd" pipelineManager="." pipeline="addPipeline" batching="true"/>
    <branch event="onDelete" pipelineManager="." pipeline="deletePipeline" batching="true"/>
  </branches>
</component>
For every new parent found, a job is sent to the "onAdd" event of the branch handler:
<doc source="/HierarchyExtractor/Main/HierarchyExtractor">
  <hierarchy>
    <item id="CDCE0D45AC20FDE62F5CEB6118643033" level="1" name="FSC" type="aspire/filesystem" url="C:\testdata\a\">
      <ancestors/>
    </item>
  </hierarchy>
  <id>C:\testdata\a\</id>
  <url>C:\testdata\a\</url>
  <fetchUrl>C:\testdata\a\</fetchUrl>
  <action>add</action>
  <md5>CDCE0D45AC20FDE62F5CEB6118643033</md5>
  <mimeType>aspire/filesystem</mimeType>
  <lastModified>2014-03-21T17:44:20Z</lastModified>
  <dataSize>0</dataSize>
  <content>url:C:\testdata\a\ docId:CDCE0D45AC20FDE62F5CEB6118643033</content>
  <sourceName>FSC</sourceName>
  <sourceType>filesystem</sourceType>
  <acls>
    <acl access="allow" domain="mycompany" entity="user" fullname="mycompany\aaguilar" name="aaguilar" scope="global"/>
    <acl access="deny" domain="mycompany" entity="group" fullname="mycompany\stAllEmployees" name="stAllEmployees" scope="global"/>
  </acls>
</doc>
There are five servlet commands for managing the parent database, available from the debug console:
Resends the jobs to the "onAdd" event of the configured Branch Handler.
Creates a dump file of the database that you can import later.
Imports data from a dump file on the file system.
Deletes all content from the database. You can choose whether to send delete jobs to the "onDelete" branch of the configured Branch Handler.
Returns the count of parents stored in the database.