Aspire stores content source configuration as a set of XML files. Every time a content source is created from the Content Management UI, a set of files is created, each of which contains a particular piece of configuration for the content source.
The configuration of a content source consists of four XML files: general.xml, connector.xml, content-source.xml and workflow.xml. Content source configurations are located under the following configuration folder:
${aspire.home}/config/content-sources
For every new content source configured, a new folder is created in the location specified above. The name of the content source folder will be a normalized version of the content source name specified from the UI.
The content source folder name normalization follows this set of rules:
To load a content source correctly, all four files must be present in its configuration folder; otherwise, the content source will not be loaded into the UI.
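The four-file requirement can be checked from outside Aspire before a folder is picked up. The following is a minimal sketch in Python, not part of Aspire itself: the helper names `missing_config_files` and `check_content_sources` are hypothetical, and the folder layout is assumed to be `config/content-sources/<normalized name>` under the Aspire home directory, as described above.

```python
from pathlib import Path

# The four files this page describes for each content source.
REQUIRED_FILES = ("general.xml", "connector.xml", "content-source.xml", "workflow.xml")


def missing_config_files(source_dir):
    """Return the required file names that are absent from one content source folder."""
    folder = Path(source_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def check_content_sources(aspire_home):
    """Map each content source folder name to the list of files it is missing.

    Assumes (hypothetically) that aspire_home is the directory that
    ${aspire.home} resolves to. Returns an empty dict if the
    content-sources folder does not exist.
    """
    root = Path(aspire_home) / "config" / "content-sources"
    if not root.is_dir():
        return {}
    return {d.name: missing_config_files(d) for d in sorted(root.iterdir()) if d.is_dir()}
```

A folder that maps to an empty list has all four files; any other folder would be expected not to load, per the rule above.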
The general.xml file contains the general information associated with the content source: displayName, normalizedName (used to uniquely identify the content source), schedule and the current state (active or inactive) of the content source.
<contentSource active="true">
  <schedule type="manually"/>
  <displayName>cifsTest</displayName>
  <normalizedName>cifsTest</normalizedName>
</contentSource>
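To illustrate how the fields of general.xml map onto the description above, here is a small Python sketch that reads the sample with the standard library. The function name `read_general` and the keys of the returned dictionary are illustrative choices, not Aspire names; only the element and attribute names come from the sample.

```python
import xml.etree.ElementTree as ET

# The general.xml sample from this page.
GENERAL_XML = """<contentSource active="true">
  <schedule type="manually"/>
  <displayName>cifsTest</displayName>
  <normalizedName>cifsTest</normalizedName>
</contentSource>"""


def read_general(xml_text):
    """Extract state, schedule type and names from a general.xml document."""
    root = ET.fromstring(xml_text)
    schedule = root.find("schedule")
    return {
        # The state is stored as the string "true" or "false" on the root element.
        "active": root.get("active") == "true",
        "schedule_type": schedule.get("type") if schedule is not None else None,
        "display_name": root.findtext("displayName"),
        "normalized_name": root.findtext("normalizedName"),
    }


info = read_general(GENERAL_XML)
```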
The connector.xml file contains the connector application definition and property values to install the connector into Aspire.
<application config="com.searchtechnologies.aspire:app-cifs-connector">
  <properties>
    <property name="generalConfiguration">true</property>
    <property name="snapshotDir">${dist.data.dir}/${app.name}/snapshots</property>
    <property name="ExtractTextMaxSize">unlimited</property>
    <property name="disableTextExtract">false</property>
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
    <property name="debug">true</property>
  </properties>
</application>
The content-source.xml file contains the content source specific configuration needed to connect to a given repository and crawl it. This includes the repository URL, the user credentials and any other repository configuration that Aspire requires to crawl that repository.
<connectorSource>
  <url>smb://servername/AspireTesting/</url>
  <partialScan>true</partialScan>
  <subDirUrl>LSA</subDirUrl>
  <domain>search</domain>
  <username>jdoe</username>
  <password>pass1234</password>
  <indexContainers>true</indexContainers>
  <scanRecursively>true</scanRecursively>
  <fileNamePatterns>
    <include pattern=".*LSA.*"/>
    <exclude pattern=".*tmp.*"/>
  </fileNamePatterns>
</connectorSource>
The workflow.xml file contains the workflow configuration of all workflow trees configured for the content source: afterScan, onPublish, onAddUpdate, onDelete and/or onError. This file contains the configuration of all rules set on each of these workflow trees. For more details see Workflow.
<?xml version="1.0" encoding="UTF-8"?>
<workflow version="2">
  <templates>
    <template id="Boolean (Byte array)" type="choice">
      <description>Boolean test (from a byte array)</description>
      <ruleDescription>Tests if the value of '${field}' is '${value}'</ruleDescription>
      <dxf><dxf:template version="1.0" xmlns:dxf="http://www.searchtechnologies.com/DXF/2011"><properties><field display="Field name" type="string"><dxf:help>The field from the document to test</dxf:help></field><value display="Value" type="string"><dxf:help>The value to test for</dxf:help></value></properties></dxf:template></dxf>
      <script>
        f = doc.${field}?.getContent()
        if (f instanceof byte[]) f = new String(f)
        f == "${value}"
      </script>
    </template>
    <template id="JobTerminate" type="script">
      <description>Terminates the job</description>
      <ruleDescription>Terminates the job</ruleDescription>
      <script>job.terminate()</script>
    </template>
    <template id="RaiseException" type="script">
      <description>Raises an exception</description>
      <ruleDescription>Exception: "${msg}"</ruleDescription>
      <dxf><dxf:template version="1.0" xmlns:dxf="http://www.searchtechnologies.com/DXF/2011"><properties><msg display="Message" type="string"><dxf:help>The exception message</dxf:help></msg></properties></dxf:template></dxf>
      <script>
        import com.searchtechnologies.aspire.services.AspireException
        throw new AspireException("WorkflowException", "${msg}")
      </script>
    </template>
    <template id="SetStringValue" type="script">
      <description>Assigns a string value to a field</description>
      <ruleDescription>Assigns the value "${value}" to the field ${field}</ruleDescription>
      <dxf><dxf:template version="1.0" xmlns:dxf="http://www.searchtechnologies.com/DXF/2011"><properties><field display="Field name" type="string"><dxf:help>The field from the document to modify</dxf:help></field><value display="Value" type="string"><dxf:help>The string value to set</dxf:help></value></properties></dxf:template></dxf>
      <script>doc.${field} = "${value}"</script>
    </template>
    <template id="Switch (Byte array)" type="choice">
      <description>Switch (from a byte array)</description>
      <ruleDescription>Switches based on the value of '${field}'</ruleDescription>
      <dxf><dxf:template version="1.0" xmlns:dxf="http://www.searchtechnologies.com/DXF/2011"><properties><field display="Field name" type="string"><dxf:help>The field from the document to test</dxf:help></field></properties></dxf:template></dxf>
      <script>
        f = doc.${field}?.getContent()
        if (f instanceof byte[]) f = new String(f)
        f
      </script>
    </template>
  </templates>
  <rules>
    <rule appName="/PostToCIFS" config="com.searchtechnologies.aspire:app-publish-to-file" id="1" type="application">
      <description>PublishToFile</description>
      <properties>
        <property name="logFile">log/${app.name}/publishToFile.jobs</property>
        <property name="debug">false</property>
        <property name="numJobs">5</property>
      </properties>
    </rule>
  </rules>
  <plans>
    <plan name="onPublish">
      <reference rid="1"/>
    </plan>
  </plans>
  <applications>
    <application config="com.searchtechnologies.aspire:app-publish-to-file" name="/PostToCIFS">
      <properties>
        <property name="logFile">log/${app.name}/publishToFile.jobs</property>
        <property name="debug">false</property>
        <property name="numJobs">5</property>
      </properties>
    </application>
  </applications>
</workflow>
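In the sample above, each plan names a workflow tree (here onPublish) and points at rules through `reference` elements. The Python sketch below lists which rules each plan uses. It assumes that a `rid` attribute refers to the `id` of a `rule` element, which is inferred from the sample rather than from a schema definition; the function name `plan_rules` is hypothetical, and the embedded XML is a trimmed excerpt of the sample.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the workflow.xml sample: one rule and one plan.
WORKFLOW_XML = """<workflow version="2">
  <rules>
    <rule appName="/PostToCIFS" config="com.searchtechnologies.aspire:app-publish-to-file" id="1" type="application">
      <description>PublishToFile</description>
    </rule>
  </rules>
  <plans>
    <plan name="onPublish">
      <reference rid="1"/>
    </plan>
  </plans>
</workflow>"""


def plan_rules(xml_text):
    """Map each plan name to the descriptions of the rules it references.

    Assumption: reference/@rid matches rule/@id. An unmatched rid yields None.
    """
    root = ET.fromstring(xml_text)
    # Index rule descriptions by rule id.
    rules = {r.get("id"): r.findtext("description") for r in root.findall("./rules/rule")}
    return {
        plan.get("name"): [rules.get(ref.get("rid")) for ref in plan.findall("reference")]
        for plan in root.findall("./plans/plan")
    }


summary = plan_rules(WORKFLOW_XML)
```

A `None` entry in a plan's list would indicate a reference to a rule that is not defined in the file, which may be worth checking before copying a hand-edited workflow.xml into place.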
Aspire allows users to create and add content sources directly from the configuration files, without saving them through the UI and without restarting Aspire. This is possible because of the synchronization functionality introduced with ZooKeeper for failover. See Failover for Aspire using Zookeeper for more information.
To add a content source configuration: