The Regex Metadata Splitter stage parses fields that contain a semicolon-separated list and creates a <val> entry for each item, in order to feed multi-value fields. Unlike the default subtype, a value is only moved to the output element if it matches a regular expression.
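The split-and-filter behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the stage's actual code: the function name split_and_filter is hypothetical, the "Welsh Topics" item is made-up sample data, and it is an assumption that a value qualifies when the pattern is found in it (the stage itself may require a full match).

```python
import re

def split_and_filter(raw, pattern, delimiter=";"):
    """Split raw on delimiter and keep only the items that match pattern."""
    regex = re.compile(pattern)
    # Assumption: re.search semantics (pattern found in the item);
    # the real stage may apply a stricter full-match rule.
    return [item for item in raw.split(delimiter) if item and regex.search(item)]

values = split_and_filter(
    "Scottish Keywords;Scottish Keywords/GAELIC LANGUAGE;Welsh Topics/CYMRAEG",
    r"^[A-Za-z ]*Keywords",
)
print(values)
```

Only the first two items are kept, because the third one never contains "Keywords" after its leading letters and spaces.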

Regex Splitter (Aspire 2)
Factory Name com.searchtechnologies.aspire:aspire-tools
subType regexSplitter
Inputs AspireObject that has xPath and delimiter elements indicating the input element to split
Outputs AspireObject


xPath The XPath of the element in the AspireObject to split, e.g., //category.
regex The regular expression that a split value must match to be moved to the output.
output The name of the output element that will be created in the document.
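To make the roles of the three settings concrete, here is a rough Python sketch that applies one path / regular expression / output name rule to an XML document. It is a sketch under stated assumptions, not Aspire code: apply_rule, the category_keywords output name and the sample document are hypothetical; the standard library's ElementTree only supports a subset of XPath, so a relative path is used; and skipping the output element when nothing matches is an assumption.

```python
import re
import xml.etree.ElementTree as ET

def apply_rule(doc, path, pattern, output, delimiter=";"):
    """Create <output> under doc with one <val> per matching item found at path."""
    source = doc.find(path)  # element to split
    if source is None or not source.text:
        return None
    regex = re.compile(pattern)
    # Assumption: a value qualifies when the pattern is found in it.
    matches = [v for v in source.text.split(delimiter) if regex.search(v)]
    if not matches:
        return None  # assumption: no output element when nothing matches
    target = ET.SubElement(doc, output)
    for value in matches:
        ET.SubElement(target, "val").text = value
    return target

# Made-up sample document, for illustration only.
doc = ET.fromstring(
    "<doc><category>News Keywords;Sport;Arts Keywords/MUSIC</category></doc>"
)
apply_rule(doc, ".//category", r"^[A-Za-z ]*Keywords", "category_keywords")
print(ET.tostring(doc, encoding="unicode"))
```

The "Sport" item is dropped because it does not match the pattern, while the other two items each become a <val> child of the new element.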

Sample Configuration

 <component name="regexSplitter" subType="regex" factoryName="aspire-splitter">
   <xPath regex="^[A-Za-z ]*Keywords" output="pg_indexterm_classifications">/doc/pgClassifications_expanded</xPath>
   <xPath regex="^[A-Za-z ]*Keywords" output="p_indexterm_classifications">/doc/pClassifications_expanded</xPath>
   <xPath regex="^[A-Za-z ]*Keywords" output="pv_indexterm_classifications">/doc/pvClassifications_expanded</xPath>
   <xPath regex="^[A-Za-z ]*Keywords" output="ma_indexterm_classifications">/doc/maClassifications_expanded</xPath>
 </component>

For example:

Input fields:

 <maClassifications_expanded source="clasificationExpander">Scottish Keywords;Scottish Keywords/GAELIC LANGUAGE;Scottish Keywords/GAELIC LANGUAGE PROGRAMMES</maClassifications_expanded>

Output fields:

 <ma_indexterm_classifications source="RegexSplitter" tagName="maClassifications_expanded">
   <val>Scottish Keywords</val>
   <val>Scottish Keywords/GAELIC LANGUAGE</val>
   <val>Scottish Keywords/GAELIC LANGUAGE PROGRAMMES</val>