The Field Mapper component takes the values of one or more source fields and maps the result into a destination field. Six mapping mechanisms are available: Simple Mapping, Multiple Source Mapping, Regular Expression Mapping, Template Mapping, Date Format Mapping and Constant Mapping.

Training Material

If you're interested in learning more, see the recording of the Tech Talk on Field Mapping (video) and the Field Mapping slides.

Configuration

All values mapped by this component are added to the AspireObject under the <searchFields> tag.

Element | Type | Default | Description
mappings | xml | none | XML that contains all the mappings.
debug | boolean | false | When true, debug logs are printed.

Mappings Configuration

Simple Mapping

Takes the value of the source field and maps it to the destination field.

Element | Type | Description
sourceField | string | Source field of the mapping.
destinationField | string | Destination field of the mapping.

Multiple Source Mapping

Takes multiple source fields and maps their content into a destination field. This type of mapping has two modes: Fallback and Concatenate.

  • Fallback Mode: Maps the value of the first available field in the given source field list.
  • Concatenate Mode: Joins all source field values into one string. There are three types of concatenation:
    • Blank Space: separates each value using a whitespace.
    • Multivalue field: separates each value using a semicolon (;).
    • Custom separator: separates each value using a custom character separator.

Element | Type | Description
sourceFields/sourceField | string | List of source fields.
destinationField | string | Destination field of the mapping.
multipleMapType | string | Mapping mode: concatenationMode or fallbackMode.
concatenationType | string | Concatenation type, used with concatenationMode: default (blank space), multivalued (semicolon) or separatorString (custom separator).
separator | string | String used as the custom separator. Used only when concatenationType is separatorString.
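
The configuration example below covers fallbackMode and the multivalued concatenation type. As a sketch, the two remaining concatenation types might be configured as follows. The destination field names spaceConcatField and pipeConcatField are hypothetical, and the assumption that default selects the blank-space separator is inferred from the table above rather than confirmed.

```xml
<multipleSourceMappings>
  <!-- default concatenation (assumed: values joined with a blank space) -->
  <mapping>
    <sourceFields>
      <sourceField>sourceName</sourceField>
      <sourceField>sourceType</sourceField>
    </sourceFields>
    <!-- hypothetical destination field -->
    <destinationField>spaceConcatField</destinationField>
    <multipleMapType>concatenationMode</multipleMapType>
    <concatenationType>default</concatenationType>
  </mapping>
  <!-- separatorString concatenation: values joined with the custom separator -->
  <mapping>
    <sourceFields>
      <sourceField>sourceName</sourceField>
      <sourceField>sourceType</sourceField>
    </sourceFields>
    <!-- hypothetical destination field -->
    <destinationField>pipeConcatField</destinationField>
    <multipleMapType>concatenationMode</multipleMapType>
    <concatenationType>separatorString</concatenationType>
    <separator>|</separator>
  </mapping>
</multipleSourceMappings>
```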

Template Mapping

Takes a Groovy template, replaces each field reference (${fieldName}) with that field's value, and maps the resulting string to the destination field.

Element | Type | Description
sourceTemplate | string | Groovy template that combines multiple fields (e.g. "Hello: ${title}, by ${author}").
destinationField | string | Destination field of the mapping.

Constant Mapping

Sets the destination field to a constant value.

Element | Type | Description
constantValue | string | Constant value to set on the destination field.
destinationField | string | Destination field of the mapping.

Regular Expression Mapping

Matches the source field value against a regular expression pattern. There are two modes: Replace and Extract.

  • Replace: replaces the matching string with a new value (replaceValue).
  • Extract: extracts the matching string and sets it on the destination field.

Element | Type | Description
sourceField | string | Source field of the mapping.
destinationField | string | Destination field of the mapping.
regularExpressionMappingType | string | Regex mode: extract or replace.
regex | string | Regular expression matched against the source field value (e.g. \.(?<=\.).*$).
replaceValue | string | Value that replaces the text matched by the regular expression. Used only in replace mode.
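
The configuration example below covers extract mode only. The fragment that follows is a minimal sketch of replace mode, assuming a hypothetical destination field normalizedUrl, that targets the backslashes in a Windows path such as the url field of the output example. The pattern \\ is regex escaping for one literal backslash; XML treats backslashes as ordinary characters, so no further escaping is needed. Whether every match or only the first is replaced depends on the component's implementation. Because the regex is element text, a pattern containing < or & (for example, a lookbehind such as (?<=\.)) must escape them as &lt; and &amp;, or be wrapped in a CDATA section, for the file to be well-formed XML.

```xml
<regularExpressionMappings>
  <mapping>
    <sourceField>url</sourceField>
    <!-- hypothetical destination field -->
    <destinationField>normalizedUrl</destinationField>
    <regularExpressionMappingType>replace</regularExpressionMappingType>
    <!-- matches a single literal backslash -->
    <regex>\\</regex>
    <!-- text substituted for each matched backslash -->
    <replaceValue>/</replaceValue>
  </mapping>
</regularExpressionMappings>
```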

Date Format Mapping

Parses the source field's date value using the input format, reformats it using the output format, and sets the result on the destination field.

Element | Type | Description
sourceField | string | Source field of the mapping.
destinationField | string | Destination field of the mapping.
inputFormat | string | Date format of the source field value (e.g. yyyy-MM-dd'T'HH:mm:ss'Z').
outputFormat | string | Date format of the destination field value (e.g. yyyy-MM-dd).

Configuration Example

<component name="FieldMapper" subType="default" factoryName="aspire-field-mapper">
  <mappings>
    <simpleMappings>
      <mapping>
        <sourceField>repItemType</sourceField>
        <destinationField>docTypeField</destinationField>
      </mapping>
    </simpleMappings>
    <multipleSourceMappings>
      <mapping>
        <sourceFields>
          <sourceField>url</sourceField>
          <sourceField>lastModified</sourceField>
          <sourceField>dataSize</sourceField>
        </sourceFields>
        <destinationField>concatField</destinationField>
        <multipleMapType>concatenationMode</multipleMapType>
        <concatenationType>multivalued</concatenationType>
      </mapping>
      <mapping>
        <sourceFields>
          <sourceField>fieldA</sourceField>
          <sourceField>FetchUrl</sourceField>
          <sourceField>lastModified</sourceField>
          <sourceField>fieldB</sourceField>
        </sourceFields>
        <destinationField>fallbackField</destinationField>
        <multipleMapType>fallbackMode</multipleMapType>
      </mapping>
    </multipleSourceMappings>
    <templateMappings>
      <mapping>
        <sourceTemplate>Source: ${sourceName} Type: ${sourceType}</sourceTemplate>
        <destinationField>templateField</destinationField>
      </mapping>
    </templateMappings>
    <constantMappings>
      <mapping>
        <destinationField>constantField</destinationField>
        <constantValue>constantValueField</constantValue>
      </mapping>
    </constantMappings>
    <regularExpressionMappings>
      <mapping>
        <sourceField>url</sourceField>
        <destinationField>regexField</destinationField>
        <regularExpressionMappingType>extract</regularExpressionMappingType>
        <regex>\.(?&lt;=\.).*$</regex>
      </mapping>
    </regularExpressionMappings>
    <dateFormatMappings>
      <mapping>
        <sourceField>lastModified</sourceField>
        <inputFormat>yyyy-MM-dd'T'HH:mm:ss'Z'</inputFormat>
        <destinationField>simpleModified</destinationField>
        <outputFormat>yyyy-MM-dd</outputFormat>
      </mapping>
    </dateFormatMappings>
  </mappings>
  <debug>false</debug> 
</component>

Output Document Example

<doc>
  <url>C:\testdata\Search Engine Security Blog.docx</url>
  <snapshotUrl>002 C:\testdata\Search Engine Security Blog.docx</snapshotUrl>
  <docType>item</docType>
  <repItemType>aspire/file</repItemType>
  <fetchUrl>file:/C:/testdata/Search%20Engine%20Security%20Blog.docx</fetchUrl>
  <displayUrl>C:\testdata\Search Engine Security Blog.docx</displayUrl>
  <id>C:\testdata\Search Engine Security Blog.docx</id>
  <lastModified>2013-06-10T21:38:24Z</lastModified>
  <dataSize>194425</dataSize>
  <sourceName>FieldMapper-FSTest</sourceName>
  <sourceType>filesystem</sourceType>
  <connectorSource type="filesystem">
    <url>C:\testdata</url>
    <partialScan>false</partialScan>
    <subDirUrl />
    <indexContainers>false</indexContainers>
    <scanRecursively>false</scanRecursively>
    <useACLs>false</useACLs>
    <acls />
    <scanExcludedItems>false</scanExcludedItems>
    <fileNamePatterns />
    <displayName>FieldMapper-FSTest</displayName>
  </connectorSource>
  <action>add</action>
  <hierarchy>
    <item id="4CD52EBD58F51B94364D1CC77D878910" level="2" name="Search Engine Security Blog.docx" 
     url="C:\testdata\Search Engine Security Blog.docx">
      <ancestors>
        <ancestor id="7C070A1B17F7736EF883435C5AC053E2" level="1" name="FieldMapper-FSTest" parent="true" 
         type="aspire/filesystem" url="C:\testdata\" />
      </ancestors>
    </item>
  </hierarchy>
  <protocol source="FetchURLStage/protocol">file</protocol>
  <mimeType source="FetchURLStage/mimeType">content/unknown</mimeType>
  <extension source="FetchURLStage">
    <field name="modificationDate">2013-06-10T21:38:24Z</field>
    <field name="content-length">194425</field>
    <field name="last-modified">Mon, 10 Jun 2013 21:38:24 GMT</field>
    <field name="content-type">content/unknown</field>
  </extension>
  <searchFields>
    <constantField>constantValueField</constantField>
    <simpleModified>2013-06-10</simpleModified>
    <concatField>C:\testdata\Search Engine Security Blog.docx,2013-06-10T21:38:24Z,194425</concatField>
    <fallbackField>2013-06-10T21:38:24Z</fallbackField>
    <regexField>.docx</regexField>
    <docTypeField>aspire/file</docTypeField>	
    <templateField>Source: FieldMapper-FSTest Type: filesystem</templateField>
  </searchFields>
</doc>