The Avro Builder provides the following functionality:

  • Generates an Avro file that the Aspire workflows publish to after completing data manipulation tasks on each job (document).
  • Avro files will “roll over” based on internal batching or file size as configured by the administrator.



Avro Builder

Factory Name: com.searchtechnologies.aspire:app-publish-to-avro
subType:
Inputs: AspireObject from a connector's job, with metadata and content extracted from a specific file
Outputs: Avro representation of the AspireObject, sent to an Avro file

Configuration

| Element | Type | Default | Description |
|---|---|---|---|
| avroSchema | string | | Avro schema path |
| outputNonBatchFile | string | | Output file path |
| nonBatchFileMaxSize | int | 600 | Maximum file size (MB) |
| outputBatchDirName | string | | Output directory path for internal batch files |
| maxInternalBatchSize | int | 1000 | Batch size |
| timeRolloverThreshold | long | 86400 | Interval, in seconds, after which rollover should happen |
| rolloverFileLocation | string | | Directory to which Avro files should be moved |
| debug | boolean | false | If true, logs debug information from the component |

Example Configuration

Internal batches

    <application config="com.searchtechnologies.aspire:app-publish-to-avro" name="/Aspire_Publish_To_Avro_Application">
      <properties>
        <property name="useSizeLimited">false</property>
        <property name="debug">false</property>
        <property name="maxInternalBatchSize">5</property>
        <property name="outputBatchDirName">dirName</property>
        <property name="useInternalBatch">true</property>
        <property name="avroSchema">c:\tmp\schema.avsc</property>
        <property name="timeRolloverThreshold">5</property>
        <property name="rolloverFileLocation"></property>
        <property name="OutputType">internalBatches</property>
      </properties>
    </application>

Size limited

    <application config="com.searchtechnologies.aspire:app-publish-to-avro" name="/Aspire_Publish_To_Avro_Application">
      <properties>
        <property name="useSizeLimited">true</property>
        <property name="debug">false</property>
        <property name="nonBatchFileMaxSize">50</property>
        <property name="outputNonBatchFile">c:\avro\avro.snappy</property>
        <property name="useInternalBatch">false</property>
        <property name="avroSchema">c:\tmp\schema.avsc</property>
        <property name="timeRolloverThreshold">5</property>
        <property name="rolloverFileLocation"></property>
        <property name="OutputType">sizeLimited</property>
      </properties>
    </application>
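
Size limited with documented defaults and a rollover directory

The following is a sketch rather than a tested configuration. It reuses the size-limited example, writes out the documented defaults for nonBatchFileMaxSize (600) and timeRolloverThreshold (86400), and sets rolloverFileLocation to a directory. The directory c:\avro\rolled is a placeholder, and it is an assumption that rolloverFileLocation accepts a Windows-style path in the same way as the other path properties.

    <application config="com.searchtechnologies.aspire:app-publish-to-avro" name="/Aspire_Publish_To_Avro_Application">
      <properties>
        <property name="useSizeLimited">true</property>
        <property name="useInternalBatch">false</property>
        <property name="OutputType">sizeLimited</property>
        <property name="debug">false</property>
        <!-- Max file size in MB (documented default) -->
        <property name="nonBatchFileMaxSize">600</property>
        <!-- Rollover interval in seconds (documented default) -->
        <property name="timeRolloverThreshold">86400</property>
        <property name="outputNonBatchFile">c:\avro\avro.snappy</property>
        <property name="avroSchema">c:\tmp\schema.avsc</property>
        <!-- Placeholder directory (assumption) where Avro files are moved -->
        <property name="rolloverFileLocation">c:\avro\rolled</property>
      </properties>
    </application>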

