The general principle of the Publisher Framework is the same as that of the Connector Framework: a generic publisher component calls a repository-specific provider to access the repository where content will be published.

  • For standard “targets” (Solr, Elasticsearch, SharePoint via Stager, etc.), the publisher has most likely already been created and is now part of the standard Aspire version.
  • You may publish to a customer-specific target. In this case, you only need to consider how to perform actions at the target, rather than all of the general functionality (such as when a new batch should be used).
  • All common functionality (connections, batch handling, commit/clear jobs, etc.) is handled by the framework, which calls methods in the provider as required.
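The split between generic framework logic and target-specific provider code can be sketched in plain Java. This is a minimal illustration only. `TargetProvider`, `GenericPublisher` and `RecordingProvider` are hypothetical stand-ins and are not the real Aspire `PublisherAccessProvider` API (the real interface appears in the PAP class example on this page). The generic class decides when a batch starts and ends, and the provider only acts on the target.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical provider contract: only target-specific actions live here.
interface TargetProvider {
  void startBatch();
  void addDocument(String doc);
  void endBatch();
}

// Hypothetical generic publisher: owns the common logic (when to start and end a batch).
class GenericPublisher {
  private final TargetProvider provider;
  private final int batchSize;
  private int inBatch = 0;

  GenericPublisher(TargetProvider provider, int batchSize) {
    this.provider = provider;
    this.batchSize = batchSize;
  }

  void publish(String doc) {
    if (inBatch == 0) {
      provider.startBatch();   // first document of a new batch
    }
    provider.addDocument(doc);
    inBatch++;
    if (inBatch == batchSize) {
      provider.endBatch();     // batch is full
      inBatch = 0;
    }
  }

  // Flush a partially filled batch, e.g. when the crawl ends.
  void close() {
    if (inBatch > 0) {
      provider.endBatch();
      inBatch = 0;
    }
  }
}

// Example provider that only records the calls it receives.
class RecordingProvider implements TargetProvider {
  final List<String> calls = new ArrayList<>();
  public void startBatch() { calls.add("start"); }
  public void addDocument(String doc) { calls.add("add:" + doc); }
  public void endBatch() { calls.add("end"); }
}
```

With a batch size of 2, publishing three documents and closing produces two batches. The provider never has to track batch state itself.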




How it Works


  1. Select a publisher jar to load. This is a component (aspire-XXX-publisher).
  2. A common app bundle is loaded automatically.
  3. The app bundle loads the publisher framework jar and the originally requested provider.
  4. The framework may perform optional Groovy or XML transforms, and it collects the appropriate parameters.
  5. Where possible, connections are pooled.
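Connection pooling, in its simplest form, means handing a released connection to the next caller instead of opening a new one. The sketch below is a generic illustration, not the framework's pool. `SimplePool` is a hypothetical name, and a real pool would also need size limits, validation and cleanup.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal pool: reuses released connections instead of opening new ones.
class SimplePool<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;
  private int created = 0;

  SimplePool(Supplier<T> factory) {
    this.factory = factory;
  }

  // Hand out an idle connection if one exists, otherwise open a new one.
  synchronized T acquire() {
    T conn = idle.pollFirst();
    if (conn == null) {
      conn = factory.get();
      created++;
    }
    return conn;
  }

  // Return a connection so that a later acquire() can reuse it.
  synchronized void release(T conn) {
    idle.addFirst(conn);
  }

  // Number of connections opened so far.
  synchronized int createdCount() {
    return created;
  }
}
```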


Developer Settings


Similar to “SourceInfo”, the framework uses “PublisherInfo”. This holds the information used to connect to a target repository (URL, username, password, etc.) and also controls the framework functionality. For example, if the framework allows for a transformation that a connector does not require, you can disable it. You can extend this class if required.
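Extending a publisher information class usually means calling the base initialization and then reading the extra target-specific settings. The sketch below shows that pattern with hypothetical classes (`ExamplePublisherInfo`, `FilePublisherInfo`). It reads a simple key/value map rather than the XML `Element` that the real `newPublisherInfo` receives, and the `outputFile` setting and its default are illustrative assumptions.

```java
import java.util.Map;

// Hypothetical base class: connection details plus a switch for a framework feature.
class ExamplePublisherInfo {
  String url;
  String username;
  String password;
  boolean transformEnabled;

  // Assumption: settings arrive as key/value pairs (the real framework passes XML configuration).
  void initialize(Map<String, String> cfg) {
    url = cfg.get("url");
    username = cfg.get("username");
    password = cfg.get("password");
    // Transformation stays on unless it is explicitly disabled.
    transformEnabled = !"false".equalsIgnoreCase(cfg.getOrDefault("transform", "true"));
  }
}

// Hypothetical extension that adds one target-specific setting.
class FilePublisherInfo extends ExamplePublisherInfo {
  String outputFile;

  @Override
  void initialize(Map<String, String> cfg) {
    super.initialize(cfg);  // common settings first
    outputFile = cfg.getOrDefault("outputFile", "out.xml");
  }
}
```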

The framework allows for the following configuration:

  • When a connection is required

  • Pool connections

    • True/false: whether connections should be pooled

  • Use transform

    • Transform type: none/xml/json, plus a default transform file (to pick out of the component)

  • Supports authentication

    • http(s)

  • A set of properties in the publisher controls the DXF in the common app bundle and the options in the framework.

    • Use these to control the app bundle that is loaded (via the Aspire application) and the configuration of the component “publisher info” (for example, to control whether or not the publisher supports “clear” and “commit” operations).

  • For other specific DXF properties, define your own resources/dxf/publisher.xml file. Your properties will be merged with the common ones.
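The transform, pooling and authentication options above can be held in one small configuration object. The following is a hypothetical sketch (`TransformType` and `FrameworkOptions` are illustrative names, not framework classes). It parses the none/xml/json transform type and keeps a transform file only when a transform is selected. An unrecognized type makes `valueOf` throw `IllegalArgumentException`.

```java
import java.util.Locale;

// Transform types listed above: none, xml or json.
enum TransformType {
  NONE, XML, JSON;

  // A missing or blank value means no transform; unknown values throw IllegalArgumentException.
  static TransformType parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return NONE;
    }
    return TransformType.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}

// Hypothetical holder for the framework-level options described above.
class FrameworkOptions {
  final boolean poolConnections;
  final TransformType transformType;
  final String transformFile;
  final boolean supportsAuthentication;

  FrameworkOptions(boolean poolConnections, String transformType,
                   String transformFile, boolean supportsAuthentication) {
    this.poolConnections = poolConnections;
    this.transformType = TransformType.parse(transformType);
    // A transform file only matters when a transform type is selected.
    this.transformFile = this.transformType == TransformType.NONE ? null : transformFile;
    this.supportsAuthentication = supportsAuthentication;
  }
}
```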

Component Properties

The common DXF is controlled through a new properties file, resources/aspire.properties, that is added to the component. This file lets you add properties.

  • These properties are passed to the DXF, allowing control of the options shown to the installer.

The developer controls the DXF by setting the properties in resources/aspire.properties, as in this example:

Code Block
component.appbundle.maven.coordinates=app-pap-publisher
#component.<SUBTYPE>.dxf=xxxx
component.default.dxf=dxf/publisher.xml
publisher.framework.isPAP=true

publisher.framework.dxf.merge=true
publisher.framework.dxf.merge.top=true

publisher.framework.dxf.url=true
publisher.framework.dxf.credentials=true
publisher.framework.dxf.startEnd=true
publisher.framework.dxf.transform=true
publisher.framework.dxf.connection=true
publisher.framework.dxf.dumpIndex=false
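To check which DXF switches a given aspire.properties turns on, you can read the file with the standard `java.util.Properties` class. The helper below is hypothetical and not part of Aspire. Based only on the example keys, it assumes that each one-level `publisher.framework.dxf.<name>` key is a true/false switch, and it skips nested keys such as `publisher.framework.dxf.merge.top`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: reports which publisher.framework.dxf.<name> switches are true.
class DxfSwitches {
  static final String PREFIX = "publisher.framework.dxf.";

  // Parse properties text as it would appear in resources/aspire.properties ('#' lines are comments).
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  // Collect one-level switches set to true; nested keys such as
  // publisher.framework.dxf.merge.top are skipped.
  static SortedSet<String> enabled(Properties props) {
    SortedSet<String> result = new TreeSet<>();
    for (String key : props.stringPropertyNames()) {
      if (!key.startsWith(PREFIX)) {
        continue;
      }
      String name = key.substring(PREFIX.length());
      if (!name.contains(".") && Boolean.parseBoolean(props.getProperty(key))) {
        result.add(name);
      }
    }
    return result;
  }
}
```

Applied to the example file above, `enabled` returns connection, credentials, merge, startEnd, transform and url. dumpIndex is omitted because it is set to false.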


Installation Settings


Installation settings are collected when the component is installed. Required items include the location and connection (user/password) details of the target. The intent is that only the options the developer has enabled (in the Developer Settings) are presented to the user.

These settings are collected using DXF. A publisher-specific DXF is merged with a common piece to present the entire set.

The framework collects the following parameters: 

  • Target URL

    • The URL for the search engine, etc.

  • Authentication

    • Yes/No/Type

    • Gather username/password

  • Clear before full crawls

    • True/False. If true, the publisher will start jobs for full crawls by calling a clear method.

  • Commit after crawls

    • True/False. If true, the publisher will end jobs for crawls by calling a commit method.

  • Transform data before sending

    • True/False

  • Transform file name

    • For cases when transformation is required
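The collected settings depend on each other. Username and password only matter when authentication is on, and a transform file name only matters when transformation is on. The sketch below checks those dependencies. `InstallSettings` is a hypothetical class, and the validation rules are assumptions drawn from the list above, not behavior the framework is documented to enforce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for the settings collected at install time.
class InstallSettings {
  String targetUrl;
  boolean authentication;
  String username;
  String password;
  boolean clearBeforeFullCrawl;
  boolean commitAfterCrawl;
  boolean transform;
  String transformFile;

  // Report missing values; these rules are assumptions based on the list above.
  List<String> validate() {
    List<String> problems = new ArrayList<>();
    if (isBlank(targetUrl)) {
      problems.add("target URL is required");
    }
    if (authentication && (isBlank(username) || isBlank(password))) {
      problems.add("username and password are required when authentication is enabled");
    }
    if (transform && isBlank(transformFile)) {
      problems.add("a transform file is required when transformation is enabled");
    }
    return problems;
  }

  private static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }
}
```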


Implementation


  1. On startup, the framework connects to the provider (PAP, the PublisherAccessProvider) and calls its “newPublisherInfo” method.
  2. This returns a class (much like SourceInfo) holding all of the configuration for the publisher, including the common options (whether to perform a clear, etc.).
    • This object can be passed to other calls later.
    • If required, connection pools are initialized here.
  3. When processing a document, the framework first categorizes the job as “control” or “document”.
    • “Control” jobs are commit and clear. The framework calls the provider’s commit or clear method (if enabled and processing is selected), passing a connection as required.
    • For “document” jobs, the framework determines whether a new batch is required and calls the provider’s startBatch method.
    • The framework provides “standard” batch implementations.
  4. The framework then establishes the specific type of job (add/update, delete or delete by query) and calls the appropriate provider method.
  5. Closing the component releases all of the connections.
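The routing in the steps above can be summarized in a small dispatcher. This is a hypothetical sketch, not framework code. `JobKind` and `JobDispatcher` are illustrative names, and the provider is represented by a list that records which method would be called. Ending an open batch before a control job is a design assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical job kinds matching the categories in the steps above.
enum JobKind { CLEAR, COMMIT, ADD_UPDATE, DELETE, DELETE_BY_QUERY }

// Hypothetical dispatcher: control jobs go to clear/commit, document jobs go into a batch.
class JobDispatcher {
  final List<String> calls = new ArrayList<>();  // stands in for the provider methods called
  private final boolean clearEnabled;
  private final boolean commitEnabled;
  private boolean batchOpen = false;

  JobDispatcher(boolean clearEnabled, boolean commitEnabled) {
    this.clearEnabled = clearEnabled;
    this.commitEnabled = commitEnabled;
  }

  void process(JobKind kind) {
    if (kind == JobKind.CLEAR || kind == JobKind.COMMIT) {
      // Control job. Assumption: finish any open batch first so earlier documents are sent.
      endOpenBatch();
      if (kind == JobKind.CLEAR && clearEnabled) calls.add("processClear");
      if (kind == JobKind.COMMIT && commitEnabled) calls.add("processCommit");
      return;
    }
    // Document job: open a batch if needed, then route by the specific job type.
    if (!batchOpen) {
      calls.add("startBatch");
      batchOpen = true;
    }
    switch (kind) {
      case ADD_UPDATE: calls.add("processAddUpdate"); break;
      case DELETE: calls.add("processDelete"); break;
      default: calls.add("processDeleteByQuery"); break;
    }
  }

  // Ends the current batch, if any (also called when the component closes).
  void endOpenBatch() {
    if (batchOpen) {
      calls.add("endBatch");
      batchOpen = false;
    }
  }
}
```

With clear disabled and commit enabled, the sequence clear, add/update, delete, commit yields startBatch, processAddUpdate, processDelete, endBatch and processCommit. The disabled clear is skipped entirely.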

Component properties




PAP class example:

Code Block
package com.searchtechnologies.aspire.simplefile;

import com.searchtechnologies.aspire.framework.ComponentImpl;
import com.searchtechnologies.aspire.publisher.services.PublisherAccessProvider;
import com.searchtechnologies.aspire.publisher.services.PublisherBatch;
import com.searchtechnologies.aspire.publisher.services.PublisherInfo;
import com.searchtechnologies.aspire.publisher.services.PublisherRepositoryConnection;
import com.searchtechnologies.aspire.publisher.services.queryexpr.DeleteByQuery;
import com.searchtechnologies.aspire.services.AspireException;
import com.searchtechnologies.aspire.services.AspireObject;
import com.searchtechnologies.aspire.services.Job;
import org.w3c.dom.Element;

import java.io.IOException;
import java.io.Writer;

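// Example provider that writes each batch of documents as XML to the Writer
// supplied by the batch connection. SimpleFilePublisherInfo (not shown on this page)
// holds this publisher's configuration.
public class SimpleFilePAP extends ComponentImpl implements PublisherAccessProvider {

  @Override
  public void initialize(Element element) throws AspireException {
    // No component-level configuration is needed for this example.
  }

  // Called by the framework on startup; the returned object is passed to later calls.
  @Override
  public PublisherInfo newPublisherInfo(Element cfg) {
    SimpleFilePublisherInfo publisherInfo = new SimpleFilePublisherInfo();
    publisherInfo.initialize(cfg);
    return publisherInfo;
  }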

  // Control job: this example only logs the clear request.
  @Override
  public void processClear(PublisherRepositoryConnection conn, Job j, PublisherInfo publisherInfo) {
    info("Clear job");
  }

  // Control job: this example only logs the commit request.
  @Override
  public void processCommit(PublisherRepositoryConnection conn, Job j, PublisherInfo publisherInfo) {
    info("Commit job");
  }

  // Called when the framework starts a new batch of document jobs.
  // The throws clause is needed if AspireException is a checked exception.
  @Override
  public void startBatch(PublisherBatch batch, PublisherInfo publisherInfo) throws AspireException {
    info("Batch start");
    // Add the header
    try {
      Writer w = (Writer) batch.getBatchConnection().connection();
      w.write("<docs>\n");
      w.flush();
    } catch (IOException e) {
      throw new AspireException("SimpleFilePublisher.IOException-writeToStream", e, "IOException writing %s", publisherInfo.getUrl());
    }
  }

  // Called when the framework closes the current batch.
  @Override
  public void endBatch(PublisherBatch batch, PublisherInfo publisherInfo) throws AspireException {
    info("Batch end");
    // Add the footer
    try {
      Writer w = (Writer) batch.getBatchConnection().connection();
      w.write("</docs>\n");
      w.flush();
    } catch (IOException e) {
      throw new AspireException("SimpleFilePublisher.IOException-writeToStream", e, "IOException writing %s", publisherInfo.getUrl());
    }
  }

  @Override
  public void processAddUpdate(PublisherBatch batch, Job j, PublisherInfo publisherInfo) {
    info("Batch process");
    Writer w = (Writer) batch.getBatchConnection().connection();
    // Write out the document from the job
    j.get().toXmlString(AspireObject.PRETTY, w);
  }

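  @Override
  public void processDelete(PublisherBatch batch, Job j, PublisherInfo publisherInfo) {
    // Deletes are not handled in this example.
  }

  @Override
  public void processDeleteByQuery(PublisherBatch batch, DeleteByQuery deleteByQuery, Job j, PublisherInfo publisherInfo) {
    // Delete-by-query is not handled in this example.
  }

  @Override
  public void close() {
    // This class holds no resources of its own, so there is nothing to release here.
  }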

}
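Each batch written by SimpleFilePAP is a `<docs>` header, the documents, and a `</docs>` footer. The standalone sketch below reproduces that layout with a `StringWriter` and parses it back with the JDK XML parser, so you can confirm that a batch is well-formed and count its documents. It does not use any Aspire classes. `BatchEnvelope` is a hypothetical name, and the newline after each document is an assumption, since the real provider writes whatever `toXmlString` produces.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Standalone imitation of the SimpleFilePAP batch layout; no Aspire classes are used.
class BatchEnvelope {
  // Header (as in startBatch), one entry per document (processAddUpdate), footer (endBatch).
  static String write(List<String> docsXml) {
    StringWriter w = new StringWriter();
    w.write("<docs>\n");
    for (String doc : docsXml) {
      w.write(doc);
      w.write("\n");
    }
    w.write("</docs>\n");
    return w.toString();
  }

  // Parse a batch with the JDK XML parser and count the elements directly under <docs>.
  static int countDocuments(String batchXml) {
    try {
      Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(batchXml.getBytes(StandardCharsets.UTF_8)));
      NodeList children = dom.getDocumentElement().getChildNodes();
      int count = 0;
      for (int i = 0; i < children.getLength(); i++) {
        if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
          count++;
        }
      }
      return count;
    } catch (Exception e) {
      throw new IllegalStateException("batch is not well-formed XML", e);
    }
  }
}
```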