This tutorial applies when developing an application that embeds the Saga Library and connects to the ElasticSearch instance managed by the Saga Server.

Prerequisites

This tutorial assumes:

  • The reader has the ability to create a project with Maven Framework support.

  • The data that Saga will use is managed through the Saga user interface.

  • Java 11+ is installed on the machine (Java 17 for Saga 1.3.3/1.3.4).

Configure pom.xml

A minimal example of the project's pom.xml is shown below. The essential part of this configuration is the dependencies section, which needs two main libraries: saga-library and saga-elastic-provider.

  • saga-library - The core library of Saga. This dependency includes the Engine, Stages, Tag Manager, Pipeline Manager and Resource Manager, which are all of the parts necessary to use Saga in any application.

  • saga-elastic-provider - This dependency grants access to ElasticSearch as a provider for Saga, which means our Stages and Managers will be able to fetch the data directly from this provider.

More providers will be available in the future, but to use Saga's full functionality we recommend the saga-elastic-provider.

Other Configuration Considerations

Two other settings are worth noting: the code is compiled with Java 11 and the source encoding is UTF-8, as set in the <source>, <target> and <encoding> elements of the maven-compiler-plugin configuration. If you are using Saga 1.3.3/1.3.4, change <source> and <target> to 17, and change the project <version> and the saga.version property from 1.3.1 to 1.3.3/1.3.4.

pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.accenture.saga</groupId>
    <artifactId>saga-howto</artifactId>
    <version>1.3.1</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <saga.version>1.3.1</saga.version>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <executions>
                    <execution>
                        <id>compile</id>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>testCompile</id>
                        <phase>test-compile</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <source>11</source>
                    <target>11</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.accenture.saga.server.SagaServer</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <appendAssemblyId>false</appendAssemblyId>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- bind to the packaging phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>com.accenture.saga</groupId>
            <artifactId>saga-library</artifactId>
            <version>${saga.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>com.accenture.saga</groupId>
            <artifactId>saga-elastic-indexer</artifactId>
            <version>${saga.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>com.accenture.saga</groupId>
            <artifactId>saga-elastic-provider</artifactId>
            <version>${saga.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
</project>

Initializing the Saga Components

1. Begin by creating a main class which will hold a SagaEngine, ResourceManager, TagManager and PipelineManager.

package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main() {
        
        
    }
    
    public static void main(String[] args) {
        Main _instance = new Main();
    }
}


2. Create and configure the ResourceManager.

3. Add a provider to it. (This configuration will be hard-coded.)

4. Add the SagaJsonFactory class, which allows us to create SagaJson objects (the standard document of Saga) from text, files or readers.

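The configuration we are going to use for the provider is the following:

{
    "name": "saga-provider",
    "type": "Elastic",
    "indexName": "saga",
    "nodeUrls": ["localhost:9200"],
    "authentication": "none",
    "timestamp": "updatedAt",
    "exclude": [
      "updatedAt",
      "createdAt"
    ]
}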


The fields, from the top, starting with the common ones:

  • name ( type=string | required ) - The name we are going to use for the provider. It doesn't matter which name you use; we use "saga-provider"

  • type ( type=string | required ) - Indicates the type of provider we are using. Since we are using saga-elastic-provider, its type is "Elastic"

From here on, all of the properties are specific to saga-elastic-indexer:

  • shema ( type=string | default=http | optional ) - Schema for the url to Elasticsearch

  • hostnamesAndPorts ( type=string array | default=["localhost:9200"] | required ) - A list of host server names and their ports

  • timestamp ( type=string | default=updatedAt | optional ) - Name of the field reflecting any change done to the data

  • exclude ( type=string array | optional ) - Name of the fields omitted (when possible) from the response of ElasticSearch

Our code should look like this:

public Main(String text, List<String> tags) throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"indexName\": \"saga\"," +
                            "    \"nodeUrls\": [\"localhost:9200\"]," +
                            "    \"authentication\": \"none\"," +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

}
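
The hard-coded configuration string is hard to read with all the escaped quotes. As a minimal sketch, assuming the same field names used in the code above, a small helper could assemble the JSON text from parameters. ProviderConfigSketch and its methods are hypothetical names for illustration and are not part of Saga.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Saga) that assembles the provider configuration as JSON text. */
final class ProviderConfigSketch {

    /** Wraps a value in double quotes, escaping backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a list of strings as a JSON array. */
    static String array(List<String> values) {
        return values.stream()
                .map(ProviderConfigSketch::quote)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    /** Builds the same fields the tutorial hard-codes, with name, index and nodes as parameters. */
    static String elasticProvider(String name, String indexName, List<String> nodeUrls) {
        return "{"
                + "\"name\": " + quote(name) + ", "
                + "\"type\": \"Elastic\", "
                + "\"indexName\": " + quote(indexName) + ", "
                + "\"nodeUrls\": " + array(nodeUrls) + ", "
                + "\"authentication\": \"none\", "
                + "\"timestamp\": \"updatedAt\", "
                + "\"exclude\": " + array(List.of("updatedAt", "createdAt"))
                + "}";
    }

    public static void main(String[] args) {
        // The resulting text could be passed to SagaJsonFactory.getInstance(...) as in the tutorial.
        System.out.println(elasticProvider("saga-provider", "saga", List.of("localhost:9200")));
    }
}
```

The rest of the tutorial keeps the hard-coded string, so this helper is optional.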


5. Next we proceed with the configuration of the TagManager below the ResourceManager.  (Hard-code the configuration.)

"resource": "saga-provider:saga_tags"


In the configuration above, saga-provider refers to the provider we added to the ResourceManager in the previous step. The colon (:) separates the provider from the actual resource.

Since we are using a saga-elastic-provider, the resources are index names.

Since we are connecting to a Saga index created by the Saga Server, every index name is a combination of the solution's name (usually saga), an underscore (_) and the type of data the index holds (in this case tags, forming the name saga_tags).

The code should look like this now:

public Main(String text, List<String> tags) throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"indexName\": \"saga\"," +
                            "    \"nodeUrls\": [\"localhost:9200\"]," +
                            "    \"authentication\": \"none\"," +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));
}

As you can see, the TagManager receives the ResourceManager as a parameter, which grants us access to the resource saga-provider:saga_tags.
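
The naming convention for resources (provider name, colon, then solution name, underscore and data type) can be sketched with plain string handling. ResourceRefSketch is a hypothetical helper for illustration only; it is not a Saga class, and Saga does its own parsing internally.

```java
import java.util.Objects;

/** Hypothetical helper (not part of Saga) that models a "provider:resource" reference. */
final class ResourceRefSketch {
    final String provider;
    final String resource;

    private ResourceRefSketch(String provider, String resource) {
        this.provider = provider;
        this.resource = resource;
    }

    /** Splits at the first colon: the provider name comes before it, the resource after it. */
    static ResourceRefSketch parse(String reference) {
        Objects.requireNonNull(reference, "reference");
        int colon = reference.indexOf(':');
        if (colon <= 0 || colon == reference.length() - 1) {
            throw new IllegalArgumentException("Expected <provider>:<resource> but got: " + reference);
        }
        return new ResourceRefSketch(reference.substring(0, colon), reference.substring(colon + 1));
    }

    /** Builds an index name as solution name + underscore + data type, e.g. saga_tags. */
    static String indexName(String solution, String dataType) {
        return solution + "_" + dataType;
    }

    /** Builds the full reference for a provider, solution and data type. */
    static String reference(String provider, String solution, String dataType) {
        return provider + ":" + indexName(solution, dataType);
    }

    public static void main(String[] args) {
        ResourceRefSketch ref = parse("saga-provider:saga_tags");
        System.out.println(ref.provider + " -> " + ref.resource);
        System.out.println(reference("saga-provider", "saga", "pipelines"));
    }
}
```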


6. Similarly, we will proceed to configure the PipelineManager using the following configuration:

"resource": "saga-provider:saga_pipelines"


Once again, saga-provider references the provider added to the ResourceManager, and saga_pipelines is a combination of the solution's name and the type of data (in this case, pipelines).

public Main(String text, List<String> tags) throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"indexName\": \"saga\"," +
                            "    \"nodeUrls\": [\"localhost:9200\"]," +
                            "    \"authentication\": \"none\"," +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

    pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));
}

Setting Up the Engine

After we have set up the Resource, Tag and Pipeline Managers, we can assign the ResourceManager and the TagManager to the Engine.

public Main(String text, List<String> tags) throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"indexName\": \"saga\"," +
                            "    \"nodeUrls\": [\"localhost:9200\"]," +
                            "    \"authentication\": \"none\"," +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

    pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

    engine = new SagaEngine();

    engine.setResourceManager(resourceManager);
    engine.setTagManager(tagManager);
}

Working with the Pipeline Manager

We have two options:*

Option 1:  Let the PipelineManager build the pipeline for us using a set of tags (that we will provide).

Option 2:  Manually provide a complete pipeline configuration to the PipelineManager.

Pros & Cons

Pros

Option 1: Automatic Pipeline

  • Uses the configuration set up through Saga's UI
  • Loads only what is necessary from the resource (tags, stages, ...)
  • Builds the pipeline based on tag dependencies
  • Can generate multiple, different pipelines

Option 2: Manual Pipeline

  • Each Recognizer can have a base pipeline as a dependency
  • Complete control over the flow of the data

Cons

Option 1: Automatic Pipeline

  • Pipelines are not always the most efficient (...yet)
  • Each base pipeline must be configured manually (...for the moment)

Option 2: Manual Pipeline

  • Configuration of every stage must be done manually
  • Relies strongly on the user's knowledge of each possible stage configuration
  • Lack of flexibility when changing to another pipeline

Tie

  • Both options need a stage of type TextBlockReader configured manually


*Since Option 1 is more flexible and makes use of the configuration in ElasticSearch, we will use that one.

Request a Pipeline

Before asking the PipelineManager for a pipeline, we need to provide a stage of type TextBlockReader.  At the moment, we only have one stage of that type, the SimpleReaderStage, which requires a splitRegex in the configuration as a SagaJson object.

1.  Let's add that to the code.

public Main(String text, List<String> tags) throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"indexName\": \"saga\"," +
                            "    \"nodeUrls\": [\"localhost:9200\"]," +
                            "    \"authentication\": \"none\"," +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

    pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

    //The Fun Part

    engine = new SagaEngine();

    engine.setResourceManager(resourceManager);
    engine.setTagManager(tagManager);


    SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));
}
  • splitRegex ( type=string | required ) - Regex pattern used to split the text stream into text blocks

With the regex [\r\n]+ we split the text on line-break characters. Also note that the SimpleReaderStage receives the engine as its first parameter.

The regex [\r\n]+ is a suitable default for most of the text you will be processing.
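
To see what this pattern does to a text stream, here is a minimal sketch using only java.util.regex. It does not use SimpleReaderStage; SplitRegexSketch is a hypothetical class that just applies the same pattern to a string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo (not part of Saga) of how the [\r\n]+ pattern splits text into blocks. */
final class SplitRegexSketch {

    // One or more carriage-return or line-feed characters act as a single separator.
    static final Pattern LINE_BREAKS = Pattern.compile("[\\r\\n]+");

    /** Splits the text into blocks at every run of line-break characters. */
    static List<String> toBlocks(String text) {
        return Arrays.asList(LINE_BREAKS.split(text));
    }

    public static void main(String[] args) {
        String text = "First line\r\nSecond line\n\nThird line";
        toBlocks(text).forEach(block -> System.out.println("[" + block + "]"));
    }
}
```

Note that inside a character class the | character is a literal pipe, not an alternation, so a pattern written as [\r|\n]+ would also split the text at every pipe character.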

Now we can ask the PipelineManager to build a pipeline for the tags. We have not yet said where the tags come from, but we will fix that next.


2.  Add the building of the pipeline.

pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);


3.  Add the tags as a parameter of the constructor.  Our code should look like this:

package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"indexName\": \"saga\"," +
                                "    \"nodeUrls\": [\"localhost:9200\"]," +
                                "    \"authentication\": \"none\"," +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        Main _instance = new Main(tags);
    }
}

Here we just built a list with the tags "tag1", "tag2", and "tag3", which we passed as a parameter of the constructor, so that the PipelineManager can have them to build the pipeline.

  • At this point pipelineManager.buildPipelineFor uses the tags we specified to identify the stages that recognize these tags.
  • From these stages, it builds a dependency hierarchy to add any tags and stages needed to find the specified tags.
  • After it has identified all of the necessary stages and tags, the PipelineManager adds them to the Engine we provided.
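
The idea of expanding a set of requested tags into everything they depend on can be pictured with a small sketch. This is not Saga's implementation: TagDependencySketch, the dependency map and the "token" dependency are all hypothetical, and the real PipelineManager reads its dependencies from the configured resources.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not Saga's algorithm) of resolving tags into a dependency-first order. */
final class TagDependencySketch {

    /** Returns the requested tags plus everything they depend on, dependencies first. */
    static List<String> resolve(List<String> requested, Map<String, List<String>> dependsOn) {
        Set<String> ordered = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String tag : requested) {
            visit(tag, dependsOn, ordered, inProgress);
        }
        return new ArrayList<>(ordered);
    }

    /** Depth-first visit: a tag is added only after all of its dependencies were added. */
    private static void visit(String tag, Map<String, List<String>> dependsOn,
                              Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(tag)) {
            return;
        }
        if (!inProgress.add(tag)) {
            throw new IllegalStateException("Circular dependency involving " + tag);
        }
        for (String dependency : dependsOn.getOrDefault(tag, List.of())) {
            visit(dependency, dependsOn, ordered, inProgress);
        }
        inProgress.remove(tag);
        ordered.add(tag);
    }

    public static void main(String[] args) {
        // Assumed dependencies for illustration: tag2 needs tag1, and tag1 needs plain tokens.
        Map<String, List<String>> dependsOn = Map.of(
                "tag2", List.of("tag1"),
                "tag1", List.of("token"));
        System.out.println(resolve(List.of("tag2", "tag3"), dependsOn));
    }
}
```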

Now our Engine is ready to receive text and build a graph.

Processing Text

Let's also add a String parameter to the constructor for the text we want to process.  This text will be added to the Engine using the method reset.

engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

This method accepts an InputStream, so we can specify the encoding in UTF-8.  In order to pass the text to Saga, we get the bytes from the text in UTF-8 encoding and create a ByteArrayInputStream with them. 

All of the processing done by Saga is with the encoding UTF-8.
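
The encoding step can be checked in isolation with the JDK alone. The sketch below, with the hypothetical class Utf8StreamSketch, wraps a string in a UTF-8 ByteArrayInputStream and reads it back; it does not call the Saga engine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo (not part of Saga) of passing text as a UTF-8 encoded stream. */
final class Utf8StreamSketch {

    /** Encodes the text as UTF-8 bytes and exposes them as an InputStream. */
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the whole stream and decodes it as UTF-8. */
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 and \u00fc are non-ASCII characters that take two bytes each in UTF-8.
        String text = "Caf\u00e9 Z\u00fcrich";
        System.out.println(readUtf8(toUtf8Stream(text)).equals(text));
    }
}
```

Naming the charset explicitly avoids depending on the platform default encoding, which can differ between machines.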

So far we have only told Saga which text to use. Now we need to process it. For this, use the method advance, which returns a Vertex. This will be the first Vertex of the text block.

Vertex start = engine.advance();

Currently our code should look like this:

package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.engine.Vertex;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"indexName\": \"saga\"," +
                                "    \"nodeUrls\": [\"localhost:9200\"]," +
                                "    \"authentication\": \"none\"," +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);
    }
}

Working with the Graph


For more specific information on how to navigate through the graph, please visit Understanding Interpretation Graphs


Before we start with this section, a clarification: the graph has an extensive range of applications, which vary depending on the content, the tags, the stages and the user's final goal. For this example we will cover 3 of the most basic cases: printing the graph, searching for a specific type of tag, and getting the highest-value route.

For these examples we will assume that tag1 has an Entity Recognizer identifying "adipiscing elit" and that a baseline-pipeline is configured.

Printing the Graph

Printing the graph is the most basic of the cases, since it gives us a more graphical picture of how Saga processed the text.

For this case we will use the class GraphPrinter and its static method printOnce, which needs the engine, the start vertex and the last vertex. In effect we print a section of the graph, which in this case is the entire text block.

We already have the engine and the start vertex. For the last vertex, the engine has a method getAllVertex, which returns a LinkedList of all the vertices processed by the engine, so we can take the last element (vertex) of that list.

Vertex start = engine.advance();

Vertex last = engine.getAllVertex().getLast();

// Print Graph

GraphPrinter.printOnce(engine, start, last);

This will show a text version of the graph in the console, much like this one:

INFO | {GraphPrinter} |  V---------------------------------[Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.]---------------------------------V 
INFO | {GraphPrinter} |  ^-----------------[Lorem ipsum dolor sit amet, consectetur adipiscing elit]------------------V-------------[Sed egestas orci eu mauris luctus consequat]-------------V-[]-^ 
INFO | {GraphPrinter} |  ^-[Lorem]-V-[ipsum]-V-[dolor]-V-[sit]-V---[amet,]----V-[consectetur]-V-[adipiscing]-V-[elit]-^-[Sed]-V-[egestas]-V-[orci]-V-[eu]-V-[mauris]-V-[luctus]-V-[consequat]-^ 
INFO | {GraphPrinter} |  ^-[lorem]-^                           ^-[amet]-V-[,]-^               ^-------[{tag1}]--------^-[sed]-^ 

Here V represents the first appearance of a vertex, ^ is a connection to an existing vertex, everything between [ ] is a token, and a token whose text appears between { } is a semantic tag.

If you want to reduce the vertices returned by getAllVertex, you can also specify only the first vertex, or the first and the last vertex.

Searching for a Tag

In this case we will search for all the tokens that are semantic tags.

First obstacle: how do we identify a semantic tag?

All the tokens that are semantic tags have a flag called SEMANTIC_TAG, so we first need to get that flag. Flags are defined by the engine, so we ask the engine for it.

Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");
  1. We define the flag object (we recommend naming the object after the flag, as a matter of good practice)
  2. From the engine we call the method getFlagForRead, which needs the type of flag and the name of the flag
  3. We indicate the type of flag, LEX_ITEM, and the name of the flag, SEMANTIC_TAG

There are 2 types of flags, LEX_ITEM and VERTEX; they are values of the enum LexObjectType, which is in the class LexObject.

You can find out which flags exist by looking at each Stage and checking which flags it sets on the items it creates.

Second obstacle: how do we get the tokens?

This can also be done in one line:

List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));
  1. We use the class SagaGraph, and call the method getTokens
  2. We specify the start and end vertices between which we want to find the tokens
  3. We add a filter function which checks that the item has the flag SEMANTIC_TAG
  4. The result is returned as a list of LexItems
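
The filter passed to getTokens is an ordinary predicate over items. The sketch below shows the same pattern with stand-in types; FlagFilterSketch, its Item class and its Flag enum are hypothetical and only mimic the shape of the Saga classes, so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-ins (not the Saga API) showing how a flag-based filter selects items. */
final class FlagFilterSketch {

    enum Flag { SEMANTIC_TAG, TEXT_BLOCK }

    /** Minimal item: a piece of text plus the set of flags attached to it. */
    static final class Item {
        final String text;
        final Set<Flag> flags = EnumSet.noneOf(Flag.class);

        Item(String text, Flag... itemFlags) {
            this.text = text;
            this.flags.addAll(Arrays.asList(itemFlags));
        }

        boolean hasFlag(Flag flag) {
            return flags.contains(flag);
        }
    }

    /** Keeps only the items accepted by the filter, preserving their order. */
    static List<Item> select(List<Item> items, Predicate<Item> filter) {
        List<Item> selected = new ArrayList<>();
        for (Item item : items) {
            if (filter.test(item)) {
                selected.add(item);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("adipiscing"),
                new Item("elit"),
                new Item("{tag1}", Flag.SEMANTIC_TAG));
        select(items, item -> item.hasFlag(Flag.SEMANTIC_TAG))
                .forEach(item -> System.out.println(item.text));
    }
}
```

Negating the predicate gives the opposite selection, for example every item that does not carry a given flag.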

The complete Main class should now look like this:

package com.accenture.saga;

import com.accenture.saga.engine.*;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;
import com.accenture.saga.utilities.SagaGraph;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"indexName\": \"saga\"," +
                                "    \"nodeUrls\": [\"localhost:9200\"]," +
                                "    \"authentication\": \"none\"," +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();

        Vertex last = engine.getAllVertex(start).getLast();

        // Print Graph

        GraphPrinter.printOnce(engine, start, last);

        // Get specific items

        Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

        List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

        items.forEach(item -> System.out.println(item.toStringForDebug()));
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);

        System.exit(0);
    }
}

A println was added at the end to show the token we got:

I"{tag1}"(40:55)

I indicates an Item, the text inside the quotes is the text of the item, and the numbers in parentheses are the character positions of the token (start:end).
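
If you need the pieces of such a debug line in code, a regular expression is enough. The sketch below assumes the format is exactly the one in the single sample above (a type letter, quoted text, then start:end); DebugItemSketch is a hypothetical class, and the real output of toStringForDebug may contain variations this pattern does not cover.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser (not part of Saga) for debug lines shaped like I"{tag1}"(40:55). */
final class DebugItemSketch {

    // Type letter, quoted text, then (start:end) character offsets.
    private static final Pattern FORMAT =
            Pattern.compile("([A-Z])\"(.*)\"\\((\\d+):(\\d+)\\)");

    final char kind;
    final String text;
    final int start;
    final int end;

    private DebugItemSketch(char kind, String text, int start, int end) {
        this.kind = kind;
        this.text = text;
        this.start = start;
        this.end = end;
    }

    /** Parses one debug line, or throws if it does not match the assumed format. */
    static DebugItemSketch parse(String debug) {
        Matcher matcher = FORMAT.matcher(debug.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unrecognized debug string: " + debug);
        }
        return new DebugItemSketch(
                matcher.group(1).charAt(0),
                matcher.group(2),
                Integer.parseInt(matcher.group(3)),
                Integer.parseInt(matcher.group(4)));
    }

    public static void main(String[] args) {
        String source = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                + "Sed egestas orci eu mauris luctus consequat.";
        DebugItemSketch item = parse("I\"{tag1}\"(40:55)");
        // In this sample the offsets select the matched phrase when the end is treated as exclusive.
        System.out.println(item.text + " covers: " + source.substring(item.start, item.end));
    }
}
```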

Getting the Highest Route

The highest route is the interpretation with the highest confidence, among other factors (e.g. largest tokens, more complex patterns, ...), and it is most likely the interpretation we want. Let's start with the code and then the explanation.

Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK");

List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK));
  1. Normally we want to ignore any token that has the TEXT_BLOCK flag
  2. We again use the class SagaGraph, and call the method getRoute
  3. We pass the start and end vertices
  4. We add a filter function which checks that the item doesn't have the TEXT_BLOCK flag

The complete code now looks like this:

package com.accenture.saga;

import com.accenture.saga.engine.*;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;
import com.accenture.saga.utilities.SagaGraph;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"indexName\": \"saga\"," +
                                "    \"nodeUrls\": [\"localhost:9200\"]," +
                                "    \"authentication\": \"none\"," +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();

        Vertex last = engine.getAllVertex(start).getLast();

        // Print Graph

        GraphPrinter.printOnce(engine, start, last);

        // Get specific items

        Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

        List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

        items.forEach(item -> System.out.println(item.toStringForDebug()));

        // Get the highest route

        Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK");

        List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK));

        route.forEach(item -> System.out.println(item.toStringForDebug()));
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);

        System.exit(0);
    }
}

Once again, a println was added at the end to show the tokens we got from the route:

I"Lorem"(0:5)
I"ipsum"(6:11)
I"dolor"(12:17)
I"sit"(18:21)
I"amet"(22:26)
I","(26:27)
I"consectetur"(28:39)
I"{tag1}"(40:55)
I"Sed"(57:60)
I"egestas"(61:68)
I"orci"(69:73)
I"eu"(74:76)
I"mauris"(77:83)
I"luctus"(84:90)
I"consequat"(91:100)

As you can see, the eighth token is the tag tag1, because its confidence is higher than that of the tokens adipiscing and elit.
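The idea of one tag winning over two plain tokens can be illustrated with a toy model. This is only a sketch and not Saga's algorithm: the real scoring lives inside SagaGraph.getRoute and is not reproduced here. ToyRoute, Edge, the confidence values and the scoring rule (confidence multiplied by the number of vertices a token spans, summed along the path) are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model only: none of these types belong to the Saga API. */
final class ToyRoute {

    /** A token connecting two vertex indices, with an invented confidence. */
    static final class Edge {
        final int from;
        final int to;
        final String text;
        final double confidence;

        Edge(int from, int to, String text, double confidence) {
            this.from = from;
            this.to = to;
            this.text = text;
            this.confidence = confidence;
        }
    }

    /**
     * Returns the token texts on the best path from vertex 0 to lastVertex.
     * Assumed score: confidence * (vertices spanned), summed over the path.
     * Every edge must go forward (from < to).
     */
    static List<String> bestRoute(List<Edge> edges, int lastVertex) {
        double[] best = new double[lastVertex + 1];
        Edge[] via = new Edge[lastVertex + 1];
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;

        // Edges only go forward, so visiting target vertices in increasing order is enough.
        for (int v = 1; v <= lastVertex; v++) {
            for (Edge e : edges) {
                if (e.to != v || best[e.from] == Double.NEGATIVE_INFINITY) {
                    continue;
                }
                double score = best[e.from] + e.confidence * (e.to - e.from);
                if (score > best[v]) {
                    best[v] = score;
                    via[v] = e;
                }
            }
        }

        // Walk back from the last vertex to collect the chosen tokens.
        List<String> route = new ArrayList<>();
        for (int v = lastVertex; v > 0 && via[v] != null; v = via[v].from) {
            route.add(via[v].text);
        }
        Collections.reverse(route);
        return route;
    }
}
```

With edges consectetur (0 to 1, 0.5), adipiscing (1 to 2, 0.5), elit (2 to 3, 0.5) and {tag1} (1 to 3, 0.9), the sketch picks consectetur followed by {tag1}, mirroring the shape of the output above.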

Important

Something important to keep in mind is that the advance method returns the first vertex of a text block. That text block can represent the entire text, but often it represents only a fraction of it, so we need to keep calling advance until we reach the end, which is signalled by the returned vertex being null.

package com.accenture.saga;

import com.accenture.saga.engine.*;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;
import com.accenture.saga.utilities.SagaGraph;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"indexName\": \"saga\"," +
                                "    \"nodeUrls\": [\"localhost:9200\"]," +
                                "    \"authentication\": \"none\"," +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();

        // Keep advancing until the engine returns null, meaning there are no more text blocks.
        // A while loop also covers the case where the very first advance() returns null.
        while (start != null) {

            Vertex last = engine.getAllVertex(start).getLast();

            // Print Graph

            GraphPrinter.printOnce(engine, start, last);

            // Get specific items

            Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

            List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

            items.forEach(item -> System.out.println(item.toStringForDebug()));

            // Get the highest route

            Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK");

            List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK));

            route.forEach(item -> System.out.println(item.toStringForDebug()));

            // Move on to the next text block

            start = engine.advance();

        }
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);

        System.exit(0);
    }
}

Working with the Graph (using Processing Units)

There are other ways to work with graphs in SAGA, some more difficult than others; which one fits depends on the use case and its needs. This section explains how to do it using Processing Units and how to get their results.

A Processing Unit facilitates the processing of text and documents. It significantly reduces processing time by keeping the units in memory and reusing them while processing text, instead of creating a new pipeline each time.

First, we need to build our Processing Unit. To do that, we use a builder that takes many parameters, from the responseType to timeouts. Besides those, it uses our server's resourceManager, tagManager, pipelineManager and the sagaEngineSettings.

A Processing Unit can be built on either TAGS or a PROCESSOR, and each option takes a different parameter: for TAGS we need a list containing all the tags for the unit, while PROCESSOR only needs a string with the name of the processor to use. The default is 'baseline-pipeline'. So, our ProcessingUnit builder would look like this:

ProcessingUnit.Builder unitBuilder;

if (params.containsKey(TAGS)) { // Generate builder with tags
  unitBuilder = new ProcessingUnit.Builder(
    server.resourceManager(),
    server.tagManager(),
    server.pipelineManager(),
    server.settings(),
    params.getStringList(TAGS));
} else { // Generate builder with processor, falling back to 'baseline-pipeline'
  // Using a plain else guarantees unitBuilder is always assigned before it is used below
  unitBuilder = new ProcessingUnit.Builder(
    server.resourceManager(),
    server.tagManager(),
    server.pipelineManager(),
    server.settings(),
    params.getString(PROCESSOR, "baseline-pipeline"));
}

unitBuilder.responseType(params.getString(TYPE, DEFAULT_TYPE))
      .combineRoutes(params.getBoolean(COMBINE_ROUTES, false))
      .createEngines(params.getBoolean(CREATE_ENGINE, false))
      .enginePoolSize(params.getInteger(ENGINE_POOL_SIZE, 1))
      .engineTimeout(params.getLong(ENGINE_TIMEOUT, 30000L))
      .excludeFlags(params.getStringList(EXCLUDE_FLAGS, "TEXT_BLOCK"))
      .includeFlags(params.getStringList(INCLUDE_FLAGS, "SEMANTIC_TAG"))
      .getOnlyExactTags(params.getBoolean(EXACT_TAGS, false))
      .includeMetadata(params.getBoolean(INCLUDE_METADATA, false))
      .maxCharsSizeToProcess(params.getInteger(MAX_CHAR_SIZE, 0))
      .readerMultiline(params.getBoolean(MULTILINE, true))
      .readerSplitRegex(params.getString(SPLIT_REGEX, DEFAULT_SPLIT_REGEX))
      .shouldRefreshExpiration(params.getBoolean(REFRESH_EXPIRATION, false))
      .includeStats(params.getBoolean(INCLUDE_STATS, false));
  • responseType ( type=string | default=json | required ) - Type of response for our Processing Unit.
  • combineRoutes ( type=boolean | default=false | required ) - Combine routes in our processing unit results.
  • createEngines ( type=boolean | default=false | required ) - Create more engines to process our unit.
  • enginePoolSize ( type=integer | default=1 | optional ) - Number of engines available in the pool.
  • engineTimeout ( type=long | default=30000L | optional ) - Timeout for our engines.
  • excludeFlags ( type=string array | default="TEXT_BLOCK" | required ) - List of all the flags to be excluded/ignored by our processing unit.
  • includeFlags ( type=string array | default="SEMANTIC_TAG" | required ) - List of all the flags to be included by our processing unit.
  • getOnlyExactTags ( type=boolean | default=false | optional ) - Enable processing only for the tags we include.
  • includeMetadata ( type=boolean | default=false | optional ) - Enable including and returning metadata in our processing unit.
  • maxCharsSizeToProcess ( type=integer | default=0 | optional ) - Character length limit for our processing unit.
  • readerMultiline ( type=boolean | default=true | required ) - Enable the multiline processing of texts and docs.
  • readerSplitRegex ( type=string | default="[\r\n]+" | required ) - Regex pattern used to split the text stream into text blocks in our processing unit.
  • shouldRefreshExpiration ( type=boolean | default=false | required ) - Enable refreshing our unit on expiration.
  • includeStats ( type=boolean | default=false | optional ) - Include stats in our processing unit results.
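The readerSplitRegex default can be tried out with plain java.util.regex. The sketch below assumes the reader applies the pattern the way Pattern.split does, which this tutorial does not confirm; BlockSplitDemo and blocks are hypothetical names. It also compares the default with the "[\r|\n]+" pattern used for the SimpleReaderStage earlier: inside a character class the | is a literal character, so that variant additionally splits on pipes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo class, not part of Saga. */
final class BlockSplitDemo {

    /** Splits the text into blocks with the given regex, as Pattern.split does. */
    static List<String> blocks(String regex, String text) {
        return Arrays.asList(Pattern.compile(regex).split(text));
    }

    public static void main(String[] args) {
        String text = "first|line\r\nsecond line\n\nthird line";

        // Default pattern: any run of CR/LF characters ends a block.
        // prints [first|line, second line, third line]
        System.out.println(blocks("[\\r\\n]+", text));

        // Pattern with '|' inside the class: the pipe is also treated as a separator.
        // prints [first, line, second line, third line]
        System.out.println(blocks("[\\r|\\n]+", text));
    }
}
```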


After configuring our builder, we call its build method to create the Processing Unit instance. We then keep that instance in memory so it can be used and reused according to the user's needs.

ProcessingUnit processingUnit = unitBuilder.build();
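One possible way to keep built units in memory is a small cache keyed by the tag list. This is a sketch under assumptions, not something Saga provides: UnitCache and unitFor are hypothetical names, and the factory function stands in for whatever code configures the builder and calls build().

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache, not part of Saga: keeps one built unit per distinct tag list. */
final class UnitCache<U> {
    private final Map<List<String>, U> units = new ConcurrentHashMap<>();
    private final Function<List<String>, U> factory;

    /** The factory is expected to build a unit for the given tags (for example via a builder). */
    UnitCache(Function<List<String>, U> factory) {
        this.factory = factory;
    }

    /** Returns the cached unit for these tags, building it only on the first request. */
    U unitFor(List<String> tags) {
        // Copy to an immutable list so later changes by the caller cannot alter the key.
        return units.computeIfAbsent(List.copyOf(tags), factory);
    }

    /** Number of units currently held in memory. */
    int size() {
        return units.size();
    }
}
```

Note that the key is order-sensitive: the lists [tag1, tag2] and [tag2, tag1] would produce two separate units in this sketch. Sorting the tags before the lookup would avoid that if order does not matter for the pipeline.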

Then we process our text. The call is simple and straightforward: we just ask our processingUnit to process the text.

processingUnit.process(text);

Finally, we get a response. That response differs according to the response type used for our Processing Unit. In the next section, we will cover those types.

Types of Response for Processing Units

Currently, the response types available for Processing Units are the following:

public enum ResponseType {
    ux,
    json,
    text,
    matchExtraction,
    route,
    custom
}

Note

The UX and Text response types are under revision at the moment.


JSON

This response type returns the highest-confidence route alongside two arrays: one with all the Lex_Items and another with all the Vertex objects found. The way to call our processing unit for this response is the same as before:

processingUnit.process(text);

Match Extraction

This response type returns an array with all the Semantic Tag matches. The call to our processing unit is the same as above:

processingUnit.process(text);

Route

This response type also returns the highest-confidence route, and nothing else. The call to our processing unit is the same:

processingUnit.process(text);

Custom

This response type does not return anything. Its use is a little more complicated because, instead of receiving just a document, it also receives a lambda function in which the user can customize the processing as needed.

processingUnit.process(text, lexItems -> {
        List<Object> sentence = processSentence(lexItems);

        if (sentence != null) {
          resultTokens.add(sentence);
        } else {
          logger.debug(text);
        }
      });
processingUnit.process(text);

In this example, the unit's process method receives a second parameter: an anonymous function that processes a sentence in a more specific way, thus customizing the processing unit.
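The callback style of the custom type can be tried without a Saga server using a toy stand-in. ToyUnit below is hypothetical and only mimics the shape of the call: the real ProcessingUnit hands Saga lex items to the lambda, while this sketch splits the text into blocks with the default split regex and passes plain whitespace-separated words.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

/** Toy stand-in for a processing unit; not part of the Saga API. */
final class ToyUnit {
    // Same pattern as the readerSplitRegex default documented above.
    private static final Pattern BLOCKS = Pattern.compile("[\\r\\n]+");

    /** Splits the text into blocks and passes each block's words to the handler; returns nothing. */
    void process(String text, Consumer<List<String>> handler) {
        for (String block : BLOCKS.split(text)) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                // The caller decides what to do with each block's tokens.
                handler.accept(Arrays.asList(trimmed.split("\\s+")));
            }
        }
    }
}
```

Because the method returns nothing, the caller collects results itself, for example by passing result::add where result is a List<List<String>> created beforehand. This matches the way resultTokens is filled inside the lambda in the example above.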