
This guide covers developing an application with the Saga Library embedded that connects to the Elasticsearch instance managed by the Saga Server.

Note

This tutorial assumes:

The reader is able to create a project with Maven support
The data Saga will use is managed through Saga's user interface
Java 11+ is installed on the machine

Step-by-step guide


Configure pom.xml

The pom.xml below shows a basic example of the minimum configuration for the project. The essential part of this configuration is the dependencies section, where we need two main libraries.

saga-library
The core library of Saga. This dependency includes the Engine, Stages, Tag Manager, Pipeline Manager and Resource Manager, which are all the parts necessary to use Saga in any application.

saga-elastic-provider
This dependency will grant us access to Elasticsearch as a provider for Saga, which means our Stages and Managers will be able to fetch the data directly from this provider.

Info
More providers will be available in the future, but to use Saga's full functionality we recommend using saga-elastic-provider.

Another important part of the configuration is the use of Java 11 for compiling the code and of UTF-8 as the encoding, as you can see in lines 36-38 (the configuration of the maven-compiler-plugin).

Code Block

language	xml
theme	RDark
firstline	0
title	pom.xml
linenumbers	true

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.accenture.saga</groupId>
    <artifactId>saga-howto</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <saga.version>1.0.0-SNAPSHOT</saga.version>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <executions>
                    <execution>
                        <id>compile</id>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>testCompile</id>
                        <phase>test-compile</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <source>11</source>
                    <target>11</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.accenture.saga.server.SagaServer</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <appendAssemblyId>false</appendAssemblyId>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- bind to the packaging phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>com.accenture.saga</groupId>
            <artifactId>saga-library</artifactId>
            <version>${saga.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>com.accenture.saga</groupId>
            <artifactId>saga-elastic-indexer</artifactId>
            <version>${saga.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>com.accenture.saga</groupId>
            <artifactId>saga-elastic-provider</artifactId>
            <version>${saga.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
</project>

Initializing The Saga Components

For starters, we will create a main class, which will hold a SagaEngine, ResourceManager, TagManager and PipelineManager.

Code Block

language	java
theme	Midnight
linenumbers	true

package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main() {
        
        
    }
    
    public static void main(String[] args) {
        Main _instance = new Main();
    }
}

First we create and configure the ResourceManager and add a provider to it. This configuration will be hard-coded, so we need the SagaJsonFactory class, which allows us to create SagaJson objects (the standard document of Saga) from text, files or readers.

The configuration we are going to use for the provider is the following:

Saga_json
"name": "saga-provider",
"type": "Elastic",
"scheme": "http",
"hostname": "localhost",
"port": 9200,
"timestamp": "updatedAt",
"exclude": [ "updatedAt", "createdAt" ]

The fields are described below from the top, starting with the common ones:

Parameter
summary The name we are going to use for the provider. It doesn't matter which name you use; we use "saga-provider"
name name
required true
Parameter
summary Indicates the type of provider we are using. Since we are using saga-elastic-provider, its type is "Elastic"
name type
required true

From here on, all the properties are specific to saga-elastic-provider:

Parameter
summary Scheme of the URL to Elasticsearch
default http
name scheme
Parameter
summary Name of the hosting server
default localhost
name hostname
Parameter
summary Port of Elasticsearch
default 9200
name port
type integer
Parameter
summary Name of the field reflecting any change done to the data
default updatedAt
name timestamp
Parameter
summary Names of the fields omitted (when possible) from the response of Elasticsearch
name exclude
type string array
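
As a rough illustration of how the connection fields fit together, the sketch below assembles an Elasticsearch base URL from scheme, hostname and port, falling back to the defaults listed above. ElasticUrlSketch and baseUrl are hypothetical names used only for this example; how saga-elastic-provider actually builds its connection is not covered in this guide.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of the Saga API.
final class ElasticUrlSketch {

    // Builds the base URL, using the documented defaults for any missing field.
    static URI baseUrl(Map<String, Object> config) {
        String scheme = (String) config.getOrDefault("scheme", "http");
        String hostname = (String) config.getOrDefault("hostname", "localhost");
        int port = (Integer) config.getOrDefault("port", 9200);
        return URI.create(scheme + "://" + hostname + ":" + port);
    }

    public static void main(String[] args) {
        // No fields given: every default applies.
        System.out.println(baseUrl(Map.of()));
        // Only the port is overridden (9201 is an arbitrary example value).
        System.out.println(baseUrl(Map.<String, Object>of("port", 9201)));
    }
}
```

With an empty configuration the three defaults combine into the same address used by the hard-coded provider configuration in this tutorial.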

Our code should look like this:

Code Block

language	java
theme	Midnight
linenumbers	true

public Main() throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"scheme\": \"http\"," +
                            "    \"hostname\": \"localhost\"," +
                            "    \"port\": 9200,\n" +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

}

Next we configure the TagManager, below the ResourceManager. Once again we will hard-code the configuration:

Saga_json
"resource": "saga-provider:saga_tags"

In the configuration above, saga-provider is the provider we added to the ResourceManager in the previous step, and the colon (:) separates the provider from the actual resource. Since we are using saga-elastic-provider, the resources are index names. Because we are connecting to indexes created by the Saga Server, every index name combines the solution's name (usually saga), an underscore (_) and the type of data the index holds. In this case the data type is tags, which gives the name saga_tags.
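
The naming convention can be sketched in plain Java as follows. ResourceRefSketch, parse and indexName are hypothetical names introduced for this illustration and are not Saga classes; the sketch only mirrors the "provider:resource" and "solution_type" patterns described above.

```java
// Hypothetical illustration of the "provider:resource" naming; not a Saga class.
final class ResourceRefSketch {

    final String provider;
    final String resource;

    private ResourceRefSketch(String provider, String resource) {
        this.provider = provider;
        this.resource = resource;
    }

    // Splits a reference such as "saga-provider:saga_tags" at the first colon.
    static ResourceRefSketch parse(String reference) {
        int colon = reference.indexOf(':');
        if (colon <= 0 || colon == reference.length() - 1) {
            throw new IllegalArgumentException("Expected provider:resource, got " + reference);
        }
        return new ResourceRefSketch(reference.substring(0, colon), reference.substring(colon + 1));
    }

    // Combines the solution name and the data type into an index name.
    static String indexName(String solution, String dataType) {
        return solution + "_" + dataType;
    }

    public static void main(String[] args) {
        ResourceRefSketch ref = parse("saga-provider:" + indexName("saga", "tags"));
        System.out.println("provider = " + ref.provider);
        System.out.println("resource = " + ref.resource);
    }
}
```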

The code should now look like this:

Code Block

language	java
theme	Midnight
linenumbers	true

public Main() throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"scheme\": \"http\"," +
                            "    \"hostname\": \"localhost\"," +
                            "    \"port\": 9200,\n" +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));
}

As you can see, the TagManager receives the ResourceManager as a parameter, which gives it access to the resource saga-provider:saga_tags.

In the same way, we configure the PipelineManager using the following configuration:

Saga_json
"resource": "saga-provider:saga_pipelines"

Once again, saga-provider refers to the provider added to the ResourceManager, and saga_pipelines combines the solution's name and the type of data, in this case pipelines.

Code Block

language	java
theme	Midnight
linenumbers	true

public Main() throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"scheme\": \"http\"," +
                            "    \"hostname\": \"localhost\"," +
                            "    \"port\": 9200,\n" +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

    pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));
}

Setting Up The Engine

Now we are at the fun part. Once the ResourceManager, TagManager and PipelineManager are all set up, we assign the ResourceManager and the TagManager to the engine.

Code Block

language	java
theme	Midnight
linenumbers	true

public Main() throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"scheme\": \"http\"," +
                            "    \"hostname\": \"localhost\"," +
                            "    \"port\": 9200,\n" +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

    pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

    engine = new SagaEngine();

    engine.setResourceManager(resourceManager);
    engine.setTagManager(tagManager);
}

Working With The Pipeline Manager

From here we have two options. The first is to let the PipelineManager build the pipeline for us from a set of tags that we provide. The second is to manually provide a complete pipeline configuration to the PipelineManager.

Pros & Cons

Automatic Pipeline

Pros

Uses the configuration set up through Saga's UI
Loads only what is necessary from the resource (tags, stages, ...)
Builds the pipeline based on tag dependencies
Can generate multiple, different pipelines
Each Recognizer can have a base pipeline as a dependency

Cons

Pipelines are not always the most efficient (...yet)
Each base pipeline must be configured manually (...for the moment)

Manual Pipeline

Pros

Complete control over the flow of the data

Cons

The configuration of every stage must be done manually
Relies strongly on the user's knowledge of each possible stage configuration
Lack of flexibility when changing to another pipeline

Tie (applies to both options)

Needs a stage of type TextBlockReader configured manually

Since the first option is the most flexible and makes use of the configuration stored in Elasticsearch, we will use that one.

Request A Pipeline

Before asking the PipelineManager for a pipeline, we need to provide a stage of type TextBlockReader. At the moment there is only one stage of that type, the SimpleReaderStage, which requires a splitRegex in its configuration, passed as a SagaJson object. Let's add that to the code.

Code Block

language	java
theme	Midnight
linenumbers	true

public Main() throws SagaException {

    resourceManager = new ResourceManager();

    resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                    "{" +
                            "    \"name\": \"saga-provider\"," +
                            "    \"type\": \"Elastic\"," +
                            "    \"scheme\": \"http\"," +
                            "    \"hostname\": \"localhost\"," +
                            "    \"port\": 9200,\n" +
                            "    \"timestamp\": \"updatedAt\"," +
                            "    \"exclude\": [" +
                            "      \"updatedAt\"," +
                            "      \"createdAt\"" +
                            "    ]" +
                            "}"
            )
    );

    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

    pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

    //The Fun Part

    engine = new SagaEngine();

    engine.setResourceManager(resourceManager);
    engine.setTagManager(tagManager);


    SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));
}

Parameter
summary Regular expression used to split the text, in this example at line breaks
name splitRegex
required true

The regex [\r\n]+ matches one or more line-break characters. Also note that the SimpleReaderStage receives the engine as its first parameter.

Tip
The regex [\r\n]+ is suitable for most of the text you will be processing
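
To see what this regex does to a text, the sketch below applies it with the standard String.split method. It assumes SimpleReaderStage uses the expression in a comparable way, which this guide does not confirm; SplitRegexSketch is a hypothetical name. The second call shows that the variant [\r|\n]+ also treats a literal | as a separator, because inside a character class the pipe is an ordinary character.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it only uses java.util.regex through String.split.
final class SplitRegexSketch {

    // Splits the text wherever the regex matches and returns the pieces in order.
    static List<String> split(String text, String regex) {
        return Arrays.asList(text.split(regex));
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond | line\n\nthird line";

        // Splits only at runs of carriage returns and line feeds.
        System.out.println(split(text, "[\\r\\n]+"));

        // The pipe inside the brackets is a literal character, so it also splits.
        System.out.println(split(text, "[\\r|\\n]+"));
    }
}
```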

Now we can ask the PipelineManager to build a pipeline for the tags. We still don't know where the tags come from, but we will fix that shortly. First we add the call that builds the pipeline.

Code Block

language	java
theme	RDark
linenumbers	true

pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

Now we add the tags as a parameter of the constructor. Our code should look like this:

Code Block

language	java
theme	Midnight
linenumbers	true

package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"scheme\": \"http\"," +
                                "    \"hostname\": \"localhost\"," +
                                "    \"port\": 9200,\n" +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        Main _instance = new Main(tags);
    }
}

Here we build a list with the tags "tag1", "tag2" and "tag3" and pass it as a parameter to the constructor, so the PipelineManager can use them to build the pipeline.

At this point, pipelineManager.buildPipelineFor uses the tags we specified to identify the stages that recognize them. From these stages it builds a dependency hierarchy, adding any other tags and stages needed to find the specified tags. Once it has identified all the necessary stages and tags, the PipelineManager adds them to the engine we provided, which means our engine is ready to receive text and build a graph.
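
The idea of expanding a set of requested tags into everything they depend on can be illustrated with a small depth-first resolution. This is a simplified sketch and not the PipelineManager's actual algorithm; TagDependencySketch, resolve, and the tag names in main are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of dependency resolution; not Saga's implementation.
final class TagDependencySketch {

    // Returns the requested tags plus all their dependencies, dependencies first.
    static List<String> resolve(Map<String, List<String>> dependsOn, List<String> requested) {
        Set<String> ordered = new LinkedHashSet<>();
        for (String tag : requested) {
            visit(tag, dependsOn, ordered, new HashSet<>());
        }
        return new ArrayList<>(ordered);
    }

    // Depth-first walk: a tag is added only after everything it depends on.
    private static void visit(String tag, Map<String, List<String>> dependsOn,
                              Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(tag)) {
            return; // already resolved through another path
        }
        if (!inProgress.add(tag)) {
            throw new IllegalStateException("Cyclic dependency at " + tag);
        }
        for (String dependency : dependsOn.getOrDefault(tag, List.of())) {
            visit(dependency, dependsOn, ordered, inProgress);
        }
        inProgress.remove(tag);
        ordered.add(tag);
    }

    public static void main(String[] args) {
        // Made-up dependencies: "address" needs "street" and "city", "street" needs "number".
        Map<String, List<String>> dependsOn = Map.of(
                "address", List.of("street", "city"),
                "street", List.of("number"));
        System.out.println(resolve(dependsOn, List.of("address")));
    }
}
```

Requesting a single tag is enough to pull in the tags it relies on, which mirrors why only the tags of interest need to be passed to buildPipelineFor.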

Process A Text

Let's also add a String parameter to the constructor for the text we want to process. This text is passed to the engine using the method reset.

Code Block

language	java
theme	Midnight

engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

This method accepts an InputStream so that we can specify UTF-8 as the encoding. To pass the text to Saga, we get the bytes of the text in UTF-8 encoding and create a ByteArrayInputStream from them.

Note
All processing done by Saga uses UTF-8 encoding
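
The conversion from a String to a UTF-8 InputStream can be tried in isolation with the standard library. The sketch below is independent of Saga (Utf8StreamSketch and its methods are hypothetical names) and shows that decoding with the same charset returns the original text, and that non-ASCII characters take more than one byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it uses only the Java standard library.
final class Utf8StreamSketch {

    // Wraps the UTF-8 bytes of the text in an in-memory stream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole stream and decodes it as UTF-8 again.
    static String readBack(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to keep the source ASCII
        System.out.println("characters: " + text.length());
        System.out.println("UTF-8 bytes: " + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("round trip ok: " + text.equals(readBack(toUtf8Stream(text))));
    }
}
```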

So far we have only told Saga what the text is; now we need to process it. For this we use the method advance, which returns a Vertex: the first Vertex of the text block.

Code Block

language	java
theme	Midnight

Vertex start = engine.advance();

Currently our code should look like this

Code Block

language	java
theme	Midnight
linenumbers	true

package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.engine.Vertex;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"scheme\": \"http\"," +
                                "    \"hostname\": \"localhost\"," +
                                "    \"port\": 9200,\n" +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager,  SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);
    }
}

Working With The Graph

Before we start with this section, one clarification: the graph has a wide range of applications, which vary depending on the content, the tags, the stages and the user's final goal. For this example we will cover three of the most basic cases: printing the graph, searching for a specific type of tag, and getting the highest-value route.

For these examples we will assume that tag1 has an Entity Recognizer identifying "adipiscing elit", and that the baseline pipeline is the following:

Saga_json

"stages": [
        {
            "language": "en",
            "type": "TextBreakerStage"
        },
        {
            "requiredFlags": [
                "SENTENCE"
            ],
            "type": "WhitespaceTokenizerStage"
        },
        {
            "type": "StopWordsStage"
        },
        {
            "type": "CaseAnalysis"
        },
        {
            "type": "CharChangeSplitter"
        }
    ]

Printing The Graph

Printing the graph is the most basic of all the cases, since it gives us a more graphical picture of how Saga processed the text.

For this case we will use the class GraphPrinter and its static method printOnce, which needs the engine, the start vertex and the last vertex. In other words, we print a section of the graph, which in this case is the entire text block.

We already have the engine and the start vertex. For the last vertex, the engine has a method getAllVertex, which returns a LinkedList of all the vertices processed by the engine, so we can take the last element (vertex) from that list.

Code Block

language	java
theme	Midnight

Vertex start = engine.advance();

Vertex last = engine.getAllVertex().getLast();

// Print Graph

GraphPrinter.printOnce(engine, start, last);

This will show a text version of the graph in the console, much like this one

Code Block

theme	FadeToGrey

INFO | {GraphPrinter} |  V---------------------------------[Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.]---------------------------------V 
INFO | {GraphPrinter} |  ^-----------------[Lorem ipsum dolor sit amet, consectetur adipiscing elit]------------------V-------------[Sed egestas orci eu mauris luctus consequat]-------------V-[]-^ 
INFO | {GraphPrinter} |  ^-[Lorem]-V-[ipsum]-V-[dolor]-V-[sit]-V---[amet,]----V-[consectetur]-V-[adipiscing]-V-[elit]-^-[Sed]-V-[egestas]-V-[orci]-V-[eu]-V-[mauris]-V-[luctus]-V-[consequat]-^ 
INFO | {GraphPrinter} |  ^-[lorem]-^                           ^-[amet]-V-[,]-^               ^-------[{tag1}]--------^-[sed]-^

Here V represents the first appearance of a vertex, ^ is a connection to an existing vertex, everything between [ ] is a token, and a token whose text appears between { } is a semantic tag.

Tip
If you want to reduce the number of vertices returned by getAllVertex, you can also specify only the first vertex, or the first and the last vertex

Searching For a Tag

In this case we will search for all the tokens that are semantic tags.

First obstacle: how do we identify a semantic tag?

That's easy: all the tokens that are semantic tags have a flag called SEMANTIC_TAG, so we first need to get that flag. Flags are defined by the engine, so we ask the engine for it.

Code Block

language	java
theme	Midnight

Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

We define the flag object (we recommend naming the object after the flag, as a good practice)
From the engine we call the method getFlagForRead, which needs the type of flag and the name of the flag
We indicate the type of flag, LEX_ITEM, and the name of the flag, SEMANTIC_TAG

Tip
There are 2 types of flags, LEX_ITEM and VERTEX. They are values of the enum LexObjectType, which is declared in the class LexObject

Tip
You can find out which flags exist by looking at each Stage and checking which flags it sets on the items it creates.

Second obstacle: how do we get the tokens?

This can also be done in one line:

Code Block

language	java
theme	Midnight

List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

We use the class SagaGraph and call the method getTokens
We specify the start and end vertices between which we want to find the tokens
We add a filter function that checks whether the item has the flag SEMANTIC_TAG
The result is returned as a list of LexItems
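
The filter argument is an ordinary predicate over items. The standalone sketch below shows the same pattern with plain Java collections; FlagFilterSketch, Token and the flag name "TOKEN" are hypothetical stand-ins and do not reflect how LexItem or SagaGraph are implemented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-ins for illustration; these are not Saga classes.
final class FlagFilterSketch {

    // Minimal token that carries its text and a set of flag names.
    static final class Token {
        final String text;
        final Set<String> flags;

        Token(String text, String... flags) {
            this.text = text;
            this.flags = new HashSet<>(Arrays.asList(flags));
        }

        boolean hasFlag(String flag) {
            return flags.contains(flag);
        }
    }

    // Keeps only the tokens accepted by the predicate, preserving their order.
    static List<Token> filter(List<Token> tokens, Predicate<Token> keep) {
        return tokens.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("adipiscing", "TOKEN"),
                new Token("elit", "TOKEN"),
                new Token("{tag1}", "SEMANTIC_TAG"));

        // Same shape as the lambda passed to getTokens in the tutorial.
        for (Token token : filter(tokens, t -> t.hasFlag("SEMANTIC_TAG"))) {
            System.out.println(token.text);
        }
    }
}
```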

Now our code should look like this

Code Block

language	java
theme	Midnight

package com.accenture.saga;

import com.accenture.saga.engine.*;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;
import com.accenture.saga.utilities.SagaGraph;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"scheme\": \"http\"," +
                                "    \"hostname\": \"localhost\"," +
                                "    \"port\": 9200,\n" +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();

        Vertex last = engine.getAllVertex(start).getLast();

        // Print Graph

        GraphPrinter.printOnce(engine, start, last);

        // Get specific items

        Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

        List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

        items.forEach(item -> System.out.println(item.toStringForDebug()));
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);

        System.exit(0);
    }
}

A println was added at the end to see the token we got:

Code Block
I"{tag1}"(40:55)

I indicates an Item, the text inside the quotes is the text of the item, and the numbers in parentheses are the character positions of the token (start:end)
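
The positions can be checked against the input text with a plain substring. The sketch below assumes the end position is exclusive, which matches this example (40:55 covers the 15 characters of "adipiscing elit"); OffsetSketch and covered are hypothetical names.

```java
// Hypothetical demo class for checking character offsets against the input text.
final class OffsetSketch {

    // Returns the part of the text between start (inclusive) and end (exclusive).
    static String covered(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                + "Sed egestas orci eu mauris luctus consequat.";
        // Offsets taken from the debug output of the {tag1} item.
        System.out.println(covered(text, 40, 55));
    }
}
```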

Getting The Highest Route

The highest route is the interpretation with the highest confidence, among other factors (e.g. largest tokens, more complex patterns, ...), and it is most likely the interpretation we want. Let's start with the code and then explain it.

Code Block

language	java
theme	Midnight

Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK");

List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK));

Normally we want to ignore any token with the TEXT_BLOCK flag
We again use the class SagaGraph, and call the function getRoute
We pass the start and end vertices
We add a filter function that checks that the item doesn't have the TEXT_BLOCK flag

Code Block

package com.accenture.saga;

import com.accenture.saga.engine.*;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;
import com.accenture.saga.utilities.SagaGraph;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"scheme\": \"http\"," +
                                "    \"hostname\": \"localhost\"," +
                                "    \"port\": 9200,\n" +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();

        Vertex last = engine.getAllVertex(start).getLast();

        // Print Graph

        GraphPrinter.printOnce(engine, start, last);

        // Get specific items

        Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

        List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

        items.forEach(item -> System.out.println(item.toStringForDebug()));

        // Get the highest route

        Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK");

        List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK));

        route.forEach(item -> System.out.println(item.toStringForDebug()));
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);

        System.exit(0);
    }
}

Once again, a println was added at the end to see the tokens we got from the route:

Code Block

theme	FadeToGrey

I"Lorem"(0:5)
I"ipsum"(6:11)
I"dolor"(12:17)
I"sit"(18:21)
I"amet"(22:26)
I","(26:27)
I"consectetur"(28:39)
I"{tag1}"(40:55)
I"Sed"(57:60)
I"egestas"(61:68)
I"orci"(69:73)
I"eu"(74:76)
I"mauris"(77:83)
I"luctus"(84:90)
I"consequat"(91:100)
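A common next step is to use the offsets of the route to rewrite the original text, for example replacing the matched words with their tag. The sketch below assumes we already copied the item text and offsets into a small holder class. SpanRewriter and Span are hypothetical names, and how to read these values from a LexItem is not shown here. The spans are assumed to be sorted and non-overlapping.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Saga): rebuilds a text using the items of a route. */
class SpanRewriter {

    /** Item text plus its (start:end) character offsets, as printed in the output above. */
    static final class Span {
        final String text;
        final int start;
        final int end;

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Replaces each span's range in the original with the span text.
     * Characters not covered by any span (spaces, punctuation) are copied unchanged.
     * Assumes spans are sorted by start and do not overlap.
     */
    static String rewrite(String original, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span span : spans) {
            out.append(original, cursor, span.start); // gap before the span
            out.append(span.text);                    // item text, e.g. the tag
            cursor = span.end;
        }
        out.append(original.substring(cursor));       // whatever follows the last span
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                + "Sed egestas orci eu mauris luctus consequat.";
        List<Span> route = Arrays.asList(
                new Span("consectetur", 28, 39),
                new Span("{tag1}", 40, 55),
                new Span("Sed", 57, 60));
        // prints: Lorem ipsum dolor sit amet, consectetur {tag1}. Sed egestas orci eu mauris luctus consequat.
        System.out.println(rewrite(text, route));
    }
}
```

Note that the range 40:55 covers the two words "adipiscing elit", which is why they are replaced by a single {tag1} item.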

Important Note

Keep in mind that the advance function returns the first vertex of a text block. That text block can represent the entire text, but often it represents only a fraction of it, so we need to keep calling advance until we reach the end, that is, until the returned vertex is null.

Code Block

package com.accenture.saga;

import com.accenture.saga.engine.*;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;
import com.accenture.saga.utilities.SagaGraph;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {

    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {

        resourceManager = new ResourceManager();

        resourceManager.registerProvider(
                SagaJsonFactory.getInstance(
                        "{" +
                                "    \"name\": \"saga-provider\"," +
                                "    \"type\": \"Elastic\"," +
                                "    \"scheme\": \"http\"," +
                                "    \"hostname\": \"localhost\"," +
                                "    \"port\": 9200,\n" +
                                "    \"timestamp\": \"updatedAt\"," +
                                "    \"exclude\": [" +
                                "      \"updatedAt\"," +
                                "      \"createdAt\"" +
                                "    ]" +
                                "}"
                )
        );

        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));

        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));

        engine = new SagaEngine();

        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);

        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}"));

        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);

        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));

        Vertex start = engine.advance();

        // advance() returns null when there are no more text blocks,
        // so check before using the vertex (this also covers an empty input)
        while (start != null) {

            Vertex last = engine.getAllVertex(start).getLast();

            // Print Graph

            GraphPrinter.printOnce(engine, start, last);

            // Get specific items

            Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG");

            List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG));

            items.forEach(item -> System.out.println(item.toStringForDebug()));

            // Get the highest route

            Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK");

            List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK));

            route.forEach(item -> System.out.println(item.toStringForDebug()));

            // Move on to the next text block

            start = engine.advance();

        }
    }

    public static void main(String[] args) throws SagaException {

        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));

        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";

        Main _instance = new Main(text, tags);

        System.exit(0);
    }
}
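The loop pattern itself can be tried without a Saga server. The sketch below uses a hypothetical stand-in, BlockSource, that hands out one text block per call and then null. It is not the Saga engine: splitting the text with the same regex that was passed as splitRegex is only an assumption about how text blocks come about.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration of the "call advance until null" pattern with a stand-in engine. */
class AdvanceLoopSketch {

    /** Hypothetical stand-in (not Saga): returns one text block per call, then null. */
    static final class BlockSource {
        private final String[] blocks;
        private int next = 0;

        BlockSource(String text) {
            // Same regex as the splitRegex used in this tutorial (assumed behavior).
            // "".split(...) would yield one empty block, so empty input is handled separately.
            this.blocks = text.isEmpty() ? new String[0] : text.split("[\\r|\\n]+");
        }

        String advance() {
            return next < blocks.length ? blocks[next++] : null;
        }
    }

    /** Collects every block by calling advance() until it returns null. */
    static List<String> readAll(String text) {
        BlockSource source = new BlockSource(text);
        List<String> seen = new ArrayList<>();
        for (String block = source.advance(); block != null; block = source.advance()) {
            seen.add(block); // in the tutorial, this is where the graph is printed and searched
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints: [first line, second line]
        System.out.println(readAll("first line\nsecond line"));
    }
}
```

Checking for null before the loop body means that a text with several blocks is processed completely, and a text that produces no block at all is simply skipped.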

