For use when developing an application with Saga Library embedded and connecting to the ElasticSearch managed by the Saga Server.
Note | ||
---|---|---|
| ||
This tutorial assumes:
|
A basic example of the minimum configuration for the project's pom.xml can be found below. The essential section of this configuration is the dependencies section, where we need two main libraries: saga-library and saga-elastic-provider.
saga-library | saga-elastic-provider | ||
---|---|---|---|
The core library of Saga, this dependency includes the Engine, Stages, Tag Manager, Pipeline Manager and Resource Manager which are all of the parts necessary to use Saga in any application. | This dependency will grant us access to ElasticSearch as a provider for Saga, which means our Stages and Managers will be able to fetch the data directly from this provider.
|
Other considerations of note are the use of Java 11 for the compilation of the code and the UTF-8 encoding, as shown in lines 36-38.
Code Block | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.accenture.saga</groupId> <artifactId>saga-howto</artifactId> <version>1.0.0-SNAPSHOT</version> <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <saga.version>1.0.0-SNAPSHOT</saga.version> </properties> <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <version>3.8.0</version> <executions> <execution> <id>compile</id> <phase>compile</phase> <goals> <goal>compile</goal> </goals> </execution> <execution> <id>testCompile</id> <phase>test-compile</phase> <goals> <goal>testCompile</goal> </goals> </execution> </executions> <configuration> <source>11</source> <target>11</target> <encoding>${project.build.sourceEncoding}</encoding> </configuration> </plugin> <plugin> <artifactId>maven-assembly-plugin</artifactId> <version>3.1.0</version> <configuration> <archive> <manifest> <mainClass>com.accenture.saga.server.SagaServer</mainClass> </manifest> </archive> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> <appendAssemblyId>false</appendAssemblyId> </configuration> <executions> <execution> <id>make-assembly</id> <!-- this is used for inheritance merges --> <phase>package</phase> <!-- bind to the packaging phase --> <goals> <goal>single</goal> </goals> </execution> </executions> </plugin> </plugins> </build> <dependencies> <dependency> <groupId>com.accenture.saga</groupId> <artifactId>saga-library</artifactId> <version>${saga.version}</version> <scope>compile</scope> </dependency> <dependency> <groupId>com.accenture.saga</groupId> <artifactId>saga-elastic-indexer</artifactId> <version>${saga.version}</version> <scope>compile</scope> </dependency> 
<dependency> <groupId>com.accenture.saga</groupId> <artifactId>saga-elastic-provider</artifactId> <version>${saga.version}</version> <scope>compile</scope> </dependency> </dependencies> </project> |
1. Begin by creating a main class, which will hold a SagaEngine, ResourceManager, TagManager and PipelineManager.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
package com.accenture.saga; import com.accenture.saga.engine.PipelineManager; import com.accenture.saga.engine.SagaEngine; import com.accenture.saga.resourcemgr.ResourceManager; import com.accenture.saga.tags.TagManager; public class Main { SagaEngine engine; ResourceManager resourceManager; TagManager tagManager; PipelineManager pipelineManager; /** * Constructor */ public Main() { } public static void main(String[] args) { Main _instance = new Main(); } } |
2. Create and configure the ResourceManager.
3. Add a provider to the ResourceManager. (This configuration will be hard-coded.)
4. Add the SagaJsonFactory class, which allows us to create SagaJson objects (the standard document of Saga) from text, files or readers.
The configuration we are going to use for the provider is the following:
Saga_json |
---|
"name": "saga-provider", "type": "Elastic", "scheme": "http", "hostnamesAndPorts": ["localhost:9200"], "timestamp": "updatedAt", "exclude": [ "updatedAt", "createdAt" ] |
Each field is described below from the top, starting with the common ones:
Parameter | ||||||
---|---|---|---|---|---|---|
|
Parameter | ||||||
---|---|---|---|---|---|---|
|
From here on, all of the properties are specific to saga-elastic-indexer:
Parameter | ||||||
---|---|---|---|---|---|---|
|
Parameter | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||
|
Parameter | ||||||
---|---|---|---|---|---|---|
|
Parameter | ||||||
---|---|---|---|---|---|---|
|
Our code should look like this:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); } |
5. Next, configure the TagManager below the ResourceManager. (Hard-code the configuration for this one as well.)
Saga_json |
---|
"resource": "saga-provider:saga_tags" |
In the configuration above, saga-provider represents the provider we added to the ResourceManager in the previous configuration. The colon (:) marks the division between the provider and the actual resource.
Since we are using a saga-elastic-provider, the resources will be index names.
Since we are connecting to a Saga index created by the Saga Server, each index name is a combination of the solution's name (usually saga), an underscore (_) and the type of data the index holds (in this case, tags, forming the name saga_tags).
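The naming rule above can be sketched in plain Java. This is only an illustration of the provider:resource convention described in this tutorial. The class ResourceReference and its methods are hypothetical helpers and are not part of the Saga API.

```java
public class ResourceReference {
    final String provider;
    final String resource;

    ResourceReference(String provider, String resource) {
        this.provider = provider;
        this.resource = resource;
    }

    // Builds an index name as <solution>_<dataType>, e.g. saga_tags.
    static String indexName(String solution, String dataType) {
        return solution + "_" + dataType;
    }

    // Splits "<provider>:<resource>" at the first colon.
    static ResourceReference parse(String reference) {
        int colon = reference.indexOf(':');
        if (colon <= 0 || colon == reference.length() - 1) {
            throw new IllegalArgumentException("Expected <provider>:<resource> but got: " + reference);
        }
        return new ResourceReference(reference.substring(0, colon), reference.substring(colon + 1));
    }

    @Override
    public String toString() {
        return provider + ":" + resource;
    }

    public static void main(String[] args) {
        // Compose the reference used for the TagManager in this tutorial.
        ResourceReference tags = new ResourceReference("saga-provider", indexName("saga", "tags"));
        System.out.println(tags);

        // Take apart the reference used for the PipelineManager.
        ResourceReference pipelines = parse("saga-provider:saga_pipelines");
        System.out.println(pipelines.provider + " / " + pipelines.resource);
    }
}
```

If the solution in the Saga Server has a name other than saga, only the first argument of indexName would change.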
The code should now look like this:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); } |
As you can see, the TagManager receives the ResourceManager as a parameter, which grants us access to the resource saga-provider:saga_tags.
6. Similarly, configure the PipelineManager using the following configuration:
Saga_json |
---|
"resource": "saga-provider:saga_pipelines" |
Once again, saga-provider references the provider added to the ResourceManager, and saga_pipelines is a combination of the solution's name and the type of data (in this case, pipelines).
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); } |
After we have set up the Resource, Tag and Pipeline Managers, we can assign the ResourceManager and the TagManager to the Engine.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); } |
We have two options:*
Option 1: Let the PipelineManager build the pipeline for us, using a set of tags (that we will provide).
Option 2: Manually provide a complete pipeline configuration to the PipelineManager.
Option 1: Automatic Pipeline | Option 2: Manual Pipeline | |
---|---|---|
Pros |
|
| |
Cons |
|
| |
Tie |
|
|
|
|
*Since Option 1 is more flexible and makes use of the configuration in ElasticSearch, we will use that one.
Before asking the PipelineManager for a pipeline, we need to provide a stage of type TextBlockReader. At the moment, we only have one stage of that type, the SimpleReaderStage, which requires a splitRegex in its configuration, passed as a SagaJson object.
1. Add that to the code.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); //The Fun Part engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}")); } |
Parameter | ||||||
---|---|---|---|---|---|---|
|
With the regex [\r\n]+ we indicate the characters that signal a line break. Also note that the SimpleReaderStage receives the engine as the first parameter.
Tip |
---|
The regex [\r\n]+ is the standard choice for almost all of the text you will be processing. |
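To see what this regex does on its own, the following sketch splits a text with java.util.regex. It only illustrates the pattern [\r\n]+ and assumes nothing about how SimpleReaderStage applies it internally. The class LineSplitDemo and the method splitBlocks are hypothetical names used for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LineSplitDemo {
    // One or more consecutive carriage-return or line-feed characters.
    private static final Pattern LINE_BREAKS = Pattern.compile("[\\r\\n]+");

    // Splits the text wherever a run of line-break characters occurs.
    static List<String> splitBlocks(String text) {
        return Arrays.asList(LINE_BREAKS.split(text));
    }

    public static void main(String[] args) {
        String text = "First line.\r\nSecond line.\n\nThird line.";
        for (String block : splitBlocks(text)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Because the pattern ends in +, a Windows line ending (\r\n) or several blank lines in a row count as a single separator, so no empty blocks appear between the lines.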
Now we can ask the PipelineManager to build a pipeline for the tags. We still don't know where the tags come from, but we will fix that shortly.
2. Add the call that builds the pipeline.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage); |
3. Add the tags as a parameter of the constructor. Our code should look like this:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
package com.accenture.saga; import com.accenture.saga.engine.PipelineManager; import com.accenture.saga.engine.SagaEngine; import com.accenture.saga.engine.stages.SimpleReaderStage; import com.accenture.saga.exception.SagaException; import com.accenture.saga.json.SagaJsonFactory; import com.accenture.saga.resourcemgr.ResourceManager; import com.accenture.saga.tags.TagManager; import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class Main { SagaEngine engine; ResourceManager resourceManager; TagManager tagManager; PipelineManager pipelineManager; /** * Constructor */ public Main(List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}")); pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage); } public static void main(String[] args) throws SagaException { List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3")); Main _instance = new Main(tags); } } |
Here we built a list with the tags "tag1", "tag2", and "tag3", which we passed as a parameter of the constructor so that the PipelineManager can use them to build the pipeline.
Our Engine is now ready to receive text and build a graph.
Let's also add a String parameter to the constructor for the text we want to process. This text will be added to the Engine using the method reset.
Code Block | ||||
---|---|---|---|---|
| ||||
engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))); |
This method accepts an InputStream so that we can specify the UTF-8 encoding. To pass the text to Saga, we get the bytes of the text in UTF-8 encoding and create a ByteArrayInputStream with them.
Note |
---|
All of the processing done by Saga uses the UTF-8 encoding. |
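The conversion from a String to a UTF-8 InputStream uses only the Java standard library, so it can be tried in isolation. The sketch below is independent of Saga. Utf8StreamDemo, toUtf8Stream and readUtf8 are hypothetical names, and readAllBytes requires Java 9 or later, which fits the Java 11 target of this project.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8StreamDemo {
    // Encodes the text as UTF-8 and wraps the bytes in an in-memory stream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole stream and decodes it as UTF-8.
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // 4 characters; the last one needs 2 bytes in UTF-8
        System.out.println("characters: " + text.length());
        System.out.println("bytes: " + text.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("round trip ok: " + text.equals(readUtf8(toUtf8Stream(text))));
    }
}
```

Naming the charset explicitly matters because getBytes() without an argument uses the platform default, which may not be UTF-8 on every machine.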
So far we have only told Saga what the text is. Now we need to process it. For this, use the method advance, which returns a Vertex. This will be the first Vertex of the text block.
Code Block | ||||
---|---|---|---|---|
| ||||
Vertex start = engine.advance(); |
Currently our code should look like this:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
package com.accenture.saga; import com.accenture.saga.engine.PipelineManager; import com.accenture.saga.engine.SagaEngine; import com.accenture.saga.engine.Vertex; import com.accenture.saga.engine.stages.SimpleReaderStage; import com.accenture.saga.exception.SagaException; import com.accenture.saga.json.SagaJsonFactory; import com.accenture.saga.resourcemgr.ResourceManager; import com.accenture.saga.tags.TagManager; import java.io.ByteArrayInputStream; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class Main { SagaEngine engine; ResourceManager resourceManager; TagManager tagManager; PipelineManager pipelineManager; /** * Constructor */ public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}")); pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage); engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))); Vertex start = engine.advance(); } public static void main(String[] args) throws
SagaException { List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3")); String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat."; Main _instance = new Main(text, tags); } } |
Tip |
---|
For more specific information on how to navigate through the graph, please visit Understanding Interpretation Graphs |
Before we start this section, a clarification: the graph has an extensive range of applications that varies depending on the content, the tags, the stages and the user's final goal. For this example we will cover three of the most basic cases: printing the graph, searching for a specific type of tag, and getting the highest-value route.
For these examples we will assume that tag1 has an Entity Recognizer identifying "adipiscing elit", and that the baseline-pipeline is the following:
Saga_json |
---|
"stages": [ { "language": "en", "type": "TextBreakerStage" }, { "requiredFlags": [ "SENTENCE" ], "type": "WhitespaceTokenizerStage" }, { "type": "StopWordsStage" }, { "type": "CaseAnalysis" }, { "type": "CharChangeSplitter" } ] |
Printing the graph is the most basic of all the cases, since it gives us a more graphical picture of how Saga processed the text.
For this case we will use the class GraphPrinter and its static method printOnce, which needs the engine, the start vertex and the last vertex. In effect we print a section of the graph, which in this case is the entire text block.
We have the engine and the start vertex. For the last vertex, the engine has a method getAllVertex, which returns a LinkedList of all the vertices processed by the engine, so we can get the last element (vertex) from the list.
Code Block | ||||
---|---|---|---|---|
| ||||
Vertex start = engine.advance(); Vertex last = engine.getAllVertex().getLast(); // Print Graph GraphPrinter.printOnce(engine, start, last); |
This will show a text version of the graph in the console, much like this one:
Code Block | ||||
---|---|---|---|---|
| ||||
INFO | {GraphPrinter} | V---------------------------------[Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.]---------------------------------V INFO | {GraphPrinter} | ^-----------------[Lorem ipsum dolor sit amet, consectetur adipiscing elit]------------------V-------------[Sed egestas orci eu mauris luctus consequat]-------------V-[]-^ INFO | {GraphPrinter} | ^-[Lorem]-V-[ipsum]-V-[dolor]-V-[sit]-V---[amet,]----V-[consectetur]-V-[adipiscing]-V-[elit]-^-[Sed]-V-[egestas]-V-[orci]-V-[eu]-V-[mauris]-V-[luctus]-V-[consequat]-^ INFO | {GraphPrinter} | ^-[lorem]-^ ^-[amet]-V-[,]-^ ^-------[{tag1}]--------^-[sed]-^ |
Here V represents the first appearance of a vertex, ^ is a connection to an existing vertex, everything between [ ] is a token, and a token whose text appears between { } is a semantic tag.
Tip |
---|
If you want to reduce the vertices returned by getAllVertex, you can also specify only the first vertex, or the first and the last vertex. |
In this case we will search for all the tokens that are semantic tags.
First obstacle: how do we identify a semantic tag?
All the tokens that are semantic tags have a flag called SEMANTIC_TAG, so we first need to get that flag. Flags are defined by the engine, so we ask the engine for it.
Code Block | ||||
---|---|---|---|---|
| ||||
Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG"); |
Tip |
---|
There are two types of flags, LEX_ITEM and VERTEX. They are values of the enum LexObjectType, which is defined in the class LexObject. |
Tip |
---|
You can find the available flags by looking at each Stage and checking which flags it sets for the items it creates. |
Second obstacle: how do we get the tokens?
This can also be done in one line:
Code Block | ||||
---|---|---|---|---|
| ||||
List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG)); |
Now our code should look like this:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
package com.accenture.saga; import com.accenture.saga.engine.*; import com.accenture.saga.engine.stages.SimpleReaderStage; import com.accenture.saga.exception.SagaException; import com.accenture.saga.json.SagaJsonFactory; import com.accenture.saga.resourcemgr.ResourceManager; import com.accenture.saga.tags.TagManager; import com.accenture.saga.utilities.SagaGraph; import java.io.ByteArrayInputStream; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class Main { SagaEngine engine; ResourceManager resourceManager; TagManager tagManager; PipelineManager pipelineManager; /** * Constructor */ public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}")); pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage); engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))); Vertex start = engine.advance(); Vertex last = engine.getAllVertex(start).getLast(); // Print Graph
GraphPrinter.printOnce(engine, start, last); // Get specific items Flag SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG"); List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG)); items.forEach(item -> System.out.println(item.toStringForDebug())); } public static void main(String[] args) throws SagaException { List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3")); String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat."; Main _instance = new Main(text, tags); System.exit(0); } } |
A println was added at the end to see the token we got:
Code Block | ||||
---|---|---|---|---|
| ||||
I"{tag1}"(40:55) |
I indicates Item, the text inside the quotes is the text of the item, and the numbers between parentheses are the position of the token in characters (start:end).
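As a small illustration of that format, the sketch below parses one debug string and uses the start:end positions to recover the span of the original text that the item covers. DebugItemParser and ParsedItem are hypothetical helpers written for this example. The regular expression assumes the debug string always has exactly the shape shown above, which is not confirmed beyond this sample output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DebugItemParser {
    // Matches I"<text>"(<start>:<end>), the shape of the sample output above.
    private static final Pattern ITEM = Pattern.compile("I\"(.*)\"\\((\\d+):(\\d+)\\)");

    static final class ParsedItem {
        final String text;
        final int start;
        final int end;

        ParsedItem(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    static ParsedItem parse(String debug) {
        Matcher m = ITEM.matcher(debug);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an item debug string: " + debug);
        }
        return new ParsedItem(m.group(1), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
    }

    public static void main(String[] args) {
        String source = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
                + "Sed egestas orci eu mauris luctus consequat.";
        ParsedItem item = parse("I\"{tag1}\"(40:55)");
        // The positions index into the original text: start inclusive, end exclusive.
        System.out.println(item.text + " covers \"" + source.substring(item.start, item.end) + "\"");
        // prints: {tag1} covers "adipiscing elit"
    }
}
```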
The highest route is the interpretation with the highest confidence, among other factors (e.g. largest tokens, more complex patterns, ...), and is most likely the interpretation we want. Let's start with the code and then the explanation:
Code Block | ||||
---|---|---|---|---|
| ||||
Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK"); List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK)); |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
package com.accenture.saga; import com.accenture.saga.engine.*; import com.accenture.saga.engine.stages.SimpleReaderStage; import com.accenture.saga.exception.SagaException; import com.accenture.saga.json.SagaJsonFactory; import com.accenture.saga.resourcemgr.ResourceManager; import com.accenture.saga.tags.TagManager; import com.accenture.saga.utilities.SagaGraph; import java.io.ByteArrayInputStream; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class Main { SagaEngine engine; ResourceManager resourceManager; TagManager tagManager; PipelineManager pipelineManager; /** * Constructor */ public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}")); pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage); engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))); Vertex start = engine.advance(); Vertex last = engine.getAllVertex(start).getLast(); // Print Graph GraphPrinter.printOnce(engine, start, last); // Get specific items Flag
SEMANTIC_TAG = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG"); List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG)); items.forEach(item -> System.out.println(item.toStringForDebug())); // Get the highest route Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK"); List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK)); route.forEach(item -> System.out.println(item.toStringForDebug())); } public static void main(String[] args) throws SagaException { List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3")); String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat."; Main _instance = new Main(text, tags); System.exit(0); } } |
Once again, a println was added at the end to see the tokens we got from the route:
Code Block | ||||
---|---|---|---|---|
| ||||
I"Lorem"(0:5) I"ipsum"(6:11) I"dolor"(12:17) I"sit"(18:21) I"amet"(22:26) I","(26:27) I"consectetur"(28:39) I"{tag1}"(40:55) I"Sed"(57:60) I"egestas"(61:68) I"orci"(69:73) I"eu"(74:76) I"mauris"(77:83) I"luctus"(84:90) I"consequat"(91:100) |
As you can see, the eighth token is actually the tag tag1, since its confidence is higher than that of the tokens adipiscing and elit.
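To build some intuition for why the tag wins over the two plain tokens, here is a toy model of picking a route through a small graph. It is not the algorithm used by SagaGraph.getRoute, whose scoring is not described in this tutorial. The class BestRouteSketch, the rule of multiplying confidences along a route, and the confidence values are all assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BestRouteSketch {
    // A token connecting two vertices, with a confidence in the range (0, 1].
    static final class Edge {
        final int from;
        final int to;
        final String text;
        final double confidence;

        Edge(int from, int to, String text, double confidence) {
            this.from = from;
            this.to = to;
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Returns the token texts of the route from vertex 0 to the last vertex
    // whose product of confidences is highest. Edges must point forward (from < to).
    static List<String> bestRoute(int vertexCount, List<Edge> edges) {
        double[] best = new double[vertexCount]; // 0.0 means "not reached yet"
        Edge[] via = new Edge[vertexCount];      // edge used to reach each vertex
        best[0] = 1.0;
        // Edges only go forward, so index order is a valid processing order.
        for (int v = 0; v < vertexCount; v++) {
            if (best[v] == 0.0) {
                continue;
            }
            for (Edge e : edges) {
                if (e.from == v && best[v] * e.confidence > best[e.to]) {
                    best[e.to] = best[v] * e.confidence;
                    via[e.to] = e;
                }
            }
        }
        // Walk back from the last vertex to vertex 0, then reverse.
        List<String> route = new ArrayList<>();
        for (Edge e = via[vertexCount - 1]; e != null; e = via[e.from]) {
            route.add(e.text);
        }
        Collections.reverse(route);
        return route;
    }

    public static void main(String[] args) {
        // Made-up confidences: the tag spans the two tokens it was recognized from.
        List<Edge> edges = Arrays.asList(
                new Edge(0, 1, "consectetur", 0.5),
                new Edge(1, 2, "adipiscing", 0.5),
                new Edge(2, 3, "elit", 0.5),
                new Edge(1, 3, "{tag1}", 0.9));
        System.out.println(bestRoute(4, edges)); // prints: [consectetur, {tag1}]
    }
}
```

With these numbers the route through {tag1} scores 0.5 × 0.9 = 0.45, while the route through adipiscing and elit scores 0.5 × 0.5 × 0.5 = 0.125, so the tag replaces both tokens in the result.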
Note | ||
---|---|---|
|
Something important to keep in mind is that the advance method returns the first vertex of a text block. That text block can represent the entirety of the text, but many times it will only represent a fraction of it, so we need to keep calling advance until we reach the end, which is signaled by a null vertex. |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
package com.accenture.saga; import com.accenture.saga.engine.*; import com.accenture.saga.engine.stages.SimpleReaderStage; import com.accenture.saga.exception.SagaException; import com.accenture.saga.json.SagaJsonFactory; import com.accenture.saga.resourcemgr.ResourceManager; import com.accenture.saga.tags.TagManager; import com.accenture.saga.utilities.SagaGraph; import java.io.ByteArrayInputStream; import java.nio.charset.StandardCharsets; import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class Main { SagaEngine engine; ResourceManager resourceManager; TagManager tagManager; PipelineManager pipelineManager; /** * Constructor */ public Main(String text, List<String> tags) throws SagaException { resourceManager = new ResourceManager(); resourceManager.registerProvider( SagaJsonFactory.getInstance( "{" + " \"name\": \"saga-provider\"," + " \"type\": \"Elastic\"," + " \"scheme\": \"http\"," + " \"hostnamesAndPorts\": [\"localhost:9200\"]," + " \"timestamp\": \"updatedAt\"," + " \"exclude\": [" + " \"updatedAt\"," + " \"createdAt\"" + " ]" + "}" ) ); tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}")); pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}")); engine = new SagaEngine(); engine.setResourceManager(resourceManager); engine.setTagManager(tagManager); SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r|\\n]+\"}")); pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage); engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))); Vertex start = engine.advance(); do { Vertex last = engine.getAllVertex(start).getLast(); // Print Graph GraphPrinter.printOnce(engine, start, last); // Get specific items Flag SEMANTIC_TAG =
engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "SEMANTIC_TAG"); List<LexItem> items = SagaGraph.getTokens(start, last, lexItem -> lexItem.hasFlag(SEMANTIC_TAG)); items.forEach(item -> System.out.println(item.toStringForDebug())); // Get the highest route Flag TEXT_BLOCK = engine.getFlagForRead(LexObject.LexObjectType.LEX_ITEM, "TEXT_BLOCK"); List<LexItem> route = SagaGraph.getRoute(start, last, lexItem -> !lexItem.hasFlag(TEXT_BLOCK)); route.forEach(item -> System.out.println(item.toStringForDebug())); start = engine.advance(); }while (start != null); } public static void main(String[] args) throws SagaException { List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3")); String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat."; Main _instance = new Main(text, tags); System.exit(0); } } |
...