Working With The Pipeline Manager
From here we have two options: the first is to let the PipelineManager build the pipeline for us from a set of tags that we provide; the second is to manually provide a complete pipeline configuration to the PipelineManager.
Pros & Cons
Criterion | Automatic Pipeline | Manual Pipeline |
---|---|---|
Pros | Uses the configuration set up through Saga's UI; loads only what is necessary from the resource (tags, stages, ...); builds the pipeline based on tag dependencies; can generate multiple, different pipelines; each Recognizer can have a base pipeline as a dependency | (none listed) |
Cons | (none listed) | Every stage must be configured manually; relies heavily on the user's knowledge of each possible stage configuration; lacks flexibility when switching to another pipeline |
Tie | Needs a manually configured stage of type TextBlockReader | Needs a manually configured stage of type TextBlockReader |
Since the automatic option is the more flexible one, and the one that uses the configuration stored in Elasticsearch, that is the one we will use.
Request A Pipeline
Before asking the PipelineManager for a pipeline, we need to provide a stage of type TextBlockReader. At the moment there is only one stage of that type, the SimpleReaderStage, which requires a splitRegex in its configuration, passed as a SagaJson object. Let's add that to the code.
```java
public Main() throws SagaException {
    resourceManager = new ResourceManager();
    // Register a file-system resource provider that reads from the "testdata" directory
    resourceManager.registerProvider(
        SagaJsonFactory.getInstance(
            "{"
            + "\"name\":\"saga-provider\","
            + "\"type\":\"FileSystem\","
            + "\"baseDir\":\"testdata\""
            + "}"
        )
    );
    tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));
    pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));
    //The Fun Part
    engine = new SagaEngine();
    engine.setResourceManager(resourceManager);
    engine.setTagManager(tagManager);
    // The reader stage splits the input text wherever one or more line breaks occur
    SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));
}
```
Parameter | Description |
---|---|
splitRegex | Regular expression the SimpleReaderStage uses to decide where to split the text for processing |
The splitRegex is how the SimpleReaderStage knows where to split the text for processing: the regex [\r\n]+ matches one or more consecutive line-break characters (carriage return or line feed), so the text is split wherever a line break occurs. Also note that the SimpleReaderStage receives the engine as its first parameter.
Tip |
---|
The regex [\r\n]+ works for most of the text you will be processing. |
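To get a feel for what the splitRegex does, here is a plain-JDK sketch that applies the same pattern with java.util.regex. It only illustrates the regular expression; the SplitRegexDemo class is invented for this example, and SimpleReaderStage's internal splitting may differ (for instance in how it handles empty blocks). It also shows a common pitfall: inside square brackets, | is a literal character rather than an alternation operator, so a pattern such as [\r|\n]+ would also split the text at every pipe character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitRegexDemo {

    // Same pattern as the splitRegex value: one or more CR and/or LF characters.
    static final Pattern SPLIT = Pattern.compile("[\\r\\n]+");

    static String[] split(String text) {
        return SPLIT.split(text);
    }

    public static void main(String[] args) {
        String text = "First line\r\nSecond line\n\nThird line | with a pipe";

        // Windows (\r\n) and blank-line (\n\n) breaks are each treated as a single split point.
        System.out.println(Arrays.toString(split(text)));
        // prints [First line, Second line, Third line | with a pipe]

        // With a '|' inside the brackets, the pipe itself also becomes a split point.
        System.out.println(Arrays.toString(text.split("[\\r|\\n]+")));
    }
}
```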
Now we can ask the PipelineManager to build a pipeline for our tags. We haven't said yet where those tags come from, but we will fix that shortly; first, let's add the call that builds the pipeline.
```java
pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);
```
Now we just add the tags as a parameter of the constructor. Our code should now look like this:
```java
package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(List<String> tags) throws SagaException {
        resourceManager = new ResourceManager();
        resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                "{"
                + "\"name\":\"saga-provider\","
                + "\"type\":\"FileSystem\","
                + "\"baseDir\":\"testdata\""
                + "}"
            )
        );
        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));
        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));
        engine = new SagaEngine();
        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);
        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));
        // Build the pipeline for the requested tags and add its stages to the engine
        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);
    }

    public static void main(String[] args) throws SagaException {
        // Alternatively, treat every command-line argument as a tag: List.of(args)
        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));
        Main _instance = new Main(tags);
    }
}
```
Here we just build a list with the tags "tag1", "tag2", and "tag3" and pass it to the constructor, so the PipelineManager has them available to build the pipeline. If you would rather treat every argument passed to the application as a tag, you can convert the arguments into a list with List.of(args) instead.
At this point, pipelineManager.buildPipelineFor uses the tags we specified to identify the stages that recognize them. From those stages it builds a dependency hierarchy, adding any other tags and stages needed to find the specified tags. Once it has identified all the necessary stages and tags, the PipelineManager adds them to the engine we provided, which means our engine is ready to receive text and build a graph.
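The dependency resolution described above can be pictured as a depth-first traversal over tag dependencies. The following is a hypothetical sketch only, not Saga's PipelineManager code: it assumes a simplified model in which each tag maps to the tags its recognizer needs as input. The TagDependencyDemo class, the resolve method, and the tags "token" and "text" are all invented for illustration. It returns every required tag, ordered so that each tag's dependencies come before it.

```java
import java.util.*;

/** Hypothetical illustration only: not Saga's PipelineManager implementation. */
public class TagDependencyDemo {

    /**
     * Returns all tags needed to produce the requested ones, dependencies first.
     * dependencies maps a tag to the tags its recognizer needs as input (assumed model).
     */
    static List<String> resolve(List<String> requested, Map<String, List<String>> dependencies) {
        List<String> ordered = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String tag : requested) {
            visit(tag, dependencies, visited, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(String tag, Map<String, List<String>> dependencies,
                              Set<String> visited, Set<String> inProgress, List<String> ordered) {
        if (visited.contains(tag)) {
            return; // already resolved through another tag
        }
        if (!inProgress.add(tag)) {
            throw new IllegalStateException("Cyclic tag dependency at: " + tag);
        }
        for (String dep : dependencies.getOrDefault(tag, List.of())) {
            visit(dep, dependencies, visited, inProgress, ordered);
        }
        inProgress.remove(tag);
        visited.add(tag);
        ordered.add(tag); // all of this tag's dependencies were added before it
    }

    public static void main(String[] args) {
        // Hypothetical tags: "tag1" and "tag2" both need "token", which needs "text".
        Map<String, List<String>> deps = Map.of(
            "tag1", List.of("token"),
            "tag2", List.of("token"),
            "token", List.of("text"));
        System.out.println(resolve(List.of("tag1", "tag2"), deps));
        // prints [text, token, tag1, tag2]
    }
}
```

A real pipeline also has to map each tag to the stage that produces it and account for base pipelines; the sketch leaves that out and simply rejects cyclic dependencies.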
Process A Text
Let's also add a String parameter to the constructor for the text we want to process. This text is passed to the engine through the reset method:
```java
engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
```
This method accepts an InputStream so that the encoding is explicit (UTF-8). To pass the text to Saga, we get the bytes of the text in UTF-8 and wrap them in a ByteArrayInputStream.
Note |
---|
All processing in Saga is done in UTF-8. |
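The String-to-InputStream conversion can be checked with plain JDK classes. This sketch (the Utf8StreamDemo class and its methods are illustrative, not part of Saga) wraps a string in a ByteArrayInputStream of its UTF-8 bytes, as the reset call does, and decodes it back to confirm nothing is lost. It also shows why the encoding matters: non-ASCII characters take more than one byte in UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8StreamDemo {

    // Same conversion as engine.reset(...): the text's UTF-8 bytes wrapped in a stream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole stream and decodes it as UTF-8 (InputStream.readAllBytes needs Java 9+).
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for n-tilde and u-diaeresis keep the source file pure ASCII.
        String text = "Se\u00f1or M\u00fcller";
        int bytes = text.getBytes(StandardCharsets.UTF_8).length;
        // Each of the two accented letters takes two bytes in UTF-8.
        System.out.println(text.length() + " chars, " + bytes + " bytes"); // prints "12 chars, 14 bytes"
        System.out.println(readUtf8(toUtf8Stream(text)).equals(text));   // prints true
    }
}
```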
So far we have only told Saga what the text is; now we need to process it. For this we use the advance method, which returns a Vertex: the first Vertex of the current text block.
```java
Vertex v = engine.advance();
```
At this point, our code should look like this:
```java
package com.accenture.saga;

import com.accenture.saga.engine.PipelineManager;
import com.accenture.saga.engine.SagaEngine;
import com.accenture.saga.engine.Vertex;
import com.accenture.saga.engine.stages.SimpleReaderStage;
import com.accenture.saga.exception.SagaException;
import com.accenture.saga.json.SagaJsonFactory;
import com.accenture.saga.resourcemgr.ResourceManager;
import com.accenture.saga.tags.TagManager;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Main {
    SagaEngine engine;
    ResourceManager resourceManager;
    TagManager tagManager;
    PipelineManager pipelineManager;

    /**
     * Constructor
     */
    public Main(String text, List<String> tags) throws SagaException {
        resourceManager = new ResourceManager();
        resourceManager.registerProvider(
            SagaJsonFactory.getInstance(
                "{"
                + "\"name\":\"saga-provider\","
                + "\"type\":\"FileSystem\","
                + "\"baseDir\":\"testdata\""
                + "}"
            )
        );
        tagManager = new TagManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_tags\"}"));
        pipelineManager = new PipelineManager(resourceManager, SagaJsonFactory.getInstance("{ \"resource\": \"saga-provider:saga_pipelines\"}"));
        engine = new SagaEngine();
        engine.setResourceManager(resourceManager);
        engine.setTagManager(tagManager);
        SimpleReaderStage simpleReaderStage = new SimpleReaderStage(engine, SagaJsonFactory.getInstance("{ \"splitRegex\": \"[\\r\\n]+\"}"));
        pipelineManager.buildPipelineFor(engine, tags, simpleReaderStage);
        // Hand the text to the engine as a UTF-8 byte stream
        engine.reset(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        // Returns the first Vertex of the current text block
        Vertex v = engine.advance();
    }

    public static void main(String[] args) throws SagaException {
        List<String> tags = new ArrayList<>(Arrays.asList("tag1", "tag2", "tag3"));
        String text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed egestas orci eu mauris luctus consequat.";
        Main _instance = new Main(text, tags);
    }
}
```
Note |
---|
Something important to keep in mind is that advance returns the first vertex of a text block. That text block can represent the entire text, but often it represents only a fraction of it, so we need to keep calling advance again and |