This is a guide on how to process text using Saga's REST services.


Prerequisites

This tutorial assumes:

  • You can create a project with Maven support.

  • The data that Saga will use is managed through Saga's user interface.

  • Java 11 or later is installed on the machine (Java 17 if using Saga 1.3.3 or 1.3.4).

Configure pom.xml

You'll need this dependency to use the subsequent code:

Sample pom.xml section
<dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-databind</artifactId>
   <version>{jackson-version}</version>
</dependency>

Feel free to use your favorite JSON processing API.


This guide covers basic usage of the REST services. General documentation of these services can be found here.

Processing Text

The following code works assuming:

  • There is a tag named "{component}" that includes "wing" as part of its patterns.
  • There is a tag named "{aircraft}" that includes "LAK-12" as part of its patterns.
  • The "{aircraft}" tag confidence adjustment is 2.


ProcessText
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProcessText {

   public static void main(String[] args) {

      try {

         // 1. Set up the connection to the processText REST service.
         URL url = new URL("http://localhost:8080/saga/_saga/processText");
         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
         conn.setDoOutput(true);
         conn.setRequestMethod("POST");
         conn.setRequestProperty("Content-Type", "application/json");

         // 2. Build the JSON body. The regex backslashes are doubled so that
         //    the JSON contains \r and \n escapes, not raw control characters.
         String input = "{" +
                        "\"q\":\"A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT\"," +
                        "\"tags\":[\"aircraft\",\"component\"]," +
                        "\"splitRegex\": \"[\\r\\n]+\"," +
                        "\"type\": \"text\"," +
                        "\"pretty\": true" +
                        "}";

         // Send the body as UTF-8 and close the stream when done.
         try (OutputStream os = conn.getOutputStream()) {
            os.write(input.getBytes(StandardCharsets.UTF_8));
            os.flush();
         }

         // 3. Parse the response into a JSON tree.
         ObjectMapper mapper = new ObjectMapper();
         JsonNode actualObj;
         try (InputStreamReader reader = new InputStreamReader(
               conn.getInputStream(), StandardCharsets.UTF_8)) {
            actualObj = mapper.readTree(reader);
         }

         if (actualObj != null) {
            // 4. Check the success flag (path() avoids a NullPointerException
            //    if the field is missing).
            if (actualObj.path("_success").asBoolean()) {
               // 5. Print the interpretation graph.
               System.out.println("=================================================");
               System.out.println("=                    GRAPH                      =");
               System.out.println("=================================================\n\n");
               System.out.println(actualObj.path("data").path("graph").asText());

               // 6. Format each node of the highest confidence route.
               JsonNode nodeArray = actualObj.path("data").path("line");
               final String nodeTemplate = "%s (%.2f)[pos: %s]";
               List<String> nodeList = new ArrayList<>();
               if (nodeArray.isArray()) {
                  nodeArray.forEach(jsonNode -> nodeList.add(String.format(nodeTemplate,
                        jsonNode.get("_item").asText(),
                        jsonNode.get("confidence").asDouble(),
                        jsonNode.get("character").asText())));
               }
               System.out.println("=================================================");
               System.out.println("=            HIGHEST CONFIDENCE ROUTE           =");
               System.out.println("=================================================\n\n");
               System.out.println(String.join(" -> ", nodeList));
            } else {
               System.out.println("Failure");
            }
         }
         conn.disconnect();
      } catch (MalformedURLException e) {
         e.printStackTrace();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}
  1. The first step is to set up a connection to the REST service.
    In this case, use "/processText".

  2. The service uses the POST verb, so set the request method to POST and send the correct body payload.
    The body payload structure is:
    • "q" - the text to be processed.
      • "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT" contains "WING" and "LAK-12", which we expect to be tagged as "component" and "aircraft" respectively.
    • "tags" - the tags to match, which must already exist in Saga. In this case we are tagging with "component" and "aircraft".
    • "splitRegex" - the regular expression used to split the text into text blocks. Here we split on carriage returns and new lines.
    • "type" - the output format. In this example it is "text", which returns a text representation of the interpretation graph and the highest confidence route.
    • "pretty" - set to true to get a human-readable response.
  3. Parse the response into a JSON object.
  4. Verify that the operation succeeded.
  5. Print the interpretation graph.
  6. Process the JSON structure containing the highest confidence route to print a single line of text showing each token in order, with its confidence value and its position in the original text.
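
Steps 1 and 2 can also be sketched with the java.net.http API that ships with Java 11, using only the JDK. This is a minimal sketch, not part of Saga: the class name ProcessTextRequest and the helpers jsonString, buildPayload and buildRequest are hypothetical names introduced for illustration. The payload fields and the endpoint URL are the ones used in the ProcessText example. The helper escapes the text before placing it in the body, so input containing quotes or line breaks still produces valid JSON. With Jackson on the classpath you would more likely build the body with its own object model instead of a hand-written escaper.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class: builds (but does not send) a processText request.
public class ProcessTextRequest {

   // Wraps a value in quotes and escapes the characters JSON forbids inside a string.
   static String jsonString(String value) {
      StringBuilder sb = new StringBuilder("\"");
      for (char c : value.toCharArray()) {
         switch (c) {
            case '"':  sb.append("\\\""); break;
            case '\\': sb.append("\\\\"); break;
            case '\n': sb.append("\\n"); break;
            case '\r': sb.append("\\r"); break;
            case '\t': sb.append("\\t"); break;
            default:
               if (c < 0x20) {
                  sb.append(String.format("\\u%04x", (int) c));
               } else {
                  sb.append(c);
               }
         }
      }
      return sb.append('"').toString();
   }

   // Builds the same body as the ProcessText example for any text and tag list.
   static String buildPayload(String text, List<String> tags) {
      String tagArray = tags.stream()
            .map(ProcessTextRequest::jsonString)
            .collect(Collectors.joining(",", "[", "]"));
      return "{"
            + "\"q\":" + jsonString(text) + ","
            + "\"tags\":" + tagArray + ","
            + "\"splitRegex\":" + jsonString("[\r\n]+") + ","
            + "\"type\":\"text\","
            + "\"pretty\":true"
            + "}";
   }

   // Creates a POST request with a JSON content type and a UTF-8 body.
   static HttpRequest buildRequest(String endpoint, String payload) {
      return HttpRequest.newBuilder(URI.create(endpoint))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload, StandardCharsets.UTF_8))
            .build();
   }

   public static void main(String[] args) {
      String payload = buildPayload(
            "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT",
            List.of("aircraft", "component"));
      HttpRequest request = buildRequest(
            "http://localhost:8080/saga/_saga/processText", payload);
      System.out.println(request.method() + " " + request.uri());
      System.out.println(payload);
   }
}
```

To actually send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and parse the returned body as in step 3.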

Choosing an Output Format

This is the output you can expect from the code:

Output
=================================================
=                    GRAPH                      =
=================================================


 V--------------------------------[A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT]--------------------------------V 
 ^-[A]-V----[WING]-----V---[FAILURE,]----V-[RESULTING]-V-[IN]-V-[SUBSTANTIAL]-V-[DAMAGE]-V-[TO]-V-[THE]-V------[LAK-12]------V-[AIRCRAFT]-^ 
 ^-[a]-^----[wing]-----^---[failure,]----^-[resulting]-^-[in]-^-[substantial]-^-[damage]-^-[to]-^-[the]-^------[lak-12]------^-[aircraft]-^ 
       ^-[{component}]-^-[FAILURE]-V-[,]-^                                                              ^-[LAK]-V-[-]-V-[12]-^ 
                       ^-[failure]-^                                                                    ^-[lak]-^ 
                                                                                                        ^----[{aircraft}]----^ 

The first result is the text-only representation of the interpretation graph, which is returned because the "type" parameter was set to "text". It comes as a single value within the "graph" field of the JSON response.

Output
=================================================
=            HIGHEST CONFIDENCE ROUTE           =
=================================================


A (0.40)[pos: 0:1] -> WING (0.51)[pos: 2:6] -> FAILURE, (0.50)[pos: 7:15] -> RESULTING (0.50)[pos: 16:25] -> IN (0.40)[pos: 26:28] -> SUBSTANTIAL (0.50)[pos: 29:40] -> DAMAGE (0.50)[pos: 41:47] -> TO (0.40)[pos: 48:50] -> THE (0.40)[pos: 51:54] -> {aircraft} (1.00)[pos: 55:61] -> AIRCRAFT (0.50)[pos: 62:70]


The second result is a text representation of the highest confidence route. In this case, it is almost the same as the original text. However, since we gave extra importance to the "aircraft" tag through its confidence adjustment, the tag appears in the route in place of the aircraft name.
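
When a tag replaces a token in the route, the position can be used to recover the original wording. The sketch below assumes the position is a "start:end" string with an inclusive start and an exclusive end, which is what the positions in the output above suggest (for example "55:61" covers the six characters of "LAK-12"). The class and method names are hypothetical.

```java
// Hypothetical helper: maps a route position back to the original text.
public class TokenPosition {

   // Returns the slice of the original text covered by a "start:end" position.
   // Assumes start is inclusive and end is exclusive.
   static String originalText(String text, String position) {
      String[] parts = position.split(":");
      if (parts.length != 2) {
         throw new IllegalArgumentException("Expected start:end but got: " + position);
      }
      int start = Integer.parseInt(parts[0].trim());
      int end = Integer.parseInt(parts[1].trim());
      return text.substring(start, end);
   }

   public static void main(String[] args) {
      String text = "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT";
      System.out.println(originalText(text, "55:61")); // prints LAK-12
      System.out.println(originalText(text, "2:6"));   // prints WING
   }
}
```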

You can also access information such as:

  • "components" - A list of strings containing the parent components of the token.
  • "stage" - The source stage that generated the token.
  • "flags" - A list of flags assigned to the token.
  • "matching" - Original text reference with the character positions.


The "json" type returns the highest confidence route, just like the "text" type. In addition, it returns a list of the semantic tags on the graph.

It would be something like this:

json type output
=================================================
=                 SEMANTIC TAGS                 =
=================================================


{component} (0.51)[pos: 2:6] -> {aircraft} (1.00)[pos: 55:61]


Only two semantic tags are returned because each tag matched exactly once. You can access more information for the highest confidence route, such as "components", "stage" and so on.

The "ux" type parameter will return a JSON structure with information useful for the Saga Server application to show the interpretation graph.  This is not really helpful unless you try to display it just as the application does.