
This is a guide on how to process text using Saga's REST services.


Prerequisites

This tutorial assumes:

  • The reader can create a project with Maven support.

  • The data that Saga will use is managed through Saga's user interface.

  • Java 11 or later is installed on the machine (Java 17 if using Saga 1.3.3 or 1.3.4).


Configure pom.xml

You'll need this dependency to use the following code:

Sample pom.xml section:
<dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-databind</artifactId>
   <version>{jackson-version}</version>
</dependency>

Feel free to use your favorite JSON processing API.


This guide covers simple usage of the REST services. General documentation for these services can be found here.

Processing Text

The following code assumes that:

  • There is a tag named "{component}" that includes "wing" as part of its patterns.
  • There is a tag named "{aircraft}" that includes "LAK-12" as part of its patterns.
  • The "{aircraft}" tag confidence adjustment is 2.


ProcessText.java:
// Jackson 2 (com.fasterxml) classes, matching the jackson-databind dependency in pom.xml.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ProcessText {

   public static void main(String[] args) {

      HttpURLConnection conn = null;
      try {
         // 1. Connect to the processText REST service.
         URL url = new URL("http://localhost:8080/saga/_saga/processText");
         conn = (HttpURLConnection) url.openConnection();
         conn.setDoOutput(true);
         conn.setRequestMethod("POST");
         conn.setRequestProperty("Content-Type", "application/json");

         // 2. Build the JSON body. "\\r" and "\\n" send the JSON escapes \r and \n;
         //    a plain "\r" would put a raw control character inside a JSON string.
         String input = "{" +
                        "\"q\":\"A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT\"," +
                        "\"tags\":[\"aircraft\",\"component\"]," +
                        "\"splitRegex\": \"[\\r\\n]+\"," +
                        "\"type\": \"text\"," +
                        "\"pretty\": true" +
                        "}";

         try (OutputStream os = conn.getOutputStream()) {
            os.write(input.getBytes(StandardCharsets.UTF_8));
         }

         // 3. Parse the response into a JSON tree.
         ObjectMapper mapper = new ObjectMapper();
         JsonNode actualObj;
         try (InputStream is = conn.getInputStream()) {
            actualObj = mapper.readTree(is);
         }

         // 4. Check whether the operation succeeded.
         if (actualObj != null && actualObj.path("_success").asBoolean()) {
            JsonNode data = actualObj.get("data");

            // 5. Print the interpretation graph (returned as a single text value).
            System.out.println("=================================================");
            System.out.println("=                    GRAPH                      =");
            System.out.println("=================================================\n\n");
            System.out.println(data.get("graph").asText());

            // 6. Format each node of the highest confidence route.
            JsonNode nodeArray = data.get("line");
            final String nodeTemplate = "%s (%.2f)[pos: %s]";
            List<String> nodeList = new ArrayList<>();
            if (nodeArray != null && nodeArray.isArray()) {
               nodeArray.forEach(jsonNode -> nodeList.add(String.format(Locale.ROOT, nodeTemplate,
                     jsonNode.get("_item").asText(),
                     jsonNode.get("confidence").asDouble(),
                     jsonNode.get("character").asText())));
            }
            System.out.println("=================================================");
            System.out.println("=           HIGHEST CONFIDENCE ROUTE            =");
            System.out.println("=================================================\n\n");
            System.out.println(String.join(" -> ", nodeList));
         } else {
            System.out.println("Failure");
         }
      } catch (IOException e) {
         // Also covers MalformedURLException, which is a subclass of IOException.
         e.printStackTrace();
      } finally {
         if (conn != null) {
            conn.disconnect();
         }
      }
   }
}

  1. Set up a connection to the REST service. In this case, use "/processText".

  2. The service expects a POST request, so set the request method accordingly and send the correct body payload.
    The body payload structure is:
    • "q" - text to be processed.
      • "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT" contains "WING" and "LAK-12", which we expect to be tagged as "component" and "aircraft" respectively.
    • "tags" - tags to match, which must already exist in Saga. In this case, the tags are "component" and "aircraft".
    • "splitRegex" - regular expression used to split the text into "textblocks". Here the text is split on carriage returns and line feeds.
    • "type" - output format of the response. In this example it is "text", which returns a text representation of the interpretation graph and the highest confidence route.
    • "pretty" - set to true to get a human-readable response.
  3. Parse the response into a JSON object.
  4. Check whether the operation succeeded.
  5. Print the interpretation graph.
  6. Process the JSON structure containing the highest confidence route and print it as a single line that shows each token in order, with its confidence value and its position in the original text.
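The steps above can also be written with "java.net.http.HttpClient". It ships with Java 11, which is already a prerequisite, so sending the request needs no extra dependency. The following is a minimal sketch, not an official Saga client. The class name "ProcessTextRequest" and its helper methods are hypothetical. The endpoint is the one used in ProcessText. The hand-written string escaping is only there to keep the example free of a JSON library. The sketch also shows that "[\r\n]+" (written "[\\r\\n]+" in Java) is enough to split on carriage returns and line feeds: inside a character class, a "|" would match a literal pipe.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class ProcessTextRequest {

    // Same endpoint as in ProcessText; adjust host, port and context path for your install.
    public static final String ENDPOINT = "http://localhost:8080/saga/_saga/processText";

    // One or more carriage returns or line feeds. No "|" is needed inside the brackets.
    public static final String SPLIT_REGEX = "[\\r\\n]+";

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    public static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds a body with the fields described in step 2 (hypothetical helper).
    public static String buildBody(String q, List<String> tags, String type, boolean pretty) {
        String tagArray = tags.stream()
                .map(ProcessTextRequest::jsonString)
                .collect(Collectors.joining(",", "[", "]"));
        return "{"
                + "\"q\":" + jsonString(q) + ","
                + "\"tags\":" + tagArray + ","
                + "\"splitRegex\":" + jsonString(SPLIT_REGEX) + ","
                + "\"type\":" + jsonString(type) + ","
                + "\"pretty\":" + pretty
                + "}";
    }

    // POST request with a JSON content type; BodyPublishers.ofString encodes as UTF-8.
    public static HttpRequest buildRequest(String body) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String body = buildBody(
                "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT",
                List.of("aircraft", "component"), "text", true);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(body), HttpResponse.BodyHandlers.ofString());
        // Raw JSON text; parse it with your JSON library of choice.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Running main requires a Saga server at that address. The response body comes back as a raw string, which you can then parse with Jackson as in ProcessText.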

Choosing an Output Format

This is the output you can expect from the code:

Output:
=================================================
=                    GRAPH                      =
=================================================


 V--------------------------------[A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT]--------------------------------V 
 ^-[A]-V----[WING]-----V---[FAILURE,]----V-[RESULTING]-V-[IN]-V-[SUBSTANTIAL]-V-[DAMAGE]-V-[TO]-V-[THE]-V------[LAK-12]------V-[AIRCRAFT]-^ 
 ^-[a]-^----[wing]-----^---[failure,]----^-[resulting]-^-[in]-^-[substantial]-^-[damage]-^-[to]-^-[the]-^------[lak-12]------^-[aircraft]-^ 
       ^-[{component}]-^-[FAILURE]-V-[,]-^                                                              ^-[LAK]-V-[-]-V-[12]-^ 
                       ^-[failure]-^                                                                    ^-[lak]-^ 
                                                                                                        ^----[{aircraft}]----^ 

The first result from the code is the text-only representation of the interpretation graph, which is returned because "type" is set to "text" in the request. It comes as a single string value in the "graph" field of the JSON response.

Output:
=================================================
=           HIGHEST CONFIDENCE ROUTE            =
=================================================


A (0.40)[pos: 0:1] -> WING (0.51)[pos: 2:6] -> FAILURE, (0.50)[pos: 7:15] -> RESULTING (0.50)[pos: 16:25] -> IN (0.40)[pos: 26:28] -> SUBSTANTIAL (0.50)[pos: 29:40] -> DAMAGE (0.50)[pos: 41:47] -> TO (0.40)[pos: 48:50] -> THE (0.40)[pos: 51:54] -> {aircraft} (1.00)[pos: 55:61] -> AIRCRAFT (0.50)[pos: 62:70]


The second result is a text representation of the highest confidence route. In this case, it is almost identical to the original text. However, because the "{aircraft}" tag has a confidence adjustment of 2, the "{aircraft}" tag appears in the route instead of the aircraft name "LAK-12".
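The "pos" values in the route appear to be start:end character offsets into the "q" text, with the end offset exclusive. In the output above, "55:61" covers exactly "LAK-12" and "62:70" covers "AIRCRAFT". This reading is inferred from the sample output, not from the service documentation. Under that assumption, a small helper can map each node back to the original text, which makes it easy to see which words a semantic tag such as {aircraft} replaced. The names RoutePositions and textAt are hypothetical.

```java
public class RoutePositions {

    // Returns the slice of the original text covered by a "start:end" position.
    // Assumes start is inclusive and end is exclusive, as in the sample output.
    public static String textAt(String original, String pos) {
        String[] parts = pos.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected start:end but got " + pos);
        }
        int start = Integer.parseInt(parts[0].trim());
        int end = Integer.parseInt(parts[1].trim());
        return original.substring(start, end);
    }

    public static void main(String[] args) {
        String q = "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT";
        // Item and position pairs copied from the highest confidence route above.
        String[][] route = {
            {"WING", "2:6"},
            {"{aircraft}", "55:61"},
            {"AIRCRAFT", "62:70"}
        };
        for (String[] node : route) {
            System.out.println(node[0] + " covers \"" + textAt(q, node[1]) + "\"");
        }
    }
}
```

With the sample positions, this prints that {aircraft} covers "LAK-12".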

You can also access information such as:

  • "components" - A list of strings containing the parent components of the token.
  • "stage" - The source stage that generated the token.
  • "flags" - A list of flags assigned to the token.
  • "matching" - Original text reference with the character positions.


Setting "type" to "json" returns the highest confidence route, just like the "text" type. In addition, it returns a list of the semantic tags found in the graph.

The output would look something like this:

Output with "type" set to "json":
=================================================
=                 SEMANTIC TAGS                 =
=================================================


{component} (0.51)[pos: 2:6] -> {aircraft} (1.00)[pos: 55:61]


Only two semantic tags are returned because each tag matched only once. You can also access more information about the highest confidence route, such as "components", "stage", and so on.
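When String.format is called without an explicit Locale, it uses the JVM's default locale. On a machine with a German locale, for example, the template "%s (%.2f)[pos: %s]" prints confidences as "0,51" instead of "0.51". The following sketch formats entries the same way as the outputs above but passes Locale.ROOT, which keeps the result stable. The Node class and RouteFormatter are hypothetical helpers. The values are copied from the sample output, not parsed from a live response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RouteFormatter {

    // Plain holder for the three values printed per entry (hypothetical type).
    public static final class Node {
        final String item;
        final double confidence;
        final String position;

        public Node(String item, double confidence, String position) {
            this.item = item;
            this.confidence = confidence;
            this.position = position;
        }
    }

    // Locale.ROOT keeps "." as the decimal separator regardless of the default locale.
    static String format(Node node) {
        return String.format(Locale.ROOT, "%s (%.2f)[pos: %s]",
                node.item, node.confidence, node.position);
    }

    // Joins the formatted entries with " -> ", as in the outputs above.
    public static String join(List<Node> nodes) {
        List<String> parts = new ArrayList<>();
        for (Node node : nodes) {
            parts.add(format(node));
        }
        return String.join(" -> ", parts);
    }

    public static void main(String[] args) {
        // Values copied from the semantic tags output above.
        List<Node> tags = List.of(
                new Node("{component}", 0.51, "2:6"),
                new Node("{aircraft}", 1.00, "55:61"));
        System.out.println(join(tags));
    }
}
```

This prints "{component} (0.51)[pos: 2:6] -> {aircraft} (1.00)[pos: 55:61]" even when the default locale uses a comma as the decimal separator.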

Setting "type" to "ux" returns a JSON structure that the Saga Server application uses to display the interpretation graph. It is only useful if you want to render the graph the same way the application does.
