This is a guide on how to process text using Saga's REST services.

This tutorial assumes:

  • The reader can create a project with Maven support.
  • The data Saga uses is managed through the Saga user interface.
  • Java 11+ is installed on the machine.

Configure pom.xml

The code in this guide needs the following dependency:

Sample pom.xml section
<dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-databind</artifactId>
   <version>{jackson-version}</version>
</dependency>

Feel free to use your favorite JSON processing API.


This guide covers only simple usage of the REST services; the general documentation for these services can be found here.

Processing Text

The following code assumes:

  1. There is a tag named "{component}" that includes "wing" as part of its patterns.
  2. There is a tag named "{aircraft}" that includes "LAK-12" as part of its patterns.
  3. The "{aircraft}" tag confidence adjustment is 2.


ProcessText
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProcessText {

   public static void main(String[] args) {

      try {

         // Open a connection to the processText REST service.
         URL url = new URL("http://localhost:8080/_saga/processText");
         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
         conn.setDoOutput(true);
         conn.setRequestMethod("POST");
         conn.setRequestProperty("Content-Type", "application/json");

         // Request body. The doubled backslashes send the JSON escapes \r and \n
         // instead of raw control characters, which are not valid inside a JSON string.
         String input = "{" +
                        "\"q\":\"A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT\"," +
                        "\"tags\":[\"aircraft\",\"component\"]," +
                        "\"splitRegex\": \"[\\r\\n]+\"," +
                        "\"type\": \"text\"," +
                        "\"pretty\": true" +
                        "}";

         // Send the payload as UTF-8 and close the stream when done.
         try (OutputStream os = conn.getOutputStream()) {
            os.write(input.getBytes(StandardCharsets.UTF_8));
         }

         ObjectMapper mapper = new ObjectMapper();

         // Parse the response body into a JSON tree.
         JsonNode actualObj = mapper.readTree(new InputStreamReader(
               conn.getInputStream(), StandardCharsets.UTF_8));

         if (actualObj != null) {
            if (actualObj.get("_success").asBoolean()) {
               System.out.println("=================================================");
               System.out.println("=                    GRAPH                      =");
               System.out.println("=================================================\n\n");
               System.out.println(actualObj.get("data").get("graph").asText());

               // Turn each token of the highest confidence route into one printable entry.
               JsonNode nodeArray = actualObj.get("data").get("line");
               final String nodeTemplate = "%s (%.2f)[pos: %s]";
               List<String> nodeList = new ArrayList<>();
               if (nodeArray.isArray()) {
                  nodeArray.forEach(jsonNode -> nodeList.add(String.format(nodeTemplate,
                        jsonNode.get("_item").asText(),
                        jsonNode.get("confidence").asDouble(),
                        jsonNode.get("character").asText())));
               }
               System.out.println("=================================================");
               System.out.println("=            HIGHEST CONFIDENCE ROUTE           =");
               System.out.println("=================================================\n\n");
               System.out.println(String.join(" -> ", nodeList));
            } else {
               System.out.println("Failure");
            }
         }
         conn.disconnect();
      } catch (MalformedURLException e) {
         e.printStackTrace();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}
  • The first step is to set up a connection to the REST service.
    • In this case, the service we use is "/processText".
  • The service uses the POST verb, so make sure the request uses it and carries the correct body payload.
  • The body payload structure is:
    • "q" - the text to be processed.
      • "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT" contains "WING" and "LAK-12", which we expect to be tagged as "component" and "aircraft" respectively.
    • "tags" - the tags to match, which must already exist in Saga; in this case we are tagging "component" and "aircraft".
    • "splitRegex" - the regular expression used to split the text into text blocks; here we split on carriage returns and new lines.
    • "type" - the output format; in this example it is "text", which returns a text representation of the interpretation graph and the highest confidence route.
    • "pretty" - requests a human-readable response.
  • Parse the response into a JSON object.
  • Verify that the operation succeeded.
  • Print the interpretation graph.
  • Process the JSON structure containing the highest confidence route to print a single line of text showing each token in order, with its confidence value and its position in the original text.
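
As an alternative to concatenating the payload by hand, the sketch below builds the same body with proper string escaping and wraps it in a Java 11 java.net.http.HttpRequest. It uses only the JDK. The class name ProcessTextRequest and its helper methods are hypothetical names chosen for this example, not part of Saga, and the base URL is assumed to be the local server used earlier in this guide.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for this guide; not part of Saga.
class ProcessTextRequest {

   // Wraps a value in double quotes and escapes it for use as a JSON string.
   static String quote(String value) {
      StringBuilder sb = new StringBuilder("\"");
      for (char c : value.toCharArray()) {
         switch (c) {
            case '"':  sb.append("\\\""); break;
            case '\\': sb.append("\\\\"); break;
            case '\n': sb.append("\\n"); break;
            case '\r': sb.append("\\r"); break;
            case '\t': sb.append("\\t"); break;
            default:
               if (c < 0x20) {
                  // Remaining control characters use the \\uXXXX form.
                  sb.append(String.format("\\u%04x", (int) c));
               } else {
                  sb.append(c);
               }
         }
      }
      return sb.append('"').toString();
   }

   // Builds the body with the fields described above: q, tags, splitRegex, type, pretty.
   static String payload(String text, List<String> tags, String splitRegex, String type) {
      String tagArray = tags.stream()
            .map(ProcessTextRequest::quote)
            .collect(Collectors.joining(",", "[", "]"));
      return "{\"q\":" + quote(text)
            + ",\"tags\":" + tagArray
            + ",\"splitRegex\":" + quote(splitRegex)
            + ",\"type\":" + quote(type)
            + ",\"pretty\":true}";
   }

   // Creates a POST request for the processText service (base URL is an assumption).
   static HttpRequest request(String baseUrl, String body) {
      return HttpRequest.newBuilder(URI.create(baseUrl + "/_saga/processText"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
            .build();
   }

   public static void main(String[] args) {
      String body = payload(
            "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT",
            List.of("aircraft", "component"), "[\r\n]+", "text");
      HttpRequest request = request("http://localhost:8080", body);
      System.out.println(request.method() + " " + request.uri());
      System.out.println(body);
   }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the returned body parsed with the JSON library of your choice.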

Choosing An Output Format

This is the output you can expect from the code:

Output
=================================================
=                    GRAPH                      =
=================================================


 V--------------------------------[A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT]--------------------------------V 
 ^-[A]-V----[WING]-----V---[FAILURE,]----V-[RESULTING]-V-[IN]-V-[SUBSTANTIAL]-V-[DAMAGE]-V-[TO]-V-[THE]-V------[LAK-12]------V-[AIRCRAFT]-^ 
 ^-[a]-^----[wing]-----^---[failure,]----^-[resulting]-^-[in]-^-[substantial]-^-[damage]-^-[to]-^-[the]-^------[lak-12]------^-[aircraft]-^ 
       ^-[{component}]-^-[FAILURE]-V-[,]-^                                                              ^-[LAK]-V-[-]-V-[12]-^ 
                       ^-[failure]-^                                                                    ^-[lak]-^ 
                                                                                                        ^----[{aircraft}]----^ 

The first result is the text-only representation of the interpretation graph, produced because the "type" parameter is set to "text". It arrives as a single value in the "graph" field of the JSON response.

Output
=================================================
=            HIGHEST CONFIDENCE ROUTE           =
=================================================


A (0.40)[pos: 0:1] -> WING (0.51)[pos: 2:6] -> FAILURE, (0.50)[pos: 7:15] -> RESULTING (0.50)[pos: 16:25] -> IN (0.40)[pos: 26:28] -> SUBSTANTIAL (0.50)[pos: 29:40] -> DAMAGE (0.50)[pos: 41:47] -> TO (0.40)[pos: 48:50] -> THE (0.40)[pos: 51:54] -> {aircraft} (1.00)[pos: 55:61] -> AIRCRAFT (0.50)[pos: 62:70]

The second result is a text representation of the highest confidence route. In this case it is almost the same as the original text but, because we gave extra weight to the "aircraft" tag (a confidence adjustment of 2), the tag appears in the route instead of the aircraft name. Each token in the route also carries information such as:

  • "components" - A list of strings containing the parent components of the token.
  • "stage" - The source stage that generated the token.
  • "flags" - A list of flags assigned to the token.
  • "matching" - Original text reference with the character positions.
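
The positions printed in the route can be mapped back to the original text. The sketch below assumes the "start:end" format seen in the sample output, with an inclusive start and an exclusive end, which is what the sample values suggest. TokenSpan and resolve are hypothetical names used only for this example.

```java
// Hypothetical helper for this guide; not part of Saga.
class TokenSpan {

   // Returns the part of the original text covered by a "start:end" position.
   // Assumes start is inclusive and end is exclusive, as in the sample output.
   static String resolve(String text, String position) {
      String[] parts = position.split(":");
      if (parts.length != 2) {
         throw new IllegalArgumentException("Expected start:end but got " + position);
      }
      int start = Integer.parseInt(parts[0].trim());
      int end = Integer.parseInt(parts[1].trim());
      return text.substring(start, end);
   }

   public static void main(String[] args) {
      String text = "A WING FAILURE, RESULTING IN SUBSTANTIAL DAMAGE TO THE LAK-12 AIRCRAFT";
      System.out.println(resolve(text, "2:6"));   // prints WING
      System.out.println(resolve(text, "55:61")); // prints LAK-12
   }
}
```

This shows that the "{aircraft}" entry at position 55:61 stands for the text "LAK-12" in the original sentence.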

The "json" type returns the highest confidence route, just as the "text" type does, and also the list of semantic tags in the graph. Printed in the same way as the route, the tags would look something like this:

json type output
=================================================
=                 SEMANTIC TAGS                 =
=================================================


{component} (0.51)[pos: 2:6] -> {aircraft} (1.00)[pos: 55:61]


Only two semantic tags are returned because each tag matched once. The additional fields listed for the highest confidence route, such as "components" and "stage", are also available for each semantic tag.
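
A line like the one above can be produced with a small formatter. This is a minimal sketch that takes values already read from the response; the name of the response field that holds the semantic tags is not covered in this guide, so extraction is left out. TagLine, Entry and describe are hypothetical names. Locale.ROOT is used so that confidences always print with a decimal point.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper for this guide; not part of Saga.
class TagLine {

   // One printable entry: the tag or token, its confidence and its position.
   static final class Entry {
      final String item;
      final double confidence;
      final String position;

      Entry(String item, double confidence, String position) {
         this.item = item;
         this.confidence = confidence;
         this.position = position;
      }
   }

   // Joins the entries in order using the same template as the route output.
   static String describe(List<Entry> entries) {
      return entries.stream()
            .map(e -> String.format(Locale.ROOT, "%s (%.2f)[pos: %s]",
                  e.item, e.confidence, e.position))
            .collect(Collectors.joining(" -> "));
   }

   public static void main(String[] args) {
      System.out.println(describe(List.of(
            new Entry("{component}", 0.51, "2:6"),
            new Entry("{aircraft}", 1.0, "55:61"))));
   }
}
```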

The "ux" type returns a JSON structure with the information the Saga server application uses to render the interpretation graph. It is only useful if you want to display the graph the way the application does.