Client API

POST /saga/api/client/process/generate

Creates a process unit with the defined configuration to identify tags. The result data you get depends on the type you choose. Depending on whether you choose an expiration, the process unit can be disposable or reusable, and it can even persist across server restarts.

But, what is a Processing Unit anyway?

Before Saga starts processing text, it needs to create the processing pipeline containing all the stages defined by the user. Each stage, which can be a Processor or a Recognizer, needs to be initialized and added to the pipeline. Creating the pipeline takes time, so it should happen only once and the pipeline should be reused for all subsequent requests in order to maximize performance.

A Processing Unit is an object in the server's memory that stores the pipeline instance, created with all the defined configuration, so that subsequent requests can reuse it.
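The build-once, reuse-many idea can be pictured with a small sketch. This is not Saga's implementation: `build_pipeline` and `ProcessingUnitRegistry` are hypothetical names, and the toy "pipeline" only checks which configured tags appear as words in the text.

```python
import threading
from typing import Callable, Dict, List


def build_pipeline(tags: List[str]) -> Callable[[str], List[str]]:
    """Hypothetical stand-in for the expensive pipeline construction."""
    wanted = {t.lower() for t in tags}

    def run(text: str) -> List[str]:
        # Toy "recognizer": report which configured tags appear as words.
        return [w for w in text.lower().split() if w in wanted]

    return run


class ProcessingUnitRegistry:
    """Keeps one pipeline per unit name so later requests can reuse it."""

    def __init__(self) -> None:
        self._units: Dict[str, Callable[[str], List[str]]] = {}
        self._lock = threading.Lock()
        self.builds = 0  # counts how many times a pipeline was constructed

    def generate(self, unit: str, tags: List[str]) -> None:
        with self._lock:
            self._units[unit] = build_pipeline(tags)
            self.builds += 1

    def process(self, unit: str, text: str) -> List[str]:
        with self._lock:
            pipeline = self._units[unit]  # KeyError if the unit was never generated
        return pipeline(text)


registry = ProcessingUnitRegistry()
registry.generate("default", ["model", "family"])
first = registry.process("default", "new model released")
second = registry.process("default", "the family of engines")
```

Both `process` calls reuse the pipeline built by the single `generate` call, which is the cost saving a Processing Unit is meant to provide.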

Body Parameters

unit ( type=string | default=default | required ) - Name of the process unit

type ( type=string | default=json | optional ) - Type of execution to perform
- json - Return the lexItems found in the highest confidence route, any vertex with metadata, and the actual highestRoute including all the tokens
- matchExtraction - Return the lexItems matched by a semantic tag found in the highest confidence route, any vertex with metadata.
- analytics - Return the lexItems found in the highest confidence route, any vertex with metadata, the tokens_detail with every token in the route, the inputText and the outputText (Text parsed ready for modeling data)
- route - Return only the SEMANTIC_TAG tokens in the highestRoute
  
  More types can be added on request

tags ( type=string array | required ) - List of tags to identify in the text
processor ( type=string | required ) - Process Id of the pipeline you want to work from. Can only be used if no tags were defined. Format is: PipelineName:StageId
enginePoolSize ( type=integer | default=10 | optional ) - Number of engines per process unit
createEngines ( type=boolean | default=false | optional ) - Whether all the engines should be created at once; otherwise each engine is generated as required
engineTimeout ( type=integer | default=30000 | optional ) - Time in milliseconds before the engine times out
splitRegex ( type=string | default=[\n\s ]+ | optional ) - Regex used to split the text into more manageable blocks to be processed by Saga
multiline ( type=string | default=true | optional ) - Whether the splitRegex should apply across multiple lines
maxCharsSizeToProcess ( type=integer | default=0 | optional ) - Maximum number of characters to process; any data beyond that is truncated. If the value is 0, no maximum is applied
exactTags ( type=boolean | default=false | optional ) - Return only the exact same tags specified in the field "tags"
ignoredTags ( type=string array | optional ) - List of tags to ignore from the regular process
includeFlags ( type=string array | optional ) - List of Flags to include in the result, regardless of other configuration
excludeFlags ( type=string array | default=TEXT_BLOCK | optional ) - List of flags to exclude from the results
includeMetadata ( type=boolean | default=false | optional ) - Include the metadata in the tags and vertices
combineRoutes ( type=boolean | default=false | optional ) - Should routes of the same confidence be combined into one route
includeTheseComponents ( type=string array | optional ) - List of specific components to add; any other component will be ignored
includeComponents ( type=boolean | default=true | optional ) - Include the components (tokens which compose the tag) of each tag matched
includeComponentMetadata ( type=boolean | default=false | optional ) - If components are included, whether the metadata of each one should be added
expires ( type=json | default=false | optional ) - Boolean or JSON
- If JSON - Specifies the amount of time after which the unit expires. If refreshExpiration is enabled, the expiration is reset to this amount each time
  - amount ( type=integer | default=0 | optional ) - Amount of time by which to extend the expiration time
  - timeUnit ( type=string | default=Seconds | optional ) - Time unit in which the amount is expressed. All the ChronoUnit values are valid (Nanos, Micros, Millis, Seconds, Minutes, Hours, ...)
- If true - The process unit expires immediately after the process has ended
refreshExpiration ( type=boolean | default=false | optional ) - Refresh the expiration time every time the unit does a process
persistent ( type=boolean | default=false | optional ) - If True the process unit will be saved in the database and every time the server restarts the process unit will be loaded
- If persistent is true, expiration will be disabled
includeStats ( type=boolean | default=false | optional ) - If True the statistics of the engine and the pipeline are included in the response
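As a client-side illustration of the rules in this parameter list, the sketch below assembles a request body and checks two constraints stated above: processor can only be used when no tags are defined, and expiration is disabled when persistent is true. The helper `build_generate_payload` is a hypothetical name, not part of Saga, and the server may validate requests differently.

```python
import json
from typing import Any, Dict, List, Optional, Union


def build_generate_payload(
    unit: str = "default",
    tags: Optional[List[str]] = None,
    processor: Optional[str] = None,
    expires: Union[bool, Dict[str, Any]] = False,
    persistent: bool = False,
    **options: Any,
) -> Dict[str, Any]:
    """Assemble a request body for /process/generate (client-side sketch)."""
    if bool(tags) == bool(processor):
        raise ValueError("provide exactly one of tags or processor")
    if processor and ":" not in processor:
        raise ValueError("processor must look like PipelineName:StageId")
    if isinstance(expires, dict):
        unknown = set(expires) - {"amount", "timeUnit"}
        if unknown:
            raise ValueError(f"unexpected expires keys: {sorted(unknown)}")

    payload: Dict[str, Any] = {"unit": unit, **options}
    if tags:
        payload["tags"] = list(tags)
    else:
        payload["processor"] = processor
    if persistent:
        # Persistence disables expiration, so expiration settings are left out.
        payload["persistent"] = True
    elif expires is not False:
        payload["expires"] = expires
    return payload


body = build_generate_payload(
    unit="this_unit",
    tags=["model", "family"],
    type="matchExtraction",
    expires={"amount": 10, "timeUnit": "minutes"},
    refreshExpiration=True,
)
print(json.dumps(body, indent=4))
```

Any other documented parameter (for example enginePoolSize or splitRegex) can be passed as a keyword argument and is copied into the body unchanged.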

Heavy Load

If your tags have a lot of patterns/entities and take a while to load, you may want to check the createEngines option. That way the process unit creates the engines from the start and has them ready when needed.

Process Unit Generation Examples

Bare Minimum

Curl Request - Bare minimum

curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "tags": ["model", "family"]
}'

Response

{
    "unit": "default",
    "message": "Processing unit default created"
}

Imminent Expiration

Curl Request - Unit with imminent expiration

curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+",
    "refreshExpiration": true,
    "expires": true
}'

Response

{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}

With Expiration Span

Curl Request - Unit with expiration

curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+",
    "refreshExpiration": true,
    "expires": {
        "amount": 10,
        "timeUnit": "minutes"
    }
}'

Response

{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}

No Expiration or Volatile

Curl Request - No Expiration but volatile

curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+"
}'

Response

{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}

Persistent

Curl Request - Unit persistent

curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+",
    "persistent": true
}'

Response

{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}
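For callers who prefer a script over curl, the same generation request can be sketched with Python's standard library. The host and port are taken from the curl examples; `generate_request` and `send` are illustrative helper names, and `send` is left uncalled because it needs a running Saga server.

```python
import json
import urllib.request

# Host and port copied from the curl examples; adjust for your deployment.
BASE_URL = "http://localhost:8080/saga/api/client"


def generate_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for /process/generate."""
    return urllib.request.Request(
        url=f"{BASE_URL}/process/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = generate_request(
    {"unit": "this_unit", "tags": ["model", "family"], "persistent": True}
)
# reply = send(request)  # reply would carry the "unit" and "message" fields
```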

GET /saga/api/client/process/units

Gets all the valid process units, showing the name, type, expiration time and whether each one is persistent.

Curl Request

curl --location --request GET 'http://localhost:8080/saga/api/client/process/units'

Response

[
    {
        "expires": "n/a",
        "name": "default",
        "type": "json",
        "persistent": false
    },
    {
        "expires": "2022-07-21T20:52:11.471-0600",
        "name": "test_unit_expiration",
        "type": "matchExtraction",
        "persistent": false
    },
    {
        "expires": "n/a",
        "name": "test_unit_persitent",
        "type": "json",
        "persistent": true
    },
    {
        "expires": "n/a",
        "name": "test_unit_no_expiration",
        "type": "matchExtraction",
        "persistent": false
    }
]
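A client may want to turn the "expires" field into a real timestamp. The sketch below assumes the field is either the literal "n/a" or a timestamp in the format shown in the sample response above; other formats are not handled. The function names are illustrative only.

```python
from datetime import datetime
from typing import List, Optional


def parse_expiration(value: str) -> Optional[datetime]:
    """Turn the "expires" field into a datetime, or None for "n/a"."""
    if value == "n/a":
        return None
    # Matches timestamps such as 2022-07-21T20:52:11.471-0600
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def expiring_units(units: List[dict]) -> List[str]:
    """Names of the units that carry an expiration time."""
    return [u["name"] for u in units if parse_expiration(u["expires"]) is not None]


# Two entries copied from the sample response.
units = [
    {"expires": "n/a", "name": "default", "type": "json", "persistent": False},
    {
        "expires": "2022-07-21T20:52:11.471-0600",
        "name": "test_unit_expiration",
        "type": "matchExtraction",
        "persistent": False,
    },
]
names = expiring_units(units)
```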

GET /saga/api/client/process/units/:unit

Gets the full configuration of a specific process unit, including all the default values you may have omitted.

URL Parameters

:unit ( type=string | default=default | required ) - Name of the process unit

Curl Request - Get Details on a bare minimum unit

curl --location --request GET 'http://localhost:8080/saga/api/client/process/units/default'

Response - Showing all the default values

{
    "includeComponentMetadata": false,
    "expires": false,
    "excludeFlags": [
        "TEXT_BLOCK"
    ],
    "unitName": "default",
    "includeTheseComponents": [],
    "enginePoolSize": 1,
    "type": "json",
    "splitRegex": "[\r\n]+",
    "tags": [
        "model",
        "family"
    ],
    "engineTimeout": 30000,
    "combineRoutes": false,
    "includeMetadata": false,
    "includeComponents": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "includeFlags": [
        "SEMANTIC_TAG"
    ],
    "refreshExpiration": false,
    "createEngines": false,
    "exactTags": false,
    "persistent": false
}
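One use of this response is to see which settings the server filled in for you. The sketch below compares the body you sent with the stored configuration; `filled_in_defaults` is a hypothetical helper, and the stored dictionary is a shortened copy of the sample response.

```python
from typing import Any, Dict


def filled_in_defaults(requested: Dict[str, Any], stored: Dict[str, Any]) -> Dict[str, Any]:
    """Return the settings present in the stored unit but absent from the request."""
    # The details response reports the name as "unitName" rather than "unit".
    skip = {"unitName"}
    return {k: v for k, v in stored.items() if k not in requested and k not in skip}


requested = {"tags": ["model", "family"]}
stored = {
    "unitName": "default",
    "type": "json",
    "tags": ["model", "family"],
    "engineTimeout": 30000,
    "persistent": False,
}
defaults = filled_in_defaults(requested, stored)
```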

DELETE /saga/api/client/process/units/:unit

Deletes a specific process unit. If the unit is currently processing, you will get the error message "Unable to delete Processing unit :unit, since it's processing".

URL Parameters

unit ( type=string | default=default | required ) - Name of the process unit

Curl Request

curl --location --request DELETE 'http://localhost:8080/saga/api/client/process/units/test_unit'

Response

{
    "unit": "test_unit",
    "message": "Processing unit test_unit deleted"
}
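Because a busy unit cannot be deleted, a client may choose to retry. The sketch below is one possible approach under stated assumptions: `delete_call` is a hypothetical function that performs the DELETE and returns the message text, and the busy message is assumed to match the wording quoted in the description of this endpoint. The HTTP status code returned in that case is not documented here, so the sketch inspects only the message.

```python
import time
from typing import Callable


def busy_message(unit: str) -> str:
    """Error text quoted in the endpoint description, with the unit name filled in."""
    return f"Unable to delete Processing unit {unit}, since it's processing"


def delete_with_retry(
    unit: str,
    delete_call: Callable[[str], str],
    attempts: int = 3,
    wait_seconds: float = 0.0,
) -> str:
    """Call delete_call until it stops reporting a busy unit or attempts run out."""
    message = ""
    for attempt in range(attempts):
        message = delete_call(unit)
        if message != busy_message(unit):
            return message
        if attempt < attempts - 1:
            time.sleep(wait_seconds)
    return message


# Fake transport for illustration: busy twice, then deleted.
_replies = iter(
    [busy_message("test_unit"), busy_message("test_unit"), "Processing unit test_unit deleted"]
)
result = delete_with_retry("test_unit", lambda unit: next(_replies))
```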

POST /saga/api/client/process/units/:unit/reload

If any of the resources has been updated, you can force a reload of the unit using this call (although the reload will eventually happen automatically).

URL Parameters

unit ( type=string | default=default | required ) - Name of the process unit.

Curl Request

curl --location --request POST 'http://localhost:8080/saga/api/client/process/units/test_unit/reload'

Response

{
    "unit": "test_unit",
    "message": "Processing unit test_unit reloaded"
}

POST /saga/api/client/process/text

Processes a text using a process unit. This call offers two options: put the raw text in the body, or upload a file (.txt only) containing the text.

You can use the parameter 'createProcessingUnit' set to true in the query string in order to generate the processing unit before processing the text.

This is useful if you want to send a single request instead of two. Just remember that you'll need to send all the parameters needed when creating a Processing Unit, as described previously.
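A single-request call might be assembled as in the sketch below. It assumes the generation parameters travel in the same JSON body as "unit" and "doc", which this page does not state explicitly; `text_request_parts` is a hypothetical helper name, and the host and port come from the curl examples.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/saga/api/client"


def text_request_parts(text: str, unit: str, generate: dict) -> tuple:
    """Return (url, body) for processing text and creating the unit in one call."""
    query = urlencode({"createProcessingUnit": "true"})
    url = f"{BASE_URL}/process/text?{query}"
    # Assumption: generation settings share the JSON body with "unit" and "doc".
    body = {**generate, "unit": unit, "doc": text}
    return url, json.dumps(body)


url, body = text_request_parts(
    "The Boeing 737 released the new model MIG29M with an engine DB 601",
    unit="default",
    generate={"tags": ["model", "family"], "type": "matchExtraction"},
)
```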

Body Parameters

unit ( type=string | default=default | required ) - Name of the process unit
doc ( type=string | required ) - Raw text to process

Curl Request

curl --location --request POST 'http://localhost:8080/saga/api/client/process/text' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "default",
    "doc": "The Boeing 737 released the new model MIG29M with an engine DB 601"
}'

Form Parameters

unit ( type=string | default=default | required ) - Name of the process unit
docs ( type=string | required ) - Path to File, or multiple files

Curl Request

curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
--form 'unit="default"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile.txt"'

The response for this call varies according to the type of the unit used.

POST /saga/api/client/process/batch

Same as the text endpoint, but it processes a batch of texts either in sequence or in parallel. As with text, you can either put the raw texts in an array or upload a set of files (.txt only).

Body Parameters

unit ( type=string | default=default | required ) - Name of the process unit
parallel ( type=boolean | default=false | optional ) - Only applicable when parameter docs is set. Process the documents in parallel

Curl Request

curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "default",
    "docs": ["The Boeing 737 released the new model MIG29M with an engine DB 601",
    "The Boeing 737 released the new model MIG29M with an engine DB 601"]
}'
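The JSON body for a batch can also be built programmatically. In the sketch below, `batch_body` is a hypothetical helper; rejecting an empty list on the client side is a design choice of this sketch, not documented server behaviour.

```python
import json
from typing import List


def batch_body(docs: List[str], unit: str = "default", parallel: bool = False) -> str:
    """Serialize a /process/batch body from a list of raw texts."""
    if not docs:
        raise ValueError("docs must contain at least one text")
    if not all(isinstance(d, str) for d in docs):
        raise TypeError("every entry in docs must be a string")
    return json.dumps({"unit": unit, "parallel": parallel, "docs": list(docs)})


payload = batch_body(
    [
        "The Boeing 737 released the new model MIG29M with an engine DB 601",
        "The Boeing 737 released the new model MIG29M with an engine DB 601",
    ],
    parallel=True,
)
```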

Form Parameters

unit ( type=string | default=default | required ) - Name of the process unit
parallel ( type=boolean | default=false | optional ) - Only applicable when parameter docs is set. Process the documents in parallel

Curl Request

curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
--form 'unit="default"' \
--form 'parallel="true"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile1.txt"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile2.txt"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile3.txt"'

The response for this call varies according to the type of the unit used.
