POST /saga/api/client/process/generate

Creates a process unit with the given configuration to identify tags. The result data you get depends on the type you choose. Depending on whether you set an expiration, the process unit can be disposable or reusable, and it can even be persistent so that it survives a server restart.

But what is a Processing Unit anyway?

Before Saga starts processing text, it needs to create the processing pipeline containing all the stages defined by the user. Each stage, which can be a Processor or a Recognizer, needs to be initialized and added to the pipeline. Creating the pipeline takes time, so it should happen only once, and the pipeline should then be reused for all subsequent requests to maximize performance.

A Processing Unit is basically an object in the server's memory that stores the pipeline instance, created with all the defined configuration, so it can be reused by subsequent requests.
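
The build-once, reuse-many idea described above can be illustrated with a small sketch. This is not Saga's implementation; the class name ProcessingUnitRegistry and the slow_build function are hypothetical stand-ins that only show why a named unit avoids repeating the expensive pipeline creation.

```python
class ProcessingUnitRegistry:
    """Toy in-memory registry: build a pipeline once per unit name, then reuse it."""

    def __init__(self, build_pipeline):
        self._build_pipeline = build_pipeline  # expensive factory (hypothetical)
        self._units = {}
        self.builds = 0  # how many times the expensive build actually ran

    def get(self, unit, config):
        # Build only on first use; later requests reuse the stored instance.
        if unit not in self._units:
            self._units[unit] = self._build_pipeline(config)
            self.builds += 1
        return self._units[unit]


def slow_build(config):
    # Stand-in for initializing every Processor/Recognizer stage of a pipeline.
    return {"tags": tuple(config["tags"])}


registry = ProcessingUnitRegistry(slow_build)
first = registry.get("default", {"tags": ["model", "family"]})
second = registry.get("default", {"tags": ["model", "family"]})
```

Both calls return the same object, and the build function runs only once.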


Body Parameters

  • unit ( type=string | default=default | required ) - Name of the process unit

  • type ( type=string | default=json | optional ) - Type of execution to perform
    • json - Return the lexItems found in the highest confidence route, any vertex with metadata, and the actual highestRoute including all the tokens
    • matchExtraction - Return the lexItems matched by a semantic tag found in the highest confidence route, any vertex with metadata.
    • analytics - Return the lexItems found in the highest confidence route, any vertex with metadata, the tokens_detail with every token in the route, the inputText and the outputText (Text parsed ready for modeling data)
    • route - Return only the SEMANTIC_TAG tokens in the highestRoute 

      More types can be added on request

  • tags ( type=string array | required ) - List of tags to identify in the text
  • processor ( type=string | required ) - Process Id of the pipeline you want to work from. Can only be used if no tags were defined. Format is: PipelineName:StageId
  • enginePoolSize ( type=integer | default=10 | optional ) - Number of engines per process unit
  • createEngines ( type=boolean | default=false | optional ) - Whether all engines should be created at once; otherwise each engine is created as required
  • engineTimeout ( type=integer | default=30000 | optional ) - Time in milliseconds before the engine times out
  • splitRegex ( type=string | default=[\n\s ]+ | optional ) - Regex used to split the text into more manageable blocks to be processed by Saga
  • multiline ( type=string | default=true | optional ) - Whether the splitRegex should apply across multiple lines
  • maxCharsSizeToProcess ( type=integer | default=0 | optional ) - Maximum number of characters to process; anything beyond that is truncated. If the value is 0, no maximum is applied
  • exactTags ( type=boolean | default=false | optional ) - Return only the exact same tags specified in the field "tags"
  • ignoredTags ( type=string array | optional ) - List of tags to ignore from the regular process
  • includeFlags ( type=string array | optional ) - List of Flags to include in the result, regardless of other configuration
  • excludeFlags ( type=string array | default=TEXT_BLOCK | optional ) - List of flags to exclude from the results
  • includeMetadata ( type=boolean | default=false | optional ) - Include the metadata in the tags and vertices
  • combineRoutes ( type=boolean | default=false | optional ) - Should routes of the same confidence be combined into one route
  • includeTheseComponents ( type=string array | optional ) - List of specific components to add; any other component will be ignored
  • includeComponents ( type=boolean | default=true | optional ) - Include the components (tokens which compose the tag) of each tag matched
  • includeComponentMetadata ( type=boolean | default=false | optional ) - If components are included, whether the metadata of each one should be added as well
  • expires ( type=json | default=false | optional ) - Boolean or JSON
    • If JSON - Specifies the amount of time after which the unit expires. If refreshExpiration is enabled, this is the amount the expiration is reset to each time
      • amount ( type=integer | default=0 | optional ) - Amount of time by which to extend the expiration time
      • timeUnit ( type=string | default=Seconds | optional ) - Time unit in which the amount is expressed. All the ChronoUnit values are valid (Nanos, Micros, Millis, Seconds, Minutes, Hours, ...)
    • If true - The process unit will expire immediately after the process has ended
  • refreshExpiration ( type=boolean | default=false | optional ) - Refresh the expiration time every time the unit processes a request
  • persistent ( type=boolean | default=false | optional ) - If True the process unit will be saved in the database and every time the server restarts the process unit will be loaded
    • If persistent is true, expiration will be disabled

  • includeStats ( type=boolean | default=false | optional ) - If True the statistics of the engine and the pipeline are included in the response
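
As a minimal sketch of how the parameters above fit into a request, the following Python builds (but does not send) the generate call using only the standard library. The function name build_generate_request is hypothetical, the host and port are taken from the curl examples on this page, and the check that exactly one of tags or processor is supplied is an assumption drawn from the note that processor can only be used when no tags are defined.

```python
import json
import urllib.request

# Host and port as used in the curl examples on this page.
BASE_URL = "http://localhost:8080/saga/api/client/process"


def build_generate_request(tags=None, processor=None, unit="default", **options):
    """Build (but do not send) a POST request for the generate endpoint."""
    # Assumption: "tags" and "processor" are mutually exclusive alternatives.
    if tags and processor:
        raise ValueError("use either 'tags' or 'processor', not both")
    if not tags and not processor:
        raise ValueError("one of 'tags' or 'processor' is required")
    payload = {"unit": unit, **options}
    if tags:
        payload["tags"] = list(tags)
    else:
        payload["processor"] = processor
    return urllib.request.Request(
        BASE_URL + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    tags=["model", "family"], type="matchExtraction", createEngines=True
)
# To send it against a running server: urllib.request.urlopen(request)
```

Any optional parameter from the list can be passed as a keyword argument; omitted ones are left for the server to default.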

Heavy Load

If your tags have a lot of patterns/entities and take a while to load, consider the createEngines option. With it, the process unit creates the engines from the start and has them ready when needed.

Process Unit Generation Examples

Bare Minimum

Curl Request - Bare minimum
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "tags": ["model", "family"]
}'
Response
{
    "unit": "default",
    "message": "Processing unit default created"
}

Imminent Expiration

Curl Request - Unit with imminent expiration
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+",
    "refreshExpiration": true,
    "expires": true
}'
Response
{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}

With Expiration Span

Curl Request - Unit with expiration
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+",
    "refreshExpiration": true,
    "expires": {
        "amount": 10,
        "timeUnit": "minutes"
    }
}'
Response
{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}
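
To make the expiration span concrete, here is a hedged sketch of how an "expires" object such as the one in the request above could be turned into a deadline. How Saga computes this internally is not documented here; the helper names are hypothetical, only the time units that Python's timedelta supports directly are mapped, and treating timeUnit as case-insensitive is an assumption based on the example using "minutes" while the parameter list shows "Minutes".

```python
from datetime import datetime, timedelta, timezone

# Only units that timedelta supports directly; ChronoUnit defines more.
_UNIT_TO_KWARG = {
    "micros": "microseconds",
    "millis": "milliseconds",
    "seconds": "seconds",
    "minutes": "minutes",
    "hours": "hours",
    "days": "days",
    "weeks": "weeks",
}


def expiration_span(expires):
    """Translate an "expires" JSON object into a timedelta."""
    amount = expires.get("amount", 0)          # documented default: 0
    unit = expires.get("timeUnit", "Seconds")  # documented default: Seconds
    return timedelta(**{_UNIT_TO_KWARG[unit.lower()]: amount})


def next_expiration(expires, now):
    """Deadline if the span is counted from `now`, e.g. after a refresh."""
    return now + expiration_span(expires)


now = datetime(2022, 7, 21, 20, 42, 11, tzinfo=timezone.utc)
deadline = next_expiration({"amount": 10, "timeUnit": "minutes"}, now)
```

With refreshExpiration enabled, the same calculation would be repeated from the time of each processed request.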

No Expiration or Volatile

Curl Request - No Expiration but volatile
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+"
}'
Response
{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}

Persistent

Curl Request - Unit persistent
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "this_unit",
    "tags": ["model", "family"],
    "type": "matchExtraction",
    "combineRoutes": false,
    "createEngines": true,
    "enginePoolSize": 1,
    "engineTimeout": 30000,
    "excludeFlags": ["TEXT_BLOCK"],
    "includeFlags": [],
    "exactTags": false,
    "includeMetadata": true,
    "includeComponents": true,
    "includeTheseComponents": ["{engine}"],
    "includeComponentMetadata": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "splitRegex": "[\r\n]+",
    "persistent": true
}'
Response
{
    "unit": "this_unit",
    "message": "Processing unit this_unit created"
}


GET /saga/api/client/process/units

Get all the valid process units, showing each unit's name, type, expiration time, and whether it is persistent

Curl Request
curl --location --request GET 'http://localhost:8080/saga/api/client/process/units'
Response
[
    {
        "expires": "n/a",
        "name": "default",
        "type": "json",
        "persistent": false
    },
    {
        "expires": "2022-07-21T20:52:11.471-0600",
        "name": "test_unit_expiration",
        "type": "matchExtraction",
        "persistent": false
    },
    {
        "expires": "n/a",
        "name": "test_unit_persitent",
        "type": "json",
        "persistent": true
    },
    {
        "expires": "n/a",
        "name": "test_unit_no_expiration",
        "type": "matchExtraction",
        "persistent": false
    }
]
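
A client may want to interpret this listing, for example to tell expiring units from persistent ones. The sketch below parses entries copied from the sample response; the function names and the three labels it assigns are illustrative only, and the timestamp format string is inferred from the single sample value shown above.

```python
import json
from datetime import datetime

# Entries copied from the sample response above.
SAMPLE_RESPONSE = """
[
    {"expires": "n/a", "name": "default", "type": "json", "persistent": false},
    {"expires": "2022-07-21T20:52:11.471-0600", "name": "test_unit_expiration",
     "type": "matchExtraction", "persistent": false},
    {"expires": "n/a", "name": "test_unit_persitent", "type": "json", "persistent": true}
]
"""


def parse_expiry(value):
    """Return a timezone-aware datetime, or None when the unit reports "n/a"."""
    if value == "n/a":
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def lifetime(unit):
    """Classify a unit as 'persistent', 'expiring', or 'volatile'."""
    if unit["persistent"]:
        return "persistent"
    if parse_expiry(unit["expires"]) is not None:
        return "expiring"
    return "volatile"  # no expiration, but lost when the server restarts


units = json.loads(SAMPLE_RESPONSE)
lifetimes = {unit["name"]: lifetime(unit) for unit in units}
```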


GET /saga/api/client/process/units/:unit

Get the full configuration of a specific process unit, including any default values you may have omitted

URL Parameters

  • :unit ( type=string | default=default | required ) - Name of the process unit


Curl Request - Get Details on a bare minimum unit
curl --location --request GET 'http://localhost:8080/saga/api/client/process/units/default'
Response - Showing all the default values
{
    "includeComponentMetadata": false,
    "expires": false,
    "excludeFlags": [
        "TEXT_BLOCK"
    ],
    "unitName": "default",
    "includeTheseComponents": [],
    "enginePoolSize": 1,
    "type": "json",
    "splitRegex": "[\r\n]+",
    "tags": [
        "model",
        "family"
    ],
    "engineTimeout": 30000,
    "combineRoutes": false,
    "includeMetadata": false,
    "includeComponents": true,
    "maxCharsSizeToProcess": 0,
    "multiline": true,
    "includeFlags": [
        "SEMANTIC_TAG"
    ],
    "refreshExpiration": false,
    "createEngines": false,
    "exactTags": false,
    "persistent": false
}


DELETE /saga/api/client/process/units/:unit

Delete a specific process unit. If the unit is currently processing, you will get the error message "Unable to delete Processing unit :unit, since it's processing"

URL Parameters 

  • unit ( type=string | default=default | required ) - Name of the process unit


Curl Request
curl --location --request DELETE 'http://localhost:8080/saga/api/client/process/units/test_unit'
Response
{
    "unit": "test_unit",
    "message": "Processing unit test_unit deleted"
}


POST /saga/api/client/process/units/:unit/reload

If any of the resources have been updated, you can force a reload of the unit using this call (although the reload will eventually be done automatically)

URL Parameters

  • unit ( type=string | default=default | required ) - Name of the process unit.


Curl Request
curl --location --request POST 'http://localhost:8080/saga/api/client/process/units/test_unit/reload'
Response
{
    "unit": "test_unit",
    "message": "Processing unit test_unit reloaded"
}
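
The unit-scoped calls documented on this page (details, delete, reload) differ only in HTTP method and path suffix, so a client can build them from one small table. The following is a sketch under that reading; the function name unit_request is hypothetical, and percent-encoding the unit name is a defensive choice rather than a documented requirement.

```python
import urllib.parse
import urllib.request

# Host and port as used in the curl examples on this page.
BASE_URL = "http://localhost:8080/saga/api/client/process"

# action -> (HTTP method, path suffix), as given in the endpoint headings.
ROUTES = {
    "details": ("GET", ""),
    "delete": ("DELETE", ""),
    "reload": ("POST", "/reload"),
}


def unit_request(action, unit="default"):
    """Build (but do not send) a request that targets one process unit."""
    method, suffix = ROUTES[action]
    # Percent-encode the name so spaces or slashes stay inside one path segment.
    path = "/units/" + urllib.parse.quote(unit, safe="") + suffix
    return urllib.request.Request(BASE_URL + path, method=method)


reload_request = unit_request("reload", "test_unit")
delete_request = unit_request("delete", "my unit")
# To send against a running server: urllib.request.urlopen(reload_request)
```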


POST /saga/api/client/process/text

Process a text using a process unit. This call offers two options: put the raw text in the body, or upload a file (.txt only) containing the text.


You can use the parameter 'createProcessingUnit' set to true in the query string in order to generate the processing unit before processing the text.

This is useful if you want to send a single request instead of two. Just remember that you'll need to send all the parameters needed when creating a Processing Unit, as described previously


Body Parameters

  • unit ( type=string | default=default | required ) - Name of the process unit
  • doc ( type=string | required ) - Raw text to process


Curl Request
curl --location --request POST 'http://localhost:8080/saga/api/client/process/text' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "default",
    "doc": "The Boeing 737 released the new model MIG29M with an engine DB 601"
}'
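
The single-request variant that uses the createProcessingUnit query parameter can be sketched as follows. The function name build_text_request is hypothetical, and sending the unit-generation parameters in the same JSON body as "unit" and "doc" is an assumption; this page only states that they must be sent along with the request.

```python
import json
import urllib.parse
import urllib.request

# Host and port as used in the curl examples on this page.
BASE_URL = "http://localhost:8080/saga/api/client/process"


def build_text_request(doc, unit="default", unit_config=None):
    """Build (but do not send) a POST to the text endpoint.

    Pass unit_config to ask the server to create the unit in the same call.
    """
    url = BASE_URL + "/text"
    payload = {"unit": unit, "doc": doc}
    if unit_config is not None:
        url += "?" + urllib.parse.urlencode({"createProcessingUnit": "true"})
        # Assumption: generation parameters share the body with "unit" and "doc".
        payload = {**unit_config, **payload}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_text_request(
    "The Boeing 737 released the new model MIG29M with an engine DB 601",
    unit_config={"tags": ["model", "family"]},
)
# To send against a running server: urllib.request.urlopen(request)
```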

Form Parameters

  • unit ( type=string | default=default | required ) - Name of the process unit
  • docs ( type=string | required ) - Path to File, or multiple files


Curl Request
curl --location --request POST 'http://localhost:8080/saga/api/client/process/text' \
--form 'unit="default"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile.txt"' 

The response for this call varies according to the type of unit used




POST /saga/api/client/process/batch

Same as the text endpoint, but it processes a batch of texts, either in sequence or in parallel. As with text, you can either put the raw texts in an array or upload a set of files (.txt only)

Body Parameters

  • unit ( type=string | default=default | required ) - Name of the process unit
  • docs ( type=string array | required ) - List of raw texts to process
  • parallel ( type=boolean | default=false | optional ) - Only applicable when parameter docs is set. Process the documents in parallel


Curl Request
curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
--header 'Content-Type: application/json' \
--data-raw '{
    "unit": "default",
    "docs": ["The Boeing 737 released the new model MIG29M with an engine DB 601",
    "The Boeing 737 released the new model MIG29M with an engine DB 601"]
}'
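
For completeness, the JSON form of the batch call can be built the same way from Python. This is a sketch only: build_batch_request is a hypothetical name, the method is POST as stated in the endpoint heading, and the guard against passing a single string is a client-side convenience, not server behavior.

```python
import json
import urllib.request

# Host and port as used in the curl examples on this page.
BASE_URL = "http://localhost:8080/saga/api/client/process"


def build_batch_request(docs, unit="default", parallel=False):
    """Build (but do not send) a POST to the batch endpoint with raw texts."""
    if isinstance(docs, str):
        # A bare string would otherwise be split into single characters.
        raise TypeError("docs must be a list of texts, not a single string")
    payload = {"unit": unit, "docs": list(docs), "parallel": parallel}
    return urllib.request.Request(
        BASE_URL + "/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


texts = ["The Boeing 737 released the new model MIG29M with an engine DB 601"] * 2
request = build_batch_request(texts, parallel=True)
# To send against a running server: urllib.request.urlopen(request)
```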

Form Parameters

  • unit ( type=string | default=default | required ) - Name of the process unit
  • docs ( type=string | required ) - Path to File, or multiple files
  • parallel ( type=boolean | default=false | optional ) - Only applicable when parameter docs is set. Process the documents in parallel


Curl Request
curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
--form 'unit="default"' \
--form 'parallel="true"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile1.txt"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile2.txt"' \
--form 'docs=@"/C:/Users/user/Desktop/testFile3.txt"'

The response for this call varies according to the type of unit used
