Creates a processing unit with a defined configuration for identifying tags. The result data you get depends on the type you choose. Depending on whether you set an expiration, the processing unit can be disposable or reusable, and it can even be made persistent so that it survives a server restart.
But what is a Processing Unit anyway?
Before Saga starts processing text, it needs to create the processing pipeline containing all the stages defined by the user. Each stage, which can be a Processor or a Recognizer, needs to be initialized and added to the pipeline. Creating the pipeline takes time, so it should happen only once, and the pipeline should then be reused for all subsequent requests to maximize performance.
A Processing Unit is an object in the server's memory that stores the pipeline instance created from the defined configuration, so that subsequent requests can reuse it.
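The build-once, reuse-many idea described above can be illustrated with a toy in-memory registry. This is only a conceptual sketch, not Saga's implementation: `ProcessingUnitRegistry`, `slow_build`, and the shape of the stored pipeline are all hypothetical names invented for the illustration.

```python
from typing import Callable, Dict, List


class ProcessingUnitRegistry:
    """Toy registry (hypothetical): one pipeline per unit name, built once."""

    def __init__(self, build_pipeline: Callable[[List[str]], object]) -> None:
        self._build_pipeline = build_pipeline
        self._units: Dict[str, object] = {}

    def generate(self, unit: str, tags: List[str]) -> object:
        # Pay the slow pipeline construction cost once, when the unit is created.
        self._units[unit] = self._build_pipeline(tags)
        return self._units[unit]

    def get(self, unit: str) -> object:
        # Later requests look up the stored pipeline instead of rebuilding it.
        return self._units[unit]


build_calls = []


def slow_build(tags: List[str]) -> dict:
    # Stand-in for the expensive stage initialization.
    build_calls.append(tuple(tags))
    return {"stages": list(tags)}


registry = ProcessingUnitRegistry(slow_build)
registry.generate("default", ["model", "family"])
first = registry.get("default")
second = registry.get("default")
print(first is second, len(build_calls))
```

Both lookups return the same stored object, and the builder runs only once, which is the behavior the processing unit is meant to provide.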
route - Return only the SEMANTIC_TAG tokens in the highestRoute
More types can be added when requested.
If persistent is true, expiration will be disabled
Heavy Load
If your tags have a lot of patterns or entities and take a while to load, consider the createEngines option. With it, the processing unit creates the engines from the start and has them ready when needed.
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "tags": ["model", "family"] }'
{ "unit": "default", "message": "Processing unit default created" }
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "unit": "this_unit", "tags": ["model", "family"], "type": "matchExtraction", "combineRoutes": false, "createEngines": true, "enginePoolSize": 1, "engineTimeout": 30000, "excludeFlags": ["TEXT_BLOCK"], "includeFlags": [], "exactTags": false, "includeMetadata": true, "includeComponents": true, "includeTheseComponents": ["{engine}"], "includeComponentMetadata": true, "maxCharsSizeToProcess": 0, "multiline": true, "splitRegex": "[\r\n]+", "refreshExpiration": true, "expires": true }'
{ "unit": "this_unit", "message": "Processing unit this_unit created" }
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "unit": "this_unit", "tags": ["model", "family"], "type": "matchExtraction", "combineRoutes": false, "createEngines": true, "enginePoolSize": 1, "engineTimeout": 30000, "excludeFlags": ["TEXT_BLOCK"], "includeFlags": [], "exactTags": false, "includeMetadata": true, "includeComponents": true, "includeTheseComponents": ["{engine}"], "includeComponentMetadata": true, "maxCharsSizeToProcess": 0, "multiline": true, "splitRegex": "[\r\n]+", "refreshExpiration": true, "expires": { "amount": 10, "timeUnit": "minutes" } }'
{ "unit": "this_unit", "message": "Processing unit this_unit created" }
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "unit": "this_unit", "tags": ["model", "family"], "type": "matchExtraction", "combineRoutes": false, "createEngines": true, "enginePoolSize": 1, "engineTimeout": 30000, "excludeFlags": ["TEXT_BLOCK"], "includeFlags": [], "exactTags": false, "includeMetadata": true, "includeComponents": true, "includeTheseComponents": ["{engine}"], "includeComponentMetadata": true, "maxCharsSizeToProcess": 0, "multiline": true, "splitRegex": "[\r\n]+" }'
{ "unit": "this_unit", "message": "Processing unit this_unit created" }
curl --location --request POST 'http://localhost:8080/saga/api/client/process/generate' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "unit": "this_unit", "tags": ["model", "family"], "type": "matchExtraction", "combineRoutes": false, "createEngines": true, "enginePoolSize": 1, "engineTimeout": 30000, "excludeFlags": ["TEXT_BLOCK"], "includeFlags": [], "exactTags": false, "includeMetadata": true, "includeComponents": true, "includeTheseComponents": ["{engine}"], "includeComponentMetadata": true, "maxCharsSizeToProcess": 0, "multiline": true, "splitRegex": "[\r\n]+", "persistent": true }'
{ "unit": "this_unit", "message": "Processing unit this_unit created" }
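The creation requests above differ mainly in how they set "expires" and "persistent". A minimal Python sketch of a helper that assembles the request body might look like the following. The function name `build_generate_payload` and its parameters are hypothetical; only the JSON field names come from the examples above. Because a persistent unit has expiration disabled, the helper leaves "expires" out whenever `persistent` is set.

```python
import json
from typing import List, Optional


def build_generate_payload(
    unit: str,
    tags: List[str],
    unit_type: str = "matchExtraction",
    persistent: bool = False,
    expires_amount: Optional[int] = None,
    expires_unit: str = "minutes",
) -> dict:
    """Assemble a body for the generate call (hypothetical helper)."""
    payload = {"unit": unit, "tags": tags, "type": unit_type}
    if persistent:
        # Persistent units ignore expiration, so "expires" is not sent at all.
        payload["persistent"] = True
    elif expires_amount is not None:
        payload["expires"] = {"amount": expires_amount, "timeUnit": expires_unit}
    return payload


expiring = build_generate_payload("this_unit", ["model", "family"], expires_amount=10)
persistent = build_generate_payload(
    "this_unit", ["model", "family"], persistent=True, expires_amount=10
)
print(json.dumps(expiring))
print(json.dumps(persistent))
```

The resulting dictionaries can be serialized with `json.dumps` and sent as the `--data-raw` body of the generate request.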
Get all the valid processing units, with each unit's name, type, expiration time, and whether it is persistent.
curl --location --request GET 'http://localhost:8080/saga/api/client/process/units'
[ { "expires": "n/a", "name": "default", "type": "json", "persistent": false }, { "expires": "2022-07-21T20:52:11.471-0600", "name": "test_unit_expiration", "type": "matchExtraction", "persistent": false }, { "expires": "n/a", "name": "test_unit_persitent", "type": "json", "persistent": true }, { "expires": "n/a", "name": "test_unit_no_expiration", "type": "matchExtraction", "persistent": false } ]
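As an illustration of reading this response on the client side, the sketch below parses the sample listing and classifies each unit. The helper names `parse_expiry` and `describe` are hypothetical, and the timestamp format string is inferred from the single sample value shown above, so treat it as an assumption.

```python
import json
from datetime import datetime
from typing import Optional

# Sample response copied from the listing above.
SAMPLE = """[
  { "expires": "n/a", "name": "default", "type": "json", "persistent": false },
  { "expires": "2022-07-21T20:52:11.471-0600", "name": "test_unit_expiration",
    "type": "matchExtraction", "persistent": false },
  { "expires": "n/a", "name": "test_unit_persitent", "type": "json", "persistent": true },
  { "expires": "n/a", "name": "test_unit_no_expiration",
    "type": "matchExtraction", "persistent": false }
]"""


def parse_expiry(value: str) -> Optional[datetime]:
    """Return the expiration as a datetime, or None when the unit shows "n/a"."""
    if value == "n/a":
        return None
    # Format inferred from the sample value (assumption).
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def describe(unit: dict) -> str:
    if unit["persistent"]:
        return "persistent"
    expiry = parse_expiry(unit["expires"])
    return "no expiration" if expiry is None else f"expires {expiry.isoformat()}"


units = json.loads(SAMPLE)
for unit in units:
    print(unit["name"], "->", describe(unit))
```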
Get the full configuration of a specific processing unit, including any default values you may have omitted.
curl --location --request GET 'http://localhost:8080/saga/api/client/process/units/default'
{ "includeComponentMetadata": false, "expires": false, "excludeFlags": [ "TEXT_BLOCK" ], "unitName": "default", "includeTheseComponents": [], "enginePoolSize": 1, "type": "json", "splitRegex": "[\r\n]+", "tags": [ "model", "family" ], "engineTimeout": 30000, "combineRoutes": false, "includeMetadata": false, "includeComponents": true, "maxCharsSizeToProcess": 0, "multiline": true, "includeFlags": [ "SEMANTIC_TAG" ], "refreshExpiration": false, "createEngines": false, "exactTags": false, "persistent": false }
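One practical use of this response is to see which settings the server filled in for you. The sketch below compares the body sent at creation time with an excerpt of the stored configuration above. `applied_defaults` is a hypothetical helper, and skipping "unitName" is an assumption based on the request using "unit" for the same value.

```python
import json

# Body sent when the "default" unit was created (from the first creation example).
requested = {"tags": ["model", "family"]}

# Excerpt of the stored configuration returned above.
stored = json.loads("""{
  "unitName": "default", "type": "json", "tags": ["model", "family"],
  "enginePoolSize": 1, "engineTimeout": 30000, "createEngines": false,
  "expires": false, "persistent": false
}""")


def applied_defaults(requested: dict, stored: dict) -> dict:
    """Return stored settings that were not part of the original request."""
    return {
        key: value
        for key, value in stored.items()
        if key not in requested and key != "unitName"
    }


defaults = applied_defaults(requested, stored)
for key in sorted(defaults):
    print(key, "=", defaults[key])
```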
Delete a specific processing unit. If the unit is currently processing, you will get the error message "Unable to delete Processing unit :unit, since it's processing".
curl --location --request DELETE 'http://localhost:8080/saga/api/client/process/units/test_unit'
{ "unit": "test_unit", "message": "Processing unit test_unit deleted" }
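Since a delete can be refused while the unit is busy, a client may want to retry after a short wait. The following is a rough sketch under two assumptions that the text above does not confirm: the refusal arrives as JSON with the error text in a "message" field, and `delete_unit` is a hypothetical callable that performs the DELETE request and returns the decoded response.

```python
import time
from typing import Callable


def delete_when_idle(
    delete_unit: Callable[[], dict],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> dict:
    """Call delete_unit until it stops reporting that the unit is processing."""
    response: dict = {}
    for attempt in range(attempts):
        response = delete_unit()
        # Assumed error shape: the refusal text appears in "message".
        if not response.get("message", "").startswith("Unable to delete"):
            return response
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return response


# Simulated server: busy twice, then the delete succeeds.
replies = iter([
    {"message": "Unable to delete Processing unit test_unit, since it's processing"},
    {"message": "Unable to delete Processing unit test_unit, since it's processing"},
    {"unit": "test_unit", "message": "Processing unit test_unit deleted"},
])
calls = []


def fake_delete() -> dict:
    calls.append(1)
    return next(replies)


result = delete_when_idle(fake_delete, delay_seconds=0)
print(result["message"], "after", len(calls), "attempts")
```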
If any of the resources have been updated, you can force a reload of the unit using this call (although the reload will eventually happen automatically).
curl --location --request DELETE 'http://localhost:8080/saga/api/client/process/units/test_unit'
{ "unit": "test_unit", "message": "Processing unit test_unit reloaded" }
Process a text using a processing unit. This call has two options: put the raw text in the body, or upload a file (.txt only) containing the text.
You can set the parameter 'createProcessingUnit' to true in the query string to generate the processing unit before processing the text.
This is useful if you want to send a single request instead of two. Just remember that you will need to send all the parameters needed to create a Processing Unit, as described previously.
curl --location --request POST 'http://localhost:8080/saga/api/client/process/text' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "unit": "default", "doc": "The Boeing 737 released the new model MIG29M with an engine DB 601" }'
curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
  --form 'unit="default"' \
  --form 'docs=@"/C:/Users/user/Desktop/testFile.txt"'
The response for this call changes according to the type of unit used.
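To show how the 'createProcessingUnit' query parameter could be combined with the text request, here is a minimal sketch that only builds the request object and does not send it. `build_text_request` is a hypothetical helper, and placing the creation parameters in the same JSON body as "unit" and "doc" is an assumption, since the text above does not say where they go.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080/saga/api/client/process"


def build_text_request(unit: str, doc: str, unit_config: Optional[dict] = None) -> Request:
    """Build (but do not send) a POST request for the text endpoint."""
    url = f"{BASE_URL}/text"
    body = {"unit": unit, "doc": doc}
    if unit_config is not None:
        # Ask the server to create the unit first; creation parameters are
        # merged into the same body (assumption).
        url += "?" + urlencode({"createProcessingUnit": "true"})
        body = {**unit_config, **body}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


plain = build_text_request("default", "The Boeing 737 released the new model MIG29M")
combined = build_text_request(
    "this_unit",
    "The Boeing 737 released the new model MIG29M",
    unit_config={"tags": ["model", "family"], "type": "matchExtraction"},
)
print(plain.full_url)
print(combined.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it, provided a Saga server is listening at the address used in the examples.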
Same as the text endpoint, but it processes a batch of texts, either in sequence or in parallel. As with text, you can put the raw texts in an array or upload a set of files (.txt only).
curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "unit": "default", "docs": ["The Boeing 737 released the new model MIG29M with an engine DB 601", "The Boeing 737 released the new model MIG29M with an engine DB 601"] }'
curl --location --request POST 'http://localhost:8080/saga/api/client/process/batch' \
  --form 'unit="default"' \
  --form 'parallel="true"' \
  --form 'docs=@"/C:/Users/user/Desktop/testFile1.txt"' \
  --form 'docs=@"/C:/Users/user/Desktop/testFile2.txt"' \
  --form 'docs=@"/C:/Users/user/Desktop/testFile3.txt"'
The response for this call changes according to the type of unit used.
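Because file uploads are limited to .txt files, it can help to filter a list of candidate paths before building the upload. The sketch below is a hypothetical client-side check: `split_uploadable` is not part of Saga, and treating the extension as case-insensitive is an assumption.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple


def split_uploadable(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate .txt paths (accepted) from everything else (rejected)."""
    accepted: List[str] = []
    rejected: List[str] = []
    for path in paths:
        # Case-insensitive extension check (assumption about the server's rule).
        if PurePath(path).suffix.lower() == ".txt":
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected


accepted, rejected = split_uploadable(
    ["testFile1.txt", "notes.pdf", "testFile2.TXT", "README"]
)
print("upload:", accepted)
print("skip:", rejected)
```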