Connects directly to the Python Bridge to send text or sections of the interpretation graph for processing by ML algorithms in Python.

This recognizer is used when an entire document needs to be classified, for example. This is the key difference from the Python Model Recognizer, which runs once for each token or text block.

Processing an entire document has its benefits: it may be the best way to classify a document as a whole, and by running only once per document we get a performance boost compared to running the recognizer for each individual token or text block.

Another benefit is that the text can be normalized before it is sent to the Python model, and dependency tags can be specified so the recognizer runs in the required order within the processing pipeline.
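As an illustration of the normalization step mentioned above, the sketch below lowercases the text and collapses whitespace before it would be handed to the Python model. The exact normalization the recognizer applies depends on its configuration; this helper is purely hypothetical.

```python
import re

def normalize_text(text: str) -> str:
    """Hypothetical normalization: collapse whitespace and lowercase
    the text before sending it to the Python model."""
    return re.sub(r"\s+", " ", text).strip().lower()

print(normalize_text("  Hello   WORLD \n"))  # hello world
```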


  • Trigger Flag ( type=string | default=EOF | optional ) - Flag that triggers this recognizer.
  • Service Protocol ( type=string | default=HTTP | optional ) - Protocol used to connect to the Python Bridge (HTTP / HTTPS).
  • Service Hostname ( type=string | default=localhost | required ) - Name of the host where the Python Bridge is running.
  • Service Port ( type=string | default=5000 | required ) - Port where the Python Bridge is running.
  • Authentication ( type=boolean | default=false | required ) - Whether or not to use authentication for the Python Bridge.
  • Select Model ( type=string | required ) - Name of the model registered in the Python Bridge.
  • Select Version ( type=string | optional ) - Version of the model registered in the Python wrapper to query.
  • Select Model Method ( type=string | required ) - Method of the model to call.
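Putting the connection settings together, a call to the Python Bridge might look like the sketch below. The endpoint path, the model name, and the payload fields are assumptions for illustration only; consult the Python Bridge API for the actual contract.

```python
import json
import urllib.request

# Values taken from the recognizer's connection settings.
protocol = "http"         # Service Protocol
hostname = "localhost"    # Service Hostname
port = "5000"             # Service Port
model = "doc-classifier"  # Select Model (hypothetical name)
version = "1"             # Select Version
method = "predict"        # Select Model Method

# Hypothetical endpoint layout; the real Python Bridge routes may differ.
url = f"{protocol}://{hostname}:{port}/models/{model}/{version}/{method}"

payload = json.dumps({"text": "Full document text..."}).encode("utf-8")
request = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"}
)
# response = urllib.request.urlopen(request)  # executed once per document
print(url)
```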

Configuration

  • Send Original Text ( type=boolean | default=false | optional ) - Whether to send the original text to Python.
  • Send As Text Blocks ( type=boolean | default=false | optional ) - Splits the text into blocks according to the split pattern and sends the list of text blocks.
  • Send as Token Collection ( type=boolean | default=false | optional ) - Sends the content as a collection of tokens. This setting interacts with the previous ones: the original text can be sent as a token list, and both settings can be combined to group the tokens into lists corresponding to text blocks.
  • Include Vertex Text as Token ( type=boolean | default=false | optional ) - Includes the text from vertices as new tokens.
  • Normalize Tags ( type=boolean | default=false | optional ) - Normalizes tags flagged as SEMANTIC_TAG.
  • Dependency tags ( type=string array | optional ) - List of tags the model depends on.
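To make the interaction between "Send As Text Blocks" and "Send as Token Collection" concrete, the sketch below shows how the payload shape could change with each combination. The function and the wire format are hypothetical; the actual Python Bridge format may differ.

```python
# Hypothetical illustration of how the payload shape changes with the
# "Send As Text Blocks" and "Send as Token Collection" settings.

def build_payload(text, send_as_text_blocks=False,
                  send_as_token_collection=False, split_pattern="\n"):
    if send_as_text_blocks and send_as_token_collection:
        # Tokens grouped into lists, one list per text block.
        return [block.split() for block in text.split(split_pattern) if block]
    if send_as_text_blocks:
        return [block for block in text.split(split_pattern) if block]
    if send_as_token_collection:
        return text.split()
    return text  # original text as a single string

doc = "First block\nSecond block"
print(build_payload(doc))                                 # 'First block\nSecond block'
print(build_payload(doc, send_as_text_blocks=True))       # ['First block', 'Second block']
print(build_payload(doc, send_as_token_collection=True))  # ['First', 'block', 'Second', 'block']
```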

General Settings

The general settings can be accessed by clicking the recognizer's settings icon.

Additional settings may be displayed in the same dialog; they vary per recognizer.


  • Enable - Enables the processor for use in pipelines.
  • Base Pipeline - Indicates the last stage of a pipeline required by the recognizer.
  • Skip Flags ( optional ) - Lexical item flags to be ignored by this processor.
  • Boundary Flags ( optional ) - List of vertex flags that indicate the beginning and end of a text block.
  • Required Flags ( optional ) - Lexical item flags that every token must have in order to be processed.
  • At Least One Flag ( optional ) - Lexical item flags of which every token must have at least one in order to be processed.
  • Don't Process Flags ( optional ) - List of lexical item flags that are not processed. The difference from "Skip Flags" is that this drops the path in the Saga graph, whereas skip only skips the token and continues along the same path.
  • Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, applied to the confidence value of every match:
    • 0.0 to < 1.0 decreases the confidence value
    • 1.0 leaves the confidence value unchanged
    • > 1.0 to 2.0 increases the confidence value
  • Debug - Enables debug logging.
