The Sentence Splitter stage splits text into sentences using the GAIA API. You can split sentences as part of a pipeline either with a regex pattern or with the NLTK library. The regex and NLP options are mutually exclusive: if you use NLP, the regex is ignored.
The use_nlp option uses the NLTK library to split sentences more precisely, at a performance cost.
To use use_nlp, the NLTK Python library must be installed; otherwise the stage fails with an ImportError.
Run the appropriate pip install command beforehand so that the library is installed and ready to use.
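As a minimal sketch of the installation check described above, the snippet below tests whether NLTK can be imported before use_nlp is turned on. The helper name `module_available` is hypothetical and is not part of the GAIA API; it relies only on the Python standard library.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it.

    Hypothetical helper for illustration only; not part of the GAIA API.
    """
    return importlib.util.find_spec(module_name) is not None


# Only enable NLP-based splitting when the library is actually installed;
# otherwise fall back to the regex option.
use_nlp = module_available("nltk")
print("use_nlp:", use_nlp)
```

Checking up front lets a pipeline fall back to the regex option instead of failing with an ImportError at run time.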
The result of the stage is a list of sentences, which is added to the intermediate object.
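To illustrate how the two mutually exclusive options might behave, here is a rough sketch written with plain Python. It is not the stage's actual implementation: the function `split_sentences` is a hypothetical stand-in, the regex branch assumes the pattern is applied with `re.split`, and the NLP branch assumes NLTK's `sent_tokenize` is used. The default pattern is copied verbatim from the properties table.

```python
import re

# Default pattern copied verbatim from the properties table.
DEFAULT_REGEX = r"[//.|//!|//?]\s+"


def split_sentences(text, use_nlp=False, regex=DEFAULT_REGEX):
    """Hypothetical stand-in for the stage's splitting step (an assumption)."""
    if use_nlp:
        # The regex is ignored on this path. The import is deferred so that
        # an ImportError only occurs when NLP splitting is requested.
        # Note: sent_tokenize also needs NLTK's punkt tokenizer data.
        from nltk.tokenize import sent_tokenize

        return sent_tokenize(text)
    # Regex path: split on the pattern and drop empty fragments.
    return [part for part in re.split(regex, text) if part]


sentences = split_sentences("Hello world. How are you? Fine!")
print(sentences)  # ['Hello world', 'How are you', 'Fine!']
```

Under these assumptions the regex path consumes the matched punctuation and whitespace, so the terminators of all but the last sentence are dropped from the output.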
Property | Description | Default | Type | Required |
---|---|---|---|---|
type | Stage class name | - | string | Yes |
enable | Enable stage for execution | true | boolean | No |
name | Name for this specific stage | - | string | No |
use_nlp | Tells the stage to use NLTK to split the sentences instead of a regex pattern. | false | boolean | No |
regex | Regex pattern used to split the sentences. Ignored when use_nlp is enabled. | [//.|//!|//?]\s+ | string | No |
_split_sentence = SentenceSplitterStage(use_nlp=False, regex="[//.|//!|//?]\s+", enable=True, name='split_sentence_stage')