The implementation of each model can be completely different, so we do not cover the details of any particular model here. Instead, every model implementation in Python Bridge exposes the same set of expected functions, each focused on one piece of functionality.
Every model class must live in the models package, inside the Python Bridge folder, and must extend the ModelWrapper class, which can be imported from models.model_wrapper. An example of a bare-minimum model class can be found below:
from models.model_wrapper import ModelWrapper


class Test(ModelWrapper):
    def __init__(self):
        super().__init__()

    def initialize(self, model_dir, **kwargs):
        pass

    def load(self, model_dir, name, version):
        pass

    def save(self):
        pass

    def clear(self):
        pass

    def feed(self, data):
        pass

    def train(self, **kwargs):
        pass

    def predict(self, data: list):
        pass

    def regress(self, data):
        pass

    def classify(self, data) -> (str, float):
        pass
Every class extending from ModelWrapper must implement the following methods; if for any reason you don't need one of them, you can leave its body as pass.
def initialize(self, model_dir, **kwargs):
Receives the path to the model type folder and any configuration as keyword arguments.
def load(self, model_dir, name, version):
Receives the path to the model type, the name of the model, and its version; the model is expected to be loaded in this method.
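As a sketch of what load might do, assuming the model is serialized with pickle under a file name like model.pkl (both the format and the file name are assumptions for this example; the actual serialization depends on each model):

```python
import os
import pickle


def model_path(model_dir, name, version, filename="model.pkl"):
    # Compose <model_dir>/<name>/<version>/<filename>; the file name
    # "model.pkl" is an assumption for this sketch, not part of the contract.
    return os.path.join(model_dir, name, str(version), filename)


def load(self, model_dir, name, version):
    # One possible load(): deserialize a pickled model from the composed path.
    with open(model_path(model_dir, name, version), "rb") as f:
        self.model = pickle.load(f)
```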
def save(self):
Saves the currently loaded model.
def clear(self):
Removes the currently loaded model and any training data held in memory.
def feed(self, data):
Receives a list of string tokens to be added to the training data.
def train(self, **kwargs):
Trains the model with the documents fed so far; the resulting model can either be kept in memory or saved.
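A toy illustration of how feed and train cooperate (the frequency-counting "model" below is purely hypothetical, standing in for whatever a real model would do with the buffered tokens):

```python
class TokenCounter:
    """Toy stand-in for a model: feed() buffers tokens, train() builds counts."""

    def __init__(self):
        self.tokens = []
        self.counts = {}

    def feed(self, data):
        # data is a list of string tokens appended to the training buffer
        self.tokens.extend(data)

    def train(self, **kwargs):
        # Build a frequency table from everything fed so far;
        # a real model could also persist itself here instead.
        self.counts = {}
        for token in self.tokens:
            self.counts[token] = self.counts.get(token, 0) + 1
```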
def predict(self, data: list):
Processes the data with the loaded model and returns a vector, or an array of vectors, wrapped in a JSON object { 'vector': [ ] }; the value of the vector key must always be an array.
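For illustration only, a predict that satisfies this envelope could look like the following (the length-based "embedding" is a placeholder, not a real model):

```python
def predict(data: list):
    # Placeholder "model": embed each token as a one-dimensional vector
    # holding its length. Whatever the real model computes, the result
    # must be wrapped as {'vector': [...]} with an array value.
    return {"vector": [[float(len(token))] for token in data]}
```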
def regress(self, data):
Implements regression. We have not yet implemented a model for this particular method; it exists for future implementation.
def classify(self, data) -> (str, float):
Retrieves one or more labels from the data using the loaded model. We recommend returning each label along with its confidence.
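A hypothetical classify respecting the (label, confidence) return shape (the labeling rule itself is made up for the example):

```python
def classify(data):
    # Made-up rule: label the input by average token length.
    avg = sum(len(token) for token in data) / max(len(data), 1)
    label = "long" if avg > 4 else "short"
    # A real model would derive the confidence from its own scores.
    confidence = 0.5
    return (label, confidence)
```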
Now that the class is ready, we need to make it available for use. For this, we need to add a reference in two files:
models/__init__.py
Inside the models package there is an __init__.py file which exposes the model classes to the server. Any new class needs to be added to this file; an example with the Test class can be seen below:
from .latent_semantic_indexing import LatentSemanticIndexing
from .bert import Bert
from .bert_qa import BertQA
from .sbert import SBert
from .sentiment_analysis_vader import SentimentAnalysisVader
from .sentiment_analysis_text_blob import SentimentAnalysisTextBlob
from .model_wrapper import ModelWrapper
from .tfidf_vectorizer import TfidfVectorizer
from .test import Test
config/config.json
The other reference lies in the config.json file in the config folder. This file has a section called "model_types", which lists the classes available. As with the __init__.py file, any new class needs to be referenced here.
"model_names" does reference the actual model data, each name in the model_names refers to a folder which also contains folders representing the versions of the model
"model_types": { "LatentSemanticIndexing" : { "enabled": true, "input_data_as_tokens": false, "model_names": ["lsi"] }, "Bert": { "enabled": false, "input_data_as_tokens": false, "model_names": ["bert-base-uncased"], "default_model": "bert-base-uncased" }, "BertQA": { "enabled": false, "input_data_as_tokens": false, "model_names": ["bert-large-uncased-whole-word-masking-finetuned-squad"], "default_model": "bert-large-uncased-whole-word-masking-finetuned-squad" }, "SBert": { "enabled": true, "input_data_as_tokens": false, "model_names": ["bert-base-nli-stsb-mean-tokens", "bert-base-nli-mean-tokens", "distilbert-base-nli-stsb-mean-tokens"], "default_model": "bert-base-nli-stsb-mean-tokens" }, "SentimentAnalysisVader": { "enabled": true, "input_data_as_tokens": false, "model_names": ["vader"] }, "SentimentAnalysisTextBlob": { "enabled": true, "input_data_as_tokens": false, "model_names": ["textBlob"] }, "TfidfVectorizer": { "enabled": true, "input_data_as_tokens": false, "model_names": ["Tfidf"] }, "Test" { "enabled": true, "input_data_as_tokens": false, "model_names": ["test"] } }
Every model needs to be stored in a specific path, composed of the name of the model type, a representative name of the model, and a folder representing the version (the version doesn't have to be a number; it can be a name). As can be seen below, the model for the Test class was added following this structure:
models_data
├───Bert
│   └───bert-base-uncased
│       └───1
├───BertQA
│   └───bert-large-uncased-whole-word-masking-finetuned-squad
│       └───1
├───LatentSemanticIndexing
│   └───lsi
│       ├───1
│       └───2
├───SBert
│   ├───bert-base-nli-mean-tokens
│   │   └───1
│   ├───bert-base-nli-stsb-mean-tokens
│   │   └───1
│   └───distilbert-base-nli-mean-tokens
│       └───1
├───SentimentAnalysisTextBlob
│   └───textBlob
│       └───1
├───SentimentAnalysisVader
│   └───vader
│       └───1
├───TfidfVectorizer
│   └───tfidf
│       └───1
└───Test
    └───test
        └───1