The Calculate Vector Stage is the first step of semantic search. In this stage, the vector is calculated or retrieved according to the model chosen by the user. Three types of models are supported:

  1. Saga model: This model retrieves the vector from Saga. An entry with the value is required in Saga, and the value must be the same as the Saga stage name.
  2. Open AI model: This model calculates the vector with the Open AI API. It needs the credentials for API calls to be available as environment variables. These are the supported Open AI models:
    1. text-embedding-ada-002
    2. text-search-ada-doc-001
    3. text-search-curie-doc-001
  3. Sentence Transformer GTR: This model calculates the vector with the GTR models of the Python library sentence-transformers. These are the supported GTR models:
    1. sentence-transformers/gtr-t5-base
    2. sentence-transformers/gtr-t5-large
    3. sentence-transformers/gtr-t5-xl
    4. sentence-transformers/gtr-t5-xxl
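
The three model types listed above can be told apart by the configured model name. The following is a minimal sketch of such a lookup, not the stage's actual implementation: `ModelFamily`, `OPEN_AI_MODELS`, `GTR_MODELS`, and `resolve_family` are hypothetical names introduced for illustration, and treating any known Saga stage name as a Saga model is an assumption based on the description in item 1. Only the model identifiers come from the lists above.

```python
from enum import Enum
from typing import Iterable


class ModelFamily(Enum):
    """Hypothetical grouping of the three model types described above."""
    SAGA = "saga"
    OPEN_AI = "open_ai"
    GTR = "gtr"


# Model identifiers copied from the lists above.
OPEN_AI_MODELS = frozenset({
    "text-embedding-ada-002",
    "text-search-ada-doc-001",
    "text-search-curie-doc-001",
})
GTR_MODELS = frozenset({
    "sentence-transformers/gtr-t5-base",
    "sentence-transformers/gtr-t5-large",
    "sentence-transformers/gtr-t5-xl",
    "sentence-transformers/gtr-t5-xxl",
})


def resolve_family(model_name: str, saga_stage_names: Iterable[str]) -> ModelFamily:
    """Return the model family for a configured model name.

    Assumption: a Saga model is identified by a value equal to one of the
    known Saga stage names.
    """
    if model_name in OPEN_AI_MODELS:
        return ModelFamily.OPEN_AI
    if model_name in GTR_MODELS:
        return ModelFamily.GTR
    if model_name in set(saga_stage_names):
        return ModelFamily.SAGA
    raise ValueError(f"Unsupported model: {model_name!r}")
```

Rejecting unknown names with an explicit error, rather than falling back to a default family, keeps a misspelled model name from silently selecting the wrong backend.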

The Calculate Vector Stage stores the vector in the intermediate so that the Create Query Stage can use it.
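
The hand-off through the intermediate can be pictured with a toy pipeline. This is only a sketch under stated assumptions: the intermediate is modelled as a plain dictionary, `toy_embed` is a stand-in for a real embedding model, and `calculate_vector_stage`, `create_query_stage`, and the `query_vector` key are hypothetical names that are not taken from SearchAPI.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: NOT a real model, just two simple text features."""
    return [float(len(text)), float(text.count(" "))]


def calculate_vector_stage(
    intermediate: Dict[str, object],
    embed: Callable[[str], Vector],
    stage_name: str = "vector",
) -> None:
    """Compute the vector for the query text and store it under the stage name."""
    query_text = str(intermediate["q"])
    intermediate[stage_name] = embed(query_text)


def create_query_stage(
    intermediate: Dict[str, object],
    vector_stage_name: str = "vector",
) -> Dict[str, object]:
    """Read the stored vector and wrap it in a hypothetical k-NN request."""
    return {"knn": {"query_vector": intermediate[vector_stage_name]}}


# Usage: the request body seeds the intermediate, then the stages run in order.
intermediate: Dict[str, object] = {"q": "heart attack"}
calculate_vector_stage(intermediate, toy_embed)
knn_request = create_query_stage(intermediate)
```

The stage name doubles as the key in the intermediate here, which mirrors the `name` property (default "vector") described in the Properties section.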

Properties

| Property | Description | Default | Type | Required | QPL Config? |
|---|---|---|---|---|---|
| enable | Enable stage for execution | true | boolean | No | No |
| name | Name for this specific stage | "vector" | string | No | No |
| save_to_intermediate | If true, the result of the stage will be stored in the intermediate instead of the final section | false | boolean | No | No |
| expand_result | Indicates if the result of this stage should be expanded into the final data dictionary instead of being appended as usual | false | boolean | No | No |
| halt_on_exception | Indicates if the pipeline should be interrupted in case of an exception | true | boolean | No | No |
| model | Indicates the model to be used to calculate or retrieve the vectors. | EnumSaga.SAGA | Enum | Yes | No |
| open_ai_api_key | API key to connect to Open AI. | os.environ.get('OPEN_AI_API_KEY', 'default_key') | string | No | No |
| open_ai_api_base_url | Base URL to connect to Open AI. | os.environ.get('OPEN_AI_API_BASE_URL', 'default_url') | string | No | No |
| open_ai_api_type | Type of the Open AI API. | EnumOpenAiType.AZURE | Enum | No | No |
| open_ai_api_version | Version of the Open AI API. | os.environ.get('OPEN_AI_API_VERSION', '2023-03-15-preview') | string | No | No |
| type | | | string | Yes | No |
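
The Open AI defaults in the table are read from environment variables with a fallback value. A minimal sketch of that pattern follows. `OpenAiSettings` is a hypothetical container introduced for illustration, and the string "azure" stands in for `EnumOpenAiType.AZURE`, whose definition is not shown in this page. The variable names and fallback values are the ones listed in the table.

```python
import os
from dataclasses import dataclass, field


@dataclass
class OpenAiSettings:
    """Hypothetical holder for the open_ai_* properties and their defaults."""

    # Each default is resolved when an instance is created, so changes to the
    # environment made before construction are picked up.
    api_key: str = field(
        default_factory=lambda: os.environ.get("OPEN_AI_API_KEY", "default_key")
    )
    api_base_url: str = field(
        default_factory=lambda: os.environ.get("OPEN_AI_API_BASE_URL", "default_url")
    )
    # Placeholder for EnumOpenAiType.AZURE (assumption: represented as a string here).
    api_type: str = "azure"
    api_version: str = field(
        default_factory=lambda: os.environ.get(
            "OPEN_AI_API_VERSION", "2023-03-15-preview"
        )
    )
```

Using `default_factory` rather than a plain default avoids freezing the environment value at import time, which matters when the variables are set by a launcher script or a test.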


Calculate Vector Stage Intermediate Parameters

The Calculate Vector Stage accepts a range of parameters that can be passed via the intermediate input to customize your search request or to override the configuration of the current stage.


| Parameter | Description |
|---|---|
| q | A string query for performing a search. Can be transformed into engine-specific queries using PyQPL (Query Parser Language). |
| query | Engine-specific queries for the search. |
| knn | Engine-specific queries specifically for k-nearest neighbor (KNN) searches. |
| size | Number of results to return from the search request. Overrides the size specified in the configuration. |
| from/start | Indicates the starting point for retrieving search results. Can be used interchangeably with the page parameter. |
| page | It can be an alternative to from/start. It calculates the start based on the size parameter. |
| fetch_fields | List of fields to fetch for each search result. Overrides the fields specified in the configuration. |
| exclude_fields | List of fields to exclude from the search results. Overrides the fields specified in the configuration. |
| scroll | Scroll ID used to retrieve large numbers of results from a single search request, similar to a cursor in a traditional database. |
| operator | The default operator for query string queries: AND or OR. Overrides the default operator specified in the configuration. |
Tip

Remember that the intermediate can be filled either by other stages or by the original request body that triggered the pipeline, which makes these parameters essentially REST API parameters.
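
The relationship between page, size, and from/start can be sketched as follows. This is an illustration under explicit assumptions, not the documented behaviour: pages are assumed to be 1-based, an explicit from or start is assumed to take precedence over page, and the fallback size of 25 is borrowed from the PageSizeConfig default. `resolve_start` is a hypothetical helper name.

```python
from typing import Any, Dict


def resolve_start(params: Dict[str, Any], default_size: int = 25) -> int:
    """Return the offset of the first result for a set of request parameters.

    Assumptions (not confirmed by the documentation above):
      * "from" and "start" win over "page" when both are given;
      * pages are numbered from 1.
    """
    size = int(params.get("size", default_size))
    if "from" in params:
        return int(params["from"])
    if "start" in params:
        return int(params["start"])
    page = int(params.get("page", 1))
    if page < 1:
        raise ValueError("page must be 1 or greater")
    # Page 1 starts at offset 0, page 2 at offset `size`, and so on.
    return (page - 1) * size


# Usage: a request body such as {"q": "headache", "page": 3, "size": 10}
# would start at the 21st result (offset 20) under these assumptions.
offset = resolve_start({"q": "headache", "page": 3, "size": 10})
```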

Additional Classes for Configuration

QueryStageUIConfig

The QueryStageUIConfig is a configuration object that provides UI-specific settings for the QueryStage.

QueryStageUIConfig Properties

| Property | Description | Type | Required |
|---|---|---|---|
| page_size | Configuration for page size settings in the UI | PageSizeConfig | No |
| sort | Configuration for sort settings in the UI | SortConfig | No |

PageSizeConfig Properties

| Property | Description | Default | Type |
|---|---|---|---|
| default | Default page size value | 25 | integer |
| options | Available page size options | [25, 50, 100] | array of integers |

SortConfig Properties

| Property | Description | Type | Required |
|---|---|---|---|
| default | Default sort entry | SortEntry | Yes |
| options | Available sort options | array of SortEntry | Yes |

SortEntry Properties

| Property | Description | Type | Required |
|---|---|---|---|
| field | Name of the field to be used for sorting | string | Yes |
| display_name | Display name for this sort entry (only applicable for user interface) | string | No |
| order | Sort order to be used | SortOrder or object | Yes |

SortOrder Enum

| Enum Value | Description |
|---|---|
| "asc" | Ascending sort order |
| "desc" | Descending sort order |


Example Configuration

```python
import os

# CalculateVectorStage, EnumOpenAI, EnumOpenAiType and VECTOR_STAGE_NAME are
# assumed to be imported or defined elsewhere in the project; this example
# does not show where they come from.

_vector_stage = CalculateVectorStage(
    enable=True,
    save_to_intermediate=True,  # keep the vector in the intermediate for the next stage
    expand_result=False,
    halt_on_exception=False,    # do not interrupt the pipeline on errors
    name=VECTOR_STAGE_NAME,
    model=EnumOpenAI.OPENAI_EMBEDDING_ADA,
    # Open AI connection settings, read from environment variables
    open_ai_api_key=os.environ.get('OPEN_AI_API_KEY'),
    open_ai_api_base_url=os.environ.get('OPEN_AI_API_BASE_URL'),
    open_ai_api_type=EnumOpenAiType.AZURE,
    open_ai_api_version='2023-03-15-preview',
    type='CalculateVectorStage'
)
```