bytewax.azure_ai_search#

Azure Search Sink Implementation.
This module provides a dynamic sink for writing data to an Azure Search index using Bytewax's streaming data processing framework. The sink creates and manages a connection to the Azure Search service and inserts documents in batches according to a user-defined schema.
Classes:

- AzureSearchSink: A dynamic sink that connects to an Azure Search service, manages the index, and writes data in batches.
- _AzureSearchPartition: A stateless partition responsible for writing batches of data to the Azure Search index.
Usage:

- The AzureSearchSink class is used to define a sink that can be connected to a Bytewax dataflow.
- The build method of AzureSearchSink creates an _AzureSearchPartition that handles the actual data writing process.
- The sink supports inserting documents based on a user-defined schema, ensuring that the data is formatted correctly for the target index.
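The sink/partition split described above follows a common dynamic-sink pattern: the sink object is a lightweight factory, and each worker calls build to obtain its own partition, which performs the actual writes. As a rough illustration of that shape only, here is a plain-Python stand-in (the class and method names below mirror the pattern but are not the real Bytewax or module API):

```python
# Illustrative stand-in for the sink/partition pattern described above.
# These are hypothetical classes, not the real AzureSearchSink API.

class _FakeSearchPartition:
    """Stateless partition: receives batches of items and records them."""

    def __init__(self, uploaded):
        # In the real partition this would hold an HTTP client/session;
        # here a shared list stands in for the remote index.
        self.uploaded = uploaded

    def write_batch(self, items):
        # The real partition would POST these documents to the index.
        self.uploaded.extend(items)


class FakeSearchSink:
    """Dynamic sink: build() is called per worker to create a partition."""

    def __init__(self):
        self.uploaded = []

    def build(self, worker_index, worker_count):
        return _FakeSearchPartition(self.uploaded)


sink = FakeSearchSink()
partition = sink.build(worker_index=0, worker_count=1)
partition.write_batch([{"id": "1"}, {"id": "2"}])
print(len(sink.uploaded))  # 2
```

The point of the split is that the sink itself stays cheap to serialize and ship to workers, while connection state lives only in the per-worker partition.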
Logging: The module uses Python’s logging library to log important events such as index operations, API requests, and error messages.
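Assuming the module logger follows the standard `logging.getLogger(__name__)` naming convention (so the logger is named after the module), you can surface those index and API-request messages like this:

```python
import logging

# Assumption: the module logger is named after the module, per the usual
# logging.getLogger(__name__) convention.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bytewax.azure_ai_search")
logger.setLevel(logging.DEBUG)  # show request/index details during debugging
```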
Sample usage:

```python
from bytewax.azure_ai_search import AzureSearchSink

schema = {
    "id": {"type": "string", "default": None},
    "content": {"type": "string", "default": None},
    "meta": {"type": "string", "default": None},
    "vector": {"type": "collection", "default": []},
}

# Initialize the AzureSearchSink with your schema
azure_sink = AzureSearchSink(
    azure_search_service="your-service-name",
    index_name="your-index-name",
    search_api_version="2024-07-01",
    search_admin_key="your-api-key",
    schema=schema,  # Pass the custom schema
)
```
Note: The above assumes you have already created a matching schema through your Azure AI Search configuration. For more information, review the README.
Complete examples can be found here.
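To make the role of the schema concrete: each field entry declares a type and a default, which the sink can use to shape documents before upload. The helper below is purely hypothetical (not part of bytewax.azure_ai_search) and only sketches what such schema-driven formatting might look like:

```python
# Hypothetical helper, not part of the module: a sketch of how a schema
# like the one above could shape a document before it is uploaded.

def format_document(doc, schema):
    """Keep only schema fields, filling missing ones with their defaults."""
    formatted = {}
    for field, spec in schema.items():
        formatted[field] = doc.get(field, spec["default"])
    return formatted


schema = {
    "id": {"type": "string", "default": None},
    "content": {"type": "string", "default": None},
    "vector": {"type": "collection", "default": []},
}

doc = {"id": "42", "content": "hello", "extra": "dropped"}
print(format_document(doc, schema))
# {'id': '42', 'content': 'hello', 'vector': []}
```

Fields not present in the schema are dropped, and fields missing from the document fall back to the schema's declared default.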
Submodules#
Data#
- logger#
- Document#
- Schema#
- Batch#