bytewax.azure_ai_search#

Azure Search Sink Implementation.
This module provides a dynamic sink for writing data to an Azure Search index using Bytewax's streaming data processing framework. The sink creates and manages a connection to the Azure Search service and inserts documents in batches according to a user-defined schema.
Classes:

- AzureSearchSink: A dynamic sink that connects to an Azure Search service, manages the index, and writes data in batches.
- _AzureSearchPartition: A stateless partition responsible for writing batches of data to the Azure Search index.
Usage:

- The AzureSearchSink class is used to define a sink that can be connected to a Bytewax dataflow.
- The build method of AzureSearchSink creates an _AzureSearchPartition that handles the actual data writing process.
- The sink supports inserting documents based on a user-defined schema, ensuring that the data is formatted correctly for the target index.
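The sink/partition split described above follows a common dynamic-sink pattern: the sink object is a lightweight factory, and each worker calls build to obtain its own partition, which performs the actual writes. As a rough illustration of that shape only, here is a plain-Python stand-in (the class and method names below mirror the pattern but are not the real Bytewax or module API):

```python
# Illustrative stand-in for the sink/partition pattern described above.
# These are hypothetical classes, not the real AzureSearchSink API.

class _FakeSearchPartition:
    """Stateless partition: receives batches of items and records them."""

    def __init__(self, uploaded):
        # In the real partition this would hold an HTTP client/session;
        # here a shared list stands in for the remote index.
        self.uploaded = uploaded

    def write_batch(self, items):
        # The real partition would POST these documents to the index.
        self.uploaded.extend(items)


class FakeSearchSink:
    """Dynamic sink: build() is called per worker to create a partition."""

    def __init__(self):
        self.uploaded = []

    def build(self, worker_index, worker_count):
        return _FakeSearchPartition(self.uploaded)


sink = FakeSearchSink()
partition = sink.build(worker_index=0, worker_count=1)
partition.write_batch([{"id": "1"}, {"id": "2"}])
print(len(sink.uploaded))  # 2
```

The point of the split is that the sink itself stays cheap to serialize and ship to workers, while connection state lives only in the per-worker partition.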
Logging: The module uses Python’s logging library to log important events such as index operations, API requests, and error messages.
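Assuming the module logger follows the standard `logging.getLogger(__name__)` naming convention (so the logger is named after the module), you can surface those index and API-request messages like this:

```python
import logging

# Assumption: the module logger is named after the module, per the usual
# logging.getLogger(__name__) convention.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bytewax.azure_ai_search")
logger.setLevel(logging.DEBUG)  # show request/index details during debugging
```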
Sample usage:

```python
from bytewax.azure_ai_search import AzureSearchSink

schema = {
    "id": {"type": "string", "default": None},
    "content": {"type": "string", "default": None},
    "meta": {"type": "string", "default": None},
    "vector": {"type": "collection", "default": []},
}

# Initialize the AzureSearchSink with your schema
azure_sink = AzureSearchSink(
    azure_search_service="your-service-name",
    index_name="your-index-name",
    search_api_version="2024-07-01",
    search_admin_key="your-api-key",
    schema=schema,  # Pass the custom schema
)
```
Note: The above assumes you have already created a matching schema through your Azure AI Search configuration. For more information, review the README.
Complete examples can be found here.
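To make the role of the schema concrete: each field entry declares a type and a default, which the sink can use to shape documents before upload. The helper below is purely hypothetical (not part of bytewax.azure_ai_search) and only sketches what such schema-driven formatting might look like:

```python
# Hypothetical helper, not part of the module: a sketch of how a schema
# like the one above could shape a document before it is uploaded.

def format_document(doc, schema):
    """Keep only schema fields, filling missing ones with their defaults."""
    formatted = {}
    for field, spec in schema.items():
        formatted[field] = doc.get(field, spec["default"])
    return formatted


schema = {
    "id": {"type": "string", "default": None},
    "content": {"type": "string", "default": None},
    "vector": {"type": "collection", "default": []},
}

doc = {"id": "42", "content": "hello", "extra": "dropped"}
print(format_document(doc, schema))
# {'id': '42', 'content': 'hello', 'vector': []}
```

Fields not present in the schema are dropped, and fields missing from the document fall back to the schema's declared default.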
Submodules#
Data#
- logger#
- Document#
- Schema#
- Batch#