bytewax.azure_ai_search.operators

Operators for embedding generation using Azure OpenAI.

How to Use This Setup

  1. Using Environment Variables:

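from bytewax.connectors.azure_openai import AzureOpenAIConfig, operators as aoop
from bytewax.dataflow import Dataflow

# With no arguments, the config reads its credentials from environment variables.
config = AzureOpenAIConfig()

flow = Dataflow("embedding-out")
# `...` stands for the source argument, which the original example elides.
input = aoop.input("input", flow, ...)
# Add an embedding to each item in the stream.
embedded = aoop.generate_embeddings("embedding_op", input, config)
# `...` stands for the sink argument, which the original example elides.
aoop.output("output", embedded, ...)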

  2. Passing Credentials Directly:

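from bytewax.connectors.azure_openai import AzureOpenAIConfig, operators as aoop
from bytewax.dataflow import Dataflow

# Explicit arguments are used instead of values from the environment.
config = AzureOpenAIConfig(
    api_key="your-api-key",
    service_name="your-service-name",
    deployment_name="your-deployment-name",
)

flow = Dataflow("embedding-out")
# `...` stands for the source argument, which the original example elides.
input = aoop.input("input", flow, ...)
# Add an embedding to each item in the stream.
embedded = aoop.generate_embeddings("embedding_op", input, config)
# `...` stands for the sink argument, which the original example elides.
aoop.output("output", embedded, ...)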

Data

logger

Classes

class AzureOpenAIConfig(
    api_key: Optional[str] = None,
    service_name: Optional[str] = None,
    deployment_name: Optional[str] = None,
    dimensions: int = 1536,
)

Configuration class for Azure OpenAI integration.

This class holds the configuration for, and sets up, the Azure OpenAI client used to generate embeddings. Credentials default to values taken from environment variables, and any of them can be passed directly as arguments instead.

Initialization

Initialize the AzureOpenAIConfig with provided or environment credentials.

Args:
    api_key (Optional[str]): Azure OpenAI API key.
    service_name (Optional[str]): Azure OpenAI service name.
    deployment_name (Optional[str]): Azure OpenAI deployment name.
    dimensions (int): The dimensions of the embedding. Default is 1536.
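The "provided or environment credentials" behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchConfig` is a hypothetical stand-in for `AzureOpenAIConfig`, and the environment variable names are assumptions, since this page does not list the names the real class reads.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Illustrative stand-in; NOT the library's AzureOpenAIConfig."""

    api_key: Optional[str] = None
    service_name: Optional[str] = None
    deployment_name: Optional[str] = None
    dimensions: int = 1536

    def __post_init__(self) -> None:
        # Hypothetical variable names, assumed for this sketch only.
        env_names = {
            "api_key": "AZURE_OPENAI_API_KEY",
            "service_name": "AZURE_OPENAI_SERVICE_NAME",
            "deployment_name": "AZURE_OPENAI_DEPLOYMENT_NAME",
        }
        for field_name, env_name in env_names.items():
            # An explicit argument wins; otherwise fall back to the environment.
            value = getattr(self, field_name) or os.getenv(env_name)
            setattr(self, field_name, value)

        # Fail early if a credential is still unset after the fallback.
        missing = [name for name in env_names if not getattr(self, name)]
        if missing:
            raise ValueError("missing Azure OpenAI settings: " + ", ".join(missing))
```

Failing at construction time, rather than on the first embedding request, is a design choice of this sketch; the real class may behave differently.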

Functions

generate_embeddings(
    step_id: str,
    up: Stream[Dict[str, Any]],
    config: AzureOpenAIConfig,
) -> Stream[Dict[str, Any]]

Operator to generate embeddings for each item in the stream using Azure OpenAI.

Args:
    step_id (str): Unique ID for the operator.
    up (Stream[Dict[str, Any]]): Input stream of data items.
    config (AzureOpenAIConfig): Configuration object for Azure OpenAI.

Returns:
    Stream[Dict[str, Any]]: Output stream with embeddings added to each item.
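To make "embeddings added to each item" concrete, here is a small pure-Python sketch of the per-item transformation. It does not call Azure OpenAI or Bytewax: `add_embedding`, `fake_embed`, and the `"text"` and `"vector"` keys are all hypothetical names chosen for illustration, and the keys the real operator reads and writes are not stated on this page.

```python
from typing import Any, Callable, Dict, List


def add_embedding(
    item: Dict[str, Any],
    embed: Callable[[str], List[float]],
    text_key: str = "text",      # assumed input field name
    vector_key: str = "vector",  # assumed output field name
) -> Dict[str, Any]:
    """Return a copy of `item` with an embedding of item[text_key] added."""
    enriched = dict(item)  # shallow copy, so the input item is left untouched
    enriched[vector_key] = embed(item[text_key])
    return enriched


def fake_embed(text: str) -> List[float]:
    # Placeholder for a real embedding call; returns a tiny deterministic vector.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


items = [{"id": "1", "text": "hello"}, {"id": "2", "text": "bytewax"}]
enriched = [add_embedding(item, fake_embed) for item in items]
```

Each output dictionary keeps the original fields and gains one extra field holding the vector, which matches the `Stream[Dict[str, Any]]` in, `Stream[Dict[str, Any]]` out shape of the signature above.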

output(
    step_id: str,
    up: Stream[Dict[str, Any]],
    sink: AzureSearchSink,
) -> None

Output operator for writing the stream of items to the provided sink.

Args:
    step_id (str): Unique ID for the operator.
    up (Stream[Dict[str, Any]]): Input stream of data items with embeddings.
    sink (AzureSearchSink): Sink to output data to.
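Since the sink expects items that already carry embeddings, it can help to check each item before it reaches the output step. The sketch below is one possible pre-flight check, not part of the library: `check_vector` and the `"vector"` key are hypothetical, and only the default of 1536 dimensions comes from the `AzureOpenAIConfig` signature above.

```python
from typing import Any, Dict


def check_vector(
    item: Dict[str, Any],
    dimensions: int = 1536,      # default taken from AzureOpenAIConfig
    vector_key: str = "vector",  # assumed field name
) -> Dict[str, Any]:
    """Return `item` unchanged if it holds a vector of the expected length."""
    vector = item.get(vector_key)
    if not isinstance(vector, list):
        raise ValueError(f"item has no list under {vector_key!r}")
    if len(vector) != dimensions:
        raise ValueError(
            f"expected {dimensions} dimensions, found {len(vector)}"
        )
    return item
```

A function of this shape could be applied to every item with a map step placed between `generate_embeddings` and `output`, so that malformed items fail in the dataflow rather than at the search service.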
