Complete Swarm Definition

Learn in detail about the features available in the Swarm YAML file syntax.

Introduction

This tutorial walks through an example that showcases the options available in a Swarm definition. For those less familiar, a Swarm is a directed acyclic graph (DAG), defined in a YAML file that follows a specific syntax.

Names

The names correspond to the various resources that you are creating. They need to follow Kubernetes naming rules.
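
As a rough illustration, a name can be checked before a Swarm is deployed. The sketch below assumes the strictest common Kubernetes rule, the RFC 1123 DNS label (lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, at most 63 characters). This tutorial does not say which Kubernetes rule applies to each resource, and the helper name is_valid_name is hypothetical.

```python
import re

# RFC 1123 DNS label, as used by Kubernetes for many resource names.
# Assumption: Swarm and Bee names are held to this rule; some Kubernetes
# resources accept the looser DNS subdomain form instead.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_name(name: str) -> bool:
    """Return True if name is a valid RFC 1123 DNS label."""
    return len(name) <= 63 and _DNS_LABEL.fullmatch(name) is not None


print(is_valid_name("my-first-bee"))  # True
print(is_valid_name("My_First_Bee"))  # False
```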

Bee types and specs

Each Bee is required to have a type. Types provide the mechanism to reuse Bees across Swarms. Bees are configured using the spec block, and the fields in the spec block are specific to the type of Bee.

Python bee configuration

This example shows the python bee type. It requires the following spec:

  • image - the Docker image to use. The image must include the bytewax pip package.
  • file - the Python file that will run in the Bee. This file is loaded at runtime, and each function decorated with the @register_bee decorator is imported.
  • bee - the Bee function that is called on each swarm event. The name corresponds to the name argument passed to @register_bee.

Inputs

This clause defines how the Bee will be triggered and where it will receive data from.

The gateway input type means that this Bee will consume requests or submits passed to a Bytewax gateway subject. The name corresponds to the subject, which is the argument passed to either the gateway.request or gateway.submit SDK method in an application.

The bee input type designates that the data will come from another Bee in the same Swarm. The Bee listed as the input: name must use the swarm.publish SDK method for the data to be received.
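
Because bee inputs create edges between Bees, a definition can be sanity-checked for the DAG property before it is deployed. The following is a minimal sketch using Python's standard graphlib module (Python 3.9+). The Bee names and the bee_inputs mapping are hypothetical and are not read from a real Swarm file.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical swarm: each key is a Bee name, each value is the set of
# Bees it takes input from. Gateway inputs come from outside the swarm,
# so they do not appear as edges here.
bee_inputs = {
    "parse": set(),  # fed by a gateway subject
    "enrich": {"parse"},
    "score": {"enrich"},
    "store": {"enrich", "score"},
}


def run_order(inputs):
    """Return an order in which upstream Bees precede downstream ones.

    Raises ValueError if the inputs contain a cycle (not a DAG).
    """
    try:
        return list(TopologicalSorter(inputs).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"swarm is not a DAG: {exc.args[1]}") from exc


order = run_order(bee_inputs)
print(order)
```

A definition in which two Bees list each other as inputs would raise ValueError here instead of returning an order.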

Environment variables

The recommended way to configure bees at runtime is by using environment variables. These can be literal environment variable values or they can leverage Bytewax secrets. More on that below.

Literal values

This line will populate a Bee's environment variable LITERAL_ENV with the value literal_value.
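
Inside the Bee's Python file, such a variable can be read with the standard os module. The sketch below sets the variable itself only so that it runs outside a Bee; the second variable name, OPTIONAL_SETTING_FOR_THIS_SKETCH, is hypothetical and stands for any setting that may be absent.

```python
import os

# Simulate what the platform would do for the literal value above.
# Inside a running Bee the variable would already be set.
os.environ["LITERAL_ENV"] = "literal_value"

# Indexing raises KeyError if the variable is missing, which surfaces a
# misconfiguration early.
literal = os.environ["LITERAL_ENV"]

# .get() with a default suits optional settings.
optional = os.environ.get("OPTIONAL_SETTING_FOR_THIS_SKETCH", "fallback")

print(literal)
print(optional)
```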

Secrets

To pass secret values such as a database password or API token, Bytewax provides secret management. You can manage your secrets in the Bytewax dashboard, via the CLI tool waxctl, or via the REST API. Once you create a secret, it can be passed through as an environment variable using the secret clause, as seen in this example.

Scaling bees

Bytewax provides two ways to scale bees: replicas and sizes. Replicas scale bees horizontally (more copies of the bee running), whereas size scales vertically, increasing the CPU and memory available to each bee. The best practice is to scale horizontally wherever possible and to keep the size just large enough to fit the runtime objects (model, batch, etc.).

Bee replicas define how many bee processes are spawned. Because replicas are spread across multiple servers, they also provide resiliency against the crash of a single underlying server. At least 2 replicas are recommended at all times for redundancy; the default is 3.

Bee size is the means of scaling vertically: it sets how much memory and CPU time is reserved for each Bee replica. Configure it to the safe minimum required by the process being run. It is generally better to have many small bees than a few large ones. A fraction of a CPU core means that one physical CPU core is shared between multiple processes.

The currently available sizes are shown below. If you need something else for your particular swarm, please contact us.

  • S
    • memory: 500MB
    • cpu: 1/8 core
  • M
    • memory: 1GB
    • cpu: 1/4 core
  • L
    • memory: 4GB
    • cpu: 1 core
  • XL
    • memory: 8GB
    • cpu: 2 cores
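
To make the sizing guidance concrete, the sketch below picks the smallest listed size that fits a memory requirement and totals the resources reserved across replicas. The helper names are hypothetical, and 1GB is treated as 1000MB purely for comparison; the platform's exact accounting is not stated here.

```python
from fractions import Fraction

# Sizes copied from the list above: (memory in MB, CPU cores).
# Assumption: 1GB is treated as 1000MB purely for comparison.
SIZES = {
    "S": (500, Fraction(1, 8)),
    "M": (1000, Fraction(1, 4)),
    "L": (4000, Fraction(1)),
    "XL": (8000, Fraction(2)),
}


def smallest_size(required_mb: int) -> str:
    """Pick the smallest listed size whose memory fits required_mb."""
    for name, (memory_mb, _cpu) in SIZES.items():  # dicts keep insertion order
        if memory_mb >= required_mb:
            return name
    raise ValueError(f"no listed size fits {required_mb}MB")


def total_reserved(size: str, replicas: int):
    """Return total (memory in MB, CPU cores) reserved across all replicas."""
    memory_mb, cpu = SIZES[size]
    return memory_mb * replicas, cpu * replicas


size = smallest_size(700)
memory_total, cpu_total = total_reserved(size, replicas=3)
print(size, memory_total, cpu_total)  # M 3000 3/4
```

For example, a process that needs about 700MB lands on size M, and the default of 3 replicas then reserves 3000MB and three quarters of a core in total.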