Using DevSwarm and Jupyter Notebooks for forecasting
In this example, we are going to leverage the bytewax DevSwarm capabilities to develop and test our swarm locally in a Jupyter notebook, and then deploy the bees from our notebook to our remote environment.
What is a DevSwarm?
A bytewax DevSwarm is an object that we can use to test the functionality of bees as we develop, and also for writing tests; it simulates much of what bytewax does remotely. When our swarm is running on bytewax, a swarm object is passed to each bee, and the bee can use its publish() and respond() methods. The DevSwarm exposes similar methods; the difference is that its publish() and respond() methods assign values to the DevSwarm object so that you can retrieve them in your local dev environment.
Importing the DevSwarm Class
The DevSwarm class is part of the bytewax SDK. At the top of our Jupyter notebook, we import DevSwarm from the bytewax SDK. This import has no impact on the file when it runs on bytewax remotely.
Using the DevSwarm
In order to simulate how a swarm object is passed to bees when they run remotely on bytewax, we instantiate a DevSwarm object that we can then pass to the bees we will be developing.
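To make this concrete, here is a minimal sketch of instantiating a DevSwarm. The real class comes from the bytewax SDK; the stand-in below only mimics the behavior described in this tutorial (publish() and respond() appending to published and responded lists) and is not the SDK's actual implementation.

```python
# Minimal stand-in for the bytewax SDK's DevSwarm, for illustration only.
# In the notebook the real class would be imported from the bytewax SDK.
class DevSwarm:
    def __init__(self):
        # Each publish()/respond() call appends its output here.
        self.published = []
        self.responded = []

    def publish(self, data):
        self.published.append(data)

    def respond(self, data):
        self.responded.append(data)

# Instantiate the object we will pass to our bees while developing.
swarm = DevSwarm()
```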
Running a DevSwarm
For now, let’s assume we have two bees: in data_prep we publish data to the next bee with the publish() method, and in arima we respond to our client with the respond() method. Now that we have our DevSwarm object assigned to swarm, we can call our bees like functions (which they are) and pass in the swarm. Our first bee publishes, so the DevSwarm’s published list will contain our published result. Similarly, on line 64, our bee that responds will add its output data to the DevSwarm’s responded list.
A note on published and responded
published and responded are lists, and each time a bee publishes or responds, its output is appended. Objects will continue to accumulate until they are cleared from memory, so when you are running a DevSwarm, you need to be explicit about which index of the output you are passing to the next bee.
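As a sketch of that flow, the example below runs two bees end to end. It uses a stand-in DevSwarm (the real one comes from the bytewax SDK), and the bee bodies are placeholders, not the tutorial's actual data-prep or ARIMA code:

```python
# Stand-in DevSwarm: published/responded are lists that accumulate output.
class DevSwarm:
    def __init__(self):
        self.published, self.responded = [], []
    def publish(self, data):
        self.published.append(data)
    def respond(self, data):
        self.responded.append(data)

# Placeholder bees; the real ones do data prep and ARIMA forecasting.
def data_prep(data, swarm):
    swarm.publish([float(v) for v in data])

def arima(data, swarm):
    swarm.respond({"forecast": sum(data) / len(data)})

swarm = DevSwarm()
data_prep(["1", "2", "3"], swarm)

# published is a list, so be explicit about which entry feeds the next bee.
arima(swarm.published[0], swarm)
print(swarm.responded[0])  # {'forecast': 2.0}
```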
Simulating JSON serialization in a DevSwarm
When we run this swarm, we will be sending JSON data to the bytewax Gateway. To simulate this in our notebook, we use read_json instead of reading our CSV file directly into a Pandas DataFrame object.
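For instance, a round-trip like the one below puts the data through the same JSON serialization it would see in production. This is a sketch; the column names and values are made up, not the tutorial's dataset:

```python
import json
from io import StringIO

import pandas as pd

# Instead of pd.read_csv("data.csv"), serialize to JSON first, so the
# notebook sees the same shape the bytewax Gateway would deliver.
records = [{"date": "2021-01-01", "value": 10},
           {"date": "2021-01-02", "value": 12}]
payload = json.dumps(records)

df = pd.read_json(StringIO(payload))
print(df.shape)  # (2, 2)
```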
Writing Bees in a Jupyter Notebook
In this step, we’re looking at the bee code within a Jupyter notebook. Like the Python bees we have written in other tutorials, we decorate the bee with register_bee, which takes an argument called name. It’s important that this name argument matches the one we defined in the bee field of the swarm definition for this step. At the end of the function we call swarm.publish(). Note that there is no return statement here; that’s why we used the published object earlier with the DevSwarm.
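Sketched out, a bee along these lines looks roughly as follows. register_bee is the SDK's decorator; the stand-in below and the bee body are illustrative assumptions, not the tutorial's exact code:

```python
# Stand-in for the SDK's register_bee decorator: it records the name,
# which must match the `bee` field in the swarm definition.
REGISTRY = {}

def register_bee(name):
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register_bee(name="data-prep")
def data_prep(data, swarm):
    # Hypothetical prep step: drop missing values and coerce to floats.
    cleaned = [float(v) for v in data if v is not None]
    # No return statement: output leaves the bee via swarm.publish(),
    # which is why we inspect the DevSwarm's published list locally.
    swarm.publish(cleaned)
```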
Making a Forecast
For this swarm, we are actually training our ARIMA model every time we call this bee. This might not be the best approach if you are sending a stream of data to this endpoint. For more information about the ARIMA method and the ARIMA package in Python, you can check out the documentation. In this bee, we split the data into test and train sets. We then train our model by trying various parameters for the different ARIMA coefficients. Once we have the best fit, we check our correlation coefficient out of sample, and then we make a prediction for the future. We only return the prediction if our correlation coefficient is above 75%, because then we are more confident in it.
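The split-fit-threshold logic can be sketched as below. To keep the example dependency-free, it fits a simple AR(1)-style model by least squares instead of calling the actual ARIMA package, and the series and helper names are illustrative assumptions, not the tutorial's model code:

```python
def fit_ar1(train):
    # Least-squares AR(1) coefficient: x[t] ~ phi * x[t-1].
    num = sum(a * b for a, b in zip(train[1:], train[:-1]))
    den = sum(a * a for a in train[:-1])
    return num / den

def corr(xs, ys):
    # Pearson correlation between predictions and held-out values.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def forecast(series, threshold=0.75):
    # Split into train and test, as the bee does.
    split = int(len(series) * 0.8)
    train, test = series[:split], series[split:]
    phi = fit_ar1(train)
    # One-step-ahead predictions over the held-out data.
    preds = [phi * x for x in series[split - 1:-1]]
    # Only return a future prediction if out-of-sample correlation > 75%.
    if corr(preds, test) > threshold:
        return phi * series[-1]
    return None
```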
There are all sorts of cool benefits to developing in notebooks, like inline visualizations and rerunning cells; however, you would not necessarily want to run a notebook in production. In our case, since we are just using the bee functions, we can put our bees into a swarm and deploy them, and we don’t have to worry about the complexities of our notebook executing every time a bee receives new data.
Defining our Swarm
Luckily for us, the hard part here is taken care of by bytewax, and we can create a swarm using Python bees the same way we have in the other tutorials. We are going to have a separate bee defined in our yaml file for each one of our functions, and the data-prep output will be passed to arima.
Building the Dockerfile
Modifying the Environment and Adding Dependencies
In this example, we have custom requirements that we need in the environment the bees will run in. We specify these in requirements.txt.
Writing a Dockerfile with Multi-stage Builds
Building and pushing a new Docker image can sometimes take a while, especially if the Docker cache is invalidated. To work around this and speed up deployment time, we recommend using multi-stage builds, as in the example. This can save you a lot of time when you are tweaking a swarm deployed on bytewax.
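A multi-stage Dockerfile along those lines might look like the sketch below. The base image, paths, and layout are illustrative assumptions, not the tutorial's exact file; the point is that the dependency-install layer is cached and reused while you iterate on the bee code:

```dockerfile
# Stage 1: install dependencies once; this layer is rebuilt only
# when requirements.txt changes.
FROM python:3.9-slim AS deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: reuse the installed packages and copy the (frequently edited)
# bee code last, so code tweaks don't invalidate the dependency layer.
FROM python:3.9-slim
COPY --from=deps /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY . /app
WORKDIR /app
```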
Deploying Our Swarm to bytewax
Following the same flow as the previous tutorials, we can use
waxctl to deploy our swarm.
Making a request to bytewax
With our swarm created, we can now look at how to send a request to bytewax. For this example, we’ll write a small script that you can run locally to send data to our swarm.
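Such a script might look like the sketch below, using only the standard library. The gateway URL and the payload shape are placeholders, assumptions rather than the tutorial's actual values:

```python
import json
import urllib.request

# Placeholder URL: replace with your swarm's actual bytewax Gateway endpoint.
GATEWAY_URL = "https://gateway.example.com/my-forecast-swarm"

def build_request(values):
    # Serialize to JSON -- the same shape the notebook simulated with read_json.
    payload = json.dumps(values).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request([10, 12, 13, 15])
# To actually send it:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.read().decode("utf-8"))
```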
Success! 🐝 Congratulations! 🐝 We’ve successfully run a swarm locally from a Jupyter Notebook.