Operators

Operators are the processing primitives of Bytewax. Each of them gives you a "shape" of data transformation, and you give them functions to customize them to a specific task you need. The combination of each operator and their custom logic functions we call a dataflow step. You chain together steps in a dataflow to solve your high-level data processing problem.

If you've ever used Python's map or filter or functool's reduce or equivalent in other languages, operators are the same concept. If not, no worries, we'll give a quick overview of each and links to our annotated examples which demonstrate the use of each in a relevant way.

Using Operators

You can add steps to your dataflow by calling the method with the name of the operator you'd like to use on a bytewax.Dataflow instance.

There is a detailed description of every operator, its behavior, and a simple example in the API docs for bytewax.Dataflow.

Epoch Modification

Operators generally do not modify the epoch of the data that is passing through them. If they do, that will be called out explicitly in their API documentation. For example, bytewax.Dataflow.reduce() mentions that its output will be only in the epoch of the most recent value for each key.

Stateful Operators

Any operator which carries state between processing items is a stateful operator.

In order to coordinate this state in a multiple-worker execution, all stateful operators require that their input are make up of keys and values in a (key, value) two-tuple. Keys must be strings. Bytewax can then route the value to the worker that has the relevant state. Any output from these operators will also be (key, output_item) two-tuples as well.

Recoverable Operators

Stateful operators which persist state across epochs are also recoverable operators.

They all take a step_id argument so that their state is backed up correctly.