ordering#

Operators related to ordering streams.

Classes#

class OrderOut#
Bases:

Streams returned from the order operator.

down: KeyedStream[V]#

Items in timestamp order.

late: KeyedStream[V]#

Upstream items that were late.

Functions#

order(
step_id: str,
up: KeyedStream[V],
clock: Clock[V, SC],
) OrderOut[V]#

Order items by timestamp, within limits.

This allows you to take a stream of out-of-timestamp-order items and streaming sort them into timestamp order. Further stateful processing (e.g. a state machine) might need items in timestamp order.

You generally will use this operator with an EventClock. wait_for_system_duration is a crucial parameter to think about for this clock: The larger this value is, the more out-of-order the timestamps can be in your upstream without dropping items as late. But the tradeoff is that this increases the memory usage (as all items for that duration need to be buffered) and increases the latency of a real-time data source (in that you will need to wait this amount of system time to see if new out-of-order data is still coming).

For a streaming system, generally set wait_for_system_duration to the amount of system time latency you are willing to wait for as correct answers as possible.

For bounded batch processing, generally set wait_for_system_duration to the maximum amount of out-of-orderedness you think you’ll see in the data. You also can set wait_for_system_duration to datetime.timedelta.max if you are ok with buffering all of the incoming data and sorting it and don’t want to worry about a maximum amount of out-of-order-ness, but this will have memory usage implications.

This operator is not needed before standard windowing operators; they sort internally to give you their operator guarantees.

Parameters:
  • step_id – Unique ID.

  • up – Stream.

  • clock – Time definition. This will almost always be an EventClock.

Returns:

Order result streams.

Join our community Slack channel

Need some help? Join our community!

If you have any trouble with the process or have ideas about how to improve this document, come talk to us in the #questions-answered Slack channel!

Join now