Writing Documentation#

Documentation is written as Markdown files in the /docs folder. It is built using Sphinx, with MyST for Markdown parsing.

Adding User Guides#

Articles for the user guide live in /docs/guide and its sub-directories. You can add a new article by adding a new Markdown file, but it must also be added to a Sphinx table of contents so that Sphinx knows where to place it in the document hierarchy. (Sphinx does not require the directory structure to match the document structure.)

The TOC for the user guide is in /docs/guide/index.md. Add a new line with the path of your file to the appropriate sub-section.

API Docs#

API reference documentation is generated automatically from our Python source in /pysrc during the Sphinx build process, via the sphinx-autodoc2 extension. The build process turns the source into automatically generated Markdown files in /docs/api, which are then passed through the Sphinx builder.

All public functions, classes, and constants should have explicit type annotations.
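
As a minimal sketch of what that looks like in practice, here is a constant, a class, and a function with explicit annotations. All of the names below are hypothetical and only there for illustration; they are not part of Bytewax.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_BATCH_SIZE: int = 100
"""Hypothetical module constant with an explicit annotation."""


@dataclass
class Batch:
    """Hypothetical container with annotated attributes."""

    items: list[int]
    label: Optional[str] = None


def split_batch(batch: Batch, size: int = DEFAULT_BATCH_SIZE) -> list[Batch]:
    """Split a batch into chunks of at most `size` items."""
    return [
        Batch(batch.items[i : i + size], batch.label)
        for i in range(0, len(batch.items), size)
    ]
```

With the annotations in the signature, the generated API page can show the types without any extra :type: fields in the docstring.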

Local Prototyping#

You’ll need to set up a local development environment. See Local Development and specifically Writing Docs.

MyST Cheat Sheet#

Here’s a quick rundown of common things in MyST-flavored Markdown.

Docstrings#

Docstrings can be written using all the features of MyST. Some docstring-specific hints are provided here.

Arguments#

Docstrings should use MyST Markdown as the text body. Arguments, return values, etc. are specified using MyST field lists, which are lines of the form :name value: Description. The field names should be the same as the ones Sphinx uses.

def my_func(x: int, y: str) -> str:
    """Do the cool thing.

    :arg x: Describe the X parameter.

    :arg y: Describe the Y parameter. If this is a really long line,
        you can wrap it with indentation. You can also use any
        **syntax** here you like.

    :returns: A description of the return value.

    """
    ...

If the function signature is coming from PyO3 (and thus there are no type hints in the code), you can use the :type var: and :rtype: fields to provide argument and return value type hints.

/// Do the cool thing.
///
/// :arg x: Describe the X parameter.
///
/// :type x: int
///
/// :arg y: Describe the Y parameter. If this is a really long line,
///     you can wrap it with indentation. You can also use any
///     **syntax** here you like.
///
/// :type y: str
///
/// :returns: A description of the return value.
///
/// :rtype: str
#[pyfunction]
fn my_func(x: usize, y: String) -> String {
    todo!();
}

Class and Module Variables#

You can add “post-variable docstrings” to document these.

from dataclasses import dataclass
from typing import TypeVar


X = TypeVar("X")
"""Type of a cool thing."""


@dataclass
class Container:
    x: int
    """This is the docstring for this attribute."""

    y: str
    """This is the docstring for this other attribute."""

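Python does not attach these strings to the variable at runtime, so documentation tools have to recover them by reading the source. The sketch below shows the idea using the standard library ast module. It is only an illustration with a hypothetical helper name, and not necessarily how sphinx-autodoc2 does it.

```python
import ast

SOURCE = '''
X = 1
"""Docstring for X."""

Y: int = 2
"""Docstring for Y."""

Z = 3
'''


def attribute_docstrings(source: str) -> dict[str, str]:
    """Map each assigned name to the string literal that directly follows it."""
    body = ast.parse(source).body
    docs: dict[str, str] = {}
    # Walk over each statement together with the statement after it.
    for stmt, nxt in zip(body, body[1:]):
        if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
            target = stmt.targets[0]
        elif isinstance(stmt, ast.AnnAssign):
            target = stmt.target
        else:
            continue
        # A "post-variable docstring" is a bare string expression statement.
        is_doc = (
            isinstance(nxt, ast.Expr)
            and isinstance(nxt.value, ast.Constant)
            and isinstance(nxt.value.value, str)
        )
        if isinstance(target, ast.Name) and is_doc:
            docs[target.id] = nxt.value.value
    return docs


print(attribute_docstrings(SOURCE))
```

Z has no string after it, so it gets no entry. This is why the docstring has to sit on the line directly after the variable.
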
Cross References#

See MyST’s documentation on cross referencing for all the ways this can work. I’ll give a quick summary here.

A Specific Section#

The system does not automatically generate xref links for headings. You can manually add a reference name to any heading via the (xref-name)= syntax just before it. In general, just add refs for sections you know you want to reference elsewhere.

(xref-specific-section)=
### A Specific Section

You can then reference it via normal Markdown link syntax with the URI being just #xref-name.

Read [how to link to a specific section](#xref-specific-section)

Appears as:

Read how to link to a specific section

Or use the autolink syntax with the scheme project: and then a #xref-name.

Read about linking to <project:#xref-specific-section>

Appears as:

Read about linking to A Specific Section

Warning

All of your reference names must start with xref by convention to ensure that they are globally unique across all Sphinx domains. Unfortunately, MyST’s Markdown link xref resolver does not let you specify Sphinx domains and tries to resolve everything using the all directive, so it’s possible that the name you pick would clash with the name of a file (clashing with the doc domain) or a Python module (clashing with the py domain) and you get multiple targets. Prefixing them with xref means that we are less likely to clash.

Perhaps one day MyST will provide a syntax for unambiguously specifying an xref when they fix this issue.
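
If you want to keep an eye on this convention, a small check script can help. This is only a sketch: the helper name and the regex are my own assumptions, not part of the Bytewax tooling.

```python
import re

# Matches a MyST target line such as "(xref-specific-section)=".
TARGET_RE = re.compile(r"^\(([^)]+)\)=[ \t]*$", re.MULTILINE)


def unprefixed_targets(markdown: str, prefix: str = "xref") -> list[str]:
    """Return target names that do not start with the prefix."""
    return [n for n in TARGET_RE.findall(markdown) if not n.startswith(prefix)]


sample = """\
(xref-specific-section)=
### A Specific Section

(install)=
## Installing
"""
print(unprefixed_targets(sample))  # ['install']
```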

Note

The link URI must either start with a # and be a global Sphinx reference, or be a path. You can’t mix and match. This will not work:

Read [how to link to a specific section](/guide/contributing/writing-docs.md#xref-specific-section)

Instead make an explicit reference target with (xref-name)=.

Other Markdown Files#

To link to an entire article, add an xref to the main header in the file and link to that. Use the steps and syntax above.

API Docs#

To link to a symbol in the Bytewax library, use the full dotted path to it surrounded by ` and preceded by {py:obj}.

This operator returns a {py:obj}`bytewax.dataflow.Stream`.

Appears as:

This operator returns a bytewax.dataflow.Stream.

You should always use the full dotted path to reference a name, but if you don’t want it to appear as a full dotted path because of the context of the surrounding text, prefix the path with ~.

This operator returns a {py:obj}`~bytewax.dataflow.Stream`.

Appears as:

This operator returns a Stream.
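
The effect of the ~ prefix on the link text can be pictured with a tiny function. This is a hypothetical helper that mimics the displayed text only; it is not Sphinx code.

```python
def display_text(target: str) -> str:
    """Return the link text shown for a cross-reference target."""
    if target.startswith("~"):
        # Keep only the last component of the dotted path.
        return target[1:].rsplit(".", 1)[-1]
    return target


print(display_text("bytewax.dataflow.Stream"))   # bytewax.dataflow.Stream
print(display_text("~bytewax.dataflow.Stream"))  # Stream
```

Either way the link still resolves against the full dotted path; only the text shown changes.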

Intersphinx#

Intersphinx is the Sphinx system for connecting different documentation sites together. Our Sphinx config already connects a few of our dependencies, including the Python standard library.
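
For orientation, this is roughly what an Intersphinx entry looks like in a Sphinx conf.py. The single entry shown is illustrative and assumes the inventory is named python; check /docs/conf.py for the inventories that are actually configured.

```python
# Sketch of the relevant part of a Sphinx conf.py (illustrative only).
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # inventory name: (base URL of the docs, inventory file or None for the default)
    "python": ("https://docs.python.org/3", None),
}
```

The key on the left is the inventory name you use when you need to refer to a specific inventory.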

API Docs#

For most external Python types, you can use the same xref syntax as within Bytewax:

See the standard library function {py:obj}`functools.reduce`.

Appears as:

See the standard library function functools.reduce.

Other References#

Other references use a more explicit system. Use URIs starting with inv:, then the name of the inventory in the intersphinx_mapping in /docs/conf.py, then the domain, then the object type, then a # and the item name.

Learn about [how to use lambdas](inv:python:std:label#tut-lambda).

Appears as:

Learn about how to use lambdas.
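
To make the parts of that URI explicit: in the example, python is the inventory, std is the domain, label is the object type, and tut-lambda is the item name. A small sketch that pulls those apart, using a hypothetical helper that is not part of MyST or Sphinx:

```python
from typing import NamedTuple


class InvRef(NamedTuple):
    inventory: str
    domain: str
    object_type: str
    name: str


def parse_inv_uri(uri: str) -> InvRef:
    """Split an `inv:` URI into its parts (hypothetical helper)."""
    scheme_and_path, _, name = uri.partition("#")
    scheme, inventory, domain, object_type = scheme_and_path.split(":")
    if scheme != "inv":
        raise ValueError(f"not an inv: URI: {uri!r}")
    return InvRef(inventory, domain, object_type, name)


print(parse_inv_uri("inv:python:std:label#tut-lambda"))
```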

Finding Reference Names#

If you don’t know the exact xref incantation, you can use the included dump tool to fuzzy search with grep or fzf over all the xrefs to find the one you want.

(dev) $ python ./intersphinxdump.py | fzf

Example Python Code#

Use backtick code blocks with the {testcode} language type. Use this instead of python to ensure that the code block is run as a doctest. It will still be syntax highlighted as if it was Python.

```{testcode}
from bytewax.dataflow import Dataflow

flow = Dataflow("doc_df")
```

Appears as:

from bytewax.dataflow import Dataflow

flow = Dataflow("doc_df")

If you are really sure that you don’t want the code to run as part of the doctest suite, you can use the python language instead.

Shell Sessions#

Use the language type console (instead of bash), and start commands you run with $ to get proper highlighting.

```console
$ waxctl list
output here
```

Appears as:

$ waxctl list
output here

Mermaid Diagrams#

We have installed the sphinxcontrib-mermaid Sphinx plugin, which allows you to use mermaid as a code block language name.

```mermaid
graph TD

I1[Kafka Consumer `users`] --> D1[Users Deserializer] --> K1[Key on User ID]
I2[Kafka Consumer `transactions`] --> D2[Transactions Deserializer] --> K2[Key on User ID]

K1 & K2 --> J1[Join on User ID] --> V[Validator] --> S[Enriched Serializer] --> O1[Kafka Producer `enriched_txns`]
V --> O2[Kafka Producer `enriched_txns_dead_letter_queue`]
```

Appears as:

(Rendered flowchart: the `users` and `transactions` Kafka consumers each feed a deserializer and a key-on-user-ID step; the two streams join on user ID and pass through the validator, which feeds the enriched serializer and the `enriched_txns` producer, as well as the `enriched_txns_dead_letter_queue` producer.)

Current Version Number#

Sometimes you want to show a command that includes the current Bytewax version number. Instead of updating this number in every file, Sphinx has a variable holding the current version number that we can substitute in. To enable substitutions in a code block, unfortunately, we have to use the directive form and enable substitutions:

```{code-block} console
:substitutions:

$ pip install bytewax==|version|
```

Appears for the current version 0.21.0 as:

$ pip install bytewax==0.21.0

Linking to Files in the GitHub Repo#

If you’d like to link to a file in our public GitHub repo, but want the link to be a permalink to the version of the file in the same Git commit that the current documentation was built from, use the gh-path scheme.

Note that the path is absolute to the repo and begins with a /.

<gh-path:/examples/wikistream.py>

Appears as:

bytewax/bytewax/examples/wikistream.py
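
For reference, a commit-pinned GitHub URL has the shape sketched below. How the gh-path scheme is wired up in our Sphinx config is not shown here, so treat the helper name and the commit hash as placeholders for illustration.

```python
def github_permalink(path: str, commit: str, repo: str = "bytewax/bytewax") -> str:
    """Build a commit-pinned GitHub URL for a repo-absolute path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute to the repo and begin with /")
    return f"https://github.com/{repo}/blob/{commit}{path}"


# "0123abc" is a placeholder commit hash, not a real one.
print(github_permalink("/examples/wikistream.py", "0123abc"))
# https://github.com/bytewax/bytewax/blob/0123abc/examples/wikistream.py
```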

Linking to GitHub Issues or PRs#

You can link to a GitHub issue or PR in our public repo using this shorthand. It will decorate it with a little GitHub logo.

Note that the issue number does not have a # before it.

<gh-issue:123>

Appears as:

Issue #123

Doctests#

We have a Sphinx builder which runs all Python code blocks in our documentation. This is so we catch documentation we forget to update as we advance the API.

Running just test-doc will test:

- All documentation examples in Markdown files in /docs.
- All examples in docstrings in /pysrc via the API docs pages.
- Docstrings from PyO3, via the stubs file in /pysrc/bytewax/_bytewax.pyi and then via the API docs pages. You must rebuild stubs to test these.

For more options and details on this system, see sphinx.ext.doctest in the Sphinx docs.

Code Block with No Output#

If you have a plain Python {testcode} block, the code will be run to ensure no exceptions, but no output will be checked.

```{testcode}
x = 1 + 1
```

Appears as:

x = 1 + 1

Code Block Checking Output#

To assert some specific stdout from a code block, pair a {testoutput} block after a {testcode} one.

Here's some pre-commentary.

```{testcode}
print(1 + 1)
```

Here's some middle-commentary.

```{testoutput}
2
```

Here's some post-commentary.

Appears as:

Here’s some pre-commentary.
print(1 + 1)
Here’s some middle-commentary.
2
Here’s some post-commentary.
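
As a rough mental model, the pairing behaves like capturing what the code prints and comparing it with the expected text. The sketch below uses a hypothetical helper to show the idea; the real comparison in sphinx.ext.doctest has its own rules and options, so this is not its implementation.

```python
import contextlib
import io


def check_output(code: str, expected: str) -> bool:
    """Run `code` and compare what it prints against `expected`."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip() == expected.strip()


print(check_output("print(1 + 1)", "2"))  # True
print(check_output("print(1 + 1)", "3"))  # False
```

The commentary between the two blocks plays no part in the check, which is why you can put prose between a {testcode} block and its {testoutput}.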

Testing a Dataflow / Hidden Code#

Sometimes, to get these automated tests to run, you have to do setup to satisfy the interpreter that would distract from the flow of the documentation. In that case, you can hide {testcode} or {testoutput} blocks: they will not appear in the rendered documentation, but will still be tested. The power of this comes from not having to hide both blocks of a pair.

This is commonly used to test the example output of a dataflow without needing to show the run function (since a real user would use the run script anyway).

```{testcode}
:hide:

from bytewax.dataflow import Dataflow
from bytewax.testing import run_main, TestingSource
from bytewax.connectors.stdio import StdOutSink
import bytewax.operators as op
```

```{testcode}
flow = Dataflow("test_df")
nums = op.input("inp", flow, TestingSource([1, 2, 3]))
op.output("out", nums, StdOutSink())
```

```{testcode}
:hide:

run_main(flow)
```

```{testoutput}
1
2
3
```

Appears as:

flow = Dataflow("test_df")
nums = op.input("inp", flow, TestingSource([1, 2, 3]))
op.output("out", nums, StdOutSink())

1
2
3

Using Fixture Files#

just test-doc cds into the docs/fixtures/ directory before running the test doc builder. This means you have access to all files within that directory for any of the doctests.

E.g. in our wordcount example we use a fixture file.

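import bytewax.operators as op
from bytewax.dataflow import Dataflow
from bytewax.connectors.files import FileSource

flow = Dataflow("wordcount_eg")
# The relative path resolves against docs/fixtures/ when run by just test-doc.
inp = op.input("inp", flow, FileSource("wordcount.txt"))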
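
The reason a bare file name works is simply the working directory. Here is a self-contained sketch of the same effect, using a temporary directory as a stand-in for docs/fixtures/; the file contents are made up for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as fixtures:
    # Stand-in for docs/fixtures/ with one fixture file in it.
    Path(fixtures, "wordcount.txt").write_text("to be or not to be\n")

    previous = os.getcwd()
    os.chdir(fixtures)
    try:
        # A bare relative path now resolves inside the fixtures directory.
        words = Path("wordcount.txt").read_text().split()
    finally:
        os.chdir(previous)

print(len(words))  # 6
```

If you run a doc example by hand from another directory, the same relative path will not resolve, so cd into the fixtures directory first.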

Doctest Code Block#

If you want to show an interactive interpreter session to show the details of an example, make it a doctest-style code block, using the doctest directive. Prefix each input line with >>> and put the expected output on the following lines.

```{doctest}
>>> 1 + 1
2
```

Appears as:

>>> 1 + 1
2
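
This is the same format the standard library doctest module understands, so you can try out a session locally before putting it in the docs. A minimal sketch, with a hypothetical add function standing in for whatever you are documenting:

```python
import doctest


def add(a: int, b: int) -> int:
    """Add two numbers.

    >>> add(1, 1)
    2
    """
    return a + b


# Collect and run the `>>>` examples found in the docstring.
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed)  # 0
```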

Skipping#

To skip a whole code block, use the python / text language instead of {testcode} / {testoutput}.

To skip a single line in a {doctest} block, you can use an inline doctest option.

```{doctest}
>>> datetime.date.today()   # doctest: +SKIP
datetime.date(2008, 1, 1)
```
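
You can see the effect of the +SKIP option with the standard library doctest module. In this sketch the second example would raise if it were executed, and its expected output is deliberately wrong, yet nothing fails because the line is never run.

```python
import doctest

SESSION = """
>>> 1 + 1
2
>>> 1 / 0   # doctest: +SKIP
'never compared'
"""

test = doctest.DocTestParser().get_doctest(SESSION, {}, "session", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed)  # 0
```

Keep in mind that a skipped line is also never checked, so its shown output can silently go stale.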