Introduction

bytesandbrains is a Rust framework for authoring decentralized and federated machine-learning systems. You describe a workload as a Module, a Rust struct whose body method records computation onto a Graph. Module::build walks the composition tree and produces an ONNX ModelProto. The compiler partitions that program across the participating Nodes, binds your compute and transport implementations into the partitions, and stamps the result with the metadata the runtime needs. The Engine then dispatches each op to its bound impl and routes inter-Node values through a single wire envelope.

What it is

The framework is the substrate for machine learning that runs outside the data center, where the data already lives: phones, sensors, regulated environments, on-prem fleets, peer-to-peer overlays. One authoring surface, one compiler, one runtime. Every distributed-ML strategy (federated, gossip, peer-to-peer, split) composes as a binding on the same foundation rather than fragmenting into one library per paradigm.

bytesandbrains is sans-IO. The Node is a state machine. The caller drives poll() on a runtime of their choice and ships outbound envelopes through whichever transport the deployment can reach. There is no tokio in src/. Transport adapters live outside the core crate.

The framework owns the bytes between Nodes. The host owns its sockets. That separation is the whole point of the design: the runtime is the same on every Node, and the only thing that changes between deployments is the host integration around it.
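To make the sans-IO split concrete, here is a minimal sketch of a state machine that never touches a socket. Everything in it (ToyNode, Envelope, handle_inbound, drain_outbound, the reply_to argument) is a hypothetical stand-in invented for illustration; it is not the bytesandbrains Node API, whose poll takes a task Context as shown later in this chapter.

```rust
use std::collections::VecDeque;

/// Hypothetical wire envelope: destination peer plus an opaque payload.
#[derive(Debug, Clone, PartialEq)]
struct Envelope {
    to: u64,
    payload: Vec<u8>,
}

/// Toy sans-IO node: it only reads and writes in-memory queues.
struct ToyNode {
    id: u64,
    inbox: VecDeque<Envelope>,
    outbox: VecDeque<Envelope>,
}

impl ToyNode {
    fn new(id: u64) -> Self {
        Self { id, inbox: VecDeque::new(), outbox: VecDeque::new() }
    }

    /// The host hands over an envelope it read from its own transport.
    fn handle_inbound(&mut self, env: Envelope) {
        self.inbox.push_back(env);
    }

    /// Advance the state machine and return how many envelopes were processed.
    /// Envelopes addressed to another peer are dropped; every other payload
    /// is reversed and queued for `reply_to`.
    fn poll(&mut self, reply_to: u64) -> usize {
        let mut steps = 0;
        while let Some(env) = self.inbox.pop_front() {
            if env.to != self.id {
                continue;
            }
            let mut payload = env.payload;
            payload.reverse();
            self.outbox.push_back(Envelope { to: reply_to, payload });
            steps += 1;
        }
        steps
    }

    /// The host drains outbound envelopes and ships them however it likes.
    fn drain_outbound(&mut self) -> Vec<Envelope> {
        self.outbox.drain(..).collect()
    }
}

fn main() {
    let mut node = ToyNode::new(7);
    node.handle_inbound(Envelope { to: 7, payload: vec![1, 2, 3] });
    let steps = node.poll(100);
    let out = node.drain_outbound();
    println!("steps = {steps}, outbound = {out:?}");
}
```

The point of the shape is that the host loop, not the node, decides whether those envelopes travel over TCP, a message queue, or an in-process channel in a test.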

What it is not

The framework is not a federated-learning library, a gossip-protocol library, or a vector-search library. Those things are concretes you bind into the substrate. The bb-ops crate ships a working set (FedAvg, GlobalRegistryServer, GlobalRegistryClient, CpuBackend, the wire transport), but the surface those concretes plug into is generic. You write your own Model, Index, Aggregator, or Protocol impl, and the same compiler and runtime host it.

The framework is also not a remote-procedure-call layer wrapped around ONNX. The IR is ONNX, and the program is the ModelProto, but the runtime is not an inference runtime. It is a partitioned executor: each Node holds its own piece of the graph and executes against bound concretes that may run a forward pass, hit a wire, sample peers, fold contributions into an aggregator, or all of the above in one poll cycle.

How a program looks

Every program is a Module. A Module is a struct whose body method records DSL calls onto a Graph. The DSL is the recording surface; the Graph is the recorder. Inputs are declared with g.input("name"), local outputs with g.output("name", value), and network outputs with g.net_out("name", peers, value). Role-method calls (self.backend.matmul(g, a, b), ModelSlot.forward(g, batch), and so on) record one NodeProto each into the in-progress FunctionProto; g.input and g.output extend the function’s input list and output binding without emitting a NodeProto, and g.net_out records a wire.Send NodeProto that the compiler later splits across partitions.
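The recording rules in that paragraph can be modelled with a toy recorder. This is a sketch under stated assumptions: ToyGraph, ToyNodeProto, Value, and the record method are hypothetical names made up here, and the real Graph stores ONNX protos rather than these structs. It only illustrates that declaring inputs and outputs extends lists, while a role-method call appends exactly one node.

```rust
/// Handle to a recorded value (hypothetical; not the framework's type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Value(usize);

/// One recorded op: its type, the values it consumes, the value it produces.
#[derive(Debug, PartialEq)]
struct ToyNodeProto {
    op_type: String,
    inputs: Vec<Value>,
    output: Value,
}

/// Toy recorder standing in for the Graph.
#[derive(Default)]
struct ToyGraph {
    next_value: usize,
    inputs: Vec<(String, Value)>,
    outputs: Vec<(String, Value)>,
    nodes: Vec<ToyNodeProto>,
}

impl ToyGraph {
    /// Allocate a fresh value handle.
    fn fresh(&mut self) -> Value {
        let v = Value(self.next_value);
        self.next_value += 1;
        v
    }

    /// Declaring an input extends the input list; no node is recorded.
    fn input(&mut self, name: &str) -> Value {
        let v = self.fresh();
        self.inputs.push((name.to_string(), v));
        v
    }

    /// Binding an output records no node either.
    fn output(&mut self, name: &str, value: Value) {
        self.outputs.push((name.to_string(), value));
    }

    /// Stand-in for a role-method call: records exactly one node.
    fn record(&mut self, op_type: &str, inputs: &[Value]) -> Value {
        let output = self.fresh();
        self.nodes.push(ToyNodeProto {
            op_type: op_type.to_string(),
            inputs: inputs.to_vec(),
            output,
        });
        output
    }
}

fn main() {
    let mut g = ToyGraph::default();
    let a = g.input("a");
    let b = g.input("b");
    let product = g.record("MatMul", &[a, b]);
    g.output("product", product);
    println!(
        "{} inputs, {} nodes, {} outputs",
        g.inputs.len(),
        g.nodes.len(),
        g.outputs.len()
    );
}
```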

The skeleton below shows the shape a small Module takes. It declares two inputs (server_params and server_peer), records four role-method calls against two slot placeholders, and emits the result on a network port. The slot placeholders (ModelSlot, DataLoaderSlot) are generic stand-ins. The compiler binds them to concrete impls at compile time through the Compiler::bind_<role>::<T>("slot") chain.

// Skeleton derived from examples/federated_learning.rs (ClientLogic body).
use bytesandbrains::prelude::*;
use bytesandbrains::placeholders::{DataLoaderSlot, ModelSlot};

struct ClientLogic;

impl Module for ClientLogic {
    fn name(&self) -> &str {
        "ClientLogic"
    }

    fn body(&self, g: &mut Graph) {
        // First of two declared inputs: the latest global params from
        // the server. The second, `server_peer`, is declared further down.
        let server_params = g.input("server_params");

        // Role-method recording: each call lowers to a NodeProto
        // routed by `(domain, op_type, instance)` at runtime.
        let _ = ModelSlot.load_parameters(g, server_params);
        let (batch, _labels) = DataLoaderSlot.next_batch(g);
        let _prediction = ModelSlot.forward(g, batch);
        let updated_params = ModelSlot.params(g);

        // One declared network output. The compiler cuts the graph
        // at this boundary; the synth-recv pass materializes the
        // matching `wire.Recv` on every consumer-side partition.
        let server_peer = g.input("server_peer");
        g.net_out("updated_params", server_peer, updated_params);
    }
}

A program is built, compiled, and installed in three phases. Each phase crosses a stable IR boundary so the framework can verify the artifact between steps. The chain below is the canonical entry point, distilled from the federated-learning example. A peer can host multiple partitions: install takes a Vec<Address> of addresses to register against the local PeerId and a &[&str] of target function names so one Node can land both a Client and a Server partition. After install the host calls Node::run_bootstrap to drive any recorded setup before the body phase polls for the first time.

// Derived from examples/federated_learning.rs (main).
use std::task::{Context, Waker};

use bytesandbrains::aggregators::FedAvg;
use bytesandbrains::backends::cpu::CpuBackend;
use bytesandbrains::proto::onnx::ModelProto;
use bytesandbrains::{
    install, Address, BootstrapTarget, Compiler, Config, Module, PeerId,
};

const SERVER_PEER: u64 = 100;

// Phase 1 - author: record the Module body into a `ModelProto`.
let server_reduce_proto: ModelProto = ServerReduce.build()?;

// Phase 2 - compile: bind concretes, run the canonical pipeline,
// stamp the result with the compilation passport.
let server_reduce_artifact = Compiler::new()
    .bind_aggregator::<FedAvg<CpuBackend>>("aggregator")
    .bind_backend::<CpuBackend>("backend")
    .compile(server_reduce_proto)?;

// Phase 3 - install: verify the passport, construct every bound
// concrete via the inventory, return a Node ready to poll. The
// address list is a `Vec<Address>`; the target list is `&[&str]`
// so one Node can host multiple partitions.
let server_peer = PeerId::from(SERVER_PEER);
let target = server_reduce_artifact.functions[0].name.clone();
let mut server_reduce = install(
    server_peer,
    vec![Address::empty().p2p(server_peer)],
    server_reduce_artifact,
    &[target.as_str()],
    Config::new(),
)?;

// Drive any recorded bootstrap to completion (no-op when the Module
// has no `bootstrap` override), then poll the body on your runtime of
// choice.
server_reduce.run_bootstrap(BootstrapTarget::All)?;
let waker = Waker::noop();
let mut cx = Context::from_waker(waker);
while let std::task::Poll::Ready(_steps) = server_reduce.poll(&mut cx) {}

The three phases are the same on every Node in a deployment. The artifact a Node installs may differ (each partition of the compiled graph is a distinct install target), but the install path is one function, the runtime is one Engine, and the wire is one envelope.

The mental model: programs are graphs, the runtime owns the bytes

The mental model fits in one line: programs are graphs, and the runtime owns the bytes between Nodes.

The graph part is literal. Module::build returns a single ModelProto. Each role-method call inside body records a NodeProto. The compiler walks the recorded FunctionProto, partitions it across the Nodes the host will deploy, and emits one root FunctionProto per partition. The Engine on each Node holds its partition as a dispatch table keyed by (domain, op_type, instance).
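A dispatch table keyed by (domain, op_type, instance) can be sketched in a few lines. Only the key shape comes from the text above. ToyEngine, OpImpl, bind, dispatch, and the "toy.backend" domain string are hypothetical, and real bound concretes operate on far richer values than a slice of f64.

```rust
use std::collections::HashMap;

/// Key shape taken from the text: (domain, op_type, instance).
type OpKey = (String, String, String);

/// Hypothetical op impl: folds a slice of inputs into one output.
type OpImpl = Box<dyn Fn(&[f64]) -> f64>;

/// Toy engine holding one partition's dispatch table.
struct ToyEngine {
    table: HashMap<OpKey, OpImpl>,
}

impl ToyEngine {
    fn new() -> Self {
        Self { table: HashMap::new() }
    }

    /// Bind an impl to a (domain, op_type, instance) triple.
    fn bind(&mut self, domain: &str, op_type: &str, instance: &str, op: OpImpl) {
        self.table
            .insert((domain.to_string(), op_type.to_string(), instance.to_string()), op);
    }

    /// Look up the bound impl and run it; `None` means nothing is bound.
    fn dispatch(&self, domain: &str, op_type: &str, instance: &str, args: &[f64]) -> Option<f64> {
        let key = (domain.to_string(), op_type.to_string(), instance.to_string());
        self.table.get(&key).map(|op| op(args))
    }
}

fn main() {
    let mut engine = ToyEngine::new();
    engine.bind(
        "toy.backend",
        "Sum",
        "backend",
        Box::new(|xs: &[f64]| xs.iter().sum::<f64>()),
    );
    let hit = engine.dispatch("toy.backend", "Sum", "backend", &[1.0, 2.0, 3.0]);
    let miss = engine.dispatch("toy.backend", "Sum", "other", &[1.0]);
    println!("hit = {hit:?}, miss = {miss:?}");
}
```

Including the instance in the key is what lets two slots of the same role, bound to different concretes, coexist in one table.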

The runtime part is what makes the framework a substrate rather than a DSL. Each role’s concrete (Backend, Model, Index, Aggregator, Codec, DataSource, PeerSelector, Protocol) plugs into the same dispatch surface. When a NodeProto’s bound concrete sits on another Node, the compiler inserts wire.Send on the producer side and wire.Recv on the consumer side. The wire envelope carries the value across the network, the receiving Node decodes it, and the dispatch table continues as if the producer were local.
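The Send/Recv insertion described above can be illustrated with a toy pass over a linear list of ops. This is an assumption-heavy sketch, not the compiler's actual algorithm: Op, insert_wire_ops, and the string-formatted op names are hypothetical, each op consumes at most one producer, and the input is assumed to be in topological order.

```rust
/// An op in the toy program, tagged with the partition that hosts it.
#[derive(Debug)]
struct Op {
    name: String,
    partition: u32,
    /// Index of the op whose output this op consumes, if any.
    input: Option<usize>,
}

/// Walk the ops in order. Whenever a consumer sits on a different partition
/// than its producer, emit a send on the producer side and a matching
/// receive on the consumer side before the consumer itself.
fn insert_wire_ops(ops: &[Op]) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    for op in ops {
        if let Some(src) = op.input {
            let producer = &ops[src];
            if producer.partition != op.partition {
                out.push((producer.partition, format!("wire.Send({})", producer.name)));
                out.push((op.partition, format!("wire.Recv({})", producer.name)));
            }
        }
        out.push((op.partition, op.name.clone()));
    }
    out
}

fn main() {
    let ops = vec![
        Op { name: "forward".to_string(), partition: 0, input: None },
        Op { name: "aggregate".to_string(), partition: 1, input: Some(0) },
        Op { name: "normalize".to_string(), partition: 1, input: Some(1) },
    ];
    for (partition, name) in insert_wire_ops(&ops) {
        println!("partition {partition}: {name}");
    }
}
```

Only the forward-to-aggregate edge crosses a partition boundary here, so only that edge gains a send and a receive; the aggregate-to-normalize edge stays local.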

Reading order

The chapters that follow are linear. Read them in order on the first pass; jump back to specific chapters once you know the shape.

Chapter 2 walks through installing one of the shipped examples and running it.
Chapter 3 covers the IR and the DSL: how Module plus Graph lower onto ONNX and what the recorded ModelProto looks like.
Chapter 4 is the Syscalls Reference: the canonical NodeProtos the framework emits and dispatches.
Chapter 5 covers authoring Modules and Components: the Module trait you implement, the #[derive(bb::Concrete)] plus per-role derives (bb::Backend, bb::Model, etc.) that bridge a Contract impl to the engine’s role runtime, and the inventory submissions the installer reads.
Chapter 6 covers the seven Contract traits the framework dispatches against, plus the register_protocol! macro that registers a Protocol-role impl as the eighth dispatch surface.
Chapters 7 and 8 cover the dependency declaration system and the polymorphic type system.
Chapters 9 and 10 cover the compiler pipeline and the Engine state machine.
Chapter 11 covers the wire envelope and the addressing model.
Chapter 12 covers the deployment surface, including snapshot loads.
Chapter 13 is a tour of the seven shipped examples.

Note: the seven Contract traits (Backend, Index, Model, Aggregator, Codec, DataSource, PeerSelector) plus the Bootstrap Contract and the register_protocol!-driven Protocol surface are enumerated in chapter 6 (Roles). The BootstrapTarget enum variants that drive Node::run_bootstrap(...) (All, ModuleNames, ModuleRequests, Slots) are enumerated in chapter 10 (The Engine).

When the prose disagrees with the code, the code wins. Every chapter is reconciled against the 0.3.0 release of the framework. The status strip at the top of each chapter names the bytesandbrains source file the chapter is rooted in so the reader can read the canonical text side by side.

Where this lives

Framework facade and prelude: bytesandbrains/src/lib.rs.
Module trait and Graph recorder: bytesandbrains/bb-dsl/src/.
Compiler driver: bytesandbrains/bb-compiler/src/.
Engine state machine and Node: bytesandbrains/bb-runtime/src/.
Installer entry point: bytesandbrains/src/install.rs.
Slot placeholders and shipped concretes: bytesandbrains/bb-ops/src/.
Reference examples: bytesandbrains/examples/.