Introduction
bytesandbrains is a Rust framework for authoring decentralized and
federated machine-learning systems. You describe a workload as a
Module, a Rust struct whose body method records computation onto a
Graph. Module::build walks the composition tree and produces an
ONNX ModelProto. The compiler partitions that program across the
participating Nodes, binds your compute and transport implementations
into the partitions, and stamps the result with the metadata the
runtime needs. The Engine then dispatches each op to its bound impl and
routes inter-Node values through a single wire envelope.
What it is
The framework is the substrate for machine learning that runs outside the data center, where the data already lives: phones, sensors, regulated environments, on-prem fleets, and peer-to-peer overlays. It offers one authoring surface, one compiler, and one runtime. Every distributed-ML strategy (federated, gossip, peer-to-peer, split) composes as a binding on the same foundation rather than fragmenting into one library per paradigm.
bytesandbrains is sans-IO. The Node is a state machine. The caller
drives poll() on a runtime of their choice and ships outbound
envelopes through whichever transport the deployment can reach. There
is no tokio in src/. Transport adapters live outside the core
crate.
The framework owns the bytes between Nodes. The host owns its sockets. That separation is the whole point of the design: the runtime is the same on every Node, and the only thing that changes between deployments is the host integration around it.
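The sans-IO split can be sketched with a toy state machine. Every name in the block (ToyNode, Envelope, handle_inbound, poll_outbound, drive) is hypothetical and is not bytesandbrains API. The sketch shows only the division of labor: the state machine queues outbound envelopes, and the host loop decides how to ship them.

```rust
use std::collections::VecDeque;

/// Hypothetical wire envelope: destination peer plus opaque payload.
#[derive(Debug, Clone, PartialEq)]
struct Envelope {
    to: u64,
    payload: Vec<u8>,
}

/// Hypothetical sans-IO node. It never touches a socket; it only
/// consumes inbound bytes and queues outbound envelopes.
struct ToyNode {
    id: u64,
    outbox: VecDeque<Envelope>,
}

impl ToyNode {
    fn new(id: u64) -> Self {
        Self { id, outbox: VecDeque::new() }
    }

    /// The host hands over bytes it received. This toy node reacts by
    /// queueing an echo of the payload back to the sender.
    fn handle_inbound(&mut self, from: u64, payload: &[u8]) {
        self.outbox.push_back(Envelope { to: from, payload: payload.to_vec() });
    }

    /// The host drains this queue and ships each envelope however it likes.
    fn poll_outbound(&mut self) -> Option<Envelope> {
        self.outbox.pop_front()
    }
}

/// Host-side loop. The "transport" is a Vec standing in for a socket.
fn drive(node: &mut ToyNode, sent: &mut Vec<Envelope>) {
    while let Some(envelope) = node.poll_outbound() {
        sent.push(envelope);
    }
}

fn main() {
    let mut node = ToyNode::new(1);
    node.handle_inbound(7, b"ping");
    let mut sent = Vec::new();
    drive(&mut node, &mut sent);
    println!("node {} shipped {} envelope(s)", node.id, sent.len());
}
```

Swapping the Vec for a TCP stream, a message queue, or a test harness changes only drive; the state machine is untouched, which is the property the design relies on.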
What it is not
The framework is not a federated-learning library, a gossip-protocol
library, or a vector-search library. Those things are concretes you
bind into the substrate. The bb-ops crate ships a working set
(FedAvg, GlobalRegistryServer, GlobalRegistryClient, CpuBackend,
the wire transport), but the surface those concretes plug into is
generic. You write your own Model, Index, Aggregator, or
Protocol impl, and the same compiler and runtime host it.
The framework is also not a remote-procedure-call layer wrapped around
ONNX. The IR is ONNX, and the program is the ModelProto, but the
runtime is not an inference runtime. It is a partitioned executor: each
Node holds its own piece of the graph and executes against bound
concretes that may run a forward pass, hit a wire, sample peers, fold
contributions into an aggregator, or all of the above in one poll
cycle.
How a program looks
Every program is a Module. A Module is a struct whose body method
records DSL calls onto a Graph. The DSL is the recording surface; the
Graph is the recorder. Inputs are declared with g.input("name"),
local outputs with g.output("name", value), and network outputs with
g.net_out("name", peers, value). Role-method calls
(self.backend.matmul(g, a, b), ModelSlot.forward(g, batch), and so
on) record one NodeProto each into the in-progress FunctionProto;
g.input and g.output extend the function’s input list and output
binding without emitting a NodeProto, and g.net_out records a
wire.Send NodeProto that the compiler later splits across partitions.
The skeleton below shows the minimal shape a Module takes. It declares two
inputs, records four role-method calls against two slot placeholders,
and emits the result on a network port. The slot placeholders
(ModelSlot, DataLoaderSlot) are generic stand-ins. The compiler
binds them to concrete impls at compile time through the
Compiler::bind_<role>::<T>("slot") chain.
// Skeleton derived from examples/federated_learning.rs (ClientLogic body).
use bytesandbrains::prelude::*;
use bytesandbrains::placeholders::{DataLoaderSlot, ModelSlot};

struct ClientLogic;

impl Module for ClientLogic {
    fn name(&self) -> &str {
        "ClientLogic"
    }

    fn body(&self, g: &mut Graph) {
        // One declared input: the latest global params from the server.
        let server_params = g.input("server_params");

        // Role-method recording: each call lowers to a NodeProto
        // routed by `(domain, op_type, instance)` at runtime.
        let _ = ModelSlot.load_parameters(g, server_params);
        let (batch, _labels) = DataLoaderSlot.next_batch(g);
        let _prediction = ModelSlot.forward(g, batch);
        let updated_params = ModelSlot.params(g);

        // One declared network output. The compiler cuts the graph
        // at this boundary; the synth-recv pass materializes the
        // matching `wire.Recv` on every consumer-side partition.
        let server_peer = g.input("server_peer");
        g.net_out("updated_params", server_peer, updated_params);
    }
}
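The recording rules described before the skeleton can be made concrete with a toy recorder. This is a sketch under stated assumptions: ToyGraph, RecordedOp, record, and the op name "Model.LoadParameters" are hypothetical stand-ins, not the framework's Graph or its ONNX types. It illustrates only the distinction the text draws: input and output extend lists without emitting a node, while a role-method call and net_out each record one.

```rust
/// Hypothetical stand-in for one recorded op (the real IR uses ONNX NodeProto).
#[derive(Debug, PartialEq)]
struct RecordedOp {
    op_type: String,
    inputs: Vec<String>,
    output: String,
}

/// Hypothetical recorder holding the three lists the text describes.
#[derive(Debug, Default)]
struct ToyGraph {
    inputs: Vec<String>,
    outputs: Vec<(String, String)>, // (port name, value name)
    nodes: Vec<RecordedOp>,
}

impl ToyGraph {
    /// Declares an input: extends the input list, records no node.
    fn input(&mut self, name: &str) -> String {
        self.inputs.push(name.to_string());
        name.to_string()
    }

    /// Binds a local output: extends the output list, records no node.
    fn output(&mut self, name: &str, value: &str) {
        self.outputs.push((name.to_string(), value.to_string()));
    }

    /// Stands in for a role-method call: records exactly one node and
    /// returns the name of the value it produces.
    fn record(&mut self, op_type: &str, inputs: &[&str]) -> String {
        let output = format!("v{}", self.nodes.len());
        self.nodes.push(RecordedOp {
            op_type: op_type.to_string(),
            inputs: inputs.iter().map(|s| s.to_string()).collect(),
            output: output.clone(),
        });
        output
    }

    /// A network output is itself a recorded node (a send), unlike `output`.
    /// For simplicity the port name is stored in the node's output field.
    fn net_out(&mut self, name: &str, peers: &str, value: &str) {
        self.nodes.push(RecordedOp {
            op_type: "wire.Send".to_string(),
            inputs: vec![peers.to_string(), value.to_string()],
            output: name.to_string(),
        });
    }
}

/// A cut-down version of the client body: two inputs, one op, one send.
fn build_client() -> ToyGraph {
    let mut g = ToyGraph::default();
    let params = g.input("server_params");
    let updated = g.record("Model.LoadParameters", &[params.as_str()]);
    let peer = g.input("server_peer");
    g.net_out("updated_params", &peer, &updated);
    g
}

fn main() {
    let mut g = build_client();
    g.output("debug_params", "v0");
    println!(
        "{} inputs, {} outputs, {} nodes",
        g.inputs.len(),
        g.outputs.len(),
        g.nodes.len()
    );
}
```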
A program is built, compiled, and installed in three phases. Each phase
crosses a stable IR boundary so the framework can verify the artifact
between steps. The chain below is the canonical entry point, distilled
from the federated-learning example. A peer can host multiple
partitions: install takes a Vec<Address> of addresses to register
against the local PeerId and a &[&str] of target function names so
one Node can land both a Client and a Server partition. After
install the host calls Node::run_bootstrap to drive any recorded
setup before the body phase polls for the first time.
// Derived from examples/federated_learning.rs (main).
use std::task::{Context, Waker};

use bytesandbrains::aggregators::FedAvg;
use bytesandbrains::backends::cpu::CpuBackend;
use bytesandbrains::proto::onnx::ModelProto;
use bytesandbrains::{
    install, Address, BootstrapTarget, Compiler, Config, Module, PeerId,
};

const SERVER_PEER: u64 = 100;

// Phase 1 - author: record the Module body into a `ModelProto`.
let server_reduce_proto: ModelProto = ServerReduce.build()?;

// Phase 2 - compile: bind concretes, run the canonical pipeline,
// stamp the result with the compilation passport.
let server_reduce_artifact = Compiler::new()
    .bind_aggregator::<FedAvg<CpuBackend>>("aggregator")
    .bind_backend::<CpuBackend>("backend")
    .compile(server_reduce_proto)?;

// Phase 3 - install: verify the passport, construct every bound
// concrete via the inventory, return a Node ready to poll. The
// address list is a `Vec<Address>`; the target list is `&[&str]`
// so one Node can host multiple partitions.
let server_peer = PeerId::from(SERVER_PEER);
let target = server_reduce_artifact.functions[0].name.clone();
let mut server_reduce = install(
    server_peer,
    vec![Address::empty().p2p(server_peer)],
    server_reduce_artifact,
    &[target.as_str()],
    Config::new(),
)?;

// Drive any recorded bootstrap to completion (no-op when the Module
// has no `bootstrap` override), then poll the body on your runtime of
// choice.
server_reduce.run_bootstrap(BootstrapTarget::All)?;
let waker = Waker::noop();
let mut cx = Context::from_waker(waker);
while let std::task::Poll::Ready(_steps) = server_reduce.poll(&mut cx) {}
The three phases are the same on every Node in a deployment. The artifact a Node installs may differ (each partition of the compiled graph is a distinct install target), but the install path is one function, the runtime is one Engine, and the wire is one envelope.
The mental model: programs are graphs, the runtime owns the bytes
In one line: programs are graphs, and the runtime owns the bytes between Nodes.
The graph part is literal. Module::build returns a single
ModelProto. Each role-method call inside body records a NodeProto.
The compiler walks the recorded FunctionProto, partitions it across
the Nodes the host will deploy, and emits one root FunctionProto per
partition. The Engine on each Node holds its partition as a dispatch
table keyed by (domain, op_type, instance).
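A dispatch table keyed this way can be sketched with a plain HashMap. Only the key shape, (domain, op_type, instance), comes from the text. ToyEngine, BoundOp, bind, dispatch, and the "Scale" op are hypothetical, and the real Engine's value types and lookup path may differ.

```rust
use std::collections::HashMap;

/// Key shape taken from the text: (domain, op_type, instance).
type OpKey = (String, String, String);

/// Hypothetical bound concrete: a boxed function over f32 slices.
type BoundOp = Box<dyn Fn(&[f32]) -> Vec<f32>>;

/// Hypothetical engine holding one partition's dispatch table.
struct ToyEngine {
    table: HashMap<OpKey, BoundOp>,
}

impl ToyEngine {
    fn new() -> Self {
        Self { table: HashMap::new() }
    }

    /// Registers an impl under its (domain, op_type, instance) key.
    fn bind(&mut self, domain: &str, op_type: &str, instance: &str, op: BoundOp) {
        self.table
            .insert((domain.to_string(), op_type.to_string(), instance.to_string()), op);
    }

    /// Looks up the bound impl and runs it. `None` means this partition
    /// holds no binding for the requested op.
    fn dispatch(
        &self,
        domain: &str,
        op_type: &str,
        instance: &str,
        input: &[f32],
    ) -> Option<Vec<f32>> {
        let key = (domain.to_string(), op_type.to_string(), instance.to_string());
        self.table.get(&key).map(|op| op(input))
    }
}

fn double(xs: &[f32]) -> Vec<f32> {
    xs.iter().map(|x| x * 2.0).collect()
}

fn times_ten(xs: &[f32]) -> Vec<f32> {
    xs.iter().map(|x| x * 10.0).collect()
}

/// Two instances of the same op type, each bound to a different impl.
fn demo_engine() -> ToyEngine {
    let mut engine = ToyEngine::new();
    engine.bind("backend", "Scale", "a", Box::new(double));
    engine.bind("backend", "Scale", "b", Box::new(times_ten));
    engine
}

fn main() {
    let engine = demo_engine();
    println!("{:?}", engine.dispatch("backend", "Scale", "a", &[1.0, 2.0]));
}
```

The instance component is what lets two slots of the same role and op type resolve to different concretes inside one partition.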
The runtime part is what makes the framework a substrate rather than a
DSL. Each role’s concrete (Backend, Model, Index, Aggregator,
Codec, DataSource, PeerSelector, Protocol) plugs into the same
dispatch surface. When a NodeProto’s bound concrete sits on another
Node, the compiler inserts wire.Send on the producer side and
wire.Recv on the consumer side. The wire envelope carries the value
across the network, the receiving Node decodes it, and the dispatch
table continues as if the producer were local.
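The cut described above can be sketched as a small pass over a toy op list. This is an illustration under stated assumptions, not the compiler's algorithm: Op, partition, and the placement slice are hypothetical, ops are assumed to arrive in topological order, and a value consumed twice across the same boundary would get a duplicate Send/Recv pair that a real pass would deduplicate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a recorded op.
#[derive(Debug, Clone, PartialEq)]
struct Op {
    op_type: String,
    inputs: Vec<String>,
    output: String,
}

/// Splits `ops` across `parts` partitions, where `placement[i]` is the
/// partition hosting `ops[i]`. Every edge that crosses a partition
/// boundary gets a `wire.Send` on the producer side and a `wire.Recv`
/// on the consumer side.
fn partition(ops: &[Op], placement: &[usize], parts: usize) -> Vec<Vec<Op>> {
    // Map each value name to the partition that produces it.
    let producer: HashMap<&str, usize> = ops
        .iter()
        .zip(placement)
        .map(|(op, &p)| (op.output.as_str(), p))
        .collect();

    let mut out: Vec<Vec<Op>> = vec![Vec::new(); parts];
    for (op, &here) in ops.iter().zip(placement) {
        for input in &op.inputs {
            // Values with no producer are graph inputs and need no wire op.
            if let Some(&there) = producer.get(input.as_str()) {
                if there != here {
                    out[there].push(Op {
                        op_type: "wire.Send".to_string(),
                        inputs: vec![input.clone()],
                        output: String::new(),
                    });
                    out[here].push(Op {
                        op_type: "wire.Recv".to_string(),
                        inputs: Vec::new(),
                        output: input.clone(),
                    });
                }
            }
        }
        out[here].push(op.clone());
    }
    out
}

fn main() {
    let ops = vec![
        Op { op_type: "Forward".into(), inputs: vec!["x".into()], output: "act".into() },
        Op { op_type: "Loss".into(), inputs: vec!["act".into()], output: "loss".into() },
    ];
    for (i, part) in partition(&ops, &[0, 1], 2).iter().enumerate() {
        let names: Vec<&str> = part.iter().map(|o| o.op_type.as_str()).collect();
        println!("partition {i}: {names:?}");
    }
}
```

After the pass, the consumer partition sees the received value under the same name the producer gave it, so the downstream op runs unchanged.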
Reading order
The chapters that follow are linear. Read them in order on the first pass; jump back to specific chapters once you know the shape.
Chapter 2 walks through installing one of the shipped examples and
running it. Chapter 3 covers the IR and the DSL: how Module plus
Graph lower onto ONNX and what the recorded ModelProto looks like.
Chapter 4 is the Syscalls Reference: the canonical NodeProtos the
framework emits and dispatches. Chapter 5 covers authoring Modules and
Components: the Module trait you implement, the
#[derive(bb::Concrete)] plus per-role derives (bb::Backend,
bb::Model, etc.) that bridge a Contract impl to the engine’s role
runtime, and the inventory submissions the installer reads. Chapter 6
covers the seven Contract traits the framework dispatches against,
plus the register_protocol! macro that registers a Protocol-role
impl as the eighth dispatch surface. Chapters 7 and 8 cover the
dependency declaration system and the polymorphic type system.
Chapters 9 and 10 cover the compiler pipeline and the Engine state
machine. Chapter 11 covers the wire envelope and the addressing model.
Chapter 12 covers the deployment surface, including snapshot loads.
Chapter 13 is a tour of the seven shipped examples.
Note: the seven Contract traits (Backend, Index, Model,
Aggregator, Codec, DataSource, PeerSelector) plus the Bootstrap
Contract and the register_protocol!-driven Protocol surface are
enumerated in Chapter 6 (Roles). The BootstrapTarget enum variants
that drive Node::run_bootstrap(...) (All, ModuleNames,
ModuleRequests, Slots) are enumerated in Chapter 10 (The Engine).
When the prose disagrees with the code, the code wins. Every chapter is reconciled against the 0.3.0 release of the framework. The status strip at the top of each chapter names the bytesandbrains source file the chapter is rooted in so the reader can read the canonical text side by side.
Where this lives
- Framework facade and prelude: bytesandbrains/src/lib.rs.
- Module trait and Graph recorder: bytesandbrains/bb-dsl/src/.
- Compiler driver: bytesandbrains/bb-compiler/src/.
- Engine state machine and Node: bytesandbrains/bb-runtime/src/.
- Installer entry point: bytesandbrains/src/install.rs.
- Slot placeholders and shipped concretes: bytesandbrains/bb-ops/src/.
- Reference examples: bytesandbrains/examples/.