The IR and the DSL
bytesandbrains programs are ONNX models. The IR is the ONNX
ModelProto schema. The DSL is a Rust surface that records into one
of those ModelProtos. Authors never construct a separate IR. They
write a Module, call methods on a Graph, and the recorder fills in
the canonical ONNX messages that the compiler and the engine read.
This chapter walks that recording surface end to end. By the end you
will know which ONNX message each authoring concept rides on, how the
Output handles thread metadata through method chains, and where the
Rust-dispatch boundary cuts the IR from the runtime.
ModelProto is the IR
The framework adds no parallel schema on top of ONNX. The same
ModelProto, FunctionProto, NodeProto, ValueInfoProto,
TensorProto, and AttributeProto messages that ONNX defines carry
every authoring concept the framework needs. Three things extend
canonical ONNX:
- Three vendor opsets under the ai.bytesandbrains.* domain (ai.bytesandbrains.syscall, ai.bytesandbrains.wire, and one domain per role).
- Vendor scalar types via TypeProto.Opaque under the ai.bytesandbrains domain.
- A small set of NodeProto.metadata_props keys (ai.bytesandbrains.required_trait, ai.bytesandbrains.slot_id, ai.bytesandbrains.concrete_type, ai.bytesandbrains.instance, ai.bytesandbrains.module_instance).
The framework crate bb-ir generates Rust types from
proto/onnx-ml.proto via prost and re-exports them under
bytesandbrains::proto::onnx. A Module::build() call returns a real
ModelProto:
```rust
// from bytesandbrains/bb-dsl/src/module.rs:148-196
fn build(self) -> Result<ModelProto, BuildError>
where
    Self: Sized,
{
    let mut body_g = Graph::new();
    let bindings: Vec<(String, Output)> = Vec::new();
    let _ = self.op(&mut body_g, &bindings);
    let mut pending = body_g.take_pending_errors();
    if !pending.is_empty() {
        return Err(pending.remove(0));
    }
    let body_recorded = body_g.finish();
    // ... bootstrap recording elided ...
    Ok(ModelProto {
        functions,
        ..Default::default()
    })
}
```
There is no BuiltModule wrapper. There is no codec layer beyond
prost. The compiler consumes the same ModelProto and returns a
ModelProto. The installer consumes the same ModelProto and
returns a Node. The IR is one type all the way through.
Multi-target compile + entry-point semantics
Compiler::compile(module) emits a single ModelProto whose
functions[] carries every partition the pipeline produced. A
federated Module that partitions into Client + Server emits both
as sibling FunctionProtos under model.functions; sub-Module
bodies and synthesized helpers (gate carriers, lifecycle containers)
ride alongside in the same list. The compilation passport
(ai.bytesandbrains.compiled = "v1") and the per-target binding
metadata (ai.bytesandbrains.binding.<target>.<slot> = "<role>|<TYPE_NAME>|<slot_id>") are stamped onto model.metadata_props,
keyed by partition name, so the same proto carries every target’s
binding spec without collisions.
bb::install(peer_id, addresses, model, targets: &[&str], config)
(bytesandbrains/src/install.rs:237-340) takes an ordered slice of
target names and installs every one onto the same Node. The host
picks which partitions live on each peer by passing different
targets slices to install on different peers; the proto is the
same artifact across the deployment. A peer hosting both halves of
a federated round receives &["Client", "Server"]; a single-Node
demo passes &["MyModule"]. The order is observable: install records
each target’s bootstrap function onto BootstrapState::install_order
in slice order, and the host kick (Node::run_bootstrap) walks the
list front-to-back. See The Engine for the
host-driven bootstrap surface.
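A minimal sketch of the ordering rule, assuming a simplified BootstrapState that stores target names as strings (the real type lives in the engine and records bootstrap functions, and its API is not shown here). install appends in slice order and run_bootstrap walks the list front-to-back, so a host that passes &["Client", "Server"] observes Client’s bootstrap first.

```rust
/// Hypothetical stand-in for the engine's BootstrapState: only ordering is modeled.
#[derive(Default)]
struct BootstrapState {
    install_order: Vec<String>,
}

impl BootstrapState {
    /// Record one entry per target, preserving the slice order passed to install.
    fn install(&mut self, targets: &[&str]) {
        for t in targets {
            self.install_order.push(t.to_string());
        }
    }

    /// Walk the recorded entries front-to-back, as the host kick does.
    fn run_bootstrap(&self, mut run: impl FnMut(&str)) {
        for target in &self.install_order {
            run(target.as_str());
        }
    }
}
```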
The compiled ModelProto is shared across targets at the Node
layer: bb::install wraps it in Arc<ModelProto> once via
Node::set_model and shares the handle across every
Node::register_module call so the proto bytes live on the Node
exactly once (bytesandbrains/src/install.rs:332-335).
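The sharing scheme is ordinary Arc reference counting. This sketch uses hypothetical stand-ins for ModelProto and Node (the real set_model and register_module signatures may differ) to show that every registration holds a handle to the same allocation rather than a copy of the proto bytes.

```rust
use std::sync::Arc;

// Hypothetical stand-ins: the real ModelProto is prost-generated and the
// real Node API may differ.
struct ModelProto {
    bytes: Vec<u8>,
}

struct Node {
    model: Option<Arc<ModelProto>>,
    modules: Vec<(String, Arc<ModelProto>)>,
}

impl Node {
    fn new() -> Self {
        Node { model: None, modules: Vec::new() }
    }

    /// Wrap the proto in an Arc once and keep one handle on the Node.
    fn set_model(&mut self, model: ModelProto) -> Arc<ModelProto> {
        let shared = Arc::new(model);
        self.model = Some(Arc::clone(&shared));
        shared
    }

    /// Each registration stores another handle, not another copy of the bytes.
    fn register_module(&mut self, target: &str, model: Arc<ModelProto>) {
        self.modules.push((target.to_string(), model));
    }
}
```

Cloning an Arc only bumps a counter, so registering N targets costs N pointer-sized handles regardless of how large the proto is.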
How a Module maps to ONNX messages
The mapping is mechanical. Every concept the author writes lands on a named field of a canonical ONNX message.
| Authoring concept | ONNX home |
|---|---|
| Program | ModelProto |
| Module | FunctionProto with (domain, name) identity, registered in ModelProto.functions |
| Module body | FunctionProto.node: repeated NodeProto |
| Module input | FunctionProto.input: repeated string paired with FunctionProto.value_info |
| Module output | FunctionProto.output: repeated string paired with FunctionProto.value_info |
| Sub-Module call | A NodeProto whose (domain, op_type) matches the sub-Module’s FunctionProto (domain, name) |
| Generic slot placeholder | FunctionProto.attribute: repeated string, one entry per unfilled slot |
| Concrete slot binding | FunctionProto.attribute_proto: repeated AttributeProto, payload carries the construction state |
| DSL op call | One NodeProto per call, op_type derived from the method name |
| Slot identity stamp | NodeProto.metadata_props["ai.bytesandbrains.required_trait"] + ["ai.bytesandbrains.slot_id"] |
| Tensor type | TypeProto.Tensor { elem_type, shape } on ValueInfoProto.type |
| Tensor memory ownership | Backend-owned. Backend::Tensor is an Arc-shared handle around a backend-managed buffer (CpuTensor(Arc<CpuBackendBuffer>) at bb-ops/src/backends/cpu/tensor.rs:65). Clone is Arc::clone. Wire-receive routes through Backend::materialize_from_wire(type_hash, bytes: Vec<u8>) for backend-bound slots; the engine wraps the result in BackendTensorCarrier for slot residency. |
| Vendor scalar type | TypeProto.Opaque { domain: "ai.bytesandbrains", name: "<TypeName>" } |
| Opset declaration | ModelProto.opset_import: repeated OperatorSetIdProto |
| Sub-graph on an op | AttributeProto.g: GraphProto (used by If, Loop) |
The compiler reads from these fields, mutates them in place across
the pass pipeline, and writes the stamped result back into the same
ModelProto. The engine then loads the proto, walks the bound
function’s node list, and dispatches each NodeProto against its
bound impl.
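The sub-Module row follows ONNX’s own function-call rule: a node invokes a model-local function when its (domain, op_type) equals the function’s (domain, name). A sketch of that lookup over simplified, hypothetical stand-ins for the prost-generated messages (the "my.app" domain used in the usage check is also made up):

```rust
// Hypothetical, simplified stand-ins for the prost-generated ONNX messages.
struct NodeProto {
    op_type: String,
    domain: String,
}

struct FunctionProto {
    name: String,
    domain: String,
    node: Vec<NodeProto>,
}

/// Return the model-local function a node calls, if any, by matching
/// (node.domain, node.op_type) against (function.domain, function.name).
fn resolve_call<'a>(functions: &'a [FunctionProto], node: &NodeProto) -> Option<&'a FunctionProto> {
    functions
        .iter()
        .find(|f| f.domain == node.domain && f.name == node.op_type)
}
```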
The Graph recorder
The DSL’s recording surface is a Rust struct named Graph. A Graph is a thin wrapper
around an in-progress FunctionProto. Authors do not construct one
directly. The Module trait’s default build() method allocates a
Graph, hands it to the user’s body(&self, g: &mut Graph), and
moves the recorded FunctionProto into a returned ModelProto.
The relevant header on the recorder spells out the contract:
```rust
// from bytesandbrains/bb-dsl/src/graph.rs:40-91
pub struct Graph {
    /// The IR body. Everything semantically meaningful goes here.
    function: FunctionProto,
    /// Output-name counter. Could be derived from
    /// `function.node[].output[]` but cheaper to track.
    site_counter: u64,
    /// `(TypeId, *const ())`-keyed dedup. Shared namespace across
    /// concrete and generic.
    instance_for_pointer: HashMap<(TypeId, *const ()), u32>,
    next_instance_id: u32,
    // ...
}
```
The FunctionProto is the canonical IR record. The remaining Rust
state carries only what the proto cannot represent: pointer-identity
dedup for placeholder instances, an output-name counter to mint fresh
value names, and a stack tracking which sub-function the recorder is
currently writing into.
Three primitives drive every method on Graph:
- g.input("name") declares a Module-level input port and returns an Output handle bound to that name. The port is appended to the enclosing FunctionProto.input list and a matching ValueInfoProto is pushed into function.value_info.
- g.output("name", value) registers a local output port. The recorder emits a PassThrough NodeProto that renames the producer value to the port name, then records the port name on FunctionProto.output and value_info.
- g.net_out("name", peers, value) is the single network-output primitive. It records a wire.Send NodeProto under the ai.bytesandbrains.wire domain and registers the port name on the current function. The compiler’s partition pass cuts the graph at this boundary; a synthesis pass materializes the matching wire.Recv on every consumer-side partition.
Each call walks through Graph::push_node, which appends one
NodeProto to the current function’s node list and stamps the
composition-hierarchy chain into the new node’s metadata_props when
the recorder is inside a nested function scope.
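To make the three primitives concrete, here is a toy recorder. It is a hypothetical sketch, not the bb-dsl Graph: values are plain String names instead of Output handles, value_info holds names only, and there is no scope stack or metadata stamping. It does reproduce the shape described above: inputs append to the function’s input list, output emits a PassThrough and registers the port, net_out emits a wire Send with (data, dest) inputs, and ordinary ops mint fresh value names through a site counter.

```rust
// Toy, hypothetical recorder; field names echo FunctionProto/NodeProto.
#[derive(Default, Debug)]
struct Node {
    op_type: String,
    domain: String,
    input: Vec<String>,
    output: Vec<String>,
}

#[derive(Default, Debug)]
struct Function {
    input: Vec<String>,
    output: Vec<String>,
    value_info: Vec<String>,
    node: Vec<Node>,
}

#[derive(Default)]
struct ToyGraph {
    function: Function,
    site_counter: u64,
}

impl ToyGraph {
    /// Mint a fresh value name for an op output.
    fn next_site_name(&mut self) -> String {
        self.site_counter += 1;
        format!("site_{}", self.site_counter)
    }

    /// Append one node to the current function, in call order.
    fn push_node(&mut self, node: Node) {
        self.function.node.push(node);
    }

    fn input(&mut self, name: &str) -> String {
        self.function.input.push(name.to_string());
        self.function.value_info.push(name.to_string());
        name.to_string()
    }

    /// One node per op call; returns the minted output name.
    fn op(&mut self, domain: &str, op_type: &str, inputs: &[&str]) -> String {
        let out = self.next_site_name();
        self.push_node(Node {
            op_type: op_type.to_string(),
            domain: domain.to_string(),
            input: inputs.iter().map(|s| s.to_string()).collect(),
            output: vec![out.clone()],
        });
        out
    }

    /// PassThrough renames the producer value to the port name.
    fn output(&mut self, name: &str, value: &str) {
        self.push_node(Node {
            op_type: "PassThrough".to_string(),
            domain: "ai.bytesandbrains.syscall".to_string(),
            input: vec![value.to_string()],
            output: vec![name.to_string()],
        });
        self.function.output.push(name.to_string());
        self.function.value_info.push(name.to_string());
    }

    /// wire.Send takes (data, dest) and produces no outputs.
    fn net_out(&mut self, name: &str, peers: &str, value: &str) {
        self.push_node(Node {
            op_type: "Send".to_string(),
            domain: "ai.bytesandbrains.wire".to_string(),
            input: vec![value.to_string(), peers.to_string()],
            output: Vec::new(),
        });
        self.function.output.push(name.to_string());
    }
}
```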
Output handles thread typed metadata, not phantom types
The DSL is non-generic across languages. The framework’s design
forbids Output<T> phantom types and method-generic operations like
g.send::<T>(...), because the same DSL must port to Python (and
other host languages) without dragging Rust’s type parameters along.
The Rust handle that threads through every method chain is called
Output:
```rust
// from bytesandbrains/bb-dsl/src/output.rs:1-32
//! Non-generic `Output` handle threaded through DSL method chains.
//!
//! Per `docs/API_DESIGN.md` §4 + `docs/IR_AND_DSL.md` Part 6. The DSL
//! is non-generic across languages - type metadata rides on
//! `&'static TypeNode`, NOT a `PhantomData<T>` tag. Identity is the
//! `name: String` (the ONNX value name in `FunctionProto.input` /
//! `NodeProto.input` / `NodeProto.output`); the wire-level type
//! identity rides on the `TypeNode` reference.
use bb_ir::types::TypeNode;

#[derive(Clone, Debug)]
pub struct Output {
    /// ONNX value name. Matches a `FunctionProto.input` entry, a
    /// `NodeProto.output` entry, or a `next_site_name()` mint.
    pub name: String,
    /// Static `TypeNode` reference. Pointer equality is meaningful
    /// - every canonical type lives in a single `static`.
    pub type_node: &'static TypeNode,
}
```
Two fields. The name is the ONNX value name. Every reference to the
value in NodeProto.input, NodeProto.output, and
ValueInfoProto.name cites that string. The type_node is a
&'static TypeNode, a reference to one of the canonical static
TypeNode declarations the framework registers at process start.
Pointer equality on the static reference is the type-identity check.
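A self-contained sketch of the pointer-equality check, using a cut-down TypeNode with only two fields (the real struct carries more, and the "bytes" id here is an assumption). std::ptr::eq compares addresses, so two handles agree only when they cite the same static, even if some other static holds identical field values.

```rust
// Cut-down, hypothetical TypeNode: the real one has more fields.
#[derive(Debug)]
pub struct TypeNode {
    pub id: &'static str,
    pub parent: Option<&'static str>,
}

pub static TYPE_BYTES: TypeNode = TypeNode { id: "bytes", parent: None };
pub static TYPE_TENSOR_F32: TypeNode = TypeNode { id: "tensor.f32", parent: Some("tensor") };

#[derive(Clone, Debug)]
pub struct Output {
    pub name: String,
    pub type_node: &'static TypeNode,
}

/// Type identity is address identity of the static declaration,
/// not structural equality of its fields.
pub fn same_type(a: &Output, b: &Output) -> bool {
    std::ptr::eq(a.type_node, b.type_node)
}
```

Comparing addresses is a single pointer compare and needs no PartialEq on TypeNode, which is why one static per canonical type matters.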
When a DSL method returns multiple values, it returns a tuple of
Output handles. When the next method consumes them, it reads each
handle’s name for the NodeProto.input list and its type_node for
the downstream type check. The recorder never needs to know the
Rust-level static type of a value; it tracks the ONNX type tree
through the TypeNode reference instead.
The recorder seeds inputs with the canonical bytes-typed sentinel:
```rust
// from bytesandbrains/bb-dsl/src/graph.rs:368-415
pub fn input(&mut self, name: &str) -> Output {
    // ... port declaration + value_info stamping elided ...
    Output::new(name.to_string(), &bb_ir::types::TYPE_BYTES)
}
```
The compiler’s TypeSolver narrows that sentinel to a concrete
TypeNode once the input flows into typed ops, and stamps the
refinement back onto the ValueInfoProto on the recorded function.
A complete recording
A short end-to-end example records one next_batch, one
forward, and one net_out, bracketed by a parameter load and a
parameter read. The client logic from the federated learning
example is the canonical version:
```rust
// from bytesandbrains/examples/federated_learning.rs:117-144
use bytesandbrains::placeholders::{DataLoaderSlot, ModelSlot};
use bytesandbrains::{Graph, Module};

struct ClientLogic;

impl Module for ClientLogic {
    fn name(&self) -> &str {
        "ClientLogic"
    }

    fn body(&self, g: &mut Graph) {
        // Inbound from the server: the latest global model params.
        let server_params = g.input("server_params");
        // Apply the global params to the local model.
        let _ = ModelSlot.load_parameters(g, server_params);
        // Local training step.
        let (batch, _labels) = DataLoaderSlot.next_batch(g);
        let _prediction = ModelSlot.forward(g, batch);
        // Read the updated params and ship them to the server peer.
        let updated_params = ModelSlot.params(g);
        let server_peer = g.input("server_peer");
        g.net_out("updated_params", server_peer, updated_params);
    }
}
```
That body records into one FunctionProto named "ClientLogic". The
function’s input list holds "server_params" and "server_peer".
The output list holds "updated_params". The node list holds one
NodeProto per DSL call, in the order the author wrote them. Each
NodeProto carries the slot-identity stamps the compiler needs to bind
the role at compile time.
A successful ClientLogic.build() call returns the wrapping ModelProto inside Ok. The
proto’s functions[0] is the recorded ClientLogic body, stamped
with the canonical module_phase = "body" key. Sub-Modules reached
during recording become entries in functions[1..]. A bootstrap
recording (when the Module overrides Module::bootstrap) lands on a
sibling <Name>__bootstrap function in the same functions list.
Module::bootstrap and host-staged input formals
Module::bootstrap(&self, g: &mut Graph) is the author entry point
for pre-body initialization. The trait method defaults to a no-op
(bytesandbrains/bb-dsl/src/module.rs); authors override it next to
Module::body. Inputs declared inside the override (g.input(name))
become formals on the emitted "<module>__bootstrap" FunctionProto,
addressable from the host via BootstrapRequest::inputs:
```rust
// `index` is a slot field on VectorStore (struct definition not shown).
impl Module for VectorStore {
    fn name(&self) -> &str {
        "VectorStore"
    }

    fn bootstrap(&self, g: &mut Graph) {
        // Each input becomes a declared formal on the emitted
        // `"<module>__bootstrap"` FunctionProto. The host stages
        // bytes via `BootstrapRequest::inputs`.
        let seed_corpus = g.input("seed_corpus");
        let _ = self.index.train(g, seed_corpus);
    }

    fn body(&self, g: &mut Graph) {
        let query = g.input("query");
        let _ = self.index.search(g, query, 10);
    }
}
```
The host kicks the recorded bootstrap via the immediate-fire entry
point on Node:
```rust
node.run_bootstrap(bb::BootstrapTarget::ModuleRequests(&[
    bb::engine::BootstrapRequest {
        target: "VectorStore",
        inputs: &[("seed_corpus", corpus_bytes.as_slice())],
    },
]))?;
node.poll(cx); // drives the bootstrap body to quiescence
```
The engine validates inputs against the target’s declared formals
at the boundary
(bytesandbrains/bb-runtime/src/engine/core.rs:1464,
Engine::enqueue_bootstrap_request):
UnknownInput rejects extras, MissingInput rejects gaps, and
UnknownTarget rejects unknown names, all before any bytes stage.
Validated requests follow the Principle 1a copy
(try_charge → try_reserve_exact → extend_from_slice); the caller’s
borrowed &[u8] slices may drop the moment run_bootstrap returns.
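The validation order and the fallible copy can be sketched in plain Rust. Everything here is hypothetical: BootstrapError mirrors the three rejection names above plus an OutOfBudget variant, Budget::try_charge stands in for the framework’s charge step, and the formals are passed as a map from target name to declared input names. Vec::try_reserve_exact and extend_from_slice are the standard-library calls. All three rejections fire before any byte is copied, and the returned buffers own their data, so the caller’s slices can drop immediately.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type mirroring the rejections described above.
#[derive(Debug, PartialEq)]
enum BootstrapError {
    UnknownTarget(String),
    UnknownInput(String),
    MissingInput(String),
    OutOfBudget,
}

/// Hypothetical memory budget standing in for the framework's charge step.
struct Budget {
    remaining: usize,
}

impl Budget {
    fn try_charge(&mut self, n: usize) -> Result<(), BootstrapError> {
        if n > self.remaining {
            return Err(BootstrapError::OutOfBudget);
        }
        self.remaining -= n;
        Ok(())
    }
}

/// Validate `inputs` against the target's declared formals, then copy each
/// borrowed slice into an owned buffer (charge, reserve exactly, extend).
fn stage_bootstrap(
    formals: &HashMap<&str, Vec<&str>>,
    target: &str,
    inputs: &[(&str, &[u8])],
    budget: &mut Budget,
) -> Result<Vec<(String, Vec<u8>)>, BootstrapError> {
    let declared = formals
        .get(target)
        .ok_or_else(|| BootstrapError::UnknownTarget(target.to_string()))?;
    let declared_set: HashSet<&str> = declared.iter().copied().collect();
    let provided: HashSet<&str> = inputs.iter().map(|(name, _)| *name).collect();

    if let Some((extra, _)) = inputs.iter().find(|(name, _)| !declared_set.contains(*name)) {
        return Err(BootstrapError::UnknownInput(extra.to_string()));
    }
    if let Some(gap) = declared.iter().find(|d| !provided.contains(**d)) {
        return Err(BootstrapError::MissingInput(gap.to_string()));
    }

    // Validation passed: only now are bytes copied.
    let mut staged = Vec::with_capacity(inputs.len());
    for (name, bytes) in inputs {
        budget.try_charge(bytes.len())?;
        let mut owned = Vec::new();
        owned
            .try_reserve_exact(bytes.len())
            .map_err(|_| BootstrapError::OutOfBudget)?;
        owned.extend_from_slice(bytes);
        staged.push((name.to_string(), owned));
    }
    Ok(staged)
}
```

One simplification to note: if the budget runs out partway through a multi-input request, this sketch leaves the earlier charges applied; a production version would refund them.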
A bootstrap that takes no formals records zero g.input calls; the
host kicks it via Node::run_bootstrap(BootstrapTarget::All) or
Node::run_bootstrap(BootstrapTarget::ModuleNames(&["<target>"])).
See The Engine for the BootstrapState
architecture and the AlreadyTransitivelyQueued cycle defense.
Opsets
The framework declares three vendor opsets plus a required subset of
ai.onnx. Every NodeProto carries a domain plus an op_type; the
loaded ModelProto.opset_import array names the domain versions the
program imports. The engine dispatches each NodeProto by
(domain, op_type, instance).
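A minimal sketch of a dispatch table keyed by (domain, op_type, instance). The Kernel signature over byte buffers, the u32 instance type, and the pass_through kernel are assumptions for illustration; the engine’s real dispatch types are not shown in this chapter.

```rust
use std::collections::HashMap;

/// Hypothetical kernel signature: input buffers in, output buffers out.
type Kernel = fn(&[Vec<u8>]) -> Vec<Vec<u8>>;

/// Dispatch key as described above: (domain, op_type, instance).
type DispatchKey = (String, String, u32);

#[derive(Default)]
struct Dispatcher {
    table: HashMap<DispatchKey, Kernel>,
}

impl Dispatcher {
    fn register(&mut self, domain: &str, op_type: &str, instance: u32, kernel: Kernel) {
        self.table
            .insert((domain.to_string(), op_type.to_string(), instance), kernel);
    }

    /// Look up the bound kernel and run it; None means nothing is bound for this key.
    fn dispatch(&self, domain: &str, op_type: &str, instance: u32, inputs: &[Vec<u8>]) -> Option<Vec<Vec<u8>>> {
        let key = (domain.to_string(), op_type.to_string(), instance);
        self.table.get(&key).map(|kernel| kernel(inputs))
    }
}

/// Example kernel: forward inputs unchanged.
fn pass_through(inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    inputs.to_vec()
}
```

Including the instance in the key is what lets two bound concretes of the same role method coexist on one Node without colliding.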
The wire opset is the smallest. It carries two ops and is the only domain that crosses the network plane:
| op_type | inputs | outputs | semantics |
|---|---|---|---|
| Send | data: any, dest: Address | (none) | Fire-and-forget broadcast. The wire envelope packs data as one SlotFill to dest. |
| Recv | (none) | trigger: Opaque<Trigger>, payload: any | Declares an inbound type acceptance. The Recv’s site identity becomes the routable destination; senders construct the matching /site/<id> suffix. |
The other two framework domains:
- ai.bytesandbrains.syscall v1 (38 ops): framework primitives. Most schedule triggers and timers (Pulse, OnTrigger, Interval, After, Sleep), some manage values across the graph (PassThrough, Tee, Constant, Hold.Stash, Hold.Flush, Serialize.Enqueue, Serialize.Dequeue), and some mutate the peer address book (AddPeerAddress, RemovePeerAddress, DropPeer).
- ai.bytesandbrains.role.<role> v1 (six domains, one per role): every role-method call from a placeholder records into its role-specific opset.
The ai.onnx opset is the canonical ONNX one. The framework’s
required subset is documented inline at the source. Backends ship
graph-execution support for that subset; user-authored backends that
import a higher version inherit the standard ONNX semantics for
the additional ops.
Composition: bundling typed Outputs
net_out ships one typed value per envelope. Aggregator protocols
that pair a contribution tensor with a sample count, or hierarchical
overlays that need to atomically attach metadata to a payload, would
otherwise emit two Sends and rely on receivers to correlate them.
Graph::bundle packs N typed Outputs into ONE composite Output for
transmission through a single port; the matching Graph::unbundle
decomposes the envelope back into N typed children on the receiver.
Single-port DAG semantics hold because the bundle/unbundle pair
threads one Output between peers.
```rust
use bytesandbrains::types::{TYPE_PEER_ID, TYPE_TENSOR_F32};
use bytesandbrains::{Graph, Module};

struct AggregatorStage;

impl Module for AggregatorStage {
    fn name(&self) -> &str { "AggregatorStage" }

    fn body(&self, g: &mut Graph) {
        let params = g.input("params");
        let owner = g.input("owner");
        let peers = g.input("peers");
        let composite = g.bundle(&[params, owner]);
        g.net_out("contribution", peers, composite);
    }
}

struct ReceiverStage;

impl Module for ReceiverStage {
    fn name(&self) -> &str { "ReceiverStage" }

    fn body(&self, g: &mut Graph) {
        let received = g.lookup_output("contribution")
            .expect("contribution port");
        let parts = g.unbundle(
            received,
            &[&TYPE_TENSOR_F32, &TYPE_PEER_ID],
        );
        g.output("params", parts[0].clone());
        g.output("owner", parts[1].clone());
    }
}
```
The recorded ops belong to a dedicated ai.bytesandbrains.composite
domain. Bundle’s NodeProto has variable-arity input (one entry per
child) and a single composite output port typed against the new
TYPE_COMPOSITE TypeNode. Unbundle reads the composite, validates
the declared child count against the runtime envelope, and re-emits
each child as its original concrete SlotValue carrier. The
ValueInfoProto.denotation on every child output carries the
declared TypeNode, and downstream consumers downcast directly to
the concrete carrier (.as_any().downcast_ref::<T>()) without a
bincode-against-denotation hop.
In-process forwarding pays one SlotValue::clone_boxed per child
at Bundle and Unbundle. No bincode encode or decode runs unless the
envelope crosses a Node boundary. At the wire boundary,
CompositeValue serializes each child as (type_hash, child.to_wire_bytes())
and the receiver materializes typed carriers via the global decoder
registry. Carrier authors register the lattice binding and the wire
decoder with a single register_type_node! invocation. See
bb-runtime/src/syscall/values.rs:80-165 for the carrier shape and
the wire codec; bb-ir/src/slot_value.rs:188-256 for the registry
and the registration macro.
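The per-child (type_hash, bytes) layout and the registry lookup can be sketched with the standard library alone. The framing here (a u32 child count, then a u64 hash and a u32 length per child, all little-endian) is an assumption, as are the decoder signature and the peer-id hash; only the 0x0101 hash for tensor.f32 comes from the TYPE_TENSOR_F32 declaration in this chapter. The receiver checks the declared child count, then routes each payload to the decoder registered for its hash, yielding typed values rather than raw bytes.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical decoder signature: bytes in, typed value out.
type Decoder = fn(&[u8]) -> Option<Box<dyn Any>>;

const HASH_TENSOR_F32: u64 = 0x0000_0000_0000_0101; // from TYPE_TENSOR_F32
const HASH_PEER_ID: u64 = 0x0000_0000_0000_0F01; // hypothetical value

/// Assumed framing: u32 child count, then per child u64 hash, u32 length, payload.
fn encode_composite(children: &[(u64, Vec<u8>)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(children.len() as u32).to_le_bytes());
    for (hash, bytes) in children {
        out.extend_from_slice(&hash.to_le_bytes());
        out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
        out.extend_from_slice(bytes);
    }
    out
}

/// Split `n` bytes off the front of `buf`, or fail on a short buffer.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    if buf.len() < n {
        return None;
    }
    let (head, tail) = buf.split_at(n);
    *buf = tail;
    Some(head)
}

fn read_u32(buf: &mut &[u8]) -> Option<u32> {
    let mut a = [0u8; 4];
    a.copy_from_slice(take(buf, 4)?);
    Some(u32::from_le_bytes(a))
}

fn read_u64(buf: &mut &[u8]) -> Option<u64> {
    let mut a = [0u8; 8];
    a.copy_from_slice(take(buf, 8)?);
    Some(u64::from_le_bytes(a))
}

/// Check the declared child count, then decode each child via the registry.
fn decode_composite(
    mut buf: &[u8],
    expected_children: usize,
    registry: &HashMap<u64, Decoder>,
) -> Option<Vec<Box<dyn Any>>> {
    let count = read_u32(&mut buf)? as usize;
    if count != expected_children {
        return None;
    }
    let mut children = Vec::with_capacity(count);
    for _ in 0..count {
        let hash = read_u64(&mut buf)?;
        let len = read_u32(&mut buf)? as usize;
        let payload = take(&mut buf, len)?;
        let decode = registry.get(&hash)?;
        children.push(decode(payload)?);
    }
    Some(children)
}

/// Example decoders (hypothetical wire formats).
fn decode_f32s(bytes: &[u8]) -> Option<Box<dyn Any>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    let values: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Some(Box::new(values))
}

fn decode_peer_id(bytes: &[u8]) -> Option<Box<dyn Any>> {
    let id = String::from_utf8(bytes.to_vec()).ok()?;
    Some(Box::new(id))
}
```

Receivers then call downcast_ref::<T>() on each child, which mirrors the direct-downcast path described above.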
The same decoder registry drives the single-fill receive path:
inbound wire.Recv payloads materialize into the typed
SlotValue carrier the sender stamped, not a BytesValue. The
Bundle codec and the wire receive path are symmetric, so
registering one carrier participates in both surfaces with no
extra step. The full receive flow (failure modes, partial
delivery, destination metadata) lives in the
Wire and addressing chapter.
Empty parts (Bundle) or empty part_types (Unbundle) panic at
recording time. Composition of zero values has no semantic meaning
and is almost certainly an author bug.
Vendor scalar types
Non-tensor scalars ride on TypeProto.Opaque under the
ai.bytesandbrains domain. The codebase registers each one as a
&'static TypeNode and submits it to the inventory at process start.
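Each declaration fixes the type’s position in the lattice, its wire
hash, and the denotation string the recorder stamps onto every
ValueInfoProto.type.denotation. The built-in tensor.f32 leaf looks like this: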
```rust
// from bytesandbrains/bb-ir/src/types/builtins.rs:38-44
pub static TYPE_TENSOR_F32: TypeNode = TypeNode {
    id: "tensor.f32",
    parent: Some("tensor"),
    kind: TypeKind::Concrete,
    ffi_name: "bb_tensor_f32_t",
    wire_hash: 0x0000_0000_0000_0101,
    denotation: "ai.bytesandbrains.tensor.f32",
};
```
The denotation is the key the recorder writes into the
ValueInfoProto.type.denotation field. The wire codec uses
wire_hash (a stable u64) to identify the type across the network
without a registry round-trip. Both ends compute the same hash from
the same denotation and the same opset version, so a receiver routes
inbound bytes to the correct decoder without coordinating with the
sender.
A modest selection from the built-in registry, all under the
ai.bytesandbrains domain:
| TypeNode | Kind | Denotation |
|---|---|---|
| TYPE_TENSOR_F32 | Concrete | ai.bytesandbrains.tensor.f32 |
| TYPE_TENSOR_F64 | Concrete | ai.bytesandbrains.tensor.f64 |
| TYPE_TENSOR_I32 | Concrete | ai.bytesandbrains.tensor.i32 |
| TYPE_TENSOR_BOOL | Concrete | ai.bytesandbrains.tensor.bool |
| TYPE_PEER_ID | Concrete | (registered Opaque under ai.bytesandbrains) |
| TYPE_BYTES | Concrete | (sentinel for unresolved port types) |
| TYPE_ANY | Abstract | (used as a permissive port bound) |
| TYPE_TENSOR | Abstract | (matches any Tensor<T> concrete leaf) |
User crates submit their own leaves through the same inventory
mechanism. The compiler’s TypeSolver walks the lattice at compile
time to resolve every value to a concrete (leaf) TypeNode.
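A hypothetical sketch of the parent-chain walk behind that resolution. The "tensor.f32" to "tensor" edge follows the TYPE_TENSOR_F32 declaration; the ids "any", "bytes", and "tensor.i32", their parents, and the narrow rule itself are assumptions, not the TypeSolver’s actual logic. A candidate is accepted only if it is a concrete leaf below the current bound, with the bytes sentinel treated as unresolved.

```rust
/// Hypothetical mini-lattice; only "tensor.f32" -> "tensor" is from the source.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Abstract,
    Concrete,
}

struct TypeNode {
    id: &'static str,
    parent: Option<&'static str>,
    kind: Kind,
}

static LATTICE: &[TypeNode] = &[
    TypeNode { id: "any", parent: None, kind: Kind::Abstract },
    TypeNode { id: "tensor", parent: Some("any"), kind: Kind::Abstract },
    TypeNode { id: "tensor.f32", parent: Some("tensor"), kind: Kind::Concrete },
    TypeNode { id: "tensor.i32", parent: Some("tensor"), kind: Kind::Concrete },
    TypeNode { id: "bytes", parent: Some("any"), kind: Kind::Concrete },
];

fn lookup(id: &str) -> Option<&'static TypeNode> {
    LATTICE.iter().find(|t| t.id == id)
}

/// True when `id` is `ancestor` or sits somewhere below it.
fn is_subtype(id: &str, ancestor: &str) -> bool {
    let mut current = Some(id);
    while let Some(c) = current {
        if c == ancestor {
            return true;
        }
        current = lookup(c).and_then(|t| t.parent);
    }
    false
}

/// Narrow a port bound to a candidate type: accept only a concrete leaf that
/// sits under the bound, treating the "bytes" sentinel as unresolved.
fn narrow(bound: &str, candidate: &str) -> Option<&'static str> {
    let node = lookup(candidate)?;
    if node.kind != Kind::Concrete {
        return None;
    }
    if bound == "bytes" || is_subtype(candidate, bound) {
        Some(node.id)
    } else {
        None
    }
}
```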
The Rust-dispatch boundary
The IR is graph-expressible up to one boundary and Rust-dispatched beyond it. The rule the spec calls “the Rust-dispatch boundary” fences off where the compiler can keep optimizing and where the engine takes over:
Graph decomposition stops at Rust dispatch. Every op in a loaded
ModelProto is either graph-expressible, in which case its body is a sub-GraphProto and the compiler may inline it, or Rust-dispatched, in which case the engine calls a Rust function the bound runtime supplied and from that point the op is opaque to the IR. There is no third mode.
In practice the boundary lives at the (domain, op_type) pair on
each NodeProto. Anything under a registered atomic-op opset is
Rust dispatch. Anything else is graph-level composition.
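The boundary check reduces to a lookup on each node’s (domain, op_type). A sketch, assuming the atomic opsets are known as a set of domain strings and model-local functions as (domain, name) pairs; which domains count as atomic in practice is fixed by the framework’s registry, and the lists in the usage check are illustrative. A node that matches neither is reported as unresolved rather than as a third mode.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum OpMode {
    /// The engine calls a bound Rust function; the op is opaque to the IR.
    RustDispatch,
    /// The op resolves to a function body the compiler can inline.
    GraphComposition,
}

/// Classify one NodeProto by its (domain, op_type). None means the op
/// resolves to neither mode; this sketch treats that as unresolved.
fn classify(
    atomic_opsets: &HashSet<&str>,
    functions: &[(&str, &str)], // (domain, name) of model-local functions
    domain: &str,
    op_type: &str,
) -> Option<OpMode> {
    if atomic_opsets.contains(domain) {
        Some(OpMode::RustDispatch)
    } else if functions.iter().any(|&(d, name)| d == domain && name == op_type) {
        Some(OpMode::GraphComposition)
    } else {
        None
    }
}
```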
The implication for authors is concrete: a Module body is fully
graph-traversable. The compiler can inline it, collapse it,
partition it across Nodes, snapshot it, and export it to any
ONNX-aware consumer. A bound concrete’s execute_graph is a
Rust-dispatch terminal: once the engine hands a GraphProto to the
backend, what the backend does internally (JIT compilation, kernel
fusion, GPU dispatch) is invisible to the IR. The contract the IR
guarantees is the (inputs, GraphProto) -> outputs shape; the
implementation is the vendor’s.
What ONNX gives the framework for free
Riding inside canonical ONNX messages costs the framework one design constraint: every authoring concept must land on a real ONNX field. In exchange the framework inherits a long list of capabilities without code:
- Netron, the Python onnx package, ONNX Runtime, and Burn’s loader read framework graphs directly, and TFLite’s converter can reach them through an ONNX-to-TensorFlow conversion step. The vendor opsets show as namespaced ops; the rest is just ONNX. Executing those namespaced ops outside the framework still requires an implementation registered for their domain.
- Snapshot is ModelProto bytes. Any ONNX-aware tool opens a framework snapshot. Diffing, lineage analysis, and visualization come for free.
- FunctionProto-based composition is how ONNX itself models reusable graphs. Inlining, parameter substitution, and multi-instance composition are spec-defined.
- opset_import solves version negotiation. The same mechanism used between PyTorch and ONNX Runtime works for framework graphs across Nodes and across framework releases.
- GraphProto.initializer weights round-trip without serializer code. A bound model whose construction state references initializer names exports a graph that any ONNX consumer can load with weights intact.
- TypeProto.Opaque is the right primitive for vendor scalars. Python’s onnx library preserves the domain and name without trying to interpret them; the framework’s deserializer registry interprets them where the runtime needs them.
The chapters that follow lean on this base. Chapter 4 is the Syscalls
Reference: the canonical NodeProtos the framework emits and dispatches.
Chapter 5 covers the authoring macros (#[derive(bb::Module)] and the
role derives) that emit declarative port sets and inventory submissions.
Chapter 6 covers the seven Contract traits the framework dispatches
against. Chapter 9 covers the compiler’s 17-pass pipeline that turns a
recorded ModelProto into an installable one.
Where this lives
- ONNX schema bindings (prost-generated): bytesandbrains/bb-ir/src/proto/mod.rs.
- Vendor type registry: bytesandbrains/bb-ir/src/types/.
- Syscall and wire id constants: bytesandbrains/bb-ir/src/syscall_ids.rs.
- The Module trait and Module::build(): bytesandbrains/bb-dsl/src/module.rs.
- The Graph recorder: bytesandbrains/bb-dsl/src/graph.rs.
- The Output handle: bytesandbrains/bb-dsl/src/output.rs.
- Slot placeholders and their DSL bodies: bytesandbrains/bb-ops/src/placeholders/mod.rs.
- DSL-side syscall helpers (pass_through, add_peer_address, etc.): bytesandbrains/bb-dsl/src/syscalls.rs.
- The federated learning example used in this chapter: bytesandbrains/examples/federated_learning.rs.