The IR and the DSL

bytesandbrains programs are ONNX models. The IR is the ONNX ModelProto schema. The DSL is a Rust surface that records into one of those ModelProtos. Authors never construct a separate IR. They write a Module, call methods on a Graph, and the recorder fills in the canonical ONNX messages that the compiler and the engine read.

This chapter walks that recording surface end to end. By the end you will know which ONNX message each authoring concept rides on, how the Output handles thread metadata through method chains, and where the Rust-dispatch boundary cuts the IR from the runtime.

ModelProto is the IR

The framework adds no parallel schema on top of ONNX. The same ModelProto, FunctionProto, NodeProto, ValueInfoProto, TensorProto, and AttributeProto messages that ONNX defines carry every authoring concept the framework needs. Three things extend canonical ONNX:

Three families of vendor opsets under the ai.bytesandbrains.* namespace: ai.bytesandbrains.syscall, ai.bytesandbrains.wire, and one ai.bytesandbrains.role.<role> domain per role.
Vendor scalar types via TypeProto.Opaque under the ai.bytesandbrains domain.
A small set of NodeProto.metadata_props keys (ai.bytesandbrains.required_trait, ai.bytesandbrains.slot_id, ai.bytesandbrains.concrete_type, ai.bytesandbrains.instance, ai.bytesandbrains.module_instance).

The framework crate bb-ir generates Rust types from proto/onnx-ml.proto via prost and re-exports them under bytesandbrains::proto::onnx. A Module::build() call returns a real ModelProto:

// from bytesandbrains/bb-dsl/src/module.rs:148-196
fn build(self) -> Result<ModelProto, BuildError>
where
    Self: Sized,
{
    // Record the Module body into a fresh Graph with no bindings.
    let mut body_g = Graph::new();
    let bindings: Vec<(String, Output)> = Vec::new();
    let _ = self.op(&mut body_g, &bindings);
    // Recording defers errors; surface the first one, if any.
    let mut pending = body_g.take_pending_errors();
    if !pending.is_empty() {
        return Err(pending.remove(0));
    }
    // Seal the recorder.
    let body_recorded = body_g.finish();
    // ... bootstrap recording and `functions` assembly elided ...
    Ok(ModelProto {
        functions,
        ..Default::default()
    })
}

There is no BuiltModule wrapper. There is no codec layer beyond prost. The compiler consumes the same ModelProto and returns a ModelProto. The installer consumes the same ModelProto and returns a Node. The IR is one type all the way through.

Multi-target compile + entry-point semantics

Compiler::compile(module) emits a single ModelProto whose functions[] list carries every partition the pipeline produced. A federated Module that partitions into Client and Server emits both as sibling FunctionProtos under model.functions; sub-Module bodies and synthesized helpers (gate carriers, lifecycle containers) ride alongside in the same list. The compiler stamps the compilation passport (ai.bytesandbrains.compiled = "v1") and the per-target binding metadata (ai.bytesandbrains.binding.<target>.<slot> = "<role>|<TYPE_NAME>|<slot_id>") onto model.metadata_props. Because every binding key embeds its partition name, one proto carries every target's binding spec without key collisions.
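
To make the key layout concrete, here is a minimal sketch that formats and parses one binding entry. The key and value shapes come from the paragraph above; the Binding struct, both helpers, and the sample values are hypothetical, and the parser assumes target names contain no "." and value fields contain no "|".

```rust
/// Hypothetical decoded form of one per-target binding entry.
#[derive(Debug, PartialEq)]
pub struct Binding {
    pub target: String,
    pub slot: String,
    pub role: String,
    pub type_name: String,
    pub slot_id: u32,
}

const BINDING_PREFIX: &str = "ai.bytesandbrains.binding.";

/// Format one binding as a metadata_props (key, value) pair:
/// `ai.bytesandbrains.binding.<target>.<slot>` = `<role>|<TYPE_NAME>|<slot_id>`.
pub fn format_binding(b: &Binding) -> (String, String) {
    let key = format!("{}{}.{}", BINDING_PREFIX, b.target, b.slot);
    let value = format!("{}|{}|{}", b.role, b.type_name, b.slot_id);
    (key, value)
}

/// Parse a (key, value) pair back into a Binding. Returns None for keys
/// outside the binding namespace or for malformed values.
pub fn parse_binding(key: &str, value: &str) -> Option<Binding> {
    let rest = key.strip_prefix(BINDING_PREFIX)?;
    // Assumes target names contain no '.', so the first '.' ends the target.
    let (target, slot) = rest.split_once('.')?;
    let mut fields = value.split('|');
    let role = fields.next()?.to_string();
    let type_name = fields.next()?.to_string();
    let slot_id: u32 = fields.next()?.parse().ok()?;
    if fields.next().is_some() {
        return None; // more than three value fields
    }
    Some(Binding {
        target: target.to_string(),
        slot: slot.to_string(),
        role,
        type_name,
        slot_id,
    })
}
```

Because the target name sits inside every key, a Client binding and a Server binding for the same slot name land on distinct metadata_props entries.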

bb::install(peer_id, addresses, model, targets: &[&str], config) (bytesandbrains/src/install.rs:237-340) takes an ordered slice of target names and installs every one onto the same Node. The host picks which partitions live on each peer by passing different targets slices to install on different peers; the proto is the same artifact across the deployment. A peer hosting both halves of a federated round receives &["Client", "Server"]; a single-Node demo passes &["MyModule"]. The order is observable: install records each target’s bootstrap function onto BootstrapState::install_order in slice order, and the host kick (Node::run_bootstrap) walks the list front-to-back. See The Engine for the host-driven bootstrap surface.
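
The ordering contract can be pictured as a list that install appends to and the bootstrap kick walks front-to-back. The BootstrapState below is a hypothetical, stripped-down stand-in that models only install_order; the real engine type carries more state, and the "<target>__bootstrap" naming simply reuses the convention this chapter describes for recorded bootstrap functions.

```rust
/// Hypothetical, simplified stand-in for the engine's bootstrap bookkeeping.
#[derive(Default)]
pub struct BootstrapState {
    /// Bootstrap function names, in the order their targets were installed.
    pub install_order: Vec<String>,
}

impl BootstrapState {
    /// Record each target's bootstrap function in slice order.
    pub fn install(&mut self, targets: &[&str]) {
        for t in targets {
            self.install_order.push(format!("{t}__bootstrap"));
        }
    }

    /// Walk the recorded list front-to-back, handing each entry to `fire`.
    pub fn run_bootstrap(&self, mut fire: impl FnMut(&str)) {
        for name in &self.install_order {
            fire(name);
        }
    }
}
```

Passing &["Server", "Client"] instead would reverse the firing order, which is why the slice order is part of the observable contract.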

The compiled ModelProto is shared across targets at the Node layer: bb::install wraps it in Arc<ModelProto> once via Node::set_model and shares the handle across every Node::register_module call so the proto bytes live on the Node exactly once (bytesandbrains/src/install.rs:332-335).
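
A sketch of the share-once pattern with std::sync::Arc, assuming hypothetical Node and ModelProto stand-ins (the real set_model and register_module signatures are not shown in this chapter). Registering another target bumps a reference count; the proto itself is allocated once.

```rust
use std::sync::Arc;

/// Hypothetical placeholder for the prost-generated ModelProto.
#[derive(Debug, Default)]
pub struct ModelProto {
    pub payload: Vec<u8>,
}

/// Hypothetical, stripped-down Node: one shared model, many registrations.
#[derive(Default)]
pub struct Node {
    pub model: Option<Arc<ModelProto>>,
    pub modules: Vec<(String, Arc<ModelProto>)>,
}

impl Node {
    /// Wrap the compiled proto in an Arc exactly once and keep a handle.
    pub fn set_model(&mut self, model: ModelProto) -> Arc<ModelProto> {
        let shared = Arc::new(model);
        self.model = Some(Arc::clone(&shared));
        shared
    }

    /// Register one target against the shared handle; only the refcount grows.
    pub fn register_module(&mut self, target: &str, model: &Arc<ModelProto>) {
        self.modules.push((target.to_string(), Arc::clone(model)));
    }
}
```

After set_model plus two register_module calls, four handles (the returned one, node.model, and the two registrations) point at a single allocation.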

How a Module maps to ONNX messages

The mapping is mechanical. Every concept the author writes lands on a named field of a canonical ONNX message.

Authoring concept	ONNX home
Program	`ModelProto`
Module	`FunctionProto` with `(domain, name)` identity, registered in `ModelProto.functions`
Module body	`FunctionProto.node: repeated NodeProto`
Module input	`FunctionProto.input: repeated string` paired with `FunctionProto.value_info`
Module output	`FunctionProto.output: repeated string` paired with `FunctionProto.value_info`
Sub-Module call	A `NodeProto` whose `(domain, op_type)` matches the sub-Module's FunctionProto `(domain, name)`
Generic slot placeholder	`FunctionProto.attribute: repeated string`, one entry per unfilled slot
Concrete slot binding	`FunctionProto.attribute_proto: repeated AttributeProto`, payload carries the construction state
DSL op call	One `NodeProto` per call, `op_type` derived from the method name
Slot identity stamp	`NodeProto.metadata_props["ai.bytesandbrains.required_trait"]` + `["ai.bytesandbrains.slot_id"]`
Tensor type	`TypeProto.Tensor { elem_type, shape }` on `ValueInfoProto.type`
Tensor memory ownership	Backend-owned. `Backend::Tensor` is an `Arc`-shared handle around a backend-managed buffer (`CpuTensor(Arc<CpuBackendBuffer>)` at `bb-ops/src/backends/cpu/tensor.rs:65`). `Clone` is `Arc::clone`. Wire-receive routes through `Backend::materialize_from_wire(type_hash, bytes: Vec<u8>)` for backend-bound slots; the engine wraps the result in `BackendTensorCarrier` for slot residency.
Vendor scalar type	`TypeProto.Opaque { domain: "ai.bytesandbrains", name: "<TypeName>" }`
Opset declaration	`ModelProto.opset_import: repeated OperatorSetIdProto`
Sub-graph on an op	`AttributeProto.g: GraphProto` (used by `If`, `Loop`)
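
Most rows are one-to-one field placements. The Sub-Module call row is the one that links two messages, so the lookup is worth spelling out. The sketch uses trimmed-down, hypothetical stand-ins for the prost-generated messages (only the fields the lookup touches) and an illustrative domain string "app"; it relies on the ONNX rule that a node invokes a model-local function when the node's (domain, op_type) equals the function's (domain, name).

```rust
/// Trimmed-down stand-ins for the ONNX messages (illustrative only).
#[derive(Debug, Clone)]
pub struct NodeProto {
    pub op_type: String,
    pub domain: String,
}

#[derive(Debug, Clone)]
pub struct FunctionProto {
    pub name: String,
    pub domain: String,
    pub node: Vec<NodeProto>,
}

#[derive(Debug, Default)]
pub struct ModelProto {
    pub functions: Vec<FunctionProto>,
}

/// Resolve a call site to the FunctionProto it invokes, if any.
/// A node's (domain, op_type) is matched against a function's (domain, name).
pub fn resolve_call<'m>(model: &'m ModelProto, call: &NodeProto) -> Option<&'m FunctionProto> {
    model
        .functions
        .iter()
        .find(|f| f.domain == call.domain && f.name == call.op_type)
}
```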

The compiler reads from these fields, mutates them in place across the pass pipeline, and writes the stamped result back into the same ModelProto. The engine then loads the proto, walks the bound function’s node list, and dispatches each NodeProto against its bound impl.

The Graph recorder

The DSL is a Rust struct named Graph. A Graph is a thin wrapper around an in-progress FunctionProto. Authors do not construct one directly. The Module trait’s default build() method allocates a Graph, hands it to the user’s body(&self, g: &mut Graph), and moves the recorded FunctionProto into a returned ModelProto.

The relevant header on the recorder spells out the contract:

// from bytesandbrains/bb-dsl/src/graph.rs:40-91
pub struct Graph {
    /// The IR body. Everything semantically meaningful goes here.
    function: FunctionProto,

    /// Output-name counter. Could be derived from
    /// `function.node[].output[]` but cheaper to track.
    site_counter: u64,

    /// `(TypeId, *const ())`-keyed dedup. Shared namespace across
    /// concrete and generic.
    instance_for_pointer: HashMap<(TypeId, *const ()), u32>,
    next_instance_id: u32,
    // ...
}

The FunctionProto is the canonical IR record. The remaining Rust state is recorder bookkeeping the proto does not hold directly: pointer-identity dedup for placeholder instances, an output-name counter that mints fresh value names (derivable from the recorded node outputs, but cheaper to track), and a stack tracking which sub-function the recorder is currently writing into.
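
The pointer-identity dedup can be sketched on its own. This is a hypothetical reduction of the two fields shown above (instance_for_pointer, next_instance_id), not the recorder's actual method. The TypeId half of the key matters because a struct and its first field can share an address; keying on the address alone could merge them into one instance.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical stand-in for the recorder's instance-dedup state.
pub struct InstanceTable {
    instance_for_pointer: HashMap<(TypeId, *const ()), u32>,
    next_instance_id: u32,
}

impl InstanceTable {
    pub fn new() -> Self {
        Self { instance_for_pointer: HashMap::new(), next_instance_id: 0 }
    }

    /// Return the instance id for `slot`, minting a fresh one the first
    /// time this (type, address) pair is seen.
    pub fn instance_of<T: Any>(&mut self, slot: &T) -> u32 {
        let key = (TypeId::of::<T>(), slot as *const T as *const ());
        if let Some(&id) = self.instance_for_pointer.get(&key) {
            return id;
        }
        let id = self.next_instance_id;
        self.next_instance_id += 1;
        self.instance_for_pointer.insert(key, id);
        id
    }
}
```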

Three primitives drive every method on Graph:

g.input("name") declares a Module-level input port and returns an Output handle bound to that name. The port is appended to the enclosing FunctionProto.input list and a matching ValueInfoProto is pushed into function.value_info.
g.output("name", value) registers a local output port. The recorder emits a PassThrough NodeProto that renames the producer value to the port name, then records the port name on FunctionProto.output and value_info.
g.net_out("name", peers, value) is the single network-output primitive. It records a wire.Send NodeProto under the ai.bytesandbrains.wire domain and registers the port name on the current function. The compiler’s partition pass cuts the graph at this boundary; a synthesis pass materializes the matching wire.Recv on every consumer-side partition.
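
A toy recorder makes the three primitives easier to compare. Everything here is a hypothetical simplification: the stand-in messages carry only names, value_info stamping is skipped, and Output handles are reduced to plain value-name strings. The op types and domains (PassThrough under ai.bytesandbrains.syscall, Send under ai.bytesandbrains.wire) follow this chapter.

```rust
/// Trimmed-down stand-ins for the ONNX messages (illustrative only).
#[derive(Debug, Default)]
pub struct NodeProto {
    pub op_type: String,
    pub domain: String,
    pub input: Vec<String>,
    pub output: Vec<String>,
}

#[derive(Debug, Default)]
pub struct FunctionProto {
    pub input: Vec<String>,
    pub output: Vec<String>,
    pub node: Vec<NodeProto>,
}

/// Hypothetical miniature recorder mirroring the three primitives.
#[derive(Default)]
pub struct MiniGraph {
    pub function: FunctionProto,
}

impl MiniGraph {
    /// Declare a formal input; no node is recorded.
    pub fn input(&mut self, name: &str) -> String {
        self.function.input.push(name.to_string());
        name.to_string()
    }

    /// Local output: a PassThrough renames `value` to the port name.
    pub fn output(&mut self, name: &str, value: &str) {
        self.push_node("PassThrough", "ai.bytesandbrains.syscall", &[value], &[name]);
        self.function.output.push(name.to_string());
    }

    /// Network output: a wire Send consumes the data and the destination.
    pub fn net_out(&mut self, name: &str, peers: &str, value: &str) {
        self.push_node("Send", "ai.bytesandbrains.wire", &[value, peers], &[]);
        self.function.output.push(name.to_string());
    }

    fn push_node(&mut self, op_type: &str, domain: &str, input: &[&str], output: &[&str]) {
        self.function.node.push(NodeProto {
            op_type: op_type.to_string(),
            domain: domain.to_string(),
            input: input.iter().map(|s| s.to_string()).collect(),
            output: output.iter().map(|s| s.to_string()).collect(),
        });
    }
}
```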

Each call walks through Graph::push_node, which appends one NodeProto to the current function’s node list and stamps the composition-hierarchy chain into the new node’s metadata_props when the recorder is inside a nested function scope.

Output handles thread typed metadata, not phantom types

The DSL is non-generic across languages. The framework’s design forbids Output<T> phantom types and method-generic operations like g.send::<T>(...), because the same DSL must port to Python (and other host languages) without dragging Rust’s type parameters along.

The Rust handle that threads through every method chain is called Output:

// from bytesandbrains/bb-dsl/src/output.rs:1-32
//! Non-generic `Output` handle threaded through DSL method chains.
//!
//! Per `docs/API_DESIGN.md` §4 + `docs/IR_AND_DSL.md` Part 6. The DSL
//! is non-generic across languages - type metadata rides on
//! `&'static TypeNode`, NOT a `PhantomData<T>` tag. Identity is the
//! `name: String` (the ONNX value name in `FunctionProto.input` /
//! `NodeProto.input` / `NodeProto.output`); the wire-level type
//! identity rides on the `TypeNode` reference.

use bb_ir::types::TypeNode;

#[derive(Clone, Debug)]
pub struct Output {
    /// ONNX value name. Matches a `FunctionProto.input` entry, a
    /// `NodeProto.output` entry, or a `next_site_name()` mint.
    pub name: String,

    /// Static `TypeNode` reference. Pointer equality is meaningful
    /// - every canonical type lives in a single `static`.
    pub type_node: &'static TypeNode,
}

Two fields. The name is the ONNX value name. Every reference to the value in NodeProto.input, NodeProto.output, and ValueInfoProto.name cites that string. The type_node is a &'static TypeNode, a reference to one of the canonical static TypeNode declarations the framework registers at process start. Pointer equality on the static reference is the type-identity check.
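
A small sketch of why pointer equality works as the type-identity check. TypeNode is reduced here to its denotation field and same_type is a hypothetical helper; the two denotation strings are taken from the built-in table later in this chapter.

```rust
/// Reduced stand-in for bb_ir's TypeNode: only the denotation is kept.
#[derive(Debug)]
pub struct TypeNode {
    pub denotation: &'static str,
}

pub static TYPE_TENSOR_F32: TypeNode = TypeNode { denotation: "ai.bytesandbrains.tensor.f32" };
pub static TYPE_TENSOR_F64: TypeNode = TypeNode { denotation: "ai.bytesandbrains.tensor.f64" };

/// Stand-in for the DSL handle: an ONNX value name plus a static type reference.
#[derive(Clone, Debug)]
pub struct Output {
    pub name: String,
    pub type_node: &'static TypeNode,
}

/// Type identity is the address of the static, not a field-by-field comparison.
pub fn same_type(a: &Output, b: &Output) -> bool {
    std::ptr::eq(a.type_node, b.type_node)
}
```

Cloning an Output copies the name and the &'static reference, never the TypeNode itself, so the check stays a single address comparison.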

When a DSL method returns multiple values, it returns a tuple of Output handles. When the next method consumes them, it reads each handle’s name for the NodeProto.input list and its type_node for the downstream type check. The recorder never needs to know the Rust-level static type of a value; it tracks the ONNX type tree through the TypeNode reference instead.

The recorder seeds inputs with the canonical bytes-typed sentinel:

// from bytesandbrains/bb-dsl/src/graph.rs:368-415
pub fn input(&mut self, name: &str) -> Output {
    // ... port declaration + value_info stamping elided ...
    Output::new(name.to_string(), &bb_ir::types::TYPE_BYTES)
}

The compiler’s TypeSolver narrows that sentinel to a concrete TypeNode once the input flows into typed ops, and stamps the refinement back onto the ValueInfoProto on the recorded function.

A complete recording

A compact end-to-end example records a parameter load, a next_batch, a forward, a parameter read, and a net_out. The client logic from the federated learning example is the canonical version:

// from bytesandbrains/examples/federated_learning.rs:117-144
use bytesandbrains::placeholders::{DataLoaderSlot, ModelSlot};
use bytesandbrains::{Graph, Module};

struct ClientLogic;
impl Module for ClientLogic {
    fn name(&self) -> &str {
        "ClientLogic"
    }
    fn body(&self, g: &mut Graph) {
        // Inbound from the server: the latest global model params.
        let server_params = g.input("server_params");

        // Apply the global params to the local model.
        let _ = ModelSlot.load_parameters(g, server_params);

        // Local training step.
        let (batch, _labels) = DataLoaderSlot.next_batch(g);
        let _prediction = ModelSlot.forward(g, batch);

        // Read the updated params and ship them to the server peer.
        let updated_params = ModelSlot.params(g);
        let server_peer = g.input("server_peer");
        g.net_out("updated_params", server_peer, updated_params);
    }
}

That body records into one FunctionProto named "ClientLogic". The function's input list holds "server_params" and "server_peer". The output list holds "updated_params". The node list holds one NodeProto per op call (the two g.input calls add formals, not nodes), in the order the author wrote them. Each slot-method NodeProto carries the slot-identity stamps the compiler needs to bind the role at compile time.

Calling ClientLogic.build() returns the wrapping ModelProto. The proto’s functions[0] is the recorded ClientLogic body, stamped with the canonical module_phase = "body" key. Sub-Modules reached during recording become entries in functions[1..]. A bootstrap recording (when the Module overrides Module::bootstrap) lands on a sibling <Name>__bootstrap function in the same functions list.

Module::bootstrap and host-staged input formals

Module::bootstrap(&self, g: &mut Graph) is the author entry point for pre-body initialization. The trait method defaults to a no-op (bytesandbrains/bb-dsl/src/module.rs); authors override it next to Module::body. Inputs declared inside the override (g.input(name)) become formals on the emitted "<module>__bootstrap" FunctionProto, addressable from the host via BootstrapRequest::inputs:

// Assumes a `VectorStore` struct whose `index` field is an index slot placeholder.
impl Module for VectorStore {
    fn name(&self) -> &str {
        "VectorStore"
    }

    fn bootstrap(&self, g: &mut Graph) {
        // Each input becomes a declared formal on the emitted
        // `"<module>__bootstrap"` FunctionProto. The host stages
        // bytes via `BootstrapRequest::inputs`.
        let seed_corpus = g.input("seed_corpus");
        let _ = self.index.train(g, seed_corpus);
    }

    fn body(&self, g: &mut Graph) {
        let query = g.input("query");
        let _ = self.index.search(g, query, 10);
    }
}

The host kicks the recorded bootstrap via the immediate-fire entry point on Node:

node.run_bootstrap(bb::BootstrapTarget::ModuleRequests(&[
    bb::engine::BootstrapRequest {
        target: "VectorStore",
        inputs: &[("seed_corpus", corpus_bytes.as_slice())],
    },
]))?;
node.poll(cx); // drives the bootstrap body to quiescence

The engine validates inputs against the target's declared formals at the boundary (bytesandbrains/bb-runtime/src/engine/core.rs:1464, Engine::enqueue_bootstrap_request): UnknownInput rejects extras, MissingInput rejects gaps, and UnknownTarget rejects unknown names, all before any bytes are staged. Validated requests follow the Principle 1a copy (try_charge → try_reserve_exact → extend_from_slice), so the caller's borrowed &[u8] slices may drop the moment run_bootstrap returns.

A bootstrap with no formals makes no g.input calls; the host kicks it via Node::run_bootstrap(BootstrapTarget::All) or Node::run_bootstrap(BootstrapTarget::ModuleNames(&["<target>"])). See The Engine for the BootstrapState architecture and the AlreadyTransitivelyQueued cycle defense.
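
The validation order and the fallible copy can be sketched as one function. The error variant names follow the text; the BootstrapError enum, its OutOfMemory variant, the formals map, and stage_request itself are hypothetical. The try_charge budget step is omitted because its API is not shown here, so the sketch keeps only the standard-library half of the copy (Vec::try_reserve_exact, then extend_from_slice).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type; the first three variant names follow the text.
#[derive(Debug, PartialEq)]
pub enum BootstrapError {
    UnknownTarget(String),
    UnknownInput(String),
    MissingInput(String),
    OutOfMemory,
}

/// Validate `inputs` against the target's declared formals, then copy the
/// borrowed bytes into owned buffers. Nothing is copied unless validation passes.
pub fn stage_request(
    formals: &HashMap<&str, Vec<&str>>,
    target: &str,
    inputs: &[(&str, &[u8])],
) -> Result<Vec<(String, Vec<u8>)>, BootstrapError> {
    let declared = formals
        .get(target)
        .ok_or_else(|| BootstrapError::UnknownTarget(target.to_string()))?;
    // Reject extras: every supplied name must be a declared formal.
    for (name, _) in inputs {
        if !declared.contains(name) {
            return Err(BootstrapError::UnknownInput(name.to_string()));
        }
    }
    // Reject gaps: every declared formal must be supplied.
    let supplied: HashSet<&str> = inputs.iter().map(|(n, _)| *n).collect();
    for formal in declared {
        if !supplied.contains(formal) {
            return Err(BootstrapError::MissingInput(formal.to_string()));
        }
    }
    // Copy: reserve fallibly, then extend from the caller's slice.
    let mut staged = Vec::with_capacity(inputs.len());
    for (name, bytes) in inputs {
        let mut owned: Vec<u8> = Vec::new();
        owned
            .try_reserve_exact(bytes.len())
            .map_err(|_| BootstrapError::OutOfMemory)?;
        owned.extend_from_slice(bytes);
        staged.push((name.to_string(), owned));
    }
    Ok(staged)
}
```

Because the returned buffers are owned, the caller's slices can be dropped as soon as the function returns, which is the property the text attributes to run_bootstrap.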

Opsets

The framework declares three vendor opsets plus a required subset of ai.onnx. Every NodeProto carries a domain plus an op_type; the loaded ModelProto.opset_import array names the domain versions the program imports. The engine dispatches each NodeProto by (domain, op_type, instance).
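
A sketch of a dispatch table keyed by (domain, op_type, instance). The value type is a hypothetical simplification (i64 slices in, Vec<i64> out) and Dispatcher is not the engine's API. The point is that the same op_type bound on two different instances resolves to two different impls, and an unbound key resolves to nothing.

```rust
use std::collections::HashMap;

/// Dispatch key: (domain, op_type, instance).
pub type OpKey = (String, String, u32);
/// Hypothetical Rust-dispatched impl: input values in, output values out.
pub type OpFn = Box<dyn Fn(&[i64]) -> Vec<i64>>;

#[derive(Default)]
pub struct Dispatcher {
    table: HashMap<OpKey, OpFn>,
}

impl Dispatcher {
    /// Bind an impl to one (domain, op_type, instance) triple.
    pub fn register(&mut self, domain: &str, op_type: &str, instance: u32, f: OpFn) {
        self.table
            .insert((domain.to_string(), op_type.to_string(), instance), f);
    }

    /// Look up and run the impl bound to a node; None if nothing is bound.
    pub fn dispatch(&self, domain: &str, op_type: &str, instance: u32, args: &[i64]) -> Option<Vec<i64>> {
        let key = (domain.to_string(), op_type.to_string(), instance);
        self.table.get(&key).map(|f| f(args))
    }
}
```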

The wire opset is the smallest. It carries two ops and is the only domain that crosses the network plane:

op_type	inputs	outputs	semantics
`Send`	`data: any`, `dest: Address`	(none)	Fire-and-forget broadcast. The wire envelope packs `data` as one `SlotFill` to `dest`.
`Recv`	(none)	`trigger: Opaque<Trigger>`, `payload: any`	Declare an inbound type acceptance. The Recv’s site identity becomes the routable destination; senders construct the matching `/site/<id>` suffix.

The other two framework domains:

ai.bytesandbrains.syscall v1 (38 ops): framework primitives. Some schedule triggers and timers (Pulse, OnTrigger, Interval, After, Sleep), some manage values across the graph (PassThrough, Tee, Constant, Hold.Stash, Hold.Flush, Serialize.Enqueue, Serialize.Dequeue), and some mutate the peer address book (AddPeerAddress, RemovePeerAddress, DropPeer).
ai.bytesandbrains.role.<role> v1 (six domains, one per role): every role-method call from a placeholder records into its role-specific opset.

The ai.onnx opset is the canonical ONNX one. The framework’s required subset is documented inline at the source. Backends ship graph-execution support for that subset; user-authored backends that import a higher version inherit the standard ONNX semantics for the additional ops.

Composition: bundling typed Outputs

net_out ships one typed value per envelope. Aggregator protocols that pair a contribution tensor with a sample count, or hierarchical overlays that need to atomically attach metadata to a payload, would otherwise emit two Sends and rely on receivers to correlate them. Graph::bundle packs N typed Outputs into ONE composite Output for transmission through a single port; the matching Graph::unbundle decomposes the envelope back into N typed children on the receiver. Single-port DAG semantics hold because the bundle/unbundle pair threads one Output between peers.

use bytesandbrains::types::{TYPE_PEER_ID, TYPE_TENSOR_F32};
use bytesandbrains::{Graph, Module};

struct AggregatorStage;
impl Module for AggregatorStage {
    fn name(&self) -> &str { "AggregatorStage" }
    fn body(&self, g: &mut Graph) {
        let params = g.input("params");
        let owner = g.input("owner");
        let peers = g.input("peers");

        let composite = g.bundle(&[params, owner]);
        g.net_out("contribution", peers, composite);
    }
}

struct ReceiverStage;
impl Module for ReceiverStage {
    fn name(&self) -> &str { "ReceiverStage" }
    fn body(&self, g: &mut Graph) {
        let received = g.lookup_output("contribution")
            .expect("contribution port");
        let parts = g.unbundle(
            received,
            &[&TYPE_TENSOR_F32, &TYPE_PEER_ID],
        );
        g.output("params", parts[0].clone());
        g.output("owner", parts[1].clone());
    }
}

The recorded ops belong to a dedicated ai.bytesandbrains.composite domain. Bundle's NodeProto has variable-arity input (one entry per child) and a single composite output port typed against the TYPE_COMPOSITE TypeNode. Unbundle reads the composite, validates the declared child count against the runtime envelope, and re-emits each child as its original concrete SlotValue carrier. The ValueInfoProto.type.denotation on every child output carries the declared TypeNode, and downstream consumers downcast directly to the concrete carrier (.as_any().downcast_ref::<T>()) without a bincode-against-denotation hop.

In-process forwarding pays one SlotValue::clone_boxed per child at Bundle and Unbundle. No bincode encode or decode runs unless the envelope crosses a Node boundary. At the wire boundary, CompositeValue serializes each child as (type_hash, child.to_wire_bytes()) and the receiver materializes typed carriers via the global decoder registry. Carrier authors register the lattice binding and the wire decoder with a single register_type_node! invocation. See bb-runtime/src/syscall/values.rs:80-165 for the carrier shape and the wire codec; bb-ir/src/slot_value.rs:188-256 for the registry and the registration macro.
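
The per-child framing can be sketched as below. Assumptions, all hypothetical: the length-prefixed little-endian layout, the Decoder function-pointer type, the two example decoders, and the numeric type hashes 1 and 2 used with them. The real codec lives in bb-runtime/src/syscall/values.rs and registers decoders through register_type_node!; this sketch only shows the shape of "type_hash, then bytes, then registry lookup".

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical decoder signature: wire bytes to a boxed typed carrier.
pub type Decoder = fn(&[u8]) -> Option<Box<dyn Any>>;

/// Illustrative framing: u32 child count, then per child a u64 type_hash,
/// a u32 byte length, and the child's bytes (integers little-endian).
pub fn encode(children: &[(u64, Vec<u8>)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(children.len() as u32).to_le_bytes());
    for (hash, bytes) in children {
        out.extend_from_slice(&hash.to_le_bytes());
        out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
        out.extend_from_slice(bytes);
    }
    out
}

/// Read `n` bytes at `*pos` and advance the cursor; None if the buffer is short.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let slice = buf.get(*pos..end)?;
    *pos = end;
    Some(slice)
}

/// Decode an envelope, materializing each child through the decoder
/// registered for its type_hash. An unknown hash or short read yields None.
pub fn decode(buf: &[u8], registry: &HashMap<u64, Decoder>) -> Option<Vec<Box<dyn Any>>> {
    let mut pos = 0;
    let count = u32::from_le_bytes(take(buf, &mut pos, 4)?.try_into().ok()?);
    let mut children = Vec::new();
    for _ in 0..count {
        let hash = u64::from_le_bytes(take(buf, &mut pos, 8)?.try_into().ok()?);
        let len = u32::from_le_bytes(take(buf, &mut pos, 4)?.try_into().ok()?) as usize;
        let body = take(buf, &mut pos, len)?;
        let decoder = registry.get(&hash)?;
        children.push(decoder(body)?);
    }
    Some(children)
}

/// Example decoder for a hypothetical u32 carrier.
pub fn decode_u32(b: &[u8]) -> Option<Box<dyn Any>> {
    let raw: [u8; 4] = b.try_into().ok()?;
    Some(Box::new(u32::from_le_bytes(raw)))
}

/// Example decoder for a hypothetical UTF-8 peer-id carrier.
pub fn decode_peer_id(b: &[u8]) -> Option<Box<dyn Any>> {
    let s = String::from_utf8(b.to_vec()).ok()?;
    Some(Box::new(s))
}
```

The receiver never consults the sender: the hash in each child header is enough to pick the decoder, and a hash with no registered decoder fails the whole envelope rather than yielding untyped bytes.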

The same decoder registry drives the single-fill receive path: inbound wire.Recv payloads materialize into the typed SlotValue carrier the sender stamped, not a BytesValue. The Bundle codec and the wire receive path are symmetric, so registering one carrier participates in both surfaces with no extra step. The full receive flow (failure modes, partial delivery, destination metadata) lives in the Wire and addressing chapter.

Empty parts (Bundle) or empty part_types (Unbundle) panic at recording time. Composition of zero values has no semantic meaning and is almost certainly an author bug.
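
A minimal in-process sketch of the two arity rules: a composition must be non-empty, and the declared child count must equal what the envelope delivers. Composite, bundle, and unbundle here are hypothetical simplifications; the real ops record NodeProtos and carry SlotValue carriers rather than Box<dyn Any>, and this sketch folds the recording-time panic and the runtime count check into the same functions.

```rust
use std::any::Any;

/// Hypothetical in-process composite: boxed children, no serialization.
pub struct Composite {
    pub children: Vec<Box<dyn Any>>,
}

/// Pack N children into one value. Zero children panics, mirroring the
/// rule that an empty composition is an author bug.
pub fn bundle(parts: Vec<Box<dyn Any>>) -> Composite {
    assert!(!parts.is_empty(), "bundle: composition of zero values");
    Composite { children: parts }
}

/// Unpack a composite, rejecting an envelope whose child count differs
/// from the declared arity. An empty declaration panics, like `bundle`.
pub fn unbundle(c: Composite, declared_arity: usize) -> Result<Vec<Box<dyn Any>>, String> {
    assert!(declared_arity > 0, "unbundle: empty part_types");
    if c.children.len() != declared_arity {
        return Err(format!(
            "declared {declared_arity} children, envelope carries {}",
            c.children.len()
        ));
    }
    Ok(c.children)
}
```

Consumers then downcast each child to its concrete type directly, with no intermediate byte decoding, which is the in-process path the text describes.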

Vendor scalar types

Non-tensor scalars ride on TypeProto.Opaque under the ai.bytesandbrains domain. The codebase registers every framework type, tensor leaves included, as a &'static TypeNode and submits it to the inventory at process start. Each TypeNode carries the denotation string the recorder stamps onto ValueInfoProto.type.denotation; the built-in f32 tensor leaf looks like this:

// from bytesandbrains/bb-ir/src/types/builtins.rs:38-44
pub static TYPE_TENSOR_F32: TypeNode = TypeNode {
    id: "tensor.f32",
    parent: Some("tensor"),
    kind: TypeKind::Concrete,
    ffi_name: "bb_tensor_f32_t",
    wire_hash: 0x0000_0000_0000_0101,
    denotation: "ai.bytesandbrains.tensor.f32",
};

The denotation is the key the recorder writes into the ValueInfoProto.type.denotation field. The wire codec uses wire_hash (a stable u64) to identify the type across the network without a registry round-trip. Both ends compute the same hash from the same denotation and the same opset version, so a receiver routes inbound bytes to the correct decoder without coordinating with the sender.

A modest selection from the built-in registry, all under the ai.bytesandbrains domain:

TypeNode	Kind	Denotation
`TYPE_TENSOR_F32`	Concrete	`ai.bytesandbrains.tensor.f32`
`TYPE_TENSOR_F64`	Concrete	`ai.bytesandbrains.tensor.f64`
`TYPE_TENSOR_I32`	Concrete	`ai.bytesandbrains.tensor.i32`
`TYPE_TENSOR_BOOL`	Concrete	`ai.bytesandbrains.tensor.bool`
`TYPE_PEER_ID`	Concrete	(registered Opaque under `ai.bytesandbrains`)
`TYPE_BYTES`	Concrete	(sentinel for unresolved port types)
`TYPE_ANY`	Abstract	(used as a permissive port bound)
`TYPE_TENSOR`	Abstract	(matches any `Tensor<T>` concrete leaf)

User crates submit their own leaves through the same inventory mechanism. The compiler’s TypeSolver walks the lattice at compile time to resolve every value to a concrete (leaf) TypeNode.
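
The lattice walk rests on a parent-chain subtype check. The sketch reduces TypeNode to id, parent, and kind (field names from the TYPE_TENSOR_F32 declaration above). The "any" root, the parent of "tensor", and the Lattice registry are hypothetical, the loop assumes parent links are acyclic, and the sketch does not reproduce the TypeSolver itself.

```rust
#[derive(Debug, PartialEq)]
pub enum TypeKind {
    Abstract,
    Concrete,
}

/// Reduced TypeNode: only the lattice fields.
#[derive(Debug)]
pub struct TypeNode {
    pub id: &'static str,
    pub parent: Option<&'static str>,
    pub kind: TypeKind,
}

// "tensor.f32" -> "tensor" mirrors the source declaration; the "any" root is assumed.
pub static TYPE_ANY: TypeNode = TypeNode { id: "any", parent: None, kind: TypeKind::Abstract };
pub static TYPE_TENSOR: TypeNode = TypeNode { id: "tensor", parent: Some("any"), kind: TypeKind::Abstract };
pub static TYPE_TENSOR_F32: TypeNode = TypeNode { id: "tensor.f32", parent: Some("tensor"), kind: TypeKind::Concrete };

/// Hypothetical registry; the real one is populated from inventory submissions.
pub struct Lattice {
    pub nodes: Vec<&'static TypeNode>,
}

impl Lattice {
    pub fn get(&self, id: &str) -> Option<&'static TypeNode> {
        self.nodes.iter().copied().find(|n| n.id == id)
    }

    /// True if `child` is `ancestor` or reaches it through parent links.
    pub fn is_subtype(&self, child: &str, ancestor: &str) -> bool {
        let mut cur = self.get(child);
        while let Some(node) = cur {
            if node.id == ancestor {
                return true;
            }
            cur = node.parent.and_then(|p| self.get(p));
        }
        false
    }

    /// A fully resolved value must land on a concrete leaf.
    pub fn is_concrete(&self, id: &str) -> bool {
        matches!(self.get(id), Some(n) if n.kind == TypeKind::Concrete)
    }
}
```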

The Rust-dispatch boundary

The IR is graph-expressible up to one boundary and Rust-dispatched beyond it. The rule the spec calls “the Rust-dispatch boundary” fences off where the compiler can keep optimizing and where the engine takes over:

Graph decomposition stops at Rust dispatch. Every op in a loaded ModelProto is either graph-expressible, in which case its body is a sub-GraphProto and the compiler may inline it, or Rust-dispatched, in which case the engine calls a Rust function the bound runtime supplied and from that point the op is opaque to the IR. There is no third mode.

In practice the boundary lives at the (domain, op_type) pair on each NodeProto. Anything under a registered atomic-op opset is Rust dispatch. Anything else is graph-level composition.
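
The two-mode rule reduces to a membership test on the domain half of the pair. A minimal sketch, assuming a hypothetical classify helper and an illustrative set of atomic-op domains; which domains a deployment actually registers as atomic is not specified here.

```rust
use std::collections::HashSet;

/// The two modes named by the rule above; there is no third.
#[derive(Debug, PartialEq)]
pub enum Mode {
    RustDispatch,
    GraphComposition,
}

/// Classify a node by its domain against the registered atomic-op opsets.
pub fn classify(domain: &str, atomic_opsets: &HashSet<&str>) -> Mode {
    if atomic_opsets.contains(domain) {
        Mode::RustDispatch
    } else {
        Mode::GraphComposition
    }
}
```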

The implication for authors is concrete: a Module body is fully graph-traversable. The compiler can inline it, collapse it, partition it across Nodes, snapshot it, and export it to any ONNX-aware consumer. A bound concrete’s execute_graph is a Rust-dispatch terminal: once the engine hands a GraphProto to the backend, what the backend does internally (JIT compilation, kernel fusion, GPU dispatch) is invisible to the IR. The contract the IR guarantees is the (inputs, GraphProto) -> outputs shape; the implementation is the vendor’s.

What ONNX gives the framework for free

Riding inside canonical ONNX messages costs the framework one design constraint: every authoring concept must land on a real ONNX field. In exchange the framework inherits a long list of capabilities without writing code for them:

Netron, the Python onnx package, ONNX Runtime, Burn’s loader, and TFLite’s converter all read framework graphs natively. The vendor opsets show as namespaced ops; the rest is just ONNX.
Snapshot is ModelProto bytes. Any ONNX-aware tool opens a framework snapshot. Diffing, lineage analysis, and visualization come for free.
FunctionProto-based composition is how ONNX itself models reusable graphs. Inlining, parameter substitution, and multi-instance composition are spec-defined.
opset_import solves version negotiation. The same mechanism used between PyTorch and ONNX Runtime works for framework graphs across Nodes and across framework releases.
GraphProto.initializer weights round-trip without serializer code. A bound model whose construction state references initializer names exports a graph that any ONNX consumer can load with weights intact.
TypeProto.Opaque is the right primitive for vendor scalars. Python’s onnx library preserves the domain and name without trying to interpret them; the framework’s deserializer registry interprets them where the runtime needs them.

The chapters that follow lean on this base. Chapter 4 is the Syscalls Reference: the canonical NodeProtos the framework emits and dispatches. Chapter 5 covers the authoring macros (#[derive(bb::Module)] and the role derives) that emit declarative port sets and inventory submissions. Chapter 6 covers the seven Contract traits the framework dispatches against. Chapter 9 covers the compiler’s 17-pass pipeline that turns a recorded ModelProto into an installable one.

Where this lives

ONNX schema bindings (prost-generated): bytesandbrains/bb-ir/src/proto/mod.rs.
Vendor type registry: bytesandbrains/bb-ir/src/types/.
Syscall and wire id constants: bytesandbrains/bb-ir/src/syscall_ids.rs.
The Module trait and Module::build(): bytesandbrains/bb-dsl/src/module.rs.
The Graph recorder: bytesandbrains/bb-dsl/src/graph.rs.
The Output handle: bytesandbrains/bb-dsl/src/output.rs.
Slot placeholders and their DSL bodies: bytesandbrains/bb-ops/src/placeholders/mod.rs.
DSL-side syscall helpers (pass_through, add_peer_address, etc.): bytesandbrains/bb-dsl/src/syscalls.rs.
The federated learning example used in this chapter: bytesandbrains/examples/federated_learning.rs.