BYTES AND BRAINS v0.3.0 . CRATES.IO
guide/09 . the-compiler // SOURCE: bytesandbrains/docs/COMPILER.md
Chapter 9

The Compiler

Chapter 8 closed out the static surface. Storage, the type tree, the solver pass. None of that work runs at install time. The compiler runs it. This chapter walks the compiler entry point, the pipeline it drives each Module::build() output through, and the error variants it surfaces when an input is malformed or under-bound.

The whole surface is one struct, a builder chain that records bindings, one compile method that returns a single ModelProto, and one error enum the caller matches on. The bind chain is statically typed: each bind_<role>::<T>(slot) method has a trait bound that says the concrete T actually implements the role you are binding it under. Misuse fails at compile time. Pipeline failures surface as typed CompileError variants that point at the offending NodeProto.

The three-phase pipeline

The framework is a three-phase construction. Authors record a program. The compiler binds concretes and runs the pipeline. The host installs the result onto a Node.

// Phase 1: record the program shape (pure recording, no compiler work).
let recorded: ModelProto = MyModule.build()?;

// Phase 2: bind concretes and run the canonical pipeline.
let compiled: ModelProto = Compiler::new()
    .bind_backend::<CpuBackend>("compute")
    .bind_index::<HnswIndex>("primary_index")
    .compile(recorded)?;

// Phase 3: install one or more entry-point targets onto a Node.
let node = bb::install(
    peer_id,
    addresses,
    compiled,
    &[target],
    Config::new(),
)?;

Module::build() returns a ModelProto whose functions[] is the recorded program shape: the root function plus every sub-Module body the recorder reached, plus any bootstrap function the Module declares. That model is not directly executable. It carries placeholder denotations for every Contract-method port, no wire pair synthesis, no gate insertion, no partition slicing.

Compiler::new().bind_*().compile() is the compiler entry point. It walks the canonical pipeline, mutates the recorded model into an engine-ready form, stamps a compilation passport plus per-target binding metadata onto metadata_props, and returns a single ModelProto whose functions[] carries every partition the pipeline produced. The compiler itself is single-target-agnostic. It emits sibling FunctionProtos for every partition that partition_by_wire_ops produced and lets the host decide at install time which partitions live on which peer.

bb::install walks compiled.functions[], finds the function whose name matches each entry in the targets slice, reads the binding metadata for every resolved target, dedupes shared slot bindings across targets into one ComponentRef per slot, constructs the concrete instances from the inventory registry exactly once, and brings the Node up. Different BB Nodes pick different targets slices from the same compiled ModelProto. A single-target Module compiles to one partition. A multi-target Module compiles to one partition per inferred BB-Node class, and a peer hosting more than one of those partitions installs them all together as install(..., &["A", "B"], ...).

Config::new() is the empty per-deployment configuration bag passed to bb::install. Attach a typed config to one slot with Config::new().with("compute", burn_cfg); install downcasts each attached value to the bound concrete’s <T as ConcreteComponent>::Config associated type and surfaces InstallError::ConfigTypeMismatch when the attached value is not of that type. Slots whose concrete declares type Config = () need no entry: install supplies the unit value automatically.
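
The downcast step can be pictured with std::any::Any. The following is a minimal sketch, not the framework's code: ConfigBag, take, BurnCfg, and the ConfigTypeMismatch struct are hypothetical stand-ins, and reporting a missing non-unit config as a mismatch is an assumption made to keep the sketch small.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical stand-in for a per-deployment config bag keyed by slot name.
#[derive(Default)]
pub struct ConfigBag {
    entries: HashMap<String, Box<dyn Any>>,
}

/// Hypothetical error carrying the slot whose config had the wrong type.
#[derive(Debug, PartialEq)]
pub struct ConfigTypeMismatch {
    pub slot: String,
}

impl ConfigBag {
    pub fn new() -> Self {
        Self::default()
    }

    /// Attach a typed value to one slot, builder style.
    pub fn with<C: Any>(mut self, slot: &str, cfg: C) -> Self {
        self.entries.insert(slot.to_string(), Box::new(cfg));
        self
    }

    /// Take the config for `slot` as `C`. A slot with no entry is treated
    /// as if the unit value had been attached, so `C = ()` always succeeds.
    pub fn take<C: Any>(&mut self, slot: &str) -> Result<C, ConfigTypeMismatch> {
        let boxed = self
            .entries
            .remove(slot)
            .unwrap_or_else(|| Box::new(()) as Box<dyn Any>);
        boxed
            .downcast::<C>()
            .map(|b| *b)
            .map_err(|_| ConfigTypeMismatch { slot: slot.to_string() })
    }
}

/// Hypothetical backend config type used only for illustration.
#[derive(Debug, Default, PartialEq)]
pub struct BurnCfg {
    pub threads: usize,
}
```

Taking "compute" as BurnCfg succeeds when a BurnCfg was attached, taking an unconfigured slot as () succeeds with the unit value, and attaching a u32 where a BurnCfg is expected yields the mismatch error.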

Compiler::new() and the bind chain

Compiler::new() returns a fresh compiler ready to accept binding declarations. The default state turns strict type-solver mode on, sets the per-hop deadline budget to the framework’s DEFAULT_PER_HOP_BUDGET_NS constant, sets the target IR version to FRAMEWORK_IR_VERSION, and carries an empty bindings list.

// from bytesandbrains/bb-compiler/src/driver.rs:132-134
pub fn new() -> Self {
    Self::default()
}

The bind chain is the surface authors use to declare which concrete component handles which role at runtime. Each method on Compiler is generic over the concrete type. The trait bound encodes “this concrete implements the role you are binding it under”:

// from bytesandbrains/bb-compiler/src/driver.rs:195-208
pub fn bind_backend<T>(self, slot: impl Into<String>) -> Self
where
    T: bb_runtime::concrete::ConcreteComponent + bb_runtime::roles::BackendRuntime,
{
    self.bind_concrete_with_storage::<T>(slot.into(), "BackendRuntime", &["tensor"])
}

pub fn bind_index<T>(self, slot: impl Into<String>) -> Self
where
    T: bb_runtime::concrete::ConcreteComponent + bb_runtime::roles::IndexRuntime,
{
    self.bind_concrete_with_storage::<T>(slot.into(), "IndexRuntime", &["vector"])
}

There is one bind method per role: bind_backend, bind_index, bind_model, bind_aggregator, bind_codec, bind_data_source, bind_peer_selector, bind_protocol. The slot string is the author-chosen name matching the #[depends(role = "<slot>")] attribute on sibling components, and matching the slot id stamped on each Contract-method NodeProto in the recorded function.

Each bind call records a (slot, role_runtime, concrete_type_name) triple and, when the concrete used #[derive(bb::<Role>)], looks up the per-port Storage::TYPE statics from the inventory registry. Those statics drive the refine_polymorphic_value_info pre-pipeline pass: the placeholder TYPE_TENSOR denotations the DSL recorder stamped on every Contract-method NodeProto get narrowed to the bound concrete’s actual storage type before the type solver walks the graph.
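
To make the recording step concrete, here is a minimal sketch of a typed bind chain that stores one triple per call. MiniCompiler, Binding, and the empty role traits are hypothetical; the real bind methods also require ConcreteComponent and consult the inventory registry, which this sketch leaves out.

```rust
use std::any::type_name;

/// Hypothetical role traits standing in for the framework's runtime roles.
pub trait BackendRuntime {}
pub trait IndexRuntime {}

/// One recorded binding: (slot, role_runtime, concrete_type_name).
#[derive(Debug, Clone, PartialEq)]
pub struct Binding {
    pub slot: String,
    pub role_runtime: &'static str,
    pub concrete_type_name: &'static str,
}

/// Hypothetical miniature of the compiler's builder surface.
#[derive(Default)]
pub struct MiniCompiler {
    pub bindings: Vec<Binding>,
}

impl MiniCompiler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Shared recorder: pushes the triple and returns the builder.
    fn record<T>(mut self, slot: String, role_runtime: &'static str) -> Self {
        self.bindings.push(Binding {
            slot,
            role_runtime,
            concrete_type_name: type_name::<T>(),
        });
        self
    }

    /// Only types implementing BackendRuntime are accepted here.
    pub fn bind_backend<T: BackendRuntime>(self, slot: impl Into<String>) -> Self {
        self.record::<T>(slot.into(), "BackendRuntime")
    }

    /// Only types implementing IndexRuntime are accepted here.
    pub fn bind_index<T: IndexRuntime>(self, slot: impl Into<String>) -> Self {
        self.record::<T>(slot.into(), "IndexRuntime")
    }
}

pub struct CpuBackend;
impl BackendRuntime for CpuBackend {}

pub struct HnswIndex;
impl IndexRuntime for HnswIndex {}
```

With these definitions, MiniCompiler::new().bind_backend::<HnswIndex>("compute") is rejected by rustc because HnswIndex does not implement BackendRuntime, which is the compile-time misuse check the bind chain relies on.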

The chain is fluent. Bindings declared earlier do not constrain bindings declared later. Four configuration methods sit alongside the binds:

// from bytesandbrains/bb-compiler/src/driver.rs:139-167
pub fn with_target_version(mut self, version: u32) -> Self { ... }
pub fn with_per_hop_budget_ns(mut self, budget_ns: u64) -> Self { ... }
pub fn with_permissive_types(mut self) -> Self { ... }
pub fn without_stage(mut self, name: &str) -> Self { ... }

with_target_version overrides which FRAMEWORK_IR_VERSION the compiler will accept on the input model; mismatch raises CompileError::IrVersionMismatch at the top of the canonical pipeline, before any pass mutates the recorded model. with_per_hop_budget_ns overrides the budget the derive_wire_deadlines pass uses when stamping static deadlines. with_permissive_types falls back to the relaxed solve() path on the type solver so hand-built test fixtures can leave unresolved values at TYPE_ANY. without_stage disables a canonical pass by name for test scenarios; the pass name list is described in the next section.

Three stage methods let user-supplied passes fire after the canonical pipeline, once per emitted partition:

// from bytesandbrains/bb-compiler/src/driver.rs:170-187
pub fn push_back_stage<S: CompilerStage + 'static>(mut self, stage: S) -> Self { ... }
pub fn push_front_stage<S: CompilerStage + 'static>(mut self, stage: S) -> Self { ... }
pub fn insert_stage<S: CompilerStage + 'static>(mut self, index: usize, stage: S) -> Self { ... }

The CompilerStage trait carries a stable name and a run method that mutates the emitted ModelProto. The custom-pass example in bytesandbrains/examples/custom_compiler_pass.rs walks every wire.Send NodeProto in each emitted partition and stamps a tracing identifier into metadata_props. The full example is reproduced later in this chapter.

The pass pipeline

Compiler::compile() runs three structural steps: a pre-pipeline refinement, the canonical pipeline itself, and a post-pipeline verify-and-stamp step:

// from bytesandbrains/bb-compiler/src/driver.rs:307-348
pub fn compile(self, mut model: ModelProto) -> Result<ModelProto, CompileError> {
    let mut binding_spec = BindingSpec::new();
    // ... build the spec from recorded bindings ...

    // Pre-pipeline: refine placeholder TYPE_TENSOR denotations.
    refine_polymorphic_value_info(&mut model, &binding_spec)?;

    let mut models = self.run_pipeline(model)?;

    // Post-pipeline: verify declared deps + stamp metadata.
    resolve_component_dependencies(&binding_spec, &mut models)?;
    validate_all_slots_bound(&binding_spec, &models)?;

    // Stamp the compilation passport + per-target binding table.
    // ... per-partition stamp_compilation_metadata calls ...

    merge_partitions_into_one(models)
}

The first step, refine_polymorphic_value_info, runs before run_pipeline because it needs access to BindingSpec. The recorder stamps a polymorphic TYPE_TENSOR placeholder on every Contract-method port. Each bound concrete’s Storage::TYPE narrows that placeholder to the concrete’s actual leaf in the type tree. The type solver inside run_pipeline then walks the narrowed denotations, not the placeholders.

The middle step, run_pipeline, drives the canonical pipeline. The ordered pass name list lives in CANONICAL_PASS_NAMES:

// from bytesandbrains/bb-compiler/src/runner.rs:22-40
pub const CANONICAL_PASS_NAMES: &[&str] = &[
    "inline_for_partition",
    "derive_wire_deadlines",
    "validate",
    "expand_ops",
    "type_solver",
    "infer_peer_classes",
    "synthesize_wire_recvs",
    "partition_by_wire_ops",
    "resolve_slots",
    "analyze_wire_edges",
    "insert_dedup_gate_rx",
    "insert_peer_health_gate_rx",
    "insert_backoff_gate_rx",
    "insert_peer_health_gate_tx",
    "insert_backoff_gate_tx",
    "insert_async_deadlines",
    "validate_runtime_complete",
];

validate_bootstrap_composition runs at the top of run_pipeline_with_options, between the front-half seam checks and the per-target loop. It walks the bootstrap call graph and surfaces BootstrapCompositionGap or BootstrapCompositionCycle if a parent Module’s bootstrap recording calls a child target that has no matching FunctionProto.

Touch-set computation

The engine’s per-component body-op gate (see The Engine) needs the closure of every ComponentRef each Module bootstrap’s body reaches. This is the bootstrap’s touch set. The computation lives on the engine, not in the compiler, because slot id to ComponentRef binding only resolves at install time when concretes instantiate.

Engine::compute_touch_set(function_key) (bytesandbrains/bb-runtime/src/engine/core.rs:1145-1196) walks the bootstrap function body once after install populates self.functions plus self.slot_id_to_cref. For each NodeProto in the body:

  1. Direct touch. Read the NodeProto’s metadata_props["ai.bytesandbrains.slot_id"] stamp. If present, look up slot_id → ComponentRef via Engine::slot_id_to_cref and add the cref to the touch set.
  2. Transitive touch via FunctionCall. Build the callee FunctionKey from (domain, op_type, overload). When the callee resolves to a sibling FunctionProto in self.functions, recurse on the callee body. A visited_keys: HashSet<FunctionKey> defends against cycles (Module A bootstrap calling Module B body via FunctionCalls).

The result stamps onto BootstrapState::module_bootstraps[name].touch_set (bytesandbrains/bb-runtime/src/engine/core.rs:1126-1131) before any bootstrap fires. At gate time is_op_locked reads this pre-stamped set in O(1) without per-call body walks. The compiler’s contract is to leave every slot_id stamp and every FunctionCall domain or op_type intact through validate_bootstrap_composition and resolve_slots.
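
The two touch rules amount to a transitive closure with a visited set. The sketch below is a simplified model under stated assumptions: Node, the string function keys, and the u32 ComponentRef alias are hypothetical, and a node is either a slot-stamped op or a call, never both.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical, simplified body node.
pub enum Node {
    /// An op carrying a slot_id stamp.
    Slot(u32),
    /// A FunctionCall naming its callee by key.
    Call(String),
    /// Anything else.
    Other,
}

/// Hypothetical stand-in for the engine's component handle.
pub type ComponentRef = u32;

/// Collect every ComponentRef reachable from `root`, directly or via calls.
pub fn compute_touch_set(
    root: &str,
    functions: &HashMap<String, Vec<Node>>,
    slot_id_to_cref: &HashMap<u32, ComponentRef>,
) -> BTreeSet<ComponentRef> {
    let mut touch = BTreeSet::new();
    let mut visited = HashSet::new();
    walk(root, functions, slot_id_to_cref, &mut visited, &mut touch);
    touch
}

fn walk(
    key: &str,
    functions: &HashMap<String, Vec<Node>>,
    slot_id_to_cref: &HashMap<u32, ComponentRef>,
    visited: &mut HashSet<String>,
    touch: &mut BTreeSet<ComponentRef>,
) {
    // A key seen before means a cycle or a shared callee: stop here.
    if !visited.insert(key.to_string()) {
        return;
    }
    // Calls that do not resolve to a known function body are skipped.
    let Some(body) = functions.get(key) else {
        return;
    };
    for node in body {
        match node {
            // Direct touch: slot ids with no bound component are ignored.
            Node::Slot(id) => {
                if let Some(cref) = slot_id_to_cref.get(id) {
                    touch.insert(*cref);
                }
            }
            // Transitive touch: recurse into the callee body.
            Node::Call(callee) => walk(callee, functions, slot_id_to_cref, visited, touch),
            Node::Other => {}
        }
    }
}
```

Because the set is computed once, a gate-time membership check against it needs no further body walks.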

The last step of Compiler::compile(), after the pipeline, calls resolve_component_dependencies to verify every #[depends(...)] attribute on every bound concrete points at a slot the bind chain declared, then validate_all_slots_bound to confirm every slot the runtime needs has an entry in the spec. Missing slots surface as CompileError::UnboundSlot. After that the compiler stamps the compilation passport (ai.bytesandbrains.compiled = "v1") and per-target binding triples (ai.bytesandbrains.binding.<target>.<slot> = "<role>|<TYPE_NAME>|<slot_id>") onto each partition’s metadata_props, then merges the per-partition models into one ModelProto whose functions[] carries the full set.
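
The binding entries are plain strings, so the key and value layout can be sketched with a pair of formatters and a parser. The helper names are hypothetical, and treating slot_id as a u32 is an assumption; only the key prefix and the pipe-separated value shape come from the text above.

```rust
/// Key prefix as given in the text.
const BINDING_PREFIX: &str = "ai.bytesandbrains.binding.";

/// Build the metadata key for one (target, slot) pair.
pub fn binding_key(target: &str, slot: &str) -> String {
    format!("{}{}.{}", BINDING_PREFIX, target, slot)
}

/// Build the "<role>|<TYPE_NAME>|<slot_id>" value.
pub fn binding_value(role: &str, type_name: &str, slot_id: u32) -> String {
    format!("{}|{}|{}", role, type_name, slot_id)
}

/// Split a value back into its three fields; None when malformed.
pub fn parse_binding_value(value: &str) -> Option<(&str, &str, u32)> {
    let mut parts = value.split('|');
    let role = parts.next()?;
    let type_name = parts.next()?;
    let slot_id = parts.next()?.parse().ok()?;
    // Reject trailing fields so the triple shape stays strict.
    if parts.next().is_some() {
        return None;
    }
    Some((role, type_name, slot_id))
}
```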

Pass groups

The pipeline divides into five functional groups. Each group’s passes write invariants the next group’s passes assume.

Refine types. refine_polymorphic_value_info narrows placeholder denotations to the bound concrete’s storage. This pass runs before run_pipeline, so it sits outside the CANONICAL_PASS_NAMES list but inside Compiler::compile().

Surface and validate. inline_for_partition inlines three classes of function at every CALL site: wire-touching functions (any function whose transitive closure contains an ai.bytesandbrains.wire op), pure-ONNX functions (transitive closure entirely under ai.onnx.*), and single-call functions (called from exactly one site). Multi-call sub-Modules without wire ops survive as FunctionProto referenced by CALL nodes. derive_wire_deadlines stamps each wire.Send’s deadline_ns attribute from chain_depth × per_hop_budget_ns (defaulting chain_depth = 1 when no metadata is present). validate runs structural sanity checks: rule 1 op type known, rule 2 inputs reachable, rule 3 outputs unique, rule 5 type declarations present, rule 6 slot metadata well formed, rule 7 no cycles. Rules 4 (wire pairing) and 8 (opsets imported) are deferred and currently no-op.
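
As a worked instance of the deadline rule, the arithmetic can be sketched as below. The function name is hypothetical, and the saturating multiply is an assumption about overflow handling that the text does not state.

```rust
/// deadline_ns = chain_depth * per_hop_budget_ns, with chain_depth
/// defaulting to 1 when no metadata is present.
pub fn derive_deadline_ns(chain_depth: Option<u64>, per_hop_budget_ns: u64) -> u64 {
    // Assumption: clamp at u64::MAX instead of wrapping on overflow.
    chain_depth.unwrap_or(1).saturating_mul(per_hop_budget_ns)
}
```

With a hypothetical 5 ms per-hop budget, a Send with no chain-depth metadata gets a 5 ms deadline and a Send at depth 3 gets 15 ms.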

Expand, solve, infer, synthesize. expand_ops materializes op-variant defaults. type_solver walks the constraint network. In strict mode it narrows every value to a concrete leaf TypeNode and surfaces unresolved values as CompileError::UnresolvedType; in permissive mode unresolved values stay at TYPE_ANY. infer_peer_classes stamps HOME_CLASS_KEY on every NodeProto with the class of Node it runs on. synthesize_wire_recvs inserts a synthesized wire.Recv NodeProto on each consumer-side class downstream of every user-authored wire.Send.

Partition, resolve, analyze, gate. partition_by_wire_ops slices the recorded function at wire ops, grouping by HOME_CLASS_KEY. resolve_slots walks each partition’s role-domain NodeProtos and matches concrete vs. generic slot providers. analyze_wire_edges classifies each cross-partition edge as data or trigger_only and groups outbound sends by destination for batching. The five gate-insert passes splice DedupGateRx, PeerHealthGateRx, BackoffGateRx, PeerHealthGateTx, and BackoffGateTx NodeProtos adjacent to wire ops; the next section walks each gate’s per-op semantics, state machine, attributes, and fire/drop conditions. insert_async_deadlines stamps deadline_ns on every async-suspending op carrier. validate_runtime_complete runs the final per-partition pre-flight: every peer-routed wire.Send has its TX gate chain, every peer-routed wire.Recv has its RX gate chain, every NodeProto carrying deadline_ns is paired with a DeadlineCheck, and every registered GateContract asserts its canonical insertion.

Stamp metadata. stamp_compilation_metadata is the final pass that turns each per-partition ModelProto into a complete install artifact. It writes the compilation passport (ai.bytesandbrains.compiled = "v1") plus one ai.bytesandbrains.binding.<target>.<slot> entry per BindingSlot so install can look up the bound impl by name. It also walks every wire.Recv NodeProto whose payload output feeds a role NodeProto’s input and stamps RECV_SLOT_ID_KEY on the Recv node’s metadata_props with the consumer role’s slot_id. Install reads that stamp to populate GraphSlot::recv_site_to_slot_id so the engine’s decode_typed_fill step can cross from data-plane identity (NodeSiteId) to binding identity (slot_id) and route backend-bound tensor fills through Backend::materialize_from_wire. Recv nodes whose payload does not flow into a role NodeProto stay unstamped and take the framework-carrier decode path.

Partition by wire ops

The dissection pass is partition_by_wire_ops. It produces one sub-graph per BB-Node class and one WireEdge per matched send-receive pair:

// from bytesandbrains/bb-compiler/src/partition_by_wire_ops.rs:52-85
#[derive(Debug, Default)]
pub struct NetworkAnalysis {
    pub per_role: BTreeMap<String, GraphProto>,
    pub wire_edges: Vec<WireEdge>,
}

#[derive(Debug)]
pub struct WireEdge {
    pub producer_role: String,
    pub consumer_role: String,
    pub value_name: String,
    pub send_node: NodeProto,
    pub recv_node: NodeProto,
}

Wire ops are the partition boundary. Every NodeProto under the ai.bytesandbrains.wire domain breaks the dataflow graph. Two non-wire ops belong to the same partition iff a dataflow path connects them without crossing a wire op. Send-flavored ops attach to the partition of their data-input producers. Recv-flavored ops attach to the partition of their data-output consumers.
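
The "same partition iff connected without crossing a wire op" rule is a connected-components computation. Below is a minimal union-find sketch over node indices. It is an illustration rather than the pass itself: the function name and the index-based graph encoding are hypothetical, and wire ops are left unassigned instead of being attached to their producer or consumer partition.

```rust
/// Assign each non-wire node a partition id (the smallest node index in its
/// component). Wire nodes get None. `edges` are (producer, consumer) pairs.
pub fn partition_ids(is_wire: &[bool], edges: &[(usize, usize)]) -> Vec<Option<usize>> {
    // Find with path halving.
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }

    let mut parent: Vec<usize> = (0..is_wire.len()).collect();

    for &(a, b) in edges {
        // An edge that touches a wire op is a partition boundary: skip it.
        if is_wire[a] || is_wire[b] {
            continue;
        }
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            // Keep the smaller index as root so ids are deterministic.
            parent[ra.max(rb)] = ra.min(rb);
        }
    }

    (0..is_wire.len())
        .map(|i| if is_wire[i] { None } else { Some(find(&mut parent, i)) })
        .collect()
}
```

For a chain a -> b -> Send -> Recv -> c -> d, nodes a and b land in one partition and c and d in another, because every path between the two groups crosses a wire op.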

Partition names come from the HOME_CLASS_KEY metadata that infer_peer_classes stamped on every NodeProto in the prior pass. Single-Node Modules without peer-class metadata fall through to the canonical SELF_CLASS ("@self"). Federated Modules produce one partition per inferred class. Each partition’s emitted function is named from the partition class plus a 16-hex-character content hash of its NodeProto bodies, so identical content yields stable names and changes to a Module body shift the hash.

Partition runs before resolve_slots because different BB Nodes can bind different concretes for the same role. One target might bind BurnBackend; another might bind CandleBackend. Resolving slots globally before partitioning would mis-bind across targets.

Wire-gate insertion

Five passes near the end of the canonical pipeline splice gate NodeProtos adjacent to every peer-routed wire op. The RX side runs first, in order insert_dedup_gate_rx, insert_peer_health_gate_rx, insert_backoff_gate_rx. The TX side runs after, in order insert_peer_health_gate_tx, insert_backoff_gate_tx. Each pass walks the per-class sub-graph, decides whether to insert based on a fire condition the source code makes explicit, and rewires neighbouring nodes so the gate sits in the dataflow path.

All five inserted ops live under one domain. Op-type names match the bb_ir::syscall_ids constants:

// from bytesandbrains/bb-ir/src/syscall_ids.rs:9-80
SYSCALL_DOMAIN          = "ai.bytesandbrains.syscall"
OP_DEDUP_GATE_RX        = "DedupGateRx"
OP_PEER_HEALTH_GATE_RX  = "PeerHealthGateRx"
OP_BACKOFF_GATE_RX      = "BackoffGateRx"
OP_PEER_HEALTH_GATE_TX  = "PeerHealthGateTx"
OP_BACKOFF_GATE_TX      = "BackoffGateTx"

The runtime syscall implementations live under bb-ops/src/syscalls/gates/, one file per op, each one registered via inventory::submit! so the engine’s dispatch table picks them up.

Fire conditions

A TX gate fires (inserts a NodeProto) when the candidate wire.Send satisfies three conditions: it sits under the ai.bytesandbrains.wire domain with op-type Send, it carries no prior idempotence stamp for this gate, and it carries an ATTR_PEER attribute readable via bb_ir::wire_shape::read_peer_bytes. Sends without peer route via the runtime address book using dest_target metadata and are not peer-specific, so they do not get the per-peer TX gate chain. The check sits at the top of each TX pass’s main loop, for example in insert_backoff_gate_tx:

// from bytesandbrains/bb-compiler/src/insert_backoff_gate_tx.rs:28-51
for node in sub_graph.node.iter_mut() {
    if node.domain != WIRE_DOMAIN || node.op_type != WIRE_SEND_OP {
        continue;
    }
    if metadata_value(node, GATED_KEY).is_some() {
        continue;
    }
    let Some(peer) = read_peer(node) else {
        continue;
    };
    let Some(gated_input) = node.input.first().cloned() else {
        continue;
    };
    // ... build gate, rewire input[0], stamp GATED_KEY ...
}

An RX gate fires for every ai.bytesandbrains.wire / Recv NodeProto that has not already been stamped with the gate’s idempotence key. Unlike TX, the RX side does not consult an attribute for peer identity: the inbound peer rides on RuntimeResourceRef::envelope_src_peer, which the engine populates per inbound envelope. The compiler-side trigger is presence-of-Recv; the runtime-side check uses the live envelope.

Each pass writes a distinct idempotence stamp so a second invocation is a no-op:

DedupGateRx       "ai.bytesandbrains.dedup_rx_gated"
PeerHealthGateRx  "ai.bytesandbrains.peer_health_rx_gated"
BackoffGateRx     "ai.bytesandbrains.backoff_rx_gated"
PeerHealthGateTx  "ai.bytesandbrains.peer_health_tx_gated"
BackoffGateTx     "ai.bytesandbrains.backoff_tx_gated"

RX chain rewiring

RX gates chain off a single metadata cursor, RX_CHAIN_HEAD_KEY = "ai.bytesandbrains.rx_chain_head", stamped on the wire.Recv. The head defaults to recv.output[0]. Each successive gate reads the current head, builds a gate node whose input is the head and whose output is a fresh name format!("{recv_name}#<gate>_rx_out"), rewires every other node whose input names the old head to point at the new output, and advances the head:

// from bytesandbrains/bb-compiler/src/insert_dedup_gate_rx.rs:36-55
for recv_idx in recv_indices {
    if metadata_value(&sub_graph.node[recv_idx], GATED_KEY).is_some() {
        continue;
    }
    let recv_name = sub_graph.node[recv_idx].name.clone();
    let head = rx_chain_head(&sub_graph.node[recv_idx]);
    let new_head = format!("{recv_name}#dedup_rx_out");

    new_gates.push(build_gate_node(&recv_name, &head, &new_head));

    rewire_consumers(sub_graph, recv_idx, &head, &new_head);

    set_metadata(&mut sub_graph.node[recv_idx].metadata_props, GATED_KEY, "true");
    set_rx_chain_head(&mut sub_graph.node[recv_idx], &new_head);
}

That produces the chain Recv -> DedupGateRx -> PeerHealthGateRx -> BackoffGateRx, in run order. Every gate’s output port is named value. Downstream consumers see exactly one input even though three gates sit between them and the Recv.
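
A toy model helps to see why consumers end up with exactly one input. The sketch below replays the three RX passes on a two-node graph. Node, insert_rx_gate, and the gate node names are hypothetical, the head cursor is passed as a plain String instead of living in metadata_props, and the peer_health and backoff output suffixes are assumed by analogy with the dedup_rx_out name shown above.

```rust
/// Hypothetical, simplified graph node.
#[derive(Debug, Clone)]
pub struct Node {
    pub name: String,
    pub op: String,
    pub inputs: Vec<String>,
    pub outputs: Vec<String>,
}

/// Insert one RX gate behind `recv_name` and advance the chain head.
pub fn insert_rx_gate(
    nodes: &mut Vec<Node>,
    recv_name: &str,
    head: &mut String,
    gate_op: &str,
    tag: &str,
) {
    let new_head = format!("{recv_name}#{tag}_rx_out");

    // Rewire every consumer of the old head (except the Recv itself).
    for node in nodes.iter_mut() {
        if node.name == recv_name {
            continue;
        }
        for input in node.inputs.iter_mut() {
            if input.as_str() == head.as_str() {
                *input = new_head.clone();
            }
        }
    }

    // Add the gate after rewiring so its own input keeps the old head.
    nodes.push(Node {
        name: format!("{recv_name}#{tag}_rx"),
        op: gate_op.to_string(),
        inputs: vec![head.clone()],
        outputs: vec![new_head.clone()],
    });

    *head = new_head;
}
```

Running the helper three times in pass order (dedup, peer_health, backoff) leaves the consumer reading the backoff gate's output, while each gate reads the output of the one before it.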

TX chain rewiring

TX gates do not share a metadata cursor. Each TX pass directly rewires wire.Send to consume the gate’s output by overwriting node.input[0]:

// from bytesandbrains/bb-compiler/src/insert_peer_health_gate_tx.rs:38-47
let gate_output = format!("{}#peer_health_tx_gated", node.name);
gates.push(build_gate_node(&node.name, &gated_input, &gate_output, &peer));
node.input[0] = gate_output;
set_metadata(&mut node.metadata_props, GATED_KEY, "true");

Run order is PeerHealthGateTx then BackoffGateTx. The first pass swaps the Send’s input[0] for the peer-health gate output, the second pass repeats the swap so the final chain is PeerHealthGateTx -> BackoffGateTx -> wire.Send. Each TX gate’s sole output is named trigger and carries a TriggerValue (no payload, fire-only).

Attribute schemas

Each TX gate carries an ATTR_PEER attribute stamped via bb_ir::wire_shape::stamp_peer_bytes. The schema is the canonical multihash byte form, written to attribute.s (bytes) with AttributeType::String:

// from bytesandbrains/bb-ir/src/wire_shape.rs:154-181
ATTR_PEER = "peer"
attribute.s     = PeerId.to_bytes()   // multihash, canonical
attribute.type  = AttributeType::String

RX gates carry no peer attribute. The runtime reads ctx.current.inbound.src_peer instead, so an envelope’s source identity rides on the envelope itself rather than the gate.

Both sides stamp a metadata_props entry naming the source wire op they protect, used by post-pipeline diagnostics:

DedupGateRx       "ai.bytesandbrains.dedup_rx_source"        = recv_name
PeerHealthGateRx  "ai.bytesandbrains.peer_health_rx_source"  = recv_name
BackoffGateRx     "ai.bytesandbrains.backoff_rx_source"      = recv_name
PeerHealthGateTx  "ai.bytesandbrains.peer_health_tx_source"  = send_name
BackoffGateTx     "ai.bytesandbrains.backoff_tx_source"      = send_name

Per-op runtime semantics

Once the gates are spliced, the runtime invokes each gate at the appropriate phase of Engine::poll. The dispatch surface is syscall::invoke(node, inputs, ctx) -> Result<DispatchResult, OpError>. Each gate decides Allow / Deny against a framework primitive and the engine surfaces Deny as an OpError whose detail carries a stable reason label downstream consumers can match.

DedupGateRx. Hashes the input value’s wire bytes via FNV-1a 64, records the hash in bb_runtime::framework::InboundDedup, and forwards the value on first arrival. The dedup window is a sliding-window seen-set with default capacity 8192 (oldest entry evicted on overflow). A repeat hash returns OpError with detail containing reason=duplicate.

// from bytesandbrains/bb-ops/src/syscalls/gates/dedup_rx.rs:36-54
let bytes = value.to_wire_bytes()?;
let hash = fnv1a_64(&bytes);
if ctx.net.dedup.record(hash) {
    return Err(OpError {
        detail: "DedupGateRx dropped envelope: reason=duplicate".into(),
        ..Default::default()
    });
}
Ok(DispatchResult::Immediate(vec![("value".to_string(), value.clone_boxed())]))
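
For readers who want the two building blocks spelled out, here is a self-contained sketch of 64-bit FNV-1a plus a sliding-window seen-set. The hash follows the standard FNV-1a parameters. SeenWindow is a hypothetical stand-in for InboundDedup, and leaving a duplicate's position in the window unchanged is an assumption.

```rust
use std::collections::{HashSet, VecDeque};

/// Standard 64-bit FNV-1a: xor each byte in, then multiply by the prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical sliding-window seen-set with FIFO eviction.
pub struct SeenWindow {
    capacity: usize,
    order: VecDeque<u64>,
    seen: HashSet<u64>,
}

impl SeenWindow {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window capacity must be non-zero");
        Self {
            capacity,
            order: VecDeque::with_capacity(capacity),
            seen: HashSet::with_capacity(capacity),
        }
    }

    /// Returns true when `hash` is already in the window (a duplicate).
    /// Otherwise records it, evicting the oldest entry if the window is full.
    pub fn record(&mut self, hash: u64) -> bool {
        if self.seen.contains(&hash) {
            return true;
        }
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.order.push_back(hash);
        self.seen.insert(hash);
        false
    }
}
```

A gate built on these pieces would drop an envelope when record(fnv1a_64(&bytes)) returns true and forward it otherwise, matching the control flow in the excerpt above.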

PeerHealthGateRx. Calls PeerGovernor::check_inbound(src_peer). The governor enforces a blocklist (explicit deny) and an optional allowlist (deny everyone not in it). Allow forwards the input; Deny returns OpError whose detail includes the stable reason label.

PeerHealthGateTx. Reads the destination peer from the gate’s own ATTR_PEER attribute and calls PeerGovernor::check_outbound(peer, &backoff, now_ns). The outbound check layers a third Deny variant on top of inbound: a peer whose BackoffTable::should_retry returns false is denied with BlockReason::Cooldown { retry_ns }. Allow emits a TriggerValue; Deny short-circuits the downstream wire.Send with a stable OpError.

Reason labels are stable strings drawn from peer_governor::BlockReason:

// from bytesandbrains/bb-ops/src/syscalls/gates/peer_health_tx.rs:67-74
pub fn reason_label(reason: &BlockReason) -> &'static str {
    match reason {
        BlockReason::Blocklisted => "blocklisted",
        BlockReason::NotAllowlisted => "not_allowlisted",
        BlockReason::Cooldown { .. } => "cooldown",
    }
}
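
The inbound and outbound decisions can be sketched as two small functions. This is an illustration under stated assumptions: Governor and the u64 PeerId alias are hypothetical, the order of the checks is assumed, and check_outbound takes the peer's next retry time directly instead of a BackoffTable reference.

```rust
use std::collections::HashSet;

/// Deny reasons, mirroring the three labels listed above.
#[derive(Debug, PartialEq, Eq)]
pub enum BlockReason {
    Blocklisted,
    NotAllowlisted,
    Cooldown { retry_ns: u64 },
}

/// Hypothetical peer identifier.
pub type PeerId = u64;

/// Hypothetical miniature of the governor's list state.
#[derive(Default)]
pub struct Governor {
    pub blocklist: HashSet<PeerId>,
    /// None means no allowlist is configured.
    pub allowlist: Option<HashSet<PeerId>>,
}

impl Governor {
    /// Inbound: explicit deny first, then the optional allowlist.
    pub fn check_inbound(&self, peer: PeerId) -> Result<(), BlockReason> {
        if self.blocklist.contains(&peer) {
            return Err(BlockReason::Blocklisted);
        }
        if let Some(allow) = &self.allowlist {
            if !allow.contains(&peer) {
                return Err(BlockReason::NotAllowlisted);
            }
        }
        Ok(())
    }

    /// Outbound: the inbound checks plus a cooldown check.
    /// `next_retry_ns` is None for a peer with no recorded failures.
    pub fn check_outbound(
        &self,
        peer: PeerId,
        next_retry_ns: Option<u64>,
        now_ns: u64,
    ) -> Result<(), BlockReason> {
        self.check_inbound(peer)?;
        match next_retry_ns {
            Some(retry_ns) if now_ns < retry_ns => Err(BlockReason::Cooldown { retry_ns }),
            _ => Ok(()),
        }
    }
}
```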

BackoffGateRx / BackoffGateTx. Both consult BackoffTable::should_retry(peer, now_ns). The RX side uses the inbound src_peer, the TX side uses the gate’s own ATTR_PEER. A peer with no recorded failures retries immediately. A peer with attempts >= 1 retries once now_ns >= state.next_retry_ns. record_failure increments attempts and schedules next_retry_ns = now_ns + delay_for(attempts). The schedule is exponential with a base and cap:

// from bytesandbrains/bb-runtime/src/framework/backoff_table.rs:23, 27, 163-169
DEFAULT_BASE_NS      = 10_000_000           // 10 ms
DEFAULT_MAX_DELAY_NS = 60_000_000_000       // 60 s

delay_for(attempts) = min(BASE_NS * 2^(attempts - 1), MAX_DELAY_NS)

record_success clears the per-peer state, so the next failure restarts from attempts = 1 (10 ms delay). The TX side emits TriggerValue on retry-eligible, the RX side forwards the value slot; either side denies with reason=cooldown.
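
The schedule is easy to check numerically. The sketch below implements the delay_for formula with the two default constants quoted above. Returning 0 for attempts = 0 and the overflow handling are assumptions of this sketch.

```rust
pub const DEFAULT_BASE_NS: u64 = 10_000_000; // 10 ms
pub const DEFAULT_MAX_DELAY_NS: u64 = 60_000_000_000; // 60 s

/// min(BASE_NS * 2^(attempts - 1), MAX_DELAY_NS)
pub fn delay_for(attempts: u32) -> u64 {
    // Assumption: no recorded failures means no delay.
    if attempts == 0 {
        return 0;
    }
    // 2^(attempts - 1) no longer fits in u64 once the shift reaches 64;
    // treat that as "larger than any cap".
    let factor = 1u64.checked_shl(attempts - 1).unwrap_or(u64::MAX);
    DEFAULT_BASE_NS
        .saturating_mul(factor)
        .min(DEFAULT_MAX_DELAY_NS)
}
```

With the defaults, the delay doubles from 10 ms and first hits the 60 s cap at the 14th consecutive failure, since 10 ms × 2^13 is 81.92 s.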

State-machine summary

The framework primitives the gates consult are stateful. The relevant transitions are:

BackoffTable per-peer state (bb-runtime/src/framework/backoff_table.rs:71-101):
  Untracked ---record_failure--> { attempts: 1, next_retry: now + 10ms }
            ---record_success--> Untracked          // remains untracked
  Tracked   ---record_failure--> { attempts: n+1, next_retry: now + delay_for(n+1) }
            ---record_success--> Untracked          // clears state
            ---should_retry(t)-> true  iff  t >= next_retry_ns

PeerGovernor per-peer health (bb-runtime/src/framework/peer_governor.rs:202-239):
  { consecutive_failures: 0, down: false }
      ---record_failure---> consecutive_failures += 1
                             down = (consecutive_failures >= threshold)
      ---record_success---> { consecutive_failures: 0, down: false }
  Default threshold = 5 (DEFAULT_FAILURE_THRESHOLD).
  Lifecycle transitions WentDown / CameUp emit as EngineStep::PeerDown / PeerUp.

InboundDedup (bb-runtime/src/framework/inbound_dedup.rs:46-65):
  Sliding-window seen-set, default capacity 8192.
  record(hash) returns true if hash already in window (duplicate).
  Oldest entry evicted on insertion when window is full.
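
The PeerGovernor health transitions above can be sketched as a small state machine. PeerHealth and Transition are hypothetical types. The sketch reports WentDown only on the failure that crosses the threshold and CameUp only on a success that follows a down state, which is an assumption about when the lifecycle events fire.

```rust
pub const DEFAULT_FAILURE_THRESHOLD: u32 = 5;

/// Lifecycle edge produced by a state change.
#[derive(Debug, PartialEq, Eq)]
pub enum Transition {
    WentDown,
    CameUp,
}

/// Hypothetical per-peer health record.
#[derive(Debug, Default)]
pub struct PeerHealth {
    consecutive_failures: u32,
    down: bool,
}

impl PeerHealth {
    /// Count one failure; report WentDown when the threshold is first reached.
    pub fn record_failure(&mut self, threshold: u32) -> Option<Transition> {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        let now_down = self.consecutive_failures >= threshold;
        let transition = (now_down && !self.down).then_some(Transition::WentDown);
        self.down = now_down;
        transition
    }

    /// Reset to healthy; report CameUp only if the peer had been down.
    pub fn record_success(&mut self) -> Option<Transition> {
        let transition = self.down.then_some(Transition::CameUp);
        *self = Self::default();
        transition
    }

    pub fn is_down(&self) -> bool {
        self.down
    }
}
```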

Validation pairing

validate_runtime_complete runs last in the canonical pipeline. It enforces presence pairing: any partition that ships a peer-routed wire.Send must also ship a PeerHealthGateTx and BackoffGateTx; any partition with a peer-routed wire.Recv must ship DedupGateRx, PeerHealthGateRx, and BackoffGateRx. Missing pieces surface as CompileError::Internal naming the partition:

// from bytesandbrains/bb-compiler/src/validate_runtime_complete.rs:50-67
if has_peer_send {
    if !has_op(PEER_HEALTH_GATE_TX_OP, SYSCALL_DOMAIN) {
        return Err(CompileError::Internal {
            detail: format!(
                "validate_runtime_complete: partition `{}` has a peer-routed wire.Send but no PeerHealthGateTx",
                sub_graph.name,
            ),
        });
    }
    if !has_op(BACKOFF_GATE_TX_OP, SYSCALL_DOMAIN) {
        return Err(CompileError::Internal { /* ... */ });
    }
}

The validator then walks every GateContract registered through inventory::submit! and calls assert_inserted on the sub-graph. That structure is the extension point: adding a new gate is “ship the inserting pass + register its contract” rather than “edit validate_runtime_complete”. The contract trait lives at bb-compiler/src/gate_contract.rs:29-40 and the registration carrier at bb-compiler/src/gate_contract.rs:45-50.

Error variants

Every pass surfaces failures through one of two enums in bb_compiler::error. ValidationError is exclusive to the validate pass and carries one variant per rule. CompileError is the wrapper the runner returns; it wraps ValidationError via From and adds the later-pass variants.

// from bytesandbrains/bb-compiler/src/error.rs:17-76
pub enum ValidationError {
    UnknownOp { node_name: String, op_type: String, domain: String },
    DanglingInput { node_name: String, input_name: String },
    DuplicateOutput { value_name: String, node_a: String, node_b: String },
    MissingTypeInfo { input_name: String },
    MalformedSlotMetadata { node_name: String, detail: String },
    CyclicGraph { involves: Vec<String> },
    OpsetNotImported { domain: String, version_used: i64 },
}

UnknownOp fires when a NodeProto’s (domain, op_type) is not in the reserved opsets (ai.bytesandbrains.*, ai.onnx) and is not declared via the inventory registry. DanglingInput fires when a NodeProto input value name has no producer in the graph. DuplicateOutput fires when two NodeProtos write the same output value name. MissingTypeInfo fires when a GraphProto.input lacks a ValueInfoProto.type. MalformedSlotMetadata fires when a role-domain NodeProto carries neither (concrete_type, instance) nor (required_trait, slot_id) metadata. CyclicGraph fires when topological sort finds a cycle. OpsetNotImported fires when an op uses an opset that is not in ModelProto.opset_import.

The CompileError enum carries the post-validate variants. The following is a partial listing covering the variants you are most likely to see at the user surface:

// from bytesandbrains/bb-compiler/src/error.rs:111-443
pub enum CompileError {
    Validation(ValidationError),
    ExpansionFailed { op_type: String, domain: String, reason: String },
    RoleMethodFailed { slot: String, op_type: String, source: String },
    AmbiguousRole {
        role: String,
        concrete_type: String,
        generic_slot_id: u32,
    },
    UnresolvedPeerClass { node_name: String, peer_input: String },
    CrossClassDataflow { node_name: String, home_a: String, home_b: String },
    IrVersionMismatch { expected: u32, got: u32 },
    MissingBinding { slot: String, site: String },
    EmptyFunctionTable,
    RuntimeIncomplete { missing: String },
    Internal { detail: String },
    TypeConstraintFailed { op: String, detail: String },
    UnresolvedType { value: String },
    UnboundDependency {
        component: String,
        bound_at_slot: String,
        required_role: String,
        required_slot: String,
    },
    // ... see bb-compiler/src/error.rs for the full enum ...
}

UnresolvedType is the most common author-facing failure. The strict type solver narrows every value to a concrete leaf in the type tree. A value that cannot resolve gets reported with its value name so the author can trace it back to the unconstrained port. Switching to with_permissive_types() falls back to the relaxed solve() path that lets unresolved values stay at TYPE_ANY.

UnboundSlot and UnboundDependency cover the binding gap cases. The first variant fires when the recorded Module body uses a placeholder of role R but no .bind_<role>::<T>("…") call supplied a concrete for R. The second fires when a bound concrete declares #[depends(role = "slot")] for a slot the bind chain did not include.

IncompatibleStorageOnEdge is the wire-edge type check. After synthesize_wire_recvs runs, the runner re-runs the type solver and walks every synthesized Recv. If the send-side value name and the recv-side value name resolve to different concrete storage types, the edge needs a Codec bridge. The error message names both type ids and suggests the codec to insert.
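The edge walk described above can be sketched as a comparison of resolved types on both ends of each wire edge. This is an illustration under assumptions: SketchWireEdge, SketchIncompatibleStorage, and check_wire_edges are hypothetical names, types are modeled as plain strings, and the codec suggestion the real error carries is left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one synthesized Send/Recv pair.
pub struct SketchWireEdge {
    pub send_value: String,
    pub recv_value: String,
}

/// Hypothetical report for one mismatching edge.
#[derive(Debug, PartialEq)]
pub struct SketchIncompatibleStorage {
    pub send_value: String,
    pub recv_value: String,
    pub send_type: String,
    pub recv_type: String,
}

/// Walks every edge and reports those whose two ends resolved to
/// different storage types. `resolved` maps value name -> type id.
pub fn check_wire_edges(
    resolved: &HashMap<String, String>,
    edges: &[SketchWireEdge],
) -> Vec<SketchIncompatibleStorage> {
    let mut mismatches = Vec::new();
    for edge in edges {
        // An end with no resolved type is a different failure (an
        // unresolved value), so this sketch skips it.
        let (Some(send_type), Some(recv_type)) = (
            resolved.get(&edge.send_value),
            resolved.get(&edge.recv_value),
        ) else {
            continue;
        };
        if send_type != recv_type {
            mismatches.push(SketchIncompatibleStorage {
                send_value: edge.send_value.clone(),
                recv_value: edge.recv_value.clone(),
                send_type: send_type.clone(),
                recv_type: recv_type.clone(),
            });
        }
    }
    mismatches
}
```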

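The full enum is longer than the partial listing above, which omits variants such as UnboundSlot and IncompatibleStorageOnEdge; the comprehensive list lives at bb-compiler/src/error.rs. Every variant carries enough detail to locate the offending NodeProto or binding in the recorded model.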
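Because the variants are typed, a caller can turn each one into a targeted hint instead of printing a generic message. A minimal sketch, assuming a local mirror of a few variants from the listing: SketchCompileError and hint are hypothetical, and the hint texts are illustrative rather than the crate's own diagnostics.

```rust
/// Hypothetical local mirror of a few CompileError variants from the
/// partial listing, so the sketch compiles without the crate.
#[derive(Debug)]
pub enum SketchCompileError {
    UnresolvedType { value: String },
    MissingBinding { slot: String, site: String },
    UnboundDependency {
        component: String,
        bound_at_slot: String,
        required_role: String,
        required_slot: String,
    },
    IrVersionMismatch { expected: u32, got: u32 },
    Internal { detail: String },
}

/// Maps each variant to a short, actionable hint for the author.
pub fn hint(err: &SketchCompileError) -> String {
    match err {
        SketchCompileError::UnresolvedType { value } => {
            format!("value `{value}` never narrowed to a concrete type; constrain its port")
        }
        SketchCompileError::MissingBinding { slot, site } => {
            format!("slot `{slot}` used at `{site}` has no binding; add a bind call for it")
        }
        SketchCompileError::UnboundDependency {
            component,
            bound_at_slot,
            required_role,
            required_slot,
        } => format!(
            "`{component}` (bound at `{bound_at_slot}`) needs a `{required_role}` \
             at slot `{required_slot}`; bind that slot too"
        ),
        SketchCompileError::IrVersionMismatch { expected, got } => {
            format!("model IR version is {got}, compiler expects {expected}")
        }
        SketchCompileError::Internal { detail } => {
            format!("internal compiler error: {detail}")
        }
    }
}
```

Matching without a wildcard arm means a newly mirrored variant produces a compile error at the match site until it gets its own hint.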

A custom compiler stage

The custom_compiler_pass example wires a user-supplied CompilerStage into the bind chain. The stage walks every wire.Send NodeProto in each emitted partition and stamps a tracing id. The complete copy-paste skeleton follows, including the imports and the three const declarations the example threads through:

// from bytesandbrains/examples/custom_compiler_pass.rs:19-89
use bytesandbrains::compiler::{CompilerStage, PassError};
use bytesandbrains::proto::onnx::{ModelProto, StringStringEntryProto};

const WIRE_DOMAIN: &str = "ai.bytesandbrains.wire";
const SEND_OP: &str = "Send";
const TRACING_ID_KEY: &str = "example.tracing_id";

struct StampTracingIds {
    counter: std::sync::atomic::AtomicU32,
}

impl CompilerStage for StampTracingIds {
    fn name(&self) -> &'static str {
        "stamp_tracing_ids"
    }
    fn run(&self, model: &mut ModelProto) -> Result<(), PassError> {
        // Visit every node of every function in the model this stage was handed.
        for func in &mut model.functions {
            for node in &mut func.node {
                // Only wire.Send nodes get a tracing id.
                if node.domain == WIRE_DOMAIN && node.op_type == SEND_OP {
                    // Relaxed is enough: only uniqueness of the counter value matters.
                    let id = self
                        .counter
                        .fetch_add(1, std::sync::atomic::Ordering::Relaxed);
                    node.metadata_props.push(StringStringEntryProto {
                        key: TRACING_ID_KEY.into(),
                        value: format!("trace-{id}"),
                    });
                }
            }
        }
        Ok(())
    }
}

The stage is registered through push_back_stage, which slots it after the canonical pipeline. The compiler runs every user stage once per emitted partition; the model: &mut ModelProto the stage receives is the partition’s emitted model, not the merged top-level output.

// from bytesandbrains/examples/custom_compiler_pass.rs:106-117
let stage = StampTracingIds {
    counter: std::sync::atomic::AtomicU32::new(0),
};
let compiled = Compiler::new()
    .bind_backend::<bytesandbrains::ops::backends::cpu::CpuBackend>("compute")
    .push_back_stage(stage)
    .compile(recorded_model)?;

push_front_stage puts the user stage at the front of the user-stage list, before any previously pushed stages but still after the canonical pipeline. insert_stage(index, stage) inserts at a clamped index. without_stage(name) removes a user stage by name, or disables a canonical pass if the name matches one in CANONICAL_PASS_NAMES.
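The ordering rules for the user-stage list can be modeled with a plain vector. The sketch below is an assumption about the semantics described above, not the Compiler's implementation: SketchStageList is hypothetical, stages are reduced to their names, and disabling canonical passes through CANONICAL_PASS_NAMES is not modeled.

```rust
/// Hypothetical model of the user-stage list: names only, run in order
/// after the canonical pipeline.
#[derive(Debug, Default)]
pub struct SketchStageList {
    names: Vec<String>,
}

impl SketchStageList {
    /// Appends after every previously pushed user stage.
    pub fn push_back(&mut self, name: &str) {
        self.names.push(name.to_string());
    }

    /// Places the stage before every previously pushed user stage.
    pub fn push_front(&mut self, name: &str) {
        self.names.insert(0, name.to_string());
    }

    /// Inserts at `index`, clamped to the current length so an
    /// out-of-range index appends instead of panicking.
    pub fn insert(&mut self, index: usize, name: &str) {
        let at = index.min(self.names.len());
        self.names.insert(at, name.to_string());
    }

    /// Removes a user stage by name; returns whether one was removed.
    pub fn without(&mut self, name: &str) -> bool {
        match self.names.iter().position(|n| n == name) {
            Some(i) => {
                self.names.remove(i);
                true
            }
            None => false,
        }
    }

    /// Stage names in the order they would run.
    pub fn names(&self) -> &[String] {
        &self.names
    }
}
```

Clamping is the detail worth noticing: Vec::insert panics when the index exceeds the length, so a builder that accepts arbitrary indices has to clamp before inserting.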

Run the example with:

cargo run --example custom_compiler_pass

The expected output reports, for each emitted partition, one wire.Send NodeProto carrying an example.tracing_id metadata entry (the TRACING_ID_KEY constant in the stage above).

Pipeline output

Compiler::compile() returns one ModelProto. There is no wrapper struct, no separate compiled artifact type. The bare ModelProto is the compiled form. The output carries:

- functions[0..n]    one FunctionProto per partition + sub-Module body.
                     Each partition's main name matches a target arg
                     for bb::install.
- opset_import[]     every (domain, version) referenced anywhere.
- metadata_props[]   ai.bytesandbrains.compiled = "v1"
                     ai.bytesandbrains.binding.<target>.<slot> =
                         "<role>|<TYPE_NAME>|<slot_id>"
- graph              left empty; partition bodies live in functions[].
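The binding entries in metadata_props[] are plain strings, so a reader has to split them back into fields. A minimal sketch of that parsing, under stated assumptions: SketchBinding, parse_binding_key, and parse_binding_value are hypothetical helpers, the slot name is assumed to contain no dots, and slot_id is kept as a string because its type is not stated here.

```rust
/// Hypothetical parsed form of "<role>|<TYPE_NAME>|<slot_id>".
#[derive(Debug, PartialEq)]
pub struct SketchBinding {
    pub role: String,
    pub type_name: String,
    pub slot_id: String,
}

/// Splits "ai.bytesandbrains.binding.<target>.<slot>" into (target, slot).
/// Assumes the slot has no dots, so the last dot separates the two.
pub fn parse_binding_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("ai.bytesandbrains.binding.")?;
    rest.rsplit_once('.')
}

/// Parses the value side; returns None unless there are exactly three
/// `|`-separated fields with a non-empty role and type name.
pub fn parse_binding_value(value: &str) -> Option<SketchBinding> {
    let mut parts = value.split('|');
    let role = parts.next()?;
    let type_name = parts.next()?;
    let slot_id = parts.next()?;
    if parts.next().is_some() || role.is_empty() || type_name.is_empty() {
        return None;
    }
    Some(SketchBinding {
        role: role.to_string(),
        type_name: type_name.to_string(),
        slot_id: slot_id.to_string(),
    })
}
```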

bb::install(peer_id, addresses, compiled, &[target], Config::new()) walks compiled.functions[], resolves each entry in the targets slice (exact name first, then the <target>#<hash> content-suffix), reads the binding metadata for every resolved target, and brings up a Node. Different BB Nodes pick different targets slices from the same compiled ModelProto. The compiled form round-trips through prost serialization with no extra wrapper, so a compiled ModelProto is itself a Module whose op() replays the stored function into a parent graph.
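The two-step target lookup (exact name first, then the content-suffixed form) can be sketched as follows. resolve_target is a hypothetical helper, not bb::install's code, and returning the first suffixed match when several exist is an assumption of this sketch.

```rust
/// Resolves `target` against the function names of a compiled model:
/// an exact match wins, otherwise the first name of the form
/// "<target>#<hash>" is used.
pub fn resolve_target<'a>(function_names: &'a [String], target: &str) -> Option<&'a str> {
    // Step 1: exact name.
    if let Some(exact) = function_names.iter().find(|n| n.as_str() == target) {
        return Some(exact.as_str());
    }
    // Step 2: content-suffixed name. Including '#' in the prefix keeps
    // "main" from matching an unrelated "main_loop".
    let prefix = format!("{target}#");
    function_names
        .iter()
        .find(|n| n.starts_with(prefix.as_str()))
        .map(|n| n.as_str())
}
```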

Per-pass testability

Each pass is a pure function on IR. Tests live alongside the pass source files at bb-compiler/src/<pass>_tests.rs. Each test constructs a minimal IR fragment, calls the pass directly, and asserts on the output IR shape plus the diagnostics. Cross-pass integration tests live in bb-compiler/src/driver_tests.rs and exercise the full Compiler::new().bind_*().compile() chain against the bundled examples.
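A per-pass test in this style could look like the sketch below. It uses a toy cycle detector (Kahn's algorithm) as the pass under test; SketchNode and toposort are hypothetical stand-ins, and the real passes and their test files will differ in shape.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a NodeProto: a name plus value names.
#[derive(Debug, Clone)]
pub struct SketchNode {
    pub name: String,
    pub inputs: Vec<String>,
    pub outputs: Vec<String>,
}

/// Pure function on IR: returns node names in a valid execution order,
/// or Err with the nodes stuck on (or downstream of) a cycle.
pub fn toposort(nodes: &[SketchNode]) -> Result<Vec<String>, Vec<String>> {
    // Map each produced value name to the index of its writer.
    let mut producer: HashMap<&str, usize> = HashMap::new();
    for (i, n) in nodes.iter().enumerate() {
        for out in &n.outputs {
            producer.insert(out.as_str(), i);
        }
    }
    // Count, per node, the inputs that come from other nodes.
    let mut indegree = vec![0usize; nodes.len()];
    let mut consumers: Vec<Vec<usize>> = vec![Vec::new(); nodes.len()];
    for (i, n) in nodes.iter().enumerate() {
        for input in &n.inputs {
            if let Some(&p) = producer.get(input.as_str()) {
                consumers[p].push(i);
                indegree[i] += 1;
            }
        }
    }
    // Kahn's algorithm: repeatedly emit nodes with no pending inputs.
    let mut ready: VecDeque<usize> = (0..nodes.len()).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(nodes.len());
    while let Some(i) = ready.pop_front() {
        order.push(nodes[i].name.clone());
        for &c in &consumers[i] {
            indegree[c] -= 1;
            if indegree[c] == 0 {
                ready.push_back(c);
            }
        }
    }
    if order.len() == nodes.len() {
        Ok(order)
    } else {
        let stuck = (0..nodes.len()).filter(|&i| indegree[i] > 0);
        Err(stuck.map(|i| nodes[i].name.clone()).collect())
    }
}

#[cfg(test)]
mod toposort_tests {
    use super::*;

    fn node(name: &str, inputs: &[&str], outputs: &[&str]) -> SketchNode {
        SketchNode {
            name: name.to_string(),
            inputs: inputs.iter().map(|s| s.to_string()).collect(),
            outputs: outputs.iter().map(|s| s.to_string()).collect(),
        }
    }

    #[test]
    fn reports_cycle_members() {
        // Minimal IR fragment: two nodes feeding each other.
        let ir = [node("a", &["z"], &["y"]), node("b", &["y"], &["z"])];
        assert_eq!(toposort(&ir), Err(vec!["a".to_string(), "b".to_string()]));
    }
}
```

The test builds the smallest fragment that triggers the diagnostic and asserts on the returned value directly, which is only possible because the pass takes IR in and hands a result back without touching shared state.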

That is the compiler. One pre-pipeline pass, seventeen canonical passes, and a short post-pipeline tail (resolve_component_dependencies, validate_all_slots_bound, stamp_compilation_metadata, merge_partitions_into_one) that produces one ModelProto ready for bb::install to pick a target from. Each pass pure, well ordered, explicitly diagnosable. The next chapter walks the engine that executes the compiled artifact.

Where this lives

  • bytesandbrains/bb-compiler/src/driver.rs: Compiler, CompilerStage, PassError, the bind chain.
  • bytesandbrains/bb-compiler/src/runner.rs: CANONICAL_PASS_NAMES, the pass orchestrator.
  • bytesandbrains/bb-compiler/src/error.rs: CompileError, ValidationError, SlotSource.
  • bytesandbrains/bb-compiler/src/refine_polymorphic_value_info.rs: the pre-pipeline pass that narrows TYPE_TENSOR placeholders.
  • bytesandbrains/bb-compiler/src/partition_by_wire_ops.rs: the dissection pass, NetworkAnalysis, WireEdge.
  • bytesandbrains/bb-compiler/src/validate.rs: rule-by-rule structural sanity checks.
  • bytesandbrains/examples/custom_compiler_pass.rs: end-to-end custom stage example.
  • bytesandbrains/docs/COMPILER.md: the bb-private architecture spec.