The Compiler
Chapter 8 closed out the static surface. Storage, the type tree, the
solver pass. None of that work runs at install time. The compiler runs
it. This chapter walks the compiler entry point, the pipeline it drives
each Module::build() output through, and the error variants it
surfaces when an input is malformed or under-bound.
The whole surface is one struct, a builder chain that records bindings,
one compile method that returns a single ModelProto, and one error
enum the caller matches on. The bind chain is statically typed: each
bind_<role>::<T>(slot) method has a trait bound that says the
concrete T actually implements the role you are binding it under.
Misuse fails at compile time. Pipeline failures surface as typed
CompileError variants that point at the offending NodeProto.
The three-phase pipeline
The framework is a three-phase construction. Authors record a program.
The compiler binds concretes and runs the pipeline. The host installs
the result onto a Node.
// Phase 1: record the program shape (pure recording, no compiler work).
let recorded: ModelProto = MyModule.build()?;
// Phase 2: bind concretes and run the canonical pipeline.
let compiled: ModelProto = Compiler::new()
    .bind_backend::<CpuBackend>("compute")
    .bind_index::<HnswIndex>("primary_index")
    .compile(recorded)?;
// Phase 3: install one or more entry-point targets onto a Node.
let node = bb::install(
    peer_id,
    addresses,
    compiled,
    &[target],
    Config::new(),
)?;
Module::build() returns a ModelProto whose functions[] is the
recorded program shape: the root function plus every sub-Module body
the recorder reached, plus any bootstrap function the Module declares.
That model is not directly executable. It carries placeholder
denotations for every Contract-method port, no wire pair synthesis, no
gate insertion, no partition slicing.
Compiler::new().bind_*().compile() is the compiler entry point. It
walks the canonical pipeline, mutates the recorded model into an
engine-ready form, stamps a compilation passport plus per-target
binding metadata onto metadata_props, and returns a single
ModelProto whose functions[] carries every partition the pipeline
produced. The compiler itself is single-target-agnostic. It emits
sibling FunctionProtos for every partition that
partition_by_wire_ops produced and lets the host decide at install
time which partitions live on which peer.
bb::install walks compiled.functions[], finds the function whose
name matches each entry in the targets slice, reads the binding
metadata for every resolved target, dedupes shared slot bindings
across targets into one ComponentRef per slot, constructs the
concrete instances from the inventory registry exactly once, and
brings the Node up. Different BB Nodes pick different targets
slices from the same compiled ModelProto. A single-target Module
compiles to one partition. A multi-target Module compiles to one
partition per inferred BB-Node class, and a peer hosting more than
one of those partitions installs them all together as
install(..., &["A", "B"], ...).
Config::new() is the empty per-deployment configuration bag passed
to bb::install. Attach a typed config to one slot with
Config::new().with("compute", burn_cfg); install downcasts each
attached value to the bound concrete’s <T as ConcreteComponent>::Config
associated type, surfacing InstallError::ConfigTypeMismatch on
shape mismatch. Slots whose concrete declares type Config = () need
no entry: install supplies the unit value automatically.
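The downcast step can be pictured with a small type-erased bag. This is a minimal sketch, not the framework's `Config`: `ConfigBag`, `BagError`, and `take` are hypothetical names, and falling back to `Default` for an absent entry is an assumption that generalizes the `type Config = ()` case described above.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical stand-in for the per-deployment bag (not the real `Config`).
#[derive(Default)]
pub struct ConfigBag {
    entries: HashMap<String, Box<dyn Any>>,
}

#[derive(Debug, PartialEq)]
pub enum BagError {
    /// The attached value is not the type the bound concrete expects.
    ConfigTypeMismatch { slot: String },
}

impl ConfigBag {
    pub fn new() -> Self {
        Self::default()
    }

    /// Attach a typed value to one slot, erasing its type.
    pub fn with<C: Any>(mut self, slot: &str, cfg: C) -> Self {
        self.entries.insert(slot.to_string(), Box::new(cfg));
        self
    }

    /// Take the config for `slot` as `C`. An absent entry falls back to
    /// `C::default()` (assumption), which yields `()` for unit configs.
    pub fn take<C: Any + Default>(&mut self, slot: &str) -> Result<C, BagError> {
        match self.entries.remove(slot) {
            None => Ok(C::default()),
            Some(boxed) => boxed
                .downcast::<C>()
                .map(|b| *b)
                .map_err(|_| BagError::ConfigTypeMismatch { slot: slot.to_string() }),
        }
    }
}
```

The type check happens only when a consumer asks for a concrete `C`, which is why a mismatch can surface at install time rather than when the value is attached.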
Compiler::new() and the bind chain
Compiler::new() returns a fresh compiler ready to accept binding
declarations. The default state turns strict type-solver mode on, sets
the per-hop deadline budget to the framework’s DEFAULT_PER_HOP_BUDGET_NS
constant, sets the target IR version to FRAMEWORK_IR_VERSION, and
carries an empty bindings list.
// from bytesandbrains/bb-compiler/src/driver.rs:132-134
pub fn new() -> Self {
    Self::default()
}
The bind chain is the surface authors use to declare which concrete
component handles which role at runtime. Each method on Compiler is
generic over the concrete type. The trait bound encodes “this concrete
implements the role you are binding it under”:
// from bytesandbrains/bb-compiler/src/driver.rs:195-208
pub fn bind_backend<T>(self, slot: impl Into<String>) -> Self
where
    T: bb_runtime::concrete::ConcreteComponent + bb_runtime::roles::BackendRuntime,
{
    self.bind_concrete_with_storage::<T>(slot.into(), "BackendRuntime", &["tensor"])
}
pub fn bind_index<T>(self, slot: impl Into<String>) -> Self
where
    T: bb_runtime::concrete::ConcreteComponent + bb_runtime::roles::IndexRuntime,
{
    self.bind_concrete_with_storage::<T>(slot.into(), "IndexRuntime", &["vector"])
}
There is one bind method per role: bind_backend, bind_index,
bind_model, bind_aggregator, bind_codec, bind_data_source,
bind_peer_selector, bind_protocol. The slot string is the
author-chosen name matching the #[depends(role = "<slot>")] attribute
on sibling components, and matching the slot id stamped on each
Contract-method NodeProto in the recorded function.
Each bind call records a (slot, role_runtime, concrete_type_name)
triple and, when the concrete used #[derive(bb::<Role>)], looks up the
per-port Storage::TYPE statics from the inventory registry. Those
statics drive the refine_polymorphic_value_info pre-pipeline pass:
the placeholder TYPE_TENSOR denotations the DSL recorder stamped on
every Contract-method NodeProto get narrowed to the bound concrete’s
actual storage type before the type solver walks the graph.
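The trait-bounded bind pattern can be reproduced in isolation. The following is a minimal sketch under stated assumptions: `BackendRole`, `IndexRole`, `CpuLike`, `HnswLike`, and `SketchCompiler` are hypothetical stand-ins, and using `std::any::type_name` for the concrete type name is a guess at how the triple's third field could be filled, not a claim about the driver.

```rust
use std::any::type_name;

/// Hypothetical role traits standing in for the runtime role traits.
pub trait BackendRole {}
pub trait IndexRole {}

pub struct CpuLike;
impl BackendRole for CpuLike {}

pub struct HnswLike;
impl IndexRole for HnswLike {}

/// One recorded (slot, role_runtime, concrete_type_name) triple.
#[derive(Debug, Clone, PartialEq)]
pub struct Binding {
    pub slot: String,
    pub role_runtime: &'static str,
    pub concrete_type_name: &'static str,
}

#[derive(Default)]
pub struct SketchCompiler {
    pub bindings: Vec<Binding>,
}

impl SketchCompiler {
    /// Only types implementing `BackendRole` are accepted here.
    pub fn bind_backend<T: BackendRole>(self, slot: impl Into<String>) -> Self {
        self.record::<T>(slot.into(), "BackendRuntime")
    }

    /// Only types implementing `IndexRole` are accepted here.
    pub fn bind_index<T: IndexRole>(self, slot: impl Into<String>) -> Self {
        self.record::<T>(slot.into(), "IndexRuntime")
    }

    fn record<T>(mut self, slot: String, role_runtime: &'static str) -> Self {
        self.bindings.push(Binding {
            slot,
            role_runtime,
            concrete_type_name: type_name::<T>(),
        });
        self
    }
}
```

With these definitions, `SketchCompiler::default().bind_backend::<HnswLike>("compute")` is rejected by the Rust compiler because `HnswLike` does not implement `BackendRole`, which is the compile-time misuse check the bind chain relies on.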
The chain is fluent. Bindings declared earlier do not constrain bindings
declared later. Four configuration methods sit alongside the binds:
// from bytesandbrains/bb-compiler/src/driver.rs:139-167
pub fn with_target_version(mut self, version: u32) -> Self { ... }
pub fn with_per_hop_budget_ns(mut self, budget_ns: u64) -> Self { ... }
pub fn with_permissive_types(mut self) -> Self { ... }
pub fn without_stage(mut self, name: &str) -> Self { ... }
with_target_version overrides which FRAMEWORK_IR_VERSION the
compiler will accept on the input model; mismatch raises
CompileError::IrVersionMismatch at the top of the canonical
pipeline, before any pass mutates the recorded model.
with_per_hop_budget_ns overrides the budget the
derive_wire_deadlines pass uses when stamping static deadlines.
with_permissive_types falls back to the relaxed solve() path on the
type solver so hand-built test fixtures can leave unresolved values at
TYPE_ANY. without_stage disables a canonical pass by name for test
scenarios; the pass name list is described in the next section.
Three stage methods let user-supplied passes fire after the canonical pipeline, once per emitted partition:
// from bytesandbrains/bb-compiler/src/driver.rs:170-187
pub fn push_back_stage<S: CompilerStage + 'static>(mut self, stage: S) -> Self { ... }
pub fn push_front_stage<S: CompilerStage + 'static>(mut self, stage: S) -> Self { ... }
pub fn insert_stage<S: CompilerStage + 'static>(mut self, index: usize, stage: S) -> Self { ... }
The CompilerStage trait carries a stable name and a run method
that mutates the emitted ModelProto. The custom-pass example in
bytesandbrains/examples/custom_compiler_pass.rs walks every
wire.Send NodeProto in each emitted partition and stamps a tracing
identifier into metadata_props. The full example is reproduced
later in this chapter.
The pass pipeline
Compiler::compile() runs in three structural steps: a pre-pipeline
refinement, the canonical pipeline itself, and a post-pipeline
verification and stamping step:
// from bytesandbrains/bb-compiler/src/driver.rs:307-348
pub fn compile(self, mut model: ModelProto) -> Result<ModelProto, CompileError> {
    let mut binding_spec = BindingSpec::new();
    // ... build the spec from recorded bindings ...
    // Pre-pipeline: refine placeholder TYPE_TENSOR denotations.
    refine_polymorphic_value_info(&mut model, &binding_spec)?;
    let mut models = self.run_pipeline(model)?;
    // Post-pipeline: verify declared deps + stamp metadata.
    resolve_component_dependencies(&binding_spec, &mut models)?;
    validate_all_slots_bound(&binding_spec, &models)?;
    // Stamp the compilation passport + per-target binding table.
    // ... per-partition stamp_compilation_metadata calls ...
    merge_partitions_into_one(models)
}
The first step, refine_polymorphic_value_info, runs before
run_pipeline because it needs access to BindingSpec. The recorder
stamps a polymorphic TYPE_TENSOR placeholder on every Contract-method
port. Each bound concrete’s Storage::TYPE narrows that placeholder to
the concrete’s actual leaf in the type tree. The type solver inside
run_pipeline then walks the narrowed denotations, not the
placeholders.
The middle step, run_pipeline, drives the canonical pipeline. The
ordered pass name list lives in CANONICAL_PASS_NAMES:
// from bytesandbrains/bb-compiler/src/runner.rs:22-40
pub const CANONICAL_PASS_NAMES: &[&str] = &[
    "inline_for_partition",
    "derive_wire_deadlines",
    "validate",
    "expand_ops",
    "type_solver",
    "infer_peer_classes",
    "synthesize_wire_recvs",
    "partition_by_wire_ops",
    "resolve_slots",
    "analyze_wire_edges",
    "insert_dedup_gate_rx",
    "insert_peer_health_gate_rx",
    "insert_backoff_gate_rx",
    "insert_peer_health_gate_tx",
    "insert_backoff_gate_tx",
    "insert_async_deadlines",
    "validate_runtime_complete",
];
validate_bootstrap_composition runs at the top of
run_pipeline_with_options, between the front-half seam checks and the
per-target loop. It walks the bootstrap call graph and surfaces
BootstrapCompositionGap or BootstrapCompositionCycle if a parent
Module’s bootstrap recording calls a child target that has no matching
FunctionProto.
Touch-set computation
The engine’s per-component body-op gate (see
The Engine) needs the closure of every
ComponentRef each Module bootstrap’s body reaches. This is the
bootstrap’s touch set. The computation lives on the engine, not
in the compiler, because slot id to ComponentRef binding only
resolves at install time when concretes instantiate.
Engine::compute_touch_set(function_key)
(bytesandbrains/bb-runtime/src/engine/core.rs:1145-1196) walks the
bootstrap function body once after install populates
self.functions plus self.slot_id_to_cref. For each NodeProto in
the body:
- Direct touch. Read the NodeProto’s
  metadata_props["ai.bytesandbrains.slot_id"] stamp. If present, look up
  slot_id → ComponentRef via Engine::slot_id_to_cref and add the cref to
  the touch set.
- Transitive touch via FunctionCall. Build the callee FunctionKey from
  (domain, op_type, overload). When the callee resolves a sibling
  FunctionProto in self.functions, recurse on the callee body. A
  visited_keys: HashSet<FunctionKey> defends against cycles (Module A
  bootstrap calling Module B body via FunctionCalls).
The result stamps onto
BootstrapState::module_bootstraps[name].touch_set
(bytesandbrains/bb-runtime/src/engine/core.rs:1126-1131) before
any bootstrap fires. At gate time is_op_locked reads this
pre-stamped set in O(1) without per-call body walks. The compiler’s
contract is to leave every slot_id stamp and every FunctionCall
domain or op_type intact through validate_bootstrap_composition
and resolve_slots.
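The two touch rules and the cycle guard can be sketched as a plain recursive walk. This is an illustration only: `Node` is a hypothetical trimmed-down stand-in for NodeProto, `ComponentRef` is reduced to a `u32`, and the result is a `BTreeSet` purely to make the output order deterministic; none of this mirrors the engine's real types.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// (domain, op_type, overload), matching how the text builds the callee key.
pub type FunctionKey = (String, String, String);
/// Placeholder handle; the real ComponentRef type is not shown here.
pub type ComponentRef = u32;

/// Hypothetical node: only the fields the walk needs.
pub struct Node {
    pub domain: String,
    pub op_type: String,
    pub overload: String,
    pub slot_id: Option<String>,
}

pub fn compute_touch_set(
    root: &FunctionKey,
    functions: &HashMap<FunctionKey, Vec<Node>>,
    slot_id_to_cref: &HashMap<String, ComponentRef>,
) -> BTreeSet<ComponentRef> {
    let mut touched = BTreeSet::new();
    let mut visited_keys = HashSet::new();
    walk(root, functions, slot_id_to_cref, &mut visited_keys, &mut touched);
    touched
}

fn walk(
    key: &FunctionKey,
    functions: &HashMap<FunctionKey, Vec<Node>>,
    slot_id_to_cref: &HashMap<String, ComponentRef>,
    visited_keys: &mut HashSet<FunctionKey>,
    touched: &mut BTreeSet<ComponentRef>,
) {
    // A key already seen means a cycle or a shared callee: stop here.
    if !visited_keys.insert(key.clone()) {
        return;
    }
    let Some(body) = functions.get(key) else {
        return;
    };
    for node in body {
        // Direct touch: slot_id stamp -> ComponentRef.
        if let Some(cref) = node.slot_id.as_ref().and_then(|id| slot_id_to_cref.get(id)) {
            touched.insert(*cref);
        }
        // Transitive touch: recurse when the node names a sibling function.
        let callee = (node.domain.clone(), node.op_type.clone(), node.overload.clone());
        if functions.contains_key(&callee) {
            walk(&callee, functions, slot_id_to_cref, visited_keys, touched);
        }
    }
}
```

Because the walk runs once and the result is stored, a gate-time lookup only needs a set membership test, which is the O(1) property the text describes.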
The last step, after the pipeline, calls
resolve_component_dependencies to verify every #[depends(...)]
attribute on every bound concrete points at a slot the bind chain
declared, then validate_all_slots_bound to confirm every slot the
runtime needs has an entry in the spec. Missing slots surface as
CompileError::UnboundSlot. After that the compiler stamps the
compilation passport (ai.bytesandbrains.compiled = "v1") and per
target binding triples
(ai.bytesandbrains.binding.<target>.<slot> = "<role>|<TYPE_NAME>|<slot_id>")
onto each partition’s metadata_props, then merges the per-partition
models into one ModelProto whose functions[] carries the full set.
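The binding entry format lends itself to a small encode and decode pair. The helpers below are hypothetical (they are not compiler or install APIs) and assume that none of the three fields contains a `|` character.

```rust
/// Build the metadata key for one (target, slot) pair.
pub fn binding_key(target: &str, slot: &str) -> String {
    format!("ai.bytesandbrains.binding.{target}.{slot}")
}

/// Build the "<role>|<TYPE_NAME>|<slot_id>" value.
pub fn binding_value(role: &str, type_name: &str, slot_id: &str) -> String {
    format!("{role}|{type_name}|{slot_id}")
}

/// Split a value back into (role, type_name, slot_id).
/// Returns None unless there are exactly three `|`-separated fields.
pub fn parse_binding_value(value: &str) -> Option<(&str, &str, &str)> {
    let mut parts = value.split('|');
    let triple = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    Some(triple)
}
```

A reader on the install side would look up the key for each resolved target and slot, then parse the value to find the concrete type name to construct.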
Pass groups
The pipeline divides into five functional groups. Each group’s passes
write invariants the next group’s passes assume.
Refine types. refine_polymorphic_value_info narrows placeholder
denotations to the bound concrete’s storage. This pass runs before
run_pipeline, so it sits outside the CANONICAL_PASS_NAMES list but
inside Compiler::compile().
Surface and validate. inline_for_partition inlines three classes of
function at every CALL site: wire-touching functions (any function
whose transitive closure contains an ai.bytesandbrains.wire op),
pure-ONNX functions (transitive closure entirely under ai.onnx.*),
and single-call functions (called from exactly one site).
Multi-call sub-Modules without wire ops survive as FunctionProto
referenced by CALL nodes. derive_wire_deadlines stamps each
wire.Send’s deadline_ns attribute from
chain_depth × per_hop_budget_ns (defaulting chain_depth = 1 when
no metadata is present). validate runs structural sanity checks:
rule 1 op type known, rule 2 inputs reachable, rule 3 outputs unique,
rule 5 type declarations present, rule 6 slot metadata well formed,
rule 7 no cycles. Rules 4 (wire pairing) and 8 (opsets imported) are
deferred and currently no-op.
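The deadline arithmetic attributed to derive_wire_deadlines above is small enough to write out. This is a sketch of the stated formula only: the function name is hypothetical, and saturating on overflow is an assumption, since the text does not say how the pass treats very large products.

```rust
/// deadline_ns = chain_depth × per_hop_budget_ns, with chain_depth
/// defaulting to 1 when no metadata is present.
pub fn derive_deadline_ns(chain_depth: Option<u64>, per_hop_budget_ns: u64) -> u64 {
    chain_depth
        .unwrap_or(1)
        // Assumption: clamp instead of wrapping on overflow.
        .saturating_mul(per_hop_budget_ns)
}
```

A Send three hops deep with a 5 000 ns per-hop budget would therefore carry a 15 000 ns static deadline, and a Send with no depth metadata carries exactly one hop's budget.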
Expand, solve, infer, synthesize. expand_ops materializes
op-variant defaults. type_solver walks the constraint network; in
strict mode it narrows every value to a concrete leaf TypeNode and
surfaces unresolved slots as CompileError::UnresolvedType, in
permissive mode unresolved values stay at TYPE_ANY.
infer_peer_classes stamps HOME_CLASS_KEY on every NodeProto with
the class of Node it runs on. synthesize_wire_recvs inserts a
synthesized wire.Recv NodeProto on each consumer-side class
downstream of every user-authored wire.Send.
Partition, resolve, analyze, gate. partition_by_wire_ops slices the
recorded function at wire ops, grouping by HOME_CLASS_KEY.
resolve_slots walks each partition’s role-domain NodeProtos and
matches concrete vs. generic slot providers. analyze_wire_edges
classifies each cross-partition edge as data or trigger_only and
groups outbound sends by destination for batching. The five gate-insert
passes splice DedupGateRx, PeerHealthGateRx, BackoffGateRx,
PeerHealthGateTx, and BackoffGateTx NodeProtos adjacent to wire
ops; the next section walks each gate’s per-op semantics, state
machine, attributes, and fire/drop conditions.
insert_async_deadlines stamps deadline_ns on every
async-suspending op carrier. validate_runtime_complete runs the
final per-partition pre-flight: every peer-routed wire.Send has its
TX gate chain, every peer-routed wire.Recv has its RX gate chain,
every NodeProto carrying deadline_ns is paired with a
DeadlineCheck, and every registered GateContract asserts its
canonical insertion.
Stamp metadata. stamp_compilation_metadata is the final pass that
turns each per-partition ModelProto into a complete install
artifact. It writes the compilation passport
(ai.bytesandbrains.compiled = "v1") plus one
ai.bytesandbrains.binding.<target>.<slot> entry per BindingSlot so
install can look up the bound impl by name. It also walks every
wire.Recv NodeProto whose payload output feeds a role NodeProto’s
input and stamps RECV_SLOT_ID_KEY on the Recv node’s
metadata_props with the consumer role’s slot_id. Install reads
that stamp to populate GraphSlot::recv_site_to_slot_id so the
engine’s decode_typed_fill step can cross from data-plane
identity (NodeSiteId) to binding identity (slot_id) and route
backend-bound tensor fills through Backend::materialize_from_wire.
Recv nodes whose payload does not flow into a role NodeProto stay
unstamped and take the framework-carrier decode path.
Partition by wire ops
The dissection pass is partition_by_wire_ops. It produces one
sub-graph per BB-Node class and one WireEdge per matched send-receive
pair:
// from bytesandbrains/bb-compiler/src/partition_by_wire_ops.rs:52-85
#[derive(Debug, Default)]
pub struct NetworkAnalysis {
    pub per_role: BTreeMap<String, GraphProto>,
    pub wire_edges: Vec<WireEdge>,
}
#[derive(Debug)]
pub struct WireEdge {
    pub producer_role: String,
    pub consumer_role: String,
    pub value_name: String,
    pub send_node: NodeProto,
    pub recv_node: NodeProto,
}
Wire ops are the partition boundary. Every NodeProto under the
ai.bytesandbrains.wire domain breaks the dataflow graph. Two non-wire
ops belong to the same partition iff a dataflow path connects them
without crossing a wire op. Send-flavored ops attach to the partition
of their data-input producers. Recv-flavored ops attach to the
partition of their data-output consumers.
Partition names come from the HOME_CLASS_KEY metadata that
infer_peer_classes stamped on every NodeProto in the prior pass.
Single-Node Modules without peer-class metadata fall through to the
canonical SELF_CLASS ("@self"). Federated Modules produce one
partition per inferred class. Each partition’s emitted function is
named from the partition class plus a 16-hex-character content hash of
its NodeProto bodies, so identical content yields stable names and
changes to a Module body shift the hash.
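The naming scheme can be illustrated with a content hash rendered as 16 hex characters. Everything specific here is an assumption made for the sketch: the text does not say which hash function the compiler uses (FNV-1a 64 is swapped in for illustration), how node bodies are serialized, or which separator joins the class and the hash (`_` is used below).

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical partition-name builder: class + 16-hex content hash.
/// `node_bodies` stands in for the serialized NodeProto bodies.
pub fn partition_name(class: &str, node_bodies: &[&[u8]]) -> String {
    let mut hash = FNV_OFFSET_BASIS;
    for body in node_bodies {
        for &byte in body.iter() {
            // FNV-1a step: xor the byte in, then multiply by the prime.
            hash = (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME);
        }
    }
    // {:016x} zero-pads to exactly 16 lowercase hex characters.
    format!("{class}_{hash:016x}")
}
```

Identical bodies give the same name on every run, and any byte change in a body shifts the suffix, which is the stability property the text describes.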
Partition runs before resolve_slots because different BB Nodes can
bind different concretes for the same role. One target might bind
BurnBackend; another might bind CandleBackend. Resolving slots
globally before partitioning would mis-bind across targets.
Wire-gate insertion
Five passes near the end of the canonical pipeline splice gate
NodeProtos adjacent to every peer-routed wire op. The RX side runs
first, in order insert_dedup_gate_rx, insert_peer_health_gate_rx,
insert_backoff_gate_rx. The TX side runs after, in order
insert_peer_health_gate_tx, insert_backoff_gate_tx. Each pass
walks the per-class sub-graph, decides whether to insert based on a
fire condition the source code makes explicit, and rewires
neighbouring nodes so the gate sits in the dataflow path.
All five inserted ops live under one domain. Op-type names match the
bb_ir::syscall_ids constants:
// from bytesandbrains/bb-ir/src/syscall_ids.rs:9-80
SYSCALL_DOMAIN = "ai.bytesandbrains.syscall"
OP_DEDUP_GATE_RX = "DedupGateRx"
OP_PEER_HEALTH_GATE_RX = "PeerHealthGateRx"
OP_BACKOFF_GATE_RX = "BackoffGateRx"
OP_PEER_HEALTH_GATE_TX = "PeerHealthGateTx"
OP_BACKOFF_GATE_TX = "BackoffGateTx"
The runtime syscall implementations live under
bb-ops/src/syscalls/gates/, one file per op, each one registered via
inventory::submit! so the engine’s dispatch table picks them up.
Fire conditions
A TX gate fires (inserts a NodeProto) when the candidate wire.Send
satisfies three conditions: it sits under the
ai.bytesandbrains.wire domain with op-type Send, it carries no
prior idempotence stamp for this gate, and it carries an ATTR_PEER
attribute readable via bb_ir::wire_shape::read_peer_bytes. Sends
without a peer attribute route through the runtime address book using
dest_target metadata; they are not peer-specific, so they do not get
the per-peer TX gate chain. The check sits in each pass’s main loop, e.g.
// from bytesandbrains/bb-compiler/src/insert_backoff_gate_tx.rs:28-51
for node in sub_graph.node.iter_mut() {
    if node.domain != WIRE_DOMAIN || node.op_type != WIRE_SEND_OP {
        continue;
    }
    if metadata_value(node, GATED_KEY).is_some() {
        continue;
    }
    let Some(peer) = read_peer(node) else {
        continue;
    };
    let Some(gated_input) = node.input.first().cloned() else {
        continue;
    };
    // ... build gate, rewire input[0], stamp GATED_KEY ...
}
An RX gate fires for every ai.bytesandbrains.wire / Recv NodeProto
that has not already been stamped with the gate’s idempotence key.
Unlike TX, the RX side does not consult an attribute for peer
identity: the inbound peer rides on RuntimeResourceRef::envelope_src_peer
which the engine populates per inbound envelope. The compiler-side
trigger is the presence of a Recv; the runtime-side check uses the
live envelope.
Each pass writes a distinct idempotence stamp so a second invocation is a no-op:
DedupGateRx "ai.bytesandbrains.dedup_rx_gated"
PeerHealthGateRx "ai.bytesandbrains.peer_health_rx_gated"
BackoffGateRx "ai.bytesandbrains.backoff_rx_gated"
PeerHealthGateTx "ai.bytesandbrains.peer_health_tx_gated"
BackoffGateTx "ai.bytesandbrains.backoff_tx_gated"
RX chain rewiring
RX gates chain off a single metadata cursor, RX_CHAIN_HEAD_KEY = "ai.bytesandbrains.rx_chain_head", stamped on the wire.Recv. The
head defaults to recv.output[0]. Each successive gate reads the
current head, builds a gate node whose input is the head and whose
output is a fresh name format!("{recv_name}#<gate>_rx_out"),
rewires every other node whose input names the old head to point at
the new output, and advances the head:
// from bytesandbrains/bb-compiler/src/insert_dedup_gate_rx.rs:36-55
for recv_idx in recv_indices {
    if metadata_value(&sub_graph.node[recv_idx], GATED_KEY).is_some() {
        continue;
    }
    let recv_name = sub_graph.node[recv_idx].name.clone();
    let head = rx_chain_head(&sub_graph.node[recv_idx]);
    let new_head = format!("{recv_name}#dedup_rx_out");
    new_gates.push(build_gate_node(&recv_name, &head, &new_head));
    rewire_consumers(sub_graph, recv_idx, &head, &new_head);
    set_metadata(&mut sub_graph.node[recv_idx].metadata_props, GATED_KEY, "true");
    set_rx_chain_head(&mut sub_graph.node[recv_idx], &new_head);
}
That produces the chain Recv -> DedupGateRx -> PeerHealthGateRx -> BackoffGateRx, in run order. Every gate’s output port is named
value. Downstream consumers see exactly one input even though three
gates sit between them and the Recv.
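The head-advancing rewire can be traced on a toy graph. The sketch below is not the compiler's code: `Node` and `splice_rx_gate` are hypothetical, the chain head is threaded through as a return value instead of a metadata stamp, and the `peer_health` and `backoff` output suffixes are inferred from the `#<gate>_rx_out` pattern (only `dedup_rx_out` appears in the excerpt above).

```rust
/// Hypothetical minimal node: a name, an op type, and value names.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub op_type: String,
    pub inputs: Vec<String>,
    pub outputs: Vec<String>,
}

/// Splice one gate after the current chain head and return the new head.
pub fn splice_rx_gate(
    graph: &mut Vec<Node>,
    recv_name: &str,
    head: &str,
    gate_op: &str,
    suffix: &str,
) -> String {
    let new_head = format!("{recv_name}#{suffix}_rx_out");
    // Point every existing consumer of the old head at the gate's output.
    for node in graph.iter_mut() {
        for input in node.inputs.iter_mut() {
            if input.as_str() == head {
                *input = new_head.clone();
            }
        }
    }
    // Add the gate afterwards, so its own input keeps naming the old head.
    graph.push(Node {
        name: format!("{recv_name}#{suffix}_rx"),
        op_type: gate_op.to_string(),
        inputs: vec![head.to_string()],
        outputs: vec![new_head.clone()],
    });
    new_head
}
```

Calling it three times in run order, feeding each returned head into the next call, leaves the original consumer reading the last gate's output while each gate reads its predecessor.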
TX chain rewiring
TX gates do not share a metadata cursor. Each TX pass directly
rewires wire.Send to consume the gate’s output by overwriting
node.input[0]:
// from bytesandbrains/bb-compiler/src/insert_peer_health_gate_tx.rs:38-47
let gate_output = format!("{}#peer_health_tx_gated", node.name);
gates.push(build_gate_node(&node.name, &gated_input, &gate_output, &peer));
node.input[0] = gate_output;
set_metadata(&mut node.metadata_props, GATED_KEY, "true");
Run order is PeerHealthGateTx then BackoffGateTx. The first pass
swaps the Send’s input[0] for the peer-health gate output, the
second pass repeats the swap so the final chain is
PeerHealthGateTx -> BackoffGateTx -> wire.Send. Each TX gate’s
sole output is named trigger and carries a TriggerValue (no
payload, fire-only).
Attribute schemas
Each TX gate carries an ATTR_PEER attribute stamped via
bb_ir::wire_shape::stamp_peer_bytes. The schema is the canonical
multihash byte form, written to attribute.s (bytes) with
AttributeType::String:
// from bytesandbrains/bb-ir/src/wire_shape.rs:154-181
ATTR_PEER = "peer"
attribute.s = PeerId.to_bytes() // multihash, canonical
attribute.type = AttributeType::String
RX gates carry no peer attribute. The runtime reads
ctx.current.inbound.src_peer instead, so an envelope’s source
identity rides on the envelope itself rather than the gate.
Both sides stamp a metadata_props entry naming the source wire op
they protect, used by post-pipeline diagnostics:
DedupGateRx "ai.bytesandbrains.dedup_rx_source" = recv_name
PeerHealthGateRx "ai.bytesandbrains.peer_health_rx_source" = recv_name
BackoffGateRx "ai.bytesandbrains.backoff_rx_source" = recv_name
PeerHealthGateTx "ai.bytesandbrains.peer_health_tx_source" = send_name
BackoffGateTx "ai.bytesandbrains.backoff_tx_source" = send_name
Per-op runtime semantics
Once the gates are spliced, the runtime invokes each gate at the
appropriate phase of Engine::poll. The dispatch surface is
syscall::invoke(node, inputs, ctx) -> Result<DispatchResult, OpError>. Each gate decides Allow / Deny against a framework
primitive and the engine surfaces Deny as an OpError whose detail
carries a stable reason label downstream consumers can match.
DedupGateRx. Hashes the input value’s wire bytes via FNV-1a 64,
records the hash in bb_runtime::framework::InboundDedup, and
forwards the value on first arrival. The dedup window is a
sliding-window seen-set with default capacity 8192 (oldest entry
evicted on overflow). A repeat hash returns OpError with detail
containing reason=duplicate.
// from bytesandbrains/bb-ops/src/syscalls/gates/dedup_rx.rs:36-54
let bytes = value.to_wire_bytes()?;
let hash = fnv1a_64(&bytes);
if ctx.net.dedup.record(hash) {
    return Err(OpError {
        detail: "DedupGateRx dropped envelope: reason=duplicate".into(),
        ..Default::default()
    });
}
Ok(DispatchResult::Immediate(vec![("value".to_string(), value.clone_boxed())]))
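For readers who want to see the two moving parts in isolation, here is a self-contained sketch of an FNV-1a 64 hash and a sliding-window seen-set. `SeenWindow` is a hypothetical stand-in for InboundDedup; the VecDeque plus HashSet layout is an assumption about one reasonable way to get oldest-first eviction, not a description of the framework's internals.

```rust
use std::collections::{HashSet, VecDeque};

const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Standard FNV-1a over 64 bits: xor each byte in, then multiply.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    bytes.iter().fold(FNV_OFFSET_BASIS, |hash, &b| {
        (hash ^ u64::from(b)).wrapping_mul(FNV_PRIME)
    })
}

/// Hypothetical sliding-window seen-set with oldest-first eviction.
pub struct SeenWindow {
    capacity: usize,
    order: VecDeque<u64>,
    seen: HashSet<u64>,
}

impl SeenWindow {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity: capacity.max(1),
            order: VecDeque::new(),
            seen: HashSet::new(),
        }
    }

    /// Returns true when `hash` is already in the window (a duplicate).
    pub fn record(&mut self, hash: u64) -> bool {
        if self.seen.contains(&hash) {
            return true;
        }
        // Window full: evict the oldest entry before inserting.
        if self.order.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.order.push_back(hash);
        self.seen.insert(hash);
        false
    }
}
```

Note the consequence of a bounded window: a hash that has been evicted is treated as new again, so the gate suppresses recent duplicates rather than all duplicates.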
PeerHealthGateRx. Calls PeerGovernor::check_inbound(src_peer).
The governor enforces a blocklist (explicit deny) and an optional
allowlist (deny everyone not in it). Allow forwards the input;
Deny returns OpError whose detail includes the stable reason
label.
PeerHealthGateTx. Reads the destination peer from the gate’s own
ATTR_PEER attribute and calls PeerGovernor::check_outbound(peer, &backoff, now_ns). The outbound check layers a third Deny variant
on top of inbound: a peer whose BackoffTable::should_retry returns
false is denied with BlockReason::Cooldown { retry_ns }. Allow
emits a TriggerValue; Deny short-circuits the downstream wire.Send
with a stable OpError.
Reason labels are stable strings drawn from
peer_governor::BlockReason:
// from bytesandbrains/bb-ops/src/syscalls/gates/peer_health_tx.rs:67-74
pub fn reason_label(reason: &BlockReason) -> &'static str {
    match reason {
        BlockReason::Blocklisted => "blocklisted",
        BlockReason::NotAllowlisted => "not_allowlisted",
        BlockReason::Cooldown { .. } => "cooldown",
    }
}
BackoffGateRx / BackoffGateTx. Both consult
BackoffTable::should_retry(peer, now_ns). The RX side uses the
inbound src_peer, the TX side uses the gate’s own ATTR_PEER.
A peer with no recorded failures retries immediately. A peer with
attempts >= 1 retries once now_ns >= state.next_retry_ns.
record_failure increments attempts and schedules
next_retry_ns = now_ns + delay_for(attempts). The schedule is
exponential with a base and cap:
// from bytesandbrains/bb-runtime/src/framework/backoff_table.rs:23, 27, 163-169
DEFAULT_BASE_NS = 10_000_000 // 10 ms
DEFAULT_MAX_DELAY_NS = 60_000_000_000 // 60 s
delay_for(attempts) = min(BASE_NS * 2^(attempts - 1), MAX_DELAY_NS)
record_success clears the per-peer state, so the next failure
restarts from attempts = 1 (10 ms delay). The TX side emits
TriggerValue on retry-eligible, the RX side forwards the value
slot; either side denies with reason=cooldown.
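The schedule above can be checked with a few lines of arithmetic. The sketch below reuses the two default constants from the excerpt; `BackoffSketch`, its field layout, and the overflow handling (checked shift and multiply, falling back to the cap) are assumptions for illustration rather than the BackoffTable implementation.

```rust
use std::collections::HashMap;

pub const DEFAULT_BASE_NS: u64 = 10_000_000; // 10 ms
pub const DEFAULT_MAX_DELAY_NS: u64 = 60_000_000_000; // 60 s

/// min(BASE_NS * 2^(attempts - 1), MAX_DELAY_NS), overflow-safe.
pub fn delay_for(attempts: u32) -> u64 {
    let exp = attempts.saturating_sub(1);
    1u64.checked_shl(exp)
        .and_then(|factor| DEFAULT_BASE_NS.checked_mul(factor))
        .map_or(DEFAULT_MAX_DELAY_NS, |d| d.min(DEFAULT_MAX_DELAY_NS))
}

/// Hypothetical per-peer table: peer -> (attempts, next_retry_ns).
#[derive(Default)]
pub struct BackoffSketch {
    peers: HashMap<String, (u32, u64)>,
}

impl BackoffSketch {
    pub fn record_failure(&mut self, peer: &str, now_ns: u64) {
        let entry = self.peers.entry(peer.to_string()).or_insert((0, 0));
        entry.0 = entry.0.saturating_add(1);
        entry.1 = now_ns.saturating_add(delay_for(entry.0));
    }

    /// Clears the per-peer state, so the next failure starts at attempts = 1.
    pub fn record_success(&mut self, peer: &str) {
        self.peers.remove(peer);
    }

    /// Untracked peers retry immediately; tracked peers wait for next_retry_ns.
    pub fn should_retry(&self, peer: &str, now_ns: u64) -> bool {
        self.peers
            .get(peer)
            .map_or(true, |&(_, next_retry_ns)| now_ns >= next_retry_ns)
    }
}
```

With these defaults the delay doubles from 10 ms and reaches the 60 s cap at the fourteenth consecutive failure (2^13 × 10 ms is about 81.9 s, which clamps to 60 s).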
State-machine summary
The framework primitives the gates consult are stateful. The relevant transitions are:
BackoffTable per-peer state (bb-runtime/src/framework/backoff_table.rs:71-101):
  Untracked ---record_failure--> { attempts: 1, next_retry: now + 10ms }
            ---record_success--> Untracked // remains untracked
  Tracked   ---record_failure--> { attempts: n+1, next_retry: now + delay_for(n+1) }
            ---record_success--> Untracked // clears state
            ---should_retry(t)-> true iff t >= next_retry_ns
PeerGovernor per-peer health (bb-runtime/src/framework/peer_governor.rs:202-239):
  { consecutive_failures: 0, down: false }
    ---record_failure---> consecutive_failures += 1
                          down = (consecutive_failures >= threshold)
    ---record_success---> { consecutive_failures: 0, down: false }
  Default threshold = 5 (DEFAULT_FAILURE_THRESHOLD).
  Lifecycle transitions WentDown / CameUp emit as EngineStep::PeerDown / PeerUp.
InboundDedup (bb-runtime/src/framework/inbound_dedup.rs:46-65):
  Sliding-window seen-set, default capacity 8192.
  record(hash) returns true if hash already in window (duplicate).
  Oldest entry evicted on insertion when window is full.
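The PeerGovernor health transitions in the summary above reduce to a counter and a flag. The following sketch is hypothetical: `PeerHealth` and `Transition` are illustrative names, and reporting WentDown and CameUp only on the edge (not on every call while down) is an assumption about how the lifecycle events are raised.

```rust
pub const DEFAULT_FAILURE_THRESHOLD: u32 = 5;

#[derive(Debug, PartialEq)]
pub enum Transition {
    WentDown,
    CameUp,
}

/// Hypothetical per-peer health record.
#[derive(Debug, Default)]
pub struct PeerHealth {
    consecutive_failures: u32,
    down: bool,
}

impl PeerHealth {
    /// Count one failure; report WentDown only when crossing the threshold.
    pub fn record_failure(&mut self, threshold: u32) -> Option<Transition> {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        let now_down = self.consecutive_failures >= threshold;
        let transition = (!self.down && now_down).then_some(Transition::WentDown);
        self.down = now_down;
        transition
    }

    /// Reset to healthy; report CameUp only if the peer was down.
    pub fn record_success(&mut self) -> Option<Transition> {
        let transition = self.down.then_some(Transition::CameUp);
        *self = Self::default();
        transition
    }
}
```

Under the default threshold, four consecutive failures leave the peer up, the fifth flips it down, and a single success restores it.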
Validation pairing
validate_runtime_complete runs last in the canonical pipeline.
It enforces presence pairing: any partition that ships a peer-routed
wire.Send must also ship a PeerHealthGateTx and BackoffGateTx;
any partition with a peer-routed wire.Recv must ship DedupGateRx,
PeerHealthGateRx, and BackoffGateRx. Missing pieces surface as
CompileError::Internal naming the partition:
// from bytesandbrains/bb-compiler/src/validate_runtime_complete.rs:50-67
if has_peer_send {
    if !has_op(PEER_HEALTH_GATE_TX_OP, SYSCALL_DOMAIN) {
        return Err(CompileError::Internal {
            detail: format!(
                "validate_runtime_complete: partition `{}` has a peer-routed wire.Send but no PeerHealthGateTx",
                sub_graph.name,
            ),
        });
    }
    if !has_op(BACKOFF_GATE_TX_OP, SYSCALL_DOMAIN) {
        return Err(CompileError::Internal { /* ... */ });
    }
}
The validator then walks every GateContract registered through
inventory::submit! and calls assert_inserted on the sub-graph.
That structure is the extension point: adding a new gate is “ship
the inserting pass + register its contract” rather than “edit
validate_runtime_complete”. The contract trait lives at
bb-compiler/src/gate_contract.rs:29-40 and the registration carrier
at bb-compiler/src/gate_contract.rs:45-50.
Error variants
Every pass surfaces failures through one of two enums in
bb_compiler::error. ValidationError is exclusive to the validate
pass and carries one variant for each rule except the deferred
wire-pairing rule 4. CompileError is the wrapper the runner returns;
it wraps ValidationError via From and adds the later-pass variants.
// from bytesandbrains/bb-compiler/src/error.rs:17-76
pub enum ValidationError {
    UnknownOp { node_name: String, op_type: String, domain: String },
    DanglingInput { node_name: String, input_name: String },
    DuplicateOutput { value_name: String, node_a: String, node_b: String },
    MissingTypeInfo { input_name: String },
    MalformedSlotMetadata { node_name: String, detail: String },
    CyclicGraph { involves: Vec<String> },
    OpsetNotImported { domain: String, version_used: i64 },
}
UnknownOp fires when a NodeProto’s (domain, op_type) is not in the
reserved opsets (ai.bytesandbrains.*, ai.onnx) and is not declared
via the inventory registry. DanglingInput fires when a NodeProto
input value name has no producer in the graph. DuplicateOutput fires
when two NodeProtos write the same output value name. MissingTypeInfo
fires when a GraphProto.input lacks a ValueInfoProto.type.
MalformedSlotMetadata fires when a role-domain NodeProto carries
neither (concrete_type, instance) nor (required_trait, slot_id)
metadata. CyclicGraph fires when topological sort finds a cycle.
OpsetNotImported fires when an op uses an opset that is not in
ModelProto.opset_import.
The CompileError enum carries the post-validate variants. The
following is a partial listing covering the variants you are most
likely to see at the user surface:
// from bytesandbrains/bb-compiler/src/error.rs:111-443
pub enum CompileError {
    Validation(ValidationError),
    ExpansionFailed { op_type: String, domain: String, reason: String },
    RoleMethodFailed { slot: String, op_type: String, source: String },
    AmbiguousRole {
        role: String,
        concrete_type: String,
        generic_slot_id: u32,
    },
    UnresolvedPeerClass { node_name: String, peer_input: String },
    CrossClassDataflow { node_name: String, home_a: String, home_b: String },
    IrVersionMismatch { expected: u32, got: u32 },
    MissingBinding { slot: String, site: String },
    EmptyFunctionTable,
    RuntimeIncomplete { missing: String },
    Internal { detail: String },
    TypeConstraintFailed { op: String, detail: String },
    UnresolvedType { value: String },
    UnboundDependency {
        component: String,
        bound_at_slot: String,
        required_role: String,
        required_slot: String,
    },
    // ... see bb-compiler/src/error.rs for the full enum ...
}
UnresolvedType is the most common author-facing failure. The strict
type solver narrows every value to a concrete leaf in the type tree.
A value that cannot resolve gets reported with its value name so the
author can trace it back to the unconstrained port. Switching to
with_permissive_types() falls back to the relaxed solve() path
that lets unresolved values stay at TYPE_ANY.
UnboundSlot and UnboundDependency cover the binding gap cases. The
first variant fires when the recorded Module body uses a placeholder
of role R but no .bind_<role>::<T>("…") call supplied a concrete for
R. The second fires when a bound concrete declares #[depends(role = "slot")] for a slot the bind chain did not include.
IncompatibleStorageOnEdge is the wire-edge type check. After
synthesize_wire_recvs runs, the runner re-runs the type solver and
walks every synthesized Recv. If the send-side value name and the
recv-side value name resolve to different concrete storage types, the
edge needs a Codec bridge. The error message names both type ids and
suggests the codec to insert.
The full enum is longer; the comprehensive list lives at
bb-compiler/src/error.rs. Every variant carries enough detail to
locate the offending NodeProto or binding in the recorded model.
A custom compiler stage
The custom_compiler_pass example wires a user-supplied CompilerStage
into the bind chain. The stage walks every wire.Send NodeProto in
each emitted partition and stamps a tracing id. The complete copy-paste
skeleton follows, including the imports and the three const declarations
the example threads through:
// from bytesandbrains/examples/custom_compiler_pass.rs:19-89
use bytesandbrains::compiler::{CompilerStage, PassError};
use bytesandbrains::proto::onnx::{ModelProto, StringStringEntryProto};

const WIRE_DOMAIN: &str = "ai.bytesandbrains.wire";
const SEND_OP: &str = "Send";
const TRACING_ID_KEY: &str = "example.tracing_id";

struct StampTracingIds {
    counter: std::sync::atomic::AtomicU32,
}

impl CompilerStage for StampTracingIds {
    fn name(&self) -> &'static str {
        "stamp_tracing_ids"
    }

    fn run(&self, model: &mut ModelProto) -> Result<(), PassError> {
        for func in &mut model.functions {
            for node in &mut func.node {
                if node.domain == WIRE_DOMAIN && node.op_type == SEND_OP {
                    let id = self
                        .counter
                        .fetch_add(1, std::sync::atomic::Ordering::Relaxed);
                    node.metadata_props.push(StringStringEntryProto {
                        key: TRACING_ID_KEY.into(),
                        value: format!("trace-{id}"),
                    });
                }
            }
        }
        Ok(())
    }
}
The stage is registered through push_back_stage, which slots it
after the canonical pipeline. The compiler runs every user stage once
per emitted partition; the model: &mut ModelProto the stage receives
is the partition’s emitted model, not the merged top-level output.
// from bytesandbrains/examples/custom_compiler_pass.rs:106-117
let stage = StampTracingIds {
    counter: std::sync::atomic::AtomicU32::new(0),
};
let compiled = Compiler::new()
    .bind_backend::<bytesandbrains::ops::backends::cpu::CpuBackend>("compute")
    .push_back_stage(stage)
    .compile(recorded_model)?;
push_front_stage puts the user stage at the front of the user-stage
list, before any previously pushed stages but still after the canonical
pipeline. insert_stage(index, stage) inserts at a clamped index.
without_stage(name) removes a user stage by name, or disables a
canonical pass if the name matches one in CANONICAL_PASS_NAMES.
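The ordering rules for the four stage methods can be modelled with
plain name lists. This is a sketch of the semantics as described
above, not the Compiler's code: StageList and run_order are
hypothetical, the canonical names in the usage are placeholders, and
checking the user list before the canonical list in without_stage is
an assumption.

```rust
// Toy model: canonical passes run first, then user stages in list order.
struct StageList {
    canonical: Vec<String>,
    disabled: Vec<String>,
    user: Vec<String>,
}

impl StageList {
    fn new(canonical: &[&str]) -> Self {
        StageList {
            canonical: canonical.iter().map(|s| s.to_string()).collect(),
            disabled: Vec::new(),
            user: Vec::new(),
        }
    }

    fn push_back_stage(mut self, name: &str) -> Self {
        self.user.push(name.to_string());
        self
    }

    // Front of the user list, still after the canonical pipeline.
    fn push_front_stage(mut self, name: &str) -> Self {
        self.user.insert(0, name.to_string());
        self
    }

    // An out-of-range index is clamped to the end of the user list.
    fn insert_stage(mut self, index: usize, name: &str) -> Self {
        let clamped = index.min(self.user.len());
        self.user.insert(clamped, name.to_string());
        self
    }

    // Removes a user stage, or disables a canonical pass of that name.
    fn without_stage(mut self, name: &str) -> Self {
        if let Some(pos) = self.user.iter().position(|s| s.as_str() == name) {
            self.user.remove(pos);
        } else if self.canonical.iter().any(|c| c.as_str() == name) {
            self.disabled.push(name.to_string());
        }
        self
    }

    fn run_order(&self) -> Vec<String> {
        self.canonical
            .iter()
            .filter(|c| !self.disabled.contains(*c))
            .chain(self.user.iter())
            .cloned()
            .collect()
    }
}
```

Chaining push_back_stage("stamp_tracing_ids"), push_front_stage("first"),
and insert_stage(99, "last") on this model leaves the user list as
first, stamp_tracing_ids, last, all placed after the canonical names.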
Run the example with:
cargo run --example custom_compiler_pass
The expected output reports one wire.Send NodeProto per emitted
partition that carries an example.tracing_id metadata entry.
Pipeline output
Compiler::compile() returns one ModelProto. There is no wrapper
struct, no separate compiled artifact type. The bare ModelProto is
the compiled form. The output carries:
- functions[0..n]    one FunctionProto per partition + sub-Module body.
                     Each partition's main name matches a target arg
                     for bb::install.
- opset_import[]     every (domain, version) referenced anywhere.
- metadata_props[]   ai.bytesandbrains.compiled = "v1"
                     ai.bytesandbrains.binding.<target>.<slot> =
                         "<role>|<TYPE_NAME>|<slot_id>"
- graph              left empty; partition bodies live in functions[].
bb::install(peer_id, addresses, compiled, &[target], Config::new())
walks compiled.functions[], resolves each entry in the targets
slice (exact name first, then the <target>#<hash> content-suffix),
reads the binding metadata for every resolved target, and brings up a
Node. Different BB Nodes pick different targets slices from the
same compiled ModelProto. The compiled form round-trips through
prost serialization with no extra wrapper, so a compiled
ModelProto is itself a Module whose op() replays the stored
function into a parent graph.
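The lookup order and the binding value format can be sketched on
plain strings. The helpers below are hypothetical and are not what
bb::install calls internally. Taking the first suffixed match and
parsing slot_id as a u32 are both assumptions.

```rust
// Resolve a target: exact function name first, then "<target>#<hash>".
fn resolve_target<'a>(function_names: &'a [String], target: &str) -> Option<&'a str> {
    if let Some(name) = function_names.iter().find(|n| n.as_str() == target) {
        return Some(name.as_str());
    }
    let prefix = format!("{target}#");
    function_names
        .iter()
        .find(|n| n.starts_with(prefix.as_str()))
        .map(|n| n.as_str())
}

#[derive(Debug, PartialEq)]
struct Binding {
    role: String,
    type_name: String,
    slot_id: u32,
}

// Parse a "<role>|<TYPE_NAME>|<slot_id>" metadata value.
fn parse_binding(value: &str) -> Option<Binding> {
    let mut parts = value.split('|');
    let role = parts.next()?;
    let type_name = parts.next()?;
    let slot_id: u32 = parts.next()?.parse().ok()?;
    // Reject trailing fields rather than silently ignoring them.
    if parts.next().is_some() {
        return None;
    }
    Some(Binding {
        role: role.to_string(),
        type_name: type_name.to_string(),
        slot_id,
    })
}
```

With function names serve#ab12, train, and train#ff00, the target
train resolves to the exact entry and serve falls through to the
suffixed one.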
Per-pass testability
Each pass is a pure function on IR. Tests live alongside the pass
source files at bb-compiler/src/<pass>_tests.rs. Each test
constructs a minimal IR fragment, calls the pass directly, and asserts
on the output IR shape plus the diagnostics. Cross-pass integration
tests live in bb-compiler/src/driver_tests.rs and exercise the full
Compiler::new().bind_*().compile() chain against the bundled
examples.
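The shape of a per-pass test can be illustrated with a toy pass.
Nothing below exists in bb-compiler: NodeSketch, Diagnostic, and
check_unique_outputs are hypothetical, chosen only to show the
pattern of building a minimal IR fragment, calling the pass directly,
and asserting on its diagnostics.

```rust
use std::collections::BTreeSet;

// Minimal stand-in for an IR node: a name and the values it produces.
#[derive(Debug, Clone, PartialEq)]
struct NodeSketch {
    name: String,
    outputs: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct Diagnostic {
    node_name: String,
    message: String,
}

// A pure pass: reads the IR and returns diagnostics, no hidden state.
fn check_unique_outputs(nodes: &[NodeSketch]) -> Vec<Diagnostic> {
    let mut seen = BTreeSet::new();
    let mut diagnostics = Vec::new();
    for node in nodes {
        for out in &node.outputs {
            // insert returns false when the value name was already seen.
            if !seen.insert(out.as_str()) {
                diagnostics.push(Diagnostic {
                    node_name: node.name.clone(),
                    message: format!("value `{out}` is produced more than once"),
                });
            }
        }
    }
    diagnostics
}

#[test]
fn duplicate_output_is_reported_on_second_producer() {
    let nodes = vec![
        NodeSketch { name: "n0".to_string(), outputs: vec!["x".to_string()] },
        NodeSketch { name: "n1".to_string(), outputs: vec!["x".to_string()] },
    ];
    let diagnostics = check_unique_outputs(&nodes);
    assert_eq!(diagnostics.len(), 1);
    assert_eq!(diagnostics[0].node_name, "n1");
}
```

Because the pass takes its input by reference and returns a value, the
test needs no compiler setup, no bind chain, and no fixtures beyond
the two nodes.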
That is the compiler. One pre-pipeline pass, seventeen canonical
passes, and a short post-pipeline tail (resolve_component_dependencies,
validate_all_slots_bound, stamp_compilation_metadata,
merge_partitions_into_one) that produces one ModelProto ready for
bb::install to pick a target from. Each pass pure, well ordered,
explicitly diagnosable. The next chapter walks the engine that
executes the compiled artifact.
Where this lives
- bytesandbrains/bb-compiler/src/driver.rs: Compiler, CompilerStage, PassError, the bind chain.
- bytesandbrains/bb-compiler/src/runner.rs: CANONICAL_PASS_NAMES, the pass orchestrator.
- bytesandbrains/bb-compiler/src/error.rs: CompileError, ValidationError, SlotSource.
- bytesandbrains/bb-compiler/src/refine_polymorphic_value_info.rs: the pre-pipeline pass that narrows TYPE_TENSOR placeholders.
- bytesandbrains/bb-compiler/src/partition_by_wire_ops.rs: the dissection pass, NetworkAnalysis, WireEdge.
- bytesandbrains/bb-compiler/src/validate.rs: rule-by-rule structural sanity checks.
- bytesandbrains/examples/custom_compiler_pass.rs: end-to-end custom stage example.
- bytesandbrains/docs/COMPILER.md: the bb-private architecture spec.