The Engine
Chapter 9 closed with the compiler emitting one ModelProto ready to
install. bb::install resolves the binding metadata, constructs each
bound concrete, and hands back a Node. From there a single function
on the public surface drives the entire runtime: Node::poll.
The engine is sans-IO. It holds no socket, no thread pool, no async
runtime. The host calls poll in its own executor, the engine returns
a vector of EngineStep values, and the host ships envelopes through
its transport of choice. Two queues separate concurrency models: the
frontier is a single-threaded VecDeque walked under &mut self, and
the ingress is a lock-free MPMC queue that external threads push into.
Bootstrap is no longer a lifecycle phase. As of 0.3.0 it is a regular
function call (for Modules) or a Contract method (for Components) that
the host kicks via Node::run_bootstrap(BootstrapTarget). Async Contract dispatch is
the only special completion path, and it routes through the same
ingress every external producer uses.
The sections below walk the install handoff, the Node::poll cycle,
the IngressEvent taxonomy, op invocation, async completion via
CompletionHandle, and the introspection surface the host uses to
observe what the engine is doing.
Construction via bb::install
bb::install is the only public Node constructor for production code.
It verifies the compilation passport stamped onto the model, looks up
the binding metadata for every target named in the slice, dedupes
shared slot bindings across targets, constructs each bound concrete
via the inventory’s construct_fn exactly once, and returns a Node
whose engine is ready to dispatch.
// from bytesandbrains/src/install.rs:237-243
pub fn install(
    peer_id: PeerId,
    addresses: Vec<Address>,
    model: ModelProto,
    targets: &[&str],
    config: Config,
) -> Result<Node, InstallError> {
The targets slice is an ordered list of entry-point function names.
A single-Node demo passes &["MyModule"]; a peer hosting both
partitions of a federated round passes &["Client", "Server"]. The
install path resolves each name against model.functions[] (exact
match first, then the compiler’s <target>#<hash> content-suffix),
parses the matching binding.<target>.<slot> metadata for each
target, and groups bindings by slot name in first-seen call order.
Any slot whose contributors disagree on (TYPE_NAME, role) fails
with InstallError::SlotBindingConflict { slot, conflicts } before
any concrete is instantiated (bytesandbrains/src/install.rs:540-571).
An empty targets slice is rejected with InstallError::EmptyTargets
(bytesandbrains/src/install.rs:247-249).
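The grouping rule can be sketched in isolation. The block below is a minimal model, not the install.rs code: Binding, GroupError, and group_bindings are hypothetical names, bindings arrive already parsed, and name resolution against model.functions[] is skipped. It shows only the first-seen slot ordering, the (type name, role) agreement check, and the empty-slice rejection.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one parsed `binding.<target>.<slot>` entry.
#[derive(Debug)]
struct Binding {
    target: &'static str,
    slot: &'static str,
    type_name: &'static str,
    role: &'static str,
}

/// Hypothetical, trimmed mirror of the two install errors named above.
#[derive(Debug, PartialEq)]
enum GroupError {
    EmptyTargets,
    SlotBindingConflict {
        slot: &'static str,
        conflicts: Vec<(&'static str, &'static str)>,
    },
}

type Resolved = Vec<(&'static str, (&'static str, &'static str))>;

/// Groups bindings by slot in first-seen order and rejects any slot whose
/// contributors disagree on `(type_name, role)`.
fn group_bindings(targets: &[&str], bindings: &[Binding]) -> Result<Resolved, GroupError> {
    if targets.is_empty() {
        return Err(GroupError::EmptyTargets);
    }
    let mut order: Vec<&'static str> = Vec::new();
    let mut kinds: HashMap<&'static str, Vec<(&'static str, &'static str)>> = HashMap::new();
    for target in targets {
        for b in bindings.iter().filter(|b| b.target == *target) {
            // Record the slot the first time any target mentions it.
            let seen = kinds.entry(b.slot).or_insert_with(|| {
                order.push(b.slot);
                Vec::new()
            });
            let kind = (b.type_name, b.role);
            if !seen.contains(&kind) {
                seen.push(kind);
            }
        }
    }
    let mut resolved = Vec::new();
    for slot in order {
        let mut conflicts = kinds.remove(slot).unwrap_or_default();
        if conflicts.len() > 1 {
            // Fail before anything would be constructed.
            return Err(GroupError::SlotBindingConflict { slot, conflicts });
        }
        if let Some(kind) = conflicts.pop() {
            resolved.push((slot, kind));
        }
    }
    Ok(resolved)
}

fn main() {
    let bindings = [
        Binding { target: "Client", slot: "index", type_name: "HnswIndex", role: "Index" },
        Binding { target: "Server", slot: "index", type_name: "HnswIndex", role: "Index" },
    ];
    println!("{:?}", group_bindings(&["Client", "Server"], &bindings));
}
```

Two targets that agree on a slot collapse to one entry, which is the point at which a single shared concrete would be constructed.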
The addresses bag is shared across every entry in the slice. The
install path registers it once against the Node’s own PeerId in the
engine’s AddressBook, so every target sees the same
local_addresses() view. An empty vec skips self-registration so
the Node renders outbound identity-protocol ops as “no addresses”
failures at the protocol level rather than synthesizing a
/p2p/<PeerId> placeholder.
The Config argument is a slot-keyed map of per-concrete user configs.
Concretes whose Config associated type is () need no entry; the
inventory’s construct_fn receives a &() default. Per-slot configs
apply to the single shared ComponentRef for that slot, so every
target jointly declaring the slot sees the same configured instance.
Failures from construction, missing inventory entries, slot type
mismatches, slot-binding conflicts, and passport verification all
surface as a typed InstallError before the first poll, so the host
never sees a half-built Node.
A complete single-target install at the user surface looks like this:
// from bytesandbrains/examples/custom_index_hnsw.rs:291-301
let target = compiled.functions[0].name.clone();
let mut node = install(
    PeerId::from(1u64),
    vec![Address::empty()],
    compiled,
    &[target.as_str()],
    // HnswIndex derives `bb::Concrete` with the default
    // `type Config = ()`, so install constructs a fresh
    // instance via `HnswIndex::default()`. The locally-built
    // `index` is only kept here for the dispatch demo below.
    Config::new(),
)?;
A multi-target install (one Node hosting both halves of a federated round) passes the two compiled partition names in install order:
// from bytesandbrains/examples/single_node_federated_learning.rs:343-349
let mut node = install(
    peer,
    addrs,
    compiled,
    &[client_target.as_str(), server_target.as_str()],
    Config::new(),
)?;
The Node returned holds a fully resolved engine: every NodeProto in
every target function has its OpDispatch stamped, every slot is
bound to a single shared ComponentRef, and every framework syscall
is registered. The ingress queue is empty. Any per-target Module
bootstrap functions are recorded onto BootstrapState::install_order
in slice order, and every Component’s bb::Bootstrap dispatcher is
recorded onto BootstrapState::component_bootstraps. Install does
not arm the queue: the host kicks bootstrap via
Node::run_bootstrap(BootstrapTarget) once it has staged any input
formals it intends to supply.
NodeConfig and the production caps
NodeConfig carries the construction-time knobs that bound the
engine’s per-cycle work. Every field carries a production-conservative
default. The minimal construction reads:
// from bytesandbrains/bb-runtime/src/node/config.rs:201-219
pub fn new(peer_id: PeerId) -> Self {
    Self {
        peer_id,
        cycle_op_budget: DEFAULT_CYCLE_OP_BUDGET,
        max_pending_async: DEFAULT_MAX_PENDING_ASYNC,
        max_outbound_queue: DEFAULT_MAX_OUTBOUND_QUEUE,
        bus_capacity: DEFAULT_BUS_CAPACITY,
        per_hop_budget_ns: DEFAULT_PER_HOP_BUDGET_NS,
        envelope_caps: EnvelopeCaps::default(),
        backpressure_high_water_pct: DEFAULT_HIGH_WATER_PCT,
        backpressure_k_before_silent: DEFAULT_K_BEFORE_SILENT,
        backpressure_min_notice_interval_ns: DEFAULT_MIN_NOTICE_INTERVAL_NS,
        ingress_byte_budget: DEFAULT_INGRESS_BYTE_BUDGET,
        max_app_event_bytes: DEFAULT_MAX_APP_EVENT_BYTES,
        max_invoke_inputs: DEFAULT_MAX_INVOKE_INPUTS,
        max_invoke_bytes: DEFAULT_MAX_INVOKE_BYTES,
        max_completion_result_bytes: DEFAULT_MAX_COMPLETION_RESULT_BYTES,
    }
}
The four caps that shape per-poll behavior:
// from bytesandbrains/bb-runtime/src/node/config.rs:19-34
pub const DEFAULT_CYCLE_OP_BUDGET: Option<usize> = Some(1000);
pub const DEFAULT_MAX_PENDING_ASYNC: Option<usize> = Some(10_000);
pub const DEFAULT_MAX_OUTBOUND_QUEUE: Option<usize> = Some(10_000);
pub const DEFAULT_BUS_CAPACITY: usize = 1024;
cycle_op_budget is the soft per-poll fairness cap. After 1000
op-invocations in a single cycle the engine yields and emits
EngineStep::CycleBudgetExceeded { ops_invoked }; the host re-polls
to drain the rest. max_pending_async caps the number of in-flight
DispatchResult::Async commands; further async dispatches fail
synchronously when the cap is hit. max_outbound_queue is the cap
OutboundQueue::with_cap is constructed with. The push policy is
strict head-eviction FIFO: while queue.len() is at or above the cap,
the queue pops from the front and increments
dropped_since_last_drain; only then does it push the new envelope at
the back:
// from bytesandbrains/bb-runtime/src/framework/outbound_queue.rs:49-60
pub fn push(&mut self, mut env: WireEnvelope) {
    if env.schema_version == 0 {
        env.schema_version = ENVELOPE_SCHEMA_VERSION;
    }
    if let Some(cap) = self.cap {
        while self.queue.len() >= cap {
            self.queue.pop_front();
            self.dropped_since_last_drain += 1;
        }
    }
    self.queue.push_back(env);
}
The newest envelope always lands in the queue; the oldest is the one
sacrificed. Phase 8 calls take_dropped_count once per cycle to read
and reset the counter and emits a single
EngineStep::OutboundDropped { count } if any drops occurred. The
shape matters at the boundary: a transport adapter that briefly stalls
sees the queue grow, then sheds load by dropping its oldest pending
envelopes rather than refusing new ones, so the freshest application
state has the best shot at the wire. bus_capacity bounds the typed
bus; overflow drops the oldest event and bumps a counter the engine
publishes as InfraEvent::BusOverflow on the next cycle.
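A small generic model makes the eviction arithmetic concrete. This is a sketch under stated assumptions rather than the OutboundQueue type: HeadEvictingQueue is a hypothetical name, the payload is any T instead of a WireEnvelope, and the guard against a cap of zero is an addition of the sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical generic model of the head-eviction push policy.
struct HeadEvictingQueue<T> {
    queue: VecDeque<T>,
    cap: Option<usize>,
    dropped_since_last_drain: u64,
}

impl<T> HeadEvictingQueue<T> {
    fn with_cap(cap: Option<usize>) -> Self {
        Self { queue: VecDeque::new(), cap, dropped_since_last_drain: 0 }
    }

    fn push(&mut self, item: T) {
        if let Some(cap) = self.cap {
            // Evict from the head until there is room for the new item.
            // The `is_some` check is an addition of this sketch so that a
            // cap of zero cannot spin on an empty queue.
            while self.queue.len() >= cap && self.queue.pop_front().is_some() {
                self.dropped_since_last_drain += 1;
            }
        }
        self.queue.push_back(item);
    }

    /// Reads and resets the drop counter, as a once-per-cycle reader would.
    fn take_dropped_count(&mut self) -> u64 {
        std::mem::take(&mut self.dropped_since_last_drain)
    }
}

fn main() {
    let mut q = HeadEvictingQueue::with_cap(Some(2));
    for seq in 1..=3 {
        q.push(seq);
    }
    let dropped = q.take_dropped_count();
    println!("kept {:?}, dropped {}", q.queue, dropped);
}
```

With a cap of 2, pushing 1, 2, 3 keeps 2 and 3 and reports one drop; a second read of the counter returns zero until another eviction happens.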
with_cycle_op_budget, with_max_pending_async,
with_max_outbound_queue, with_bus_capacity, and
with_per_hop_budget_ns override individual caps. The matching
without_* methods disable a cap entirely.
with_ingress_byte_budget, with_max_app_event_bytes,
with_max_invoke_inputs, with_max_invoke_bytes, and
with_max_completion_result_bytes
(bb-runtime/src/node/config.rs:326-353) override the boundary
allocation caps documented under
Ingress boundaries and allocation safety.
NodeConfig::new_edge(peer) is a convenience factory that returns
a config with the tighter edge-device presets (8 MiB ingress budget,
64 KiB per-event cap, etc.) applied in one call. Node::with_config
accepts a replacement config post-install and re-applies the caps to
the engine immediately. The typical flow is:
let node = bb::install(...)?.with_config(NodeConfig::new(peer).with_cycle_op_budget(500));
The poll cycle
Node::poll is the single drive method. The host’s executor passes a
Context<'_>; the engine returns Poll::Pending when there is no work
(stashing the waker so the next ingress push or timer maturity wakes
the executor) and Poll::Ready(steps) when it has work to report.
// from bytesandbrains/bb-runtime/src/node/mod.rs:935-950
pub fn poll(
    &mut self,
    cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Vec<crate::engine::EngineStep>> {
    let steps = self.engine.poll();
    if let Some(tap) = self.telemetry_tap.as_mut() {
        for step in &steps {
            tap(step);
        }
    }
    if steps.is_empty() {
        self.engine.ingress.register_waker(cx.waker());
        return std::task::Poll::Pending;
    }
    std::task::Poll::Ready(steps)
}
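The host side of this contract can be sketched with the standard library alone. The block below assumes nothing about the real Node: MockNode, CountingWaker, and drive_until_pending are hypothetical names, and steps are plain strings. It shows a host polling until Pending, the waker being stashed, and a later push waking the executor.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `Node`: each queued entry models one cycle's steps.
struct MockNode {
    cycles: VecDeque<Vec<&'static str>>,
    parked: Option<Waker>,
}

impl MockNode {
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<Vec<&'static str>> {
        match self.cycles.pop_front() {
            Some(steps) if !steps.is_empty() => Poll::Ready(steps),
            _ => {
                // No work: stash the waker so a later push can wake the host.
                self.parked = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }

    /// Models an ingress push: queue work, then wake the parked executor.
    fn push_work(&mut self, steps: Vec<&'static str>) {
        self.cycles.push_back(steps);
        if let Some(waker) = self.parked.take() {
            waker.wake();
        }
    }
}

/// Counts wake-ups; a real host would reschedule its poll task here.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls until the node reports `Pending`, collecting every step seen.
fn drive_until_pending(node: &mut MockNode, waker: &Waker) -> Vec<&'static str> {
    let mut cx = Context::from_waker(waker);
    let mut seen = Vec::new();
    while let Poll::Ready(steps) = node.poll(&mut cx) {
        seen.extend(steps);
    }
    seen
}

fn main() {
    let wakes = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(wakes.clone());
    let mut node = MockNode {
        cycles: VecDeque::from(vec![vec!["OpCompleted"], vec!["SendEnvelope"]]),
        parked: None,
    };
    let steps = drive_until_pending(&mut node, &waker);
    node.push_work(vec!["OpCompleted"]);
    println!("{steps:?}, wakes = {}", wakes.0.load(Ordering::SeqCst));
}
```

Any executor that can supply a Context fits the same loop; the engine itself never spawns a task or blocks a thread.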
Inside, Engine::poll runs eight phases in a fixed order. Each phase
is a no-op when its source queue is empty. The phases are:
1. Drain ingress. Pop every IngressEvent the queue holds, dispatch
   each variant to its handler. Inbound envelopes route to
   route_envelope. Completions route to handle_completion. App events
   seed module input sites. Invokes batch-seed several inputs under a
   single ExecId. Off-thread timer matures advance the scheduler
   clock.
2. Drain the frontier. Pop each fireable (OpRef, ExecId) pair and
   call invoke_one. Each invocation produces one EngineStep. The loop
   breaks when the frontier empties or when cycle_op_budget is hit.
3. Route bus events. For each NodeEvent on the bus, look up the
   subscribed NodeSiteIds, write a TriggerValue to each one at a fresh
   ExecId, and push the site’s downstream consumers onto the frontier.
4. Poll matured timers. Walk the scheduler heap, dispatch each matured
   TimerKind to its handler (Sleep and Completion resume a parked
   CommandId; Interval reschedules itself).
5. Expire deadlines and drain pending completions. The engine scans
   pending_async for entries past their deadline_ns and fails them
   through the typed OpFailed path. Then it drains every
   PendingCompletion produced by in-cycle hooks and calls
   handle_completion for each.
6. Final frontier drain. Cascades produced by Phases 3 to 5 walk to
   quiescence (subject to the same op budget).
7. Fire lifecycle phases. Any phase the host queued via
   Engine::fire_lifecycle pushes its enrolled ops onto the frontier at
   a fresh ExecId and cascade-drains them in this same cycle.
8. Drain the outbound queue. Every queued WireEnvelope becomes one
   EngineStep::SendEnvelope. The host’s transport adapter picks an
   address from each envelope’s dest_peer_addresses list and ships the
   bytes.
Engine::poll no longer auto-seeds bootstrap. Install records every
Module bootstrap on BootstrapState::install_order and every
Component bootstrap on BootstrapState::component_bootstraps without
arming the queue. The host kicks the queue via
Node::run_bootstrap(BootstrapTarget) (covered below). Between
phases 5 and 6 the engine runs an RTT-tracker
liveness scan, and the BootstrapComplete or WaitingOnBootstrap
step lands at the end of the cycle that drained the last bootstrap op
for the in-flight target.
Host-driven bootstrap
Pre-0.3.0 designs spelled bootstrap as a lifecycle phase. The 0.3.0
engine treats bootstrap as host-driven work with two surfaces
side by side. Module bootstraps record a FunctionProto body the
engine dispatches through the standard OpDispatch::FunctionCall
path. Component bootstraps run user Rust through the bb::Bootstrap
Contract method (Roles and Contracts covers the
trait shape). Both surfaces share a single BootstrapState bundle
on the engine, and both fire only when the host kicks the queue.
BootstrapState
Engine consolidates every bootstrap field into a single struct
(bb-runtime/src/engine/bootstrap.rs:246-299):
pub(crate) struct BootstrapState {
    /// Per-target Module bootstrap metadata. Stamped at install
    /// when a `module_phase = bootstrap` FunctionProto lands.
    module_bootstraps: HashMap<String, ModuleBootstrap>,
    /// Per-slot Component bootstrap metadata. Populated by the
    /// `BootstrapDispatcherRegistration` install walk.
    component_bootstraps: HashMap<String, ComponentBootstrap>,
    /// Append-only sequence of Module bootstrap target names in
    /// install order. The seeder walks front-to-back.
    install_order: Vec<String>,
    /// Host-supplied bootstrap input staging requests parked
    /// awaiting a conflict-free slot.
    pending_requests: VecDeque<OwnedBootstrapRequest>,
    /// Currently executing bootstraps. Vec shape supports
    /// concurrent disjoint Component bootstraps.
    in_flight: Vec<InFlightBootstrap>,
    /// Validated + staged bootstraps ready to fire once `in_flight`
    /// drains (overlap promotion path).
    waiting: VecDeque<QueuedBootstrap>,
    /// Seed pointer into `install_order`. Bumps each time
    /// `maybe_complete_bootstrap` observes a phase drained.
    next_idx: usize,
    /// Coarse "queue still has work" flag the body-op gate
    /// consults to skip the bootstrap path on idle cycles.
    pending: bool,
}
BootstrapState replaces the pre-0.3.0 quartet of
bootstrap_function_keys, bootstrap_next_idx, bootstrap_pending, and
bootstrap_exec_id. Every read and write of bootstrap state goes
through BootstrapState; the engine borrows it as one field.
Module bootstrap recording
Module::bootstrap(&self, g: &mut Graph) is the Module trait’s
optional second recording method:
// from bytesandbrains/bb-dsl/src/module.rs:117-128
pub trait Module {
    fn name(&self) -> &str;
    fn body(&self, g: &mut Graph);
    fn bootstrap(&self, _g: &mut Graph) {}
}
The compiler emits the bootstrap recording as a separate
FunctionProto whose name suffixes the Module name with
__bootstrap and stamps MODULE_PHASE_BOOTSTRAP onto the
function’s MODULE_PHASE_KEY metadata. Install records each
target’s FunctionKey on BootstrapState::install_order in the
slice order the host passed targets (so multi-target installs
see deterministic bootstrap ordering across partitions). Install
does NOT seed the queue. The host kick is explicit so test
fixtures, daemons, and orchestrators can stage input formals via
the bootstrap request queue before the first body op runs.
Component bootstrap (the bb::Bootstrap Contract)
bb::Bootstrap is the Contract trait every #[derive(bb::Concrete)]
type implements. The derive emits a default no-op impl so most
concretes pay zero boilerplate; authors override when a Component
needs to allocate buffer pools, mmap state, prime a calibration
cache, or dial seed peers before its primary Contract methods
run. The trait lives at
bytesandbrains/bb-runtime/src/contracts/bootstrap.rs:
pub trait Bootstrap {
    type Error: std::error::Error + Send + Sync + 'static;
    fn bootstrap(&mut self, _ctx: &mut BootstrapCtx)
        -> Result<(), Self::Error>
    {
        Ok(())
    }
}
The host fires Component bootstraps explicitly through the slot the
bind chain bound the concrete onto via
Node::run_bootstrap(BootstrapTarget::Slots(&[slot, ...])). Disjoint
Component bootstraps fire concurrently; the touch-set gate (below)
parks only the touched components. See
Roles and Contracts for the override walkthrough.
Host kick: run_bootstrap
One host entry point drives the queue
(bb-runtime/src/node/mod.rs):
pub fn run_bootstrap(
    &mut self,
    target: BootstrapTarget<'_>,
) -> Result<Vec<EngineStep>, BootstrapError>;
BootstrapTarget is the open enum that selects which bootstraps
to fire:
pub enum BootstrapTarget<'a> {
    /// Drive every install-order Module bootstrap target on this Node.
    All,
    /// Drive specific Module bootstrap targets by name (with empty inputs).
    ModuleNames(&'a [&'a str]),
    /// Drive Module bootstrap targets with explicit inputs.
    ModuleRequests(&'a [BootstrapRequest]),
    /// Drive Component bootstraps by slot name.
    Slots(&'a [&'a str]),
}
The variant chosen routes through one of the following paths:
- BootstrapTarget::All arms the install-order queue, seeds the first
  Module target, and drives Engine::poll in a loop until every queued
  bootstrap drains (or one suspends on async). Returns the full
  Vec<EngineStep> the bootstrap path emitted. Idempotent: a Node whose
  queue already drained returns an empty vec. Snapshot restore
  deliberately clears the pending flag so a restored Node does not
  re-fire bootstrap.
- BootstrapTarget::ModuleRequests(&[BootstrapRequest]) is the batch
  entry for Module bootstraps with input formals. Each
  BootstrapRequest { target: &str, inputs: &[(&str, &[u8])] } stages
  owned-form bytes into the target’s input sites via the
  immediate-fire path. The engine validates the batch atomically up
  front (unknown target or duplicate batch entry surfaces
  BootstrapError::UnknownTarget or
  BootstrapError::AlreadyTransitivelyQueued before any staging
  happens). BootstrapTarget::ModuleNames(&[&str]) is the sugar form
  that fills empty-input requests for the no-formals case.
- BootstrapTarget::Slots(&[&str]) fires Component bootstraps by slot
  name. The engine resolves slot → ComponentRef via the
  bootstrap.component_bootstraps registry the install walk populated
  and dispatches through the per-T Bootstrap dispatcher.
BootstrapRequest and input staging
BootstrapRequest carries the target name plus the input formals
the host supplies. The shape:
pub struct BootstrapRequest<'a> {
    pub target: &'a str,
    pub inputs: &'a [(&'a str, &'a [u8])],
}
run_bootstrap(BootstrapTarget::ModuleRequests(_)) validates the
batch in one pass before any bytes stage. BootstrapError carries
the per-call rejection variants:
- UnknownTarget { target, available } rejects a name absent from
  BootstrapState::install_order.
- AlreadyTransitivelyQueued { target } rejects a re-staging of a
  target the queue already holds. Cycle defense.
- UnknownInput { target, input, declared } rejects an input name the
  target’s bootstrap recording did not declare via g.input.
- MissingInput { target, input } rejects a gap when a declared formal
  has no matching entry in inputs.
Validated requests follow the Principle 1a copy (try_charge then
try_reserve_exact then extend_from_slice,
bb-runtime/src/engine/core.rs:1534-1567) and the framework-owned
BytesValue carriers land in the bootstrap’s slot table entries at
the body’s fresh ExecId. The caller’s borrowed &[u8] slices may
drop the moment run_bootstrap returns. A bootstrap that takes no
formals records zero g.input calls; the host kicks it via
Node::run_bootstrap(BootstrapTarget::All) or
Node::run_bootstrap(BootstrapTarget::ModuleNames(&["<target>"])).
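The validate-then-stage ordering can be sketched as a pure function. The block below is illustrative only: Rejection is a trimmed, hypothetical mirror of the rejection variants (the available and declared payloads are omitted), and the declared and queued lookup tables are assumptions standing in for engine state. The check order inside one request is also an assumption.

```rust
use std::collections::{HashMap, HashSet};

struct BootstrapRequest<'a> {
    target: &'a str,
    inputs: &'a [(&'a str, &'a [u8])],
}

/// Trimmed, hypothetical mirror of the rejection variants named above.
#[derive(Debug, PartialEq)]
enum Rejection {
    UnknownTarget { target: String },
    AlreadyTransitivelyQueued { target: String },
    UnknownInput { target: String, input: String },
    MissingInput { target: String, input: String },
}

/// Example payload bytes used by `main`.
const SEED: &[u8] = &[1, 2];

/// Validates the whole batch before anything would be staged.
/// `declared` maps each known target to its declared formals; `queued`
/// holds targets the queue already contains. Both are sketch assumptions.
fn validate_batch(
    declared: &HashMap<&str, Vec<&str>>,
    queued: &HashSet<&str>,
    batch: &[BootstrapRequest<'_>],
) -> Result<(), Rejection> {
    let mut in_batch: HashSet<&str> = HashSet::new();
    for req in batch {
        let target = req.target.to_string();
        let Some(formals) = declared.get(req.target) else {
            return Err(Rejection::UnknownTarget { target });
        };
        // A target already queued, or repeated inside this batch, is rejected.
        if queued.contains(req.target) || !in_batch.insert(req.target) {
            return Err(Rejection::AlreadyTransitivelyQueued { target });
        }
        for (name, _) in req.inputs {
            if !formals.contains(name) {
                return Err(Rejection::UnknownInput { target, input: name.to_string() });
            }
        }
        for formal in formals {
            if !req.inputs.iter().any(|(name, _)| name == formal) {
                return Err(Rejection::MissingInput { target, input: formal.to_string() });
            }
        }
    }
    Ok(())
}

fn main() {
    let declared = HashMap::from([("Client__bootstrap", vec!["seed"])]);
    let queued = HashSet::new();
    let inputs: &[(&str, &[u8])] = &[("seed", SEED)];
    let batch = [BootstrapRequest { target: "Client__bootstrap", inputs }];
    println!("{:?}", validate_batch(&declared, &queued, &batch));
}
```

Because the function returns before touching any storage, a rejected batch leaves no partially staged inputs behind.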
Per-component gate: is_op_locked
While a bootstrap holds an in-flight entry, the engine gates
body-phase ops by the bootstrap’s touch set: the closure of
every ComponentRef the bootstrap body reaches. The gate runs in
Engine::is_op_locked (bb-runtime/src/engine/core.rs:1762-1806).
Resolution order:
1. With no in-flight bootstraps, the op fires (gate dormant).
2. The op’s ExecId descends from some in-flight bootstrap’s ExecId via
   the pending_calls.parent_exec_id chain. The op fires. The bootstrap
   body itself and its sub-FunctionCalls invoke ops freely.
3. Resolve the touched ComponentRef from the op’s NodeProto via
   SLOT_ID_KEY → slot_id_to_cref. If the touched cref sits in some
   in-flight bootstrap’s touch_set, the op parks on the frontier.
   Stateless syscalls (no slot_id stamp, no role binding) fire because
   they reach no component.
pop_frontier_fireable (bb-runtime/src/engine/core.rs:2096-2110)
scans the frontier for the first entry the gate accepts; parked ops
stay on the frontier until the in-flight set drops. Disjoint
components keep firing during a bootstrap.
The touch set is computed once at install time by
Engine::compute_touch_set
(bb-runtime/src/engine/core.rs:1145-1196): walk the bootstrap
function body, read every NodeProto’s slot_id stamp into the set,
and recurse into every FunctionCall callee. The result stamps onto
BootstrapState::module_bootstraps[name].touch_set before any
bootstrap fires. Gate-time is_op_locked reads the pre-stamped set
in O(1).
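The walk and the gate can be sketched together. This is a simplified model, not the core.rs code: BodyOp flattens a NodeProto into three hypothetical cases, slot ids are plain u32 values instead of resolved ComponentRefs, and ExecId ancestry is reduced to a boolean the caller supplies.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, flattened view of one node in a function body.
enum BodyOp {
    /// Op stamped with a slot id: it touches one component.
    Slot(u32),
    /// Stateless syscall: no slot stamp, touches nothing.
    Stateless,
    /// FunctionCall into the named callee.
    Call(&'static str),
}

type Functions = HashMap<&'static str, Vec<BodyOp>>;

/// Collects every slot id reachable from `root`, recursing into callees.
/// The `visited` set guards against call cycles.
fn compute_touch_set(functions: &Functions, root: &'static str) -> HashSet<u32> {
    let mut touched = HashSet::new();
    let mut visited = HashSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        if !visited.insert(name) {
            continue;
        }
        if let Some(body) = functions.get(name) {
            for op in body {
                match op {
                    BodyOp::Slot(id) => {
                        touched.insert(*id);
                    }
                    BodyOp::Stateless => {}
                    BodyOp::Call(callee) => stack.push(*callee),
                }
            }
        }
    }
    touched
}

/// Gate in the resolution order described above.
fn is_op_locked(
    op_slot: Option<u32>,
    descends_from_bootstrap: bool,
    in_flight: &[HashSet<u32>],
) -> bool {
    if in_flight.is_empty() || descends_from_bootstrap {
        return false;
    }
    match op_slot {
        None => false, // stateless: reaches no component
        Some(slot) => in_flight.iter().any(|set| set.contains(&slot)),
    }
}

fn main() {
    let mut functions = Functions::new();
    functions.insert(
        "M__bootstrap",
        vec![BodyOp::Slot(1), BodyOp::Call("helper"), BodyOp::Stateless],
    );
    functions.insert("helper", vec![BodyOp::Slot(2)]);
    let touch = compute_touch_set(&functions, "M__bootstrap");
    println!("slot 1 locked: {}", is_op_locked(Some(1), false, &[touch.clone()]));
    println!("slot 3 locked: {}", is_op_locked(Some(3), false, &[touch]));
}
```

Precomputing the set keeps the per-op gate to a membership test per in-flight bootstrap, which is why disjoint components keep firing.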
Conflict queue and concurrent in-flight bootstraps
BootstrapState::process_pending_requests
(bb-runtime/src/engine/bootstrap.rs:459-497) drains the parked
request queue once per poll. For each request the engine looks up
the target’s touch set and compares against every currently
in-flight bootstrap’s touch_set. Disjoint targets surface as
ReadyBootstraps the engine seeds immediately; overlapping ones
park as QueuedBootstraps in bootstrap.waiting.
BootstrapState::on_bootstrap_drained
(bb-runtime/src/engine/bootstrap.rs:499-522) retires the in-flight
entry by ExecId, then walks waiting once and promotes any waiter
whose touch set no longer conflicts.
Engine::maybe_complete_bootstrap
(bb-runtime/src/engine/core.rs:1824-1887) is the call site: it
runs after every drain phase, advances next_idx, and fires
promoted waiters in-cycle so the host sees every
BootstrapComplete step and the body’s first ops in a single poll
when the budget permits.
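The park-and-promote behavior reduces to a disjointness test over touch sets. The sketch below is an assumption-laden model: ConflictQueue and Bootstrap are hypothetical types, entries retire by name where the engine retires by ExecId, and slot ids are plain integers.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical bootstrap entry: a name plus its pre-computed touch set.
struct Bootstrap {
    name: &'static str,
    touch_set: HashSet<u32>,
}

#[derive(Default)]
struct ConflictQueue {
    in_flight: Vec<Bootstrap>,
    waiting: VecDeque<Bootstrap>,
}

impl ConflictQueue {
    fn conflicts(&self, touch: &HashSet<u32>) -> bool {
        self.in_flight.iter().any(|b| !b.touch_set.is_disjoint(touch))
    }

    /// Fires the bootstrap when disjoint from everything in flight,
    /// otherwise parks it. Returns true when it fired.
    fn submit(&mut self, b: Bootstrap) -> bool {
        if self.conflicts(&b.touch_set) {
            self.waiting.push_back(b);
            false
        } else {
            self.in_flight.push(b);
            true
        }
    }

    /// Retires the finished entry, then walks `waiting` once and promotes
    /// every waiter that no longer conflicts. Returns the promoted names.
    fn on_drained(&mut self, name: &str) -> Vec<&'static str> {
        self.in_flight.retain(|b| b.name != name);
        let mut promoted = Vec::new();
        for _ in 0..self.waiting.len() {
            if let Some(b) = self.waiting.pop_front() {
                if self.conflicts(&b.touch_set) {
                    self.waiting.push_back(b);
                } else {
                    promoted.push(b.name);
                    self.in_flight.push(b);
                }
            }
        }
        promoted
    }
}

fn main() {
    let set = |ids: &[u32]| ids.iter().copied().collect::<HashSet<u32>>();
    let mut q = ConflictQueue::default();
    q.submit(Bootstrap { name: "A", touch_set: set(&[1, 2]) });
    q.submit(Bootstrap { name: "B", touch_set: set(&[2, 3]) }); // parks behind A
    println!("promoted after A drains: {:?}", q.on_drained("A"));
}
```

A promoted waiter joins in_flight inside the same walk, so later waiters are checked against it as well.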
BootstrapStatus
Node::bootstrap_status() (bb-runtime/src/node/mod.rs:925-927)
returns BootstrapStatus::{Idle, Running, WaitingForInput} without
advancing any queue. Running means at least one entry occupies
bootstrap.in_flight. WaitingForInput means the install-order
queue still has unseeded targets or host-staged requests sit on
pending_requests / waiting. Idle otherwise. The host consults
this when deciding whether to keep polling or surface a prompt for
more inputs.
The two queues
The frontier and the ingress are deliberately separate data structures with separate concurrency models.
The frontier is a single-threaded VecDeque<(OpRef, ExecId)>. The
engine pops the head, invokes the op, writes its outputs, and pushes
every downstream consumer whose inputs are now all ready. No locking,
no atomics, no allocation per push after warm-up. FIFO ordering is
load-bearing: two ops that become ready in the same cycle fire in the
order they were pushed.
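The readiness rule and the FIFO guarantee can be seen on a four-op diamond. The block below is a minimal sketch with hypothetical names: Op tracks input counts directly, ops are addressed by index instead of OpRef, ExecId is omitted, and the budget parameter stands in for cycle_op_budget.

```rust
use std::collections::VecDeque;

/// Hypothetical op record: fires once all of its inputs have been written.
struct Op {
    name: &'static str,
    inputs_needed: usize,
    inputs_ready: usize,
    consumers: Vec<usize>,
}

/// Pops the head, "invokes" it, and pushes each consumer whose inputs are
/// now all ready. Stops when the frontier empties or the budget is spent.
fn run_frontier(ops: &mut [Op], seeds: &[usize], budget: usize) -> Vec<&'static str> {
    let mut frontier: VecDeque<usize> = seeds.iter().copied().collect();
    let mut fired = Vec::new();
    while fired.len() < budget {
        let Some(idx) = frontier.pop_front() else { break };
        fired.push(ops[idx].name);
        let consumers = ops[idx].consumers.clone();
        for c in consumers {
            ops[c].inputs_ready += 1;
            if ops[c].inputs_ready == ops[c].inputs_needed {
                frontier.push_back(c);
            }
        }
    }
    fired
}

/// a feeds b and c; d needs both b and c.
fn diamond() -> Vec<Op> {
    vec![
        Op { name: "a", inputs_needed: 0, inputs_ready: 0, consumers: vec![1, 2] },
        Op { name: "b", inputs_needed: 1, inputs_ready: 0, consumers: vec![3] },
        Op { name: "c", inputs_needed: 1, inputs_ready: 0, consumers: vec![3] },
        Op { name: "d", inputs_needed: 2, inputs_ready: 0, consumers: vec![] },
    ]
}

fn main() {
    let mut ops = diamond();
    println!("{:?}", run_frontier(&mut ops, &[0], 1000));
}
```

b and c become ready in the same step and fire in push order; d joins the frontier only after its second input lands.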
The ingress is a lock-free MPMC queue plus an AtomicWaker slot.
External threads push IngressEvents; the engine’s Phase 1 drains
them on the next poll. The lock-free push path costs one CAS plus an
atomic store plus one waker swap. The Node’s only Send + Sync seam
to other threads is the Arc<IngressQueue> handle. Everything else
lives behind the single-threaded Engine.
// from bytesandbrains/bb-runtime/src/ingress.rs:182-252
pub struct IngressQueue {
    queue: ConcurrentQueue<IngressEvent>,
    waker: AtomicWaker,
    dropped_overflow: AtomicU64,
    completion_result_cap: AtomicUsize,
}
impl IngressQueue {
    pub fn new() -> Self {
        Self::with_capacity(DEFAULT_INGRESS_CAPACITY)
    }
    #[allow(clippy::result_large_err)]
    pub fn push(&self, event: IngressEvent) -> Result<(), IngressEvent> {
        match self.queue.push(event) {
            Ok(()) => {
                self.waker.wake();
                Ok(())
            }
            Err(PushError::Full(ev)) => {
                self.dropped_overflow.fetch_add(1, Ordering::Relaxed);
                Err(ev)
            }
            Err(PushError::Closed(ev)) => Err(ev),
        }
    }
}
DEFAULT_INGRESS_CAPACITY is bus_capacity * 4 (4096 with the spec
default). Overflow returns the event back to the caller and bumps the
dropped_overflow counter; the transport adapter decides whether to
retry, drop with a metric, or escalate as back-pressure.
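The overflow contract (the event comes back to the caller and a counter moves) can be exercised with a standard-library stand-in. This sketch is not the IngressQueue: it swaps the lock-free MPMC queue for std::sync::mpsc::sync_channel, which is multi-producer single-consumer, and it has no waker. IngressRef, Event, and bounded_ingress are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;
use std::thread;

/// Hypothetical event; the payload is just a sequence number.
#[derive(Debug, PartialEq)]
enum Event {
    Envelope(u64),
}

/// Cheap-clone producer handle over a bounded channel (std-only stand-in).
#[derive(Clone)]
struct IngressRef {
    tx: SyncSender<Event>,
    dropped_overflow: Arc<AtomicU64>,
}

impl IngressRef {
    /// Never blocks: a full queue hands the event back to the caller.
    fn push(&self, ev: Event) -> Result<(), Event> {
        match self.tx.try_send(ev) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(ev)) => {
                self.dropped_overflow.fetch_add(1, Ordering::Relaxed);
                Err(ev)
            }
            Err(TrySendError::Disconnected(ev)) => Err(ev),
        }
    }
}

fn bounded_ingress(capacity: usize) -> (IngressRef, Receiver<Event>) {
    let (tx, rx) = sync_channel(capacity);
    (IngressRef { tx, dropped_overflow: Arc::new(AtomicU64::new(0)) }, rx)
}

fn main() {
    let (ingress, rx) = bounded_ingress(2);
    let producer = ingress.clone();
    // A transport-adapter thread pushes three events into a queue of two.
    let results = thread::spawn(move || {
        (1..=3).map(|n| producer.push(Event::Envelope(n))).collect::<Vec<_>>()
    })
    .join()
    .unwrap();
    let drained: Vec<Event> = rx.try_iter().collect();
    println!("{results:?} drained {drained:?}");
}
```

The caller that receives the rejected event still owns it, so retrying, counting, or escalating stays an adapter decision.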
drain_all() is what Phase 1 calls. register_waker is what
Node::poll calls when it returns Pending. len() is approximate
(the MPMC queue’s len is a snapshot).
Off-thread producers reach the queue through Node::ingress_handle(),
which returns an IngressQueueRef cheap-clone wrapper:
// from bytesandbrains/bb-runtime/src/node/mod.rs:682-684
pub fn ingress_handle(&self) -> crate::ingress::IngressQueueRef {
    crate::ingress::IngressQueueRef::new(self.engine.ingress_queue_handle())
}
A transport adapter on its own thread holds an IngressQueueRef,
receives bytes from the socket, decodes them into a WireEnvelope,
and pushes
IngressEvent::EnvelopeFrom { src_peer, envelope, src_observed_address }.
The push wakes the engine; the next poll cycle drains and routes.
IngressEvent variants
Every external entry point pushes one of eight variants onto the ingress:
// from bytesandbrains/bb-runtime/src/ingress.rs:47-158
pub enum IngressEvent {
    EnvelopeFrom {
        src_peer: crate::ids::PeerId,
        envelope: crate::envelope::WireEnvelope,
        src_observed_address: Option<crate::framework::Address>,
    },
    AppEvent {
        module_name: String,
        input_name: String,
        value_bytes: Vec<u8>,
    },
    TimerMatured {
        at_ns: u64,
    },
    Invoke {
        module_name: String,
        inputs: Vec<(String, Vec<u8>)>,
        exec_id: crate::ids::ExecId,
    },
    Completion {
        cmd_id: CommandId,
        results: Vec<Vec<u8>>,
    },
    CompletionFailed {
        cmd_id: CommandId,
        detail: String,
    },
    SendFailed {
        wire_req_id: u64,
        peer: Vec<u8>,
        reason: &'static str,
    },
    AppIngressError {
        source: AppIngressSource,
        byte_count: usize,
        kind: AppIngressErrorKind,
    },
}
EnvelopeFrom carries an inbound wire envelope plus the transport’s
identified source peer and an optional reflexive
src_observed_address (the dialer endpoint the adapter actually
saw). The in-process router populates the observed field via
IngressEvent::from_in_process(src_peer, envelope) at
bb-runtime/src/ingress.rs:167-177; production transports source it
from whatever NAT-traversed multiaddr the connection negotiated.
Phase 1 of the poll cycle merges both the sender-claimed
envelope.src_peer_addresses bag and the observed address into the
receiver’s AddressBook entry for src_peer
(bb-runtime/src/engine/poll.rs:497-505,1005-1062) so future replies
can dial back on any reachable interface. The engine routes each
SlotFill by the multiaddr in its dest_suffix. AppEvent and Invoke are the
host-side data entry points; the convenience methods deliver_event
and invoke on Node wrap the push and verify the module name
exists. TimerMatured is an off-thread clock hook that advances the
scheduler when the host owns its own clock. Completion and
CompletionFailed are the async-Contract paths: a worker thread or
remote RPC reports the result of a ContractResponse::Later
dispatch. SendFailed lets a transport adapter report an outbound
delivery failure without waiting for the engine’s per-poll TTL drain
to evict the request-tracker entry. AppIngressError is the
cross-thread bridge for off-thread sinks (notably
CompletionSink::complete when the per-completion byte cap fires)
that lack direct &mut bus access. The engine drains the variant and
publishes a matching InfraEvent::AppIngressError so subscribers see
the rejection; synchronous Node::deliver_event and Node::invoke
publish directly without routing through this variant.
The three high-traffic variants in production are EnvelopeFrom,
Completion, and CompletionFailed. The wire and async paths run
through them. The host-side entry points (deliver_event, invoke)
push AppEvent and Invoke from the application thread that owns
the data.
Op invocation and dispatch_atomic
Each frontier pop calls Engine::invoke_one. The engine looks at the
op’s pre-stamped OpDispatch (one of four variants) and routes the
call. The two variants that dominate at runtime are Stateless for
framework syscalls and Atomic { component_ref } for every
Contract-method op.
Atomic dispatch is the load-bearing surface for every bound
concrete. Per Roles and Contracts the framework
speaks to user code through seven method-style Contract trait
families (Index, Aggregator, Model, Codec, DataSource,
PeerSelector, Backend) plus the Protocol slot, which authors
register with the declarative bb::register_protocol!{} macro
rather than a Contract trait. The #[derive(bb::<Role>)] macros
bridge each Contract impl to the engine-internal <Role>Runtime
trait. dispatch_atomic is the entry point that bridge emits.
When the engine fires an atomic op:
1. It takes the dispatching component out of Engine.components via
   take_component (which is Option::take on the slot) so a live
   ComponentsView can hand sibling components to the user’s method
   while the dispatching slot is held exclusively.
2. It builds the per-call RuntimeResourceRef<'_> by split-borrowing
   the framework primitives (scheduler, request tracker, peer governor,
   address book, outbound queue), the bus, the ingress, and the
   per-call context (OpRef, ExecId, self_peer, the next_command_id
   allocator).
3. It calls the bridge’s dispatch_atomic arm, which downcasts each
   input &dyn SlotValue to the expected primitive, opens a
   CompletionHandle via ctx.open_completion::<R, E>(), and calls the
   user’s Contract method.
4. It interprets the returned ContractResponse:
   - ContractResponse::Now(Ok(value)) translates to
     DispatchResult::Immediate(bincode::serialize(value)). The engine
     writes the bytes to the op’s output sites at the current ExecId
     and pushes ready consumers onto the frontier.
   - ContractResponse::Now(Err(e)) becomes the dispatch error and
     fails the op through the OpFailed path.
   - ContractResponse::Later translates to
     DispatchResult::Async(handle.cmd_id()). The engine records the op
     in pending_async, emits
     EngineStep::AsyncSuspended { op_ref, exec_id, cmd_id }, and leaves
     the op parked until a Completion lands on the ingress.
After the call returns, the engine drains
ctx.pending_completions (in-cycle hooks that called
ctx.complete_command) onto the engine’s pending_completions
queue and restores the dispatching component into Engine.components
so the next op can dispatch.
CompletionHandle and async dispatch
CompletionHandle<R, E> is the type the Contract method holds while
its work is in flight. It carries the CommandId the engine parked
the op behind and a shared Arc<dyn CompletionSink> that receives the
serialized result.
// from bytesandbrains/bb-runtime/src/completion.rs:23-66
pub struct CompletionHandle<R, E> {
    cmd_id: CommandId,
    sink: Arc<dyn CompletionSink>,
    _marker: PhantomData<fn() -> (R, E)>,
}
impl<R, E> CompletionHandle<R, E> {
    pub fn new(cmd_id: CommandId, sink: Arc<dyn CompletionSink>) -> Self {
        Self {
            cmd_id,
            sink,
            _marker: PhantomData,
        }
    }
    pub fn cmd_id(&self) -> CommandId {
        self.cmd_id
    }
}
impl<R, E> CompletionHandle<R, E>
where
    R: serde::Serialize,
    E: std::fmt::Display,
{
    pub fn complete(self, result: Result<R, E>) {
        match result {
            Ok(value) => {
                let bytes = bincode::serialize(&value).unwrap_or_default();
                self.sink.complete(self.cmd_id, &bytes);
            }
            Err(e) => {
                let detail = e.to_string();
                self.sink.fail(self.cmd_id, &detail);
            }
        }
    }
}
The handle is Send + Sync because the inner Arc<dyn CompletionSink>
is. Contract methods can ship it across thread boundaries freely.
bb-runtime implements CompletionSink on the Node’s IngressQueue.
The sink takes the bincode-serialized bytes as a borrowed &[u8] and
the failure detail as a borrowed &str (Principle 1a: the sink copies
into framework-owned storage before returning), then pushes
IngressEvent::Completion or IngressEvent::CompletionFailed. The
engine drains those on the next poll cycle and unparks the suspended
op.
The ContractResponse<R, E> discriminant on the Contract method’s
return type lets the implementation pick between an inline result
that skips the park cycle and a deferred result that uses the
handle:
// from bytesandbrains/bb-runtime/src/completion.rs:71-76
pub enum ContractResponse<R, E> {
    Now(Result<R, E>),
    Later,
}
The worker-thread pattern is the canonical structure for compute-heavy
or blocking work. The Contract method captures the
CompletionHandle onto a work item, ships it to a worker via an
mpsc channel, and returns Later. The worker does the work and calls
completion.complete(...) from off the engine thread; the result
lands back on the ingress as IngressEvent::Completion. The
custom_index_hnsw example wires this end to end:
// from bytesandbrains/examples/custom_index_hnsw.rs:193-234
impl Index for HnswIndex {
    type Vector = [f32];
    type Error = HnswError;
    fn add(
        &mut self,
        _ctx: &mut bytesandbrains::runtime::RuntimeResourceRef<'_>,
        vec: &Self::Vector,
        completion: CompletionHandle<u64, Self::Error>,
    ) -> ContractResponse<u64, Self::Error> {
        self.send(WorkItem::Add {
            vec: vec.to_vec(),
            completion,
        });
        ContractResponse::Later
    }
    fn search(
        &self,
        _ctx: &mut bytesandbrains::runtime::RuntimeResourceRef<'_>,
        query: &Self::Vector,
        k: u32,
        completion: CompletionHandle<Vec<(u64, f32)>, Self::Error>,
    ) -> ContractResponse<Vec<(u64, f32)>, Self::Error> {
        self.send(WorkItem::Search {
            query: query.to_vec(),
            k,
            completion,
        });
        ContractResponse::Later
    }
    fn remove(
        &mut self,
        _ctx: &mut bytesandbrains::runtime::RuntimeResourceRef<'_>,
        _id: u64,
        completion: CompletionHandle<(), Self::Error>,
    ) -> ContractResponse<(), Self::Error> {
        self.send(WorkItem::Remove { completion });
        ContractResponse::Later
    }
}
The matching worker side calls completion.complete(Ok(value)) from
its own thread once the work finishes. The push routes through the
ingress queue the framework already wired into the handle; no shared
state crosses the boundary beyond the queue itself.
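The worker side of this pattern can be sketched with the standard library alone. This is a minimal sketch, not the example's actual code: Handle, WorkItem, spawn_worker, and round_trip are hypothetical stand-ins, and a second mpsc channel plays the role of the ingress queue that the real CompletionHandle pushes into.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for CompletionHandle: a command id plus a sender
/// that plays the role of the ingress queue.
pub struct Handle {
    cmd_id: u64,
    ingress: Sender<(u64, Result<u64, String>)>,
}

impl Handle {
    /// Consumes the handle so a result can be delivered at most once.
    pub fn complete(self, result: Result<u64, String>) {
        // A closed ingress means the engine is gone; the result is dropped.
        let _ = self.ingress.send((self.cmd_id, result));
    }
}

/// Hypothetical work item that carries its own completion handle.
pub enum WorkItem {
    Add { vec: Vec<f32>, completion: Handle },
}

/// Spawns a worker that drains work items and completes each handle
/// from its own thread, off the (simulated) engine thread.
pub fn spawn_worker(rx: Receiver<WorkItem>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut next_id = 0u64;
        for item in rx {
            match item {
                WorkItem::Add { vec, completion } => {
                    if vec.is_empty() {
                        completion.complete(Err("empty vector".to_string()));
                    } else {
                        completion.complete(Ok(next_id));
                        next_id += 1;
                    }
                }
            }
        }
    })
}

/// Drives one round trip: enqueue a work item, then read the completion
/// the way the engine would drain it from the ingress.
pub fn round_trip(vec: Vec<f32>) -> (u64, Result<u64, String>) {
    let (work_tx, work_rx) = channel();
    let (ingress_tx, ingress_rx) = channel();
    let worker = spawn_worker(work_rx);
    let completion = Handle { cmd_id: 7, ingress: ingress_tx };
    work_tx
        .send(WorkItem::Add { vec, completion })
        .expect("worker is alive");
    let event = ingress_rx.recv().expect("worker completes the handle");
    drop(work_tx); // closing the channel lets the worker loop end
    worker.join().expect("worker exits cleanly");
    event
}
```

Because complete takes self by value, the type system rules out a double completion, which is the same property the real handle gets from its by-value complete method.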
For an immediate result the implementation returns
ContractResponse::Now(Ok(value)) and drops the handle unused. The
engine never registers the op in pending_async and writes the
result directly into the op’s output sites.
What EngineStep carries
Engine::poll returns a Vec<EngineStep> per cycle. The enum has
seventeen variants; each one is something the host can observe and
react to. The complete list, in declaration order:
// from bytesandbrains/bb-runtime/src/engine/step.rs:13-194
pub enum EngineStep {
OpCompleted { op_ref, exec_id, sites_written },
AsyncSuspended { op_ref, exec_id, cmd_id },
SendEnvelope(WireEnvelope),
AppEvent { module_name, topic, value_bytes },
LifecycleFired { phase },
BootstrapComplete,
WaitingOnBootstrap,
OpFailed { op_ref, exec_id, error },
CycleBudgetExceeded { ops_invoked },
OutboundDropped { count },
WireDecodeFailed { hash, payload_size, detail },
WireReceiveFailed { src_peer, fill_index, actual_hash,
payload_size, kind },
WireTimeout { wire_req_id, target_site,
started_at_ns, parked_op },
PeerBlocked { peer, reason },
PeerDown { peer },
PeerUp { peer },
PeerResolveFailed { peer, op_ref, exec_id },
}
Grouped by purpose:
Op outcomes (four). OpCompleted { op_ref, exec_id, sites_written }
is the synchronous-completion path: the op fired, wrote its outputs,
and pushed downstream consumers. AsyncSuspended { op_ref, exec_id, cmd_id } is the parking path: the op returned
ContractResponse::Later and is now waiting on the named cmd_id.
OpFailed { op_ref, exec_id, error } carries a typed OpError from
either a synchronous failure (ContractResponse::Now(Err(_))) or a
deadline-expired pending_async entry; the same context is mirrored
on the bus as InfraEvent::OpFailure. LifecycleFired { phase }
fires when the host queued a lifecycle phase via
Engine::fire_lifecycle; the engine drained that phase’s enrolled
ops in this cycle and surfaces the phase name (such as
"PreShutdown") so the host can hook teardown.
Bootstrap signals (two). BootstrapComplete fires at most once
per Node lifetime, on the cycle that drained the last bootstrap op.
WaitingOnBootstrap is the suspension signal: at least one
bootstrap-phase op returned async, and the engine is gating body
work until the matching completion lands on the ingress.
App-side I/O (two). SendEnvelope(WireEnvelope) is the outbound
shipment path: the transport adapter picks one entry from
envelope.dest_peer_addresses and writes the bytes. AppEvent { module_name, topic, value_bytes } carries a function.output
landing or a mid-cycle AppEmit / AppNotify; value_bytes is
empty for marker-only notifications.
Back-pressure surfaces (four). CycleBudgetExceeded { ops_invoked } is the soft fairness yield, emitted at most once
per poll. OutboundDropped { count } is the
OutboundQueue head-eviction count since the previous poll, also
emitted at most once. WireDecodeFailed { hash, payload_size, detail } is the inbound payload-decode telemetry hook; the
envelope’s slot fill was dropped before any user code ran.
WireReceiveFailed { src_peer, fill_index, actual_hash, payload_size, kind } mirrors InfraEvent::WireReceiveError for per-fill typed-decode
failures (allocation, budget, or wire-type mismatch). Other fills in
the same envelope still deliver: this is the partial-delivery
telemetry hook.
Wire and peer health (five). WireTimeout { wire_req_id, target_site, started_at_ns, parked_op } fires when
RequestTracker::drain_stale evicts an in-flight chain entry whose
per-entry TTL elapsed without a matching response; the originator’s
local DAG continuation parked behind parked_op is failed with
“chain timeout”. PeerBlocked { peer, reason: BlockReason } fires
on the inbound side when PeerGovernor::check_inbound rejects an
envelope before any slot is written. BlockReason has three variants,
Blocklisted, NotAllowlisted, and Cooldown { retry_ns }, of which the
inbound path produces only the first two.
PeerDown { peer } and PeerUp { peer } are the consecutive-failure
transitions tracked by PeerGovernor::record_failure /
record_success; each fires at most once per crossing.
PeerResolveFailed { peer, op_ref, exec_id } fires when a
wire::Send cannot resolve its destination’s addresses against the
AddressBook: either the peer is unknown, its address list is
empty, or the Send op’s peer input did not carry a valid
PeerId. The Send produces no envelope; the host reacts via this
event and the mirrored InfraEvent::PeerResolveFailure.
The full enum, with the variant-level field docs, lives at
bytesandbrains/bb-runtime/src/engine/step.rs:13-194.
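On the host side, reacting to a batch of steps is usually one exhaustive match. The sketch below assumes a trimmed-down stand-in enum: Step, HostActions, and triage are hypothetical names, and the real EngineStep variants carry richer fields than shown here.

```rust
/// Hypothetical, trimmed-down stand-in for EngineStep.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    OpCompleted,
    SendEnvelope(Vec<u8>),
    OpFailed { error: String },
    CycleBudgetExceeded { ops_invoked: usize },
    OutboundDropped { count: usize },
    PeerDown { peer: u64 },
}

/// What a host might decide to do after one poll cycle.
#[derive(Debug, Default, PartialEq)]
pub struct HostActions {
    pub to_ship: Vec<Vec<u8>>,
    pub failures: Vec<String>,
    pub down_peers: Vec<u64>,
    pub dropped: usize,
    pub repoll_soon: bool,
}

/// Sorts one batch of steps into host-side actions.
pub fn triage(steps: Vec<Step>) -> HostActions {
    let mut out = HostActions::default();
    for step in steps {
        match step {
            // Nothing to do for the host beyond optional logging.
            Step::OpCompleted => {}
            // Outbound bytes go to the transport adapter.
            Step::SendEnvelope(bytes) => out.to_ship.push(bytes),
            Step::OpFailed { error } => out.failures.push(error),
            // The engine yielded for fairness; more work is queued.
            Step::CycleBudgetExceeded { .. } => out.repoll_soon = true,
            Step::OutboundDropped { count } => out.dropped += count,
            Step::PeerDown { peer } => out.down_peers.push(peer),
        }
    }
    out
}
```

Keeping the match exhaustive (no wildcard arm) means a newly added variant becomes a compile error in the host rather than a silently ignored signal.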
App-side entry points and introspection
The host pushes external work through three convenience methods on
Node plus the off-thread ingress_handle for transport adapters:
// from bytesandbrains/bb-runtime/src/node/mod.rs:958-1074
pub fn deliver_inbound(
&mut self,
src_peer: crate::ids::PeerId,
bytes: &[u8],
) -> Result<(), crate::errors::delivery::DeliveryError>;
pub fn deliver_event(
&mut self,
module: &str,
input: &str,
value_bytes: &[u8],
) -> Result<(), crate::errors::delivery::DeliveryError>;
pub fn invoke(
&mut self,
module: &str,
inputs: &[(&str, &[u8])],
) -> Result<crate::ids::ExecId, crate::errors::delivery::DeliveryError>;
Every external boundary takes a borrowed slice. The framework copies
the bytes into a framework-owned Vec<u8> inside the call, so the
caller may free the slice the moment the call returns. This matters
because many transport stacks (libp2p reads, QUIC io_uring completion
buffers, DPDK queues) hand the framework a pointer into their own
memory. The framework must not keep a reference past the call (the
transport reclaims the buffer immediately on return) and must not take
ownership (the transport’s allocator owns the buffer, not the
framework’s). The copy is the ownership transition.
deliver_inbound decodes the wire bytes through
EnvelopeCodec::decode_capped (respecting the envelope_caps cap
list) and pushes EnvelopeFrom. deliver_event pushes a single
AppEvent onto the named module’s named input. invoke allocates
a fresh ExecId, pushes Invoke with the supplied inputs, and
returns the ExecId so the host can correlate downstream
AppEvent / OpCompleted steps back to the call.
Ingress boundaries and allocation safety
External boundaries entering the engine never panic on allocation
failure or budget exhaustion. The runtime caps the payload against
the matching NodeConfig knob, charges the cumulative byte total
against Engine::ingress_byte_budget, and reserves framework-owned
storage with try_reserve_exact. Any failure returns a typed
DeliveryError synchronously and publishes a matching
InfraEvent::AppIngressError (host-pushed surface) or
InfraEvent::WireReceiveError (wire surface) on the bus. The
offending bytes drop at the boundary, the engine continues
processing other envelopes and events, and any parked op that was
waiting on the dropped payload times out naturally.
The per-source caps and the cumulative ingress budget form a two-level defence:
// from bytesandbrains/bb-runtime/src/node/config.rs:51-72,165-195
pub const DEFAULT_INGRESS_BYTE_BUDGET: usize = 256 * 1024 * 1024;
pub const DEFAULT_MAX_APP_EVENT_BYTES: usize = 1024 * 1024;
pub const DEFAULT_MAX_INVOKE_INPUTS: usize = 100;
pub const DEFAULT_MAX_INVOKE_BYTES: usize = 10 * 1024 * 1024;
pub const DEFAULT_MAX_COMPLETION_RESULT_BYTES: usize = 4 * 1024 * 1024;
ingress_byte_budget is the total in-flight ingress bytes the
engine holds across the ingress queue plus the slot table plus
pending async completion buffers. Default 256 MiB; the
NodeConfig::edge() preset tightens to 8 MiB for resource-bound
devices. max_app_event_bytes is the per-call cap for
deliver_event. max_invoke_inputs plus max_invoke_bytes cap
the count and cumulative payload of a single invoke.
max_completion_result_bytes caps the CompletionSink::complete
result. Each per-source cap rejects oversize single payloads with
DeliveryError::OversizePayload; the cumulative budget guards
sustained load with DeliveryError::BudgetExceeded. Boundary
callers try_charge before installing a payload, and the
slot-table writer releases the charge through
SlotValue::charged_bytes on overwrite or eviction
(bb-runtime/src/engine/core.rs:540-552). Hot-path cost is one
saturating-add plus one comparison per fill admission.
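The two-level admission check can be sketched as follows. This is an illustration under stated assumptions, not the runtime's code: Budget, AdmitError, admit, and release are hypothetical names that mirror the cap check, the saturating charge, and the fallible reservation described above.

```rust
/// Hypothetical error type mirroring the two rejection classes.
#[derive(Debug, PartialEq)]
pub enum AdmitError {
    OversizePayload { len: usize, cap: usize },
    BudgetExceeded { in_flight: usize, budget: usize },
    AllocFailed,
}

/// Hypothetical cumulative ingress budget.
pub struct Budget {
    in_flight: usize,
    budget: usize,
}

impl Budget {
    pub fn new(budget: usize) -> Self {
        Self { in_flight: 0, budget }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight
    }

    /// Copies `bytes` into owned storage if the per-source cap and the
    /// cumulative budget both allow it. Never panics on allocation failure.
    pub fn admit(&mut self, bytes: &[u8], cap: usize) -> Result<Vec<u8>, AdmitError> {
        // Level 1: per-source cap on a single payload.
        if bytes.len() > cap {
            return Err(AdmitError::OversizePayload { len: bytes.len(), cap });
        }
        // Level 2: one saturating add plus one comparison.
        let charged = self.in_flight.saturating_add(bytes.len());
        if charged > self.budget {
            return Err(AdmitError::BudgetExceeded {
                in_flight: self.in_flight,
                budget: self.budget,
            });
        }
        // Fallible reservation instead of an aborting allocation.
        let mut owned = Vec::new();
        owned
            .try_reserve_exact(bytes.len())
            .map_err(|_| AdmitError::AllocFailed)?;
        owned.extend_from_slice(bytes);
        self.in_flight = charged;
        Ok(owned)
    }

    /// Returns a previously charged amount, as on overwrite or eviction.
    pub fn release(&mut self, charged_bytes: usize) {
        self.in_flight = self.in_flight.saturating_sub(charged_bytes);
    }
}
```

The charge is committed only after the reservation succeeds, so a failed allocation never leaks budget.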
CompletionSink::complete and fail follow the same shape:
// from bytesandbrains/bb-runtime/src/runtime.rs:29-97
impl CompletionSink for IngressQueue {
fn complete(&self, cmd_id: CommandId, result_bytes: &[u8]) {
// cap-check, charge ingress_byte_budget, try_reserve_exact,
// extend_from_slice, push IngressEvent::Completion
}
fn fail(&self, cmd_id: CommandId, detail: &str) {
// truncate at COMPLETION_DETAIL_CAP (4 KiB) on a UTF-8
// character boundary, push IngressEvent::CompletionFailed
}
}
The bytes never linger as a borrowed slice past the call. The
detail string is truncated rather than rejected so the host’s
display message always lands. Failures publish
InfraEvent::AppIngressError { source: Completion { command }, ... }
and drop. The parked op times out naturally on the host side, which
is the same surface as a missing completion.
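Truncating on a UTF-8 character boundary takes only a few lines of std. The helper below is a sketch; truncate_detail is a hypothetical name, and the real cap is the COMPLETION_DETAIL_CAP constant rather than a parameter.

```rust
/// Returns the longest prefix of `detail` that fits in `cap` bytes and
/// ends on a UTF-8 character boundary.
pub fn truncate_detail(detail: &str, cap: usize) -> &str {
    if detail.len() <= cap {
        return detail;
    }
    // Walk back from the cap until the cut does not split a code point.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = cap;
    while !detail.is_char_boundary(end) {
        end -= 1;
    }
    &detail[..end]
}
```

Slicing a &str at a non-boundary index panics, which is why the cut point has to be adjusted before slicing rather than after.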
Inside the engine, normal Rust allocation patterns apply.
Components are designed to play by the runtime contract: a
Vec::push that runs out of memory inside a Contract body is a
process abort, not a framework-handled failure. The fallibility
line lives at the boundary itself.
Introspection lives on the same surface:
// from bytesandbrains/bb-runtime/src/node/mod.rs:501-538
pub fn incarnation(&self) -> u64;
pub fn loaded_modules(&self) -> Vec<&str>;
pub fn linked_components(&self) -> Vec<&ComponentHandle>;
pub fn peer_id(&self) -> crate::ids::PeerId;
pub fn execution_state(
&self,
exec_id: crate::ids::ExecId,
) -> Option<&crate::engine::ExecutionState>;
pub fn pending_async_count(&self) -> usize;
pub fn engine_stats(&self) -> crate::engine::EngineStats;
engine_stats() returns the hot-path counters: frontier length, bus
length, pending-async count, slot-table occupancy, ingress depth,
outbound-queue depth, event-subscription count, registered component
count, and graph-slot count. The numbers are useful for the host’s
observability dashboard and for deciding when to throttle the input
side.
set_telemetry_tap(closure) installs a per-EngineStep observer that
the Node invokes once per produced step. The tap runs inside poll,
so the host can wire it to a structured logger without touching the
returned vec. The closure type is FnMut(&EngineStep) + 'static.
The Node also exposes peer-governor controls (block_peer,
unblock_peer, set_allowlist), address-book mutators
(add_peer, drop_peer), local-address accessors
(local_addresses, add_local_address, forget_local_address),
and read-only views on the peer health (peer_health,
resolve_peer_addresses).
PeerState, PeerHealth, and the block reasons
The engine’s per-peer bookkeeping lives in a single bundle on
FrameworkComponents. PeerState collapses four sub-primitives
under one field so component authors reach them through
ctx.peer_state.{gate, governor, backoff, backpressure} instead of
four siblings on the framework bundle:
// from bytesandbrains/bb-runtime/src/framework/peer_state.rs:21-37
pub struct PeerState {
pub gate: PeerGate,
pub governor: PeerGovernor,
pub backoff: BackoffTable,
pub backpressure: BackpressureTracker,
}
PeerGate is the named concurrency limiter consulted by
Limit.Acquire / Limit.Release syscalls. PeerGovernor is the
single source of truth for blocklist / allowlist policy plus
per-peer health tracking; the engine’s Phase 1 envelope router calls
check_inbound, and the compiler-inserted PeerHealthGate syscall
op upstream of every wire::Send calls check_outbound.
BackoffTable is the per-peer exponential backoff schedule the
outbound gate consults at check_outbound time.
BackpressureTracker is the receiver-side per-peer state for the
backpressure protocol: it tracks notices emitted to each sender, the
duplicate-suppression window, and the K-then-silent fallback consulted
at Phase 1 when ingress depth crosses the high-water mark.
PeerHealth is the per-peer snapshot the governor publishes:
// from bytesandbrains/bb-runtime/src/framework/peer_governor.rs:52-65
pub struct PeerHealth {
pub consecutive_failures: u32,
pub last_event_ns: u64,
pub down: bool,
}
The default failure threshold is 5
(DEFAULT_FAILURE_THRESHOLD: u32 = 5 at
bytesandbrains/bb-runtime/src/framework/peer_governor.rs:33):
five consecutive record_failure calls without an intervening
record_success mark down = true and produce a
LifecycleTransition::WentDown, which the engine surfaces as
EngineStep::PeerDown. A subsequent success after a down streak
produces LifecycleTransition::CameUp and surfaces as
EngineStep::PeerUp. NodeConfig does not currently expose a
threshold override; production tuning is via the
PeerGovernor::with_failure_threshold builder during
construction.
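The edge-triggered behaviour (one transition per crossing) can be sketched as a small state machine. Tracker, Health, and Transition are hypothetical names; only the threshold of five and the WentDown / CameUp semantics come from the text above.

```rust
/// Hypothetical mirror of LifecycleTransition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transition {
    WentDown,
    CameUp,
}

/// Hypothetical per-peer health record.
#[derive(Debug, Default)]
pub struct Health {
    pub consecutive_failures: u32,
    pub down: bool,
}

/// Hypothetical single-peer tracker with a configurable threshold.
pub struct Tracker {
    threshold: u32,
    health: Health,
}

impl Tracker {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, health: Health::default() }
    }

    /// Counts a failure; reports WentDown only on the crossing itself.
    pub fn record_failure(&mut self) -> Option<Transition> {
        self.health.consecutive_failures =
            self.health.consecutive_failures.saturating_add(1);
        if !self.health.down && self.health.consecutive_failures >= self.threshold {
            self.health.down = true;
            return Some(Transition::WentDown);
        }
        None
    }

    /// Resets the streak; reports CameUp only if the peer was down.
    pub fn record_success(&mut self) -> Option<Transition> {
        self.health.consecutive_failures = 0;
        if self.health.down {
            self.health.down = false;
            return Some(Transition::CameUp);
        }
        None
    }
}
```

Returning Option<Transition> from both methods is what keeps the surfaced step to at most one per crossing: failures after the peer is already down return None.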
BlockReason is the rejection taxonomy that rides on
PeerBlocked { peer, reason }:
// from bytesandbrains/bb-runtime/src/framework/peer_governor.rs:36-49
pub enum BlockReason {
Blocklisted,
NotAllowlisted,
Cooldown { retry_ns: u64 },
}
Blocklisted is the explicit block_peer deny. NotAllowlisted
fires when an allowlist is configured (set_allowlist(Some(_)))
and the peer is not on it. Cooldown { retry_ns } is the outbound
gate’s failure-driven back-off; the inbound path never produces
Cooldown because the gate that consults BackoffTable only sits
upstream of wire::Send.
BackoffTable is exponential with a configurable base and cap.
The defaults match a fast initial retry that caps out at a minute:
// from bytesandbrains/bb-runtime/src/framework/backoff_table.rs:22-27
pub const DEFAULT_BASE_NS: u64 = 10_000_000;
pub const DEFAULT_MAX_DELAY_NS: u64 = 60_000_000_000;
10 ms base, 60 s cap. The schedule is
delay(n) = min(BASE_NS * 2^n, MAX_DELAY_NS). A peer’s
BackoffState carries attempts, last_attempt_ns, and
next_retry_ns; should_retry(peer, now_ns) returns true when
no failure was ever recorded or when now_ns >= state.next_retry_ns.
A record_success removes the entry entirely so the next attempt
proceeds without a cooldown.
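The schedule is easy to get subtly wrong near overflow, so a sketch helps. The function name delay_ns is hypothetical; the two constants repeat the defaults quoted above, and the overflow handling (saturate, then clamp to the cap) is an assumption about how a careful implementation would behave.

```rust
pub const DEFAULT_BASE_NS: u64 = 10_000_000; // 10 ms
pub const DEFAULT_MAX_DELAY_NS: u64 = 60_000_000_000; // 60 s

/// delay(n) = min(base * 2^n, max), computed without overflow.
pub fn delay_ns(attempt: u32, base_ns: u64, max_delay_ns: u64) -> u64 {
    // 2^attempt does not fit in u64 for attempt >= 64; saturate instead.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_ns.saturating_mul(factor).min(max_delay_ns)
}
```

With the defaults, attempt 12 yields 40.96 s and attempt 13 would be 81.92 s, so the 60 s cap first takes effect at attempt 13.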
Deadline propagation: strategy and lineage
A request that touches three peers carries the latency tail of all three. Fixed per-call timeouts paper over the variance: pick a number that catches the slow path and the fast path starves; pick one that fits the fast path and the slow path fails healthy work. Dean and Barroso (The Tail at Scale, CACM 2013) called this the central scaling problem in distributed systems and argued for per-call deadlines that flow through the call graph the same way distributed-tracing spans do. The engine takes that posture end-to-end: every wire.Send carries a deadline, every receiver inherits a remaining budget, and the engine reaps anything that runs past its allotted nanoseconds.
The lineage begins at compile time with a graph walk over the
cross-node DAG. derive_wire_deadlines iterates every function and every
NodeProto, picks out each wire.Send, reads the optional
chain_depth metadata (defaulting to 1 for single-hop sends), and
stamps deadline_ns = chain_depth × per_hop_budget_ns as an
attribute (bytesandbrains/bb-compiler/src/derive_wire_deadlines.rs:42-64;
:73-88 for the idempotent upsert). At runtime the wire envelope carries
the same budget across the network: the
WireEnvelope.remaining_deadline_ns proto field is documented as
Dapper-style deadline propagation
(bytesandbrains/proto/bb_core.proto:92-98), the receiver unpacks it
into the per-ExecId InboundContext on arrival
(bytesandbrains/bb-runtime/src/engine/poll.rs:657-661), and a
wire.Send forwarding inside a chain forwards the budget minus elapsed
local time on the next envelope
(bytesandbrains/bb-runtime/src/engine/poll.rs:776-789). Phase 5 of the
poll cycle drains expired entries from pending_async, failing each
through the typed OpFailed path with kind Timeout
(bytesandbrains/bb-runtime/src/engine/poll.rs:99-121). The shape
mirrors the per-span deadline propagation described in Sigelman et al.,
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
(Google Technical Report, 2010): one budget is minted at the root,
every hop inherits the remainder, and the leaf enforces by clock.
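The arithmetic behind the compile-time stamp and the per-hop forwarding is small enough to sketch. Both function names are hypothetical, and treating a budget of exactly zero as expired is an assumption rather than something the text specifies.

```rust
/// Compile-time stamp: chain_depth (default 1) times the per-hop budget.
pub fn derive_deadline_ns(chain_depth: Option<u64>, per_hop_budget_ns: u64) -> u64 {
    chain_depth.unwrap_or(1).saturating_mul(per_hop_budget_ns)
}

/// Budget to forward on the next envelope: the inherited remainder minus
/// the time spent locally. Returns None once the budget is used up
/// (assumption: zero remaining counts as expired).
pub fn remaining_after(remaining_ns: u64, elapsed_local_ns: u64) -> Option<u64> {
    match remaining_ns.checked_sub(elapsed_local_ns) {
        Some(0) | None => None,
        Some(left) => Some(left),
    }
}
```

For a three-hop chain with a 50 ms per-hop budget the root mints 150 ms; a hop that spends 40 ms locally forwards 110 ms to the next peer.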
The runtime adaptive layer refines the compile-time number once the
network has actually been observed. RttTracker::estimate_budget_ns
walks five tiers (per-edge, per-site, per-chain, global prior, and
the static NodeConfig.per_hop_budget_ns fallback) and returns the
first warm EMA it finds
(bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:283-321).
The EMA itself is canonical Jacobson smoothing with α = 1/8 and
β = 1/4 and a budget of SRTT + 4·RTTVAR
(bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:82-122).
The smoothing constants are Jacobson's (Congestion Avoidance and
Control, ACM SIGCOMM 1988); RFC 6298 codifies them together with the
sampling rule from Karn and Partridge, Improving Round-Trip Time
Estimates in Reliable Transport Protocols (ACM SIGCOMM 1987).
Three layers compose: compile-time deadlines give every deployment a
sane baseline without runtime measurement; runtime EMA refines it
once warm samples exist; engine expiry in Phase 5 is the final
enforcer. The same code path carries the budget from the compiler
pass that stamps it onto the IR, through the envelope that ships it
to the next hop, and into the poll cycle that reaps anything past
its time.
References
- Dean, J., & Barroso, L. A. (2013). The Tail at Scale. Communications of the ACM, 56(2). https://research.google/pubs/pub40801/
- Sigelman, B. H., Barroso, L. A., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., & Shanbhag, C. (2010). Dapper, a Large-Scale Distributed Systems Tracing Infrastructure. Google Technical Report dapper-2010-1. https://research.google/pubs/pub36356/
- Karn, P., & Partridge, C. (1987). Improving Round-Trip Time Estimates in Reliable Transport Protocols. ACM SIGCOMM 1987. https://dl.acm.org/doi/10.1145/55482.55484
- Paxson, V., Allman, M., Chu, J., & Sargent, M. (2011). RFC 6298: Computing TCP’s Retransmission Timer. https://datatracker.ietf.org/doc/html/rfc6298
- Hayashibara, N., Défago, X., Yared, R., & Katayama, T. (2004). The φ Accrual Failure Detector. SRDS 2004. https://dl.acm.org/doi/10.1109/RELDIS.2004.1353004
Backpressure: explicit signal, eventual silence
Backpressure under congestion is well-studied territory. Receivers that have no way to push back on overloaded senders end up dropping work silently, and senders that read silence as success amplify the overload by the next round. Welsh, Culler, and Brewer (SEDA, SOSP 2001) made the inverse case canonical: every stage exposes its queue depth, and upstream stages observe the depth and adapt their submission rate. Without that signal an overloaded service degrades from “slower” to “silently lossy” the moment the queue saturates. The engine takes the SEDA posture at the per-peer granularity: hop-local backpressure is a typed wire op, not a fall-through to ingress overflow.
The substrate implements the signal as a typed envelope and parks the
receiver-side state on PeerState alongside the existing health
primitives. BackpressureTracker is the receiver-side per-peer
bookkeeping (bytesandbrains/bb-runtime/src/framework/backpressure_tracker.rs:133-138),
joining gate, governor, and backoff as the fourth sub-field
(bytesandbrains/bb-runtime/src/framework/peer_state.rs:21-37).
Detection composes with two existing primitives. PhiAccrualState
(see the φ-accrual detector section below) supplies the
BackoffCause::PhiAccrual trigger when a sender’s inter-arrival
distribution flips to Suspect. IngressQueue depth tracking is
snapshotted at Phase 1 entry on phase1_pre_drain_depth
(bytesandbrains/bb-runtime/src/engine/poll.rs:197) so the per-envelope
handler can compare the pre-drain depth against the configured
high-water fraction without re-reading a queue that has already been
drained to zero. The notice itself rides as a one-fill WireEnvelope
whose type_hash matches backoff_notice_type_hash()
(bytesandbrains/bb-runtime/src/framework/backpressure_notice.rs:153-155),
built by build_backoff_notice_envelope
(bytesandbrains/bb-runtime/src/framework/backpressure_notice.rs:166-195).
The K-then-silent threshold defaults to 3 on DEFAULT_K_BEFORE_SILENT
(bytesandbrains/bb-runtime/src/framework/backpressure_tracker.rs:50-56).
The K-then-silent design separates cooperative clients from adversarial
or buggy ones with a small explicit budget. Cooperative clients receive
up to K typed BackoffNotice envelopes; each one updates the sender’s
BackoffTable via record_remote_advisory so the existing
BackoffGateTx consultation gates the next outbound send. Adversarial
or buggy clients that keep sending after K notices flip into silent-drop
mode: the receiver’s Phase 1 envelope handler returns immediately on
is_silent_drop_active and the envelope is dropped without further
notice emission
(bytesandbrains/bb-runtime/src/engine/poll.rs:485-492). The full
decision lives in maybe_emit_backoff_notice
(bytesandbrains/bb-runtime/src/engine/poll.rs:1074-1157), and the
sender-side ingest path that decodes the notice and updates local state
lives in ingest_backoff_notice
(bytesandbrains/bb-runtime/src/engine/poll.rs:1008-1060). K = 3 was
chosen to match the RttTracker warmth threshold of three samples at
bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:126-128: an
EMA is allowed to win the fallback hierarchy once three observations
have arrived, and a sender is allowed three explicit notices before
the receiver concludes the cooperation contract has broken. All three
knobs are config-driven on NodeConfig
(bytesandbrains/bb-runtime/src/node/config.rs:90-112) with the
matching with_backpressure_* builders
(bytesandbrains/bb-runtime/src/node/config.rs:204-227).
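The K-then-silent decision reduces to a per-peer counter plus a sticky flag. The sketch below is a simplification under several assumptions: Policy, PeerPressure, and Verdict are hypothetical names, the high-water mark is an absolute depth instead of a fraction, the duplicate-suppression window is not modeled, and silent-drop mode is treated as sticky once entered.

```rust
/// Hypothetical outcome of the receiver-side check for one envelope.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    /// Below the high-water mark: process normally.
    Deliver,
    /// Overloaded and still within the notice budget: emit a notice.
    Notify,
    /// Notice budget exhausted: drop without further notices.
    SilentDrop,
}

/// Hypothetical per-peer receiver-side state.
#[derive(Debug, Default)]
pub struct PeerPressure {
    pub notices_sent: u32,
    pub silent: bool,
}

/// Hypothetical policy knobs.
pub struct Policy {
    pub k_before_silent: u32,
    pub high_water_depth: usize,
}

impl Policy {
    /// Decides what to do with one inbound envelope, given the ingress
    /// depth snapshotted before the drain.
    pub fn on_envelope(&self, peer: &mut PeerPressure, pre_drain_depth: usize) -> Verdict {
        // Silent-drop mode short-circuits everything else.
        if peer.silent {
            return Verdict::SilentDrop;
        }
        if pre_drain_depth < self.high_water_depth {
            return Verdict::Deliver;
        }
        if peer.notices_sent < self.k_before_silent {
            peer.notices_sent += 1;
            return Verdict::Notify;
        }
        // The sender ignored K notices: stop spending effort on it.
        peer.silent = true;
        Verdict::SilentDrop
    }
}
```

Comparing against the pre-drain depth matters here: by the time the per-envelope handler runs, the live queue length may already read zero.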
Composition reuses the framework’s existing observability and selection
surfaces. A PeerSelector Component can read PeerGovernor state via
ctx.peer_state.governor and bias toward healthy peers; the
PeerHealth snapshot already carries consecutive_failures and
down, and the backpressure path’s ingest_backoff_notice records a
matching record_failure on the governor so a sender that keeps
receiving notices crosses the 5-consecutive-failure threshold without
any selector-side wiring. Two typed bus events surface the local
overload decision: InfraEvent::BackoffNoticeSent { peer, cause, min_backoff_ns } fires on every emission and
InfraEvent::SilentDropActive { peer } fires once when the K threshold
is crossed
(bytesandbrains/bb-runtime/src/bus.rs:125-148). Both events ride the
same typed bus a PeerSelector already subscribes to, so the same
EventSource wiring that surfaces PeerSuspect / PeerDown carries
the backpressure signals into tracing without bespoke plumbing.
References
- Welsh, M., Culler, D., & Brewer, E. (2001). SEDA: An Architecture for Well-Conditioned, Scalable Internet Services. SOSP 2001. https://www.sosp.org/2001/papers/welsh.pdf
RttTracker EWMA constants and the φ-accrual detector
Adaptive deadlines for every wire round-trip come from
framework::RttTracker, a per-NodeSiteId Jacobson EMA bank. The
EMA itself is canonical RFC 6298 §2 with α = 1/8 and β = 1/4 (the
“observe with default rates” call wraps a _with_alpha_beta call
that uses alpha_shift = 3, beta_shift = 2):
// from bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:85-87
pub fn observe(&mut self, sample_ns: u64) {
self.observe_with_alpha_beta(sample_ns, 3, 2);
}
The deadline derivation is the canonical RFC 6298 §2
retransmission-timeout formula, SRTT + 4·RTTVAR:
// from bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:119-122
pub fn budget_ns(&self) -> u64 {
self.srtt_ns
.saturating_add(self.rttvar_ns.saturating_mul(4))
}
An EMA is “warm” (allowed to win the fallback hierarchy) once it
has at least three samples
(bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:126-128).
Below three the deadline falls through to a coarser tier. The
global prior receives the same observations with a smaller learning
rate (alpha_shift = 6, beta_shift = 5, that is α = 1/64 and
β = 1/32), so noisy single-peer
samples do not dominate the cross-runtime estimate
(rtt_tracker.rs:362).
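The update rule, the budget, and the warmth check fit in one small type. The sketch below follows RFC 6298 §2 with shift-based integer arithmetic; it is not the crate's RttEma. The struct name Ema, the method observe_with_shifts, and the first-sample initialisation (SRTT = R, RTTVAR = R/2, taken from the RFC) are assumptions.

```rust
/// Hypothetical Jacobson-style estimator in integer nanoseconds.
#[derive(Debug, Default, Clone, Copy)]
pub struct Ema {
    pub srtt_ns: u64,
    pub rttvar_ns: u64,
    pub samples: u64,
}

impl Ema {
    /// One observation with alpha = 2^-alpha_shift and beta = 2^-beta_shift.
    /// Shifts are assumed to be well below 64.
    pub fn observe_with_shifts(&mut self, sample_ns: u64, alpha_shift: u32, beta_shift: u32) {
        if self.samples == 0 {
            // RFC 6298 (2.2): first measurement seeds both estimators.
            self.srtt_ns = sample_ns;
            self.rttvar_ns = sample_ns / 2;
        } else {
            // RFC 6298 (2.3): RTTVAR is updated first, using the old SRTT.
            let err = self.srtt_ns.abs_diff(sample_ns);
            self.rttvar_ns = (self.rttvar_ns - (self.rttvar_ns >> beta_shift))
                .saturating_add(err >> beta_shift);
            self.srtt_ns = (self.srtt_ns - (self.srtt_ns >> alpha_shift))
                .saturating_add(sample_ns >> alpha_shift);
        }
        self.samples += 1;
    }

    /// Default rates: alpha = 1/8, beta = 1/4.
    pub fn observe(&mut self, sample_ns: u64) {
        self.observe_with_shifts(sample_ns, 3, 2);
    }

    /// SRTT + 4 * RTTVAR, saturating.
    pub fn budget_ns(&self) -> u64 {
        self.srtt_ns.saturating_add(self.rttvar_ns.saturating_mul(4))
    }

    /// Warm once at least three samples have been observed.
    pub fn is_warm(&self) -> bool {
        self.samples >= 3
    }
}
```

Feeding samples of 100 µs, 100 µs, and 200 µs moves the budget from 300 µs to 250 µs to 325 µs: the variance term shrinks while samples agree and widens again on the outlier.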
RttTracker::estimate_budget_ns walks five tiers in order, stopping
at the first warm hit:
- Per-edge (site, chain_id, hop_index) exact match.
- Per-site aggregate Jacobson over every round-trip to this site.
- Per-chain prior aggregated across every peer that has carried this chain’s traffic.
- Global prior across every round-trip in the runtime.
- The static NodeConfig.per_hop_budget_ns fallback.
Tiers 1 through 4 are the same RttEma type at successively
coarser scopes; tier 5 is the constant the host configured at
construction.
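The walk itself is a first-match search over four optional estimators with a constant at the end. In the sketch below Tier and the free function estimate_budget_ns are hypothetical stand-ins; the real method lives on RttTracker and looks the tiers up by key rather than taking them as arguments.

```rust
/// Hypothetical snapshot of one estimator tier.
#[derive(Debug, Clone, Copy)]
pub struct Tier {
    pub budget_ns: u64,
    pub samples: u64,
}

impl Tier {
    /// A tier only counts once it is warm (three or more samples).
    fn warm_budget(&self) -> Option<u64> {
        (self.samples >= 3).then_some(self.budget_ns)
    }
}

/// Returns the budget of the first warm tier, finest scope first,
/// falling back to the statically configured per-hop budget.
pub fn estimate_budget_ns(
    per_edge: Option<Tier>,
    per_site: Option<Tier>,
    per_chain: Option<Tier>,
    global: Option<Tier>,
    static_fallback_ns: u64,
) -> u64 {
    [per_edge, per_site, per_chain, global]
        .iter()
        .flatten() // skip tiers that have no entry at all
        .find_map(|tier| tier.warm_budget())
        .unwrap_or(static_fallback_ns)
}
```

A cold fine-grained tier does not block the walk: an edge with two samples is skipped in favour of a warm per-site estimate.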
The tracker also runs a φ-accrual failure detector per site, fed
by the same observe_round_trip path. The default
PhiAccrualState keeps a rolling history of 1000 inter-arrival
samples and ratchets through three states:
// from bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs:157-166
impl Default for PhiAccrualState {
fn default() -> Self {
Self {
inter_arrival_history: std::collections::VecDeque::new(),
history_capacity: 1000,
threshold_phi: 8.0,
down_phi: 16.0,
}
}
}
PhiState::Live (the default) is “healthy”. Suspect fires when
the current φ crosses threshold_phi = 8.0; Down fires when φ
crosses down_phi = 16.0. Between phases 5 and 6 of the poll cycle
the engine calls RttTracker::scan_phi(now_ns); per-site state
changes since the previous scan return as PhiTransition::Live,
Suspect, or Down entries, which the engine maps onto the typed
bus events InfraEvent::PeerLive, PeerSuspect, and PeerDown.
The per-entry last_phi_state ratchet is what keeps the engine
from re-emitting the same event on every poll while a peer stays
silent.
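A compact φ-accrual sketch shows how the thresholds and the ratchet interact. It rests on a loudly stated assumption: inter-arrival times are modeled as exponentially distributed, which gives φ(t) = t / (mean · ln 10). The original detector of Hayashibara et al. fits a normal distribution instead, and the text does not say which model the crate uses. PhiDetector, heartbeat, and scan are hypothetical names.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhiState {
    Live,
    Suspect,
    Down,
}

/// Hypothetical single-site detector.
pub struct PhiDetector {
    history: VecDeque<u64>,
    capacity: usize,
    threshold_phi: f64,
    down_phi: f64,
    last_arrival_ns: Option<u64>,
    last_state: PhiState,
}

impl PhiDetector {
    pub fn new() -> Self {
        Self {
            history: VecDeque::new(),
            capacity: 1000,
            threshold_phi: 8.0,
            down_phi: 16.0,
            last_arrival_ns: None,
            last_state: PhiState::Live,
        }
    }

    /// Records an arrival and the interval since the previous one.
    pub fn heartbeat(&mut self, now_ns: u64) {
        if let Some(prev) = self.last_arrival_ns {
            if self.history.len() == self.capacity {
                self.history.pop_front();
            }
            self.history.push_back(now_ns.saturating_sub(prev));
        }
        self.last_arrival_ns = Some(now_ns);
    }

    /// phi = -log10(P(silence this long)) under the exponential assumption.
    pub fn phi(&self, now_ns: u64) -> f64 {
        let last = match self.last_arrival_ns {
            Some(t) if !self.history.is_empty() => t,
            _ => return 0.0,
        };
        let mean = self.history.iter().sum::<u64>() as f64 / self.history.len() as f64;
        if mean <= 0.0 {
            return 0.0;
        }
        now_ns.saturating_sub(last) as f64 / (mean * std::f64::consts::LN_10)
    }

    /// Reports a state only when it differs from the last reported one.
    pub fn scan(&mut self, now_ns: u64) -> Option<PhiState> {
        let phi = self.phi(now_ns);
        let state = if phi >= self.down_phi {
            PhiState::Down
        } else if phi >= self.threshold_phi {
            PhiState::Suspect
        } else {
            PhiState::Live
        };
        if state == self.last_state {
            None
        } else {
            self.last_state = state;
            Some(state)
        }
    }
}
```

The last_state field plays the role of the ratchet: repeated scans during one long silence report Suspect once and Down once instead of on every poll.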
Snapshot, restore, clear
The engine survives process restart through Node::snapshot and
Node::restore. Snapshot captures the engine’s transient state plus
every bound concrete’s per-component bytes:
// from bytesandbrains/bb-runtime/src/node/mod.rs:176-184
pub fn snapshot(&self) -> Result<crate::snapshot::NodeSnapshot, crate::errors::SnapshotError> {
let queued = self.engine.bus.len();
if queued > 0 {
return Err(crate::errors::SnapshotError::BusNotDrained { queued, dropped: 0 });
}
Ok(self.snapshot_inner())
}
Snapshot refuses to proceed when the bus still carries un-drained
events; the host drives Node::poll to quiescence first and retries.
The captured NodeSnapshot round-trips through the host’s persistence
layer (typically prost or bincode) and lands at Node::restore,
which validates the spec version, reinstalls each captured graph,
restores each component’s bytes via the inventory-registered
restore_fn, replays the counters, the bus subscriptions, the
address book, the peer governor, the backoff table, the pending-async
entries, and the queued outbound envelopes, then bumps the
incarnation counter. The restored Node resumes from exactly where
the snapshot was taken.
Node::clear resets transient state without dropping bound
components or installed graphs. It is used in tests, for example to
recycle a Node gracefully between test rounds.
A minimal poll loop
The end-to-end shape of a driven Node, copied from the canonical HNSW example:
// from bytesandbrains/examples/custom_index_hnsw.rs:307-336
let query: Vec<f32> = vec![2.0, 2.1, 2.2, 2.3];
let query_bytes = bincode::serialize(&query)?;
let _ = node.ingress_handle().push(IngressEvent::Invoke {
module_name: artifact_module_name(&node),
inputs: vec![("query".into(), query_bytes)],
exec_id: bytesandbrains::ids::ExecId::from(0u64),
});
let waker = Waker::noop();
let mut cx = Context::from_waker(waker);
let mut total_steps = 0usize;
for _ in 0..40 {
match node.poll(&mut cx) {
std::task::Poll::Ready(steps) => {
if steps.is_empty() {
break;
}
total_steps += steps.len();
}
std::task::Poll::Pending => break,
}
}
The example pushes an Invoke onto the ingress, then loops
Node::poll against a no-op waker until the engine yields. Each
ready cycle accumulates a vector of steps; the host’s production
loop ships envelopes, surfaces app events, logs failures, and re-polls.
The same drive pattern with a real waker on a tokio runtime is what
the transport adapters use in production.
Run the example to see the async Index Contract dispatch end to end:
cargo run --example custom_index_hnsw --features test-components
Where this lives
- bytesandbrains/bb-runtime/src/engine/core.rs: the Engine struct, the fields, the install-time setup methods, the dispatch registries.
- bytesandbrains/bb-runtime/src/engine/poll.rs: the 8-phase poll cycle, the ingress router, the inbound envelope routing, handle_completion, handle_completion_failed, deadline expiry.
- bytesandbrains/bb-runtime/src/engine/invoke.rs: invoke_one, function-call splice, role-dispatcher registry, backend-subgraph dispatch.
- bytesandbrains/bb-runtime/src/engine/step.rs: the EngineStep enum and every observable variant.
- bytesandbrains/bb-runtime/src/engine/pending_async.rs: PendingAsync and ExecutionState.
- bytesandbrains/bb-runtime/src/engine/call_context.rs: the CallContext that backs bootstrap and FunctionCall dispatch.
- bytesandbrains/bb-runtime/src/ingress.rs: IngressQueue, IngressEvent, IngressQueueRef.
- bytesandbrains/bb-runtime/src/completion.rs: CompletionHandle, CompletionSink, ContractResponse.
- bytesandbrains/bb-runtime/src/node/mod.rs: the Node struct, poll, bootstrap, deliver_inbound, deliver_event, invoke, snapshot, restore, the introspection surface.
- bytesandbrains/bb-runtime/src/node/config.rs: NodeConfig, the defaults, the builder methods.
- bytesandbrains/bb-runtime/src/framework/peer_state.rs, peer_governor.rs, backoff_table.rs: the PeerState bundle, BlockReason, PeerHealth, LifecycleTransition, DEFAULT_FAILURE_THRESHOLD, the backoff schedule defaults.
- bytesandbrains/bb-runtime/src/framework/rtt_tracker.rs: RttEma (α = 1/8, β = 1/4), PhiAccrualState, PhiState, the five-tier fallback walk, the global-prior learning rate.
- bytesandbrains/bb-runtime/src/framework/outbound_queue.rs: the head-eviction FIFO and the OutboundDropped counter.
- bytesandbrains/src/install.rs: bb::install, Config, InstallError. The dedup walk lives at bytesandbrains/src/install.rs:524-571; the per-target FunctionKey queue lands on the engine via install_targets at bytesandbrains/src/install.rs:469-500.
- bytesandbrains/examples/custom_index_hnsw.rs: the end-to-end worker-thread pattern with async Contract dispatch.
- bytesandbrains/docs/ENGINE.md, bytesandbrains/docs/CONTRACT_DISPATCH.md: the bb-private architecture specs.