Gossip Peer Sampling: Rust Implementation Deep Dive
This post walks through the Rust implementation of the gossip peer sampling protocol in the bytesandbrains library. For the theory behind gossip protocols and the peer sampling service, see the overview post. For the foundational types this protocol builds on (PeerId, Peer, AddressBook, OverlayProtocol, PendingRequestManager, etc.), see The bytesandbrains Library.
The gossip protocol lives in the bb-overlay crate and is built on top of generic traits and types from bb-core. The implementation follows the T-Man protocol framework: the gossip loop (poll, message handling, request tracking) is written once, and the peer selection and view exchange strategies are pluggable via traits. The randomized gossip protocol is one instantiation of this framework. This post focuses on the gossip implementation itself.
Project Structure
bb-overlay/src/gossip/
├── mod.rs        # GossipSampling<A, P, S, X> protocol, GossipMessage, type aliases
├── config.rs     # GossipConfig, GossipMode
├── peer.rs       # GossipPeerType trait, AgePeer<A>
├── selector.rs   # PeerSelector trait, RandomizedSelector, RandomizedSelectorMode
├── exchange.rs   # ViewExchange trait, RandomizedExchange
└── proto/
    ├── gossip.proto     # Protobuf schema
    ├── mod.rs           # Generated code re-exports
    └── conversions.rs   # From/TryFrom implementations
Peer Types and Configuration
GossipPeerType and AgePeer<A>
The gossip framework defines a GossipPeerType trait that all peer types must implement, giving the framework access to the underlying Peer<A>:
pub trait GossipPeerType<A: Address>: Clone + fmt::Debug {
    fn peer(&self) -> &Peer<A>;
    fn peer_mut(&mut self) -> &mut Peer<A>;
    fn peer_id(&self) -> PeerId { self.peer().peer_id }
    fn from_address(address: A) -> Self;
}
The randomized gossip protocol uses AgePeer<A>, which wraps a Peer<A> with an age counter:
pub struct AgePeer<A: Address> {
    pub peer: Peer<A>,
    pub age: u32,
}
Construction derives the PeerId from the address and creates an AddressBook:
impl<A: Address> AgePeer<A> {
    pub fn new(address: A, age: u32) -> Self {
        let peer_id = PeerId::from_data(&address.to_string());
        let addresses = AddressBook::new(address, 5);
        Self {
            peer: Peer::new(peer_id, addresses),
            age,
        }
    }
}
The type alias GossipPeer<A> = AgePeer<A> is provided for convenience.
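The trait-plus-wrapper pattern is easy to see with stand-in types. The sketch below is illustrative only: PeerId, Peer, and the hash inside PeerId::from_data are simplified stand-ins for the bb-core originals, and peer_mut is omitted for brevity.

```rust
use std::fmt;

// Stand-in for bb-core's PeerId, derived deterministically from the address.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PeerId(u64);

impl PeerId {
    // Toy stand-in for PeerId::from_data (a simple FNV-1a hash).
    pub fn from_data(data: &str) -> Self {
        let mut h: u64 = 0xcbf29ce484222325;
        for b in data.bytes() {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
        PeerId(h)
    }
}

// Stand-in Peer<A>: the real one carries an AddressBook, not a bare address.
#[derive(Clone, Debug)]
pub struct Peer<A> {
    pub peer_id: PeerId,
    pub address: A,
}

// Trimmed-down GossipPeerType (peer_mut omitted).
pub trait GossipPeerType<A: Clone + ToString>: Clone + fmt::Debug {
    fn peer(&self) -> &Peer<A>;
    fn peer_id(&self) -> PeerId {
        self.peer().peer_id
    }
    fn from_address(address: A) -> Self;
}

#[derive(Clone, Debug)]
pub struct AgePeer<A> {
    pub peer: Peer<A>,
    pub age: u32,
}

impl<A: Clone + ToString + fmt::Debug> GossipPeerType<A> for AgePeer<A> {
    fn peer(&self) -> &Peer<A> {
        &self.peer
    }
    fn from_address(address: A) -> Self {
        // New peers start at age 0, mirroring AgePeer::new(address, 0).
        let peer_id = PeerId::from_data(&address.to_string());
        AgePeer { peer: Peer { peer_id, address }, age: 0 }
    }
}
```

The blanket default body for peer_id() is what lets the gossip loop call p.peer_id() on any peer type without knowing its concrete shape.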
GossipConfig
Protocol-level parameters are captured in GossipConfig. Strategy-specific configuration (healing, swap) lives in the exchange type, not here:
pub struct GossipConfig {
    pub view_size: usize,
    pub mode: GossipMode,
    pub max_concurrent_requests: usize,
    pub retry_time: Duration,
    pub request_timeout: Duration,
    pub poll_interval: Duration,
}
| Parameter | Default | Description |
|---|---|---|
| view_size | 10 | Maximum number of peers in the view |
| mode | PushPull | Gossip exchange mode |
| max_concurrent_requests | 3 | Maximum number of in-flight requests before throttling |
| retry_time | 2s | Time after which a pending request is retried to a different peer |
| request_timeout | 5s | Time after which a pending request is fully timed out and discarded |
| poll_interval | 20s | Minimum time between starting new gossip rounds |
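Assuming the crate provides a Default impl matching the table (an assumption; the post doesn't show one), it would look roughly like this:

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GossipMode { Push, Pull, PushPull }

#[derive(Clone, Debug)]
pub struct GossipConfig {
    pub view_size: usize,
    pub mode: GossipMode,
    pub max_concurrent_requests: usize,
    pub retry_time: Duration,
    pub request_timeout: Duration,
    pub poll_interval: Duration,
}

// Hypothetical Default impl mirroring the table above.
impl Default for GossipConfig {
    fn default() -> Self {
        Self {
            view_size: 10,
            mode: GossipMode::PushPull,
            max_concurrent_requests: 3,
            retry_time: Duration::from_secs(2),
            request_timeout: Duration::from_secs(5),
            poll_interval: Duration::from_secs(20),
        }
    }
}
```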
GossipMode and GossipMessage
The exchange mode determines what data flows in each direction:
pub enum GossipMode {
    Push,
    Pull,
    PushPull,
}
Protocol messages are generic over the peer type P:
pub enum GossipMessage<P: Clone> {
    Request {
        request_id: RequestId,
        mode: GossipMode,
        view: Vec<P>,
    },
    Response {
        request_id: RequestId,
        view: Vec<P>,
    },
}
Both variants carry a request_id so responses can be matched to their originating request. A Request also carries the mode (so the receiver knows how to respond) and optionally a view (empty for Pull-only).
GossipEvent
The protocol emits events to notify higher layers of view changes and request failures:
pub enum GossipEvent<A: Address> {
    PeerAdded(Peer<A>),
    PeerRemoved(Peer<A>),
    RequestTimeout(PeerId),
}
PeerAdded and PeerRemoved are emitted whenever an exchange changes the view. RequestTimeout fires when a pending request exceeds request_timeout without receiving a response.
Peer Selection Strategies
The PeerSelector trait defines how a peer is chosen for gossip exchange. Each selector defines its own Mode type, which is exposed through PeerSampling::SamplingMode so callers can choose a strategy at runtime:
pub trait PeerSelector<P, A>: Clone + fmt::Debug
where
    A: Address,
    P: GossipPeerType<A>,
{
    type Mode: Clone + fmt::Debug;

    fn select(&self, mode: &Self::Mode, peers: &[P], rng: &mut ThreadRng) -> Option<usize>;

    fn select_excluding(
        &self,
        mode: &Self::Mode,
        peers: &[P],
        exclude: &HashSet<PeerId>,
        rng: &mut ThreadRng,
    ) -> Option<usize>;

    fn default_mode(&self) -> Self::Mode;
}
The GossipPeerType<A> bound on P means select_excluding can call p.peer_id() directly to check against the exclusion set, with no closures or dynamic dispatch needed.
The default_mode() method provides the mode used by the gossip loop for internal peer selection (initiating exchanges).
RandomizedSelector
The randomized gossip protocol uses RandomizedSelector with two modes:
pub enum RandomizedSelectorMode {
    Tail,
    UniformRandom,
}
| Strategy | Behavior |
|---|---|
| Tail | Selects the peer with the highest age. Promotes diversity by talking to nodes we haven’t heard from recently. |
| UniformRandom | Selects a random peer uniformly from the view. Simple and fair but may repeat selections. |
The implementation dispatches based on the mode variant:
- Tail finds the index with maximum age via max_by_key
- UniformRandom generates a random index with rng.gen_range()
default_mode() returns Tail.
Exclusion-Aware Selection
Each mode also handles select_excluding(), which skips peers whose PeerId is in a given exclusion set. This is critical for the request tracking system: when sending a retry or new request, we exclude peers that already have pending requests to avoid duplicate in-flight requests to the same peer.
Each mode handles exclusion differently:
- Tail adds a .filter() before max_by_key to skip excluded peers
- UniformRandom builds an eligible index list, then picks randomly from it
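The deterministic Tail mode is easy to sketch with plain (id, age) pairs; the filter-before-max_by_key shape mirrors the description above. The tuple representation and function names are illustrative, not the library's API:

```rust
use std::collections::HashSet;

type PeerId = u64;

/// Tail selection: index of the peer with the highest age.
fn select_tail(peers: &[(PeerId, u32)]) -> Option<usize> {
    peers
        .iter()
        .enumerate()
        .max_by_key(|&(_, &(_, age))| age)
        .map(|(i, _)| i)
}

/// Tail selection that skips peers with pending requests: the same
/// fold, with a filter in front of max_by_key.
fn select_tail_excluding(peers: &[(PeerId, u32)], exclude: &HashSet<PeerId>) -> Option<usize> {
    peers
        .iter()
        .enumerate()
        .filter(|&(_, &(id, _))| !exclude.contains(&id))
        .max_by_key(|&(_, &(_, age))| age)
        .map(|(i, _)| i)
}
```

If every peer is excluded (everyone has a pending request), selection returns None and the caller simply skips the round.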
View Exchange Strategies
The ViewExchange trait controls how the view is prepared for sending and how received peers are integrated:
pub trait ViewExchange<P, A>: Clone + fmt::Debug
where
    A: Address,
    P: GossipPeerType<A>,
{
    fn prepare_tx(
        &self,
        local: &P,
        peers: &mut Vec<P>,
        max_size: usize,
        rng: &mut ThreadRng,
    ) -> Vec<P>;

    fn integrate_rx(
        &self,
        peers: &mut Vec<P>,
        lut: &mut HashMap<PeerId, usize>,
        incoming: Vec<P>,
        local_peer_id: &PeerId,
        max_size: usize,
        rng: &mut ThreadRng,
    ) -> (Vec<P>, Vec<P>);

    fn on_round_complete(&self, peers: &mut Vec<P>);
}
RandomizedExchange
The randomized gossip protocol uses RandomizedExchange, which carries its own configuration:
pub struct RandomizedExchange {
    pub healing: usize, // default 5
    pub swap: usize,    // default 5
}
| Parameter | Default | Description |
|---|---|---|
| healing | 5 | Number of oldest peers to evict first during overflow |
| swap | 5 | Number of head peers to evict second during overflow |
Preparing Outgoing Gossip: prepare_tx
When it’s time to gossip, prepare_tx() builds the outgoing buffer:
- Push local peer first: Always include ourselves so the recipient knows how to reach us
- Permute: Randomly shuffle the view to avoid biased selection
- Move old to back: Sort the healing oldest peers to the end, so they're least likely to be included
- Take from head: Grab up to view_size / 2 - 1 peers from the front (the freshest after shuffle + sort)
The buffer size is view_size / 2, half the view. This bounds bandwidth while giving enough information for the receiver to work with.
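A rough, deterministic sketch of the pipeline over (id, age) pairs. The real prepare_tx shuffles with ThreadRng and only relocates the healing oldest entries; this sketch stubs out the shuffle and sorts the whole view instead, so treat it as a shape illustration only:

```rust
/// Sketch of prepare_tx: local peer first, oldest peers pushed to the
/// back, then up to view_size / 2 - 1 peers taken from the head.
fn prepare_tx(
    local: (u64, u32),
    peers: &mut Vec<(u64, u32)>,
    view_size: usize,
    healing: usize,
) -> Vec<(u64, u32)> {
    // 1. Shuffle would happen here (peers.shuffle(rng)); omitted so the
    //    example stays deterministic and stdlib-only.
    // 2. Move old to back. Simplification: a full sort by age; the real
    //    code only moves the `healing` oldest, preserving shuffle order
    //    for the rest.
    if healing > 0 && peers.len() > healing {
        peers.sort_by_key(|&(_, age)| age); // oldest end up at the back
    }
    // 3. Local peer first, then take from the head, for a buffer of at
    //    most view_size / 2 entries total.
    let mut tx = vec![local];
    tx.extend(peers.iter().take((view_size / 2).saturating_sub(1)).copied());
    tx
}
```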
Receiving Gossip: integrate_rx
When peers arrive from another node, integrate_rx() integrates them using a three-stage eviction cascade and returns tuples of (added, removed) peers:
incoming peers
       |
       v
+--------------+
| append_many  |  Add all new peers (skip self, reset age on duplicates)
+------+-------+  -> returns list of newly added peers
       |
       v  still over max_size?
+--------------+
|  remove_old  |  Evict up to `healing` oldest peers
+------+-------+  -> returns removed peers
       |
       v  still over max_size?
+--------------+
| remove_head  |  Evict up to `swap` peers from the front
+------+-------+  -> returns removed peers
       |
       v  still over max_size?
+--------------+
| remove_random|  Evict remaining excess at random
+--------------+  -> returns removed peers
This cascade prioritizes removing stale peers (high age), then recently gossiped peers (head of the shuffled list), and finally random peers as a last resort. The healing and swap parameters control how aggressive each stage is.
The return value is what enables the event system. Each removal step returns the peers it evicted, and the protocol converts these into PeerAdded and PeerRemoved events.
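The first stage of the cascade can be sketched as follows. As before, (id, age) pairs stand in for AgePeer, and this remove_old is a guess at the shape, not the actual implementation:

```rust
/// Evict up to `healing` of the oldest peers, but only while the view
/// is still over `max_size`. Returns the evicted peers so the caller
/// can emit PeerRemoved events.
fn remove_old(peers: &mut Vec<(u64, u32)>, healing: usize, max_size: usize) -> Vec<(u64, u32)> {
    let mut removed = Vec::new();
    for _ in 0..healing {
        if peers.len() <= max_size {
            break; // within bounds; the later cascade stages won't run either
        }
        // Find and remove the current oldest peer.
        if let Some(idx) = peers
            .iter()
            .enumerate()
            .max_by_key(|&(_, &(_, age))| age)
            .map(|(i, _)| i)
        {
            removed.push(peers.remove(idx));
        }
    }
    removed
}
```

remove_head and remove_random follow the same contract: trim only while over max_size, and report what was evicted.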
on_round_complete
After each exchange, on_round_complete() increments the age of every peer in the view by 1. This is what drives the age-based eviction in the randomized protocol: peers that are actively gossiping get their age reset by incoming descriptors, while silent peers gradually age out.
Duplicate Handling and Age Reset
When a peer that’s already in the view is received again, the view doesn’t add a duplicate. Instead, it resets the age if the incoming copy is younger:
if let Some(&idx) = lut.get(&pid) {
    if peers[idx].age > peer.age {
        peers[idx].age = peer.age;
    }
} else {
    let new_idx = peers.len();
    lut.insert(pid, new_idx);
    added.push(peer.clone());
    peers.push(peer);
}
This is important: if a peer is actively gossiping, its descriptor will circulate with low age values. If it goes silent, its age climbs and eventually it gets evicted by the healing mechanism. The age-reset-on-duplicate behavior keeps active peers fresh.
The GossipSampling Protocol
GossipSampling<A, P, S, X> ties everything together. It is generic over the address type, peer type, selector strategy, and exchange strategy. It implements the OverlayProtocol and PeerSampling traits from bb-core.
pub struct GossipSampling<A, P, S, X>
where
    A: Address,
    P: GossipPeerType<A>,
    S: PeerSelector<P, A>,
    X: ViewExchange<P, A>,
{
    local_peer: P,
    peers: Vec<P>,
    lut: HashMap<PeerId, usize>,
    config: GossipConfig,
    selector: S,
    exchange: X,
    pending: RequestTracker<Instant>,
    last_round_start: Option<Instant>,
    rng: ThreadRng,
}
The standard randomized gossip protocol is a type alias:
pub type RandomizedGossip<A> = GossipSampling<A, AgePeer<A>, RandomizedSelector, RandomizedExchange>;
The peer view is stored directly as Vec<P> with a HashMap<PeerId, usize> lookup table, rather than in a separate View struct. The selector and exchange strategies operate on these directly.
Construction
impl<A: Address> RandomizedGossip<A> {
    pub fn new_randomized(address: A, config: GossipConfig) -> Self {
        Self::new(address, config, RandomizedSelector, RandomizedExchange::default())
    }
}
The local peer starts with age 0. The RequestTracker is configured with the request_timeout duration, and entries older than this are automatically drained on the next process_timeouts() call.
Request Tracking
The RequestTracker<Instant> from bb-core tracks which peers have outstanding requests and when those requests were sent. This enables three capabilities:
- Duplicate prevention: Before sending a request, the protocol collects all pending peer IDs into an exclusion set and selects a peer not in that set.
- Retry logic: Requests older than retry_time are retried to a different peer, while the original stays pending.
- Timeout detection: Requests older than request_timeout are fully removed and RequestTimeout events are emitted.
The helper method that wires this together:
fn send_request_excluding(
    &mut self,
    exclude: &HashSet<PeerId>,
    now: Instant,
) -> Option<OutMessage<Self>> {
    let default_mode = self.selector.default_mode();
    let target = self.select_peer_excluding(&default_mode, exclude)?.peer().clone();
    let tx = self.build_tx();
    let request_id = self.pending.insert(target.peer_id, now);
    let msg = GossipMessage::Request {
        request_id,
        mode: self.config.mode,
        view: tx,
    };
    Some(OutMessage {
        destination: target,
        message: msg,
    })
}
send_request_excluding() is the single point where outgoing requests are created in Pull/PushPull mode. It uses the selector’s default_mode() to pick a peer (skipping excluded ones), builds the gossip buffer via the exchange strategy, records the request in pending, and returns the OutMessage.
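The retry/timeout split can be illustrated with a bare map of pending requests. The real RequestTracker also tracks request IDs per peer; classify is an invented helper name for the sketch:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

type PeerId = u64;

/// Partition pending requests into (needs_retry, timed_out) by age.
fn classify(
    pending: &HashMap<PeerId, Instant>,
    now: Instant,
    retry_time: Duration,
    request_timeout: Duration,
) -> (Vec<PeerId>, Vec<PeerId>) {
    let mut retry = Vec::new();
    let mut timed_out = Vec::new();
    for (&peer, &sent_at) in pending {
        let age = now.duration_since(sent_at);
        if age >= request_timeout {
            timed_out.push(peer); // drop entry, emit RequestTimeout
        } else if age >= retry_time {
            retry.push(peer); // retry via a *different* peer; original stays pending
        }
    }
    (retry, timed_out)
}
```

With the default config (retry_time 2s, request_timeout 5s), a request that is 3 seconds old triggers a retry but is not yet discarded.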
poll(): Push Mode
Push mode is handled separately by poll_push() because it’s fundamentally different: fire-and-forget with no request tracking:
fn poll_push(&mut self) -> Step<Self> {
    let now = Instant::now();
    let should_start_round = match self.last_round_start {
        None => true,
        Some(last) => now.duration_since(last) >= self.config.poll_interval,
    };
    if !should_start_round {
        return Step::new();
    }
    let default_mode = self.selector.default_mode();
    let target = match self.select_peer_ref(&default_mode) {
        Some(p) => p.peer().clone(),
        None => return Step::new(),
    };
    let tx = self.build_tx();
    self.exchange.on_round_complete(&mut self.peers);
    self.last_round_start = Some(now);
    // ...
}
Push mode calls on_round_complete() (which increments ages) immediately after sending. The round is complete the moment the message is dispatched. No response is expected, so no tracking is needed. The poll_interval gates how frequently rounds start.
poll(): Pull and PushPull
For Pull and PushPull modes, poll() runs a 3-phase pipeline:
fn poll(&mut self) -> Step<Self> {
    if self.config.mode == GossipMode::Push {
        return self.poll_push();
    }
    let now = Instant::now();
    let mut step = Step::new();

    // Phase 1: Process full timeouts.
    let timed_out = self.pending.process_timeouts();
    for (key, _sent_at) in &timed_out {
        step.events.push(GossipEvent::RequestTimeout(key.peer_id));
    }
    if !timed_out.is_empty() {
        self.exchange.on_round_complete(&mut self.peers);
    }

    // Phase 2: Send retry requests for stale entries.
    // ...selects different peers via send_request_excluding()

    // Phase 3: Start new periodic round.
    // ...gated by poll_interval and max_concurrent_requests

    step
}
Phase 1: Timeout processing. The RequestTracker drains all entries whose absolute timeout has elapsed. For each timed-out peer, a RequestTimeout event is emitted. Crucially, on_round_complete() is called on timeout, because a failed round is still a round, and stale entries should age out regardless of whether the exchange succeeded.
Phase 2: Retries. Pending requests older than retry_time (but not yet fully timed out) get a retry: a new request is sent to a different peer selected via send_request_excluding(). The exclusion set contains all currently pending peer IDs, so retries always target fresh peers. The max_concurrent_requests cap is respected; if we’re at capacity, no more retries are sent.
Phase 3: New round. If poll_interval has elapsed since the last round started and we’re under max_concurrent_requests, a new round is initiated. Again, the peer is selected excluding all pending targets.
on_message()
on_message() handles all incoming message types. Each arm that receives peers calls integrate_and_advance(), which delegates to the exchange strategy’s integrate_rx() followed by on_round_complete():
fn on_message(&mut self, from: Peer<A>, msg: GossipMessage<P>) -> Step<Self> {
    match msg {
        GossipMessage::Request { mode: GossipMode::Push, view, .. } => {
            let (added, removed) = self.integrate_and_advance(view);
            // emit PeerAdded/PeerRemoved events
        }
        GossipMessage::Request { mode: GossipMode::Pull, request_id, .. } => {
            let tx = self.build_tx();
            self.exchange.on_round_complete(&mut self.peers);
            // send Response with tx
        }
        GossipMessage::Request { mode: GossipMode::PushPull, request_id, view } => {
            let tx = self.build_tx();
            let (added, removed) = self.integrate_and_advance(view);
            // send Response with tx, emit events
        }
        GossipMessage::Response { request_id, view } => {
            if self.pending.remove(&from.peer_id, &request_id).is_some() {
                let (added, removed) = self.integrate_and_advance(view);
                // emit events
            }
        }
    }
}
Two key design points:
View diff events. Every arm that integrates peers collects the (added, removed) tuples and converts them to PeerAdded/PeerRemoved events. This lets higher layers react to view changes. For example, a DHT protocol built on top of gossip could update routing tables when peers are discovered or lost.
Response guard. The Response arm guards on self.pending.remove(&from.peer_id, &request_id).is_some(). If the sender doesn’t have a pending request with a matching ID, the response is silently discarded. This rejects untracked responses (a peer we never sent a request to) and late responses (a peer whose request already timed out).
One key subtlety in the PushPull request handler: build_tx() is called before integrate_and_advance(), so the reply never echoes back the peers we just received.
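The ordering can be demonstrated with a toy handler over plain u64 peer IDs (handle_push_pull and its naive merge are invented for the sketch):

```rust
/// Snapshot the outgoing buffer BEFORE merging the incoming view,
/// so the reply reflects our pre-merge state.
fn handle_push_pull(view: &mut Vec<u64>, incoming: Vec<u64>, max_tx: usize) -> Vec<u64> {
    // build_tx first: the reply is drawn from the view as it was.
    let tx: Vec<u64> = view.iter().take(max_tx).copied().collect();
    // Then integrate: naive dedup merge standing in for integrate_rx.
    for p in incoming {
        if !view.contains(&p) {
            view.push(p);
        }
    }
    tx
}
```

Flipping the two steps would let freshly received peer descriptors leak straight back to their sender, wasting the exchange.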
PeerSampling
The PeerSampling implementation is generic over all strategy types. SamplingMode is bound to the selector’s Mode type, so callers choose their selection strategy at each call site:
impl<A, P, S, X> PeerSampling for GossipSampling<A, P, S, X>
where
    A: Address,
    P: GossipPeerType<A>,
    S: PeerSelector<P, A>,
    X: ViewExchange<P, A>,
{
    type Peer = P;
    type PeerView<'a> = GossipViewRef<'a, P> where Self: 'a;
    type SamplingMode = S::Mode;
    type SelectPeerRef<'a> = GossipSelectPeerRef<P> where Self: 'a;

    fn view(&self) -> Self::PeerView<'_> {
        GossipViewRef { peers: &self.peers }
    }

    fn view_len(&self) -> usize {
        self.peers.len()
    }

    fn select_peer(&mut self, mode: &Self::SamplingMode) -> Self::SelectPeerRef<'_> {
        GossipSelectPeerRef {
            result: self.select_peer_ref(mode).cloned(),
        }
    }

    fn broadcast(&self) -> Vec<Self::Peer> {
        self.peers.clone()
    }
}
For the randomized gossip protocol, this means callers use select_peer(&RandomizedSelectorMode::Tail) for age-biased selection or select_peer(&RandomizedSelectorMode::UniformRandom) for uniform random, without needing to reconfigure the protocol.
Protocol Buffers and Serialization
The protocol uses protobuf for wire serialization. The schema defines data messages only, with no RPC service definition. Keeping serialization separate from transport means the same protocol logic works over TCP, QUIC, WebRTC, or an in-memory test harness without touching the proto layer:
The proto schema mirrors the trait boundaries. Protocol-level config, exchange strategy config, and peer data each have their own message:
// Protocol-level config (maps to GossipConfig)
message GossipConfigProto {
  uint32 view_size = 1;
  GossipModeProto mode = 4;
  uint32 max_concurrent_requests = 6;
  DurationProto retry_time = 7;
  DurationProto request_timeout = 8;
  DurationProto poll_interval = 9;
}

// Exchange strategy config (maps to RandomizedExchange)
message RandomizedExchangeConfigProto {
  uint32 healing = 1;
  uint32 swap = 2;
}

// Shared peer envelope: each peer type reads the fields it needs.
// AgePeer uses `peer` + `age`. Future types may add fields.
message GossipPeerProto {
  bb_core.PeerProto peer = 1;
  uint32 age = 2;
}

enum RandomizedSelectorModeProto {
  TAIL = 0;
  UNIFORM_RANDOM = 1;
}
Each strategy type owns its proto conversions. GossipConfig serializes to/from GossipConfigProto. RandomizedExchange serializes to/from RandomizedExchangeConfigProto. AgePeer serializes to/from GossipPeerProto. The boundaries are clean: each type knows which proto fields are required for its serialization.
The GossipPeerType trait enforces proto convertibility at the type level. When the proto feature is enabled, implementing GossipPeerType<A> requires Into<GossipPeerProto> and TryFrom<GossipPeerProto> as supertraits. This means GossipMessage serialization is generic over any peer type:
impl<P> From<GossipMessage<P>> for GossipMessageProto
where
    P: Clone + Into<GossipPeerProto>,
{
    fn from(msg: GossipMessage<P>) -> Self {
        match msg {
            GossipMessage::Request { request_id, mode, view } => GossipMessageProto {
                message: Some(gossip_message_proto::Message::Request(
                    GossipRequestProto {
                        request_id: request_id.0,
                        mode: GossipModeProto::from(mode).into(),
                        view: view.into_iter().map(|p| p.into()).collect(),
                    },
                )),
            },
            // ...
        }
    }
}
The conversion just calls .into() on each peer in the view. It doesn’t need to know whether it’s an AgePeer or some future peer type. Any P satisfying GossipPeerType<A> can be serialized, with no runtime dispatch.
Domain-to-proto conversions are infallible (From); proto-to-domain conversions are fallible (TryFrom), since wire data might be malformed. Because every proto3 field is optional, a future peer type (like an EmbeddingPeer) would add new fields to GossipPeerProto and implement TryFrom to read the fields it needs while ignoring the rest.
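A toy version of the fallible direction, with hand-rolled stand-ins for the generated proto types (real prost-generated types and field shapes differ):

```rust
// Stand-in for a generated proto message: message-typed fields arrive
// as Options, so TryFrom has to reject missing ones.
#[derive(Clone, Debug, PartialEq)]
struct PeerProto {
    peer_id: Option<u64>,
}

#[derive(Clone, Debug)]
struct GossipPeerProto {
    peer: Option<PeerProto>,
    age: u32,
}

#[derive(Clone, Debug, PartialEq)]
struct AgePeer {
    peer_id: u64,
    age: u32,
}

impl TryFrom<GossipPeerProto> for AgePeer {
    type Error = String;
    fn try_from(p: GossipPeerProto) -> Result<Self, Self::Error> {
        // AgePeer reads only the fields it needs; extra proto fields
        // added for future peer types would simply be ignored here.
        let peer = p.peer.ok_or("missing peer")?;
        let peer_id = peer.peer_id.ok_or("missing peer_id")?;
        Ok(AgePeer { peer_id, age: p.age })
    }
}
```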
Reflections
The T-Man abstraction, with PeerSelector and ViewExchange as pluggable traits, means new topology strategies can be implemented without touching the gossip loop. A proximity-based overlay, a latency-optimizing overlay, or any other ranking function slots in by implementing two traits and a peer type.