The bytesandbrains Library


Every major AI system today depends on a central server. Vector databases, model registries, inference endpoints: they all assume someone is running the cluster. That works until it doesn’t. Edge devices generate data locally, need answers locally, and have no reason to route everything through a datacenter thousands of miles away.

bytesandbrains is a Rust library for building decentralized AI systems. It provides composable, transport-agnostic building blocks for P2P overlay networking, vector search, and distributed optimization. Nodes can discover each other through gossip, search for peers by semantic similarity, and compress high-dimensional vectors without central coordination. The library aggregates the tools from our ongoing exploration of decentralized AI.

Quick Start

The bytesandbrains crate on crates.io is the main entry point. It re-exports all sub-crates under feature flags.

cargo add bytesandbrains --features full

Or in your Cargo.toml:

[dependencies]
bytesandbrains = { version = "0.1", features = ["full"] }

Feature Flags

Feature    Description
overlay    Overlay protocol base (pulled in by protocols)
gossip     Gossip peer sampling protocol (implies overlay)
codec      Product quantization codec
index      Vector indexing structures
ml         ML utilities (k-means)
proto      Protobuf serialization
simd       SIMD-accelerated distance functions
full       Enable everything

Pick what you need. For example, if you only want gossip with protobuf serialization: features = ["gossip", "proto"]. Protocol features like gossip imply overlay, so you rarely need to enable it by hand.

Example: Gossip Peer Sampling

This sets up a gossip node, bootstraps it with peers, and runs one round of the protocol:

use bytesandbrains::core::OverlayProtocol;
use bytesandbrains::overlay::gossip::{RandomizedGossip, GossipConfig, AgePeer};

fn main() {
    // Create a randomized gossip node
    let config = GossipConfig::default();
    let mut node = RandomizedGossip::new_randomized("10.0.0.1:5000".to_string(), config);

    // Bootstrap with known peers
    let peers: Vec<AgePeer<String>> = (0..5)
        .map(|i| AgePeer::new(format!("10.0.0.{}:5000", i + 10), 0))
        .collect();
    node.rx_nodes(peers);

    // Poll to advance the protocol
    let step = node.poll();

    // step.messages contains outgoing messages to send
    for out_msg in step.messages {
        println!("Send to {}: {:?}", out_msg.destination.peer_id, out_msg.message);
    }
}

The protocol never touches a socket. It produces OutMessages that your transport layer delivers, so the same logic works over TCP, QUIC, WebRTC, or an in-memory test harness.

Example: Embedding Spaces and Distance

Computing distances between vectors using the built-in L2 and cosine spaces:

use bytesandbrains::core::embedding::{
    F32L2Space, F32CosineSpace, F32Embedding, EmbeddingSpace,
};

fn main() {
    // L2 (squared Euclidean) distance
    let l2 = F32L2Space::<3>;
    let a = F32Embedding::<3>([1.0, 2.0, 3.0]);
    let b = F32Embedding::<3>([4.0, 5.0, 6.0]);
    let dist = l2.distance(&a, &b);
    println!("L2 distance: {:?}", dist); // F32Distance(27.0)

    // Cosine distance (0 = identical, 2 = opposite)
    let cosine = F32CosineSpace::<3>;
    let dist = cosine.distance(&a, &b);
    println!("Cosine distance: {:?}", dist);
}

Architecture

The library is organized as a Cargo workspace with five sub-crates, each re-exported through the umbrella bytesandbrains crate:

Module                    Crate                     Role
bytesandbrains::core      bytesandbrains-core       Shared traits and types
bytesandbrains::overlay   bytesandbrains-overlay    Protocol implementations
bytesandbrains::codec     bytesandbrains-codec      Vector compression
bytesandbrains::index     bytesandbrains-index      Vector indexing
bytesandbrains::ml        bytesandbrains-ml         ML utilities

Everything builds on a small set of core traits.

OverlayProtocol and Step

Every overlay protocol advances via poll() (timers and round initiation) and on_message() (incoming messages). Both return a Step bundling outgoing messages and events. The trait also exposes peer management and bootstrap so the transport layer can feed connection state back in:

pub trait OverlayProtocol {
    type Address: Address;
    type Message;
    type Event;
    type Peer: Clone;
    type BootstrapConfig;
    type BootstrapRef<'a>: OpRef where Self: 'a;

    const PROTOCOL_ID: &'static str;

    fn poll(&mut self) -> Step<Self>;
    fn on_message(&mut self, from: Peer<Self::Address>, msg: Self::Message) -> Step<Self>;
    fn local_peer_id(&self) -> &PeerId;

    fn bootstrap(&mut self, config: Self::BootstrapConfig) -> Self::BootstrapRef<'_>;
    fn on_connection_failed(&mut self, peer_id: &PeerId) -> Step<Self>;

    fn add_peer(&mut self, peer: Self::Peer) -> Step<Self>;
    fn remove_peer(&mut self, peer_id: &PeerId) -> Option<Self::Peer>;
    fn add_address(&mut self, peer_id: &PeerId, address: Self::Address);
    fn remove_address(&mut self, peer_id: &PeerId, address: &Self::Address) -> Option<Self::Peer>;
}

The protocol never opens a connection or binds a port. It produces typed messages, and your transport layer delivers them. Connection failures and peer discovery results flow back through on_connection_failed, add_peer, and friends. This keeps protocols testable in isolation and portable across any network stack.

The BootstrapRef<'_> associated type is an example of the library’s operation-handle pattern: anything that might complete asynchronously (bootstrap, index query, codec encode) returns an OpRef handle. In-process implementations finish synchronously; networked implementations use the same handle to track an in-flight RPC. Callers poll is_finished() or wait for events, then call finish() to take the result.

PeerSampling

A capability trait for protocols that maintain a view of known peers and can sample from it. It’s a standalone trait rather than an OverlayProtocol supertrait, so codecs and other non-overlay components can also expose peer sampling when it makes sense:

pub trait PeerSampling {
    type Peer: Clone;
    type PeerView<'a> where Self: 'a;
    type SamplingMode;
    type SelectPeerRef<'a>: OpRef where Self: 'a;

    fn view(&self) -> Self::PeerView<'_>;
    fn view_len(&self) -> usize;
    fn select_peer(&mut self, mode: &Self::SamplingMode) -> Self::SelectPeerRef<'_>;
    fn broadcast(&self) -> Vec<Self::Peer>;
}

The SamplingMode associated type lets each protocol define its own selection strategies (random, tail, weighted) without baking the strategy into the trait. select_peer returns an OpRef handle so networked samplers can defer selection until a remote round-trip completes, while in-process samplers finish synchronously. broadcast returns the full view in one shot for fan-out.

Embedding, EmbeddingSpace, and Distance

The vector space abstraction layer. Embedding defines a fixed-dimensionality vector. EmbeddingSpace ties an embedding type to a distance metric and defines how to compute distances:

pub trait EmbeddingSpace: Clone + PartialEq + Eq + fmt::Debug + Send + Sync {
    type EmbeddingData: Embedding;
    type DistanceValue: Distance;
    type Prepared: Clone;

    fn distance(&self, lhs: &Self::EmbeddingData, rhs: &Self::EmbeddingData) -> Self::DistanceValue;
    fn prepare(&self, embedding: &Self::EmbeddingData) -> Self::Prepared;
    fn distance_prepared(&self, prepared: &Self::Prepared, target: &Self::EmbeddingData) -> Self::DistanceValue;
}

The Prepared associated type allows precomputing state for efficient batch queries. The Distance trait provides total ordering (including consistent NaN handling) so distance values work correctly as map keys and in sorted structures.

Index and Codec

Index defines vector storage and similarity search. Codec defines vector compression. Every mutating or query operation returns an OpRef handle, so the same trait describes in-process indices (synchronous completion) and networked indices (handles tracking a remote RPC).

pub trait Index<S: EmbeddingSpace> {
    type Value: Clone;
    type SearchType;
    type AddType;
    type RemoveType;
    type TrainType;
    type SearchRef<'a>: OpRef where Self: 'a;
    type AddRef<'a>: OpRef where Self: 'a;
    type RemoveRef<'a>: OpRef where Self: 'a;
    type TrainRef<'a>: OpRef where Self: 'a;
    type ObserveRef<'a>: OpRef where Self: 'a;

    fn search(&mut self, search_embedding: &S::EmbeddingData, search_type: &Self::SearchType) -> Self::SearchRef<'_>;
    fn add(&mut self, embedding: &S::EmbeddingData, value: Self::Value, add_type: &Self::AddType) -> Self::AddRef<'_>;
    fn remove(&mut self, embedding: &S::EmbeddingData, remove_type: &Self::RemoveType) -> Self::RemoveRef<'_>;
    fn train(&mut self, data: &[S::EmbeddingData], train_type: &Self::TrainType) -> Self::TrainRef<'_>;
    fn observe(&mut self, embedding: &S::EmbeddingData) -> Self::ObserveRef<'_>;
    fn reset(&mut self);
    fn len(&self) -> usize;
    fn is_trained(&self) -> bool;
    fn is_empty(&self) -> bool;
}

pub trait Codec<S: EmbeddingSpace> {
    type Encoded: Clone;
    type EncodeRef<'a>: OpRef where Self: 'a;
    type DecodeRef<'a>: OpRef where Self: 'a;
    type TrainRef<'a>: OpRef where Self: 'a;
    type ObserveRef<'a>: OpRef where Self: 'a;

    fn encode(&mut self, embedding: &S::EmbeddingData) -> Self::EncodeRef<'_>;
    fn encode_batch(&mut self, embeddings: &[S::EmbeddingData]) -> Vec<Self::EncodeRef<'_>>;
    fn decode(&self, encoded: &Self::Encoded) -> Self::DecodeRef<'_>;
    fn decode_batch(&self, encoded: &[Self::Encoded]) -> Vec<Self::DecodeRef<'_>>;
    fn code_size(&self) -> Option<usize>;
    fn train(&mut self, embeddings: &[S::EmbeddingData]) -> Self::TrainRef<'_>;
    fn observe(&mut self, embedding: &S::EmbeddingData) -> Self::ObserveRef<'_>;
    fn observe_batch(&mut self, embeddings: &[S::EmbeddingData]) -> Vec<Self::ObserveRef<'_>>;
    fn is_trained(&self) -> bool;
}

These traits are the extension points for custom index and compression strategies. The library ships with FlatIndex (brute-force search) and ProductQuantizer (vector quantization), and the same trait surface covers streaming observe/observe_batch updates alongside batch train. See Product Quantization: Vector Compression for Fast Search for details on PQ.

Protocols

Gossip Peer Sampling maintains a bounded, self-healing view of known peers through periodic gossip exchanges. It provides randomized peer discovery for any overlay that needs it. For the full walkthrough, see Gossip Peer Sampling Service and the Rust implementation deep dive.

GOSSIP-PQ is a decentralized protocol for learning globally representative Product Quantization codebooks via gossip-based subspace exchange. It enables distributed vector compression without central coordination. This one is still in development.

Further Reading