HNSW Index

LatticeDB uses HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search. This chapter explains how the algorithm works and how to tune it for your use case.

Algorithm Overview

HNSW constructs a multi-layer graph where:

Upper layers are sparse and allow quick navigation to the general region
Lower layers are dense and enable precise local search
Layer 0 contains all points with the maximum connectivity

Layer 2:  ●───────────────────────●  (sparse, fast navigation)
          │                       │
          ▼                       ▼
Layer 1:  ●─────●─────●─────●─────●  (medium density)
          │     │     │     │     │
          ▼     ▼     ▼     ▼     ▼
Layer 0:  ●─●─●─●─●─●─●─●─●─●─●─●─●  (dense, all points)

Search Process

Starting from a random entry point at the top layer:

Find the nearest neighbor in the current layer (greedy search)
Use that node as the entry point for the next layer
Repeat until reaching layer 0

#![allow(unused)]
fn main() {
// Simplified pseudocode
let mut current = entry_point;
for layer in (1..=max_layer).rev() {
    current = find_nearest_in_layer(query, current, layer);
}
}

Phase 2: Layer 0 Search

At layer 0, perform a beam search to find ef candidates:

Maintain a priority queue of candidates (sorted by distance)
Explore neighbors of the closest unvisited candidate
Add promising neighbors to the queue
Stop when the closest candidate is farther than the farthest result

#![allow(unused)]
fn main() {
// Returns top-k results from ef candidates
let candidates = search_layer(query, entry_points, ef, layer=0);
candidates.into_iter().take(k).collect()
}

Configuration

HnswConfig Parameters

#![allow(unused)]
fn main() {
pub struct HnswConfig {
    pub m: usize,               // Connections per node (default: 16)
    pub m0: usize,              // Layer 0 connections (default: 2 * m = 32)
    pub ml: f64,                // Level multiplier (default: 1/ln(m))
    pub ef: usize,              // Search queue size (default: 100)
    pub ef_construction: usize, // Build queue size (default: 200)
}
}

Parameter Guidelines

Parameter	Description	Trade-off
`m`	Connections per node in upper layers	Higher = better recall, more memory
`m0`	Connections per node in layer 0	Higher = better recall, more memory
`ef`	Search queue size	Higher = better recall, slower search
`ef_construction`	Build queue size	Higher = better index quality, slower build

Recommended Values

Dataset Size	m	m0	ef_construction	Notes
< 10K	8	16	100	Low memory, fast build
10K - 100K	16	32	200	Balanced (default)
100K - 1M	24	48	300	Higher recall
> 1M	32	64	400	Maximum recall

Usage

Basic Search

#![allow(unused)]
fn main() {
use lattice_core::{HnswIndex, Distance, HnswConfig};

// Create index with default config
let config = HnswConfig::default();
let mut index = HnswIndex::new(config, Distance::Cosine);

// Insert points
for point in points {
    index.insert(&point);
}

// Search: find 10 nearest neighbors with ef=100
let results = index.search(&query_vector, k=10, ef=100);
}

Batch Search

For multiple queries, batch search is more efficient:

#![allow(unused)]
fn main() {
// Prepare query references
let query_refs: Vec<&[f32]> = queries.iter()
    .map(|q| q.as_slice())
    .collect();

// Parallel batch search (uses rayon on native)
let results = index.search_batch(&query_refs, k=10, ef=100);
}

Adjusting ef at Search Time

ef can be adjusted per-query to trade off speed vs recall:

#![allow(unused)]
fn main() {
// Fast search (lower recall)
let quick_results = index.search(&query, 10, ef=50);

// High-recall search (slower)
let accurate_results = index.search(&query, 10, ef=200);
}

Memory Layout

Dense Vector Storage

Vectors are stored in a flat, contiguous array for cache efficiency:

┌──────────────────────────────────────────────────────┐
│ [v0_d0, v0_d1, ..., v0_dn] [v1_d0, v1_d1, ..., v1_dn] │
└──────────────────────────────────────────────────────┘
     Vector 0 (dim=n)            Vector 1 (dim=n)

Access pattern: &vectors[id * dim .. (id + 1) * dim]

Benefits:

Cache-friendly: Sequential memory access during index construction
SIMD-friendly: Aligned access for vectorized distance calculations
Predictable: O(1) indexed access via dense array

Layer Storage

Each layer stores:

Node list: Points present at this layer
Neighbor lists: Connections for each node

#![allow(unused)]
fn main() {
struct HnswNode {
    id: PointId,
    level: u16,  // Highest layer this node appears in
    // neighbors[layer] = Vec<PointId>
    neighbors: SmallVec<[Vec<PointId>; 4]>,
}
}

Optimizations

1. Shortcut Search (VLDB 2025)

If the best neighbor doesn’t change at a layer, we can skip redundant distance calculations:

#![allow(unused)]
fn main() {
// Track if we improved at each layer
let (new_current, new_dist, improved) =
    search_layer_single_with_shortcut(query, current, current_dist, layer);

if !improved && layer > 1 {
    // Can potentially skip to next layer faster
    continue;
}
}

2. Software Prefetching

Hide memory latency by prefetching future vectors:

#![allow(unused)]
fn main() {
// Prefetch vectors ahead of current iteration
const PREFETCH_DISTANCE: usize = 4;

for (i, &id) in neighbor_ids.iter().enumerate() {
    if i + PREFETCH_DISTANCE < neighbor_ids.len() {
        prefetch_read(vectors.get_ptr(neighbor_ids[i + PREFETCH_DISTANCE]));
    }
    // Calculate distance for current id
    distances.push(calc_distance(query, vectors.get(id)));
}
}

3. Thread-Local Scratch Space

Avoid allocation per search by reusing scratch space:

#![allow(unused)]
fn main() {
thread_local! {
    static SCRATCH: RefCell<SearchScratch> = RefCell::new(SearchScratch::new());
}

fn search_layer(&self, ...) -> Vec<Candidate> {
    SCRATCH.with(|scratch| {
        let mut scratch = scratch.borrow_mut();
        scratch.clear();  // Reuse allocated memory
        // ... perform search ...
    })
}
}

4. Vec-Based Results

Use a Vec instead of BinaryHeap for results, with periodic compaction:

#![allow(unused)]
fn main() {
// Add to results
results.push(candidate);

// Compact when 2x over limit (amortizes sort cost)
if results.len() >= ef * 2 {
    results.sort_unstable_by(|a, b| a.distance.partial_cmp(&b.distance).unwrap());
    results.truncate(ef);
}
}

PQ Acceleration

For very large indexes, Product Quantization can accelerate search:

#![allow(unused)]
fn main() {
// Build PQ accelerator (one-time cost)
let accelerator = index.build_pq_accelerator(
    m: 8,  // 8 subvectors for 128-dim
    training_sample_size: 10000,
);

// Search with PQ-accelerated coarse filtering
let results = index.search_with_pq(
    &query,
    k: 10,
    ef: 100,
    &accelerator,
    rerank_factor: 3,  // Re-rank top 30 candidates
);
}

See Quantization for details.

Performance Characteristics

Operation	Complexity	Notes
Insert	O(log N × m × ef_construction)	Per-point amortized
Search	O(log N × m × ef)	Per-query
Delete	O(m)	Just removes node and edges
Memory	O(N × (dim + m × layers))	Vectors + graph

Typical search latency (128-dim, 100K vectors, cosine):

ef=50: ~50 µs
ef=100: ~100 µs
ef=200: ~200 µs

Next Steps

Distance Metrics - Choosing the right distance function
Quantization - Memory-efficient storage
SIMD Optimization - Hardware acceleration

LatticeDB Documentation