Tuning Guide

This chapter covers performance optimization strategies for LatticeDB, including HNSW parameters, memory configuration, and query optimization.

HNSW Parameter Tuning

Core Parameters

| Parameter | Default | Range | Effect |
|---|---|---|---|
| m | 16 | 4-64 | Connections per node |
| m0 | 32 | 8-128 | Layer 0 connections |
| ef_construction | 200 | 50-500 | Build quality |
| ef | 100 | 10-500 | Search quality |

Tuning for Recall

Higher recall (accuracy) requires:

  • Higher m and m0
  • Higher ef_construction
  • Higher ef at search time

// High recall configuration
let config = HnswConfig {
    m: 32,
    m0: 64,
    ef_construction: 400,
    ef: 200,
    ml: HnswConfig::recommended_ml(32),
};

Trade-offs:

  • Higher memory usage (more edges per node)
  • Slower index construction
  • Slower searches (more candidates explored)
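
The memory side of this trade-off can be estimated roughly. The sketch below assumes 4-byte neighbor IDs, completely filled neighbor lists, and a level multiplier of ml = 1/ln(m), which is the heuristic from the original HNSW paper; whether `HnswConfig::recommended_ml` uses that formula is an assumption. The function names are hypothetical and not part of LatticeDB.

```rust
/// Level multiplier heuristic from the HNSW paper: ml = 1 / ln(m).
/// Assumption: `HnswConfig::recommended_ml` may or may not use this formula.
fn level_multiplier(m: usize) -> f64 {
    1.0 / (m as f64).ln()
}

/// Rough upper bound on link storage per node, in bytes (requires m >= 2).
/// Assumes full neighbor lists and `id_bytes` per stored neighbor ID.
fn estimated_link_bytes(m: usize, m0: usize, id_bytes: usize) -> f64 {
    // With ml = 1/ln(m), a node reaches level >= l with probability m^-l,
    // so the expected number of upper layers per node is 1 / (m - 1).
    let expected_upper_layers = 1.0 / (m as f64 - 1.0);
    (m0 as f64 + m as f64 * expected_upper_layers) * id_bytes as f64
}
```

Under these assumptions, m = 8 with m0 = 16 needs about 69 bytes of links per node, while m = 32 with m0 = 64 needs about 260 bytes, before vector data and any per-node overhead are counted.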

Tuning for Speed

Faster search with acceptable recall:

  • Lower m and m0
  • Lower ef at search time

// Fast search configuration
let config = HnswConfig {
    m: 8,
    m0: 16,
    ef_construction: 100,
    ef: 50,
    ml: HnswConfig::recommended_ml(8),
};

Trade-offs:

  • Lower recall (may miss some neighbors)
  • Lower memory usage

Tuning for Memory

Minimize memory footprint:

// Memory-efficient configuration
let config = HnswConfig {
    m: 8,
    m0: 16,
    ef_construction: 100,
    ef: 100,
    ml: HnswConfig::recommended_ml(8),
};

Combine with quantization:

// Use scalar quantization (4x memory reduction)
let quantized = QuantizedVector::quantize(&vector);
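
To make the 4x figure concrete: scalar quantization typically stores one byte per dimension instead of a 4-byte `f32`. The sketch below shows a simple min/max variant of the idea. It is an illustration only and not necessarily how `QuantizedVector` is implemented; the struct and function names are hypothetical.

```rust
/// A vector stored as one byte per dimension, plus the range needed to decode it.
struct ScalarQuantized {
    min: f32,
    scale: f32,
    codes: Vec<u8>,
}

/// Maps each component linearly from [min, max] onto 0..=255.
fn quantize(v: &[f32]) -> ScalarQuantized {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Avoid division by zero for constant (or empty) vectors.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|x| ((x - min) / scale).round() as u8).collect();
    ScalarQuantized { min, scale, codes }
}

/// Reconstructs approximate f32 values from the stored codes.
fn dequantize(q: &ScalarQuantized) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

Each reconstructed component differs from the original by at most one quantization step (`scale`), which is the precision traded for the smaller footprint.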

Dataset Size Guidelines

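| Dataset Size | m | m0 | ef_construction | Memory/Vector |
|---|---|---|---|---|
| < 1K | 8 | 16 | 100 | ~200 bytes |
| 1K - 10K | 12 | 24 | 150 | ~300 bytes |
| 10K - 100K | 16 | 32 | 200 | ~400 bytes |
| 100K - 1M | 24 | 48 | 300 | ~600 bytes |
| > 1M | 32 | 64 | 400 | ~800 bytes |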

Search Optimization

Adjusting ef at Runtime

ef can be tuned per-query:

// Quick search (lower recall)
let fast_results = engine.search(&query.with_ef(50))?;

// High-quality search (higher recall)
let accurate_results = engine.search(&query.with_ef(300))?;
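
Because ef can change per query, one way to pick a value is to sweep a few candidates against a validation set and keep the smallest one that reaches the target recall. The sketch below assumes a caller-supplied closure that runs the validation queries at a given ef and returns the average recall; the function name and the closure are hypothetical, not part of LatticeDB.

```rust
/// Returns the smallest `ef` in `candidates` whose measured recall meets `target`.
/// `measure_recall` is a hypothetical caller-supplied closure that runs a
/// validation query set at the given `ef` and returns the average recall.
fn smallest_ef_meeting(
    candidates: &[usize],
    target: f64,
    mut measure_recall: impl FnMut(usize) -> f64,
) -> Option<usize> {
    let mut sorted = candidates.to_vec();
    sorted.sort_unstable();
    // Try candidates in ascending order and stop at the first that is good enough.
    sorted.into_iter().find(|&ef| measure_recall(ef) >= target)
}
```

A result of `None` means no candidate reached the target, which usually points to the build parameters (m, ef_construction) rather than ef.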

Batch Queries

For multiple queries, use batch search:

// 5-10x faster than individual searches
let results = index.search_batch(&queries, 10, 100); // k = 10, ef = 100

Benefits:

  • Parallel processing (on native)
  • Better cache utilization
  • Amortized overhead
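
The parallel-processing benefit can be illustrated with the standard library alone. The sketch below splits a batch of queries into chunks and runs each chunk on its own scoped thread. It is not LatticeDB's implementation of `search_batch`; the function name and the `search` closure are hypothetical stand-ins for a single-query search.

```rust
use std::thread;

/// Runs `search` over every query, splitting the batch across `threads` workers.
/// Results come back in the same order as `queries`.
fn search_batch_parallel<Q, R, F>(queries: &[Q], threads: usize, search: F) -> Vec<R>
where
    Q: Sync,
    R: Send,
    F: Fn(&Q) -> R + Sync,
{
    if queries.is_empty() {
        return Vec::new();
    }
    let workers = threads.max(1);
    // Ceiling division so every query lands in exactly one chunk.
    let chunk_size = (queries.len() + workers - 1) / workers;
    let search = &search;
    thread::scope(|scope| {
        // Spawn all workers first, then join them in chunk order.
        let handles: Vec<_> = queries
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(search).collect::<Vec<R>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("search worker panicked"))
            .collect()
    })
}
```

Spawning threads per batch has its own cost, so a real implementation would more likely reuse a thread pool; the sketch only shows where the parallelism comes from.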

Pre-filtering

Filter before vector search when possible:

// Instead of post-filtering 10K results...
let all_results = engine.search(&query.with_limit(10000))?;
let filtered: Vec<_> = all_results
    .into_iter()
    .filter(|r| r.payload.get("category") == Some("tech"))
    .take(10)
    .collect();

// Pre-filter using graph/index
let tech_ids = engine.get_nodes_by_label("tech")?;
let results = engine.search_among(&query, &tech_ids, 10)?; // k = 10
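
The reason pre-filtering pays off is that distances are only computed for candidates that can appear in the result. The sketch below shows the idea with a brute-force scan over an allowed ID set. How `engine.search_among` works internally is not specified here; the in-memory `HashMap` store, the `u64` IDs, and the function names are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Squared Euclidean distance between two equal-length vectors.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Scores only the vectors whose IDs are in `allowed` and returns the `k` nearest.
/// IDs missing from `vectors` are skipped.
fn search_among(
    vectors: &HashMap<u64, Vec<f32>>,
    allowed: &[u64],
    query: &[f32],
    k: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = allowed
        .iter()
        .filter_map(|id| vectors.get(id).map(|v| (*id, squared_l2(v, query))))
        .collect();
    // Ascending by distance, so the closest candidates come first.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

When the allowed set is small, a scan like this can be cheaper than searching the whole index and discarding most of the results.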

Memory Optimization

Vector Storage Options

| Option | Memory | Speed | Use Case |
|---|---|---|---|
| Dense (default) | 100% | Fastest | Small-medium datasets |
| Scalar Quantized | 25% | 95% | Large datasets |
| Product Quantized | 3-5% | 80% | Very large datasets |
| Memory-mapped | Variable | 90% | Larger than RAM |

Enabling Quantization

// Scalar quantization
use lattice_core::QuantizedVector;

let quantized_vectors: Vec<QuantizedVector> = vectors
    .iter()
    .map(|v| QuantizedVector::quantize(v))
    .collect();

// Product quantization accelerator
let accelerator = index.build_pq_accelerator(8, 10000); // m = 8, training_size = 10000
let results = index.search_with_pq(&query, k, ef, &accelerator, 3); // rerank = 3

Memory-Mapped Storage (Native)

use std::path::Path;

// Export vectors to mmap file
index.export_vectors_mmap(Path::new("vectors.mmap"))?;

// Load with mmap (vectors stay on disk, loaded on demand)
let mmap_store = MmapVectorStore::open(Path::new("vectors.mmap"))?;

Storage Optimization

Choosing a Backend

| Backend | Persistence | Speed | Use Case |
|---|---|---|---|
| MemStorage | No | Fastest | Testing, ephemeral |
| DiskStorage | Yes | Fast | Server deployments |
| OpfsStorage | Yes | Medium | Browser persistent |
| IndexedDB | Yes | Slower | Browser fallback |

Page Size Tuning

For disk storage, larger pages improve sequential access:

// Default: 4KB pages
let storage = DiskStorage::with_page_size(4096);

// Larger pages for bulk operations
let storage = DiskStorage::with_page_size(64 * 1024); // 64KB
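
A quick calculation shows why page size matters for vector data. The sketch below assumes `f32` vectors packed back to back with no page header or per-record metadata, which may not match LatticeDB's actual page layout; the function name is hypothetical.

```rust
/// Number of whole f32 vectors of `dims` dimensions that fit in one page.
/// Assumes tightly packed vectors with no page header or per-record metadata.
fn vectors_per_page(page_size: usize, dims: usize) -> usize {
    let vector_bytes = dims * std::mem::size_of::<f32>();
    if vector_bytes == 0 {
        0
    } else {
        page_size / vector_bytes
    }
}
```

Under these assumptions, a 768-dimensional vector takes 3072 bytes, so a 4KB page holds 1 vector and a 64KB page holds 21, which means far fewer page reads during a sequential scan.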

Query Optimization

Cypher Query Patterns

Use labels for filtering:

// Good: Uses label index
MATCH (n:Person) WHERE n.age > 25 RETURN n

// Less efficient: Full scan
MATCH (n) WHERE n.type = 'Person' AND n.age > 25 RETURN n

Limit early:

// Good: LIMIT caps the result at 10 rows, so a top-k sort is enough
MATCH (n:Person) RETURN n ORDER BY n.name LIMIT 10

// Less efficient: Sorts and returns every match
MATCH (n:Person) RETURN n ORDER BY n.name

Use parameters:

// Good: Query can be cached
MATCH (n:Person {name: $name}) RETURN n

// Less efficient: New query parse each time
MATCH (n:Person {name: 'Alice'}) RETURN n
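
The caching argument can be illustrated with a toy plan cache keyed by query text. This is a sketch of the general idea, not LatticeDB's query cache; the `PlanCache` type and its fields are hypothetical.

```rust
use std::collections::HashMap;

/// Toy plan cache keyed by query text; `parses` counts cache misses.
#[derive(Default)]
struct PlanCache {
    plans: HashMap<String, usize>,
    parses: usize,
}

impl PlanCache {
    /// Returns a plan ID, parsing only when the query text has not been seen.
    fn plan_for(&mut self, query_text: &str) -> usize {
        if let Some(&id) = self.plans.get(query_text) {
            return id;
        }
        self.parses += 1; // stand-in for the real parse and planning work
        let id = self.plans.len();
        self.plans.insert(query_text.to_owned(), id);
        id
    }
}
```

With a parameterized query the text is identical on every call, so it is parsed once. With inlined literals each distinct value produces a new text and therefore a new parse.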

Hybrid Query Optimization

Vector-first for similarity:

// Good: Vector search narrows candidates
let similar = engine.search(&query.with_limit(100))?;
let expanded = expand_graph(&similar);

// Less efficient: Graph-first with large result set
let all_docs = engine.query("MATCH (n:Document) RETURN n")?;
let similar = filter_by_vector(&all_docs, &query);

Graph-first for structured queries:

// Good: Graph query with few results
let authors = engine.query(
    "MATCH (p:Person)-[:AUTHORED]->(d:Document {topic: $topic}) RETURN p",
    params
)?;
let ranked = rank_by_vector(&authors, &query);

// Less efficient: Vector search on entire corpus
let all_similar = engine.search(&query.with_limit(1000))?;
let authors = filter_by_graph(&all_similar);

Monitoring

Memory Statistics

let stats = engine.stats();
println!("Vectors: {} ({} bytes)", stats.vector_count, stats.vector_bytes);
println!("Index: {} bytes", stats.index_bytes);
println!("Graph: {} edges", stats.edge_count);

Query Performance

use std::time::Instant;

let start = Instant::now();
let results = engine.search(&query)?;
let duration = start.elapsed();

println!("Search took {:?}", duration);
println!("Returned {} results", results.len());
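
A single timing can be misleading because of cache warm-up and scheduling noise, so it is often more useful to collect many samples and look at percentiles. The helpers below are a minimal sketch using only the standard library; the function names are hypothetical, and `op` stands in for a call such as a search against the engine.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `iterations` times and returns the per-call latencies, sorted ascending.
fn measure_latencies(iterations: usize, mut op: impl FnMut()) -> Vec<Duration> {
    let mut samples: Vec<Duration> = (0..iterations)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples
}

/// Nearest-rank percentile (`p` in 0.0..=100.0) of an ascending-sorted slice.
fn percentile(sorted: &[Duration], p: f64) -> Option<Duration> {
    if sorted.is_empty() {
        return None;
    }
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

Reporting the median (p = 50) alongside a tail value such as p = 99 shows both typical and worst-case query latency.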

Profiling

# CPU profiling with flamegraph
cargo install flamegraph
cargo flamegraph --bin my_benchmark

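# Memory profiling (heaptrack is a standalone tool, not a Cargo package;
# install it with your system package manager, e.g. apt install heaptrack)
heaptrack ./target/release/my_benchmark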

Platform-Specific Tips

Native (Server)

  • Enable LTO in release builds
  • Use memory-mapped storage for large datasets
  • Configure thread pool size based on CPU cores
  • Consider NUMA awareness for multi-socket systems

# Cargo.toml
[profile.release]
lto = true
codegen-units = 1

WASM (Browser)

  • Use OPFS for persistent storage
  • Limit concurrent operations (single-threaded)
  • Offload heavy operations to Web Workers
  • Pre-load WASM module during page load

// Preload WASM
const wasmPromise = init();

// Later, when needed
await wasmPromise;
const db = await LatticeDB.create(config);

Troubleshooting

Slow Search

  1. Check ef parameter (too low = poor recall, too high = slow)
  2. Verify SIMD is enabled (cargo build --features simd)
  3. Profile to identify bottleneck (distance calc vs graph traversal)

High Memory Usage

  1. Consider quantization (4-32x reduction)
  2. Use memory-mapped storage
  3. Reduce m and m0 parameters
  4. Check for memory leaks with profiler

Poor Recall

  1. Increase ef at search time
  2. Increase ef_construction and rebuild index
  3. Verify distance metric matches your data
  4. Check for data quality issues (zero vectors, outliers)
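
The data quality check in step 4 can be automated before indexing. Zero vectors are a problem for cosine similarity in particular, because their norm is zero and the similarity is undefined. The sketch below flags all-zero vectors and vectors containing NaN or infinite components; the function name is hypothetical, and outlier detection is left out because it depends on the dataset.

```rust
/// Returns the indices of vectors that are all-zero (including empty vectors)
/// or that contain NaN or infinite components.
fn find_degenerate(vectors: &[Vec<f32>]) -> Vec<usize> {
    vectors
        .iter()
        .enumerate()
        .filter(|(_, v)| {
            let non_finite = v.iter().any(|x| !x.is_finite());
            let zero_norm = v.iter().all(|x| *x == 0.0);
            non_finite || zero_norm
        })
        .map(|(i, _)| i)
        .collect()
}
```

Any indices it returns are candidates to drop or repair before building the index.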

Next Steps