Quantization

Quantization reduces memory usage and can accelerate distance computation. LatticeDB supports two quantization methods:

  1. Scalar Quantization (SQ): 4x memory reduction (f32 → i8)
  2. Product Quantization (PQ): 32-64x memory reduction

Scalar Quantization

Overview

Scalar quantization maps each f32 value to an i8 using min-max scaling:

quantized[i] = round((original[i] - offset) / scale) - 128
dequantized[i] = (quantized[i] + 128) * scale + offset

where:
  offset = min(original)
  scale = (max(original) - min(original)) / 255

The -128 shift moves the raw code from [0, 255] into the signed i8 range of [-128, 127].
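
As a concrete illustration, here is a minimal, self-contained sketch of this scheme. It is illustrative only, not LatticeDB's internal implementation (the library wraps this logic in QuantizedVector):

// Quantize an f32 slice into i8 codes plus the (scale, offset) pair.
fn sq_quantize(original: &[f32]) -> (Vec<i8>, f32, f32) {
    let min = original.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = original.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Guard against constant vectors, where max == min would make scale 0
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let offset = min;
    let codes = original
        .iter()
        .map(|&x| (((x - offset) / scale).round() as i32 - 128) as i8)
        .collect();
    (codes, scale, offset)
}

// Invert the mapping; the round() step makes this lossy.
fn sq_dequantize(codes: &[i8], scale: f32, offset: f32) -> Vec<f32> {
    codes
        .iter()
        .map(|&q| (q as i32 + 128) as f32 * scale + offset)
        .collect()
}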

Memory Savings

Dimension   Original (f32)   Quantized (i8)   Savings
128         512 bytes        136 bytes        3.8x
256         1024 bytes       264 bytes        3.9x
512         2048 bytes       520 bytes        3.9x
1024        4096 bytes       1032 bytes       4.0x

The 8-byte overhead is for scale and offset values.

Usage

use lattice_core::QuantizedVector;

// Quantize a vector
let original = vec![0.1, 0.5, 0.9, -0.3, 0.0];
let quantized = QuantizedVector::quantize(&original);

// Check memory usage
println!("Original: {} bytes", original.len() * 4);
println!("Quantized: {} bytes", quantized.memory_size());

// Dequantize (lossy)
let recovered = quantized.dequantize();

// Asymmetric distance (quantized DB vector vs f32 query)
let query = vec![0.2, 0.4, 0.8, -0.2, 0.1];
let distance = quantized.dot_distance_asymmetric(&query);

Asymmetric Distance

For search, we compute distance between:

  • Database vectors: Quantized (memory-efficient)
  • Query vector: Full precision (accurate)

impl QuantizedVector {
    // Quantized × f32 distance
    pub fn dot_distance_asymmetric(&self, query: &[f32]) -> f32;
    pub fn euclidean_distance_asymmetric(&self, query: &[f32]) -> f32;
    pub fn cosine_distance_asymmetric(&self, query: &[f32]) -> f32;
}
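
An asymmetric distance can be computed without materializing a dequantized vector: each i8 code is decoded on the fly inside the accumulation loop. The sketch below makes the same assumptions as the earlier one (standalone code, not the library's implementation) and uses the negative dot product as the distance convention:

// Asymmetric dot-product distance: i8 codes vs. an f32 query.
fn sq_dot_distance_asymmetric(codes: &[i8], scale: f32, offset: f32, query: &[f32]) -> f32 {
    assert_eq!(codes.len(), query.len());
    let dot: f32 = codes
        .iter()
        .zip(query)
        .map(|(&c, &q)| ((c as i32 + 128) as f32 * scale + offset) * q)
        .sum();
    // Negate so that a larger dot product means a smaller distance
    -dot
}

Because scale and offset are constants, the sum factors into scale * Σ(code_i + 128) * query_i + offset * Σ query_i, a form that vectorized implementations can exploit.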

Accuracy

Scalar quantization maintains good accuracy for most use cases:

Metric           Recall@10 (128-dim, 100K vectors)
Original (f32)   99.5%
Quantized (i8)   98.2%

The 1-2% recall loss is often acceptable given the 4x memory savings.

Product Quantization (PQ)

Overview

PQ achieves higher compression by:

  1. Splitting vectors into M subvectors
  2. Quantizing each subvector to a cluster centroid
  3. Storing only the nearest centroid's index (1 byte per subvector, enough for 256 centroids)

For example, with 128 dimensions and M = 8:

Original: [f32; 128] = 512 bytes
Split:    8 subvectors of [f32; 16]
Quantize: 8 cluster indices = 8 bytes
Compression: 64x
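
To make these steps concrete, here is a sketch of PQ encoding. It assumes codebooks have already been trained (e.g., by k-means) and are laid out as M codebooks of 256 centroids each; the function is illustrative, not LatticeDB's internals:

// Encode a vector as M one-byte centroid indices.
// codebooks[i] holds the 256 centroids of subquantizer i, each of length D / M.
fn pq_encode(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let m = codebooks.len();
    let sub_dim = vector.len() / m;
    (0..m)
        .map(|i| {
            let sub = &vector[i * sub_dim..(i + 1) * sub_dim];
            // Pick the centroid with the smallest squared distance to this subvector
            let (best, _) = codebooks[i]
                .iter()
                .enumerate()
                .map(|(j, c)| {
                    let d: f32 = sub.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
                    (j, d)
                })
                .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .unwrap();
            best as u8
        })
        .collect()
}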

Building a PQ Accelerator

// After building the HNSW index
let accelerator = index.build_pq_accelerator(
    8,      // m: number of subvectors
    10_000, // training_sample_size: vectors used to train the codebooks
);

// Check compression
println!("Compression: {}x", accelerator.compression_ratio());
// Output: Compression: 64x (for 128-dim vectors)

PQ enables two-phase search:

  1. Coarse filtering: Fast approximate distances using PQ codes
  2. Re-ranking: Exact distances for top candidates

let results = index.search_with_pq(
    &query,
    10,   // k: number of final results
    100,  // ef: search beam width
    &accelerator,
    3,    // rerank_factor: re-rank 3 * k = 30 candidates for the final 10
);

Distance Table Lookup

For each query, PQ builds a distance table:

// Precompute distances from query to all centroids
let dist_table = accelerator.pq.build_distance_table(&query);

// O(M) distance lookup instead of O(D)
let approx_dist = accelerator.approximate_distance_with_table(&dist_table, point_id);

This reduces the per-vector distance computation from O(D) to O(M), where M << D, after a one-time O(256 · D) table build per query.
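
The following sketch shows what the table amounts to, using the same hypothetical codebook layout as the encoding sketch above (again illustrative, not the library's code):

// Precompute squared distances from each query subvector to every centroid.
// Built once per query, then reused across all database codes.
fn build_distance_table(query: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let m = codebooks.len();
    let sub_dim = query.len() / m;
    (0..m)
        .map(|i| {
            let sub = &query[i * sub_dim..(i + 1) * sub_dim];
            codebooks[i]
                .iter()
                .map(|c| sub.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum())
                .collect()
        })
        .collect()
}

// Scoring a PQ code is then just M table lookups, independent of D.
fn approximate_distance(table: &[Vec<f32>], code: &[u8]) -> f32 {
    code.iter()
        .enumerate()
        .map(|(i, &c)| table[i][c as usize])
        .sum()
}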

PQ Memory Usage

Dimension   M    Original         PQ Code        Codebook Overhead   Compression
128         8    512 bytes/vec    8 bytes/vec    128 KB              64x
256         16   1024 bytes/vec   16 bytes/vec   256 KB              64x
512         32   2048 bytes/vec   32 bytes/vec   512 KB              64x

The codebook is a fixed per-index overhead: M subquantizers × 256 centroids × (D / M) floats × 4 bytes = 256 × D × 4 bytes, e.g. 128 KB at 128 dimensions.

PQ vs SQ Trade-offs

Aspect              Scalar Quantization         Product Quantization
Compression         4x                          32-64x
Accuracy loss       ~1-2%                       ~3-5%
Build time          Fast (O(N))                 Slow (k-means training)
Search speed        Same as f32                 Faster (O(M) distances)
Memory per vector   D bytes (+ 8-byte header)   M bytes (+ shared codebook)
Use case            Moderate memory savings     Maximum compression

Combining Quantization Methods

For very large datasets, combine both methods:

// 1. Quantize database vectors with SQ (4x savings)
let quantized_vectors: Vec<QuantizedVector> = vectors
    .iter()
    .map(|v| QuantizedVector::quantize(v))
    .collect();

// 2. Build PQ accelerator for fast search (additional 16x in search)
let accelerator = index.build_pq_accelerator(8, 10000);

// 3. Two-phase search:
//    - PQ for coarse filtering (very fast)
//    - SQ for re-ranking (accurate enough)
let candidates = index.search_with_pq(&query, 100, 200, &accelerator, 1);
let results = rerank_with_sq(&candidates, &quantized_vectors, &query, k);
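
Note that rerank_with_sq in the snippet above is a user-side helper, not a LatticeDB API. One possible sketch, assuming candidates are point ids that index directly into quantized_vectors:

use lattice_core::QuantizedVector;

// Re-score PQ candidates with the more accurate SQ asymmetric distance
// and keep the best k.
fn rerank_with_sq(
    candidates: &[usize],
    quantized_vectors: &[QuantizedVector],
    query: &[f32],
    k: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .map(|&id| (id, quantized_vectors[id].dot_distance_asymmetric(query)))
        .collect();
    // Smaller distance = better match
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k);
    scored
}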

When to Use Quantization

Dataset Size   Recommendation
< 100K         No quantization needed
100K - 1M      Scalar quantization
1M - 10M       Scalar + PQ acceleration
> 10M          Full PQ with re-ranking
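
If you prefer this policy in code, one way to express it is below; the enum, thresholds, and function are illustrative, not part of LatticeDB:

// Map dataset size to the recommendation in the table above.
enum QuantizationStrategy {
    None,         // < 100K vectors
    Scalar,       // 100K - 1M
    ScalarPlusPq, // 1M - 10M
    PqWithRerank, // > 10M
}

fn choose_strategy(num_vectors: usize) -> QuantizationStrategy {
    match num_vectors {
        n if n < 100_000 => QuantizationStrategy::None,
        n if n < 1_000_000 => QuantizationStrategy::Scalar,
        n if n < 10_000_000 => QuantizationStrategy::ScalarPlusPq,
        _ => QuantizationStrategy::PqWithRerank,
    }
}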

Best Practices

1. Train PQ on Representative Data

// Use a uniformly spaced sample that reflects your data distribution
let step = (dataset.len() / 10_000).max(1); // avoid step_by(0) on small datasets
let training_sample: Vec<Vector> = dataset
    .iter()
    .step_by(step)
    .cloned()
    .collect();

let accelerator = index.build_pq_accelerator(8, training_sample.len());

2. Choose M Based on Dimension

// Rule of thumb: dim / M should be >= 8
let m = match dim {
    d if d <= 64 => 4,
    d if d <= 128 => 8,
    d if d <= 256 => 16,
    d if d <= 512 => 32,
    _ => 64,
};

3. Re-rank Enough Candidates

// Higher rerank_factor = better recall, slower search
let rerank_factor = match recall_target {
    r if r >= 0.99 => 5,
    r if r >= 0.95 => 3,
    _ => 2,
};

4. Update PQ Incrementally

// When adding new vectors
index.insert(&new_point);
accelerator.add(new_point.id, &new_point.vector);

// When removing vectors
index.delete(point_id);
accelerator.remove(point_id);

Next Steps