
ACORN Algorithm for Filtered Vector Search
Advanced algorithm designed to make hybrid searches combining metadata filters and vector similarity more efficient, implemented in Apache Solr and other vector search systems.
About this tool
Overview
ACORN (Approximate Constraint-Optimized Retrieval with Navigation) is an algorithm designed to efficiently perform hybrid searches that combine traditional filter predicates with vector similarity search, addressing a common bottleneck in production vector database applications.
Problem Statement
The Challenge
Traditional approaches to filtered vector search suffer from:
- Post-Filtering: Search first, filter later → wasted computation
- Pre-Filtering: Filter first, search on subset → potentially empty results
- Performance Degradation: Filters can slow queries 10-100x
- Quality Issues: May miss relevant results
How ACORN Works
Key Innovation
ACORN intelligently navigates the vector index while simultaneously evaluating filter conditions, avoiding full scans of irrelevant data.
Algorithm Steps
- Initialize: Start from entry point in HNSW graph
- Navigate: Move through graph toward query vector
- Filter On-The-Fly: Check predicates during navigation
- Adaptive Branching: Explore promising paths
- Early Termination: Stop when enough matches found
- Return: Best k matching results
Performance Benefits
Speed Improvements
- 2-10x faster than post-filtering on typical workloads
- Consistent latency across various filter selectivities
- Better worst-case performance
- Scales efficiently with data size
Quality Maintenance
- Similar or better recall vs baseline methods
- Handles complex filter predicates
- Adapts to data distribution
Implementation Status
Apache Solr
Implemented for hybrid search combining:
- Text queries (keyword/BM25)
- Vector similarity
- Field filters
Available in recent versions (2024+).
Other Systems
The algorithm is applicable to:
- Custom vector databases
- Search engines with vector support
- Hybrid retrieval systems
- RAG pipelines
Research Foundation
Publication
- arXiv: 2403.04871
- Year: 2024
- Research Area: Information Retrieval, Vector Databases
Key Findings
- Predicate-agnostic design works across filter types
- Graph-based navigation more efficient than tree approaches
- Adaptive exploration crucial for diverse workloads
Use Cases
Production Scenarios
-
E-Commerce:
- Find similar products (vector)
- Within category/price range (filter)
- In stock (boolean filter)
-
Document Search:
- Semantic similarity (vector)
- By author/date/department (filters)
- Security/access control (predicates)
-
Recommendation:
- Content similarity (vector)
- User preferences (filters)
- Business rules (constraints)
When Most Effective
- Moderate to high filter selectivity (1-50% of data)
- Multiple filter conditions
- Real-time applications
- Large-scale datasets
Technical Details
Predicate Types Supported
- Equality (field = value)
- Range (field > value, field BETWEEN x AND y)
- Set membership (field IN (a, b, c))
- Boolean combinations (AND, OR, NOT)
- Nested conditions
Graph Navigation Strategy
- Uses HNSW graph structure
- Maintains candidate pool
- Explores multiple paths
- Balances breadth vs depth
- Terminates optimally
Comparison with Alternatives
vs Post-Filtering
- ACORN: 2-10x faster
- ACORN: More consistent performance
- ACORN: Better resource utilization
vs Pre-Filtering
- ACORN: Better recall
- ACORN: Handles sparse filters
- ACORN: More robust
vs Separate Indexes
- ACORN: Single index
- ACORN: Less storage
- ACORN: Simpler architecture
Configuration and Tuning
Key Parameters
- Exploration Width: How many paths to explore
- Depth Limit: Maximum navigation depth
- Filter Evaluation Frequency: How often to check predicates
- Termination Criteria: When to stop
Best Practices
- Profile your filter distributions
- Test with representative queries
- Monitor query latency percentiles
- Tune for worst-case scenarios
- Benchmark against baseline
Future Research
- Multi-vector queries
- Dynamic predicate ordering
- Cost-based optimization
- ML-guided navigation
- Distributed implementations
Integration
Applicable to any system with:
- HNSW or similar graph index
- Metadata filtering needs
- Hybrid search requirements
Loading more......
