Why Your Vector Database Isn't a Replacement for Lexical Search
Vector databases are useful for semantic search, but they handle queries that require exact string matching poorly. The article identifies five common query shapes that break vector search and explains how a traditional lexical search algorithm such as BM25 can solve these issues in a single line of code.
Why it matters
The article highlights a limitation of vector databases that teams often overlook, and offers a simple fix based on traditional lexical search techniques.
Key Points
- Vector databases collapse text into high-dimensional vectors, which can fail to capture exact matches for rare identifiers, proper nouns, acronyms, negation, and numeric codes or version numbers
- These query types make up a significant portion (over 30%) of real-world retrieval traffic on technical products, e-commerce catalogs, and documentation sites
- BM25, a 40-year-old lexical search algorithm, handles these exact-match queries reliably by scoring documents on term frequency and inverse document frequency
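The scoring idea in the last point can be sketched in a few lines of Python. This is a minimal illustration based on the standard Okapi BM25 formula, not the article's own code; the `tokenize` and `bm25_scores` helpers, the sample documents, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep identifiers such as "sku-47291" or "v2.1" as single tokens.
    return re.findall(r"[a-z0-9]+(?:[-._][a-z0-9]+)*", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical catalog entries, invented for illustration.
docs = [
    "Replacement filter cartridge, part SKU-47291, fits model X200",
    "Replacement filter cartridge, part SKU-47219, fits model X300",
    "General care guide for water filter cartridges",
]
scores = bm25_scores("SKU-47291", docs)
best = scores.index(max(scores))
print(docs[best])
```

Because the identifier survives tokenization as a single term, only the document containing that exact string receives a non-zero score; the near-miss SKU scores zero instead of ranking as a close neighbor.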
Details
The article explains that vector databases capture semantic relationships between pieces of text but can struggle with queries that require exact string matching. For example, a user searching for 'SKU-47291' may get back results for similar-looking SKUs but not the exact one they wanted. The same issue applies to proper nouns, acronyms, negation, and numeric codes or version numbers.
These query types are very common in technical domains, e-commerce, and customer support, where they make up over 30% of real-world retrieval traffic. BM25, by contrast, matches on the literal terms in the query and scores documents by term frequency and inverse document frequency, so it handles exact-match queries reliably.
The author argues that using vector search without lexical search is a recipe for production issues, and that the two approaches should be combined to provide a robust search experience.
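One common way to combine the two approaches is reciprocal rank fusion (RRF), which merges ranked lists without needing their raw scores to be comparable. The summary does not say which combination method the author uses, so the sketch below is an assumption; the `reciprocal_rank_fusion` helper, the constant `k=60`, and the document IDs are all hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents ranked well by multiple retrievers rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical retriever outputs for the query "SKU-47291".
lexical_ranking = ["doc_sku_47291", "doc_care_guide"]  # e.g. from BM25
vector_ranking = ["doc_sku_47219", "doc_sku_47291", "doc_care_guide"]  # e.g. from embeddings

fused = reciprocal_rank_fusion([lexical_ranking, vector_ranking])
print(fused)
```

In this made-up case the vector retriever puts a near-miss SKU first, but the exact match still tops the fused list because both retrievers returned it.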