Efficiently filtering on structured metadata alongside embedding vectors has historically been a scaling challenge for applications built on RAG. AlloyDB has released a new feature called inline filtering to address exactly this. This gist collects some of my learnings from experimenting with inline filtering on a ScaNN index to achieve efficient, scalable hybrid search.
-
The recommended query uses a two-stage hybrid search: the first stage searches the embedding chunks using inline filtering with a ScaNN index, and the second stage refines the results by selecting the highest-scoring chunk for each document.
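The logic of the two stages can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not AlloyDB's implementation: a brute-force cosine scan with a metadata predicate stands in for the ScaNN index with inline filtering, and the `Chunk` type, its `category` metadata column, and the `hybrid_search` function are all hypothetical.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str                    # parent document of this chunk
    category: str                  # hypothetical denormalized metadata column
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid_search(chunks, query, category, k, top_docs):
    """Hypothetical two-stage hybrid search over in-memory chunks."""
    # Stage 1: top-k chunks by similarity, restricted by the metadata filter.
    # A brute-force scan stands in for the ScaNN index with inline filtering.
    scored = ((cosine(c.embedding, query), c) for c in chunks if c.category == category)
    candidates = heapq.nlargest(k, scored, key=lambda sc: sc[0])

    # Stage 2: keep only the highest-scoring chunk for each document.
    best: dict[str, float] = {}
    for score, chunk in candidates:
        if score > best.get(chunk.doc_id, -math.inf):
            best[chunk.doc_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_docs]


chunks = [
    Chunk("doc-a", "news", (1.0, 0.0)),
    Chunk("doc-a", "news", (0.8, 0.6)),
    Chunk("doc-b", "news", (0.6, 0.8)),
    Chunk("doc-c", "blog", (1.0, 0.0)),
]
result = hybrid_search(chunks, query=(1.0, 0.0), category="news", k=3, top_docs=2)
print(result)
```

Here doc-c is excluded by the filter even though its chunk matches the query exactly, and doc-a is returned only once although two of its chunks are among the candidates. One consequence of this design: if many of the k candidates belong to the same document, the second stage can return fewer than `top_docs` documents, so k should be chosen with some headroom.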
-
In most hybrid search scenarios, the query scales as O(√n + k log k), where n is the number of embedding chunks and k is the number of candidates returned by the first stage, with k ≪ n.
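To make the bound concrete, the snippet plugs a few hypothetical sizes into the two terms of O(√n + k log k), with constant factors dropped and a base-2 logarithm chosen arbitrarily. It assumes n is the number of embedding chunks; this is arithmetic on the stated bound, not a benchmark of AlloyDB.

```python
import math


def cost_terms(n: int, k: int) -> tuple[float, float]:
    """Return the two terms of O(sqrt(n) + k log k), ignoring constant factors.

    n: total number of embedding chunks (assumed meaning)
    k: number of candidates returned by the first stage
    """
    stage1 = math.sqrt(n)          # ScaNN search with inline filtering
    stage2 = k * math.log2(k)      # picking the best chunk per document among k candidates
    return stage1, stage2


# Hypothetical sizes, compared against the cost of a full scan (n).
for n, k in [(10**6, 100), (10**8, 100), (10**8, 1000)]:
    s1, s2 = cost_terms(n, k)
    print(f"n={n:>11,} k={k:>5}  sqrt(n)={s1:>8.0f}  k*log2(k)={s2:>6.0f}  full scan={n:,}")
```

Both terms stay orders of magnitude below a full scan of n chunks, but the k log k term grows quickly with k (at k = 1000 it is already comparable to √n for n = 10⁸), which is why the bound relies on k ≪ n.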
-
To use inline filtering for hybrid search, it is necessary to denormalize the filtering metadata columns