Following Fastembed’s article on hybrid search, available here, we compare dense, sparse, and hybrid retrieval on the HotPotQA dataset. The comparison uses the ranx library for evaluation.
retrieval
fastembed
qdrant
ranx
search
rag
Published
December 31, 2025
Setup
First, let’s load the appropriate libraries and set up classes for dense, sparse, and hybrid retrieval.
# https://qdrant.github.io/fastembed/examples/Hybrid_Search/
import json
import logging
from contextlib import contextmanager
from typing import Dict, List, Optional, Tuple

import fastembed
import numpy as np
import pandas as pd
from datasets import load_dataset
from fastembed import SparseEmbedding, SparseTextEmbedding, TextEmbedding
from qdrant_client import QdrantClient, models
from qdrant_client.models import (
    Distance,
    NamedSparseVector,
    NamedVector,
    PointStruct,
    QueryRequest,
    ScoredPoint,
    SearchRequest,
    SparseIndexParams,
    SparseVector,
    SparseVectorParams,
    VectorParams,
)
from ranx import Qrels, Run, evaluate

logging.basicConfig(level=logging.ERROR)  # Set to INFO for debugging.
logger = logging.getLogger(__name__)

fastembed.__version__
'0.7.4'
Below are three classes for dense, sparse, and hybrid retrieval. Dense retrieval uses the lightweight “BAAI/bge-small-en-v1.5” model, while sparse retrieval uses the BM25 model from Qdrant. The BM25 model has two hyperparameters, controlling document-length normalization and term-frequency saturation, which can’t be changed with the current setup. The hybrid retriever runs both the dense and sparse models and then combines the scores from each using reciprocal rank fusion (RRF). RRF is the standard technique for combining dense and sparse search results, used in libraries such as LangChain, because of its simplicity and robustness. See here for implementation details in numpy and links to the original paper which introduced RRF. More details about hybrid search, and other methods for combining search results such as reranking and Matryoshka embeddings, can be found here.
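For reference, here is a minimal pure-Python sketch of reciprocal rank fusion; the function name and the k=60 constant (the value suggested in the original paper) are illustrative, not the exact implementation used by the hybrid retriever in this post.

from typing import Dict, List

def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked highly by multiple retrievers float to the top.
    """
    scores: Dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document ids by fused score, highest first.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda item: -item[1])]

# Example: fuse a dense and a sparse result list.
dense_hits = ["doc3", "doc1", "doc7"]
sparse_hits = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))  # doc1 and doc3 lead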
The BaseRetriever class also has an evaluate method which calculates “ndcg@5”, “recall@3”, “precision@3”, and “mrr” using the ranx library. NDCG (normalized discounted cumulative gain) is the standard metric in search-engine evaluation and works best with graded relevance judgments, such as the ground-truth data used by recommendation engines. The other metrics are simpler and are useful when only binary relevance labels are available.
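As a hypothetical illustration of how such an evaluate method can wrap ranx (the run_dict structure mapping query ids to {doc_id: score} dictionaries is an assumption about how the retriever stores its results, not the post’s actual code):

from typing import Dict
from ranx import Qrels, Run, evaluate

def evaluate_run(qrels: Qrels, run_dict: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Score retrieval results against ground-truth qrels with ranx."""
    run = Run.from_dict(run_dict)  # {query_id: {doc_id: score, ...}, ...}
    return evaluate(qrels, run, metrics=["ndcg@5", "recall@3", "precision@3", "mrr"])

# Toy example with one query and two retrieved documents.
toy_qrels = Qrels.from_dict({"q1": {"doc1": 1}})
toy_run = {"q1": {"doc1": 0.9, "doc2": 0.4}}
print(evaluate_run(toy_qrels, toy_run))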
Below we load the HotPotQA dataset from Hugging Face, as well as the ground-truth search results using ir_datasets.
import ir_datasets

client = QdrantClient(":memory:")
subset_size = 1000

qrels = Qrels.from_ir_datasets("beir/hotpotqa/train")
dataset = load_dataset("hotpot_qa", "distractor", split="train")
hotpot_dataset = ir_datasets.load("beir/hotpotqa/train")

# Get the corpus (all documents)
corpus = {doc.doc_id: doc for doc in hotpot_dataset.docs_iter()}
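The later steps refer to processed_data and qrels_subset, which come from cells not shown here. A rough sketch of how such a subset could be built from the variables above is given below; the dictionary fields and the filtering rule are assumptions, not the exact preprocessing used in this post.

# Illustrative only: restrict the corpus to `subset_size` documents and keep
# the queries whose relevant documents all fall inside that subset.
subset_doc_ids = list(corpus.keys())[:subset_size]
subset_id_set = set(subset_doc_ids)

queries = {q.query_id: q.text for q in hotpot_dataset.queries_iter()}

qrels_subset = Qrels.from_dict(
    {
        query_id: relevant
        for query_id, relevant in qrels.to_dict().items()
        if set(relevant).issubset(subset_id_set)
    }
)

# Queries that still have all of their relevant documents in the subset.
processed_data = [
    {"query_id": query_id, "text": queries[query_id]}
    for query_id in qrels_subset.to_dict()
]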
Evaluation
First create a dense retriever instance and index a subset of the HotPotQA dataset into an in-memory instance of Qdrant.
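Rather than reproducing the retriever class here, the sketch below shows a bare-bones version of the same indexing step with fastembed and Qdrant; the collection name, payload fields, and the subset_doc_ids variable are assumptions carried over from the sketch above.

# Sketch of dense indexing; the post's dense retriever class wraps logic like this.
dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")

client.create_collection(
    collection_name="hotpot_dense",  # assumed collection name
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Embed the document subset and upsert it into the in-memory Qdrant collection.
texts = [corpus[doc_id].text for doc_id in subset_doc_ids]
embeddings = list(dense_model.embed(texts))
client.upsert(
    collection_name="hotpot_dense",
    points=[
        PointStruct(id=i, vector=emb.tolist(), payload={"doc_id": doc_id})
        for i, (doc_id, emb) in enumerate(zip(subset_doc_ids, embeddings))
    ],
)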
Below we run the dense retriever on processed_data and evaluate the search results using qrels_subset as the ground truth. The evaluation method was described earlier and lives in the BaseRetriever class.
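In code, that step might look roughly like the following; the dense_retriever object and the retrieve/evaluate method signatures are assumed names, not the post’s exact API.

# Hypothetical usage of the dense retriever; names are illustrative.
dense_run = dense_retriever.retrieve(processed_data, limit=5)
dense_metrics = dense_retriever.evaluate(qrels_subset, dense_run)
print(dense_metrics)  # {"ndcg@5": ..., "recall@3": ..., "precision@3": ..., "mrr": ...}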