Note: OpenAI leads the general AI space, but other AI companies are developing deep research tools and experimenting with AI-powered academic searching in support of research. Perhaps you have faculty or students asking you to present these tools to classes. For more information, see Which companies are behind AI search tools?
BM25 is a widely used ranking algorithm in search systems that estimates document relevance by balancing keyword frequency, document length normalization, and term rarity across a collection, producing transparent, reproducible search rankings.
BM25 is a foundational ranking algorithm in information retrieval, designed to estimate how relevant a document is to a user’s search query. It counts how often query terms appear in each document, adjusts for document length to avoid favouring longer texts, and gives more weight to rare terms. The result is a relevance score used to rank search results. BM25 is valued for its transparency, efficiency, and strong performance across diverse collections. Even with the rise of AI-based searching, BM25 remains a reliable baseline and is often combined with semantic or neural ranking methods.
What exactly is BM25?
BM25 (Best Match 25) is a probabilistic ranking function used in information retrieval systems and search engines to estimate document relevance. It assigns a score to each document based on three main factors:
Term frequency (TF): How often query terms appear in the document
Inverse document frequency (IDF): Rarer terms across the corpus are weighted more heavily than common terms
Document length normalization: Adjusts scores so that longer documents are not unfairly advantaged or penalized
BM25 reflects the idea that documents containing query terms more frequently are generally found to be more relevant, but with diminishing returns for very high term frequencies. Tunable parameters allow adjustment of TF saturation and length normalization, making BM25 adaptable to different collections and queries. Higher BM25 scores indicate stronger relevance, and results are typically ranked in descending order.
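The three factors above can be combined in a few lines of code. The following is a minimal sketch, not the implementation of any particular search engine: the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` (commonly cited defaults) are assumptions chosen for illustration, and the IDF variant shown is one of several in use.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 controls term-frequency saturation; b controls length normalization.
    Both defaults are illustrative assumptions.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Longer-than-average documents get a larger normalizer.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # IDF: rarer terms receive larger weights.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF: rises with frequency, with diminishing returns.
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
            score += idf * tf_part
        scores.append(score)
    return scores


# Hypothetical three-document collection.
docs = [
    "aspirin reduces fever and pain".split(),
    "fever fever fever in children".split(),
    "pain management in adults and children after surgery".split(),
]
query = "aspirin fever".split()
scores = bm25_scores(query, docs)
# Rank document positions from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

In this toy collection the first document ranks highest because it matches both query terms, including the rarer term "aspirin". The second document repeats "fever" three times but gains less than three times the credit, and the third document matches nothing and scores zero.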
The probabilistic model behind BM25 was developed in the 1970s–1980s by Stephen Robertson and colleagues at the UK’s Centre for Research and Development in Information Retrieval (see Okapi BM25). In the 1990s, Robertson and Karen Spärck Jones introduced refinements to that model, such as term-frequency saturation, document length normalization, and tunable parameters, and these refinements produced BM25. BM25 remains a foundational baseline in modern search engines.
Is BM25 used in indexing?
BM25 is not an indexing tool, nor is it part of MTIX at the NLM. Simply put, it is a ranking function applied after indexing. Indexing tools, such as MTIX, automatically process documents to identify terms or assign controlled vocabulary. BM25 then operates on the resulting indexed records (for example, those in MEDLINE) to score and rank documents based on term frequency, document length, and term rarity, enabling more relevant search results.
PubMed's Best Match feature makes use of BM25; for more information, see the next section.
PubMed’s Best Match ranking
PubMed’s Best Match feature is built on BM25 (Best Match 25), a well-established probabilistic information retrieval algorithm that scores articles based on how well their text matches a user’s query. BM25 considers term frequency, inverse document frequency, and document length normalization, making it effective for ranking biomedical literature where abstracts and titles vary in length. In PubMed, BM25 is applied primarily to titles, abstracts, and selected metadata, providing a strong lexical relevance baseline. While Best Match incorporates additional signals (such as publication date boosts), BM25 remains the core scoring function, which is why PubMed excels at precise keyword-based retrieval but is less optimized for semantic or exploratory search compared to newer AI-driven systems.
PubMed now uses the Best Match 25 (BM25) algorithm, chosen because of its performance. BM25 builds on TF-IDF but calculates document and term frequencies differently; the changes mostly affect term frequency. BM25 adds two constants (b and k) to adjust the assigned weights. The saturation constant (k, often written k1) limits the contribution of term frequency: each additional occurrence of a search term adds less than the one before, and in the standard formulation the term-frequency component approaches but never exceeds k + 1. This keeps a document that repeats a term many times from outscoring others on repetition alone. The constant b controls document-length normalization, so a longer document is not prioritized over a shorter one simply because it has more instances of the search terms. Both constants can be tuned to adjust the results of the algorithm.
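The effect of the two constants is easier to see with numbers. The sketch below isolates the term-frequency component of the standard BM25 formulation. The function name `tf_component`, the document lengths, and the values `k=1.2` and `b=0.75` are illustrative assumptions, not PubMed's actual settings.

```python
def tf_component(tf, doc_len, avg_len, k=1.2, b=0.75):
    """Term-frequency part of BM25 for one term in one document."""
    # b = 0 switches length normalization off; b = 1 applies it fully.
    norm = 1 - b + b * doc_len / avg_len
    return tf * (k + 1) / (tf + k * norm)


# Saturation: in an average-length document, more occurrences keep helping,
# but by less each time, and the value stays below k + 1.
saturation = [tf_component(tf, 100, 100) for tf in (1, 2, 5, 20, 100)]
print([round(v, 3) for v in saturation])

# Length normalization: the same three occurrences count for less
# in a document four times the average length.
short_doc = tf_component(3, doc_len=100, avg_len=100)
long_doc = tf_component(3, doc_len=400, avg_len=100)
print(round(short_doc, 3), round(long_doc, 3))

# With b = 0, document length no longer matters.
short_no_norm = tf_component(3, doc_len=100, avg_len=100, b=0)
long_no_norm = tf_component(3, doc_len=400, avg_len=100, b=0)
print(short_no_norm == long_no_norm)
```

Raising k lets repeated terms keep adding weight for longer before the curve flattens, while raising b penalizes long documents more strongly.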
PubMed's Best Match uses a two-stage ranking architecture for learning to rank (L2R), chosen because its two steps, retrieval and reordering, can be optimized independently, which provides both efficiency and flexibility. Given a user query that has been automatically translated and mapped to fields, PubMed first retrieves the documents that match it and orders them with a classical term-weighting function, BM25. The top-ranked documents are then re-sorted by a second ranker, LambdaMART, a robust and fast approach with strong performance in various ranking tasks (e.g., the 2011 L2R challenge and various TREC tasks). The first stage is very similar to the relevance ranking system PubMed had used since 2013.
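The two-stage idea (retrieve and order with BM25, then reorder only the top results with a second ranker) can be sketched as follows. This is an illustration under stated assumptions, not PubMed's code: the second stage here is a simple hand-weighted scorer standing in for LambdaMART, and the function names, weights, documents, and publication years are all hypothetical.

```python
import math
from collections import Counter


def bm25_retrieve(query, docs, top_n, k=1.2, b=0.75):
    """Stage 1: score matching documents with BM25 and keep the top_n."""
    tokens = [d["text"].lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokens) / len(tokens)
    df = Counter(term for t in tokens for term in set(t))
    hits = []
    for i, toks in enumerate(tokens):
        tf = Counter(toks)
        norm = 1 - b + b * len(toks) / avg_len
        score = sum(
            math.log(1 + (len(docs) - df[q] + 0.5) / (df[q] + 0.5))
            * tf[q] * (k + 1) / (tf[q] + k * norm)
            for q in set(query) if tf[q] > 0
        )
        if score > 0:  # only documents matching the query are retrieved
            hits.append((i, score))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_n]


def rerank(hits, docs, weights=(1.0, 0.05)):
    """Stage 2 stand-in: a hand-weighted linear scorer, NOT LambdaMART.

    Combines the BM25 score with a penalty per year of age (hypothetical).
    """
    w_bm25, w_year = weights
    newest = max(docs[i]["year"] for i, _ in hits)

    def combined(pair):
        i, bm25 = pair
        return w_bm25 * bm25 - w_year * (newest - docs[i]["year"])

    return [i for i, _ in sorted(hits, key=combined, reverse=True)]


# Hypothetical records.
docs = [
    {"text": "statin therapy and cardiovascular risk", "year": 2005},
    {"text": "statin therapy statin safety in older adults", "year": 2023},
    {"text": "exercise and cardiovascular health", "year": 2021},
    {"text": "dietary fibre and gut microbiome", "year": 2022},
]
query = "statin cardiovascular".split()
hits = bm25_retrieve(query, docs, top_n=3)
first_stage = [i for i, _ in hits]
final = rerank(hits, docs)
print(first_stage, final)
```

Here the 2005 article leads after the first stage because it matches both query terms, but the second stage moves it below the more recent articles. Because the second ranker only sees the short list produced by BM25, it can use more expensive signals without scoring the whole collection.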
Why should librarians care about BM25 (Best Match 25)?
Librarians should understand BM25 because it underpins how search systems retrieve and rank information, directly influencing what users see, trust, and use as evidence. Its transparency makes explicit how term frequency, document length, and vocabulary distribution affect relevance—factors critical for systematic searching and reproducibility. As AI-driven and semantic search tools expand, BM25 continues to serve as a stable baseline in library discovery layers, databases, and hybrid retrieval systems. Knowledge of BM25 allows librarians to diagnose search behaviors, explain unexpected results, design better queries, and critically evaluate “answer engines” that obscure retrieval processes.
Presentation
Note: This presentation explains how BM25 ranks documents to provide the most relevant results for user queries. It may be too technical for most librarians, but the first few minutes and the conclusions are informative.
References
Li X, Lipp J, Shakir A, Huang R, Li J. BMX: Entropy-weighted similarity and semantic-enhanced lexical search. arXiv. 2024 Aug. https://arxiv.org/abs/2408.06643
Lu M, Chen C, Eickhoff C. Cross-Encoder rediscovers a semantic variant of BM25. arXiv. 2025 Feb. https://arxiv.org/abs/2502.04645
Farivar K. Semantic search for information retrieval: a survey including BM25 baseline. arXiv. 2025 Aug 25. https://arxiv.org/html/2508.17694
Robertson SE, Walker S, Jones S, Hancock-Beaulieu M, Gatford M. Okapi at TREC-3. In: Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA. Nov 1994.
Robertson SE, Walker S, Hancock-Beaulieu M. Okapi at TREC-7. In: Proceedings of the Seventh Text REtrieval Conference. Gaithersburg, USA. Nov 1998.
Spärck Jones K, Walker S, Robertson SE. A probabilistic model of information retrieval: Development and comparative experiments: Part 1. Information Processing & Management. 2000;36(6):779–808. doi:10.1016/S0306-4573(00)00015-7
Spärck Jones K, Walker S, Robertson SE. A probabilistic model of information retrieval: Development and comparative experiments: Part 2. Information Processing & Management. 2000;36(6):809–840. doi:10.1016/S0306-4573(00)00016-9
Seetharaman R, Dhole KD, Bansal A. InsertRank: LLMs can reason over BM25 scores to improve listwise reranking. arXiv. 2025 Jun 17. https://arxiv.org/abs/2506.14086
Won T, Lee TK, Kim H, Lee HY. Efficiency and effectiveness of SPLADE models on billion-scale web document titles, with comparisons to BM25. arXiv. 2025 Nov 27. https://arxiv.org/abs/2511.22263