Note: This entry is intended to help librarians and other information professionals learn about AI. It is not, in itself, meant as a promotion of AI.
Introduction
Vector-based searching and embedding models refer to techniques for retrieving documents, articles, web pages, or other textual content based on their similarity to a query.
Vector-based searching enables users to find relevant information even when the exact words or terms used within a given query are not present in the retrieved documents.
The first step in vector searching is translating text into vectors (strings of numbers) by processing it through an embedding model, a type of AI system related to large language models (LLMs). Such models are trained on vast amounts of text (for example, research papers, books, and web content).
During training, the model learns how words appear together and in what contexts. Over time, it builds a kind of map of language, where meanings cluster naturally. In medicine, words that often appear in similar contexts, such as doctor and physician, end up close together in this semantic map. Words that rarely co-occur or belong to very different contexts, like insulin and wheelchair, are far apart.
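The idea of a semantic map can be made concrete with a small sketch. The vectors below are invented for illustration (a real embedding model would learn hundreds of dimensions from a corpus), but they show how cosine similarity scores words that share contexts higher than words that do not:

```python
import math

# Toy 3-dimensional word vectors, hand-made for illustration only.
# A trained embedding model would produce these automatically.
vectors = {
    "doctor":     [0.90, 0.80, 0.10],
    "physician":  [0.85, 0.75, 0.15],
    "insulin":    [0.20, 0.90, 0.70],
    "wheelchair": [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Words used in similar contexts sit close together in the map...
print(cosine(vectors["doctor"], vectors["physician"]))
# ...while words from different contexts sit further apart.
print(cosine(vectors["insulin"], vectors["wheelchair"]))
```

With these toy values, "doctor" and "physician" score noticeably higher than "insulin" and "wheelchair", mirroring the medical example in the text.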
As Tay (2025) puts it, "What we typically call 'semantic search' aims for retrieval based on 'meaning' rather than just matching keywords. This is achieved these days using embedding models."
What are embeddings, and how do they relate to AI searching?
Vector-based searching uses embeddings, which are numerical representations of data (text, images, audio, etc.), to find items based on their underlying meaning or context rather than exact keyword matches.
Embeddings are "high-dimensional numerical vectors" capturing semantic characteristics and relationships of data within a given corpus.
Machine learning models generate these vectors, placing semantically similar items close together in a multi-dimensional space (vector space). Embeddings for "cat," "dog," and "lion" would be closer to each other than to the embedding for "car," since all three are animals.
Support Vector Machines (SVMs) are a supervised learning method used in classification and suitable for some regression tasks.
In its basic form, the SVM algorithm is a binary linear classifier: it learns from labelled "training" points and draws a boundary that divides unseen data points into one of two categories.
SVMs helped to bridge traditional keyword searching and modern AI searching by learning how to classify and rank documents using mathematically rigorous boundaries—before “semantic searching” started its rise.
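The classification idea can be illustrated with a bare-bones linear SVM trained by sub-gradient descent on the hinge loss. The data points and labels below are invented for the sketch (think of +1 as "relevant" and -1 as "not relevant"); a real system would use a proper solver such as the one in scikit-learn:

```python
# Tiny linearly separable training set: 2-D feature vectors with
# labels +1 ("relevant") and -1 ("not relevant"). Invented data.
train = [
    ([2.0, 2.5], 1), ([2.5, 2.0], 1), ([3.0, 3.0], 1),
    ([0.5, 0.5], -1), ([1.0, 0.2], -1), ([0.2, 1.0], -1),
]

def train_linear_svm(data, epochs=300, lr=0.03, lam=0.01):
    """Fit weights w and bias b by sub-gradient descent on the hinge
    loss -- a simplified stand-in for a real SVM solver."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:   # point inside the margin: move the boundary
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # point well classified: only regularize w
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def classify(w, b, x):
    """Assign an unseen point to one of the two categories."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

w, b = train_linear_svm(train)
print(classify(w, b, [2.8, 2.2]))  # falls on the "relevant" side
print(classify(w, b, [0.3, 0.4]))  # falls on the other side
```

The learned boundary is exactly the "mathematically rigorous division" the paragraph mentions: once trained, the classifier assigns any unseen point to one side or the other.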
Vectorization
Embedding creation: Data in a dataset (e.g., documents, images, or user queries) are converted into numerical vector embeddings using a machine-learning model trained to capture semantic meaning.
Indexing: vectors are stored in a specialized retrieval system—often a vector database—which uses indexing algorithms such as Approximate Nearest Neighbor (ANN) to organize vectors so semantically similar items are located near one another for retrieval.
Query representation: a query is transformed into a vector embedding to ensure comparability within the vector space.
Similarity search: The system searches the vector index to identify data vectors closest to the query vector, using similarity or distance metrics such as cosine similarity or Euclidean distance.
Ranking and retrieval: Vectors with the smallest distance (or highest similarity) to the query are assumed to represent the most semantically relevant results and are returned to the user, often combined with additional ranking or filtering steps.
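The five steps above can be sketched end to end. This is a toy sketch under loud simplifications: a term-frequency vector stands in for a learned embedding model (step 1), a plain dictionary stands in for an ANN-indexed vector database (step 2), and a brute-force scan stands in for an ANN search (step 4):

```python
import math
from collections import Counter

# A miniature corpus (invented for illustration).
documents = {
    "d1": "the doctor examined the patient",
    "d2": "the physician prescribed insulin to the patient",
    "d3": "the car needs new wheels",
}

VOCAB = sorted({w for text in documents.values() for w in text.split()})

def embed(text):
    """Steps 1 and 3: turn text into a vector. A term-frequency vector
    is a crude stand-in for a learned embedding model."""
    counts = Counter(text.split())
    return [counts[w] for w in VOCAB]

# Step 2: "index" the document vectors. Real systems use ANN structures
# (e.g. graph- or tree-based indexes) instead of a flat dict.
index = {doc_id: embed(text) for doc_id, text in documents.items()}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, k=2):
    """Steps 3-5: embed the query, scan the index, rank by similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return ranked[:k]

print(search("patient doctor"))
```

Note one limitation of the stand-in: term-frequency vectors only reward literal word overlap, whereas a learned embedding would also rank "physician" highly for the query "doctor." The pipeline shape, however, is the same.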
Boolean vs. vectorization
Boolean searching uses explicit, human-defined logic to include or exclude documents, maximizing transparency and recall.
Support Vector Machines learn relevance from examples, ranking documents probabilistically.
Boolean search is auditable and reproducible; SVMs prioritize efficiency and precision but rely on training data and are less interpretable.
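The auditability of Boolean searching is easy to demonstrate: the matching rule is explicit, and the result is a deterministic set rather than a learned ranking. A minimal sketch, using an invented three-document corpus and a simple AND operator:

```python
# Miniature corpus (invented for illustration).
documents = {
    "d1": "machine learning for health informatics",
    "d2": "deep learning in radiology",
    "d3": "hospital library management",
}

def boolean_and(*terms):
    """Explicit, human-auditable logic: a document matches only if
    every term literally appears. The output is a set of matches,
    not a probabilistic ranking."""
    return sorted(d for d, text in documents.items()
                  if all(t in text.split() for t in terms))

print(boolean_and("learning"))
print(boolean_and("learning", "health"))
```

Running the same query always returns the same documents, and anyone can verify why each one matched, which is precisely the reproducibility that learned rankers trade away.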
Tay argues that embedding-based semantic search (vector search) is less objectionable than generative LLMs for several reasons. It uses encoder models that are significantly smaller and less computationally costly than full decoder LLMs, and therefore has a lower environmental impact. Because it does not generate text, it is less likely to reproduce copyrighted content. And since it serves mainly to rank documents by semantic similarity rather than to generate novel outputs, it reduces the risks of hallucination and cognitive offloading. Tay notes that embedding search still poses challenges, such as interpretability, potential bias, and reproducibility issues, but these are generally narrower than those associated with generative AI.