Vector-based searching and embeddings
Introduction
Vector-based searching (also known as semantic search or embedding-based search) is an information retrieval method that finds and ranks results based on the semantic meaning of content rather than exact keyword matches. It represents documents and queries as numerical vectors and retrieves results by measuring similarity between those vectors.

Traditional keyword-based searching relies on lexical matching, meaning results are returned only when query terms explicitly appear in records. Vector-based searching instead encodes meaning using machine learning models, allowing systems to retrieve relevant content even when different words, synonyms, or paraphrases are used.

In vector-based systems, items with similar meanings are located close together in a high-dimensional vector space, enabling searches based on conceptual similarity rather than literal text overlap.

How vector-based searching works
Vector-based searching typically involves four stages:

1) Embedding
Content such as documents, sentences, images, or audio is converted into numerical representations called vector embeddings using a machine learning model. Queries are embedded using the same model.

2) Indexing
Embeddings are stored in a specialized index or vector database. To enable fast retrieval at scale, systems commonly use Approximate Nearest Neighbor (ANN) algorithms.

3) Querying
When a user submits a query, it is transformed into a vector embedding.

4) Similarity matching
The system calculates similarity between the query vector and stored vectors using distance metrics such as cosine similarity, Euclidean distance, or the dot product.
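As a rough illustration of the similarity-matching stage, the sketch below implements cosine similarity, Euclidean distance, and the dot product in plain Python and ranks a few stored vectors against a query. The three-dimensional vectors and the names `stored`, `query`, and `rank_by_cosine` are hypothetical stand-ins chosen for readability; real embedding models produce vectors with hundreds of dimensions or more. The ranking here compares the query against every stored vector, which an ANN index would approximate rather than compute exhaustively.

```python
import math


def dot(a, b):
    """Dot (inner) product: larger values mean more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance: smaller values mean more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical embeddings, assumed to come from the same model as the query.
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [0.8, 0.6, 0.0]


def rank_by_cosine(query_vec, vectors):
    """Return (id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_cosine(query, stored)
for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

Note the opposite orientation of the metrics: cosine similarity and dot product are higher for closer matches, while Euclidean distance is lower.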
Results are ranked by closeness in the vector space.

Comparison with keyword search
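To make the contrast concrete, the following minimal sketch runs a literal keyword match and a vector ranking over the same three records. The record titles, the two-dimensional vectors, and the query vector are all invented for illustration and are assumed to stand in for the output of an embedding model; no real model or library API is being shown.

```python
import math

# Hypothetical records with hand-assigned 2-d embeddings (stand-ins for model output).
records = [
    ("Used car price guide", [0.9, 0.1]),
    ("Automobile repair manual", [0.8, 0.3]),
    ("Sourdough baking basics", [0.1, 0.9]),
]


def keyword_search(term, items):
    """Lexical matching: return titles that literally contain the term."""
    return [title for title, _ in items if term.lower() in title.lower()]


def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.hypot(*a) * math.hypot(*b))


def vector_search(query_vec, items, top_k=2):
    """Semantic matching: return the top_k titles nearest to the query vector."""
    ranked = sorted(items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [title for title, _ in ranked[:top_k]]


# Assumption: the model maps the query "car" to this vector.
car_query_vec = [0.85, 0.2]

keyword_hits = keyword_search("car", records)
vector_hits = vector_search(car_query_vec, records)
print("keyword:", keyword_hits)
print("vector: ", vector_hits)
```

Under these assumptions, the keyword search returns only the record containing the string "car", while the vector search also surfaces the "Automobile" record because its embedding lies close to the query.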

Models used
Vector-based searching typically relies on encoder models, which generate embeddings rather than text. Common examples include:
These models differ from large language models (LLMs), which are designed primarily for text generation rather than semantic encoding.

Applications
Vector-based searching is widely used in the following search and information retrieval systems:

Hybrid search
Many modern search systems implement hybrid search techniques, combining vector-based search with traditional keyword or Boolean searching approaches. This ensemble approach balances semantic recall with lexical precision and filtering.

Advantages
Limitations
See also

One-sentence definition
Vector-based searching retrieves information by comparing the semantic similarity of vector embeddings rather than matching exact words.

References
Note: I have read widely on this topic, and will be populating this section with an extensive bibliography to support the entry. This is a complex topic, so thank you for your patience while I write this entry for librarians and information professionals.

Some content was informed by the following sources:
Wikipedia, "Vector database": https://en.wikipedia.org/wiki/Vector_database
Microsoft Learn, Azure Cosmos DB vector database: https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database
Microsoft Learn, "What is vector search?": https://learn.microsoft.com/en-us/training/modules/improve-search-results-vector-search/2-vector-search
