Also: This open textbook (or wiki channel) is intended to help librarians and other information professionals learn about AI. It is not, in itself, meant to be seen as promotion of AI. If anything, the goal is harms mitigation or harms reduction.
Introduction
Vector-based searching (related concepts: semantic search or embedding-based search) is an information retrieval method that finds and ranks results based on the semantic meaning of content rather than on exact matches to the keywords or free text in a query. It represents documents and queries as numerical vectors and retrieves results by measuring similarity between those vectors.
Traditional keyword searching (and even controlled vocabulary-driven searching) relies on lexical matching, meaning results are returned only when query terms explicitly appear in records. Vector-based searching instead encodes meaning using machine learning models, allowing systems to retrieve relevant content even when different words, synonyms, or paraphrases are used. In vector-based systems, items with similar meanings are located close together in a high-dimensional vector space, enabling searches based on conceptual similarity rather than literal text overlap.
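To illustrate the limitation of lexical matching described above, here is a minimal Python sketch. The three records and the keyword_search function are hypothetical, invented for this example; they do not represent any particular catalogue or database.

```python
# Hypothetical records, invented for illustration only.
records = {
    "rec1": "Management of myocardial infarction in older adults",
    "rec2": "Heart attack warning signs and first aid",
    "rec3": "Cataloguing rules for audiovisual materials",
}

def keyword_search(query, records):
    """Return IDs of records that contain every query term (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for rec_id, text in records.items():
        words = set(text.lower().split())
        # Lexical matching: a record qualifies only if each term literally appears.
        if all(term in words for term in terms):
            hits.append(rec_id)
    return hits

lexical_hits = keyword_search("heart attack", records)
print(lexical_hits)
```

Only the record that literally contains "heart" and "attack" is returned. The record on myocardial infarction covers the same concept but is missed, which is the gap that vector-based searching is intended to close.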
How vector-based searching works
Vector-based searching typically involves four stages:
1) Embedding
Content such as documents, sentences, images, or audio is converted into numerical representations called vector embeddings using a machine learning model. Queries are embedded using the same model.
2) Indexing
Embeddings are stored in a specialized index or vector database. To enable fast retrieval at scale, systems commonly use Approximate Nearest Neighbor (ANN) algorithms.
3) Querying
When a user submits a query, it is transformed into a vector embedding using the same model that was used to embed the content.
4) Similarity matching
The system calculates similarity between the query vector and stored vectors using similarity or distance measures such as cosine similarity, Euclidean distance, or the dot product.
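The similarity-matching stage can be sketched in plain Python. Cosine similarity, the dot product, and Euclidean distance are commonly used measures. The three-dimensional vectors below are hand-assigned, hypothetical stand-ins for real embeddings (which typically have hundreds of dimensions and come from a trained model), and the exhaustive scan in rank is an assumption made for clarity; production systems would normally query an ANN index instead.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings, hand-assigned for illustration only.
index = {
    "Management of myocardial infarction": [0.9, 0.1, 0.0],
    "Heart attack warning signs": [0.8, 0.3, 0.1],
    "Cataloguing audiovisual materials": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the query "heart attack".
query_vector = [0.85, 0.2, 0.05]

def rank(query_vector, index):
    """Score every stored vector against the query, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in index.items()
    ]
    return sorted(scored, reverse=True)

ranked = rank(query_vector, index)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

With these invented vectors, both cardiology titles score far above the cataloguing title, even though only one of them shares words with the query.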
Many modern search systems implement hybrid search techniques, combining vector-based search with traditional keyword or Boolean searching approaches. This ensemble approach balances semantic recall with lexical precision and filtering.
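One way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The text above does not say which fusion method a given system uses, so RRF is shown here only as one common option. The record IDs and the two input rankings are hypothetical, and k=60 is a conventional default rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list.

    Each ID earns 1 / (k + position) from every list it appears in,
    so items ranked well by more than one method rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists for the same query, best match first.
keyword_ranking = ["rec2", "rec5"]
vector_ranking = ["rec1", "rec2", "rec4"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Here rec2 appears in both lists and therefore ranks first in the fused result, while records found by only one method follow in order of their original positions. Because RRF uses ranks rather than raw scores, the keyword and vector scores do not need to be on the same scale.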
Advantages
Improved recall for semantically related content;
Robust handling of synonyms and paraphrases;
Better support for natural-language queries;
Cross-lingual and multimodal capabilities.
Limitations
Higher computational costs;
Reduced transparency compared to keyword matching (rankings can be opaque, a "black box" effect);
Semantic search using vector databases has become a powerful paradigm for retrieving pertinent information, because it captures the contextual and conceptual sense of a search better than conventional keyword searching. This paper provides an in-depth overview of the vector representation models, transformer-based semantic encoders, and vector database technologies that jointly enable efficient and accurate semantic search.
This paper investigates the enhancement of scientific literature chatbots through retrieval-augmented generation (RAG), with a focus on evaluating vector- and graph-based retrieval systems. The proposed chatbot leverages both structured (graph) and unstructured (vector) databases to access scientific articles and gray literature, enabling efficient triage of sources according to research objectives. To systematically assess performance, we examine two use-case scenarios: retrieval from a single uploaded document and retrieval from a large-scale corpus. Benchmark test sets were generated using a GPT model, with selected outputs annotated for evaluation. The comparative analysis emphasizes retrieval accuracy and response relevance, providing insight into the strengths and limitations of each approach. The findings demonstrate the potential of hybrid RAG systems to improve accessibility to scientific knowledge and to support evidence-based decision making.
Rusum GP, Anasuri S. Vector Databases in Modern Applications: Real-Time Search, Recommendations, and Retrieval-Augmented Generation (RAG). International Journal of AI, BigData, Computational and Management Studies. 2024 Dec 30;5(4):124-36.
Salsabilla N, Wiharja K. Implementation of Semantic Search Based on Vector Database for Personal Documents. In: 2025 International Conference on Advancement in Data Science, E-learning and Information System (ICADEIS) 2025 Feb 3 (pp. 1-6). IEEE.
Note: Please use your critical reading skills while reading entries. No warranties, implied or actual, are granted for any health or medical search or AI information obtained while using these pages. Check with your librarian for more contextual, accurate information.