Note: OpenAI leads the general AI space, but many AI companies are developing deep research tools and experimenting with AI-powered academic search in support of research.
Also: This open textbook (or wiki channel) is intended to help librarians and other information professionals learn about AI. It is not, in itself, meant to be seen as promotion of AI. If anything, the goal is harms mitigation or harms reduction.
Introduction
Artificial intelligence (AI) is a ubiquitous and disruptive technology, and its language and terminologies are unfamiliar to some library and information professionals. For anyone new to AI and its vocabulary, learning basic concepts will build knowledge and understanding. I have gathered some of the most commonly used terms in this glossary, and then some; suggestions to build the glossary toward a shared understanding are welcome.
A
Agentic AI — refers to AI systems designed to act autonomously, perceiving their environment, making decisions, and taking actions to achieve specific goals. An example of an advanced agentic AI system is Auto-GPT, an open-source tool that employs large language models (LLMs) to autonomously perform complex, multi-step tasks by breaking them into sub-goals, reasoning through each step, and utilizing external tools like web browsers, APIs, or file systems to achieve objectives.
AI Bubble refers to an economic bubble where the stock prices and valuations of companies involved in artificial intelligence are rapidly increasing due to immense speculation and hype, rather than a clear path to realized profits. This situation raises concerns that investor enthusiasm for the technology's potential has outpaced its actual, immediate profitability and underlying value.
AI Copyright Lawsuits: this entry will assist library and information professionals in understanding the basic issues of AI copyright infringement and related legal actions. Check with your scholarly communications and/or copyright librarian for more contextual, accurate information.
AI Ethics (or, "Ethical AI") refers to issues that stakeholders consider to ensure AI technology is developed and used responsibly. This means adopting and implementing systems that support safe, secure, unbiased, and environmentally aware approaches. For librarians, our focus has been on conflicts with our core values, and adopting responsible strategies when dealing with or getting questions from user groups. One goal for library and information professionals is developing AI literacy skills, helping users understand and address ethical challenges that arise from using generative AI in university learning, and promoting responsible, ethical uses (and even non-usage).
AI Safety refers to the field of research and practices aimed at ensuring AI systems operate in ways that are safe, reliable, and aligned with human values. It focuses on mitigating risks such as unintended consequences, misuse, bias, or loss of control, particularly as AI becomes more advanced and autonomous. The goal is to prevent harm to humans, society, or the environment. This involves technical measures (like robust design and error correction), ethical considerations, and governance frameworks. See Center for AI Safety.
Approximate string matching (also fuzzy matching / string searching) is a technique used by computers to identify and match strings that are similar or "approximate" but not identical. Instead of requiring an exact match, it calculates a similarity score between strings to determine their proximity or closeness, making it ideal for handling data with typos, spelling variations, and formatting inconsistencies.
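As an illustration, Python's standard library includes a simple string similarity scorer. The sketch below is only one way to compute an approximate match; the example strings are invented for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 (no match) and 1 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A misspelled query still scores very close to the intended heading.
score = similarity("Libary of Congress", "Library of Congress")
```

Because the score is a number rather than a yes/no answer, an application can set its own threshold for what counts as "close enough".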
AI-powered searching: AI-informed or AI-powered academic searching refers to the use of AI technologies, such as machine learning and natural language processing, to enhance the location, collection, and synthesis of scholarly papers.
AI Literacy: is the ability to understand, evaluate, and effectively use artificial intelligence tools and technologies. AI literacy encompasses knowledge of AI concepts, algorithms, data privacy, ethics, and potential implications of AI on society. It empowers individuals to assess AI applications critically, make informed decisions, and navigate an increasingly AI-driven world.
AI Winter: refers to a period of reduced funding, interest, and progress in AI research. See Wikipedia entry.
Algorithm is a set of instructions or rules followed to complete a task. Algorithms are useful in machine learning. Data analysts use algorithms to organize or analyze data, while data scientists may use them to make predictions, or build models.
Algorithmic bias refers to AI systems that have bias embedded in them, which can manifest through various pathways including biased training datasets or biased decisions made by humans in the design of algorithms.
Application programming interface (API): is a set of protocols that determine how two software applications will interact with each other. APIs tend to be written in programming languages such as C++ or JavaScript.
B
Bidirectional Encoder Representations from Transformers (BERT) is a natural language processing (NLP) model developed by Google in 2018; BERT uses a deep learning architecture to understand the context of words in text, which allows it to achieve high accuracy on tasks such as text classification, sentiment analysis, and question answering. BERT has been widely adopted and is a fundamental baseline in many NLP applications, including Google Search, and has led to the development of other models.
Bi-encoders and cross-encoders are two architectures used in natural language processing (NLP) for tasks like text similarity, retrieval, or ranking.
Big data refers to large data sets that reveal patterns and trends to support decisions. Organizations can now gather massive amounts of complex data using data collection tools and systems. Big data can be collected very quickly and stored in a variety of formats.
BM25 is a ranking function used in information retrieval and search engines to estimate the relevance of documents to a user's query. It works by calculating a score for each document based on the query terms, taking into account factors like term frequency, document length, and the rarity of the terms across the entire corpus. The higher the BM25 score, the more relevant the document is to the query.
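The factors named above can be seen in a toy implementation of the Okapi BM25 formula over pre-tokenized documents; the parameter values k1 and b are commonly cited defaults, and the sample corpus is invented for illustration.

```python
import math

def bm25(query, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query using BM25."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)                   # docs containing the term
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)  # rare terms weigh more
        tf = doc.count(term)                                       # term frequency in this doc
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
    return score

corpus = [["ai", "ethics", "for", "libraries"],
          ["cataloguing", "standards"],
          ["ai", "tools", "for", "research"]]
```

A document containing the query term scores higher than one that does not, and the document-length normalization keeps long documents from winning on word count alone.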
C and D
Chatbot is a software application designed to imitate human conversation through text or voice commands. "Chatbot," a fusion of "chat" and "robot," originates from its initial function as a text-based dialogue system simulating human language. Early versions used text input and output masks to mimic a real-time conversation.
ChatGPT by OpenAI (generative pre-trained transformer) is an AI language model tool designed to process and generate text, assisting with academic tasks and communication. OpenAI offers free and paid tiers that provide access to different underlying models (e.g., GPT-4 and GPT-5); the lineup changes frequently.
Cognitive computing is a computer model that focuses on mimicking human thought processes such as understanding natural language, pattern recognition, and learning, similar to AI. Marketing teams sometimes use this term to eliminate the sci-fi mystique of AI.
ColBERT (Contextualized Late Interaction over BERT) is an open-source deep learning retrieval model designed to balance the efficiency of traditional methods like BM25 with the accuracy of deep learning models like BERT; it is used for natural language understanding and search.
Computer vision is an interdisciplinary field of science and technology that focuses on how computers can gain understanding from images and videos. For AI engineers, computer vision allows them to automate activities that the human visual system typically performs.
Copyright in Canada is an entry meant to help library and information professionals in Canada understand copyright in the AI era. Issues such as digital copyright and intellectual property obligations under various international agreements will provide much-needed background.
DALL-E (DALL-E, DALL-E 2, and DALL-E 3, stylised DALL·E) are text-to-image models developed by OpenAI that use deep learning to generate digital images from natural language descriptions known as prompts. See https://en.wikipedia.org/wiki/DALL-E
Data mining is the process of closely examining data to identify patterns and glean insights. Data mining is a central aspect of data analytics; the insights you find during the mining process will inform your business recommendations.
Data science is an interdisciplinary field of technology that uses algorithms and processes to gather and analyze large amounts of data to uncover patterns and insights that inform business decisions.
Deep learning: is a machine learning technique that layers algorithms and computing units, or neurons, into what is called an artificial neural network (ANN). Unlike simpler machine learning methods, deep learning algorithms can improve incorrect outcomes through repetition without human intervention. These deep neural networks take inspiration from the structure of the human brain.
Deep fakes, also synthetic media, refer to videos or audio created using AI techniques, i.e., deep learning, in which realistic-looking content is manipulated or generated to depict individuals saying or doing things they did not do. By training neural networks on large datasets of images, videos, or audio, deep fakes can mimic a person's appearance, voice, or behaviour.
E to I
Ethical AI: refers to developing and using artificial intelligence (AI) systems in a manner that ensures transparency, equity, accountability, and respect for intellectual property. It counteracts bias, discrimination, and harms, while promoting inclusivity, privacy, and social value. Ethical AI requires human oversight, and a mechanism to address harms and unintended consequences.
Expert system: refers to "...a computer system that mimics the decision-making ability of a human expert by following pre-programmed rules, such as ‘if this occurs, then do that’. These systems fuelled much of the earlier excitement surrounding AI in the 1980s, but have since become less fashionable, particularly with the rise of neural networks".
GenAI or Generative AI uses AI to create content, including text, video, code and images. A generative AI system is trained using large amounts of data, so that it can find patterns for generating new content.
Google Scholar is used around the world and valued for its tracking of academic papers. Will it survive AI-powered search tools?
Guardrails in AI are mechanisms designed to ensure systems operate within ethical, legal, and technical frameworks; the idea is to prevent AI from causing harm, making biased decisions, or being misused.
Hallucinations in AI refer to misleading, fabricated, or factually incorrect outputs produced by AI systems; outputs may appear plausible but are inaccurate or fictional owing to factors such as biases in training data or a model's inability to grasp context. For instance, large language models (LLMs) might generate coherent yet factually incorrect text, underscoring the importance of critically evaluating and fact-checking AI-generated content.
Human-Centered AI (HCAI) prioritizes human values, information needs and well-being in designing AI systems, ensuring they augment rather than replace human abilities. HCAI is a collaborative, iterative process incorporating human input from diverse stakeholders and focuses on ethics to create transparent, fair, and beneficial AI that improves human performance and fosters trust.
Interpretability: some machine learning models, particularly those trained with deep learning, are so complex that it may be difficult or impossible to know how the model produced the output. Interpretability describes the ability to present or explain a machine learning system’s decision-making process in terms that can be understood by humans. Interpretability is sometimes referred to as transparency or explainability.
L to O
Large language models (LLMs) — are trained on large amounts of text so they can process natural language and generate human-like text. LLMs lack the ability to understand the truth or meaning of words, relying instead on learned patterns; this is an important consideration when fact-checking and assessing the reliability of LLM-generated content. See https://en.wikipedia.org/wiki/List_of_large_language_models
Latency: refers to the delay between an AI system receiving an input and producing an output. It essentially measures how quickly an AI model can process information and respond, significantly impacting the user experience, especially in real-time applications.
Learned sparse retrieval (LSR) or sparse neural search is an approach to information retrieval which uses a sparse vector representation of queries and documents. LSR borrows techniques both from lexical bag-of-words and vector embedding algorithms, and claims to perform better than either alone.
Lexical searching focuses on matching exact words or phrases in the query with those in a corpus of documents. Lexical searching is important in speed and precision when dealing with specific terms or structured data. Semantic searching, on the other hand, is best at handling natural language queries, understanding context, and exploring related concepts.
Librarian-Centred AI (LCAI) refers to artificial intelligence (AI) systems designed specifically to support librarians and enhance library services, with librarians at the centre. Librarian-centred AI models should prioritize the ethics and values of librarianship, and aim to resolve the myriad ethical conflicts presented by AI, such as copyright infringement, data breaches, algorithmic bias, and intellectual property theft. Above all, AI should be used to support librarians' expertise, and be put in service of our work with user communities.
Machine learning: is a subset of AI in which algorithms mimic human learning while processing data. With machine learning, algorithms improve over time, becoming more accurate when making language predictions and text classification. Machine learning focuses on developing algorithms and models that help machines learn from data and predict trends and behaviours, without human assistance. One of the challenges in adopting machine learning is the limited understanding and transparency in algorithms. The black-box nature of some deep learning models can make it challenging to interpret their decisions, leading to concerns about trust and accountability.
Natural language processing: is a type of AI that enables computers to understand spoken and written human language. NLP enables features like text and speech recognition on devices.
Neural networks: are a deep learning technique designed to resemble the complex structure of the human brain. NNs require large data sets to perform calculations and create outputs, which enable features like speech and vision recognition.
o3 and o4-mini: are generative pre-trained transformer (GPT) reasoning models developed by OpenAI as successors to o1 for ChatGPT. They were designed to devote additional deliberation time to questions that require step-by-step logical reasoning. See Wikipedia entry.
P to S
Pattern recognition — refers to the method of using computer algorithms to analyze, detect, and label regularities in data. This informs how the data gets classified into different categories.
Predictive analytics is analytics that uses technology to predict what will happen in a specific time frame based on historical data and patterns.
Prescriptive analytics is analytics that uses technology to analyze data for factors such as possible situations and scenarios, past and present performance, and other resources to help organizations make better strategic decisions.
A prompt is an input that users feed to an AI system in order to get a desired result or output. See Prompt engineering.
Prompt engineering refers to crafting effective instructions (or, prompts) in AI models to guide them, and obtain desired outputs.
Prompting guardrails - a deliberate boundary, rule, or structural aid built into how prompts are written or used with an AI system to help keep outputs reliable, accurate, safe, and aligned with the intended purpose.
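One common prompt-engineering pattern, sketched here as a hypothetical helper (the function name, wording, and field layout are illustrative, not a standard), is to combine a role, a task, and explicit guardrail rules into a single prompt string:

```python
def build_prompt(role: str, task: str, guardrails: list[str]) -> str:
    """Assemble a structured prompt: role, task, then explicit boundary rules."""
    rules = "\n".join(f"- {g}" for g in guardrails)
    return f"You are {role}.\nTask: {task}\nRules:\n{rules}"

prompt = build_prompt(
    "a reference librarian",
    "Suggest three databases for a nursing literature search.",
    ["Cite only real, verifiable databases.",
     "Say 'I don't know' rather than guessing."],
)
```

Writing the guardrails as explicit, numbered or bulleted rules makes them easy to audit and reuse across prompts, though no phrasing guarantees a model will follow them.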
Really Simple Licensing (RSL) for AI is an open, XML-based document format for defining machine-readable licensing terms for digital assets, including websites, web pages, books, videos, images, and proprietary datasets. RSL builds on the familiar robots.txt protocol, but instead of just saying yes or no to crawlers, publishers can embed licensing terms directly into files. Instructions could include charges per crawl, subscription fees, and payment every time an AI model references content.
Reasoning models are a class of artificial intelligence systems that excel in tasks requiring logical reasoning, complex problem-solving, and contextual understanding. Unlike traditional models that may provide quick responses, reasoning models engage in a more deliberate thought process, analyzing multiple factors before arriving at a conclusion.
Reinforcement learning is a type of machine learning that learns by interacting with its environment and receiving positive reinforcement for correct predictions and negative reinforcement for incorrect predictions. This type of machine learning may be used to develop autonomous vehicles. Common algorithms are temporal difference, deep adversarial networks, and Q-learning.
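The Q-learning algorithm mentioned above can be illustrated on a toy "corridor" task (a hypothetical five-cell world invented for this sketch, not from the source): the agent receives a reward only at the far end, and learns through trial and error that moving right is more valuable than moving left.

```python
import random

def q_learning_corridor(n=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.3):
    """Q-learning on a corridor of n cells: start at 0, reward 1.0 for reaching the end."""
    random.seed(0)                      # fixed seed so the example is reproducible
    q = {(s, a): 0.0 for s in range(n) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s < n - 1:
            if random.random() < eps:   # explore: try a random move
                a = random.choice((-1, 1))
            else:                       # exploit: follow current value estimates
                a = max((-1, 1), key=lambda x: q[(s, x)])
            s2 = min(max(s + a, 0), n - 1)
            reward = 1.0 if s2 == n - 1 else 0.0
            future = 0.0 if s2 == n - 1 else max(q[(s2, -1)], q[(s2, 1)])
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = s2
    return q

q = q_learning_corridor()
```

After training, the learned value of stepping toward the reward exceeds the value of stepping away from it, which is exactly the "positive reinforcement" described in the entry.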
Retrieval augmented generation (RAG) refers to a technique combining the strengths of retrieval-based and generative AI models. In RAG, an AI system first retrieves information from a large dataset or knowledge base and then uses this retrieved data to generate a response or output. Essentially, the RAG model augments the generation process with additional context or information pulled from relevant sources.
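The retrieve-then-generate flow can be sketched with a deliberately simple keyword-overlap retriever standing in for a real vector search, and a prompt template standing in for the LLM call; all names and documents here are illustrative.

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_prompt(query: str, docs: list[str]) -> str:
    """Augment the generation step with retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["Interlibrary loan requests are processed within two days.",
        "The makerspace offers 3D printing on weekdays."]
```

The key design point is that the generator only sees the retrieved passages, which grounds its answer in a known source instead of relying solely on what the model memorized during training.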
Scheming refers to an AI behaving one way on the surface while pursuing hidden goals. Researchers have reported that current AI systems can exhibit behaviours described as "scheming," "deception," "pretending," and "faking alignment." LLMs can be prompted, or may act unprompted, to manipulate tasks, hide their true intentions, or otherwise "scheme" when doing so serves a primary goal.
Semantic searching, enabled by artificial intelligence (AI), aims to comprehend search queries by determining the contextual meaning of terms and phrases. Where traditional keyword searching indicates the presence (or absence) of terms and phrases within a corpus of searchable documents, particularly their titles and abstracts, semantic searching augments this with an understanding of the intent and contextual meaning of words, i.e., the semantics.
Supervised learning is a type of machine learning that learns from labeled historical input and output data. It’s “supervised” because you are feeding it labeled information. This type of machine learning may be used to predict real estate prices or find disease risk factors. Common algorithms used during supervised learning are neural networks, decision trees, linear regression, and support vector machines.
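A minimal example of learning from labelled data is a least-squares line fit; the numbers below are invented toy data, and the code illustrates linear regression itself rather than any particular library's API.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c from labelled (x, y) training examples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x

# Labelled examples: floor area (x) paired with a made-up price (y).
m, c = fit_line([50, 70, 90, 110], [150, 210, 270, 330])
```

The "supervision" is the ys: because every training input comes with a known answer, the model can measure and reduce its error, then predict a price for an unseen floor area.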
T to Z
Token — a token is a basic unit of text that an LLM uses to understand and generate language; may be an entire word or parts of a word.
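How a word becomes several tokens can be sketched with a greedy longest-match splitter over a toy vocabulary; real tokenizers (e.g. BPE-based) learn their vocabularies from data, so this shows only the splitting idea, and the vocabulary here is invented.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:   # fall back to single characters
                tokens.append(piece)
                i = j
                break
    return tokens

# One word can become several tokens.
pieces = greedy_tokenize("librarianship", {"librar", "ian", "ship"})
```

This is why token counts differ from word counts: common words may be a single token, while rarer or longer words are split into several pieces.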
Training data: is the information or examples given to an AI system to enable it to learn, find patterns, and create new content.
Transfer learning: is a machine learning system that takes existing, previously learned data and applies it to new tasks and activities.
Turing test: was created by computer scientist Alan Turing to evaluate a machine’s ability to exhibit intelligence equal to humans, especially in language and behavior. When facilitating the test, a human evaluator judges conversations between a human and machine. If the evaluator cannot distinguish between responses, then the machine passes the Turing test. https://en.wikipedia.org/wiki/Turing_test
Unstructured data: is data that is not organized in any apparent way. In order to analyze unstructured data, you’ll typically need to implement some type of structured organization.
Unsupervised learning: is a machine learning type that looks for data patterns but doesn’t learn from labeled data like supervised learning. This type of machine learning is often used to develop predictive models and to create clusters. You can use unsupervised learning to group customers based on purchase behaviours, and then make product recommendations based on purchasing patterns of similar customers. Hidden Markov models, k-means, hierarchical clustering, and Gaussian mixture models are common algorithms used during unsupervised learning.
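The clustering idea can be sketched with a tiny one-dimensional k-means over invented numbers (say, customer spend): points are grouped by their nearest centre, then each centre moves to the mean of its group, with no labels involved at any point.

```python
def kmeans_1d(points, centres, iters=10):
    """Tiny 1-D k-means: assign points to the nearest centre, then recompute centres."""
    for _ in range(iters):
        clusters = {c: [] for c in centres}
        for p in points:
            nearest = min(centres, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centres = [sum(v) / len(v) for v in clusters.values() if v]
    return sorted(centres)

# Two natural groups emerge from unlabelled data.
centres = kmeans_1d([1, 2, 3, 11, 12, 13], [0, 20])
```

The algorithm discovers the two groups on its own; what those groups mean (e.g. "low spenders" vs "high spenders") is an interpretation a human adds afterwards.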
Vector database (also vector store or vector search engine) is a database that uses the vector space model to store vectors (fixed-length lists of numbers) along with other data items. Vector databases employ one or more approximate nearest neighbor algorithms so that one can search the database with a query vector to retrieve the closest matching database records. See https://en.wikipedia.org/wiki/Vector_database
Vector search is used in search engines to retrieve documents, articles, web pages, or other textual content based on their similarity to queries. Vector searching enables users to find relevant information even if the exact terms used in the query are not present in the documents. The first step in vector searching is translating text into vectors (lists of numbers) using an embedding model, a type of AI model trained on vast amounts of text (for example, research papers, books, and web content). During training, the model learns how words appear together and in what contexts. Over time, it builds a kind of map of language, where meanings cluster naturally. In medicine, words that often appear in similar contexts, such as doctor and physician, end up close together in this semantic map; words that rarely co-occur or belong to very different contexts, like insulin and wheelchair, are far apart.
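The "semantic map" above can be made concrete with cosine similarity over hand-made toy vectors; the numbers are invented for illustration, whereas real embeddings have hundreds of dimensions and come from a trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 means pointing the same way, near 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented 3-dimensional "embeddings" echoing the example in the entry.
emb = {
    "doctor":     [0.90, 0.10, 0.05],
    "physician":  [0.85, 0.15, 0.05],
    "wheelchair": [0.05, 0.20, 0.95],
}
```

Comparing directions rather than exact words is what lets a query about a "doctor" retrieve a document that only mentions a "physician".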
Vibe coding: Computer scientist Andrej Karpathy, a co-founder of OpenAI and former AI leader at Tesla, introduced the term vibe coding in February 2025. The concept refers to a coding approach that relies on LLMs, allowing programmers to generate working code by providing natural language descriptions rather than manually writing it. See https://en.wikipedia.org/wiki/Vibe_coding
Bottom line: For health sciences librarians, AI tools might help support their work with health professionals, but many of the underlying processes raise concerns for anyone interested in scientific accuracy, transparency, and rigour in reviews. Note that the information on this page is changing, so check for current information (or discuss with a librarian). Incidentally, librarians like to distinguish between searching for sources and searching for answers. Much AI provides the second while hiding the first; transparency is not a strength.
Disclaimer
Note: Please use your critical reading skills while reading entries. No warranties, implied or actual, are granted for any health or medical search or AI information obtained while using these pages. Check with your librarian for more contextual, accurate information.