Note: OpenAI leads the general AI space, but many AI companies are developing deep research tools and experimenting with AI-powered academic searching in support of research. Perhaps you have faculty or students asking you to present these tools to your classes.
Reasoning models are large language models (LLMs) fine-tuned to perform tasks requiring logical reasoning, complex problem-solving, and contextual understanding. A reasoning model breaks complex tasks into smaller steps, often called "reasoning traces". Unlike traditional AI models designed to provide quick, human-like responses, reasoning models engage in longer deliberation, analyzing multiple factors before proposing solutions. By combining information gathering with logical analysis, deep research and reasoning capabilities enhance LLMs' ability to tackle complex questions.
These capabilities make LLMs more versatile and reliable, especially for tasks requiring critical thinking or up-to-date information. Writing about metadata extraction in scholarly publishing, Turgunbaev et al. (2025) state that their paper "...argues that reasoning models not only improve the accuracy and scalability of metadata extraction but also provide interpretability, adaptability, and resilience to variations in document structures. Future directions point toward hybrid systems that combine reasoning with advances in machine learning and natural language processing, creating intelligent infrastructures for the dynamic landscape of scientific publishing."
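The idea of a "reasoning trace" can be illustrated with a small, hypothetical sketch. The function and step format below are invented for illustration; real reasoning models generate their traces internally, and each vendor formats them differently:

```python
# Toy illustration of a "reasoning trace": the answer is built from explicit,
# recorded intermediate steps rather than produced in a single pass.
# All names and the step format here are hypothetical.

def answer_with_trace(a: int, b: int, c: int) -> dict:
    """Solve (a + b) * c while recording each intermediate step."""
    trace = []
    step1 = a + b
    trace.append(f"Step 1: add {a} and {b} -> {step1}")
    step2 = step1 * c
    trace.append(f"Step 2: multiply {step1} by {c} -> {step2}")
    return {"trace": trace, "answer": step2}

result = answer_with_trace(2, 3, 4)
for line in result["trace"]:
    print(line)
print("Final answer:", result["answer"])  # Final answer: 20
```

The point for library instruction is that the visible steps are generated text, not a guaranteed record of how the model actually computed its answer, which is why the step-by-step style can create a false sense of rigour.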
Examples of reasoning models
OpenAI's o3 and o4-mini: designed for step-by-step problem-solving; effective in technical domains such as science and programming; can use external tools for enhanced functionality; generative pre-trained transformer (GPT) models and successors to OpenAI o1 for ChatGPT; designed to devote additional deliberation time to questions that require step-by-step logical reasoning. See Wikipedia entry.
Google's Gemini 2.5: can process various data types; offers self-fact-checking capabilities; suitable for generating applications, games, and similar projects.
Anthropic's Claude Opus 4.1: maintains context over long conversations; excels in open-ended reasoning tasks and provides nuanced responses.
xAI's Grok 4 and Grok 4 Heavy: generative artificial intelligence chatbot models developed by xAI; the company claims they outperform rival models in benchmark tests; Grok has generated various controversial responses, including conspiracy theories and antisemitism.
DeepSeek‑R1: designed to tackle challenging queries that require thorough analysis and structured solutions; used in complex coding challenges or detailed logical puzzles.
Environmental impact
Computational costs
Reasoning models often need far more computational time and power to answer than non-reasoning models. On the AIME mathematics benchmark, they were 10 to 74 times more expensive to run than their non-reasoning counterparts.
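A back-of-the-envelope sketch shows where multipliers like these come from: reasoning models typically emit many more tokens per answer (including hidden "thinking" tokens) and are often priced higher per token. The token counts and prices below are made up for illustration, not any provider's actual rates:

```python
# Illustrative cost comparison between a standard and a reasoning model.
# Token counts and per-token prices are invented; real figures vary widely
# by provider, model, and task.

def query_cost(output_tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of one query given tokens generated and price per 1,000 tokens."""
    return output_tokens / 1000 * price_per_1k_tokens

# Standard model: short direct answer at a low per-token price.
standard = query_cost(output_tokens=500, price_per_1k_tokens=0.002)

# Reasoning model: long hidden reasoning trace plus answer, higher price.
reasoning = query_cost(output_tokens=6000, price_per_1k_tokens=0.01)

print(f"standard:  ${standard:.4f}")
print(f"reasoning: ${reasoning:.4f}")
print(f"ratio: {reasoning / standard:.0f}x")  # ratio: 60x
```

With these invented numbers the reasoning query costs 60 times more, which falls inside the 10-74x range reported for AIME above; the multiplier is driven jointly by extra tokens and higher per-token pricing.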
Generation time
Because reasoning language models tend to produce verbose outputs, the time they take to generate a response increases greatly compared with standard large language models (LLMs).
Librarian criticism
Reasoning models should be approached critically; the underlying systems don't "think" but simulate reasoning patterns based on data that may be biased, opaque, or simply wrong. Their step-by-step explanations can create a false sense of precision and rigour, undervaluing some evidence or citing hallucinated sources. Librarians should aim to understand how reasoning models work in order to question them; this understanding is essential for teaching others about AI limits, safeguarding information ethics, and resisting the uncritical adoption of tools that may undermine expertise, privacy, equity, and established practices in evidence-based searching.
References
Have many librarians (e.g., Aaron Tay) written about this topic? Let me know. Dean Giustini, UBC Biomed librarian, dean.giustini@ubc.ca
Dangol A, Wolfe R, Zhao R, Kim J, Ramanan T, Davis K, Kientz JA. Children's mental models of AI reasoning: implications for AI literacy education. In: Proceedings of the 24th Interaction Design and Children; 2025 Jun 23. p. 106-123.
Li L, Zhou X, Liu Z. R2MED: a benchmark for reasoning-driven medical retrieval. arXiv preprint arXiv:2505.14558. 2025 May 20.
Tordjman M, Liu Z, Yuce M, Fauveau V, Mei Y, Hadjadj J, Bolger I, Almansour H, Horst C, Parihar AS, Geahchan A. Comparative benchmarking of the DeepSeek large language model on medical tasks and clinical reasoning. Nature Medicine. 2025 Apr 23.