
Reasoning models

Figure source: Preisler, 2024, p. 6.


Introduction

Reasoning models are large language models (LLMs) fine-tuned to perform tasks that require logical reasoning, complex problem-solving, and contextual understanding. A reasoning model breaks a complex task into smaller intermediate steps, often called “reasoning traces”. Unlike traditional AI models designed to provide quick, human-like responses, reasoning models deliberate at length, analyzing multiple factors before proposing solutions. Deep research and reasoning capabilities enhance an LLM's ability to tackle complex questions by combining information gathering with logical analysis.
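
As a toy illustration of what a reasoning trace looks like, the Python sketch below breaks a small word problem into recorded intermediate steps. This is a hand-written example for illustration only, not how reasoning models are actually implemented; real models learn such decompositions during fine-tuning.

```python
# Toy illustration of a "reasoning trace": a multi-step problem is
# decomposed into recorded intermediate steps before the final answer.
# Hand-written sketch, not a real reasoning model's internal mechanism.

def solve_with_trace(unit_price: float, quantity: int, discount: float):
    """Answer 'what is the discounted total?' step by step."""
    trace = []
    subtotal = unit_price * quantity
    trace.append(f"Step 1: subtotal = {unit_price} * {quantity} = {subtotal}")
    saved = subtotal * discount
    trace.append(f"Step 2: discount = {subtotal} * {discount} = {saved}")
    total = subtotal - saved
    trace.append(f"Step 3: total = {subtotal} - {saved} = {total}")
    return total, trace

answer, trace = solve_with_trace(unit_price=4.0, quantity=3, discount=0.25)
for step in trace:
    print(step)
print("Answer:", answer)  # Answer: 9.0
```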

These capabilities make LLMs more versatile and reliable, especially for tasks requiring critical thinking or up-to-date information. Turgunbaev et al. (2025) argue that reasoning models "not only improve the accuracy and scalability of metadata extraction but also provide interpretability, adaptability, and resilience to variations in document structures. Future directions point toward hybrid systems that combine reasoning with advances in machine learning and natural language processing, creating intelligent infrastructures for the dynamic landscape of scientific publishing."
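
The metadata-extraction use case can be pictured as prompting a reasoning model to work through a reference field by field and show its steps. The sketch below only constructs such a prompt; the wording and the requested JSON keys are illustrative assumptions, not details taken from Turgunbaev et al.

```python
# Hypothetical sketch of prompting a reasoning model for metadata
# extraction, in the spirit of Turgunbaev et al. (2025). The prompt
# wording and JSON keys are illustrative assumptions, not from the paper.

def build_metadata_prompt(reference_string: str) -> str:
    return (
        "Extract bibliographic metadata from the reference below.\n"
        "Reason step by step about which span is the author list, the "
        "title, the venue, and the year, then output JSON with the keys "
        '"authors", "title", "venue", and "year".\n\n'
        f"Reference: {reference_string}"
    )

print(build_metadata_prompt(
    "Smith, J. and Lee, K. (2021). Parsing citations at scale. "
    "Journal of Library Automation, 12(3)."
))
```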

Examples of reasoning models

  • OpenAI's o3 and o4-mini: effective in technical domains such as science, mathematics, and programming. They can use external tools to enhance functionality and are part of OpenAI’s generative pre-trained transformer (GPT) family. Positioned as successors to OpenAI’s o1 reasoning model for ChatGPT, they are optimized to devote additional deliberation time to questions requiring structured logical reasoning (a usage sketch follows this list).
  • Google's Gemini 2.5: a multimodal model capable of processing text, images, audio, video, and code. Gemini 2.5 includes self-fact-checking and verification mechanisms and is well suited to tasks such as application development, game generation, and complex reasoning across multiple data types.
  • Anthropic's Claude Opus 4.1: excels at open-ended reasoning tasks, extended analysis, and nuanced responses. It is often used for complex writing, policy analysis, and problems that benefit from sustained contextual understanding.
  • xAI's Grok 4 and Grok 4 Heavy: tightly integrated with real-time data sources, Grok models have drawn attention for generating controversial outputs, including conspiracy-related and antisemitic content, raising concerns about safety and moderation.
  • DeepSeek-R1: a reasoning-focused model designed to address challenging queries requiring thorough analysis and structured solutions. It is commonly applied to complex coding problems, mathematical reasoning, and detailed logical puzzles.
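
A minimal sketch of querying one of these models, assuming the OpenAI Python SDK (v1.x) and API access to o4-mini. The model identifier and the reasoning_effort parameter are our assumptions for illustration; check the current API documentation before relying on them.

```python
# Minimal sketch using the OpenAI Python SDK (v1.x) to query a
# reasoning model. The model name and the reasoning_effort parameter
# are assumptions for illustration, not guaranteed API details.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",            # assumed reasoning-model identifier
    reasoning_effort="high",    # ask for more deliberation time
    messages=[
        {"role": "user",
         "content": "A train leaves at 9:40 and arrives at 13:05. "
                    "How long is the journey? Explain your steps."},
    ],
)
print(response.choices[0].message.content)
```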

Environmental impact

Computational costs

  • Reasoning models often need far more computation time and power to produce an answer than non-reasoning models.
  • On the AIME mathematics benchmark, reasoning models were 10 to 74 times more expensive to operate than their non-reasoning counterparts (a rough cost sketch follows this list).

Generation time

  • Because reasoning models tend to produce verbose outputs (long reasoning traces), generation time increases greatly compared with standard large language models (LLMs).
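
To make the cost multiple concrete, here is a back-of-the-envelope sketch of why verbose reasoning traces drive up operating costs. All token counts and the per-token price below are hypothetical values chosen for illustration, not measured figures.

```python
# Back-of-the-envelope cost comparison. All token counts and prices
# below are hypothetical illustrations, not measured figures.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # assumed flat price, USD

standard_answer_tokens = 400                 # short direct answer
reasoning_trace_tokens = 12_000              # long hidden reasoning trace
reasoning_total_tokens = reasoning_trace_tokens + 400

standard_cost = standard_answer_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
reasoning_cost = reasoning_total_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

print(f"standard:  ${standard_cost:.4f}")
print(f"reasoning: ${reasoning_cost:.4f}")
print(f"multiple:  {reasoning_cost / standard_cost:.0f}x")  # ~31x here
```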

Librarian criticism

Overall, librarians may view reasoning models as tools that can augment research and learning but not replace human judgment or scholarly sources. The focus remains on teaching users how to evaluate, verify, and contextualize AI-generated reasoning outputs within established research practices. In other words, tools that use these models, like most AI, should be approached critically; the underlying systems do not “think” but simulate reasoning patterns based on data that can be biased, opaque, or simply wrong. Their step-by-step explanations can create a false sense of precision and rigour, undervaluing some evidence or citing hallucinated sources. Librarians should aim to understand how reasoning models work in order to question them; this understanding is essential for teaching others about AI's limits, safeguarding information ethics, and resisting the uncritical adoption of tools that may undermine expertise, privacy, equity, and established practices in evidence-based searching.

References

Disclaimer

  • Note: Please use your critical reading skills while reading entries. No warranties, implied or actual, are granted for any health, medical-search, or AI information obtained while using these pages. Check with your librarian for more contextual, accurate information.