
Using AI tools to perform data extraction in knowledge synthesis (KS)

From UBC Wiki
[Figure] Hierarchical AI Graphic (Preisler, 2024, p. 6).


Introduction

Using AI tools to perform data extraction in knowledge synthesis (KS) is the application of computational methods, including machine learning, natural language processing, and large language models (LLMs), to automate or semi-automate the identification, retrieval, and structuring of information from scientific literature in systematic reviews and meta-analyses. Systematic reviews are resource-intensive undertakings; a comprehensive systematic review can take thousands of person-hours to complete. AI-assisted data extraction has emerged as a possible way to reduce this researcher burden.

Caveat: consult a biostatistician or methodologist before undertaking use of any AI tool or platform for data extraction processes.

Background to the SR "data extraction" phase

Systematic reviews involve comprehensive retrieval of published and unpublished literature on a defined research question, followed by rigorous screening, quality appraisal, and synthesis of results. The data extraction phase—in which reviewers manually read each eligible study and record key variables such as population characteristics, interventions, outcome measures, and effect sizes—is among the most time-consuming steps in this process. Data extraction has typically required two independent reviewers to manually code each included study, with disagreements resolved by a third party. Given that a single systematic review may include dozens to hundreds of primary studies, this approach demands considerable human labour. The need for scalable, reproducible, and timely KS has driven substantial interest in automation and AI-powered systems.

End-to-end tools and platforms

Several tools now support AI-assisted data extraction alongside other steps in KS. Platforms such as Rayyan, Covidence, EPPI-Reviewer, and Abstrackr incorporate machine learning and natural language processing but are designed to augment, rather than replace, human judgment.

  • Title and abstract screening: Abstrackr, Rayyan and Covidence are used by some researchers for title and abstract screening. Rayyan uses semi-automated ranking and prediction to learn from reviewer decisions, allowing relevant studies to be surfaced more quickly. Its collaborative interface supports blinded screening and conflict resolution, making it particularly useful for distributed review teams. Similarly, Covidence streamlines workflows by integrating citation import, deduplication, screening, and full-text review into a single environment. Its machine learning capabilities can prioritize references, thereby reducing screening burden without compromising comprehensiveness.
  • Text mining, coding, and data management: more advanced platforms such as EPPI-Reviewer offer end-to-end support for systematic reviews, including sophisticated text mining, coding, and data management functionalities. Developed by the EPPI-Centre in the UK, this tool is especially well-suited for complex or mixed-methods reviews, where customizable coding frameworks and iterative analysis are required. Abstrackr also focuses on semi-automated screening, using active learning algorithms to predict study relevance based on prior inclusion and exclusion decisions. This can significantly reduce the number of records that require manual review, particularly in large-scale searches.
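The active-learning approach described above (learning from reviewers' prior include/exclude decisions to prioritize unscreened records) can be illustrated with a minimal sketch. The function name `rank_unscreened` and the word-overlap scoring rule are assumptions made for this example only; they are not the actual algorithms used by Abstrackr, Rayyan, or Covidence, which train proper classifiers on richer features.

```python
from collections import Counter

def rank_unscreened(labeled, unscreened):
    """Rank unscreened records so likely-relevant ones surface first.

    labeled: list of (text, included: bool) prior reviewer decisions.
    unscreened: list of record texts awaiting screening.
    Scores each record by its net word overlap with included vs.
    excluded decisions -- a toy stand-in for a trained classifier.
    """
    include_words, exclude_words = Counter(), Counter()
    for text, included in labeled:
        target = include_words if included else exclude_words
        target.update(text.lower().split())

    def score(text):
        return sum(include_words[w] - exclude_words[w]
                   for w in text.lower().split())

    return sorted(unscreened, key=score, reverse=True)

# Hypothetical screening decisions and an unscreened queue.
labeled = [
    ("randomized trial of statin therapy", True),
    ("cohort study of statin adherence", True),
    ("editorial on healthcare costs", False),
]
queue = rank_unscreened(labeled, [
    "letter on healthcare funding",
    "randomized trial of statin dosing",
])
```

After ranking, the statin trial is surfaced ahead of the off-topic letter, which is the behaviour these platforms exploit to reduce manual screening burden.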

AI-powered search tools

AI-powered search tools are reshaping how researchers discover and extract evidence by combining retrieval systems with LLMs. Unlike traditional databases, tools such as Elicit.com, Undermind.ai, and Perplexity use retrieval augmented generation (RAG) to locate relevant documents and generate answers grounded in real sources, reducing hallucinations and improving transparency, though these systems remain a work in progress. A key innovation is the integration of data extraction into the search process. For example, Elicit.com functions as an AI research assistant that retrieves papers and extracts structured data such as study design, sample size, and outcomes into comparison tables. This allows users to move quickly from discovery to synthesis, a task that traditionally required extensive manual effort. Similarly, Undermind.ai performs “deep search” by iteratively refining queries and identifying literature, producing comprehensive reports that approximate systematic searching workflows.

Other tools emphasize synthesis over extraction. Consensus focuses on peer-reviewed literature, aggregating findings into a high-level “consensus” answer that helps users understand the overall direction of evidence. Perplexity, by contrast, operates as a real-time web-based answer engine that retrieves and summarizes information with inline citations. AI-powered search tools signal a shift from searching as retrieval towards searching as analysis, where finding, extracting, and synthesizing evidence occur in a single integrated workflow. All of these systems require further testing.
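The RAG pattern behind these tools can be reduced to two steps: retrieve relevant passages, then generate an answer grounded in them with source citations. The sketch below is a deliberately simplified illustration: word-overlap retrieval stands in for the dense vector search real systems use, and a stub function stands in for the LLM call; all names and the tiny corpus are invented for this example.

```python
def retrieve(query, corpus, k=2):
    # Toy retriever: score documents by word overlap with the query.
    # Production RAG systems use embedding-based (dense) retrieval;
    # overlap is used here only to keep the sketch self-contained.
    q = set(query.lower().split())
    overlap = lambda doc: len(q & set(doc["text"].lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def answer_with_sources(query, corpus, generate):
    # Ground the generator in retrieved passages and return source
    # ids alongside the answer, mirroring inline-citation output.
    docs = retrieve(query, corpus)
    context = " ".join(f"[{d['id']}] {d['text']}" for d in docs)
    return generate(query, context), [d["id"] for d in docs]

# Hypothetical mini-corpus and a stub standing in for an LLM.
corpus = [
    {"id": "smith2021", "text": "randomized trial of statin therapy outcomes"},
    {"id": "lee2020", "text": "qualitative study of patient experience"},
    {"id": "diaz2022", "text": "statin therapy adherence outcomes in adults"},
]
stub = lambda q, ctx: f"Answer to '{q}' grounded in: {ctx}"
answer, sources = answer_with_sources("statin therapy outcomes", corpus, stub)
```

Because the answer is composed only from the retrieved passages, every claim can be traced back to a cited source id, which is the transparency benefit these tools advertise.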

Note: before using any AI tools, consult a librarian who can explain the differences between traditional search methods and AI-powered methods.

Accuracy and validation

Evaluations of AI-assisted data extraction tools report accuracy in the range of 70–95% depending on data elements, domain, and model used.

  • Helms et al. (2025) found that AI tools performed comparably to human reviewers in data extraction, particularly for standardized variables. Error analysis revealed confabulations in 4% of data points. They propose using AI-assisted extraction in place of the second human extractor, with that reviewer instead adjudicating discrepancies between the AI and the primary human extractor.
  • Highly structured data elements, such as numerical outcomes or clearly labelled participant counts, tend to be extracted accurately by AI, whereas complex or subjective variables, such as intervention descriptions or clinical context, are extracted less reliably. A persistent challenge is the lack of standardized benchmarking datasets, which makes it difficult to compare the performance of different tools across studies.


Ethical and methodological considerations

The integration of AI into systematic reviews raises important questions about transparency, reproducibility, and accountability. Guidelines from bodies such as PRISMA and the Cochrane Collaboration are grappling with the reporting of AI use in systematic reviews, though formal standards remain under development. Key concerns include the risk of systematic bias introduced by training data, the opacity of model decision-making, and the challenge of auditing AI-generated extractions. Researchers have emphasised that AI should be positioned as a tool to augment rather than replace expert human judgement in evidence synthesis.

Future directions

Given the imperfections of fully automated systems, researchers advocate for hybrid workflows in which AI performs an initial extraction pass and human reviewers verify, correct, or augment the outputs. This approach can reduce the time spent on data entry while retaining human expertise. Such workflows must be carefully designed to avoid automation bias: the tendency of human reviewers to over-trust machine outputs and miss errors that would have been caught in a fully manual review. Ongoing research is exploring multimodal AI systems capable of extracting data not only from text but also from tables, figures, and supplementary materials. Retrieval-augmented generation and knowledge graphs are being applied to improve the contextual accuracy of extractions and enable more sophisticated cross-study comparisons. As LLMs continue to improve in capability and reliability, and as validation frameworks mature, AI-assisted data extraction is expected to become a standard component of the systematic review workflow, potentially enabling near-real-time evidence synthesis at a scale previously thought impossible.
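The adjudication step of such a hybrid workflow, where a human reviews only the fields on which the AI and the primary extractor disagree, can be sketched as follows. The field names and records are hypothetical examples, not a prescribed extraction schema.

```python
def flag_discrepancies(human_record, ai_record):
    """Return the fields where an AI extraction disagrees with the
    human extractor's record, so a reviewer can adjudicate just
    those fields instead of re-extracting the whole study.
    """
    fields = set(human_record) | set(ai_record)
    return sorted(f for f in fields
                  if human_record.get(f) != ai_record.get(f))

# Hypothetical extraction records for one included study.
human = {"sample_size": 120, "design": "RCT", "outcome": "HbA1c"}
ai = {"sample_size": 120, "design": "cohort",
      "outcome": "HbA1c", "country": "Canada"}
to_review = flag_discrepancies(human, ai)
```

Here only `design` (a genuine disagreement) and `country` (a field one extractor captured and the other did not) are routed to the adjudicator, while agreeing fields pass through untouched; this mirrors the division of labour Helms et al. (2025) propose.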

Caveat: consult a biostatistician or methodologist versed in SR workflows and processes before undertaking use of any AI tool or platform.

References

Disclaimer

  • Note: Please use your critical reading skills while reading entries. No warranties, implied or actual, are granted for any health or medical search or AI information obtained while using these pages. Check with your librarian for more contextual, accurate information.