We are an independent group of health sciences and medical librarians interested in automated indexing, here to share research, observations, and suggestions on indexing and searching.
We came together in 2024 through a shared interest in the Medical Text Indexer (MTI), which the National Library of Medicine developed to automate the indexing of MEDLINE records. We have met monthly since then to collaborate and exchange ideas.
If you are interested in learning more about us or dropping in for an online meet-up, please reach out to: [method of contact to be discussed?]
Members and affiliations
Alex Amar-Zifkin, Bibliothécaire / Librarian | Université de Montréal
Eileen Chen, Education and Research Librarian | University of California, San Francisco
Dean Giustini, UBC Biomed librarian | University of British Columbia, Vancouver
Tyler Ostapyk, Liaison Librarian | Winnipeg Regional Health Authority Virtual Library, University of Manitoba, Winnipeg
Eleni Philippopoulos, Assistant Librarian | McGill University, Montreal
Aims
Monitor and share emerging developments in automated indexing;
Promote understanding of how automated indexing affects the discoverability and accuracy of biomedical information;
Alternative: Increase awareness of the changes to indexing practices and their potential impact on search and discovery
Advocate for practitioner education, indexing changes?
Maybe: advocate for transparency in processes and decision-making related to indexing;
Create a community space where professionals can exchange experiences, raise questions, and contribute to shaping the future of indexing;
Alternative: Foster dialogue among librarians, information professionals and researchers about the evolving role of automated indexing in databases;
Share (and collaborate on) research.
Introduction
Automated indexing has been defined as “indexing the subject content of papers by means of a computer, either with some human intervention and oversight, or none at all”. It may be performed using a range of computer-based methods: algorithms (hence the phrase algorithmic indexing), natural language processing and even artificial intelligence (AI). Automated indexing can also refer to “semi- and/or partly automated” processes, depending on the level of human curation involved. According to Ruiz et al. (2008), automatic indexing is a form of text categorization in which machines assign terms from a controlled vocabulary to documents in order to summarize their contents.
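To make the idea of assigning controlled-vocabulary terms concrete, here is a minimal sketch in Python. It is a toy, not a description of how the MTI or any NLM system works: the three-heading vocabulary, its trigger phrases, and the function name assign_terms are all hypothetical, and the matching is simple phrase lookup rather than machine learning.

```python
import re

# Hypothetical mini-vocabulary: heading -> trigger phrases.
# Illustrative only; these are not real MeSH entry terms.
VOCABULARY = {
    "Myocardial Infarction": ["myocardial infarction", "heart attack"],
    "Aspirin": ["aspirin", "acetylsalicylic acid"],
    "Diabetes Mellitus": ["diabetes"],
}


def assign_terms(text, vocabulary=VOCABULARY):
    """Return the sorted headings whose trigger phrases appear in the text."""
    lowered = text.lower()
    assigned = set()
    for heading, phrases in vocabulary.items():
        for phrase in phrases:
            # Word boundaries keep a phrase from matching inside a longer word.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                assigned.add(heading)
                break
    return sorted(assigned)


abstract = "Low-dose aspirin after a heart attack: a randomized trial."
print(assign_terms(abstract))  # ['Aspirin', 'Myocardial Infarction']
```

Even this toy shows why automated indexing is hard: a phrase match cannot tell whether a concept is the main subject of a paper or only mentioned in passing, which is the judgement a human indexer brings.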
Automated (or, semi-automated) compared to human indexing?
A commonly stated goal of state-of-the-art automated indexing is to mimic human indexing; its main challenge is to extract an exhaustive, precise set of terms that represents the subject content of every document in a database, just as a human indexer would. In 2022, NLM implemented fully automated indexing using its Medical Text Indexer (MTI), with human review for certain subjects and random review of other records. Over the years, several large-scale MeSH indexing approaches have been proposed to improve upon the MTI, including MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is hampered by their reliance on the titles and abstracts of biomedical articles alone. The NLM continues to evaluate innovative technologies and the use of full text to improve automated indexing performance in MEDLINE, but new problems arise as new medical concepts are introduced in the biomedical literature.
What is algorithmic indexing?
Automated indexing in MEDLINE is sometimes referred to as algorithmic indexing (see Amar-Zifkin et al. (2025)). Algorithms are central to the MTI and to the indexing workflow at NLM. In 2022, front-line indexing for all MEDLINE records was performed by the MTIA, with human curation limited to sets involving genes and proteins. As of 2025, the NLM uses the MTIX, which is based on neural network technology.
According to the Encyclopedia of Knowledge Organization, “[algorithmic] indexing is indexing by search-engines and other forms of automatic indexing on the web. Automation plays an important role because of the scale of available information and [they address] the lack of [human] inter-indexer consistency. However, this is not just solved by applying automatic indexing methods”. In other words, algorithmic indexing is not an “objective” process either, as it reflects a worldview of the texts it indexes, and may perpetuate its own specific perspectives and biases. A reliance on using a large corpus of raw text to return outputs means that these algorithms can suffer from a lack of indexing precision and reliability.
Medical Text Indexer (MTIX) and MEDLINE
The Medical Text Indexer (MTI) is the automated indexing tool developed by the National Library of Medicine (NLM) for MEDLINE. Released in 2024, the MTIX (Medical Text Indexer-NeXt Generation) uses machine learning and neural networks to assign Medical Subject Headings (MeSH) to articles, improving indexing speed and scalability. Trained on millions of MEDLINE citations from 2007–2022, the MTIX analyzes titles, abstracts, and journal metadata to recommend relevant MeSH terms with high recall (e.g., >94% for disease detection) and precision (e.g., 87% for disease categories). The MTI supports semi-automated and fully automated indexing, reducing the workload for human indexers while maintaining standards, although the MTIX’s reported F-score of .90 (loosely, an error rate of about 10%) means that some terms are still missed or misassigned. See Amar-Zifkin et al., 2025 and Askin et al., 2025.
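For readers less familiar with these measures, the sketch below shows how precision, recall and the F-score (here the balanced F1) relate when machine-assigned terms are compared with a human indexer's terms. The function names and the example term sets are hypothetical. Combining the 87% and 94% figures quoted above is for illustration only: they come from different disease-related measures, and the published F-score may have been calculated differently.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(machine_terms, human_terms):
    """Compare machine-assigned terms against a human indexer's terms."""
    machine, human = set(machine_terms), set(human_terms)
    agreed = machine & human
    # Precision: share of machine-assigned terms the human also chose.
    precision = len(agreed) / len(machine) if machine else 0.0
    # Recall: share of the human's terms that the machine found.
    recall = len(agreed) / len(human) if human else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical record: the machine misses one heading and adds a wrong one.
human = {"Humans", "Aspirin", "Myocardial Infarction", "Secondary Prevention"}
machine = {"Humans", "Aspirin", "Myocardial Infarction", "Stroke"}
print(precision_recall_f1(machine, human))  # (0.75, 0.75, 0.75)

# Illustration only: pairing the two figures quoted in the text.
print(f"{f1_score(0.87, 0.94):.2f}")  # 0.90
```

Because the F-score blends two kinds of error (wrong terms added and correct terms missed), it is not literally an error rate, and a single score can hide large differences between subject areas.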
Neural networks in MTIX enable rapid, precise indexing, which is critical for keeping pace with the growing volume of biomedical literature (1.5 million papers in 2024). While human curation remains in place in MEDLINE for quality control, MTIX’s automation and use of AI support applications such as the publicly available MeSH on Demand tool, which helps researchers identify metadata. For medical texts, MTIX processes full-text articles when available, improving term coverage over title-and-abstract-based methods. Filtering techniques, such as ranking scores and excluding lengthy documents, further boost accuracy. Despite these advancements, human indexers are still needed to correct and curate MEDLINE records.
References
Bourgeois JP, Ellingson H. Ability of ChatGPT to Generate Systematic Review Search Strategies Compared to a Published Search Strategy. Med Ref Serv Q. 2025;31:1-13.
Golub K. Automated subject indexing: An overview. Cataloging & Classification Quarterly. 2021 Nov 29;59(8):702-19.
Gram EG, Kramer BS, Jørgensen KJ, Woloshin S. Trends in use of the new MeSH term “overdiagnosis”: A bibliometric review. Health Information & Libraries Journal. 2025;1–10. https://onlinelibrary.wiley.com/doi/pdf/10.1111/hir.70000
Miles WD. A history of the National Library of Medicine: the nation's treasury of medical knowledge. US Department of Health and Human Services, National Institutes of Health, National Library of Medicine; 1982.
National Library of Medicine. NLM Medical Text Indexer. NLM Technical Bulletin. March-April 2024.
Obaseki TI. Automated indexing: the key to information retrieval in the 21st century. Library Philosophy and Practice. 2010 Mar 1:1.
Park SG, Carroll M, Esteve LM, Singh K. Exploring Generative AI and Natural Language Processing to Develop Search Strategies for Systematic Reviews. In: 2024 ASEE Annual Conference & Exposition 2024 Jun 23.
Ruiz ME, Aronson AR, Hlava M. Adoption and evaluation issues of automatic and computer aided indexing systems. Proceedings of the American Society for Information Science and Technology. 2008;45(1):1-4.
Potential Platforms for Hosting Website
Wiki (managed by Dean)
Quick to set up - good interim solution
Possibly less visibility and design customizability
Update: current automated indexing wiki page has ~11,000 views!