Natural language processing
Introduction
Natural language processing (NLP) is a branch of artificial intelligence (AI) that enables computers to understand spoken and written human language. NLP enables text and speech recognition on devices, for example. NLP combines computational linguistics, machine learning, and computer science to bridge the gap between human communication and machine understanding; it allows systems to process vast amounts of data, extract insights, identify patterns, and respond in ways that mimic human comprehension. For decades, the US National Library of Medicine (NLM) has used NLP techniques in automated indexing (see the entry below).
Figure: Some common NLP techniques. Source: https://datasciencedojo.com/blog/natural-language-processing-applications/
Apps using NLP
Natural language processing (NLP) applications power tools such as virtual assistants, chatbots, and language translation services. These systems analyze syntax, semantics, and context to perform tasks such as sentiment analysis, speech recognition, and text summarization. NLP enables voice-activated devices to respond to spoken commands and search engines to deliver relevant results based on queries. In recent years, NLP development has been accelerated by deep learning models, particularly large language models (LLMs) trained on massive datasets, which improve a system's ability to grasp nuances of tone, intent, and cultural reference.
Challenges with NLP include handling ambiguity, understanding low-resource languages, and mitigating biases embedded in training data. Ethical considerations are critical, as NLP systems can perpetuate stereotypes or misinformation if not carefully calibrated. Advances in transfer learning and fine-tuning have improved NLP's adaptability, allowing models to specialize in domains such as legal or medical texts. NLP still struggles with emotional intelligence and contextual depth, as human language is complex and shaped by culture, history, and personal experience. Ongoing research aims to make NLP more inclusive, efficient, and capable of understanding the subtleties of human communication, potentially transforming how we interact with technology and each other in fields ranging from education to healthcare.
National Library of Medicine (NLM)'s Use of NLP
In 2002, NLM created the MetaMap archive, which was integral to its early Medical Text Indexer (MTI), used in automated indexing. MetaMap applies linguistic knowledge to identify Unified Medical Language System (UMLS) concepts in text and to map the text to UMLS codes. MetaMap uses a minimal commitment parser, a lexicon, and a part-of-speech tagger, all developed at NLM.
It then retrieves candidate terms from the UMLS Metathesaurus and scores them with an evaluation function. It includes a word-sense disambiguation facility, recently enhanced with a statistical context-sensitive method. MetaMap underpins the Medical Text Indexer (MTI), which summarizes text using Medical Subject Headings (MeSH) terminology. MTI has been used in production since 2002 for indexing MEDLINE citations, cataloguing, and History of Medicine records. Although superseded by MTIX in 2024, earlier MTI versions processed the titles and abstracts of PubMed records and recommended MeSH terms, which experts then reviewed, selecting, revising, and approving terms. In February 2011, MTI became the first-line indexer (MTIFL) for a select number of journals for which it had historically performed well. For these journals, the MTIFL indexing is only reviewed and revised by a human indexer. As of 2025, MEDLINE indexing is driven by the neural network technology of MTIX.
Translational NLP in the biomedical domain (BioNLP) is a topic of investigation at NLM. It needs to draw on the vast amount of biomedical knowledge and ontologies available, and it has the potential to handle the very complex, verb-dominated biomolecular language by applying sublanguage theory. Another example is SemRep, based on symbolic linguistic principles, which is used to extract the predications needed for biomolecular text mining.
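The candidate retrieval and scoring steps described for MetaMap can be illustrated with a minimal Python sketch. This is not MetaMap's implementation: the vocabulary, the concept IDs (`X001` and so on), and all function names are hypothetical, and a simple Jaccard word overlap stands in for MetaMap's actual evaluation function, which the text does not specify.

```python
import re

# Hypothetical mini-vocabulary standing in for the UMLS Metathesaurus.
# The IDs are made up for illustration and are not real UMLS identifiers.
TOY_VOCABULARY = {
    "X001": "heart attack",
    "X002": "heart failure",
    "X003": "lung cancer",
    "X004": "attack",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve_candidates(phrase_tokens, vocabulary):
    """Return every concept that shares at least one word with the phrase."""
    phrase_set = set(phrase_tokens)
    return {
        concept_id: name
        for concept_id, name in vocabulary.items()
        if phrase_set & set(tokenize(name))
    }


def score(phrase_tokens, concept_name):
    """Toy evaluation function: Jaccard overlap of phrase and concept words."""
    phrase_set = set(phrase_tokens)
    concept_set = set(tokenize(concept_name))
    return len(phrase_set & concept_set) / len(phrase_set | concept_set)


def map_phrase(phrase, vocabulary=TOY_VOCABULARY):
    """Retrieve candidate concepts for a phrase and rank them by score."""
    tokens = tokenize(phrase)
    candidates = retrieve_candidates(tokens, vocabulary)
    ranked = [(cid, name, score(tokens, name)) for cid, name in candidates.items()]
    # Highest score first; ties are broken by concept ID for a stable order.
    return sorted(ranked, key=lambda item: (-item[2], item[0]))


if __name__ == "__main__":
    for concept_id, name, value in map_phrase("acute heart attack"):
        print(concept_id, name, round(value, 2))
```

For the phrase "acute heart attack", the sketch ranks the toy concept "heart attack" first because it shares two of the phrase's three words. A real system would add the parsing, lexical variant, and word-sense disambiguation steps that the sketch leaves out.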

