Semantic searching
Introduction
Semantic searching is an information retrieval method that uses artificial intelligence (AI) and natural language processing (NLP) to interpret the meaning and context of words in a search query, rather than matching exact words, terms or phrases as lexical searching does. Semantic searching is increasingly paired with AI-powered systems and large language models (LLMs) to recognize synonyms, abbreviations, related concepts, and clinical terminology, so that a clinician's query is understood more fully and documents are retrieved based on meaning, not just keywords.

Many information retrieval systems on the web work by matching the terms in a query against the terms in documents. Traditional keyword searching reveals occurrences of words and phrases within a corpus of searchable documents or websites. Semantic searching, by contrast, aims to understand the context of those words as used by a searcher. In a search for papers about heart attack, keyword searching returns documents containing the words “heart” and “attack”, while a semantic search seeks out deeper associations: it can return results that contain the terms “myocardial infarction,” “acute coronary syndrome,” and “cardiac ischemia,” even if the phrase “heart attack” is not present. Where a keyword search misses relevant documents because authors used different or related terms, a semantic search should yield a more complete result, at least in theory.

Lexical vs. semantic searching
Lexical searching and semantic searching are two distinct approaches to information retrieval that differ in how they process and match queries to relevant information. As mentioned, lexical search approaches focus on matching exact words and phrases in a given query with those in a corpus of records.
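The heart attack example can be made concrete with a small sketch. The corpus, document titles, and the `RELATED_TERMS` table below are all hypothetical, and the hand-built table is only a stand-in for whatever learned model a real semantic search system would use; the point is simply to show how word matching and meaning-based matching can return different result sets.

```python
import re

# Hypothetical toy corpus; the titles are invented for illustration.
DOCS = {
    "d1": "Management of acute myocardial infarction in older adults",
    "d2": "Heart attack warning signs: a patient guide",
    "d3": "Cardiac ischemia and acute coronary syndrome outcomes",
    "d4": "Panic attack symptoms and heart rate variability",
}

# Assumption: a hand-built table of related phrases standing in for
# the associations a semantic model would learn.
RELATED_TERMS = {
    "heart attack": [
        "myocardial infarction",
        "acute coronary syndrome",
        "cardiac ischemia",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_search(query, docs):
    """Return ids of documents that contain every word of the query."""
    words = set(tokenize(query))
    return sorted(d for d, text in docs.items() if words <= set(tokenize(text)))


def expanded_search(query, docs):
    """Return ids of documents containing the query phrase or a related phrase."""
    phrases = [query.lower()] + RELATED_TERMS.get(query.lower(), [])
    return sorted(
        d for d, text in docs.items() if any(p in text.lower() for p in phrases)
    )


print(lexical_search("heart attack", DOCS))   # ['d2', 'd4']
print(expanded_search("heart attack", DOCS))  # ['d1', 'd2', 'd3']
```

In this toy corpus the word-matching search misses the two documents that use clinical terminology and also returns a paper on panic attacks, because it happens to contain both words. The concept-aware search finds the clinically relevant documents and skips the unrelated one.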
Lexical searching offers speed and precision when dealing with specific terms or structured data. Semantic searching, on the other hand, is better at handling natural language queries, understanding context, and exploring related concepts. Most bibliographic databases licensed by libraries, such as MEDLINE and EMBASE, use lexical approaches (although they are increasingly incorporating aspects of AI). These databases are structured using a controlled vocabulary of subject headings or "index" terms. Historically, human indexers applied controlled terms or subject headings to describe the subject content of papers in a database; this process is now automated or semi-automated.

The drawback of lexical searching is that the context surrounding a user’s underlying need may be complex, absent from the index, or carry meanings that a given query does not represent. Semantic searching aims to resolve this challenge. One method in particular, explicit semantic analysis, represents a document's content as a weighted vector of concepts drawn from a knowledge base such as Wikipedia. While similar to subject indexing, semantic searching differs in that it relies on natural language processing and other AI techniques. The problem with semantic searching is that searchers cannot easily see what is going on under the hood of AI systems, and searches may not be reproducible because results can change from one run to the next. Both have their strengths: lexical for pinpoint accuracy, semantic for broader, context-aware exploration.

Note: Search engines such as Google Scholar rely largely on keyword matching, whereas AI tools such as Elicit.com, Semantic Scholar, and Undermind.ai use semantic understanding to interpret natural language queries in order to find conceptually relevant papers.
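As a rough illustration of the concept-based idea behind explicit semantic analysis, the sketch below scores each text against a handful of concepts and compares texts by the cosine similarity of those scores. It is a heavy simplification under stated assumptions: the three concepts and their word lists in `CONCEPTS` are hypothetical, and plain word counts stand in for the weighted associations a full system would derive from a large knowledge base.

```python
import math
import re

# Hypothetical concept descriptions; a real system would derive these
# from a large knowledge base rather than a hand-written word list.
CONCEPTS = {
    "cardiology": "heart cardiac myocardial infarction coronary ischemia attack",
    "psychiatry": "panic anxiety attack depression mood",
    "oncology": "tumor cancer chemotherapy carcinoma",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def concept_vector(text):
    """Score a text against each concept by counting shared words."""
    tokens = tokenize(text)
    return [
        sum(tokens.count(word) for word in set(tokenize(description)))
        for description in CONCEPTS.values()
    ]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(text_a, text_b):
    """Compare two texts in concept space rather than by shared words."""
    return cosine(concept_vector(text_a), concept_vector(text_b))


query = "heart attack"
print(similarity(query, "myocardial infarction and coronary ischemia"))
print(similarity(query, "chemotherapy for carcinoma"))
```

The query shares no words with either document, yet the first document scores much higher because both it and the query load on the same concept. The scores are also fully determined by the concept table, which is a reminder of how much a semantic result depends on a model the searcher usually cannot inspect.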

