Course:LIBR557/2020WT2/boolean queries

From UBC Wiki

Boolean Queries

A Boolean query is an information retrieval search strategy, based on set theory and Boolean algebra, that uses exact matching: a document is retrieved only if it satisfies the terms and operators in the user’s query (Lashkari, Mahdavi, & Ghomi, 2009). In retrieval, a search request consists of sets of content terms connected by the Boolean operators AND, OR, and NOT (Salton, 1984). The records to be retrieved are usually identified with an inverted term (or attribute-value) index, which lists, for each query term, the identifiers of the records that carry that term (Salton, 1984).
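
The inverted index described above can be sketched in a few lines. This is a minimal illustration rather than a production design: the three-record collection, the whitespace tokenizer, and the names DOCS, build_inverted_index, and INDEX are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy collection: record identifier -> text.
DOCS = {
    1: "Bumblebee is one of the Transformers",
    2: "A bumblebee pollinates flowers",
    3: "Transformers is a film series",
}


def build_inverted_index(docs):
    """Map each lowercased term to the set of record identifiers carrying it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase, then split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


INDEX = build_inverted_index(DOCS)
print(sorted(INDEX["bumblebee"]))     # [1, 2]
print(sorted(INDEX["transformers"]))  # [1, 3]
```

Each query term is looked up once in the index, and the Boolean operators are then applied to the resulting sets of identifiers.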

Boolean Operators

AND

The Boolean operator AND is the intersection of two sets; for the Boolean AND of two logical statements, x AND y, both x and y must be satisfied (Lashkari, Mahdavi, & Ghomi, 2009).

Example 1: Input and output of an AND search

Bumblebee AND Transformers

Will output only results that contain both terms, such as results about the Transformer named Bumblebee
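
As a sketch of how an AND search might be evaluated, the function below intersects two sorted lists of record identifiers by walking both lists in step. The identifier lists and the name intersect_postings are hypothetical and are not taken from the cited sources.

```python
def intersect_postings(p1, p2):
    """Return the identifiers present in both sorted lists (x AND y)."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Record carries both terms, so it satisfies the AND.
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical sorted identifier lists for the two query terms.
bumblebee = [1, 2, 5, 8]
transformers = [1, 3, 5, 9]
print(intersect_postings(bumblebee, transformers))  # [1, 5]
```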

OR

The Boolean operator OR is the union of two sets; for the Boolean OR of two logical statements, x OR y, at least one of x or y must be satisfied (Lashkari, Mahdavi, & Ghomi, 2009).

Example 2: Input and output of an OR search

Bumblebee OR Transformers

Will output results about bumblebees, about the Transformers, or about both
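
An OR search can be sketched as a set union over the identifiers listed for each term. The postings dictionary and the helper boolean_or are assumptions for illustration only.

```python
# Hypothetical index: term -> set of record identifiers.
postings = {"bumblebee": {1, 2}, "transformers": {1, 3}}


def boolean_or(index, *terms):
    """Return records that carry at least one of the given terms."""
    result = set()
    for term in terms:
        # A term missing from the index contributes no records.
        result |= index.get(term, set())
    return result


print(sorted(boolean_or(postings, "bumblebee", "transformers")))  # [1, 2, 3]
```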

NOT

The Boolean operator NOT is the set complement or set difference; for the Boolean NOT of two logical statements, x NOT y, x must be satisfied and y must not be, so anything in the intersection of x and y is excluded (Lashkari, Mahdavi, & Ghomi, 2009).

Example 3: Input and output of a NOT search

Bumblebee NOT Transformers

Will output results about Bumblebees, excluding results about Transformers
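
The two readings of NOT mentioned above, complement and difference, can be contrasted in a short sketch. The record identifiers and the function names complement and and_not are hypothetical.

```python
# Hypothetical collection of four records and two sets of matching records.
ALL_RECORDS = {1, 2, 3, 4}
BUMBLEBEE = {1, 2}
TRANSFORMERS = {1, 3}


def complement(all_records, postings):
    """Unary NOT: every record that does not carry the term."""
    return all_records - postings


def and_not(included, excluded):
    """Binary x NOT y: records matching x, minus any that also match y."""
    return included - excluded


print(sorted(complement(ALL_RECORDS, TRANSFORMERS)))  # [2, 4]
print(sorted(and_not(BUMBLEBEE, TRANSFORMERS)))       # [2]
```

Record 1 matches both terms, so it is dropped from the result of Bumblebee NOT Transformers.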

Advantages

  • Low cost and easily implemented.
  • Boolean queries are precise; the results either match the query or not.
  • Offers greater control and transparency over what is retrieved.

(Frants, Shapiro, Taksa, & Voiskunskii, 1999; Schütze, Manning, & Raghavan, 2008).

Challenges

Precision and Results

  • AND operators tend to produce high-precision but low-recall searches; OR operators tend to produce low-precision but high-recall searches.
    • A query may therefore retrieve too few or too many documents.
    • The number of documents retrieved is difficult to control.

(Lashkari, Mahdavi, & Ghomi, 2009; Schütze, Manning, & Raghavan, 2008)
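
The precision and recall trade-off can be made concrete with a small calculation. The sets below, including the relevance judgments, are invented for illustration; precision is the share of retrieved records that are relevant, and recall is the share of relevant records that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against relevance judgments."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical matching records and hypothetical relevance judgments.
bumblebee = {1, 2, 4}
transformers = {1, 3, 4, 5}
relevant = {1, 4, 5}

and_result = bumblebee & transformers  # {1, 4}
or_result = bumblebee | transformers   # {1, 2, 3, 4, 5}

print(precision_recall(and_result, relevant))  # precision 1.0, recall 2/3
print(precision_recall(or_result, relevant))   # precision 0.6, recall 1.0
```

In this toy case the AND query returns only relevant records but misses one, while the OR query finds every relevant record at the cost of two irrelevant ones.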

Term weighting

  • A Boolean model records only the presence or absence of a term; no importance is attributed to the term, whether it appears in the document many times or only once.

(Lashkari, Mahdavi, & Ghomi, 2009; Schütze, Manning, & Raghavan, 2008)
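
To illustrate the point about term weighting, the sketch below compares a binary presence vector with a raw term-count vector for two invented documents. The helper names and the vocabulary are assumptions; the count vector is shown only for contrast and is not part of the Boolean model.

```python
from collections import Counter

# Hypothetical documents, already lowercased and whitespace-separated.
doc_a = "bumblebee bumblebee bumblebee transformers"
doc_b = "bumblebee flowers"
vocab = ["bumblebee", "transformers"]


def binary_vector(text, vocab):
    """Boolean view: 1 if the term is present, 0 otherwise."""
    tokens = set(text.split())
    return [1 if term in tokens else 0 for term in vocab]


def count_vector(text, vocab):
    """Contrast view: how many times each term occurs."""
    counts = Counter(text.split())
    return [counts[term] for term in vocab]


print(binary_vector(doc_a, vocab), binary_vector(doc_b, vocab))  # [1, 1] [1, 0]
print(count_vector(doc_a, vocab), count_vector(doc_b, vocab))    # [3, 1] [1, 0]
```

For the term "bumblebee", the binary view cannot distinguish the document that mentions it three times from the one that mentions it once.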

Ranking

  • Boolean queries retrieve a set of matching documents; the model offers no effective method to order or rank the returned results.

(Lashkari, Mahdavi, & Ghomi, 2009; Schütze, Manning, & Raghavan, 2008)
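
The sketch below shows why ranking sits outside the Boolean model: the match step yields an unordered set, and any ordering has to come from a separate scoring step. Sorting by total term frequency is just one assumed workaround chosen for illustration; the documents and function names are hypothetical.

```python
# Hypothetical documents, already lowercased and whitespace-separated.
docs = {
    1: "bumblebee transformers",
    2: "bumblebee bumblebee bumblebee transformers transformers",
    3: "bumblebee flowers",
}


def boolean_and(docs, terms):
    """Boolean step: the unordered set of documents containing every term."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    }


def rank_by_frequency(docs, matches, terms):
    """Extra, non-Boolean step: order matches by total query-term count."""
    def score(doc_id):
        tokens = docs[doc_id].split()
        return sum(tokens.count(term) for term in terms)

    # Higher score first; ties broken by identifier for a stable order.
    return sorted(matches, key=lambda doc_id: (-score(doc_id), doc_id))


terms = ["bumblebee", "transformers"]
matches = boolean_and(docs, terms)
print(matches == {1, 2})                       # True
print(rank_by_frequency(docs, matches, terms))  # [2, 1]
```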

Query formulation is difficult

  • For many users, developing a good Boolean query is challenging; formulating effective query statements is difficult without assistance.

(Salton, 1984)
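
One possible source of this difficulty, offered here as an illustration rather than as a claim from the cited source, is that the grouping of operators changes the result. The sets below are hypothetical matches for three terms.

```python
# Hypothetical sets of matching records for three query terms.
bumblebee = {1, 2}
bee = {2, 3}
film = {3, 4}

# (bumblebee OR bee) AND film
grouped_left = (bumblebee | bee) & film
# bumblebee OR (bee AND film)
grouped_right = bumblebee | (bee & film)

print(sorted(grouped_left))   # [3]
print(sorted(grouped_right))  # [1, 2, 3]
```

The same three terms and two operators retrieve one record under the first grouping and three under the second, so a user who omits or misplaces parentheses may get a very different result set.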

Future considerations

The ubiquity of Boolean search systems has influenced the development of terminology for facets, which are eventually combined in a building blocks strategy that uses Boolean query formulation alongside other information retrieval concepts and strategies (Marchionini, 1995). New analytical strategies that incorporate Boolean search together with other strategies can streamline searching and provide more relevant results (Marchionini, 1995).
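
A building blocks strategy is commonly described as combining the alternative terms within each facet with OR and then combining the facets with AND. The sketch below assumes that reading; the index, the two facets, and the function building_blocks are hypothetical.

```python
# Hypothetical index: term -> set of record identifiers.
index = {
    "bumblebee": {1, 2},
    "bee": {2, 4},
    "film": {1, 3},
    "movie": {1, 5},
}

# Each facet is a list of alternative terms for one concept (assumed example).
facets = [["bumblebee", "bee"], ["film", "movie"]]


def building_blocks(index, facets):
    """OR the terms inside each facet, then AND the facet blocks together."""
    blocks = [
        set().union(*(index.get(term, set()) for term in facet))
        for facet in facets
    ]
    return set.intersection(*blocks) if blocks else set()


print(sorted(building_blocks(index, facets)))  # [1]
```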

Bibliography

Frants, V. I., Shapiro, J., Taksa, I., & Voiskunskii, V. G. (1999). Boolean search: Current state and perspectives. Journal of the American Society for Information Science, 50(1), 86-95. https://doi.org/10.1002/(SICI)1097-4571(1999)50:1<86::AID-ASI10>3.0.CO;2-7

Lashkari, A. H., Mahdavi, F., & Ghomi, V. (2009). A boolean model in information retrieval for search engines. IEEE, 2009, 385-389. https://doi.org/10.1109/ICIME.2009.101

Marchionini, G. (1995). Information seeking in electronic environments. Cambridge University Press. https://doi.org/10.1017/CBO9780511626388

Salton, G. (1984). The use of extended boolean logic in information retrieval. SIGMOD Record, 14(2), 277-285. https://doi.org/10.1145/971697.602295

Schütze, H., Manning, C. D., & Raghavan, P. (2008). Introduction to information retrieval [Online edition]. Cambridge University Press.