Test Collection

Test collections are perhaps the most widely used tool for evaluating the effectiveness of information retrieval (IR) technologies. Test collections consist of a set of topics or information need descriptions, a set of information objects to be searched, and relevance judgments indicating which objects are relevant for which topics.
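To make the three components concrete, the following is a minimal sketch in Python of how a test collection might be represented and used to score a system, with made-up topic IDs, document IDs, and rankings (not real TREC data); the precision_at_k function is a hypothetical helper, not part of any standard toolkit.

```python
# Minimal sketch of the three test-collection components, using hypothetical
# topic IDs, document IDs, and a made-up system run (not real TREC data).

# Topics: information need descriptions, keyed by topic ID.
topics = {
    "T1": "health effects of caffeine",
    "T2": "renewable energy policy",
}

# Documents: the information objects to be searched.
documents = {
    "D1": "Caffeine and blood pressure ...",
    "D2": "Solar subsidies in Europe ...",
    "D3": "Coffee consumption and sleep ...",
}

# Relevance judgments (qrels): which documents are relevant to which topics.
qrels = {
    "T1": {"D1", "D3"},
    "T2": {"D2"},
}

# A hypothetical ranked run from some retrieval system, one ranking per topic.
run = {
    "T1": ["D1", "D2", "D3"],
    "T2": ["D2", "D1"],
}

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

for topic_id, ranking in run.items():
    print(topic_id, precision_at_k(ranking, qrels[topic_id], k=2))
```

Effectiveness measures such as precision are computed per topic and then averaged over all topics in the collection, which is why the number and consistency of topics matters for evaluation.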

History

The use of test collections in IR evaluation has flourished in large part thanks to campaigns such as the Text Retrieval Conference (TREC), the Cross-Language Evaluation Forum (CLEF), the NII Testbeds and Community for Information Access Research project (NTCIR), the Initiative for the Evaluation of XML Retrieval (INEX), and the Forum for Information Retrieval Evaluation (FIRE). In particular, TREC, which has run since 1992, has generated and made available a number of test collections and has enabled hundreds of groups from all over the world to participate in the development of next-generation retrieval technologies (Voorhees & Harman, 2005).

Standard Test Collections

Text Retrieval Conference (TREC)

The U.S. National Institute of Standards and Technology (NIST) has run a large IR test bed evaluation series since 1992. Within this framework, there have been many tracks over a range of different test collections, but the best-known test collections are the ones used for the TREC Ad Hoc track during the first 8 TREC evaluations between 1992 and 1999. In total, these test collections comprise 6 CDs containing 1.89 million documents (mainly, but not exclusively, newswire articles) and relevance judgments for 450 information needs, which are called topics and are specified as detailed text passages. Individual test collections are defined over different subsets of this data. The early TRECs each consisted of 50 information needs, evaluated over different but overlapping sets of documents. TRECs 6–8 provide 150 information needs over about 528,000 newswire and Foreign Broadcast Information Service articles. This is probably the best sub-collection to use in future work, because it is the largest and its topics are more consistent. Because the test document collections are so large, there are no exhaustive relevance judgments. Rather, NIST assessors' relevance judgments are available only for the documents that were among those returned by some system entered in the TREC evaluation for which the information need was developed.
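This judging strategy, in which only the documents retrieved by participating systems are assessed, is commonly known as pooling. Below is a minimal sketch of how such a judging pool might be formed, assuming hypothetical run data and a made-up pool depth; build_pool is an illustrative helper, not a TREC tool.

```python
# Sketch of pooling: the set of documents judged for a topic is the union of
# the top-k results from each participating system's run (hypothetical data).

def build_pool(runs_for_topic, k=100):
    """Union of the top-k ranked documents across all submitted runs."""
    pool = set()
    for ranking in runs_for_topic:
        pool.update(ranking[:k])
    return pool

# Hypothetical rankings for one topic from three participating systems.
runs_for_topic = [
    ["D7", "D2", "D9"],
    ["D2", "D5", "D1"],
    ["D9", "D4", "D2"],
]

# Only documents in the pool receive relevance judgments from the assessors;
# unjudged documents are typically treated as non-relevant when scoring.
print(build_pool(runs_for_topic, k=2))  # {'D7', 'D2', 'D9', 'D5', 'D4'}
```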

In more recent years, NIST has run evaluations on larger document collections, including the 25-million-page GOV2 web page collection. From the beginning, the NIST test document collections were orders of magnitude larger than anything previously available to researchers, and GOV2 is now the largest web collection easily available for research purposes. Nevertheless, GOV2 is still more than two orders of magnitude smaller than the document collections currently indexed by the large web search companies.

NII Test Collections for IR Systems (NTCIR)

The NTCIR project has built various test collections of similar sizes to the TREC collections, focusing on East Asian languages and cross-language information retrieval, where queries are made in one language over a document collection containing documents in one or more other languages.

Cross-Language Evaluation Forum (CLEF)

This evaluation series has concentrated on European languages and cross-language information retrieval. See: http://www.clef-campaign.org/

References

Scholer, F., Kelly, D., & Carterette, B. (2016). Information retrieval evaluation using test collections. Information Retrieval Journal, 19, 225–229. https://doi.org/10.1007/s10791-016-9281-7

Robertson, S. (2008). On the history of evaluation in IR. Journal of Information Science, 34(4), 439–456.

Sakai, T. (2016). Topic set size design. Information Retrieval Journal. https://doi.org/10.1007/s10791-015-9273-z

Voorhees, E. M., & Harman, D. (2005). TREC: Experiment and evaluation in information retrieval. MIT Press.

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.