Library:Digging into Digital Book Collections

This page was created to support the UBC Library Workshop "Digging into Digital Book Collections" offered in Winter 2014.

Workshop Description

Learn how to further your research by using digital book collections including Google Books and the Internet Archive. You will leave this workshop knowing how to:

  • search within books to locate research material not evident from title or chapter descriptions
  • conduct more thorough literature reviews on primary sources
  • perform cited reference searches
  • conduct historical word searches
  • and more!

Copyright and Digital Book Collections

This article is still being drafted. This means that the article is still being worked on and information may be incomplete. This template will be removed when the article is finished. If you have any concerns, please start a discussion on the talk page.

This information should not be construed as legal advice or UBC policy. More information about Copyright at UBC can be found at

Most material in digital book collections is still in copyright and this affects how you can access and use it. See UBC's official copyright website for more information about your rights and responsibilities regarding copyright.

Public Domain

In Canada, the copyright for a work usually expires 50 years after the death of the creator, at the end of the relevant calendar year. At this point it is said to pass into the public domain.

Example: Mordecai Richler died on 3 July 2001, and his novels will remain copyrighted until 31 December 2051, passing into the public domain on 1 January 2052.
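
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not legal advice: the function name public_domain_date and the term_years parameter are hypothetical, and the default of 50 years simply mirrors the term given on this page. Terms differ by jurisdiction and by type of work, which is why the term is a parameter.

```python
from datetime import date


def public_domain_date(death: date, term_years: int = 50) -> date:
    """Return the first day a work is in the public domain.

    Copyright runs until 31 December of the year that falls
    `term_years` after the creator's death, so the work enters the
    public domain on the following 1 January.
    """
    return date(death.year + term_years + 1, 1, 1)


# Mordecai Richler died on 3 July 2001 (the example used on this page).
print(public_domain_date(date(2001, 7, 3)).isoformat())  # 2052-01-01
```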

Items that are in the public domain are free to use in any way you choose. That means no restrictions on copying and adapting, no need to seek permission, and no uncertainty about your rights as a user! (There is also no legal requirement to attribute works in the public domain to their creators, although doing so is an important part of maintaining academic integrity.)

  • Determining whether a work is in the public domain can be complicated, as the duration of copyright differs depending on a work's authorship and format. If you are uncertain whether a work is in the public domain in Canada, you can contact UBC's copyright help list-serv for more assistance.

US Copyright

Many digital book collections were created in the United States. American copyright law is different from Canadian copyright law and there are many possible periods of copyright. While American copyright law does not apply in Canada, it does often have the effect of restricting access to digital collections held on servers in the United States.

Overview of Some Digital Book Collections

Digital Book Collections on the Internet

Google Books is a large-scale book digitization project by Google. Its goal is to make the large print collections of a number of libraries available for full-text searching on the web. The initial partner libraries included Harvard University, Stanford University, University of Michigan, New York Public Library, and the Bodleian Library at Oxford University. Additional libraries have since joined the project. While the records and text of these library collections are now available for full-text search, many records do not provide a full-text preview due to copyright restrictions. Works that are in the public domain can be downloaded.

Open Library is a project sponsored by the Internet Archive with the goal of creating a web page for every published book. Like Google Books, the Open Library permits you to conduct full-text searches as well as view and download public domain books. The Open Library is a Wikipedia-like project and has grown extensively through public contributions, but this also means that its bibliographic data is a mixed bag, which makes searching it complex. It has also become a repository for many terminated or dormant digitization projects such as those by Microsoft, the University of Toronto, and Cornell University. These projects often have superior OCR to those in Google Books.

Hathi Trust is a searchable repository of books based out of the University of Michigan. The Hathi Trust is a partnership of over 60 universities in the United States and was intended to provide secure future access to the large-scale book digitization projects. As such, it includes many collections that are in Google Books and the Open Library. While only members of partner universities can download the full text of public domain books from the Hathi Trust, anyone can register a guest account and create searchable collections. The Hathi Trust has partnered with OCLC to create a WorldCat-like search interface for its collections.

The Internet Archive hosts one of the largest collections of freely available digital content on the Web and includes digitized print books, audio files, moving images and, by means of the Wayback Machine, cached copies of websites (including a significant number of the now defunct Geocities sites). Considerable Canadian content is available - click the "Canadian Libraries" link in the top menu bar for quick links and information about the contributors to this collection. Of special note: the Canadiana Collection "of publications dating back to the early 17th century that are about Canada, or written and published by Canadians....(The content) begins with pre-1900 non-serial materials which were originally microfilmed ...gathered and produced by the Canadian Institute for Historical Microreproductions (CIHM)" (About). Coverage ends at 1920.

  • The Internet Archive "supports all metadata about items in just about any language so long as the characters are UTF8 encoded" (FAQs), so you will find materials in a wide variety of scripts.
  • Most, but not all, of the content is in the public domain. Please note that the Internet Archive's terms limit use to "scholarship and research purposes only."
  • The copyright status for most content is found in the description menu, and Creative Commons licensed materials are clearly identified with the CC logo appearing under the file links.

Gallica: Many national libraries have sites devoted to displaying their country's cultural patrimony. Gallica, the digital library of the Bibliothèque nationale de France, is among the best. It makes available more than two million documents from a number of major French libraries. In addition to 400,000 books, Gallica also includes almost a million magazine and newspaper issues as well as over 550,000 images and a variety of other materials including 2400 sound files. The standard of presentation is uniformly high and the interface admirable.

Wikipedia maintains a growing list of other digital library projects.

UBC Library eBooks

Summon is a service that allows you to search most of UBC Library's books, journal articles, primary sources, newspapers, microforms, CDs/DVDs and other materials from a single search box. To narrow your results down to e-books only, type in your keywords, title words, author names, etc., click Search, and then choose "book/e-book" in the Content Type menu and "items with full-text online" in the Refine Your Search menu.

  • As these books come from a wide array of publishers and distributors the permitted uses can vary.
  • Look for links such as "copyright status," "permitted uses," "terms of use," etc. to determine what you are able to do with the materials you find.
  • If you are unsure you can contact a librarian or send an email to the copyright help list-serv.

Many ebooks available via UBC Library are on ProQuest's ebrary platform. Video tutorials are available, and include

Downloading and reading ebrary books requires that you first download and install Adobe Digital Editions. Adobe Digital Editions is free.

If you prefer to read text instructions, check out this blog entry, which brings together all the steps you need to follow, on both ebrary's and Adobe's sites, to download and read your ebrary book.

Theses and Dissertations

EThOS is a database of over 300,000 UK theses and dissertations, many of which are available without charge. Hosted by the British Library, its intent is to offer 'a single point of access where researchers the world over can access ALL theses produced by UK Higher Education' (About). Registration is required but is easily accomplished.

For other sources of online theses and dissertations:

  • Although the ProQuest Dissertations and Theses database offers a vast array of citations and full-text documents, it does not include everything. For example, UBC's recent theses and dissertations are not represented. Furthermore, the database is much stronger on North American material than on material from abroad.
    • To find UBC theses, including the full-text of older UBC theses and dissertations, start by going to the Library's Guide to Finding Theses & Dissertations
    • International theses may be found in the Institutional Repositories of the granting university, as well as international databases like OAIster and/or the Center for Research Libraries Online Global Resources Network.

Activities Enhanced by Digital Book Collections

Search for terms in books

  • Good for fact-finding
  • Find terms and subjects not evident in title or chapter descriptions
  • Find terms, people (e.g. minor historical figures, film directors, authors and artists) or places that are not the main subjects of works
  • Note that the full-text search will also find authors and places in footnotes and bibliographies

Create your own index by searching

  • Find terms that are not indexed in a book
  • Use the separate search box to search within the text of a single book

Search for references in books

  • Find full bibliographic citations of works including journal abbreviations or missing page numbers
  • Try to negate the effect of bibliographic styles (e.g. APA, MLA) by using search terms such as author and "title in quotations"
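
As a minimal sketch of the tip above, the following Python helper builds a query string from an author's surname and a title in quotation marks, so the search does not depend on how a particular citation style orders or punctuates the reference. The function name citation_query is hypothetical, and the resulting string is meant to be pasted into the full-text search box of whichever collection you are using.

```python
def citation_query(author_surname: str, title: str) -> str:
    """Build a style-independent query: surname plus the quoted title."""
    # Drop stray quotation marks and collapse extra whitespace so the
    # title forms one clean exact-phrase search.
    clean_title = " ".join(title.replace('"', " ").split())
    return f'{author_surname.strip()} "{clean_title}"'


print(citation_query("Richler", "The Apprenticeship of Duddy Kravitz"))
# Richler "The Apprenticeship of Duddy Kravitz"
```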

Conduct more thorough literature reviews

  • Search for mention of influential articles or works
  • Track mention of the work in books to quickly survey opinions and discussion
  • Useful for crowdsourcing scholarly opinion on an influential article

Conduct cited reference searches

  • Search full text for mentions of references
  • Not as precise as using Google Scholar "Cited By" but can turn up some hidden results

Searching for primary sources and discipline-specific abbreviations

  • Very handy for discovering sources that are not mentioned in bibliographies or the book index
  • Try to think of all the ways that a source can be cited

Create collections of research material


  • Google Books "My Library" feature
  • Hathi Trust "Collections" tab - may be private or public
    • Search within your own curated book collections
    • Keep a record of books you've read and/or referenced
    • Organize them into public or private collections

Activities Specific to Google Books

Finding references to the book under About this Book

  • Links to other books in Google Books and articles in Google Scholar
  • Also contains links to web pages mentioning the books including book reviews and online syllabi

Find popular passages in other books under About this Book

  • Incredibly useful for following discussion on influential passages from scholars
  • Useful also for primary sources including literary and legal passages that tend to be quoted verbatim
  • Still a test feature that does not always work well

Conduct searches for historical frequency of words using the Google N-Gram Viewer

  • Create visualizations of the usage of words in the English language
  • Useful for determining when a word was employed
  • More useful from the 19th century onwards and less useful after 2008
  • Plot out multiple words on a single chart for comparative analysis
  • Indirectly allows you to explore how a word was being employed and whether you need to look for synonyms
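
To compare several words on one chart, it can be handy to generate the viewer link rather than typing the terms in by hand. The sketch below is only an illustration: the function name ngram_url is hypothetical, and the URL pattern and the parameter names (content, year_start, year_end, smoothing) are assumptions based on the links the viewer itself produces, so they may change. The default end year of 2008 follows the note above.

```python
from urllib.parse import urlencode


def ngram_url(terms, year_start=1800, year_end=2008, smoothing=3):
    """Build a Google N-Gram Viewer link comparing several terms.

    The query parameter names are an assumption taken from the
    viewer's own URLs; they are not documented on this page.
    """
    query = urlencode({
        "content": ",".join(terms),  # comma-separated terms share one chart
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    })
    return f"https://books.google.com/ngrams/graph?{query}"


print(ngram_url(["wireless", "radio"]))
```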

Activities Specific to the Internet Archive

  • Download different formats of books
    • file types potentially available include html, Daisy, Kindle, ePub, PDF, text and DjVu
  • Search within the book when the "Read Online" file format is available
  • Use the power of Google to search the Internet Archive by limiting your search to the domain
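
The domain-limited search in the last point can be sketched as follows. This assumes Google's site: operator and uses archive.org as the Internet Archive's domain; both are assumptions added here for illustration, since the page does not spell them out, and the function name site_search_url is hypothetical.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, domain: str = "archive.org") -> str:
    """Build a Google search link restricted to a single domain."""
    # The "site:" operator limits results to pages on the given domain.
    query = urlencode({"q": f"site:{domain} {terms}"})
    return f"https://www.google.com/search?{query}"


print(site_search_url("canadiana"))
# https://www.google.com/search?q=site%3Aarchive.org+canadiana
```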

Digital Humanities Restrictions

Most digital library collections are not friendly to being scraped or harvested on a large scale. You can contact Google to gain access to public domain datasets for research purposes.