Library:Circle/Performance Indicators and Assessment

Summary

This is a collection of statistics and measures for assessing cIRcle, which may also be used in any reports generated about cIRcle. It also gives some general guidelines for creating performance indicators, along with potential indicators that may or may not prove useful. This page will need to be updated as we learn more about how cIRcle is used and as measurement technologies evolve.

Introduction

Two systems are currently used to capture statistics on cIRcle. The first is the statistics module built into DSpace, which can be accessed by logging in to cIRcle. This module captures usage of cIRcle at each level of the hierarchy: repository, community, sub-community, collection, item, and bitstream. Some features of its reporting system are still being improved, so the module will be changed and upgraded periodically.

The second is Google Analytics, which tracks a wide range of information about visitors to the cIRcle domain. Analytics has certain shortcomings, however. It cannot count or track bitstream downloads (that is, file downloads) from cIRcle. It also does not 'understand' the DSpace hierarchy of communities, sub-communities and collections, which makes tracking usage patterns at different levels of the hierarchy difficult, though not impossible. For example, tracking internal cIRcle statistics such as the top 10 item page views through Google Analytics is not practical, because Analytics cannot distinguish an item page from a community page, a collection page, or any other cIRcle page.

In general, when creating reports to assess cIRcle, it is a good idea to use histograms, that is, distributions of a given measure over time, instead of totals or other aggregates wherever possible. For example, instead of reporting only the total number of file downloads for the year, report the number of file downloads per day or per week over the course of the year. This keeps the report concise without losing information: the yearly total can still be recovered from the weekly counts, but not the other way around.
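As a minimal sketch of this idea, assuming download events can be exported as a list of dates (the export format is an assumption, not a documented feature of the DSpace statistics module or Google Analytics), the Python below groups downloads into ISO weeks. The function name `weekly_counts` is hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_counts(download_dates):
    """Count downloads per ISO week, keyed by (ISO year, ISO week number)."""
    counts = Counter()
    for d in download_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    # Sort chronologically so the series can be plotted directly.
    # Weeks with no downloads are simply absent from the result.
    return dict(sorted(counts.items()))


# Illustrative input only: three download events in January 2011.
events = [date(2011, 1, 3), date(2011, 1, 4), date(2011, 1, 12)]
weekly = weekly_counts(events)
print(weekly)                # {(2011, 1): 2, (2011, 2): 1}
print(sum(weekly.values()))  # the total is still recoverable: 3
```

If the plot needs explicit zeros for quiet weeks, those weeks would have to be filled in before plotting.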

Content measures

These are typically useful to report to library administration, because they measure the size and growth of the repository.

Total number of items in cIRcle, plotted against time. This gives a quick picture of cIRcle's growth. Note, however, that now that the project to retroactively scan all UBC theses and dissertations is complete, growth may appear to slow. One way to produce this plot is with the “Content Analysis” module in the Reporting Suite: set the Primary Dataset to Time and the Secondary Dataset to Communities & Collections, and then manually select all of the boxes. To get only the current total number of items in cIRcle, browse by issue date from the cIRcle home page and look at the top of the list of results for the phrase "Now showing items 1-20 of #####"; the number at the end is the total. The total number of items in cIRcle is commonly reported by the Library Assessment Office.
Total number of text files, audio files, video files, and so on, plotted against time. This could give an idea of how cIRcle content is diversifying.
Total number of theses and dissertations in cIRcle. This data is useful for the UBC Copyright Office. The count could also be combined with usage data for theses and dissertations. For example, we could compare the item page views or downloads of theses and dissertations in cIRcle against the page views or downloads of those in the ProQuest Dissertations and Theses database. This database holds only a fraction (perhaps about 10%) of the theses and dissertations produced at UBC, so the comparison would need to be adjusted for this difference in coverage, for example by comparing usage per thesis rather than total usage.
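The growth plots described above can be sketched in Python, assuming a hypothetical export of (date added, format) pairs for each item; the format labels and the function names `monthly_growth` and `total_from_browse_banner` are illustrative, not part of the Reporting Suite. The second function pulls the total out of the browse-page phrase quoted above, assuming the page wording matches that phrase.

```python
import re
from collections import Counter
from datetime import date
from itertools import accumulate


def monthly_growth(items):
    """Cumulative item counts per month, overall and per format.

    items: iterable of (date_added, format_label) pairs (hypothetical export).
    Returns (total_series, by_format); each series is a list of
    ("YYYY-MM", running_total) pairs. Months with no additions are omitted.
    """
    added = Counter()
    for added_on, fmt in items:
        added[(added_on.strftime("%Y-%m"), fmt)] += 1
    months = sorted({month for month, _ in added})
    formats = sorted({fmt for _, fmt in added})

    # Running totals per format (e.g. text, audio, video) show diversification.
    by_format = {
        fmt: list(zip(months, accumulate(added[(m, fmt)] for m in months)))
        for fmt in formats
    }
    per_month = [sum(added[(m, fmt)] for fmt in formats) for m in months]
    total_series = list(zip(months, accumulate(per_month)))
    return total_series, by_format


def total_from_browse_banner(text):
    """Extract N from "Now showing items 1-20 of N"; None if the phrase is absent."""
    match = re.search(r"Now showing items \d+\s*[-\u2013]\s*\d+ of (\d[\d,]*)", text)
    return int(match.group(1).replace(",", "")) if match else None


# Illustrative input only.
total, by_format = monthly_growth([
    (date(2010, 1, 5), "text"),
    (date(2010, 1, 20), "audio"),
    (date(2010, 3, 2), "text"),
])
print(total)  # [('2010-01', 2), ('2010-03', 3)]
print(total_from_browse_banner("Now showing items 1-20 of 12,345"))  # 12345
```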

Usage measures

These measures address one of the key mandates of an institutional repository: that its content be accessed and used for scholarly research. They can also tell us what types of visitors come to cIRcle, what they are looking for, and how they are referred to the site.

Access

The total number of file downloads over any given time period. In the Usage analysis suite, select the Repository tab and check only the Bitstream Downloads option; the required date range can also be set there. The number of downloads through cIRcle is usually requested by the Library Assessment Office.
The average number of downloads per item, that is, (total number of downloads from cIRcle) $\div$ (total number of items in cIRcle), over any given time period. Find the total number of items in cIRcle as described under Content measures above, and find the total number of downloads through the Usage analysis module in the cIRcle Reporting Suite. This can also be done at the community, sub-community, or collection level, and in each case the averages can be plotted over time.
The average number of downloads per page view for a given collection, that is, (total number of downloads from the collection) $\div$ (total number of collection page views), also measured over any given time period. This can also be done at the community, sub-community, or item level. Calculating the total number of downloads and the total number of page views requires running two separate reports. The ratio (total number of downloads) $\div$ (total number of page views) could perhaps be used to check for crawlers, spam-bots, and other internet robots: if the ratio exceeds some threshold, which could be set experimentally, we could say with some confidence that the collection was visited by spam-bots during the period when the ratio was high.
A ratio that tries to compensate for such crawler and spam-bot downloads is (number of downloads) $\div$ (number of page views + number of downloads). Whenever it is defined (that is, whenever the denominator is not zero), this ratio lies between 0 and 1. Collections or communities with values close to 1 would be considered successful, because the ratio counts the proportion of people who download a file when they encounter it, whether on cIRcle or in search engine results.
The number of bitstreams (that is, files) that have been downloaded at least once, plotted against time. At the moment, this cannot be measured using either the DSpace Reporting Suite or Google Analytics. It would nonetheless be a good indicator of the breadth of use of the repository, that is, of long-tail activity.
The number of item pages in cIRcle that have been visited exactly once, exactly twice, and so on, plotted against time. That is, in a table or spreadsheet with two columns, one for the number of visits and one for the number of pages, we record how many pages received exactly 0 visits, exactly 1 visit, exactly 2, and so on, over user-defined time spans. Again, this cannot currently be measured with either the DSpace Reporting Suite or Google Analytics. Still, it is useful information, as an indicator of the breadth of use of cIRcle. A custom report in Google Analytics can track how many pages on the cIRcle site have been visited once (or twice, or any nonzero number of times), but it cannot be restricted to item pages only.
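The ratios and tallies in this list can be sketched as a few small Python functions. The inputs (plain counts, and dictionaries keyed by page or file) and all function names are assumptions for illustration; in practice the counts would come from the Reporting Suite or Google Analytics reports described above. The robot-detection threshold is a hypothetical parameter, to be set experimentally as suggested.

```python
from collections import Counter


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def downloads_per_item(total_downloads, total_items):
    return safe_ratio(total_downloads, total_items)


def downloads_per_view(downloads, page_views):
    return safe_ratio(downloads, page_views)


def download_share(downloads, page_views):
    """downloads / (page_views + downloads): between 0 and 1 whenever defined."""
    return safe_ratio(downloads, page_views + downloads)


def flag_possible_robots(periods, threshold):
    """Return the period labels whose downloads-per-view ratio exceeds threshold.

    periods: {label: (downloads, page_views)}; threshold is hypothetical.
    """
    flagged = []
    for label, (downloads, page_views) in periods.items():
        ratio = downloads_per_view(downloads, page_views)
        if ratio is not None and ratio > threshold:
            flagged.append(label)
    return flagged


def downloaded_at_least_once(downloads_by_file, all_files):
    """Count files (bitstreams) with at least one download."""
    return sum(1 for f in all_files if downloads_by_file.get(f, 0) > 0)


def visit_frequency_table(visits_by_page, all_pages):
    """Map each visit count (0, 1, 2, ...) to the number of pages with exactly that many.

    all_pages must include unvisited pages, which a visits report alone cannot list.
    """
    tally = Counter(visits_by_page.get(page, 0) for page in all_pages)
    return dict(sorted(tally.items()))


# Illustrative numbers only.
print(download_share(30, 10))  # 0.75
print(visit_frequency_table({"a": 1, "b": 1, "c": 3}, ["a", "b", "c", "d"]))
# {0: 1, 1: 2, 3: 1}
```

Returning `None` rather than 0 for an undefined ratio keeps "no data" distinct from "no downloads".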

Geographical information

These measures give an indication of the spread of visitors to cIRcle across the world. We can also filter (or segment, in the language of Google Analytics) these measurements to only count certain kinds of visitors.

The number of visits to cIRcle from different countries and regions of the world. This count can be compared to the population of each country or region, or to the number of people with internet access there, although the latter is more complicated and requires outside statistics. The number of visits from a given country can also be reported as a share of all visits, for example (number of visits from Canada) $\div$ (total number of visits globally). Each of these should be tracked over time.
Page views and visits from different universities and colleges around the world, tracked over time. Because the raw number of visits per month from each university is hard to interpret, it could instead be reported as a percentage of all visits from universities. An example of such a report is available in Google Analytics (login required).
The total number of page views and visits from across the province of British Columbia, segmented by region and city. The current UBC Library Strategic Plan 2010-2015 calls on the library to "encourage lifelong learning among the people of British Columbia and beyond", and tracking usage by people in British Columbia is one small step in this direction. Visits from the ubc.ca domain can also be filtered out, if necessary.
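A minimal sketch of the country-level calculations, assuming visit counts have been exported from Google Analytics into a dictionary keyed by country and that population figures come from an outside source; the function names are hypothetical. The same `percentage_shares` function applies to visits by university.

```python
def percentage_shares(visits):
    """Share of all visits for each key, as a percentage.

    For example: 100 * (visits from Canada) / (total visits globally).
    """
    total = sum(visits.values())
    if total == 0:
        return {}
    return {key: 100 * count / total for key, count in visits.items()}


def visits_per_million(visits_by_country, population_by_country):
    """Visits per million inhabitants, using population figures from an outside source.

    Countries with no (or zero) population figure are skipped.
    """
    return {
        country: count * 1_000_000 / population_by_country[country]
        for country, count in visits_by_country.items()
        if population_by_country.get(country)
    }


# Illustrative numbers only.
print(percentage_shares({"Canada": 60, "Other": 40}))  # {'Canada': 60.0, 'Other': 40.0}
```

Running both functions once per reporting period gives the series needed to track these measures over time.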

Keyword searches

It can be very useful to look at the keywords used by visitors who arrive at cIRcle through search engines, as well as by visitors who knew from the start that they wanted to search within cIRcle. By looking at the specific keywords used and their relative frequency, we can see whether the terms match our content or the structure of the cIRcle site. Organic search refers to visits that arrive through the unpaid results of search engines.

To track the search words and phrases used by visitors who are referred to cIRcle content, and to see which search engine they used (Google, Yahoo, Bing, and so on), we can use the Google Analytics Standard Report named Organic Search Traffic. Note that you have to be logged in to Google Analytics (GA) to see this report.
What search words and phrases do visitors use in cIRcle's internal search or advanced search? Are these visitors all from Vancouver? This can also be tracked in Google Analytics, through one of the Custom Reports built for this purpose (you must be logged in to GA to view it). Once the dataset has been downloaded in .csv format, some work is still needed to clean up the text so that the search words and phrases are readable.
How many times does a keyword or phrase have to be used in organic searches to make it into the top 10 searches that bring visitors to cIRcle? Sometimes as few as 12 visits are enough, which shows that the long tail of our search keyword distribution is very long. Here it is reasonable to ignore searches for “cIRcle ubc”, “ubc cIRcle”, or any other combination showing that the visitor already planned to get to cIRcle and simply used a search engine for navigation. We can then calculate the number of visits per keyword: (number of organic search visits) $\div$ (number of keywords). This ratio gives a rough measure of the breadth of content in cIRcle. It can be calculated with the Google Analytics custom report Organic Search Traffic (not navigational): the number of visits appears just below the graph, and the number of keywords is simply the number of rows, shown at the bottom of the page.
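The cleanup and the visits-per-keyword ratio can be sketched as follows. This assumes the unreadable text in the exported .csv is URL encoding (for example `open+access` or `open%20access`), which `urllib.parse.unquote_plus` decodes; that is a guess about the export, not a documented property of it. The set of navigational searches and the function names are hypothetical.

```python
from urllib.parse import unquote_plus

# Hypothetical list of navigational searches to ignore; extend as needed.
NAVIGATIONAL = {"circle ubc", "ubc circle"}


def clean_term(raw):
    """Decode URL encoding ('+' and %XX escapes), lowercase, and collapse whitespace."""
    return " ".join(unquote_plus(raw).lower().split())


def visits_per_keyword(rows):
    """rows: iterable of (raw_keyword, visits) pairs.

    Returns (ratio, visits_by_keyword), where ratio is
    (organic search visits) / (number of distinct keywords), ignoring
    navigational searches; ratio is None if no keywords remain.
    """
    visits_by_keyword = {}
    for raw, visits in rows:
        term = clean_term(raw)
        if not term or term in NAVIGATIONAL:
            continue
        visits_by_keyword[term] = visits_by_keyword.get(term, 0) + int(visits)
    if not visits_by_keyword:
        return None, visits_by_keyword
    ratio = sum(visits_by_keyword.values()) / len(visits_by_keyword)
    return ratio, visits_by_keyword


# Illustrative rows only.
ratio, table = visits_per_keyword([
    ("open+access", 10), ("Open%20Access", 5), ("cIRcle+ubc", 7), ("salmon", 1),
])
print(ratio)  # 8.0
print(table)  # {'open access': 15, 'salmon': 1}
```

Rows could come from `csv.reader` after skipping the header row; the column order is an assumption. Because cleaning can merge rows that differ only in encoding or case, the keyword count here can be smaller than the raw row count in the report.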

Measures of search-engine optimization

These indicators are mainly accessed through Google Analytics, which means they will show how well cIRcle content is optimized for Google only.

Conclusion

The metrics and statistics collected for cIRcle mean little unless we can compare them with those of similar repositories. One such comparison is made routinely by the Ranking Web of Repositories, although it uses only data that can be collected publicly. An important question for further consideration is therefore how to benchmark the growth and performance of cIRcle against other university repositories. Since we do not currently have access to such information, we should begin by comparing cIRcle's performance with previous years, or with the same semester or month in past years. As time goes on, cIRcle's assessment metrics can be adapted as necessary, or as new universal standards emerge.
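Comparing with the same month in previous years can be sketched as a year-over-year percentage change. The input format, a dictionary keyed by (year, month), and the function name are assumptions; keying by (year, term) instead would give the semester comparison.

```python
def year_over_year(monthly):
    """Percentage change versus the same month one year earlier.

    monthly: {(year, month): value}. Months with no value (or a zero value)
    a year earlier are left out, since the change is undefined there.
    """
    changes = {}
    for (year, month), value in sorted(monthly.items()):
        previous = monthly.get((year - 1, month))
        if previous:
            changes[(year, month)] = 100 * (value - previous) / previous
    return changes


# Illustrative numbers only.
downloads = {(2010, 9): 200, (2011, 9): 250, (2011, 10): 90}
print(year_over_year(downloads))  # {(2011, 9): 25.0}
```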