Library:Open Data Resources
NOTE: Although the content of this page is primarily directed towards UBC Library staff, because of the open nature of the contents, it was decided to host it here rather than on the staff intranet.
Open data is data that is made freely available for everyone to use and republish, without restrictions from copyrights, patents, or other mechanisms of control. Open data is also machine-readable, which makes it easy to reuse and publish to the web.
- 1 Background Information
- 2 Government Open Data Sources
- 3 Other Open Data Sources
- 4 Open Data Tools
- 5 References
- 6 External Links
What is open government data?
In 2007, an Open Government Working Group developed the 8 Principles of Open Government Data. Government data may be considered open if it is made public in a way that meets the following 8 principles:
- Data Must Be Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
- Data Must Be Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
- Data Must Be Timely: Data is made available as quickly as necessary to preserve the value of the data.
- Data Must Be Accessible: Data is available to the widest range of users for the widest range of purposes.
- Data Must Be Machine-Processable: Data is reasonably structured to allow automated processing.
- Access Must Be Non-Discriminatory: Data is available to anyone, with no requirement of registration.
- Data Formats Must Be Non-Proprietary: Data is available in a format over which no entity has exclusive control.
- Data Must Be License-Free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
David Eaves, a Vancouver-based public-policy consultant and open-government activist, distills open data further in his Three Laws of Open Government Data:
- If it can’t be spidered or indexed, it doesn’t exist.
- If it isn’t available in open and machine readable format, it can’t engage.
- If a legal framework doesn’t allow it to be repurposed, it doesn’t empower.
Data-driven journalism is a journalistic process based on analyzing and filtering large data sets for the purpose of creating a news story. Data-driven journalism often deals with open data that is freely available online.
Select Data Journalism Sources
The Guardian Data Blog - uses simple tools to tell stories with data, usually including the associated data in Google Spreadsheets alongside the story.
Government Open Data Sources
Other Open Data Sources
Free GIS Datasets
Free Data Catalog
Open Access Directory Data repositories
Quora - Where can I get large datasets open to the public
The Data Hub
The Guardian World Government Data
Open Data Tools
For finding and obtaining data
ScraperWiki - A collection of web scrapers used to extract public data from otherwise cumbersome websites.
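As an illustration of what a web scraper does, the following is a minimal Python sketch that pulls the rows of an HTML table into a list. It uses only the Python standard library and is not ScraperWiki's own code or API; the class name `TableScraper`, the function `scrape_table`, and the sample HTML are invented for this example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page fragment, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Branch</th><th>Visits</th></tr>
  <tr><td>Main</td><td>1200</td></tr>
  <tr><td>East</td><td>850</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
```

In practice the HTML would be downloaded from the website being scraped, and the resulting rows could then be saved as a CSV file so that the data becomes machine-readable.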
For cleaning and formatting data
Xpdf - an open-source viewer for Portable Document Format (PDF) files, particularly helpful for extracting data tables from PDFs. The Xpdf project includes a PDF text extractor (pdftotext), a PDF-to-PostScript converter, and various other utilities. To convert a PDF file to a text file, open a command-line terminal, type pdftotext -layout /the_file_path_of_the_pdf_document and press Return. This creates a new text file with the same name as the PDF and a .txt extension. The -layout option keeps the original physical layout of the text as far as possible, which helps table columns stay aligned.
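When many PDF files need converting, the same command can be run from a script. The following is a minimal Python sketch, assuming the pdftotext utility is installed and available on the system path; the function names `build_pdftotext_command` and `pdf_to_text` and the file name `report.pdf` are hypothetical and chosen only for illustration.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None):
    """Return the argument list for a `pdftotext -layout` call."""
    pdf_path = Path(pdf_path)
    # By default, write the text file next to the PDF with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, txt_path=None):
    """Run pdftotext and return the path of the text file it wrote."""
    command = build_pdftotext_command(pdf_path, txt_path)
    # check=True raises CalledProcessError if pdftotext reports a failure.
    subprocess.run(command, check=True)
    return Path(command[-1])
```

For example, `pdf_to_text("report.pdf")` would be expected to write `report.txt` beside the PDF. The extracted tables usually still need tidying before they can be loaded into a spreadsheet.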
Google Refine - a tool for cleaning up large, messy data sets.
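To give a sense of what cleaning up a messy data set involves, the following is a minimal Python sketch that groups differently typed variants of the same value under one normalised key. It is a simplified illustration of key-based clustering, not Google Refine's own implementation; the function names `fingerprint` and `cluster_values` and the sample values are invented for this example.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key: lowercase, no punctuation, sorted unique words."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_values(values):
    """Group raw values that share the same normalised key."""
    clusters = defaultdict(list)
    for value in values:
        clusters[fingerprint(value)].append(value)
    return dict(clusters)


# Invented sample values, for illustration only.
messy = ["Vancouver, BC", "vancouver bc ", "BC Vancouver", "Victoria, BC"]
clusters = cluster_values(messy)
```

Here the three spellings of Vancouver end up in one group and Victoria in another, so a single corrected value can be chosen for each group.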
For visualizing and presenting data
Timeline - A dynamic display that works with Google spreadsheets to show a series of events in a vertically time-sorted structure.