Library:Research Data/Protect

From UBC Wiki

Protecting your valuable research data from physical damage is just as important as guarding against possible tampering, loss, or theft. Reproducing data can be prohibitively expensive and, in some cases, impossible. Should confidential information be lost or stolen, you may also be in breach of ethics. In some cases, data can be made more secure through the process of anonymization. Finally, it is important to determine at the planning stage who has a claim to ownership of the data, in order to avoid future misunderstandings.


When gathering data, make sure that you follow ethics protocols, especially when the data gathering involves human subjects. You may have to complete the TCPS 2: CORE course on research ethics in order to learn how to conduct such research responsibly and ethically. Any study that uses human subjects or live animals is subject to ethical review, as is any study that keeps identifiers of, or references to, specific individuals. This type of review is unnecessary only for purely observational studies that satisfy the following criteria:

  • Observe human action in a forum open to the general public
  • Are non-invasive
  • Require no interaction with participants
  • Do not identify participants

Privacy and Confidentiality

The privacy of individuals can become an ethical concern when it conflicts with the pressure to make data openly available as part of the record of research. It is important to note that privacy and confidentiality are not the same thing. Privacy relates to the individual or subject, whereas confidentiality relates to the actions of the researcher. In general, the right to privacy refers to the state of being free from intrusion or disturbance in one’s private life or affairs. Confidentiality has to do with the agreement struck between the researcher and the participant about how the participant’s identifiable private information will be handled, managed, and disseminated.

  • Avoid collecting personally identifying information along with the data if possible.
  • If identifying information cannot be avoided, de-identify your data as soon as possible.
  • Do not transmit unencrypted data electronically.

Breaking a confidentiality agreement is a major problem that can result in costly punitive measures. Even in the absence of high-level ramifications, it compromises the relationship of trust between the researcher and participant. If you are planning to share personal data that you have collected, you must obtain informed consent from your participants; otherwise, it must not be disclosed.

Anonymizing Your Data

In some cases, data can be stripped of their identifying information in order to be shared publicly or with other researchers. Anonymized data do not require consent to share, although it is considered ethical to inform your subjects about what will become of the data. If the data contain confidential information, such as personally identifiable information about criminal activity, it is safer to anonymize them, as long as you do not require the personal information.

Moreover, having a plan for how sensitive data will be stored and anonymized (if they can be anonymized effectively) might actually help you get access to the data you are seeking in the first place. A third party that can see you have a plan to protect the privacy of individuals may be much more willing to hand over sensitive information for research purposes.

It is important to realize that identifying information extends beyond people’s names. If the details of people’s lives are specified (like their hometown or number of children), it can still be possible in some cases to identify who they are. Data may be anonymized by:

  • Removing direct identifiers, e.g. name, address, postcode
  • Aggregating or reducing the precision of information or a variable, e.g. replacing date of birth by age groups
  • Generalising the meaning of a detailed text, e.g. replacing a doctor’s detailed area of medical expertise with an area of medical speciality
  • Using pseudonyms
  • Restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries
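
The techniques listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a vetted anonymization procedure: the field names (name, address, postcode, phone, date_of_birth, salary), the ten-year age bands, the salary cap, and the use of a keyed hash to derive pseudonyms are all hypothetical choices made for the example, and the text-generalisation technique is not shown.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical field names and threshold; adapt them to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "postcode", "phone"}
SALARY_CAP = 150_000  # illustrative top-coding limit


def age_group(birth: date, today: date) -> str:
    """Reduce precision: replace a date of birth with a ten-year age band."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def pseudonym(value: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a value using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def anonymize(record: dict, secret_key: bytes, today: date) -> dict:
    """Return an anonymized copy of one record; the input is not modified."""
    # Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Use a pseudonym so records from the same person can still be linked.
    out["participant_id"] = pseudonym(record["name"], secret_key)
    # Aggregate date of birth into an age group.
    out["age_group"] = age_group(out.pop("date_of_birth"), today)
    # Top-code salary to hide outliers.
    out["salary"] = min(out["salary"], SALARY_CAP)
    return out
```

Anyone who holds the secret key can regenerate a pseudonym from a known name, so in this sketch the key would need to be stored as carefully as the original identifiers, or discarded once linking is no longer needed.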

A person’s identity can be disclosed from:

  • Direct identifiers, e.g. name, address, postcode information or telephone number
  • Indirect identifiers that, when linked with other publicly available information sources, could identify someone, e.g. information on workplace, occupation or exceptional values of characteristics like salary or age
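
One way to get a rough sense of the risk from indirect identifiers is to count how many records share each combination of their values; a combination held by only one or two records is easier to link back to a person. The sketch below assumes records are plain dictionaries and uses hypothetical field names. The function name and the threshold k are illustrative choices, not part of any UBC guidance, and a check like this does not by itself establish that a dataset is safe to share.

```python
from collections import Counter


def rare_combinations(records, indirect_identifiers, k=2):
    """Return the records whose combination of indirect-identifier values
    is shared by fewer than k records in the dataset."""

    def key(record):
        return tuple(record[field] for field in indirect_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Hypothetical example data.
records = [
    {"occupation": "nurse", "workplace": "Clinic A", "age_group": "30-39"},
    {"occupation": "nurse", "workplace": "Clinic A", "age_group": "30-39"},
    {"occupation": "judge", "workplace": "Court B", "age_group": "60-69"},
]

# The judge's combination of values is unique, so that record is flagged.
flagged = rare_combinations(records, ["occupation", "workplace", "age_group"])
```

Flagged records are candidates for further generalisation, for example by widening the age bands or replacing the workplace with a broader category.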

Intellectual Property Rights

Who will own the rights to the research data you create can be a thorny issue, and it is important to clarify this with any other party that may have a stake in your research in order to avoid complications later on. Parties to consult may include the university and any funding body from which you are receiving support. Establishing ownership before the research is undertaken is even more important when the data are expected to lead to a patent or invention. At UBC, please see Policy #85 - Scholarly Integrity for guidance on this issue.