Library:Research Data/Protect/Anonymizing Your Data

In some cases, data can be stripped of identifying information so that they can be shared publicly or with other researchers. Anonymized data do not require consent to share, although it is considered ethical to inform your subjects about what will become of the data. If the data contain confidential information, such as personally identifiable information about criminal activity, it is safer to anonymize them, as long as your research does not require the personal information.

Moreover, having a plan for how sensitive data will be stored and anonymized (if they can be anonymized effectively) might help you get access to the data you are seeking in the first place. A third party that can see you have a plan to protect the privacy of individuals might be much more willing to hand over sensitive information for research purposes.

It is important to realize that identifying information extends beyond people’s names. If details of people’s lives are recorded (such as their hometown or number of children), it can still be possible in some cases to identify who they are.

Data may be anonymized by:

Removing direct identifiers, e.g. name, address and postcode
Aggregating or reducing the precision of information or a variable, e.g. replacing date of birth by age groups
Generalising the meaning of a detailed text, e.g. replacing a doctor’s detailed area of medical expertise with an area of medical speciality
Using pseudonyms
Restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries
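
Several of the techniques listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete anonymization procedure: the field names (name, address, postcode, date_of_birth, salary), the ten-year age bands and the salary cap of 100,000 are all hypothetical choices made for the example, and generalising free text is not covered.

```python
from datetime import date

# Hypothetical field names and threshold, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "address", "postcode"}
SALARY_CAP = 100_000  # assumed top-coding threshold


def age_group(birth_date, today):
    """Replace an exact date of birth with a ten-year age band."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def anonymize(records, today):
    """Return anonymized copies of the records and the pseudonym key."""
    pseudonyms = {}  # maps real name -> pseudonym
    result = []
    for record in records:
        # Use a sequential pseudonym in place of the person's name.
        pseudonym = pseudonyms.setdefault(
            record["name"], f"P{len(pseudonyms) + 1:03d}"
        )
        # Drop direct identifiers and the exact date of birth.
        cleaned = {
            key: value
            for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS and key != "date_of_birth"
        }
        cleaned["id"] = pseudonym
        # Reduce precision: date of birth becomes an age group.
        cleaned["age_group"] = age_group(record["date_of_birth"], today)
        # Top-code the salary so that very high values do not stand out.
        cleaned["salary"] = min(record["salary"], SALARY_CAP)
        result.append(cleaned)
    return result, pseudonyms


if __name__ == "__main__":
    sample = [
        {
            "name": "Ann Lee",
            "address": "1 Main St",
            "postcode": "AB1 2CD",
            "date_of_birth": date(1980, 6, 15),
            "salary": 250_000,
            "occupation": "surgeon",
        }
    ]
    anonymized, key = anonymize(sample, today=date(2024, 1, 1))
    print(anonymized)
```

The second return value is the key that links pseudonyms back to real names. Anyone holding it can re-identify the records, so it should be stored separately from the shared data, or destroyed if re-identification is never needed.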

A person’s identity can be disclosed from:

Direct identifiers, e.g. name, address, postcode information or telephone number
Indirect identifiers that, when linked with other publicly available information sources, could identify someone, e.g. information on workplace, occupation or exceptional values of characteristics like salary or age
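
One way to look for risky indirect identifiers is to count how many records share each combination of values. The sketch below uses a simple count in the spirit of k-anonymity, a technique this guide does not itself prescribe. The function name, the field names and the threshold k=2 are assumptions made for illustration.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=2):
    """Return value combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical records that contain no direct identifiers.
    sample = [
        {"workplace": "City Hospital", "occupation": "nurse", "age_group": "30-39"},
        {"workplace": "City Hospital", "occupation": "nurse", "age_group": "30-39"},
        {"workplace": "City Hospital", "occupation": "surgeon", "age_group": "60-69"},
    ]
    fields = ["workplace", "occupation", "age_group"]
    print(risky_combinations(sample, fields))
```

Here the single surgeon aged 60-69 is the only record with that combination, so someone who knows the workplace could plausibly identify them. Such records are candidates for further aggregation or generalisation before the data are shared.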