Course:Cons452/IPUMS

From UBC Wiki

IPUMS data

Description

IPUMS is a collection of census and survey data from around the world. IPUMS used to stand for "Integrated Public Use Microdata Series", but in 2016 the acronym was dropped in favour of a standalone name, because not all of the included datasets are microdata and not all are freely available for public use anymore. The collection is the result of a collaboration between many organizations and national statistics agencies to create the world's largest accessible database of census data. Many datasets have been compiled into one location, recently including even the Demographic and Health Surveys (DHS). What makes IPUMS data special is that variable codes and documentation are harmonized, so the same variable is coded consistently across datasets. This makes data that originally came from multiple sources compatible in your analysis without you having to convert units or scales yourself. When you first arrive on the IPUMS website, you see all of the sub-sections of IPUMS that contain data. They are:

  • IPUMS USA
  • IPUMS Current Population Survey
  • IPUMS International (100 countries census microdata)
  • IPUMS Global Health (Health survey data for Africa and Asia)
  • IPUMS NHGIS (US census data and GIS boundary files from 1790 to the present)
  • IPUMS Terra (population data integrated with environmental data from 1960 to the present) - great one to check out for social-ecological system research projects
  • IPUMS Time Use (Historical and contemporary time-use data from 1965 to the present)
  • IPUMS Health Surveys (US health survey data from 1963 to the present)
  • IPUMS Higher Ed (US science and engineering workforce survey data from 1992 to the present)

There is an incredible number of variables available, and the selection depends on which of the above sections you choose. You can browse all of the variables from within each section, so spending some time clicking through the filters is a good way to familiarize yourself with what is available. Some examples of the variables available are: language(s) spoken, ethnicity, education level, the industry a respondent works in, internet access, cellphone availability, trash disposal services, whether the respondent has a refrigerator, the type of dwelling respondents live in, means of transportation, and much more.

Be aware that IPUMS alters some data to protect respondent confidentiality. These changes depend on the country, but in all cases detailed geographic locations and names are suppressed. A top-code (an imposed upper limit) may be placed on certain variables, like income, to avoid identifying individuals. Additionally, IPUMS randomizes the order of households within districts, swaps a fraction of records from one administrative location to another, and aggregates the records of sensitive groups if identification is a concern. For more information, please see the DDI Codebook link on your specific extract or the IPUMS website. These alterations have implications for data analysis and interpretation, so keep them in mind.
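One quick way to see whether a top-code might affect your analysis is to check how many records sit exactly at the largest value of a variable. The sketch below is only an illustration: the column name INCOME and the sample values are made up, and a large maximum can also be a missing-value code rather than a top-code, so always confirm against the codebook for your extract.

```python
import csv
import io

def share_at_maximum(rows, column):
    """Return (largest value, share of non-empty records equal to it)."""
    values = [float(r[column]) for r in rows if r[column] not in ("", None)]
    if not values:
        return None, 0.0
    top = max(values)
    return top, sum(1 for v in values if v == top) / len(values)

# Tiny made-up extract; INCOME is a hypothetical column name.
sample = io.StringIO("INCOME\n12000\n45000\n99999\n99999\n30000\n")
rows = list(csv.DictReader(sample))

top, share = share_at_maximum(rows, "INCOME")
print(f"Largest value: {top}, share of records at that value: {share:.0%}")
```

A large share of records piled up at the maximum is a hint, not proof, that the variable has been top-coded.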

Metadata

Metadata Component: Description
Theme: Census and survey data
Source: Institute for Social Research and Data Innovation at the University of Minnesota
Purpose: To create the world's largest freely accessible database of census microdata, harmonized to be fully consistent across datasets
Time Frame: The starting year varies by dataset (some data go back as far as the 1700s) and coverage runs to the present; much of the data is from the last 50 years
File Type: Socioeconomic
File Format: .csv
Structure: Depending on which extract you create, either one person per row or one household per row
Projection and coordinate system: N/A
Extent: Global (a few countries missing)
Resolution or scale: Varies
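
If your extract has one person per row, you can recover household-level information by grouping rows on the household identifier. The sketch below assumes the extract contains the household serial number SERIAL and covers a single sample; if it spans several samples, the grouping key would also need to include the sample (for example the year). The data shown is made up.

```python
import csv
import io
from collections import defaultdict

def persons_per_household(rows, household_id="SERIAL"):
    """Count person rows for each household identifier."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[household_id]] += 1
    return dict(counts)

# Made-up person-level extract: two people in household 1, one in household 2.
sample = io.StringIO("SERIAL,PERNUM,AGE\n1,1,34\n1,2,31\n2,1,58\n")
rows = list(csv.DictReader(sample))

sizes = persons_per_household(rows)
print(sizes)
```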

Common Problems

  • As with the DHS data, some IPUMS samples must be weighted before analysis. A sample is representative of its geographic area only once the proper weights have been applied.
  • Due to confidentiality precautions, geographic information is usually limited, sometimes severely. Keep in mind the adjustments that have been made to your particular dataset and how they may affect your analysis.

Check out https://usa.ipums.org/usa-action/faq#ques22 for more information on the above limitations and more FAQs related to IPUMS data.
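
To illustrate why weights matter, the sketch below compares an unweighted and a weighted mean. It assumes a person-level extract that includes a person weight column named PERWT; check the codebook for the weight variable that applies to your sample. The two records are made up.

```python
def weighted_mean(rows, value_col, weight_col):
    """Mean of value_col where each record counts weight_col times."""
    weighted_sum = 0.0
    total_weight = 0.0
    for row in rows:
        weight = float(row[weight_col])
        weighted_sum += weight * float(row[value_col])
        total_weight += weight
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return weighted_sum / total_weight

# Made-up records: the second person represents three times as many people.
rows = [
    {"AGE": "20", "PERWT": "100"},
    {"AGE": "40", "PERWT": "300"},
]

unweighted = sum(float(r["AGE"]) for r in rows) / len(rows)
weighted = weighted_mean(rows, "AGE", "PERWT")
print(unweighted, weighted)
```

Here the unweighted mean treats both records equally, while the weighted mean is pulled toward the record that represents more people.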

Downloading Instructions

  1. You must register for an account before you can download any data. Once registered, go to https://ipums.org/. There you will see the tiles with all of the different sections of IPUMS datasets (the sections are listed in the first bulleted list above in "Description"). Select the section that you are interested in, and you will be redirected to a new page for that particular section of IPUMS data.
  2. Once on the IPUMS section page, you will be on a home page where there is an option to "Create an extract" with a button below it that says "Browse Data". By clicking on this button you will be redirected to a page where there is a variable selector bar. You can use the filters (or search function) to browse the variables.
  3. There are small plus signs (+) beside the variables as they appear in a list, and by clicking on the (+) you can "add them to your cart". You must select samples as well as variables; the webpage will not let you create a data extract without them and will prompt you if you have not selected any. To select a sample, click on one of your variables. On the expanded variable page, a button appears at the top: "select samples". From there you can check the boxes of the samples that you are interested in, for example: Guatemala in 1986, 2001, and 2010. Your data extract will then contain only the variables you selected, as sampled in Guatemala in those 3 sampling years (or in whichever samples you picked).
  4. After you have selected both the samples and the variables of interest to you, go to your data cart and click "create data extract". Your data extract request will then be submitted for approval, and you will be sent an email to notify you when it is ready for download. Be sure to do this in advance, as requests may take a little while to be approved. Once you have been notified, you have 72 hours to download your data. After that time, it may be removed and you will have to re-request it.
  5. The approval email contains a link that takes you directly to the page where you can download your data extract. The data is downloadable as a zipped .csv file. Extract the .csv from the folder and open it in your application of choice, such as Microsoft Excel. You will also see links to the "Codebook" (Basic or DDI), which explains what the variables are. Be sure to look at it, as some of the variable names are not intuitive. The DDI Codebook link also has information on citing the data, as well as use restrictions specific to that dataset.

Lastly, clicking on "revise" will allow you to use the data extract you originally made as the base for a new request. This is useful if you only want to make some small modifications, because you do not have to start all over again selecting your variables and samples.
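
Once the .csv from step 5 is extracted, a quick check is to confirm that it contains the samples you asked for in step 3. The sketch below counts records per sampling year; it assumes your extract includes a YEAR column, and the file name in the comment is hypothetical. The demo data stands in for a real extract.

```python
import csv
import io
from collections import Counter

def records_per_year(csv_file, year_col="YEAR"):
    """Count rows in an open CSV file for each value of year_col."""
    reader = csv.DictReader(csv_file)
    return Counter(row[year_col] for row in reader)

# With a real extract you would open the file instead, for example:
#   with open("my_extract.csv", newline="") as f:   # hypothetical file name
#       counts = records_per_year(f)
demo = io.StringIO("YEAR,SERIAL,PERNUM\n2001,1,1\n2001,1,2\n2010,7,1\n")
counts = records_per_year(demo)

for year, n in sorted(counts.items()):
    print(year, n)
```

If a year you selected is missing from the counts, the sample was probably not added to the cart before the extract was created.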

Restrictions on Use

Redistribution: No redistributing the data.

Usage: You agree to use IPUMS-International data for scholarly research and educational purposes only. Commercial use is prohibited.

Confidentiality: You will not use the data to identify individuals.

Data Security: Microdata extracts must always be safely secured.

Citation: Cite the IPUMS-International data appropriately. Also make sure to cite the statistical agencies that originally produced the data.

Violations: Any violation of this license agreement will result in disciplinary action.

The license is valid for one year and may be renewed. Please see the IPUMS website for more details on use restrictions.