Course:Cons452/DHS


Demographic and Health Survey (DHS) data

Description

The Demographic and Health Surveys (DHS) are nationally representative datasets on population, health, nutrition, and HIV. They have been gathered through more than 300 surveys in over 90 countries. DHS carries out a range of surveys, but they generally cover the same core themes:
- socio-demographics (e.g. age, education level, wealth);
- health behaviours (e.g. tobacco use, alcohol consumption, cooking fuel usage);
- health outcomes (e.g. maternal mortality, vitamin A levels, malaria, HIV);
- variables associated with women's empowerment and domestic violence.

DHS surveys can be divided into two types: standard and interim. Standard DHS surveys have large sample sizes (5,000 to 30,000 households) and normally take 18-20 months to carry out. They are typically conducted every 5 years to allow for comparisons over time. Interim DHS surveys differ in three ways:
- they have smaller sample sizes, though they are still nationally representative if weighting is done properly (see the "Common Problems" section below);
- they are conducted between rounds of standard surveys;
- they have shorter questionnaires.

The first round, DHS 1, began in 1985. Since then there has been a round of standard DHS data collection roughly every 5 years, and DHS 7 is currently being implemented. Not every country has data for every survey round, so check which years are available for the country you are interested in. Geo-location of some survey data began in 1986, which allows for spatial analysis when available. DHS also offers a handy “country quick-stats” feature that shows key statistics for a country across survey years without downloading anything. Compiled country reports are also available.

Data are organized into the following types, called "recodes". The two-letter code in brackets next to each recode is how it appears in file names. You will come across these codes when downloading the data, so it is worth familiarizing yourself with each one:

  • Household Recode (HR)
  • Individual Recode (IR) - this refers to women
  • Children's Recode (KR)
  • Men's Recode (MR)
  • Couples Recode (CR)
  • Household Member Recode (PR)
  • Births Recode (BR)
  • HIV Recode (AR)

DHS Cluster Displacement - To protect the confidentiality of respondents, geo-located DHS data (GPS or other) have been displaced. This means the latitude and longitude of each cluster location have been shifted according to a set of parameters:
- urban locations are displaced 0-2 kilometres;
- rural locations are displaced 0-5 kilometres, with 1% of rural locations (every 100th rural point) displaced 0-10 kilometres.[1]

Within those limits, the angle and distance of displacement are chosen randomly. Keep this displacement in mind when interpreting and analyzing DHS data.
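To make the displacement rules concrete, here is a minimal Python sketch of a random-angle, random-distance shift. It is not the DHS Program's actual procedure, and it ignores the additional constraints described in the displacement report [1].

A few choices in it are assumptions:
- the Earth is treated as a sphere with radius 6,371 km;
- each rural cluster is given a 1% chance of the 10 km bound, rather than selecting every 100th point;
- the helper names (`displace`, `haversine_km`, `max_displacement_km`) and the example coordinates are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation


def max_displacement_km(is_urban: bool, rng: random.Random) -> float:
    """Upper bound on displacement for one cluster (simplified rules)."""
    if is_urban:
        return 2.0
    # Simplification: each rural cluster has a 1% chance of the 10 km bound.
    return 10.0 if rng.random() < 0.01 else 5.0


def displace(lat: float, lon: float, is_urban: bool,
             rng: random.Random) -> tuple[float, float]:
    """Shift a point by a random distance (within the bound) and random angle."""
    distance_km = rng.uniform(0.0, max_displacement_km(is_urban, rng))
    bearing = rng.uniform(0.0, 2.0 * math.pi)  # random direction, in radians
    phi1, lam1 = math.radians(lat), math.radians(lon)
    delta = distance_km / EARTH_RADIUS_KM       # angular distance
    # Great-circle destination point from start point, bearing and distance.
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(bearing))
    lam2 = lam1 + math.atan2(math.sin(bearing) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    new_lon = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(phi2), new_lon


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, used to check a displacement."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Arbitrary example point, treated as an urban cluster.
rng = random.Random(1)
new_lat, new_lon = displace(-1.29, 36.82, is_urban=True, rng=rng)
print(haversine_km(-1.29, 36.82, new_lat, new_lon) <= 2.0)  # True
```

The practical consequence for analysis is that a cluster's true location can lie anywhere within a 2 km radius (urban) or up to a 10 km radius (rural) of the published point.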

Data Weights - The population within a country is not evenly distributed among its regions. Regions with small populations are over-sampled so that their samples are large enough to be representative. Regions with large populations may also be under-sampled to save costs. Sample weights are mathematical adjustments applied to the data to correct for over-sampling, under-sampling, and differing response rates across regions. With weights applied, the sample distribution "looks like" the country's actual population distribution. You must therefore analyze the data with weights to preserve the representativeness of the sample. See "Common Problems" for more information.
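The effect of over-sampling can be seen with a small made-up example. This Python sketch uses a hypothetical two-region country and hypothetical names (`Respondent`, `weighted_proportion`). The weights here are simple "people represented per interview" values; real DHS weights are stored differently (see "Common Problems"). Multiplying every weight by the same constant does not change a weighted proportion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    region: str
    outcome: int   # 1 if the characteristic of interest is present, else 0
    weight: float  # sample weight (already divided by 1,000,000 if from DHS)


def unweighted_proportion(rows: list[Respondent]) -> float:
    return sum(r.outcome for r in rows) / len(rows)


def weighted_proportion(rows: list[Respondent]) -> float:
    total_weight = sum(r.weight for r in rows)
    return sum(r.weight * r.outcome for r in rows) / total_weight


# Hypothetical country: North has 1 million people, South has 9 million,
# but 500 people were interviewed in each, so North is over-sampled.
north_w = 1_000_000 / 500  # each North interview stands for 2,000 people
south_w = 9_000_000 / 500  # each South interview stands for 18,000 people
rows = ([Respondent("North", 1, north_w)] * 200
        + [Respondent("North", 0, north_w)] * 300
        + [Respondent("South", 1, south_w)] * 50
        + [Respondent("South", 0, south_w)] * 450)

print(round(unweighted_proportion(rows), 3))  # 0.25 (biased toward North)
print(round(weighted_proportion(rows), 3))    # 0.13 (matches the population)
```

The unweighted figure overstates the national proportion because the over-sampled North, where the characteristic is common, makes up half of the interviews but only a tenth of the population.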

Appendix A in a country's final report is a good resource for detailed information on the sampling design of that particular country's DHS dataset.

Metadata

Metadata component | Description
Theme | Demographics and human health
Source | DHS Program, by USAID (United States Agency for International Development)
Purpose | Datasets that can be used to advance the global understanding of health and population trends in developing nations
Time frame | The first round of data collection, DHS 1, began in 1985, and survey rounds have been carried out successively since then (currently DHS 7). Geo-located datasets began in 1986, and GPS-located data began in 1996.
File type | Socioeconomic data
File format | .dta (Stata), .dat (ASCII), .sas7bdat (SAS), .sav (SPSS); some spatial .shp files when available
Structure | One household per row or one individual (woman, man, child, ...) per row, depending on the recode
Projection & coordinate system | N/A
Extent | 90+ countries, focus on developing nations
Resolution or scale | Cluster (size varies)

Common Problems

  • Cluster Displacement: Displacing cluster locations to protect the confidentiality of respondents (see "Description" above for details) introduces some limitations and uncertainty into the use of DHS data for spatial analysis. DHS Spatial Analysis Report No. 8 is a great resource that outlines how displacement may affect your analysis. It provides many examples of different analyses and the implications of displacement for each[2].
  • Weighting the data: when analyzing your data, you must use weights in your statistical analysis to preserve the representativeness of the sample. Sample weights are mathematical adjustments applied to the data to correct for over-sampling, under-sampling, and differing response rates across regions. Sample weights for all surveys are available in each dataset's DHS recode file. Video tutorials on weighting DHS data are available. Each software package weights data slightly differently, so select the video that corresponds to the file format and software you would like to use (there are separate videos for SAS, SPSS, Stata, etc.). Double-check that you are using the weight that corresponds to your unit of analysis, as there are different weights for each (e.g. if you are looking at data for children, use the children's weight variable). Weighting alone is enough for basic statistics, but not for anything that involves standard errors (SE). For those analyses there is more to consider (see below).
    • When analyzing men or couples, use the men's weight, as men have higher non-response rates than women.
    • Sample weights are calculated to six decimals but are stored in the recode file without the decimal point. You must divide each weight by 1,000,000 before using it to get the actual weight value.
  • If you would like to carry out statistical analyses that rely on standard errors, such as significance tests or confidence intervals, you must adjust not only for weighting but also for clustering and stratification. In other words, you must take into account the sampling design used by DHS. Each software package has its own way of doing this, so see the video tutorials for detailed steps for the software you are using.
  • Do not assume that the data you want is geo-located. Not all DHS survey data is geo-located, even if it was collected recently. Make sure you know the characteristics of the data you are interested in before you rely on it for spatial analysis.
  • You must submit a request to use DHS data, and approval can take around 24 hours (or more), so submit it well ahead of time. The country quick-stats can be very useful in the preliminary stages of your inquiry. However, they are not raw data and do not replace the actual dataset.
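The two weighting points above can be combined in one example: the stored weight must be divided by 1,000,000, and standard errors must account for clustering and stratification. This is a minimal Python sketch of a weighted mean with a Taylor-linearized standard error. It treats primary sampling units (PSUs) as sampled with replacement within strata, a common approximation used by survey software.

Some details are assumptions:
- The variable names `v005` (women's sample weight), `v021` (PSU) and `v022` (sampling-error strata) follow the women's recode as I understand the DHS recode manual. Check the manual for the recode you use.
- The outcome key `y` and the toy records are hypothetical.

This is a teaching sketch, not a replacement for survey routines such as Stata's `svy` commands, R's `survey` package, or SAS `PROC SURVEYMEANS`.

```python
import math
from collections import defaultdict


def weighted_mean_se(records, y="y", weight="v005", psu="v021", stratum="v022"):
    """Weighted mean and its Taylor-linearized standard error.

    records: list of dicts. Weights are integers scaled by 1,000,000, as stored
    in DHS recode files, so they are divided by 1e6 first. PSUs are treated as
    sampled with replacement within each stratum.
    """
    w = [r[weight] / 1_000_000 for r in records]
    total_w = sum(w)
    mean = sum(wi * r[y] for wi, r in zip(w, records)) / total_w

    # Linearized score of each record, summed within each PSU of each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for wi, r in zip(w, records):
        psu_totals[r[stratum]][r[psu]] += wi * (r[y] - mean) / total_w

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        zbar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, math.sqrt(variance)


# Hypothetical toy data: two strata, two PSUs per stratum, equal weights.
sample = [
    {"y": 1, "v005": 1_000_000, "v021": 1, "v022": 1},
    {"y": 1, "v005": 1_000_000, "v021": 1, "v022": 1},
    {"y": 0, "v005": 1_000_000, "v021": 2, "v022": 1},
    {"y": 0, "v005": 1_000_000, "v021": 2, "v022": 1},
    {"y": 1, "v005": 1_000_000, "v021": 3, "v022": 2},
    {"y": 0, "v005": 1_000_000, "v021": 3, "v022": 2},
    {"y": 1, "v005": 1_000_000, "v021": 4, "v022": 2},
    {"y": 0, "v005": 1_000_000, "v021": 4, "v022": 2},
]
print(weighted_mean_se(sample))  # (0.5, 0.25)
```

In this toy sample all of the variance comes from stratum 1, where the two PSUs disagree completely. That is the clustering effect a weight-only calculation would ignore.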

Downloading Instructions

  1. Go to: https://dhsprogram.com/data/available-datasets.cfm and select your desired dataset.
     • It is a good idea to review the questionnaire used to collect the data you are interested in, to make sure it answers your question of interest. The questionnaires are in the appendices of the country reports, which are freely available even without a login. This lets you check which questions were asked in your region before requesting a dataset.
     • The datasets are extensive, and most surveys contain an incredible amount of detail.
     • If you decide you would like a dataset, you must register as a first-time user (afterwards you can simply log in with your username and password).
     • To access a dataset, you must enter your current “project” and describe how you will use the data in a request. Requests to access datasets are usually approved within 24 hours.
  2. Once you have access to the dataset, you will see that the files (.zip folders) are first sorted into survey and spatial data. The survey datasets are then subdivided by “recode”: household recode, individual recode, children’s recode, men's recode, household member recode, and HIV test recode.
    • Dataset files are named according to the following convention: [CC][DD][VV][FF][DS].ZIP
    • Code Description:
    • [CC] Country Code (All of the country codes and other codes are listed on this page)
    • [DD] Dataset Type/"Recode" (HR-Household, PR-Household Member, IR-Women, MR-Men, BR-Births, KR-Children under 5, CR-Couples, AR-HIV test results)
    • [VV] Dataset Version (First Character - DHS Phase) (Second Character - Release version)
    • [FF] File Format (e.g. FL-Flat, SV-SPSS, DT-Stata, SD-SAS)
    • [DS] Data Structure, Service Provision Assessment (SPA) surveys only (SR: SPA Recode | SP: SPA Raw)
  3. After choosing a dataset, make sure you pick the .zip folder with the correct format for your software or analysis (e.g. Stata, flat ASCII, SAS, SPSS).
     • If you open the SPSS file, for example, it appears as a table with the data in one tab and the variable codes in a second tab.
     • You can convert this file (or any of the others) to a .csv file if you would like to use the data in Excel or RStudio. See the tool wiki for details on converting these files to .csv.
     • Once you are looking at your specific data, make sure you understand which variables you are dealing with. Download the DHS recode manual: this is very important, as it explains each variable in detail.
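Because the naming convention in step 2 is fixed-width, a downloaded file name can be decoded mechanically. This Python sketch handles only the 8-character standard-survey form ([CC][DD][VV][FF].ZIP, without the SPA-only [DS] part). The lookup tables repeat the codes listed above. The function name `parse_dhs_filename` and the example file name `KEHR72DT.ZIP` are illustrative assumptions, not a DHS tool.

```python
import re
from typing import NamedTuple

# Codes as listed in the naming convention above.
RECODES = {
    "HR": "Household", "PR": "Household Member", "IR": "Women",
    "MR": "Men", "BR": "Births", "KR": "Children under 5",
    "CR": "Couples", "AR": "HIV test results",
}
FORMATS = {"FL": "Flat ASCII", "SV": "SPSS", "DT": "Stata", "SD": "SAS"}


class DhsFileName(NamedTuple):
    country: str      # [CC]
    recode: str       # [DD]
    phase: str        # first character of [VV]
    release: str      # second character of [VV]
    file_format: str  # [FF]


# Two letters, two letters, two version characters, two letters, ".ZIP".
_PATTERN = re.compile(r"^([A-Z]{2})([A-Z]{2})([0-9A-Z])([0-9A-Z])([A-Z]{2})\.ZIP$",
                      re.IGNORECASE)


def parse_dhs_filename(name: str) -> DhsFileName:
    """Split a standard (non-SPA) DHS file name into its coded parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a standard DHS recode file name: {name!r}")
    return DhsFileName(*(part.upper() for part in match.groups()))


info = parse_dhs_filename("KEHR72DT.ZIP")
print(info)  # DhsFileName(country='KE', recode='HR', phase='7', release='2', file_format='DT')
print(RECODES[info.recode], FORMATS[info.file_format])  # Household Stata
```

Matching case-insensitively accepts lower-case downloads such as `kehr72dt.zip`, while the output is normalized to upper case so it can be looked up in the tables.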

Restriction on Use

You must register to access the datasets, but they are free to use for research purposes. Registration is required for a few reasons:
- to protect the confidentiality of sensitive HIV and GIS data (extra digital consent is required for these data);
- to contact you as a user of the data in case of an update;
- to ensure that DHS meets its host-country agreements.

Host countries own the data, and DHS has agreements that allow it to hold each country's data in the DHS repository. These agreements stipulate, for example, that the data are to be used by researchers for legitimate purposes and not commercially.

References

  1. Burgert, Clara R.; Colston, Josh; Roy, Thea; Zachary, Blake (2013). "Geographic displacement procedure and geo-referenced data release policy for the Demographic and Health Surveys". DHS Spatial Analysis Reports No. 7. Calverton, Maryland, USA.
  2. Perez-Heydrich, Carolina; Warren, Joshua L.; Burgert, Clara R.; Emch, Michael E. (2013). "Guidelines on the Use of DHS GPS Data". DHS Spatial Analysis Reports No. 8. Calverton, Maryland, USA.