Center for Interdisciplinary Research
in Environmental Exposures and Health
GAM code and synthetic data for mapping

The following article discusses methods and analyzes a synthetic data set:

Webster T, Vieira V, Weinberg J, Aschengrau A. Method for mapping population-based case-control studies using Generalized Additive Models. International Journal of Health Geographics 2006, 5:26 (9 June 2006). The full text is freely available here.

The synthetic data used in the paper and the code for analyzing them are available below. This work is available for use under the General Public License.

  • Excel file of synthetic data

  • tab-delimited text file of synthetic data

  • text file of S-Plus code for reproducing our analysis of the synthetic data

  • text file of R code for reproducing our analysis of the synthetic data (revised April 2009). Note that R is freely available from The R Project for Statistical Computing.

  • detailed instructions for converting GAM output to a map using ArcMap

  • Point map of the synthetic data, mapped using ArcView (Figure 1 from the paper).

Locations of cases (red) and controls (blue) are shown stratified by a dichotomous variable (age). Disease odds are constant within strata, but four times higher in the old. Young are uniformly distributed; old are clustered in the northeast quadrant.
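The structure described above can be illustrated with a small simulation. This is a minimal sketch in Python, not the authors' S-Plus or R code and not the synthetic data set from the paper. The function name `simulate_subjects`, the unit-square study area, the sample sizes, the baseline odds of 0.5, and the 80% share of old subjects placed in the northeast quadrant are all assumptions chosen for illustration. Only the design itself (odds constant within strata, four times higher in the old, young uniform, old clustered in the northeast) comes from the description above.

```python
import random


def simulate_subjects(n_young=1000, n_old=1000, odds_young=0.5,
                      odds_ratio_old=4.0, p_old_in_ne=0.8, seed=1):
    """Simulate subjects on a unit square (hypothetical parameters).

    Returns a list of dicts with keys x, y, old (0/1), and case (0/1).
    """
    rng = random.Random(seed)
    subjects = []
    for old in [0] * n_young + [1] * n_old:
        if old and rng.random() < p_old_in_ne:
            # Old subjects are clustered: most fall in the northeast quadrant.
            x, y = rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0)
        else:
            # Young subjects (and the remaining old) are spread uniformly.
            x, y = rng.random(), rng.random()
        # Disease odds depend only on age stratum, never on location.
        odds = odds_young * (odds_ratio_old if old else 1.0)
        p_case = odds / (1.0 + odds)
        case = 1 if rng.random() < p_case else 0
        subjects.append({"x": x, "y": y, "old": old, "case": case})
    return subjects


subjects = simulate_subjects()
n_cases = sum(s["case"] for s in subjects)
print("subjects:", len(subjects), "cases:", n_cases)
```

Because case status is drawn from the age-specific odds alone, any spatial pattern in the simulated cases arises only from where the old subjects live.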

  • crude and adjusted maps of the synthetic data: output from S-Plus mapped using ArcView (Figure 5 from the paper)

The crude map of the synthetic data shows elevated odds in the northeast quadrant because of spatial confounding, i.e., spatial clustering of the risk factor age. (To cause confounding, a variable must be associated with both outcome and exposure. In spatial confounding, location acts as the "exposure.")

Adjustment for age produced a nearly flat map, as expected, since we constructed the data assuming uniform disease odds within each stratum.
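The contrast between the crude and adjusted maps can be checked with simple arithmetic. The sketch below does not use a GAM. It swaps in a much cruder comparison, the odds ratio for the northeast quadrant versus the rest of the study area, pooled over age and then adjusted for age with the Mantel-Haenszel estimator. The counts are hypothetical expected values, not taken from the paper's data set: 1200 subjects per stratum, odds of 0.5 in the young and 2.0 in the old, with 25% of the young and 80% of the old in the northeast quadrant.

```python
# Each table is (cases_ne, controls_ne, cases_elsewhere, controls_elsewhere).
# Hypothetical expected counts; odds are 0.5 (young) and 2.0 (old) everywhere.
TABLES = {
    "young": (100, 200, 300, 600),
    "old": (640, 320, 160, 80),
}


def odds_ratio(table):
    """Odds ratio for being a case, northeast quadrant versus elsewhere."""
    a, b, c, d = table
    return (a * d) / (b * c)


def collapse(tables):
    """Sum the stratum tables cell by cell, ignoring age."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


crude = odds_ratio(collapse(TABLES.values()))
adjusted = mantel_haenszel_or(list(TABLES.values()))
print("crude OR:", round(crude, 2))
print("age-adjusted OR:", round(adjusted, 2))
```

With these counts the crude odds ratio is about 2.10 even though the odds ratio is exactly 1 within each age stratum, and the age-adjusted estimate is 1. This is the same pattern as the elevated crude map and the flat adjusted map.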


Page last modified on April 07, 2009, at 09:10 PM