Enhancements to GIS Tools and Methods for
Mapping American Community Survey Estimates and Data Quality Information
As stated in the background and objectives section, we would like to
- offer information and tools to use ACS data in GIS.
- provide information and tools focused on the correct use of ACS data in mapping and spatial analysis.
To achieve these objectives, we have developed an extension for ArcGIS to provide information and tools to use ACS data.
Downloading ACS Data and Boundary Shapefiles
To use ACS data in GIS, each record in the ACS table representing attributes of a census unit has to be connected to the boundary representing that unit. Therefore, both the ACS tables and boundary data have to be obtained and then merged. To help users obtain the necessary data, the extension includes step-by-step instructions to download ACS data and associated boundary shapefiles from the Census Bureau website. ACS data can be downloaded from the new American FactFinder (AFF2) on the U.S. Census Bureau's website. AFF2 offers many channels to identify and select ACS tables and variables to download. The extension provides instructions for a relatively straightforward path to download ACS tables and variables.
Boundary data of geographical areas at various levels of the census geography are derived from TIGER/Line data and are available in ESRI shapefile format. Multiple versions of boundary shapefiles are available. In the extension, the instructions point the user to the Census webpage to access multiple versions (years) of TIGER/Line shapefiles. Specific instructions are provided to download the 2009 version of the boundary shapefiles.
Merging ACS Tables with Boundary Files
After ACS tables and associated boundary shapefiles are downloaded from the Census Bureau's website, they have to be joined to form a new set of shapefiles in order to be used in ArcGIS. Typical decennial census data tables can be joined with boundary shapefiles by using a common or shared field of unique identity values in a few steps of data manipulation. But for ACS data, the data structures are more complicated than is the case for the typical decennial census data. Each ACS estimate is accompanied by a margin of error (MOE). Several non-numerical symbols are used to indicate special situations. In addition, some name labels of estimates are very lengthy and cannot be accommodated by ArcGIS field names. Long name labels have to be shortened to be stored in ArcGIS.
To address these issues, the extension includes a tool to join one or more ACS tables to the corresponding shapefiles. Users need to identify the common unique identity fields in the ACS table(s) and the shapefiles. Users are given the option to rename the original field names. The resultant shapefiles will include the attributes from all selected ACS tables and from the original shapefiles.
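The join and renaming steps can be sketched outside ArcGIS in plain Python. This is a minimal illustration under stated assumptions, not the extension's code: the helper names `shorten_field_names` and `join_tables`, the field names `GEOID` and `GEO_ID`, and the sample records are all hypothetical. The 10-character limit reflects the dBASE attribute table used by shapefiles. The sketch does not handle the non-numerical symbols mentioned above, nor name clashes with existing boundary fields.

```python
def shorten_field_names(names, max_len=10):
    """Truncate names to max_len characters while keeping them unique."""
    used = set()
    mapping = {}
    for name in names:
        short = name[:max_len]
        counter = 1
        # If the truncated name is already taken, replace its tail with a counter.
        while short in used:
            suffix = str(counter)
            short = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(short)
        mapping[name] = short
    return mapping


def join_tables(boundary_rows, acs_rows, boundary_key, acs_key):
    """Attach ACS attributes to boundary records that share an identifier."""
    acs_by_id = {row[acs_key]: row for row in acs_rows}
    field_names = [k for k in acs_rows[0] if k != acs_key] if acs_rows else []
    mapping = shorten_field_names(field_names)
    joined = []
    for boundary in boundary_rows:
        merged = dict(boundary)
        match = acs_by_id.get(boundary[boundary_key])
        if match is not None:
            for long_name, short_name in mapping.items():
                merged[short_name] = match[long_name]
        joined.append(merged)
    return joined


# Hypothetical records; the values are made up for illustration only.
boundaries = [{"GEOID": "001", "NAME": "Unit A"},
              {"GEOID": "002", "NAME": "Unit B"}]
acs_table = [{"GEO_ID": "001",
              "Per_capita_income_estimate": 30000,
              "Per_capita_income_moe": 1500}]

joined = join_tables(boundaries, acs_table, "GEOID", "GEO_ID")
```

Units without a matching ACS record (here, the hypothetical "Unit B") keep only their boundary attributes, which mirrors the behavior of a typical one-to-one attribute join.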
Incorporating Data Quality Information in Mapping ACS Data
When mapping ACS estimates or a traditional census variable from a decennial census enumeration, a map offers a platform for a cross-sectional comparison. Estimates are mapped for one time frame with the intent to show whether areas are different. Another common objective of mapping a census variable is to compare the variable values over time to determine if they have changed. Some existing cartographic techniques can facilitate such comparisons with the inclusion of uncertainty information, but they fall short of helping readers discern whether the difference between any two estimates is statistically significant. Two types of methods are adopted in this project. The first type is intended to display reliability information together with the estimates using various map layout designs. The second type allows users to select an estimate to be compared with other estimates to determine whether the estimates are significantly different.
Mapping ACS Estimates with Coefficients of Variation
Two types of statistical errors can be found in the ACS data: non-sampling and sampling errors. Non-sampling error refers to the variability introduced by respondents, interviewers, coders and procedures. Sampling error exists in the ACS because the sampled population may not represent the true population very well, and thus estimates differ from the parameter values in the population. The Census Bureau provides a margin of error (MOE) for each estimate of a variable. In statistics, a typical measure of sampling error is the standard error, which reflects the imprecision in the estimate due to sampling. The MOE provided by the Census Bureau is 1.645 times the standard error, which corresponds to a 90% confidence level. We label this MOE as 90% MOE, meaning that there is a 90 percent chance that the true population parameter lies within the range of (estimate ± 90% MOE). From the MOE at the 90% confidence level, the standard error can be derived as 90% MOE/1.645. With the standard error, MOEs at other levels of confidence, such as 95% or 99%, can be determined. For instance, the MOE at the more traditional 95% confidence level is 1.96 times the standard error. The z-value of the difference between two estimates can be used to determine if the two estimates are significantly different at a given confidence level.
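The arithmetic in this paragraph can be sketched in a few lines of Python. The function names and the sample numbers are hypothetical and chosen only for illustration. The z-value uses the usual formula for the difference between two estimates, which assumes the two estimates are independent; the text above does not spell out the formula, so treat that as an assumption of this sketch.

```python
import math

Z_90 = 1.645  # multiplier behind the published 90% MOE


def standard_error(moe_90):
    """Recover the standard error from a 90% margin of error."""
    return moe_90 / Z_90


def moe_at(moe_90, z):
    """Rescale a 90% MOE to another confidence level (z = 1.96 for 95%)."""
    return standard_error(moe_90) * z


def z_of_difference(est_a, moe_a, est_b, moe_b):
    """z-value of the difference between two estimates (assumed independent)."""
    se_a = standard_error(moe_a)
    se_b = standard_error(moe_b)
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Made-up estimates and 90% MOEs for two hypothetical areas.
z = z_of_difference(50000, 1645, 47000, 3290)
significant_at_90 = abs(z) > Z_90
```

With these made-up numbers the z-value is about 1.34, which is below 1.645, so the two estimates would not be judged significantly different at the 90% confidence level even though they differ by 3,000.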
While the MOE can be used to test the differences between estimates, the size of the MOE cannot be used to indicate the quality of an ACS estimate, as the MOE is relative to the scale of the estimate. Estimates with large values have larger MOEs in general, but these larger MOEs do not imply that the estimates are less reliable. A preferable measure of the reliability of an ACS estimate is the coefficient of variation (CV), which is the standard error (SE) divided by the estimate (sometimes multiplied by 100). The CV indicates the relative amount of sampling error associated with the estimate and is independent of the scale of the estimate. Because CV = SE/estimate and SE = 90% MOE/1.645, the CV can be derived directly from the MOE, i.e., CV = (90% MOE/1.645)/estimate. A primary objective of this project is to suggest and evaluate different methods to incorporate the corresponding MOEs or CVs into the mapping of ACS estimates.
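A short sketch shows why the CV, rather than the MOE, reflects reliability. The function name and the numbers are hypothetical. The CV is computed as the standard error (90% MOE divided by 1.645) divided by the estimate.

```python
def coefficient_of_variation(estimate, moe_90, as_percent=False):
    """CV = standard error / estimate, with SE recovered from the 90% MOE."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    cv = (moe_90 / 1.645) / estimate
    return cv * 100 if as_percent else cv


# Two made-up estimates: the second has a MOE ten times larger,
# yet both have the same relative sampling error.
cv_small = coefficient_of_variation(10000, 1645)
cv_large = coefficient_of_variation(100000, 16450)
```

Both calls give a CV of about 0.1 (10 percent), so the larger MOE of the second estimate does not make it less reliable than the first.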
Some visualization applications have adopted a two-map approach or the adjacency technique, with one map showing the estimate and another map showing the uncertainty information. The figure below shows such maps of the New Jersey data. The map of the coefficients of variation (CV) clearly indicates the reliability of estimates in the corresponding counties, and one can discern which estimates are more reliable than others. While Hunterdon County has the highest per capita income, its estimate is also relatively unreliable with a rather large CV level. The per capita income levels of Bergen, Middlesex and Ocean counties are in the high-middle range, but they are relatively reliable with the lowest CV values. However, the two-map scheme puts the burden of linking the estimates with the corresponding quality information on the map readers. Readers have to switch their focus back and forth between the two maps in order to build the connections between the estimates and their CV levels. Mapping the CV values is good enough to show the relative reliability among estimates, but is not sufficient to determine whether the difference between the estimates of any two counties, for instance Somerset and Middlesex, is statistically significant.
While the two-map approach puts the estimate and quality information separately on two frames of display, the bivariate legend approach, the preferred method suggested by MacEachren et al. (1998), treats the estimate and quality level as two variables and combines them on one display frame. This specific bivariate legend design uses color fills with a texture overlay. The figure below demonstrates this method. Conceptually, the information reflected in the figure below is identical to the two maps combined in the figure above, but the bivariate legend approach is more efficient as map users no longer need to shift their focus back and forth between the two maps of estimates and CVs. However, some map users may find the legends somewhat complicated to interpret. Like the two-map approach, mapping CV levels in the bivariate legend cannot help determine statistical differences between estimates.
Identifying Areas with Significant Differences
A basic premise in interpreting a choropleth map is that areal units in different classes (and colors) have values significantly different from each other. But if the values are ACS estimates, this premise no longer applies. Areal units assigned to different classes surely have numerically different ACS estimates. But due to the presence of sampling error in these estimates, their numerical difference can be due to chance. On the other hand, areal units in the same class can be significantly different, especially if the class covers a wide range. The bottom line is that without testing the difference between two estimates, one cannot conclude whether they are different or not.
Although it is possible to test all possible pairs of estimates, it is somewhat impractical to perform such an exercise. Because most detailed comparisons among estimates focus on a few units, an interactive approach is warranted. Cliburn et al. (2002) suggest an interactive approach that allows users to select areas or regions to highlight the uncertainty characteristics of a subset. In the ACS Mapping Extension, we provide several functions for users to compare estimates. Users can compare
- the estimate of a selected unit with estimates in all other units;
- the estimates of several selected units with estimates in all other units;
- a fixed value with estimates in all units.
The testing results indicate whether an areal unit has an estimate significantly different from the fixed value or the estimate(s) in the referenced unit(s). Such units will be highlighted with a texture overlay on the choropleth map of the estimates.
In the figure above, Middlesex County in New Jersey was selected as the referenced unit. Its ACS estimate of median household income in 2008 was compared with estimates in all other counties in New Jersey. Counties with estimates significantly different from that of Middlesex were shaded with the cross-hatch pattern. As expected, the counties with the highest income levels have estimates significantly different from that of Middlesex, but some counties in the same category as Middlesex also have estimates that are significantly different from that of Middlesex.
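The comparison against a referenced unit or a fixed value can be sketched as follows. This is a minimal illustration, not the extension's implementation: the function names, the unit names and all numbers are hypothetical, the estimates are assumed to be independent, and a fixed value is assumed to carry no sampling error (an MOE of zero). The case of several selected units is left out because the text does not say how their estimates are combined.

```python
import math

Z_90 = 1.645  # multiplier behind the published 90% MOE


def differs(est_a, moe_a, est_b, moe_b, z_crit=Z_90):
    """True if two estimates differ significantly at the given critical z."""
    se_diff = math.sqrt((moe_a / Z_90) ** 2 + (moe_b / Z_90) ** 2)
    if se_diff == 0:
        return est_a != est_b
    return abs(est_a - est_b) / se_diff > z_crit


def flag_against_unit(units, ref_name, z_crit=Z_90):
    """Flag every unit whose estimate differs from the referenced unit."""
    ref_est, ref_moe = units[ref_name]
    return {name: differs(est, moe, ref_est, ref_moe, z_crit)
            for name, (est, moe) in units.items() if name != ref_name}


def flag_against_value(units, value, z_crit=Z_90):
    """Flag every unit whose estimate differs from a fixed value (MOE = 0)."""
    return {name: differs(est, moe, value, 0.0, z_crit)
            for name, (est, moe) in units.items()}


# Made-up (estimate, 90% MOE) pairs for three hypothetical units.
units = {"Unit A": (50000, 1645),
         "Unit B": (47000, 3290),
         "Unit C": (60000, 1645)}

flags_vs_a = flag_against_unit(units, "Unit A")
flags_vs_value = flag_against_value(units, 48000)
```

In this made-up case only "Unit C" is flagged against "Unit A", while "Unit A" and "Unit C" are flagged against the fixed value of 48,000. The flagged units are the ones that would receive the texture overlay on the choropleth map.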
For All Inquiries: David W. Wong, Geography & Geoinformation Science, George Mason University, Email: firstname.lastname@example.org
Funded by the U.S. Census Bureau, Period of Performance: September 4, 2012 to December 12, 2013, Technical Manager: Dr. Nancy Torrier