Enhancements to GIS Tools and Methods for
Mapping American Community Survey Estimates and Data Quality Information
 

Background and Objectives

Background

The population and housing data collected during the decennial censuses have been used and mapped frequently. Some data, such as those in Summary File (SF) 1, were tabulated from the census short form, which was received by every household in the U.S., so in theory these data were derived from the entire population. However, some variables in other SFs, such as those on economic and housing characteristics, were based on the long form, which was received by about 1 in 6 people in the U.S. Even with such a relatively large sample, long-form data have some data quality issues. Mapping of decennial census data, whether 100% count data or sampled data, has largely ignored the reliability issue.

The American Community Survey (ACS) is a continuous measurement program replacing the census long form. When using ACS data, one should pay extra attention to data quality, especially sampling error, because ACS data are survey data based on a sample much smaller than the long-form samples of previous decennial censuses. Just as sound analyses using ACS data should include data quality information, maps of ACS data should incorporate it as well. However, this is not a simple task; it is both conceptually and technically challenging. The resulting maps are likely to be more complex and more difficult to interpret. One of the challenges is to design maps that capture pertinent information yet remain comprehensible to most map readers.

Designed as a continuous measurement program to replace the census long form, the ACS includes questions reasonably comparable to the long-form questions, with minor differences. However, the ACS also includes additional, more detailed questions that were not on the long form. In general, the ACS gathers more in-depth and detailed population and housing data than the decennial censuses did. A major concern in using ACS data is that the ACS sample is relatively small compared with the sample that received the long form in the past. The small annual sample, about 2.5% of housing addresses as of 2005, has several significant implications from the spatial analytical perspective.

Because of the small sample size, the ACS adopted a two-phase, two-stage sampling framework. Such a framework does not support the production of ACS data for all levels of census geography for all years. The 1-year estimates (12 months of collected data) are the most current but are less reliable than the other ACS data products, and they are available only for areas with at least 65,000 people. The 3-year estimates (36 months of collected data) are available for areas with at least 20,000 people. The 5-year estimates (60 months of collected data), which are the most accurate but least current, are available for areas as small as census block groups. While the 5-year estimates have the most complete geographical coverage, using the 1-year and 3-year estimates in geography and the spatial sciences is challenging because the data are not available for the entire country and are limited to areas with sizable populations.
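
The population thresholds above can be written as a simple lookup. The following is a minimal sketch; the function name `available_products` is hypothetical and is not part of any Census Bureau tool or of the project's ArcGIS extension.

```python
def available_products(population):
    """Return the ACS estimate products published for an area of this size.

    Thresholds follow the text: 5-year estimates for all areas,
    3-year for areas with at least 20,000 people, and
    1-year for areas with at least 65,000 people.
    """
    products = ["5-year"]
    if population >= 20_000:
        products.append("3-year")
    if population >= 65_000:
        products.append("1-year")
    return products


# Hypothetical area populations, for illustration only.
for pop in (5_000, 20_000, 70_000):
    print(pop, available_products(pop))
```

A map covering many small areas would therefore have to rely on the 5-year product, since it is the only one returned for every population size.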

The relatively small sample size of the ACS raises issues about the statistical precision of the estimates. The 1-in-40 sample is relatively large compared with most surveys, but it is small compared with the 1-in-6 sample that received the census long form in 2000. Unlike for other data it gathers, the Census Bureau provides a margin of error for each ACS estimate to indicate the statistical precision of the estimate with respect to sampling error. The margin of error (MOE) is 1.645 times the standard error, which corresponds to a 90 percent confidence level. However, the MOE depends on the scale of the estimate: larger estimates in general have larger MOEs. Therefore, the MOE alone should not be used to reflect the statistical precision of an estimate. Instead, the coefficient of variation (CV), the standard error divided by the estimate, should be used because it is standardized by the estimate. Thus, when using ACS data, even for something as simple as mapping, the CVs of estimates should be taken into account.
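
The relationship among the MOE, the standard error, and the CV can be illustrated with a short calculation. This is a minimal sketch: the function names and the two example areas are hypothetical, and the CV is expressed as a percentage of the estimate.

```python
Z_90 = 1.645  # multiplier the text gives for the 90 percent margin of error


def standard_error(moe):
    """Recover the standard error from a 90 percent margin of error."""
    return moe / Z_90


def coefficient_of_variation(estimate, moe):
    """Return the CV as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    return standard_error(moe) / estimate * 100.0


# Hypothetical values: a large estimate with a large MOE,
# and a small estimate with a small MOE.
large_cv = coefficient_of_variation(estimate=50_000, moe=2_500)
small_cv = coefficient_of_variation(estimate=400, moe=150)
print(f"large estimate CV: {large_cv:.1f}%")
print(f"small estimate CV: {small_cv:.1f}%")
```

In this made-up example the larger estimate has the larger MOE but the much smaller CV, which is why the MOE by itself is a poor guide to relative reliability.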

The inclusion of an MOE for each estimate gives ACS data downloaded from the Census Bureau an unusual structure. Each variable occupies two columns: the estimates and the corresponding MOEs. This format complicates the use of ACS data in GIS for mapping and analysis. All MOE columns carry the same label, so many GIS software packages cannot distinguish them. Thus, mapping the MOE can be problematic. Our research on dissemination will address this impediment to using ACS data in GIS.
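
One possible workaround for the identically labeled MOE columns is to rename each one after the estimate column that precedes it before loading the table into GIS software. The sketch below uses only the Python standard library. The column labels, area identifiers, values, and the `_MOE` suffix are all assumptions made for illustration; an actual download may be laid out differently, and this is not necessarily how the project's extension handles the problem.

```python
import csv
import io

# Hypothetical extract: each estimate column is followed by an
# identically labeled margin-of-error column, as described above.
RAW = """AREA,Total population,Margin of Error,Median income,Margin of Error
area_1,1000,120,65000,4800
area_2,850,140,58000,5200
"""


def pair_moe_columns(header, moe_label="Margin of Error"):
    """Give each MOE column a unique name derived from the estimate before it."""
    renamed = []
    previous = None  # most recent non-MOE column name
    for name in header:
        if name == moe_label and previous is not None:
            renamed.append(previous + "_MOE")
        else:
            renamed.append(name)
            previous = name
    return renamed


reader = csv.reader(io.StringIO(RAW))
header = pair_moe_columns(next(reader))
rows = [dict(zip(header, row)) for row in reader]
print(header)
print(rows[0]["Median income_MOE"])
```

With unique column names, each MOE can be joined to a boundary layer and mapped alongside its estimate.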

Another issue arising from the small ACS sample size is uncertainty in detecting spatial patterns. A common objective in analyzing spatial data in general, and census data in particular, is to identify spatial patterns. Data are mapped to visualize and explore the presence of such patterns. An implicit assumption is that the data are accurate, and therefore that differences between observations are real. Spatial patterns emerge because differences in values are arranged in some systematic way over space. However, if observed values are subject to error, then differences between values may not be real, and the observed spatial pattern may not be true. Therefore, the presence of sampling error in ACS data poses a significant challenge to using ACS data in spatial analysis in general, and in pattern detection in particular. In fact, the importance of considering sampling error is highlighted in the guidelines for general researchers and social scientists using ACS data, but no guidelines have been provided for the use of ACS data in spatial analysis and mapping (Citro and Kalton 2007, p. 130).
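
Whether a difference between two mapped values is likely to be real can be checked with a simple significance test. The sketch below is an assumption rather than the project's stated method: it applies a commonly used z-test for two independent estimates, converting each 90 percent MOE to a standard error by dividing by 1.645. The function name and the example values are hypothetical.

```python
import math

Z_90 = 1.645  # critical value matching the 90 percent MOE


def difference_is_significant(est_a, moe_a, est_b, moe_b, z_crit=Z_90):
    """Test whether two estimates differ at the chosen confidence level.

    Assumes the two estimates are independent, so the standard error of
    their difference is the square root of the sum of squared standard errors.
    """
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        # No sampling error reported: any difference is taken at face value.
        return est_a != est_b
    return abs(est_a - est_b) / se_diff > z_crit


# Hypothetical neighboring areas that would fall in different map classes.
print(difference_is_significant(1000, 120, 850, 140))
print(difference_is_significant(1000, 120, 700, 140))
```

In the first made-up pair, the gap of 150 is smaller than the combined sampling error allows, so the two areas cannot be distinguished with 90 percent confidence even though a map might shade them differently; in the second pair, the gap of 300 is large enough to pass the test.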

Objectives

The general objective of this project is to promote the correct use of ACS data in geography, spatial analysis, and GIS. While other social science disciplines have already investigated the use of ACS data in their respective fields, guidance on the use of ACS data in geography and the spatial sciences is scarce to nonexistent. To achieve this general objective, several specific objectives have been identified:
- To facilitate the use of ACS data in geography and GIS, the project will offer information and tools for using ACS data in GIS.
- The information and tools provided will emphasize the correct use of ACS data in mapping and spatial analysis.

Today, most geographers use GIS to map spatial data, and census data in particular. Among GIS users, ArcGIS is the dominant GIS software package. Therefore, to meet the above objectives, an extension for ArcGIS is developed to provide information about the use of ACS data, along with tools to map ACS data. These tools are designed to incorporate quality information about ACS estimates.

For All Inquiries: David W. Wong, Geography & Geoinformation Science, George Mason University, Email: dwong2@gmu.edu Tel: 703-993-9260
Funded by the U.S. Census Bureau, Period of Performance: September 4, 2012 to December 12, 2013, Technical Manager: Dr. Nancy Torrier