The specialized databases have not been updated for the most recent SEER data release, which includes data from the November 2020 data submission. We are still accepting requests for the databases from the November 2018 submission.
The specialized Census Tract-level SES and Rurality Database (2000-2015) has three census tract-level attributes: a socioeconomic status (SES) index and two rurality variables.
The NCI's census tract-level socioeconomic status (SES) index is a time-dependent composite score. It is constructed using a factor analysis from seven variables that measure different aspects of the SES of a census tract (Yu et al., 2014). The SES variables are chosen based on Yost et al. (2001). They are: Median household income, Median house value, Median rent, Percent below 150% of poverty line, Education Index (Liu et al., 1998), Percent working class, and Percent unemployed. The same variables are used for constructing the NCI's county-level time-dependent SES index. Definitions of these variables are described in the Time-Dependent County-level Attributes Section.
The SES indices are estimated for 2000-2015 using data from the 2000 U.S. Decennial Census long form survey and a series of American Community Survey (ACS) 5-year estimates from 2006 to 2016. The indices are then linked to cancer cases at the census tract level by matching the survey year with the cancer diagnosis year. Cancers diagnosed in 2000 are linked with the index estimated using the 2000 census data. Cancers diagnosed in 2008-2014 are linked to indices estimated using ACS 2006-2010, 2007-2011, 2008-2012, 2009-2013, 2010-2014, 2011-2015, and 2012-2016 data, respectively. The indices for 2001 to 2007 are linearly interpolated from the 2000 and 2008 scores, and the index for 2015 is linearly extrapolated from the 2013 and 2014 scores.
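The year-matching rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the paragraph, not the NCI's implementation: the function names `acs_window` and `ses_score_for_year` and the `anchors` dictionary (one tract's scores for 2000 and 2008-2014) are hypothetical.

```python
def acs_window(dx_year):
    """Return the (start, end) years of the ACS 5-year file matched to a
    diagnosis year in 2008-2014, e.g. 2008 -> ACS 2006-2010."""
    if not 2008 <= dx_year <= 2014:
        raise ValueError("ACS files are matched directly only for 2008-2014")
    return (dx_year - 2, dx_year + 2)


def ses_score_for_year(dx_year, anchors):
    """Return one tract's SES score for a diagnosis year in 2000-2015.

    `anchors` is a hypothetical dict mapping the directly estimated years
    (2000 and 2008-2014) to that tract's SES score.
    """
    if dx_year in anchors:
        # 2000 census or a directly matched ACS 5-year file.
        return anchors[dx_year]
    if 2001 <= dx_year <= 2007:
        # Linear interpolation between the 2000 and 2008 scores.
        weight = (dx_year - 2000) / (2008 - 2000)
        return anchors[2000] + weight * (anchors[2008] - anchors[2000])
    if dx_year == 2015:
        # Linear extrapolation one year past 2014 using the 2013-2014 slope.
        return anchors[2014] + (anchors[2014] - anchors[2013])
    raise ValueError("SES scores are defined only for 2000-2015")
```

For example, with a 2000 score of 0.0 and a 2008 score of 8.0, the interpolated score for 2004 sits halfway between the two, at 4.0.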
The census tract-level rurality variables are drawn from two sources. The first is the US Department of Agriculture (USDA)’s Rural Urban Commuting Area (RUCA) codes, grouped into two categories: Urban area commuting focused (codes 1.0, 1.1, 2.0, 2.1, 3.0, 4.1, 5.1, 7.1, 8.1, and 10.1) and Not urban area commuting focused (all other codes). The second is the Census Bureau’s percent of the population living in non-urban areas, grouped into four categories: 100% urban, ≥50% but <100% urban, >0% but <50% urban, and 100% rural tracts. The two-category RUCA measure is the one most commonly used in health research papers that use RUCA-based measures. The four-category Census-based measure can be collapsed into two- or three-category versions in several ways and thus gives the researcher a good deal of flexibility. These measures are also compatible with the rurality measures available with the NAACCR Cancer in North America database. Both rurality variables are updated every ten years. Cancers diagnosed from 2000 to 2005 are linked with the rurality variables for 2000, and cancers diagnosed in 2006-2015 are linked with the rurality variables for 2010.
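As an illustration, the two groupings described above could be coded as follows. The function names are hypothetical, and the sketch assumes the Census-based input is expressed as the percent of the tract population that is urban (0 to 100); the database itself delivers the categories, not the underlying codes or percentages.

```python
from decimal import Decimal

# RUCA codes listed above as "Urban area commuting focused".
URBAN_FOCUSED_RUCA = {
    Decimal(code)
    for code in ("1.0", "1.1", "2.0", "2.1", "3.0",
                 "4.1", "5.1", "7.1", "8.1", "10.1")
}


def ruca_two_category(code):
    """Collapse a RUCA code (string or number) into the two categories."""
    # Decimal avoids float comparison surprises such as 4.1 != 4.1000000001.
    if Decimal(str(code)) in URBAN_FOCUSED_RUCA:
        return "Urban area commuting focused"
    return "Not urban area commuting focused"


def census_four_category(pct_urban):
    """Map percent urban population (assumed input, 0-100) to four categories."""
    if not 0 <= pct_urban <= 100:
        raise ValueError("pct_urban must be between 0 and 100")
    if pct_urban == 100:
        return "100% urban"
    if pct_urban >= 50:
        return "≥50% but <100% urban"
    if pct_urban > 0:
        return ">0% but <50% urban"
    return "100% rural"
```

A researcher who needs a two-category Census-based measure could, for instance, merge the first two categories and the last two; that is one of several possible collapses, not a recommended one.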
The geographic definitions of census tracts are updated at each decennial census year. The census data collected after each census year use the new definitions. Thus, the 2000 SES index is based on the 2000 definitions, and the 2008-2014 SES indices are based on the 2010 census definitions. In the SEER incidence database with linked census tract SES and rurality information, SEER cases diagnosed in 2000-2005 are geocoded using the 2000 definitions, and cases diagnosed in 2006-2015 are geocoded using the 2010 definitions. To generate SES scores for 2001-2005 that can be linked to cancers by census tract with the same definitions, the ACS 2006-2010 attributes are first converted from the 2010 definitions to the 2000 definitions to produce 2008 SES scores in the 2000 definitions. Those scores are then used together with the 2000 SES scores as the two end points to generate linear interpolants of 2001-2005 in 2000 definitions. The SES scores for 2006-2007 in 2010 definitions are generated in a similar fashion. Both RUCA and Census-based rurality variables are available for 2000 in the 2000 definitions and for 2010 in the 2010 definitions.
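The rule for which set of tract boundaries applies to a case is simple enough to state as a lookup. The sketch below uses a hypothetical function name and only restates the cut points given above; it does not attempt the 2010-to-2000 boundary conversion itself.

```python
def tract_definition_vintage(dx_year):
    """Return the census tract definition year (2000 or 2010) used to
    geocode a case diagnosed in `dx_year`, per the rule described above."""
    if 2000 <= dx_year <= 2005:
        return 2000
    if 2006 <= dx_year <= 2015:
        return 2010
    raise ValueError("the database covers diagnosis years 2000-2015")
```

The same cut points govern which rurality variables (2000 or 2010) are linked to a case.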
SES Quintile and Tertile
After the SES scores are generated for each year, census tracts are categorized into SES quintiles and tertiles with equal populations in each quintile (or tertile) across the entire SEER catchment area (Overall) or within each registry (Registry-specific). Using the quintile as an example, the first quintile (the group with the lowest SES) is the 20th centile or less, and the fifth quintile (the group with the highest SES) corresponds to the 80th centile or higher. The census tract-based SES quintile and tertile are available in SEER incidence and survival databases. The SES indices and geographic identifiers including state, registry, and county are not available because of confidentiality concerns.
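The equal-population grouping can be sketched as follows. This is a minimal illustration, assuming integer tract populations and assigning each tract by its cumulative population share after sorting on SES score; how tracts that straddle a cut point or tie on score are actually handled is not stated here, so that part is an assumption. The function name and input layout are hypothetical.

```python
def population_weighted_groups(tracts, n_groups=5):
    """Assign each tract to an SES group (1 = lowest SES) so that groups
    hold roughly equal populations.

    `tracts` is a list of (tract_id, ses_score, population) tuples with
    integer populations. Use n_groups=5 for quintiles, 3 for tertiles.
    """
    total = sum(pop for _, _, pop in tracts)
    if total <= 0:
        raise ValueError("total population must be positive")
    groups = {}
    cumulative = 0
    # Walk the tracts from lowest to highest SES score.
    for tract_id, _, pop in sorted(tracts, key=lambda t: t[1]):
        cumulative += pop
        # Integer ceiling of cumulative / total * n_groups.
        group = (cumulative * n_groups + total - 1) // total
        groups[tract_id] = max(1, min(n_groups, group))
    return groups
```

Passing all tracts in the SEER catchment area at once would correspond to the Overall grouping, while calling the function separately on each registry's tracts would correspond to the Registry-specific grouping.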
Bridged Single-race Population Denominators for Census Tracts
The SEER census tract SES incidence database supports the calculation of incidence rates by census tract SES quintile (or tertile), race/ethnicity, single year of diagnosis from 2000-2015, 5-year age grouping (i.e., 0-4 years, 5-9, 10-14, ..., 80-84, and 85 and older), and gender. The race/ethnicity categories are Non-Hispanic (NH) White, NH-Black, NH-American Indian and Alaska Native (AIAN), NH-Asian Pacific Islander (API), and Hispanic.
The population denominator estimates are constructed using the iterative proportional fitting (IPF) algorithm (Deming and Stephan, 1940), which iteratively allocates multiracial populations to one of the four single-race categories (i.e., White, Black, AIAN, and API) at the census tract level. These population estimates are available for census tracts in the SEER 18 Registries catchment areas (excluding Alaska) from 2000 to 2015. When tracts are collapsed to the county level, they match the National Center for Health Statistics (NCHS) Vintage 2015 bridged single-race population estimates; when race/ethnicity is collapsed to total race, they match the census tract estimates obtained from Woods and Poole Economics, Inc. Woods and Poole estimates the July 1 population by sex, 19 age groups, and census tract based on the NCHS Vintage estimates.
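To make the two matching constraints concrete, here is a minimal two-dimensional IPF sketch in plain Python. It is a generic textbook version of the algorithm, not the procedure used to build the database: the function name `ipf` is hypothetical, rows stand for census tracts within one county, columns stand for single-race categories, and the real estimates also involve age, sex, and year dimensions that are omitted here.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale a seed table until its row sums match `row_targets`
    (e.g. tract totals) and its column sums match `col_targets`
    (e.g. county totals by race). Both target sets must share one total."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: rescale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                factor = target / row_sum
                table[i] = [v * factor for v in table[i]]
        # Step 2: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns are exact after step 2, so only rows need checking.
        error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if error < tol:
            break
    return table
```

After convergence, summing the fitted table over tracts reproduces the county-level race totals, and summing over race reproduces the tract totals, which mirrors the two consistency properties described above.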
The estimation approach assumes that a multiracial individual selects a single race as his/her main race with a probability proportional to the size of that single race in the population. Uncertainties about these estimates are not reflected. Caution should be exercised in using these estimates, especially when the sample is small.
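The proportionality assumption can be shown with a small worked example. The sketch below uses a hypothetical function name and made-up counts; it covers only the expected-value allocation for a single population cell and leaves out the iterative fitting to external totals.

```python
def allocate_multiracial(single_race_counts, multiracial_count):
    """Distribute a multiracial count across single-race categories in
    proportion to each category's share of the single-race population."""
    total = sum(single_race_counts.values())
    if total <= 0:
        raise ValueError("single-race population must be positive")
    return {
        race: count + multiracial_count * count / total
        for race, count in single_race_counts.items()
    }


# Illustrative (made-up) counts for one tract.
bridged = allocate_multiracial(
    {"White": 600, "Black": 300, "AIAN": 50, "API": 50}, 100
)
```

Here 60% of the 100 multiracial residents are added to White, 30% to Black, and 5% each to AIAN and API. Because these are expected values under the stated assumption, they carry no measure of uncertainty, which is why small cells call for care.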
How to Access the Census Tract-level SES and Rurality Database
The database is available in the rate, survival, and case listing sessions in SEER*Stat for the November 2017 data submission.
In order to access a specialized database, you must already have access to SEER Research Plus data with a valid institutional account.
- If you do not have access to SEER Research Plus data, first follow the steps for institutional account holders to Access the SEER data.
- If you already have access, send an email to firstname.lastname@example.org to request access to the Census Tract-Level SES and Rurality Database.
- Include your SEER*Stat username.
- Add a brief description of your project and research goals in the email, including the types of analyses or statistics you will use.
Moss JL, Stinchcomb DG, Yu M. Providing higher resolution indicators of rurality in the Surveillance, Epidemiology, and End Results (SEER) database: Implications for patient privacy and research. Cancer Epidemiol Biomarkers Prev. 2019 Jun 14. [Epub ahead of print]
Yu M, Tatalovich Z, Gibson JT, Cronin KA. Using a composite index of socioeconomic status to investigate health disparities while protecting the confidentiality of cancer registry data. Cancer Causes Control. 2014 Jan;25(1):81-92.
Yost K, Perkins C, Cohen R, Morris C, Wright W. Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes Control. 2001 Oct;12(8):703-11.
Liu L, Deapen D, Bernstein L. Socioeconomic status and cancers of the female breast and reproductive organs: a comparison across racial/ethnic populations in Los Angeles County, California (United States). Cancer Causes Control. 1998 Aug;9(4):369-80.
Deming W, Stephan F. On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known. Ann Math Statist. 1940;11(4):427-444.