Incidence Data with Census Tract Attributes Database

Access the Database

Prerequisite:

You must already have access to the latest SEER Research Plus Data.

If you do not have access to SEER Research Plus Data, first request access to the SEER data.

Review the full Requirements for Requests

Required Information:

Contact Information
SEER*Stat Username
Objectives, study design, analysis plan
Signed Specialized Database Data Use Agreement (PDF)
Signing Official Attestation (PDF)

The specialized Incidence Data with Census Tract Attributes Database has four census tract-level attributes: socioeconomic status (SES) quintile, two rurality variables, and persistent poverty.

SEER also offers combined databases with census tract attributes and additional data items. For information on these databases, refer to the list of Specialized Databases.

Database Details

Two databases will be available to all approved requestors.

SEER 21 (excl AK and IL)

Available in the Case Listing, Frequency, Rate, and Survival sessions in SEER*Stat.
November 2024 data submission with all tumor records for 2006-2022 for the included registries.
Includes the same fields as the SEER Research Plus Limited-Field databases with the following exceptions/notes:
- Includes the specialized database fields.
- There are no geographic identifiers included due to confidentiality concerns.
- Includes survival, cause of death, and follow-up fields required for survival.

SEER 17 (excl AK)

Available in the Case Listing, Frequency, Rate, and Survival sessions in SEER*Stat.
November 2024 data submission with all tumor records for 2006-2022 for the included registries.
Includes the same fields as the SEER Research Plus databases with the following exceptions/notes:
- Includes the specialized database fields.
- There are no geographic identifiers included due to confidentiality concerns.

Variable Definitions

SES Quintile

A census tract-level SES quintile is provided for assessing SES differences in cancer incidence and survival. It is constructed using a two-step approach based on census tract-level American Community Survey (ACS) 5-year estimates. In the first step, a composite SES score (also referred to as the SES index) is estimated for each census tract by applying factor analysis to seven variables that measure different aspects of the tract's SES (Yu et al., 2014). The variables, chosen based on Yost et al. (2001), are: Median household income, Median house value, Median rent, Percent below 150% of the poverty line, Education Index (Liu et al., 1998), Percent working class, and Percent unemployed. The same variables are used to construct NCI's county-level time-dependent SES index; their definitions are described in the Time-Dependent County-level Attributes section. In the second step, census tracts are categorized into SES quintiles containing equal populations across the entire U.S. The first quintile (the group with the lowest SES) covers the 20th percentile or less, and the fifth quintile (the group with the highest SES) covers the 80th percentile or higher.
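The second step (equal-population quintiles) can be illustrated with a short sketch. This is not NCI's production code: it assumes each tract already has a composite SES score and a population count, and it assigns a tract to the quintile that contains the midpoint of its population in the national ranking. The function name, the tuple layout, and the midpoint rule are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def population_weighted_quintiles(
    tracts: List[Tuple[str, float, int]],
) -> Dict[str, int]:
    """Assign SES quintiles (1 = lowest SES, 5 = highest SES).

    Each tuple is (tract_id, ses_score, population). Tracts are ranked by
    score and cut so that each quintile holds roughly 20% of the total
    population. The midpoint rule used here is an illustrative assumption.
    """
    total = sum(pop for _, _, pop in tracts)
    if total <= 0:
        raise ValueError("total population must be positive")

    quintiles: Dict[str, int] = {}
    cumulative = 0
    # Rank tracts from lowest to highest composite SES score.
    for tract_id, _score, pop in sorted(tracts, key=lambda t: t[1]):
        # Share of the population ranked below this tract's population midpoint.
        midpoint_share = (cumulative + pop / 2) / total
        quintiles[tract_id] = min(int(midpoint_share * 5) + 1, 5)
        cumulative += pop
    return quintiles


if __name__ == "__main__":
    # Ten hypothetical tracts of equal population with scores 1..10.
    demo = [(f"T{i}", float(i), 1000) for i in range(1, 11)]
    print(population_weighted_quintiles(demo))
```

With equal-sized tracts, each quintile receives two of the ten tracts. With unequal tract populations the quintiles are only approximately equal, because a whole tract cannot be split across two quintiles.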

After SES quintiles are generated from each set of ACS 5-year estimates, they are linked to tumor cases at the census tract level by matching the ACS period to the tumor diagnosis year. Tumors diagnosed in 2006-2007 are linked to SES quintiles calculated from ACS 2006-2010 data. Tumors diagnosed in 2008-2017 are linked to SES quintiles based on ACS 2006-2010, 2007-2011, 2008-2012, 2009-2013, 2010-2014, 2011-2015, 2012-2016, 2013-2017, 2014-2018, and 2015-2019 data, respectively; that is, each diagnosis year is matched to the 5-year period centered on it. Tumors diagnosed in 2018 or later are linked to the index estimated from ACS 2015-2019 data. All SES quintiles are defined using the 2010 Decennial Census tract boundaries.
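The linkage rule above reduces to a small lookup. The sketch below only restates the documented mapping from diagnosis year to ACS period; the function name is hypothetical.

```python
from typing import Tuple


def acs_period_for_diagnosis_year(year: int) -> Tuple[int, int]:
    """Return the (start, end) years of the ACS 5-year period whose SES
    quintile is linked to a tumor diagnosed in `year`."""
    if year < 2006:
        raise ValueError("the database covers diagnoses from 2006 onward")
    if year <= 2007:
        return (2006, 2010)  # earliest ACS period used
    if year <= 2017:
        return (year - 2, year + 2)  # 5-year period centered on the diagnosis year
    return (2015, 2019)  # 2018 and later reuse the latest period


if __name__ == "__main__":
    for y in (2006, 2012, 2020):
        print(y, acs_period_for_diagnosis_year(y))
```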

Rurality Variables

Two census tract-level rurality variables are provided to facilitate analyses of urban/rural differences in cancer incidence and survival (Moss et al., 2019). The first is based on the U.S. Department of Agriculture (USDA) Rural Urban Commuting Area (RUCA) codes, with two categories: Urban area commuting focused (codes 1.0, 1.1, 2.0, 2.1, 3.0, 4.1, 5.1, 7.1, 8.1, and 10.1) and Not urban area commuting focused (all other codes). The second, referred to as the Urban Rural Indicator Code (URIC), is based on the Census Bureau's percent of the population living in non-urban areas, with four categories: 100% urban (All urban), ≥50% but <100% urban (Mostly urban), >0% but <50% urban (Mostly rural), and 100% rural (All rural) tracts. The two-category RUCA measure is the form most commonly used in health research papers that rely on RUCA-based measures. The four-category Census-based measure reflects the rural nature of the immediate environment and may be most relevant for studies that focus on behaviors and risk. It can be collapsed into two- or three-category versions in several ways and thus gives researchers considerable flexibility. These measures are also compatible with the rurality measures available in the North American Association of Central Cancer Registries (NAACCR) Cancer in North America database. For both rurality variables, the same 2010 values, defined using the 2010 Decennial Census tract boundaries, are used for all years.
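The category rules above can be written as two small classifiers. This is a sketch, assuming RUCA codes arrive as numbers and the URIC input is the tract's percent urban (0-100); the function names and the two-category URIC collapse shown at the end are hypothetical illustrations, not official SEER recodes.

```python
from typing import Dict

# RUCA codes listed above as "Urban area commuting focused".
URBAN_COMMUTING_RUCA = {1.0, 1.1, 2.0, 2.1, 3.0, 4.1, 5.1, 7.1, 8.1, 10.1}


def ruca_two_category(ruca_code: float) -> str:
    # Round to one decimal so values parsed from text (e.g. "4.10") still match.
    if round(ruca_code, 1) in URBAN_COMMUTING_RUCA:
        return "Urban area commuting focused"
    return "Not urban area commuting focused"


def uric_four_category(percent_urban: float) -> str:
    """Classify a tract by the percent of its population living in urban areas."""
    if not 0 <= percent_urban <= 100:
        raise ValueError("percent_urban must be between 0 and 100")
    if percent_urban == 100:
        return "All urban"
    if percent_urban >= 50:
        return "Mostly urban"
    if percent_urban > 0:
        return "Mostly rural"
    return "All rural"


# One possible two-category collapse of URIC (illustrative choice only).
URIC_TO_TWO: Dict[str, str] = {
    "All urban": "Urban",
    "Mostly urban": "Urban",
    "Mostly rural": "Rural",
    "All rural": "Rural",
}
```

Other collapses are equally valid, for example grouping "Mostly urban" with the rural categories when the research question concerns any rural exposure.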

Persistent Poverty

The Persistent Poverty variable identifies census tracts as persistently poor if 20% or more of the population has lived below the poverty level over a period spanning about 30 years, based on the 1990 and 2000 decennial censuses and the 2007-2011 and 2015-2019 American Community Survey 5-year estimates. It was developed by the National Cancer Institute in collaboration with the U.S. Department of Agriculture, Economic Research Service (ERS). This variable has two levels: census tracts classified as persistent poverty and census tracts classified as non-persistent poverty. The same variable is used for all years.
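A minimal sketch of the classification, assuming (as in the ERS persistent poverty definition) that the 20% threshold must be met in each of the four measurement periods. The input layout, a mapping from period label to the tract's percent in poverty, and the function name are hypothetical.

```python
from typing import Mapping

# Measurement periods named in the definition above.
PERIODS = ("1990 Census", "2000 Census", "ACS 2007-2011", "ACS 2015-2019")


def is_persistent_poverty(poverty_pct: Mapping[str, float], threshold: float = 20.0) -> bool:
    """Return True if the tract's poverty rate is >= threshold in every period.

    Assumption: the criterion applies to each period separately, not to an
    average across periods.
    """
    missing = [p for p in PERIODS if p not in poverty_pct]
    if missing:
        raise KeyError(f"missing periods: {missing}")
    return all(poverty_pct[p] >= threshold for p in PERIODS)
```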

Note that census tracts may have inflated poverty rates, and thus be more likely to be identified as persistent poverty, if postsecondary undergraduate students make up a significant portion of the resident population in poverty. Postsecondary students tend to report low incomes, and their poverty is uniquely situational compared to that of other poverty groups. However, a portion of the student population may come from poor families, and their poverty is potentially more chronic. Given this complexity, please use caution when interpreting the results.

For detailed information about Persistent Poverty, refer to USDA ERS - Rural Poverty & Well-Being.

Bridged Single-race Population Denominators for Census Tracts

The SEER census tract SES incidence database supports the calculation of incidence rates by census tract SES quintile (or tertile), race/ethnicity, single year of diagnosis from 2006-2020, 5-year age group (i.e., 0-4, 5-9, 10-14, ..., 80-84, and 85 years and older), and gender. The race/ethnicity categories are Non-Hispanic (NH) White, NH Black, NH American Indian and Alaska Native (AIAN), NH Asian Pacific Islander (API), and Hispanic.
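SEER*Stat performs these calculations in its Rate session; the sketch below only illustrates the underlying arithmetic for a single stratum (for example, one SES quintile, race/ethnicity group, diagnosis year, age group, and gender), assuming you already have the case count and population denominator for that stratum. The helper names are hypothetical, and age adjustment to a standard population is not shown.

```python
from typing import List

# 18 age groups: 0-4, 5-9, ..., 80-84, 85+
AGE_GROUPS: List[str] = [f"{lo}-{lo + 4}" for lo in range(0, 85, 5)] + ["85+"]


def age_group(age: int) -> str:
    """Map an age in years to its 5-year age group label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_GROUPS[min(age // 5, len(AGE_GROUPS) - 1)]


def rate_per_100k(cases: int, population: float) -> float:
    """Incidence rate per 100,000 persons for one stratum."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000
```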

The population denominator estimates are produced by Woods & Poole Economics, Inc. (W&P) using a hybrid regression, demographic, and proportional model jointly developed by NCI, W&P, and NAACCR, with support from NCI through a contract. When tracts are collapsed to counties, the estimates match the Census Bureau's Vintage 2020 bridged single-race population estimates for 2010-2020 and the intercensal population estimates for 2006-2009. Uncertainty in these estimates is not reflected in the data. Caution should be exercised when using these estimates, especially when the sample is small.
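The county-consistency property described above can be checked by summing tract estimates to counties. This sketch assumes tract estimates keyed by the standard 11-digit Census tract GEOID (2-digit state FIPS, 3-digit county FIPS, 6-digit tract code) and a separate dictionary of county control totals; both input layouts and the function names are hypothetical. It does not reproduce the W&P estimation model.

```python
from collections import defaultdict
from typing import Dict, List


def tracts_to_counties(tract_pop: Dict[str, int]) -> Dict[str, int]:
    """Sum tract populations to counties using the first 5 GEOID digits."""
    counties: Dict[str, int] = defaultdict(int)
    for geoid, pop in tract_pop.items():
        if len(geoid) != 11 or not geoid.isdigit():
            raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
        counties[geoid[:5]] += pop
    return dict(counties)


def counties_not_matching(
    tract_pop: Dict[str, int], county_totals: Dict[str, int], tolerance: int = 0
) -> List[str]:
    """List county FIPS codes whose summed tract total differs from the
    control total by more than `tolerance` persons."""
    summed = tracts_to_counties(tract_pop)
    all_counties = set(summed) | set(county_totals)
    return sorted(
        c for c in all_counties
        if abs(summed.get(c, 0) - county_totals.get(c, 0)) > tolerance
    )
```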

Linked to the Specialized Census Tract Attributes Database are population estimates for census tracts in SEER 17 areas excluding Alaska. Estimates for U.S. tracts in SEER and non-SEER areas are also available to anyone who requests them and agrees to certain standard data use conditions. Refer to U.S. Census Tract Population Data to learn more about the methods used and how to submit a request.

Data Limitations and Analytical Considerations

Geographic Clustering

No geographic identifiers such as census tracts, counties, states, or cancer central registries are included in this database due to concerns over disclosure. This has an analytical cost: knowing the census tract would allow the use of random effects to account for spatially correlated cases within a census tract in a regression analysis. To inform users about the potential size of clustering effects due to census tracts, internal data were used to characterize the distribution of cancer cases per census tract: census tracts were ranked by number of cases and divided into quartiles from smallest to largest. In general, clustering effects are smaller when the number of cases per census tract is small. For all cancer sites combined, the average number of cases per census tract per year is 9, 17, 25, and 39 in the four quartiles of census tracts, from smallest to largest. This distribution is almost identical across all diagnosis years from 2006 to 2020. For an individual cancer site, the number of cases per census tract is likely much smaller.
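One common back-of-the-envelope way to gauge how much within-tract clustering might inflate variances (not a method described by SEER) is the Kish design effect for equal-size clusters, DEFF = 1 + (m - 1) * ICC, where m is the average number of cases per cluster and ICC is the intraclass correlation. The cluster sizes below are the per-year quartile averages quoted above; the ICC of 0.01 is a hypothetical value chosen only to show the trend.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from clustering, 1 + (m - 1) * icc."""
    if mean_cluster_size < 1:
        raise ValueError("mean_cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


if __name__ == "__main__":
    # Average cases per tract per year, by quartile (from the text above).
    for m in (9, 17, 25, 39):
        # icc = 0.01 is a hypothetical value used only to illustrate the trend.
        print(m, round(design_effect(m, icc=0.01), 2))
```

Consistent with the text, the inflation shrinks as the number of cases per tract shrinks. Pooling several diagnosis years increases m, and therefore the potential clustering effect, while restricting to a single cancer site reduces it.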

Census Tract Certainty

A complete address is not always available to central cancer registries for geocoding the census tract of residence. Some census tracts are geocoded from partial address information and therefore may not be of high quality. For example, five-digit ZIP Codes typically cover large areas, making them less useful for geocoding census tracts accurately.

To help users assess how analyses are affected by including cancer cases with low-quality census tracts, or cases diagnosed in registries that have relatively large shares of cases with low-quality census tracts, this specialized database includes two census tract quality variables:

"Census Tract Certainty Recode Specialized" variable with the following categories:
1. High certainty
2. Based on residence ZIP+2
3. Based on residence ZIP only
4. Based on ZIP of post office box
5. Unknown tract
"Registry Groupings based on Census Tract Completeness" variable with the following categories:
1. High (i.e., registry group with ≥97% of high certainty cases)
2. Medium (i.e., registry group with 95% - 96.9% of high certainty cases)
3. Low (i.e., registry group with <95% of high certainty cases)
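The registry grouping thresholds can be expressed as a small function. The database already assigns each case to a registry group, so this is only an illustration of the cut points. It assumes the percentage is computed as the share of cases coded "High certainty" (category 1 of the certainty recode) and treats values between 96.9% and 97% as Medium; both the function names and that boundary handling are assumptions.

```python
from typing import Iterable


def pct_high_certainty(certainty_codes: Iterable[int]) -> float:
    """Percent of cases with Census Tract Certainty Recode category 1 (High certainty)."""
    codes = list(certainty_codes)
    if not codes:
        raise ValueError("no cases supplied")
    return 100.0 * sum(code == 1 for code in codes) / len(codes)


def registry_completeness_group(pct_high: float) -> str:
    """Map a registry group's percent of high-certainty cases to High/Medium/Low."""
    if not 0 <= pct_high <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_high >= 97:
        return "High"
    if pct_high >= 95:
        return "Medium"
    return "Low"
```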

We strongly recommend that users acknowledge and discuss this quality limitation of census tract attributes in any publications and use sensitivity analyses to evaluate its impact on the results.
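A simple sensitivity analysis re-runs the same estimate on progressively stricter subsets of cases and compares the results. The sketch below assumes case records exported from SEER*Stat with three fields; the field names, the `Case` class, and the example estimator (share of cases in the lowest SES quintile) are hypothetical stand-ins for whatever statistic an analysis actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    ses_quintile: int     # 1 = lowest SES ... 5 = highest SES
    tract_certainty: int  # 1 = High certainty ... 5 = Unknown tract
    registry_group: str   # "High", "Medium", or "Low" tract completeness


def sensitivity_subsets(cases: List[Case]) -> Dict[str, List[Case]]:
    """Full cohort plus two stricter subsets based on the quality variables."""
    return {
        "all cases": cases,
        "high-certainty tracts only": [c for c in cases if c.tract_certainty == 1],
        "high-completeness registries only": [c for c in cases if c.registry_group == "High"],
    }


def share_lowest_ses(cases: List[Case]) -> float:
    """Example estimator: proportion of cases in SES quintile 1."""
    if not cases:
        return float("nan")
    return sum(c.ses_quintile == 1 for c in cases) / len(cases)


def run_sensitivity(
    cases: List[Case], estimator: Callable[[List[Case]], float]
) -> Dict[str, float]:
    return {name: estimator(subset) for name, subset in sensitivity_subsets(cases).items()}
```

Large differences between the full-cohort estimate and the restricted estimates suggest that census tract quality may be influencing the results.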

References

Moss JL, Stinchcomb DG, Yu M. Providing higher resolution indicators of rurality in the Surveillance, Epidemiology, and End Results (SEER) database: Implications for patient privacy and research. Cancer Epidemiol Biomarkers Prev. 2019 Sep;28(9):1409-1416. Epub 2019 Jun 14. [PMID: 31201223]

Yu M, Tatalovich Z, Gibson JT, Cronin KA. Using a composite index of socioeconomic status to investigate health disparities while protecting the confidentiality of cancer registry data. Cancer Causes Control. 2014 Jan;25(1):81-92. [PMID: 24178398]

Yost K, Perkins C, Cohen R, Morris C, Wright W. Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes Control. 2001 Oct;12(8):703-11. [PMID: 11562110]

Liu L, Deapen D, Bernstein L. Socioeconomic status and cancers of the female breast and reproductive organs: a comparison across racial/ethnic populations in Los Angeles County, California (United States). Cancer Causes Control. 1998 Aug;9(4):369-80. [PMID: 9794168]