SEER Race Recoding Update with the November 2024 Data
Starting with the November 2024 data submission, a new enhanced Race recode featuring detailed Asian and Native Hawaiian other Pacific Islander (NHoPI) categories replaced the previously released Race/ethnicity variable. It serves as the foundational source for all released race recodes and race and origin recodes. The new Race recode incorporates two significant changes. The first is improved data quality through a series of data edit and augmentation procedures. The second is a reduced number of detailed categories, with some Asian or Pacific Islander race categories collapsed to prevent disclosure.
The following steps are used to create this Race recode:
The first step is to assign a primary race for the patient. As in the past, this is done with two sub-steps:
- Use Race 1 (NAACCR Item #160) and Race 2 (NAACCR Item #161) to create a field named Race/ethnicity. To do this, SEER race priority order is enforced for the two fields and the higher priority race is assigned.
- After the first sub-step, if Race/ethnicity is white (1), other (98), or unknown (99), and there is a positive Indian Health Service (IHS) Linkage, Race/ethnicity is set to American Indian/Alaska Native (3).
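The two sub-steps above can be sketched roughly as follows. This is only an illustration, not SEER's implementation: the function name, its arguments, and the way the priority order is passed in are assumptions. The actual SEER race priority order is not reproduced here, so the caller must supply it.

```python
from typing import Optional, Sequence

# Codes taken from the text above.
WHITE, AIAN, OTHER, UNKNOWN = 1, 3, 98, 99
IHS_OVERRIDABLE = {WHITE, OTHER, UNKNOWN}


def assign_primary_race(
    race1: int,
    race2: Optional[int],
    ihs_link_positive: bool,
    priority: Sequence[int],
) -> int:
    """Hypothetical helper: pick the higher-priority race, then apply the IHS rule.

    `priority` lists race codes from highest to lowest priority. It is an
    assumed input; the real SEER priority order is not defined in this sketch.
    """
    candidates = [r for r in (race1, race2) if r is not None]
    rank = {code: i for i, code in enumerate(priority)}
    # Codes missing from the priority list sort last; ties keep Race 1.
    race = min(candidates, key=lambda r: rank.get(r, len(priority)))
    # A positive IHS linkage overrides white, other, or unknown.
    if ihs_link_positive and race in IHS_OVERRIDABLE:
        return AIAN
    return race
```

For example, with a placeholder priority of `[3, 1, 98, 99]`, a patient coded white in Race 1 with a positive IHS linkage would be assigned code 3, while a race code outside the override set would be left unchanged.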
The second step is to enhance the precision of Race/ethnicity by reassigning Asian or NHoPI patients who were initially categorized under an unspecified race category, i.e., Not Otherwise Specified (NOS), to a more specific race category. Specifically, Birthplace-country (NAACCR Item #254) and Birthplace-state (NAACCR Item #252) are used to apply the Race NAACCR Asian Pacific Islander Identification Algorithm (NAPIIA) (NAACCR Item #193) (PDF). If Race/ethnicity is Other Asian (95) or Pacific Islander, NOS (96), and the NAPIIA race is coded as a specific Asian or NHoPI race, respectively, Race/ethnicity is assigned that specific race. In addition, two places of birth that are being considered for future updates to the NAPIIA algorithm are used to assign a specific NHoPI race.
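The reassignment rule in this step might be expressed as in the sketch below. It assumes the NAPIIA race has already been derived and that the caller supplies the sets of codes that count as "specific" Asian or NHoPI races. Those sets, the function name, and the parameter names are hypothetical. The two additional birthplace rules are not modeled because the text does not name them.

```python
from typing import Collection

# Codes taken from the text above.
OTHER_ASIAN = 95
PACIFIC_ISLANDER_NOS = 96


def refine_nos_race(
    race_ethnicity: int,
    napiia_race: int,
    specific_asian: Collection[int],
    specific_nhopi: Collection[int],
) -> int:
    """Hypothetical helper: replace an NOS category with the NAPIIA race.

    `specific_asian` and `specific_nhopi` are assumed inputs listing the
    codes treated as specific Asian and specific NHoPI races.
    """
    # Other Asian is refined only by a specific Asian NAPIIA race.
    if race_ethnicity == OTHER_ASIAN and napiia_race in specific_asian:
        return napiia_race
    # Pacific Islander, NOS is refined only by a specific NHoPI NAPIIA race.
    if race_ethnicity == PACIFIC_ISLANDER_NOS and napiia_race in specific_nhopi:
        return napiia_race
    # Everything else is left unchanged.
    return race_ethnicity
```

Keeping the two checks separate mirrors the "respectively" in the rule: an Other Asian case is never reassigned to an NHoPI race, and vice versa.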
The third step is to combine some of the less populous Asian races with Other Asian, and some of the less populous NHoPI races with Pacific Islander, NOS due to patient confidentiality concerns.
This new field, named Race recode (with detailed Asian and Native Hawaiian other PI), has the following race groupings:
- White
- Black
- American Indian/Alaska Native
- Asian American and Native Hawaiian other Pacific Islander
  - Asian American
    - Chinese
    - Japanese
    - Filipino
    - Korean
    - Asian Indian, Pakistani
    - Vietnamese
    - Laotian
    - Kampuchean
    - Other Asian American
  - Native Hawaiian other Pacific Islander
    - Hawaiian
    - Guamanian/Chamorro
    - Samoan
    - Other Pacific Islander
- Other (assumed Asian and NHoPI, NOS)
- Asian American
- Unknown Race
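As an illustration of the third step, the collapsing of less populous categories could look like the sketch below. The retained labels come from the groupings listed above. The function name, the `family` argument, and the example input labels are assumptions made for this sketch and do not describe SEER's actual procedure.

```python
# Detailed categories that remain separate in the groupings listed above.
RETAINED_ASIAN = frozenset({
    "Chinese", "Japanese", "Filipino", "Korean",
    "Asian Indian, Pakistani", "Vietnamese", "Laotian", "Kampuchean",
})
RETAINED_NHOPI = frozenset({"Hawaiian", "Guamanian/Chamorro", "Samoan"})


def collapse_detailed_race(label: str, family: str) -> str:
    """Hypothetical helper: fold non-retained detailed races into an 'Other' group.

    `family` is an assumed argument: "asian", "nhopi", or anything else for
    races that are not collapsed.
    """
    if family == "asian":
        return label if label in RETAINED_ASIAN else "Other Asian American"
    if family == "nhopi":
        return label if label in RETAINED_NHOPI else "Other Pacific Islander"
    return label
```

Under these assumptions, a detailed Asian label that is not in the retained set (for instance, "Thai") would be reported as "Other Asian American", while "Samoan" would remain its own category.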
Race/Ethnicity Groupings in SEER Reporting
In previous years, statistics for race included Hispanics, except for the category of Non-Hispanic White. Starting with the November 2021 data submission, released in April 2022, race and ethnicity are reported in five mutually exclusive categories:
- Non-Hispanic White
- Non-Hispanic Black
- Non-Hispanic Asian/Pacific Islander (API)
- Non-Hispanic American Indian/Alaska Native (AI/AN)
- Hispanic
This change in reporting resulted in a small change for most groups but a larger increase in rates for AI/AN. The larger increase in rates for Non-Hispanic AI/AN reflects the removal of Hispanic AI/AN, a group that had very low incidence rates. These low rates for Hispanic AI/AN may be influenced by several factors, including how missing race is assigned by the Census and misclassification of race in the cancer data, resulting in some level of uncertainty in Hispanic by race population estimates. Reporting incidence in the five mutually exclusive categories is consistent with mortality reporting from the National Center for Health Statistics (NCHS) and presents a clearer picture of risk in the AI/AN population. SEER does not recommend producing rates for Hispanic AI/AN or Hispanic API.
Spanish-Hispanic-Latino Ethnicity
Incidence data for Hispanics are based on the NAACCR Hispanic Identification Algorithm (NHIA) (PDF). SEER no longer excludes cases from the Alaska Native Tumor Registry when producing Hispanic and Non-Hispanic incidence rates.
For state exclusions that SEER uses when producing Hispanic (and Non-Hispanic) mortality rates, see Policy for Calculating Hispanic Mortality.
American Indian/Alaska Native Statistics
When producing statistics using SEER incidence data for American Indians/Alaska Natives, SEER frequently includes only cases that are in a Purchased/Referred Care Delivery Area (PRCDA). We also recommend limiting these statistics to Non-Hispanic cases.
In SEER incidence and NCHS mortality databases, the PRCDA 2020 variable is used starting with data through 2020. Refer to the information on Purchased/Referred Care Delivery Area (PRCDA) for variables used in previous submissions of data.
Race/Ethnicity Variable Definitions in SEER Data
Race and origin (recommended by SEER)
SEER includes a system-supplied merged variable, "Race and origin (recommended by SEER)". It includes the five mutually exclusive race and ethnicity categories SEER uses for reporting cancer statistics.
Algorithms for Creating Variable Definitions
The following describes the algorithms for creating the race and origin recode variables in the SEER incidence and U.S. mortality data.
Race Recode
- We recoded detailed race information into four major categories to make it compatible with the available annual population estimates used as denominators for the rates: White, Black, American Indian/Alaska Native, and Asian or Pacific Islander.
- For some years, both the SEER incidence and NCHS mortality data have had a code available for “all other races”, when in fact every race was already represented, and therefore the “all other races” code was not needed. These cases are now coded as "unknown" race.
- The race recodes in the SEER incidence data are created from the Race1 and Indian Health Service (IHS) Link variables. If Race1 is white, unknown, or other and the IHS Link is positive, then race/ethnicity is set to American Indian/Alaska Native; otherwise, race/ethnicity is set to the Race1 value.
Origin Recode
Incidence data for Hispanics are based on the NAACCR Hispanic Identification Algorithm (NHIA) (PDF) and are recoded into two main categories for the Origin Recode NHIA variable: Non-Spanish-Hispanic-Latino and Spanish-Hispanic-Latino.
Race and Origin Recode
From the two fields above, SEER provides the Race and Origin Recode variable with the following values:
- Non-Hispanic White
- Non-Hispanic Black
- Non-Hispanic American Indian/Alaska Native
- Non-Hispanic Asian or Pacific Islander
- Hispanic (All races)
- Non-Hispanic Unknown Race
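A minimal sketch of how the Race Recode and Origin Recode might be combined into the values listed above. The function name, the string inputs, and the boolean Hispanic flag are assumptions for illustration; the actual SEER variables are coded fields.

```python
# Maps the four major race recode categories to their Non-Hispanic labels.
NON_HISPANIC_LABELS = {
    "White": "Non-Hispanic White",
    "Black": "Non-Hispanic Black",
    "American Indian/Alaska Native": "Non-Hispanic American Indian/Alaska Native",
    "Asian or Pacific Islander": "Non-Hispanic Asian or Pacific Islander",
}


def race_and_origin_recode(race_recode: str, is_hispanic: bool) -> str:
    """Hypothetical helper: merge race and Hispanic origin into one category."""
    # Hispanic origin takes precedence over race, which keeps the
    # categories mutually exclusive.
    if is_hispanic:
        return "Hispanic (All races)"
    # Any race outside the four major categories is treated as unknown.
    return NON_HISPANIC_LABELS.get(race_recode, "Non-Hispanic Unknown Race")
```

Checking Hispanic origin first means that a Hispanic patient of any race, including unknown race, falls into the single Hispanic category.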