
Considerations for Survival Cohort Definitions


When calculating limited-duration prevalence, SEER*Stat uses survival tables to adjust for cases lost to follow-up ("lost cases"). Consider every variable in the data that could affect the survival of the lost cases. As a general rule, define survival cohorts by age at diagnosis, year of diagnosis, sex, race, and cancer site. It is also strongly recommended to include any variable that is used as a display variable or used directly in the calculation of the requested statistics.

Many databases contain multiple variables that are related to one another. As a result, SEER*Stat may issue a warning message even when the variable is effectively represented in the survival cohort definition. For example, if you are calculating prevalence percentages by race, you must use the race variable that is also in the population data, e.g., "Race recode A" (All races, white, black, other, unknown). However, you may want to perform survival calculations by a more detailed race variable such as "Race/ethnicity", which includes more specific racial groups. In this case, use "Race recode A" as a table variable and "Race/ethnicity" in the survival cohort definition. SEER*Stat will not recognize that the table variable (Race recode A) holds the same type of data as the variable in the survival cohort definition (Race/ethnicity), and will issue a warning message. In this situation, you can ignore the warning.

The three situations that cause SEER*Stat to issue the warning are:

  • When calculating age-adjusted percents, SEER*Stat will warn you if the age at diagnosis variable is not included in the survival cohort definition (age at prevalence cannot be used as a survival cohort variable). Only the age at diagnosis variable that is linked to the population and standard population data will be recognized as the appropriate variable. Other age at diagnosis variables would be appropriate for defining the survival cohorts, but would not be recognized, and would cause SEER*Stat to issue the warning.
  • When using the Display by Time Prior to Prevalence Date option on the Statistic tab, SEER*Stat will warn you if year of diagnosis is not included in the survival cohort definition.
  • SEER*Stat will issue the warning when any variable included on the Table tab is not included in the survival cohort definition. If using age at prevalence on the Table tab, account for it by using the age at diagnosis variable in the survival cohort definition.
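
The three checks above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not SEER*Stat's actual implementation: the function name, the variable-name constants, and the warning strings are all hypothetical, and the handling of age at prevalence (treated as covered by the age at diagnosis variable) is an assumption based on the third bullet.

```python
# Hypothetical variable names; real SEER*Stat databases use their own labels.
LINKED_AGE_DX = "Age at diagnosis (linked to population)"
YEAR_DX = "Year of diagnosis"
AGE_AT_PREVALENCE = "Age at prevalence"


def cohort_warnings(table_vars, cohort_vars, age_adjusted=False,
                    by_time_prior=False):
    """Return warning strings for the three situations listed above."""
    cohort = set(cohort_vars)
    warnings = []

    # 1. Age-adjusted percents need the linked age at diagnosis variable.
    if age_adjusted and LINKED_AGE_DX not in cohort:
        warnings.append("age-adjusted: linked age at diagnosis variable missing")

    # 2. Display by Time Prior to Prevalence Date needs year of diagnosis.
    if by_time_prior and YEAR_DX not in cohort:
        warnings.append("time prior to prevalence date: year of diagnosis missing")

    # 3. Every table variable must appear in the cohort definition.
    #    Assumption: age at prevalence is accounted for by age at diagnosis.
    for var in table_vars:
        needed = LINKED_AGE_DX if var == AGE_AT_PREVALENCE else var
        if needed not in cohort:
            warnings.append(f"table variable not in cohort definition: {var}")

    return warnings


# The race example: the check is by variable name, so it fires even though
# the more detailed "Race/ethnicity" variable covers "Race recode A".
print(cohort_warnings(
    table_vars=["Race recode A"],
    cohort_vars=["Race/ethnicity", LINKED_AGE_DX, YEAR_DX],
    age_adjusted=True,
    by_time_prior=True,
))
```

Because the comparison in this sketch is purely by name, it reproduces the behavior described earlier: a related but differently named variable in the cohort definition still triggers the warning, which can then be ignored.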

SEER*Stat issues warnings in these situations because failure to consider a display or calculation variable in the definition of survival cohorts may cause large inaccuracies. Other situations may cause inaccuracies as well, but generally have a lesser impact.

Consider cancer site as an example. When the Surveillance Research Program calculates prevalence, survival cohorts are always defined by individual site recode values. When results are displayed only for all cancer sites combined, including site in the cohort definition has only a minor effect on the prevalence results. This is because some lost cases would be assigned a survival that is too low (e.g. breast cancer, which has a higher observed survival than all cancer sites combined), while others would be assigned a survival that is too high (e.g. liver cancer, which has a lower observed survival than all cancer sites combined). In general, these inaccuracies cancel one another out, unless the likelihood of being lost to follow-up is related to the variable. However, if you display the prevalence statistics by cancer site, the effect of not using site-specific survival calculations could be much larger. For example, breast cancer estimates would be lower than they should be, since the survival used was worse than breast cancer survival, while liver cancer estimates would be higher than they should be.
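
The cancellation argument can be checked with a small numeric sketch. All numbers below are made up for illustration, and the model is deliberately simplified: each site gets a single survival probability, whereas the real calculation uses survival tables over time. The function names are hypothetical.

```python
# Hypothetical inputs, chosen only to illustrate the argument.
survival = {"breast": 0.90, "liver": 0.20}   # P(alive at the prevalence date)
diagnosed = {"breast": 600, "liver": 400}    # all cases, used for pooling


def pooled_survival(survival, diagnosed):
    """Case-weighted survival for all sites combined."""
    total = sum(diagnosed.values())
    return sum(survival[s] * diagnosed[s] for s in diagnosed) / total


def expected_alive(lost, survival, pooled=None):
    """Expected number of lost cases still alive, per site.

    Uses site-specific survival unless a pooled value is supplied.
    """
    return {s: n * (survival[s] if pooled is None else pooled)
            for s, n in lost.items()}


pooled = pooled_survival(survival, diagnosed)

# Loss to follow-up unrelated to site (10% of each site's cases) ...
lost_unrelated = {"breast": 60, "liver": 40}
# ... versus loss to follow-up concentrated in one site.
lost_related = {"breast": 20, "liver": 80}

for label, lost in [("unrelated", lost_unrelated), ("related", lost_related)]:
    by_site = expected_alive(lost, survival)
    by_pool = expected_alive(lost, survival, pooled)
    print(label)
    for s in lost:
        print(f"  {s}: site-specific {by_site[s]:.1f}, pooled {by_pool[s]:.1f}")
    print(f"  total: site-specific {sum(by_site.values()):.1f}, "
          f"pooled {sum(by_pool.values()):.1f}")
```

With these inputs, the all-sites total is the same under both approaches when loss to follow-up is unrelated to site, while the per-site figures differ: the pooled approach understates breast and overstates liver. When loss to follow-up is concentrated in one site, the totals no longer agree.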