SEER*Stat Prevalence Exercise 1: Limited-Duration Prevalence

Cancer prevalence is defined as the number or percent of people alive on a certain date in a population who previously had a diagnosis of the disease. It includes new (incidence) and pre-existing cases, and is a function of both past incidence and survival. SEER*Stat can be used to calculate limited-duration prevalence statistics.
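
As a rough illustration of the definition above, a crude prevalence percent can be read as the prevalent count divided by the population on the prevalence date, times 100. The sketch below uses a hypothetical function name and made-up numbers; it is not SEER*Stat code or SEER data.

```python
def crude_prevalence_percent(prevalent_count: float, population: float) -> float:
    """Return prevalent cases as a percent of the population on the prevalence date."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * prevalent_count / population


# Illustrative numbers only, not SEER data.
example_percent = crude_prevalence_percent(prevalent_count=2_500, population=100_000)
print(f"{example_percent:.2f}%")
```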

Create a table showing the number of people in the SEER 9 Registries who were diagnosed with a malignant cancer in the 39-year period prior to January 1, 2014. That is, show "January 1, 2014, 39-year limited-duration prevalence" (crude percents and counts) for the SEER 9 Registries. Include only the first malignant primary for each person. Show the results by sex and age at prevalence date.

Key Points

  • Starting with the November 2016 data submission, the default multiple primary selection in SEER*Stat is "all tumors matching the selection criteria".
  • Use the "first malignant primary only" option to include a tumor only if it matches the selection criteria and is the first malignant tumor for the individual. The SEER registries collect the number (but not the site or behavior) of cancers that occur prior to the start of the registry, or prior to the person moving to a SEER catchment area. SEER assumes that these cancers are malignant, as is true of the majority of SEER cancers. Thus, if the first SEER-registered tumor is coded as the person's second or later tumor (any others were non-SEER cancers), this person's cancers are excluded when this option is used.
  • The prevalence estimates calculated in this exercise are SEER prevalence estimates, not U.S. estimates. SEER*Stat does not project U.S. prevalence from SEER data.
  • The maximum recommended Prevalence Duration is 39 years (1975-2013 diagnoses) for a prevalence date of January 1, 2014 using the SEER 9 database. This is recommended due to inconsistent start dates among the registries. Seattle and Atlanta data are from 1974 and 1975 forward, respectively.
  • Starting with the November 2013 data submission, SEER provides several "system-supplied" variables for use as survival cohort variables in prevalence sessions. These are provided as a convenience for users who frequently created user-defined variables in order to remove any overlapping groupings from standard variable definitions (e.g., removing the All Races from Race Recode). The system-supplied variables are provided for Age, Race, Sex, Year of Diagnosis, and Site Recode.
  • Age at prevalence date (calculated) is called a "calculated variable" because it is determined during execution and is not coded in the database. This value is calculated based on the selected prevalence date and either date of birth (if available) or age and date at diagnosis.
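
The age calculation described in the last key point can be sketched as follows. The function name, its parameters, and the exact arithmetic are assumptions made for illustration; the tutorial states only which inputs are used (prevalence date plus either date of birth or age and date at diagnosis), not how SEER*Stat combines them.

```python
from datetime import date
from typing import Optional


def age_at_prevalence(
    prevalence_date: date,
    birth_date: Optional[date] = None,
    age_at_dx: Optional[int] = None,
    dx_date: Optional[date] = None,
) -> int:
    """Return age in completed years on the prevalence date."""
    if birth_date is not None:
        # Preferred path: use the date of birth when it is available.
        years = prevalence_date.year - birth_date.year
        if (prevalence_date.month, prevalence_date.day) < (birth_date.month, birth_date.day):
            years -= 1  # birthday not yet reached in the prevalence year
        return years
    if age_at_dx is None or dx_date is None:
        raise ValueError("need birth_date, or both age_at_dx and dx_date")
    # Fallback (assumed rule): add the completed years elapsed since diagnosis.
    elapsed = prevalence_date.year - dx_date.year
    if (prevalence_date.month, prevalence_date.day) < (dx_date.month, dx_date.day):
        elapsed -= 1
    return age_at_dx + elapsed


print(age_at_prevalence(date(2014, 1, 1), birth_date=date(1950, 6, 15)))
print(age_at_prevalence(date(2014, 1, 1), age_at_dx=60, dx_date=date(2000, 3, 1)))
```

The fallback is only approximate, since age at diagnosis does not say where in the year the birthday falls.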

Step 1:  Create a New Prevalence Session

  • Start SEER*Stat.
  • From the File menu select New > Limited-Duration Prevalence Session or use the Prevalence icon on the toolbar.

Step 2:  Data Tab

  • On the Data Tab select "Incidence - SEER 9 Regs Research Data, Nov 2016 Sub (1973-2014) <Katrina/Rita Population Adjustment>"

Step 3:  Understanding the Selection Tab in a Prevalence Session

As discussed in other tutorials, the statements on the Selection Tab define the subset used in your analysis. However, you will notice that in a limited-duration prevalence session, these statements are separated into three boxes. Each box contains a separate set of variables. No variable is repeated among the boxes.

  • The Age At Prevalence Date box allows you to make selections based on one variable, the "Age at Prevalence Date (Calculated)" variable. The groupings defined in this variable are the same as the database's age recode variable. It is called a calculated variable because the age at prevalence is not coded in the database. This value is calculated based on the selected prevalence date and either date of birth (if available) or age and date at diagnosis.
  • The Race, Sex, Registry, County box allows you to make selections based on race, sex, and geographic variables. These are the variables in the population and case files but not in the standard population file. Date variables are not available in this box since prevalence percents require the population on the prevalence date, rather than the date of diagnosis. Therefore, the Prevalence Date specified on the Statistic tab is used for selecting the appropriate populations based on date.
  • The Other (Case Files) box must be used to select records based on variables that are found only in the case data. This would include cancer-specific variables such as stage at diagnosis, histology, site, etc. Be aware that there are demographic variables in case data files such as race and age that are not included in the population data. SEER*Stat will give an error message, and not allow you to continue, if you try to execute a session calculating percents with any of these demographic variables.

Step 4:  Selection Tab Settings

  • In this exercise, we want to select malignant cases only, but since we want the first malignant primary, this selection is handled by the Multiple Primary Selection option at the bottom of the screen. Because of the Multiple Primary Selection option used in this exercise, the Malignant Only standard checkbox will have no impact on this analysis; however, we will leave it checked.
  • The Exclude Death Certificate Only and Autopsy Only cases option is always checked in Limited-Duration Prevalence sessions. Because SEER*Stat uses the counting method, these cases are never considered prevalent. This option prevents them from being included in the survival calculations.
  • Set the Multiple Primary Selection to "First Malignant Primary Only (Non-reported Assumed Malignant)".
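
The selection rule behind this option, as described in the Key Points, can be sketched roughly as follows. The `Tumor` record and its `lifetime_order` field are hypothetical simplifications invented for this example; they do not reproduce the actual SEER sequence-number coding or SEER*Stat's internal logic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tumor:
    person_id: str
    lifetime_order: int      # hypothetical: 1 = person's first tumor ever, 2 = second, ...
    malignant: bool
    matches_selection: bool  # whether the tumor meets the session's selection criteria


def first_malignant_primaries(tumors: List[Tumor]) -> List[Tumor]:
    """Keep a tumor only if it is the person's first malignant tumor and matches the selection."""
    by_person: Dict[str, List[Tumor]] = {}
    for tumor in tumors:
        by_person.setdefault(tumor.person_id, []).append(tumor)

    selected = []
    for records in by_person.values():
        records.sort(key=lambda t: t.lifetime_order)
        first_malignant = next((t for t in records if t.malignant), None)
        if first_malignant is None:
            continue
        # Earlier tumors missing from the registry are assumed malignant,
        # so a gap before this tumor excludes the person.
        earlier_registered = sum(
            1 for t in records if t.lifetime_order < first_malignant.lifetime_order
        )
        if earlier_registered < first_malignant.lifetime_order - 1:
            continue
        if first_malignant.matches_selection:
            selected.append(first_malignant)
    return selected


sample = [
    Tumor("A", 1, True, True),    # first tumor, malignant, matches: kept
    Tumor("B", 2, True, True),    # first tumor was not registered: person excluded
]
print([t.person_id for t in first_malignant_primaries(sample)])
```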

Step 5:  Statistic Tab

The Problem Statement specifies that we want a table showing the number of people diagnosed in the 39-year period prior to January 1, 2014.

  • On the Statistic Tab, set the Prevalence Date to "January 1 2014", then set Limit Prevalence Duration to "39 years".
  • Select "Crude Percent" in the Statistic box. Counts will be included in the results.
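
To see which diagnosis dates these two settings cover, the window can be worked out with a short sketch. The helper name `diagnosis_window` is made up for illustration; the result matches the 1975-2013 diagnosis years given in the Key Points for a 39-year duration and a January 1, 2014 prevalence date.

```python
from datetime import date, timedelta
from typing import Tuple


def diagnosis_window(prevalence_date: date, duration_years: int) -> Tuple[date, date]:
    """Return the first and last diagnosis dates covered by a limited-duration window."""
    # Note: date.replace would fail for a February 29 prevalence date; January 1 is safe.
    start = prevalence_date.replace(year=prevalence_date.year - duration_years)
    end = prevalence_date - timedelta(days=1)  # the day before the prevalence date
    return start, end


start, end = diagnosis_window(date(2014, 1, 1), 39)
print(start.isoformat(), "to", end.isoformat())
```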

Step 6:  Table Tab

  • Set "Sex" as a Page variable.
  • Set "Age at Prevalence Date (Calculated)" as a Row variable.

Step 7:  Survival Cohorts Tab

The expected number of cases lost to follow-up who make it to the prevalence date is computed using conditional survival curves for specified cohorts. When you define the survival cohorts, each lost case must fit into one and only one cohort; therefore, variables with overlapping groupings are not allowed and the cohorts must include all records in the analysis.
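
The idea in the paragraph above can be illustrated with a small sketch. It assumes a prevalent count equal to the cases known to be alive on the prevalence date plus, for each case lost to follow-up, the conditional probability of surviving from the date last seen to the prevalence date. The function name, the single shared survival curve, and the toy numbers are all assumptions; in a real session each lost case would use the curve for its own cohort.

```python
from typing import Callable, Iterable, Tuple


def expected_prevalent_count(
    known_alive: int,
    lost_cases: Iterable[Tuple[float, float]],
    survival: Callable[[float], float],
) -> float:
    """Add the expected number of surviving lost cases to the known-alive count.

    Each lost case is (years_followed, years_to_prevalence), both measured
    from diagnosis. survival(t) is the probability of surviving t years.
    """
    expected_from_lost = 0.0
    for years_followed, years_to_prevalence in lost_cases:
        # Conditional survival: P(alive at prevalence date | alive when last seen).
        expected_from_lost += survival(years_to_prevalence) / survival(years_followed)
    return known_alive + expected_from_lost


# Toy survival curve (10% annual mortality), purely illustrative.
toy_survival = lambda t: 0.9 ** t
print(expected_prevalent_count(100, [(2.0, 5.0)], toy_survival))
```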

Add the following system-supplied variables to the Cohort Variables box:

  • Age recode (<60, 60-69, 70+)
    • This is based on the age at diagnosis recode variable (Age recode with < 1 year olds).
  • Race recode (White, Black, Other, Unknown) - no total
  • Sex (no total)
  • Site rec ICD-O-3/WHO 2008 (individual sites only)
  • Year of diagnosis (75-78,79-13 by 5)
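
The requirement that each case fit into one and only one cohort can be checked mechanically. The sketch below builds year-of-diagnosis groups from one reading of the label "75-78,79-13 by 5" (1975-1978, then five-year groups from 1979 through 2013) and verifies that they cover 1975-2013 without overlap. The function names and this reading of the label are assumptions, not SEER*Stat definitions.

```python
from typing import List, Tuple

YearGroup = Tuple[int, int]  # inclusive (first_year, last_year)


def year_of_diagnosis_groups() -> List[YearGroup]:
    """Build 1975-1978 plus five-year groups covering 1979-2013."""
    groups = [(1975, 1978)]
    groups += [(start, start + 4) for start in range(1979, 2014, 5)]
    return groups


def is_partition(groups: List[YearGroup], first_year: int, last_year: int) -> bool:
    """True if every year in the range falls into exactly one group."""
    return all(
        sum(lo <= year <= hi for lo, hi in groups) == 1
        for year in range(first_year, last_year + 1)
    )


groups = year_of_diagnosis_groups()
print(groups)
print(is_partition(groups, 1975, 2013))
```

A variable that also carried a total grouping (for example, all years combined) would fail this check, which is why the system-supplied variables omit totals.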

Step 8:  Output Tab

  • Enter the following title on the Output Tab:
    SEER 9, 39-year Limited Duration Prevalence Estimates
    First Malignant Primary Only
    By Sex and Prevalence Age
    Limited-Duration Prevalence Exercise 1
  • Check the box to Display All Calculated Statistics In Output Matrix

Learn More...

  • There are seven statistics types available in a limited-duration prevalence session. By default, only two are shown in the output matrix unless the option to display all is selected. If you click on the Set Default button when the check box is checked, the default when you run any prevalence session will be to display all calculated statistics.

Step 9:  Execute SEER*Stat

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes a new window will open containing the output table or matrix.

Step 10:  Check your Results

Compare your results to this SEER*Stat matrix file: key.prev1.spm. All prevalence statistics types are displayed based on the selection you made on the Output Tab. Use the Matrix Options to hide any of the statistics.