SEER*Stat Frequency Exercise 2: Cancer Site and Customizing Results

The site recode variables are derived from the "Primary site" and "Histologic type ICD-O-3" variables after the registries submit the data to SEER (see Site Recode). These variables are added to the SEER data as a convenience and are used in most SEER publications to define the cancer of interest.

For example, each section of the SEER Cancer Statistics Review corresponds to a value of a site recode variable.

Create a table showing frequencies by primary site for cases with Site recode ICD-O-3/WHO 2008 = lung and bronchus. Include only malignant cases diagnosed in the SEER 18 registries from 2000 through 2011 and exclude cases with unknown age and those not in the Research database.

Note: Starting with the November 2012 data submission, SEER uses the site recode variable named "Site recode ICD-O-3/WHO 2008". In prior releases, the "Site rec with Kaposi and mesothelioma" variable was used.

Key Points and Reminders

  • This exercise introduces three variables related to cancer type in the SEER databases (primary site, site recode, histology).
  • In this exercise, you will calculate frequencies for cases diagnosed between 2000 and 2011, the only years for which all 18 registries have cases in the database.
  • You will set primary site as a display variable in order to create a table showing frequencies for the variable's individual values.
  • The Matrix menu provides a variety of options for modifying the layout of the output table. This exercise uses the Hide Zero Count Rows option to suppress the display of primary site codes not relevant to lung and bronchus cancer.

Step 1:  Create a new Frequency Session

  • Start SEER*Stat.
  • From the File menu select New > Frequency Session or use the Frequency button on the toolbar.

Step 2:  Select a Database (Data Tab)

  • On the Data Tab select "Incidence - SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2013 Sub (1973-2011 varying)".

Learn More...

Databases distributed with SEER*Stat use names designed to describe the data. The various parts of this exercise's database name indicate the following:
  • Incidence - The database contains cancer incidence data.
  • SEER 18 Regs - The database contains data for the "SEER 18" registries as defined in SEER Registry Groupings for Analyses.
  • Research Data, Nov 2013 Submission - This is the version of the database available to researchers outside of the SEER program. The data were submitted to the SEER program by the registries in November 2013.
  • + Hurricane Katrina Impacted Louisiana Cases - Hurricane Katrina had a major impact on Louisiana's population for the July - December 2005 time period. Louisiana cases diagnosed for that six-month time period have been excluded from the research database. These cases are provided with the data, but they are considered supplemental data. For more information, see Adjustments for Areas Impacted by Hurricanes Katrina and Rita.
  • (1973-2011 varying) - These are the years of diagnosis for the cases included in the database. They are considered "varying" because the years of diagnoses for cases vary per registry, depending on which year the registry joined the SEER Program and began contributing data.

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • Move to the Statistic Tab.
  • In the Statistic box, select Frequencies.

Step 4:  Define the Analysis Cohort (Selection Tab)

  • Move to the Selection Tab.
  • In this exercise, we want a frequency of malignant lung and bronchus cancer cases diagnosed from 2000 through 2011. We do not want to include all of the cancer sites included in the database, and we only want to include years for which all 18 registries have data. Therefore, we need a selection statement based on site, behavior, and year of diagnosis. The database selected on the Data Tab contains cases diagnosed in the SEER 18 registries; therefore selections based on registry are not necessary.
  • The Select Only box provides a shortcut for commonly used selections. It is very common to select only malignant behavior when analyzing cancer data. For this exercise, we want to select malignant cases in the research database and exclude those with unknown age, so make sure that the Malignant Behavior, Known Age, and Cases in Research Database options are checked.
  • Click Edit to open the Case Selection window.
  • Using the controls at the top of the window, you will create a selection statement. The variables are listed in categories in the Variable box on the top left of the screen.
  • In the Variable box, use the "+" to expand the "Site and Morphology" category.
  • Select "Site recode ICD-O-3/WHO 2008".
  • Moving to the center of the window, check to see that "is = to" is selected as the Operator.
  • Scroll through the items in the Values box until you find and select "Lung and Bronchus".
  • In the Variable box, use the "+" to expand the "Race, Sex, Year Dx, Registry, County" category. Select "Year of diagnosis".
  • Moving to the center of the window, check to see that "is = to" is selected as the Operator.
  • Select "2000" from the Values box, then extend the selection to include all the years from 2000 through 2011.
  • At this time, the following should appear in the Selection Statement box at the bottom of the window:
    {Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'
    AND {Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011'
  • Use the OK button to close the Case Selection window.
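
The logic of the selection statement above can be sketched outside SEER*Stat. The following is a minimal Python illustration, not SEER*Stat code: the record layout and the field names `site_recode` and `year_dx` are hypothetical, and the three sample records are made up. It shows that the values listed on one line are alternatives (any listed year qualifies), while separate lines are joined with AND.

```python
# Hypothetical records; field names and values are illustrative, not SEER data.
YEARS = {str(year) for year in range(2000, 2012)}  # '2000' through '2011'


def in_cohort(case):
    """Mirror the two-line selection statement: site recode AND year of diagnosis."""
    return (
        case["site_recode"] == "Lung and Bronchus"  # line 1: one value
        and case["year_dx"] in YEARS                # line 2: any listed year
    )


cases = [
    {"site_recode": "Lung and Bronchus", "year_dx": "2005"},  # kept
    {"site_recode": "Lung and Bronchus", "year_dx": "1999"},  # wrong year
    {"site_recode": "Breast", "year_dx": "2005"},             # wrong site
]

selected = [case for case in cases if in_cohort(case)]
print(len(selected))
```

Only the first sample record satisfies both lines of the statement.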

Learn More...

  • The problem statement asks for a table showing frequencies by primary site for malignant cases with site recode = lung and bronchus. It is possible to select lung and bronchus cases by creating a selection statement that uses primary site and histology. However, it is much easier to use one of the site recode variables if you do not want to include hematopoietic diseases (such as lymphomas and leukemias) in the table.
  • Each line of a selection statement defines selection criteria for one variable. When you have finished defining a line -- by choosing a variable, an operator, and one or more values -- highlighting a different variable in the Variable box will automatically begin a new line.

Step 5:  View the Primary Site Variable Groupings

  • Move to the Table Tab.
  • Use the "+" to expand the "Site and Morphology" category in the Available Variables box.
  • Select "Primary Site - labeled".
  • Open the dictionary editor to view the groupings for "Primary site". In a previous exercise we learned that the dictionary editor could be opened by:
    1. using the Dictionary button on the toolbar,
    2. selecting Dictionary from the File menu, or
    3. double-clicking on the "Primary site" variable.
  • The Dictionary window should now be open.
  • If it is not already selected, select the "Primary site - labeled" variable from the "Site and Morphology" category.
  • The Create button will be enabled when a variable is selected. Use the Create button to open the Edit Variable window to view the groupings and values associated with this variable.
  • The values of the "Primary Site - labeled" variable correspond to ICD-O-3 codes. There is one grouping for each value of the primary site variable.
  • Click Cancel to close the Edit Variable window without making any changes.
  • Click Close to close the dictionary.

Learn More...

  • Prior to the November 2007 data submission, only one primary site variable was available, and its values were unlabeled. If you use the dictionary to view the "Primary Site" variable (not "labeled"), you will see that it has only one grouping, which contains all possible values. The values are listed in a 3-digit format: the C and the decimal point in the ICD-O-3 codes are implied, so C34.0 is displayed as 340 in SEER*Stat. To use this variable as a display variable, it was necessary to create a user-defined variable with each value added as an individual grouping.
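
The 3-digit format described above can be illustrated with a short Python sketch. This is not part of SEER*Stat; the function names `to_seerstat_code` and `from_seerstat_code` are hypothetical helpers written only to show how the implied "C" and decimal point relate the two notations.

```python
def to_seerstat_code(icdo3_site):
    """Convert an ICD-O-3 site code such as 'C34.0' to the 3-digit form '340'."""
    code = icdo3_site.strip().upper()
    # Expect the pattern C##.# (letter C, two digits, decimal point, one digit).
    if len(code) != 5 or code[0] != "C" or code[3] != ".":
        raise ValueError(f"not an ICD-O-3 site code: {icdo3_site!r}")
    digits = code[1:3] + code[4]
    if not digits.isdigit():
        raise ValueError(f"not an ICD-O-3 site code: {icdo3_site!r}")
    return digits


def from_seerstat_code(value):
    """Convert a 3-digit value such as '340' back to 'C34.0'."""
    if len(value) != 3 or not value.isdigit():
        raise ValueError(f"not a 3-digit site value: {value!r}")
    return f"C{value[:2]}.{value[2]}"


print(to_seerstat_code("C34.0"))
print(from_seerstat_code("340"))
```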

Step 6:  Set the Row Variable

  • Use the "+" to expand the "Site and Morphology" category in the Available Variables box at the bottom of the Table Tab.
  • Select "Primary site - labeled".
  • Click Row on the right-hand side of the screen.
  • At this time, "Primary site - labeled" should be listed as a row variable in the Display Variables box at the top of the window.
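
Conceptually, setting a row variable asks for one count per value of that variable. The Python sketch below illustrates the idea with `collections.Counter`; the field name `primary_site` and the four sample records are hypothetical and do not come from the SEER data.

```python
from collections import Counter

# Made-up cases that have already passed the cohort selection.
cases = [
    {"primary_site": "C34.1"},
    {"primary_site": "C34.1"},
    {"primary_site": "C34.9"},
    {"primary_site": "C34.3"},
]

# One frequency per distinct value of the row variable.
frequencies = Counter(case["primary_site"] for case in cases)

for site, count in sorted(frequencies.items()):
    print(f"{site}\t{count}")
```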

Step 7:  Specify a Title (Output Tab)

  • Move to the Output Tab.
  • Enter the following title:
Primary Site Frequencies for Malignant Lung and Bronchus (Site recode ICD-O-3/WHO 2008)
SEER 18 Registries, 2000-2011
Frequency Exercise 2

Step 8:  Create Matrix and Hide Extraneous Rows

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a SEER*Stat matrix window will open containing the results.
  • The results matrix contains one line for each value of the primary site variable. The large number of rows makes it difficult to review the relevant frequencies. SEER*Stat has an option that allows you to suppress the display of the rows with a value of zero in each column.
  • Select Options from the Matrix menu.
  • In the Options box check Hide Zero Count Rows.
  • Click OK.
  • The size of the table will be reduced to six rows: one row for each of the primary site ICD-O-3 codes for lung and bronchus.
  • Compare your results to this SEER*Stat matrix file: Exercise Matrix 2 Results.
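
The effect of Hide Zero Count Rows can be sketched in a few lines of Python. This is only an illustration of the filtering rule described above (a row is hidden when every column in it is zero), not how SEER*Stat implements it. The function name `hide_zero_count_rows` is hypothetical, and the counts are made up rather than taken from the exercise results.

```python
# Each row label maps to its list of column counts (a single Count column here).
# The counts are invented for illustration only.
matrix = {
    "C33.9": [0],
    "C34.0": [12],
    "C34.1": [40],
    "C34.9": [25],
    "C50.9": [0],
}


def hide_zero_count_rows(rows):
    """Keep only the rows that have a nonzero count in at least one column."""
    return {
        label: counts
        for label, counts in rows.items()
        if any(count != 0 for count in counts)
    }


visible = hide_zero_count_rows(matrix)
for label, counts in visible.items():
    print(label, counts)
```

The underlying matrix is left unchanged; only the displayed rows are reduced.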

Learn More...

  • The Options window gives you the opportunity to correct typographical errors in the title. Corrections that you make to the title in the matrix options will appear in the matrix and printed output. However, sessions extracted from the matrix will retain the original, unedited title.
  • Use the Help button on the Options window to learn about the other matrix options.