In this exercise, you will define a complex selection statement to produce statistics by expanded race using the SEER Program's guidelines. You will also create selection statements using variables with unlabeled values.

Create a table showing frequencies and incidence rates (age-adjusted to the 2000 U.S. standard population) for malignant esophageal squamous cell carcinoma. Include only microscopically confirmed cases. Calculate these statistics for persons diagnosed from 1992 through 2018 in the SEER 13 Registries. Do not show statistics based on fewer than 16 cases.

Display the statistics by race, sex, and year of diagnosis. Show data for males and females separately but not combined. Use the following races: "White", "Black", "American Indian", "Asian or Pacific Islander". Include standard errors and confidence intervals in the table.

Define squamous cell carcinoma as: Histologic Type ICD-O-3 = 8070-8078,8083-8084

Key Points

  • This exercise requires that you create a complex selection statement to include the correct race and region combinations. When producing statistics for American Indians from SEER Incidence data, SEER often includes only cases that are in a Purchased/Referred Care Delivery Area (PRCDA). The selection statement will use parentheses and the "OR" conjunction.
  • In this exercise, you will make selections by specifying a range of numeric values to define squamous cell carcinoma using the Histologic Type ICD-O-3 variable. In previous exercises, you selected values from a list of values with labels. The ICD-O-3 Hist/behav variable has labeled values and could also be used for this selection.
  • SEER provides several "system-supplied" variables as a convenience for users who frequently created user-defined variables in order to remove any overlapping groupings from standard variable definitions (e.g., removing the Male and Female combined grouping from Sex). System-supplied variables are available in this database for Age, Race and Origin, Sex, Year of Diagnosis, and Site Recode.
  • Starting with SEER*Stat version 8.3.3, statistics based on a count of zero are suppressed when suppressing based on cell size. In prior versions, counts of zero were not suppressed.

Step 1:  Create a new Rate Session

  • Start SEER*Stat.
  • From the File menu select New > Rate Session or use the Rate button on the toolbar.

Step 2:  Select a Database (Data Tab)

  • On the Data Tab select "Incidence - SEER Research Data, 13 Registries, Nov 2020 Sub (1992-2018)".
  • Make sure the Age Variable is set to "Age recode with <1 year olds."

Learn More...

  • Databases distributed with SEER*Stat use names designed to describe the data. The various parts of this exercise's database name indicate the following:
    • Incidence - The database contains cancer incidence data.
    • SEER Research Data - This indicates the database type and which variables are included, as described in the data dictionary available on SEER*Stat Database Details.
    • 13 Registries - The database contains data for the "SEER 13 Registries" as defined in Registry Groupings in SEER Data and Statistics.
    • Nov 2020 Sub - The data was submitted to the SEER program by the registries in November 2020.
    • (1992-2018) - These are the years of diagnosis for the cases included in the database.
  • The suggested citation for the database selected on the Data Tab is shown at the bottom of the screen. For more information, see Citations for SEER Databases and SEER*Stat Software.
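
The naming parts listed above can be pulled apart programmatically. The following is a minimal sketch, assuming the name follows the layout of this exercise's database ("type - description, registries, submission (years)"); other SEER*Stat database names may not fit this pattern, and the function and field names are hypothetical.

```python
import re

# Pattern inferred from this exercise's database name only (an assumption).
NAME_PATTERN = re.compile(
    r"^(?P<data_kind>.+?) - (?P<db_type>.+?), (?P<registries>.+?), "
    r"(?P<submission>.+?) \((?P<first_year>\d{4})-(?P<last_year>\d{4})\)$"
)

def parse_database_name(name):
    """Split a database name into its descriptive parts, or return None."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    # Years of diagnosis are easier to work with as integers.
    parts["first_year"] = int(parts["first_year"])
    parts["last_year"] = int(parts["last_year"])
    return parts

parts = parse_database_name(
    "Incidence - SEER Research Data, 13 Registries, Nov 2020 Sub (1992-2018)"
)
print(parts)
```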

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • Move to the Statistic Tab.
  • In the Statistics box, select Rates (Age-Adjusted).
  • In the Parameters box:
    • Make sure that the Standard Population is set to "2000 U.S. Std Population (19 age groups - Census P25-1130)".
    • Make sure the Age Variable is set to "Age recode with <1 year olds."
    • Check the Show Standard Errors and Confidence Intervals box.
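
To illustrate what an age-adjusted rate represents, the sketch below applies direct standardization: each age group's crude rate is weighted by that group's share of the standard population. The counts, populations, and three-group weights are invented for illustration (the 2000 U.S. standard population selected above uses 19 age groups). The standard error uses a simple Poisson-based formula, and SEER*Stat's confidence interval method is not reproduced here.

```python
from math import sqrt

def age_adjusted_rate(counts, populations, std_weights, per=100_000):
    """Directly standardized rate and a Poisson-based standard error."""
    total_weight = sum(std_weights)
    weights = [w / total_weight for w in std_weights]  # normalize to sum to 1
    # Weighted sum of the age-specific rates (cases per person).
    rate = sum(w * c / p for w, c, p in zip(weights, counts, populations))
    # Variance of the weighted sum, treating each count as Poisson.
    variance = sum(w ** 2 * c / p ** 2
                   for w, c, p in zip(weights, counts, populations))
    return rate * per, sqrt(variance) * per

# Hypothetical data for three age groups (illustration only).
counts = [2, 10, 30]
populations = [100_000, 100_000, 100_000]
std_weights = [0.5, 0.3, 0.2]

rate, std_err = age_adjusted_rate(counts, populations, std_weights)
print(f"rate per 100,000: {rate:.2f}, standard error: {std_err:.2f}")
```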

Step 4:  Define the Analysis Cohort (Selection Tab)

  • Move to the Selection Tab. Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a).
  • Make sure that the Malignant Behavior option is checked in the Select Only box at the top of the tab.
  • For this problem you will need to select based on race, PRCDA region, behavior, cancer site, histology, and diagnostic confirmation. Use the Find button to locate a variable based on its name or a format/grouping label. Type at least three characters in the Search Text box and any results containing that text will appear as you type. For example, if you search for "micros" you will find the label "microscopically confirmed" is in the "Diagnostic confirmation" variable.
  • In the "Race, Sex, Year Dx (Pop, Case Files)" box, use the conjunctions "AND" and "OR", and group lines using parentheses, to make the following selections:
  • {Race, Sex, Year Dx.Race recode (W, B, AI, API)} = 'White','Black','Asian or Pacific Islander'
    OR ({Race, Sex, Year Dx.Race recode (W, B, AI, API)} = 'American Indian/Alaska Native'
    AND {Race, Sex, Year Dx.PRCDA 2017} = 'PRCDA')


    Note: Parentheses around a group of lines tell SEER*Stat to evaluate those lines first when processing the selection statement. When using parentheses, you must first create the selection statement lines and then add the parentheses. To add parentheses to a selection statement, click and drag your cursor to highlight the lines you want to work with, then click Add (...) to enclose those lines in parentheses.
  • In the "Other (Case Files)" box, make the following case selections:
  • {Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Esophagus'
    AND {Site and Morphology.Histologic Type ICD-O-3} = 8070-8078,8083-8084
    AND {Other.Diagnostic Confirmation} = 'Microscopically confirmed'
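
The logic of the two selection statements above can be sketched as a single yes/no test per case. This is only an illustration of how the parentheses and the "OR" and "AND" conjunctions group the conditions. The record layout and field names are hypothetical, and the sketch assumes that the Malignant Behavior option and both selection boxes must all be satisfied together.

```python
# Histology ranges from the exercise: 8070-8078 and 8083-8084.
SQUAMOUS_RANGES = [(8070, 8078), (8083, 8084)]

def in_cohort(case):
    """Return True if a case record (a dict with hypothetical keys) is selected."""
    # The parenthesized AI/AN + PRCDA pair is evaluated as one unit,
    # then combined with the other races using OR.
    race_ok = (
        case["race"] in {"White", "Black", "Asian or Pacific Islander"}
        or (case["race"] == "American Indian/Alaska Native"
            and case["prcda"] == "PRCDA")
    )
    histology_ok = any(lo <= case["histology"] <= hi
                       for lo, hi in SQUAMOUS_RANGES)
    return (
        case["behavior"] == "Malignant"
        and race_ok
        and case["site"] == "Esophagus"
        and histology_ok
        and case["confirmation"] == "Microscopically confirmed"
    )

example = {
    "race": "American Indian/Alaska Native", "prcda": "Not PRCDA",
    "behavior": "Malignant", "site": "Esophagus", "histology": 8070,
    "confirmation": "Microscopically confirmed",
}
print(in_cohort(example))                        # AI/AN case outside a PRCDA
print(in_cohort({**example, "prcda": "PRCDA"}))  # same case inside a PRCDA
```

Without the parentheses, the PRCDA condition could end up applied to a different set of races, which is why the grouping matters.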

Learn More...

  • Using the complex selection statements, you defined an analysis cohort that includes:
    1. All records for Whites, Blacks, and Asian/Pacific Islanders for all registries and years in the selected database (SEER 13 registries, 1992-2018).
    2. All records for American Indians within the PRCDA regions.
  • When you selected the "Histologic Type" variable, the Values box in the Selection window changed format. The valid values for the Histologic Type variable are shown just above the Values box. It is not practical to list all values for variables with a large number of numeric values. If you want to specify a range of values for an unlabeled variable, use a hyphen to define the range and use commas to separate multiple values or ranges (e.g. 1-5,8-19).
  • The "ICD-O-3 Hist/behav" variable has labeled values for each ICD-O-3 code, which can help you learn more about the squamous cell carcinoma definition. If you are unsure which ranges define squamous cell carcinoma, you could use this variable in the case selection instead. The selection statement would be:

    {Site and Morphology.ICD-O-3 Hist/behav} =
    '8070/2: Squamous cell carcinoma in situ, NOS',
    '8070/3: Squamous cell carcinoma, NOS',
    '8071/2: Squamous cell carcinoma in situ, keratinizing, NOS',
    '8071/3: Squamous cell carcinoma, keratinizing, NOS',
    '8072/2: Squamous cell CIS, large cell, nonkeratinizing',
    '8072/3: Squamous cell ca., large cell, nonkeratinizing',
    '8073/2: Squamous cell CIS, small cell, nonkeratinizing',
    '8073/3: Squamous cell ca., small cell, nonkeratinizing',
    '8074/2: Squamous cell carcinoma in situ, spindle cell',
    '8074/3: Squamous cell carcinoma, spindle cell',
    '8075/2: Squamous cell carcinoma in situ, adenoid',
    '8075/3: Squamous cell carcinoma, adenoid',
    '8076/2: Squamous cell CIS with questionable stromal invasion',
    '8076/3: Squamous cell carcinoma, micro-invasive',
    '8077/2: Squamous intraepithelial neoplasia, grade III',
    '8077/3: Squamous cell ca. & Grade III',
    '8078/3: Squamous cell carcinoma with horn formation',
    '8083/2: Basaloid squamous cell carcinoma in situ',
    '8083/3: Basaloid squamous cell carcinoma',
    '8084/3: Squamous cell carcinoma, clear cell type'
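
The hyphen-and-comma syntax for unlabeled values described in this section (for example, 8070-8078,8083-8084) can be read as a list of inclusive ranges. The sketch below is one way to interpret such a specification. It is not SEER*Stat's own parser, and the function names are hypothetical.

```python
def parse_ranges(spec):
    """Turn a spec like '1-5,8-19,42' into a list of inclusive (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            ranges.append((int(low), int(high)))
        else:
            # A single value is treated as a range of length one.
            ranges.append((int(part), int(part)))
    return ranges

def matches(value, spec):
    """Return True if value falls inside any range in the spec."""
    return any(low <= value <= high for low, high in parse_ranges(spec))

squamous = "8070-8078,8083-8084"
print(parse_ranges(squamous))
print(matches(8083, squamous), matches(8080, squamous))
```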

Step 5:  Create User-Defined Variables to use on the Table Tab

For this exercise, you need to define a new variable for race.

Open the Data Dictionary.

  1. Select the "Race recode (W, B, AI, API)" variable from the "Race, Sex, Year Dx" category and use the Create button to open the Edit Variable window.
  2. Change the Name of the variable to: "Race recode (W,B,AI,API) w/o unks".
  3. Delete the "Unknown" grouping in the Groupings box.
  4. When you are finished, click the OK button.

Step 6:  Set the Display Variables (Table Tab)

For this exercise, you will show data by race, sex, and year of diagnosis.

  • To add the year of diagnosis variable:
    • Use the "+" symbol to expand the Race, Sex, Year Dx category in the Available Variables box.
    • Select "Year of diagnosis", then add it as a row variable.
  • To add the new race variable you created:
    • Use the "+" symbol to expand the User-defined category in the Available Variables box at the bottom of the Table Tab.
    • Select "Race recode (W,B,AI,API) w/o unks" from the "User-defined" category, then add it as a row variable as well.
  • To add the sex variable for males and females only:
    • Use the "+" symbol to expand the System-Supplied category in the Available Variables box.
    • Select "Sex (no total)", then add it as a column variable.

Step 7:  Specify a Title and Hide Statistics (Output Tab)

  • Move to the Output Tab.
  • Enter the following title:
  • Malignant Esophageal Squamous Cell Carcinoma
    Microscopically Confirmed Cases Only, 1992-2018
    SEER 13 for White, Black, API
    SEER 13 (incl. PRCDA only) for AI/AN
    Rate Exercise 4a
  • Check the option to "Hide Statistics When Fewer Than 25 Cases", and change the limit from 25 to 16 cases.
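
The suppression rule set above can be pictured as a simple threshold check. This is a minimal sketch under the assumptions stated in this exercise: statistics are hidden when a cell is based on fewer than 16 cases, which also covers cells with a count of zero. The function name and the use of None for a hidden cell are illustrative choices only.

```python
def displayed_rate(count, rate, min_cases=16):
    """Return the rate, or None when the cell has too few cases to show."""
    if count < min_cases:  # counts 0 through 15 are hidden
        return None
    return rate

# Hypothetical cells: (case count, rate per 100,000).
for count, rate in [(0, 0.0), (15, 1.4), (16, 1.5), (240, 3.2)]:
    print(count, displayed_rate(count, rate))
```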

Step 8:  Create the Matrix and Re-order the Rows

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a SEER*Stat matrix window will open containing the output table.
  • The output table contains two row variables (year of diagnosis and race). The outermost row variable is the first variable listed as a row variable on the session's Table Tab. The innermost is the second row variable on the Table Tab.
  • Change the order of the row variables. From the Matrix menu, select Order and then Row.
  • Select the first variable listed and click the Move Down button to switch the order of the variables.
  • Click OK.
  • The variable you moved down is now the inner row variable in your results matrix.
  • Use the Save As command on the File menu to save the matrix for use in the next exercise. Enter "Rate Exercise 4a" as the filename. SEER*Stat will assign the "srm" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
  • Compare your results to this SEER*Stat matrix file: Exercise Matrix 4a Results.
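
The effect of switching the row order can be sketched with a small list of row labels. The example below is only an illustration of how nesting changes when the outer and inner row variables trade places. The years, race labels, and function name are hypothetical, and the sketch does not reflect how SEER*Stat stores a matrix.

```python
def swap_row_nesting(rows, race_order):
    """Re-nest (year, race) rows so that race is outer and year is inner."""
    swapped = [(race, year) for year, race in rows]
    # Sort by the race display order first, then by year within each race.
    return sorted(swapped, key=lambda row: (race_order.index(row[0]), row[1]))

# Original nesting: year of diagnosis is the outer row variable.
rows = [
    (1992, "White"), (1992, "Black"),
    (1993, "White"), (1993, "Black"),
]

for race, year in swap_row_nesting(rows, race_order=["White", "Black"]):
    print(race, year)
```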

Learn More...

  • The Matrix menu gives you the opportunity to customize your results, as well as export the results for use in other applications. See Results Matrix in the SEER*Stat help system for more information.