
Rate Exercise 4a: Complex Selection Statements


Create complex selection statements to include the correct race and region combinations, and use variables with unlabeled values to select records.

In this exercise, you will define a complex selection statement to produce statistics by expanded race using the SEER Program's guidelines. You will also create selection statements using variables with unlabeled values.

Exercise

Create a table showing frequencies and incidence rates (age-adjusted to the 2000 U.S. standard population) for malignant esophageal squamous cell carcinoma. Include only microscopically confirmed cases. Calculate these statistics for persons diagnosed from 1992 through 2021 in the SEER 12 Registries. Do not show statistics based on fewer than 10 cases.

Display the statistics by race, sex, and year of diagnosis. Show data for males and females separately but not combined. Use the following races: "White", "Black", "American Indian", "Asian or Pacific Islander". Include standard errors and confidence intervals in the table.

Define squamous cell carcinoma as: Histologic Type ICD-O-3 = 8070-8078,8083-8084

Key Points

  • This exercise requires that you create a complex selection statement to include the correct race and region combinations. When producing statistics using SEER Incidence data for American Indians, SEER often includes only cases that are in a Purchased/Referred Care Delivery Area (PRCDA). Refer to the County Attributes web page for more information. The selection statement will use parentheses and the "OR" conjunction.
  • In this exercise, you will make selections by specifying a range of numeric values to define squamous cell carcinoma using the Histologic Type ICD-O-3 variable. In previous exercises, you selected values from a list of values with labels. The ICD-O-3 Hist/behav variable has labeled values and could also be used for this selection.
  • SEER provides several "system-supplied" variables as a convenience for users who would otherwise have to create user-defined variables to remove overlapping groupings from standard variable definitions (e.g., removing the combined "Male and Female" grouping from Sex). System-supplied variables are available in this database for Age, Race and Origin, Sex, Year of Diagnosis, and Site Recode.
  • Starting with SEER*Stat version 8.3.3, statistics based on a count of zero are suppressed when suppressing based on cell size. In prior versions, counts of zero were not suppressed.

Instructions

Step 1:  Create a Rate Session and Select a Database

  • Start SEER*Stat.
  • Start a new Rate Session from the New Session menu.
  • On the Select Database dialog, select "Incidence - SEER Research Data, 12 Registries, Nov 2023 Sub (1992-2021)".
  • Make sure the Age Variable is set to "Age recode with < 1 year olds."

  • Databases distributed with SEER*Stat use names designed to describe the data. The various parts of this exercise's database name indicate the following:
    • Incidence - The database contains cancer incidence data.
    • SEER Research Data - This indicates the database type and which variables are included, as described in the data dictionary available on SEER*Stat Database Details.
    • 12 Registries - The database contains data for the "SEER 12 Registries" as defined in Registry Groupings in SEER Data and Statistics.
    • Nov 2023 Sub - The data was submitted to the SEER program by the registries in November 2023.
    • (1992-2021) - These are the years of diagnosis for the cases included in the database.
  • The suggested citation for the database selected on the Data options is shown at the bottom of the screen. For more information, see Citations for SEER Databases and SEER*Stat Software.

Step 2:  Choose the Statistics to Display

  1. Move to the Statistic options.
  2. In the Statistic box, select Rates (Age-Adjusted).
  3. In the Parameters box:
    • Make sure that the Standard Population is set to "2000 U.S. Std Population (19 age groups - Census P25-1130)".
    • Make sure the Age Variable is set to "Age recode with < 1 year olds."
    • Check the Show Standard Errors and Confidence Intervals box.
      • Make sure the "Use Tiwari et al., 2006 modifications for CIs" is checked.
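
The age-adjusted rate selected above is a weighted average of age-specific rates, with weights taken from the standard population. The following is a minimal sketch of that direct-standardization arithmetic, not SEER*Stat's implementation. The function name, the three age groups, and all counts, populations, and weights are made up for illustration (the real standard uses 19 age groups), and the Tiwari et al. confidence interval modification is not reproduced here.

```python
import math


def age_adjusted_rate(counts, populations, std_weights, per=100_000):
    """Directly standardized rate and its standard error, per `per` persons.

    counts, populations and std_weights hold one entry per age group.
    All names here are hypothetical; this is an illustrative sketch only.
    """
    if not (len(counts) == len(populations) == len(std_weights)):
        raise ValueError("inputs must have one entry per age group")

    total_weight = sum(std_weights)
    rate = 0.0
    variance = 0.0
    for cases, pop, weight in zip(counts, populations, std_weights):
        share = weight / total_weight          # standard population proportion
        rate += share * cases / pop            # weighted age-specific rate
        variance += share ** 2 * cases / pop ** 2  # Poisson variance term
    return rate * per, math.sqrt(variance) * per


# Toy inputs (NOT the 2000 U.S. standard population)
counts = [2, 10, 30]
populations = [100_000, 80_000, 50_000]
std_weights = [0.5, 0.3, 0.2]

rate, se = age_adjusted_rate(counts, populations, std_weights)
print(f"rate per 100,000: {rate:.2f}, standard error: {se:.2f}")
```

Because the weights come from a fixed standard population, rates computed this way can be compared across races, sexes, and years even when the underlying age distributions differ.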

Step 3:  Define the Analysis Cohort

For this problem you will need to select based on race, PRCDA region, behavior, cancer site, histology, and diagnostic confirmation. Tip: use the Find button on the Selection Line Construction dialog to locate a variable based on its name or a format/grouping label. Type at least three characters in the Search Text box and any results containing that text will appear as you type. For example, if you search for "micros" you will find the label "microscopically confirmed" is in the "Diagnostic confirmation" variable.

Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a).

  1. Move to the Selection options.
  2. Make sure that the Malignant Behavior option is checked in the Select Only box at the top of the tab.
  3. In the "Race, Sex, Year Dx (Pop, Case Files)" box, make the following selections:
{Race, Sex, Year Dx.Race recode (W, B, AI, API)} = 'White', 'Black', 'Asian or Pacific Islander'
OR ({Race, Sex, Year Dx.Race recode (W, B, AI, API)} = 'American Indian/Alaska Native'
AND {Race, Sex, Year Dx.PRCDA 2020} = 'PRCDA')
Note that the last two lines of the statement are grouped. To create the statement and group the statement lines:
    1. Create each new statement line on the Selection dialog.
    2. Select the bottom two lines and press the Group Items button. Parentheses will be added to the Selection Statement value. Parentheses around a group of lines tell SEER*Stat to evaluate those lines first when processing the selection statement.
    3. Make sure the outer logical operator is set to OR and the logical operator for the grouped lines is set to AND.
    4. Press the OK button.
  4. In the "Other (Case Files)" box, make the following case selections:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Esophagus'
AND {Site and Morphology.Histologic Type ICD-O-3} = 8070-8078,8083-8084
AND {Site and Morphology.Diagnostic Confirmation} = 'Microscopically confirmed'
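
The combined logic of these selections can be sketched as a single boolean test per record. This is only an illustration of how the grouping works, not how SEER*Stat evaluates statements internally. The record field names (`race`, `prcda`, `behavior`, `site`, `histology`, `dx_confirmation`) are hypothetical, and the sketch assumes that the selections made in the different boxes on the Selection options must all hold at once.

```python
# Histology ranges from the exercise definition of squamous cell carcinoma
SQUAMOUS_RANGES = [(8070, 8078), (8083, 8084)]


def in_cohort(rec):
    """Return True if a record (a dict with hypothetical keys) is selected."""
    # The parenthesized group: AI/AN records count only inside a PRCDA region.
    race_ok = (
        rec["race"] in {"White", "Black", "Asian or Pacific Islander"}
        or (
            rec["race"] == "American Indian/Alaska Native"
            and rec["prcda"] == "PRCDA"
        )
    )
    histology_ok = any(lo <= rec["histology"] <= hi for lo, hi in SQUAMOUS_RANGES)
    return (
        rec["behavior"] == "Malignant"
        and race_ok
        and rec["site"] == "Esophagus"
        and histology_ok
        and rec["dx_confirmation"] == "Microscopically confirmed"
    )


base = {
    "race": "White",
    "prcda": "Not PRCDA",
    "behavior": "Malignant",
    "site": "Esophagus",
    "histology": 8070,
    "dx_confirmation": "Microscopically confirmed",
}
ai_outside = {**base, "race": "American Indian/Alaska Native"}
ai_inside = {**ai_outside, "prcda": "PRCDA"}

print(in_cohort(base), in_cohort(ai_outside), in_cohort(ai_inside))
```

The second record is excluded because the PRCDA condition applies only to the grouped American Indian/Alaska Native line, which is exactly what the parentheses in the selection statement express.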

  • Through the use of the complex selection statements, you were able to define an analysis cohort which includes:
    1. All records for Whites, Blacks, and Asian/Pacific Islanders for all registries and years in the selected database (SEER 12 registries, 1992-2021).
    2. All records for American Indians/Alaska Natives within the PRCDA regions.
  • When you selected the "Histologic Type" variable, the Values box in the Selection window changed format. The valid values for the Histologic Type variable are shown just above the Values box. It is not practical to list all values for variables with a large number of numeric values. If you want to specify a range of values for an unlabeled variable, use a hyphen to define the range, and use commas to separate multiple values or ranges (e.g. 1-5,8-19).
  • To learn more about the squamous cell carcinoma definition, look at the "ICD-O-3 Hist/behav" variable, which has labeled values for each ICD-O-3 code. If you are unsure which ranges define squamous cell carcinoma, you could use this variable in the case selection instead. The selection statement would be:

    {Site and Morphology.ICD-O-3 Hist/behav} =
    '8070/2: Squamous cell carcinoma in situ, NOS',
    '8070/3: Squamous cell carcinoma, NOS',
    '8071/2: Squamous cell carcinoma in situ, keratinizing, NOS',
    '8071/3: Squamous cell carcinoma, keratinizing, NOS',
    '8072/2: Squamous cell CIS, large cell, nonkeratinizing',
    '8072/3: Squamous cell ca., large cell, nonkeratinizing',
    '8073/2: Squamous cell CIS, small cell, nonkeratinizing',
    '8073/3: Squamous cell ca., small cell, nonkeratinizing',
    '8074/2: Squamous cell carcinoma in situ, spindle cell',
    '8074/3: Squamous cell carcinoma, spindle cell',
    '8075/2: Squamous cell carcinoma in situ, adenoid',
    '8075/3: Squamous cell carcinoma, adenoid',
    '8076/2: Squamous cell CIS with questionable stromal invasion',
    '8076/3: Squamous cell carcinoma, micro-invasive',
    '8077/2: Squamous intraepithelial neoplasia, grade III',
    '8077/3: Squamous cell ca. & Grade III',
    '8078/3: Squamous cell carcinoma with horn formation',
    '8083/2: Basaloid squamous cell carcinoma in situ',
    '8083/3: Basaloid squamous cell carcinoma',
    '8084/3: Squamous cell carcinoma, clear cell type'
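
The hyphen-and-comma notation for unlabeled values (e.g. 8070-8078,8083-8084) can be read as a list of inclusive ranges. The sketch below shows one way to interpret such a string; the function names `parse_value_spec` and `matches` are hypothetical and are not part of SEER*Stat, and the sketch assumes that both endpoints of a range are included.

```python
def parse_value_spec(spec):
    """Turn a string such as '1-5,8-19' into a list of (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            ranges.append((int(low), int(high)))
        else:
            # A single value is treated as a one-element range.
            ranges.append((int(part), int(part)))
    return ranges


def matches(value, ranges):
    """Return True if value falls inside any range (endpoints included)."""
    return any(low <= value <= high for low, high in ranges)


squamous = parse_value_spec("8070-8078,8083-8084")
print(squamous)
print(matches(8075, squamous), matches(8080, squamous))
```

Here 8075 is selected and 8080 is not, since 8080 falls in the gap between the two ranges.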

Step 4:  Create a User-Defined Table Variable

For this exercise, you need to define a new variable for race.

  1. Open the Data Dictionary.
  2. Select the "Race recode (W,B,AI,API)" variable from the "Race, Sex, Year Dx" category and use the Create button to open the Edit Variable window.
  3. Change the Name of the variable to: "Race recode (W,B,AI,API) w/o unks".
  4. Delete the "Unknown" grouping in the Groupings box.
  5. When you are finished, click the OK button and close the Dictionary.

Step 5:  Set the Display Variables

For this exercise, you will show data by race, sex, and year of diagnosis.

  1. Move to the Table options.
  2. Use the "+" symbol to expand the "Race, Sex, Year Dx" category in the Available Variables box, select "Year of diagnosis" and add it as a row variable.
  3. Use the "+" symbol to expand the "User-Defined" category in the Available Variables box, select "Race recode (W,B,AI,API) w/o unks" and add it as a row variable.
  4. Use the "+" symbol to expand the "System-Supplied" category in the Available Variables box, select "Sex (no total)", and add it as a column variable.

Step 6:  Specify a Title and Hide Statistics

  1. Move to the Output options.
  2. Enter the following title:
Malignant Esophageal Squamous Cell Carcinoma
Microscopically Confirmed Cases Only, 1992-2021
SEER 12 for White, Black, API
SEER 12 (incl. PRCDA only) for AI/AN
Rate Exercise 4a
  3. Check the option to Hide Statistics When Fewer Than and update the number of cases to 10.
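
The effect of this option can be sketched as a filter over table cells: any cell based on fewer than the threshold number of cases has its statistics hidden. In the sketch below, the cell layout and the function name `suppress_small_cells` are hypothetical, the counts and rates are made up, and it is an assumption that both the count and the rate are hidden. Cells with a count of zero fall below the threshold too, which matches the behavior described in the Key Points for version 8.3.3 and later.

```python
def suppress_small_cells(cells, min_cases=10):
    """Return copies of the cells with statistics hidden below min_cases.

    Each cell is a dict with hypothetical keys 'count' and 'rate';
    hidden statistics are represented as None.
    """
    result = []
    for cell in cells:
        if cell["count"] < min_cases:
            result.append({**cell, "count": None, "rate": None})
        else:
            result.append(dict(cell))
    return result


# Made-up cells for illustration only
cells = [
    {"year": 1992, "count": 0, "rate": 0.0},
    {"year": 1993, "count": 9, "rate": 1.2},
    {"year": 1994, "count": 10, "rate": 1.4},
]
shown = suppress_small_cells(cells)
print(shown)
```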

Step 7:  Create the Matrix and Reorder the Rows

  1. Execute the session and press the OK button on any variable warnings that open. A dialog will display the progress of the job. When the job completes, a SEER*Stat matrix window will open containing the output table.
  2. The output table contains two row variables (year of diagnosis and race). The outermost row variable is the first variable listed as a row variable on the session's Table options. The innermost is the second row variable on the Table options. To change the order of the variables:
    1. Right click anywhere on the row variable columns and select Order to change the order of the row variables. The Set Variable Order dialog opens.
    2. Select the first variable listed and click the Down button to switch the order of the variables.
    3. Press the OK button. The variable you moved down is now the inner row variable in your results matrix.
  3. Use the Save As command on the File menu to save the matrix for use in the next exercise. Enter "Rate Exercise 4a" as the filename. SEER*Stat will assign the "sim" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
  4. Compare your results to this SEER*Stat matrix file: Exercise Matrix 4a Results (sim, 40.4 KB).

The Matrix menu gives you the opportunity to customize your results, as well as export the results for use in other applications. See the Results Matrix topic for more information.
