County attribute data from the U.S. Census, such as median income or educational attainment, are linked to SEER incidence, U.S. mortality, and population data by state-county FIPS codes.

Create a table showing incidence rates for the SEER 21 registries by high school education quintiles. Assign counties to quintiles so that 20% of the counties are in each group. To create the quintiles, use the results from Case Listing Exercise 3. The quintiles should be based on data from all U.S. counties. Results should be based on data for female malignant cervical cancer cases for the years 2012-2016 from the SEER 21 registries.

Key Points

  • This exercise illustrates the use of SEER*Stat to view incidence rates by county attribute variables.
  • Using results from Case Listing Exercise 3, see how to determine the quintile cut points for counties by a county attribute. In Rate Exercise 6 you will produce U.S. mortality rates by the same quintiles. For consistency we will use the same quintile cut points for both analyses.
  • Create a user-defined variable based on a county attribute variable available in the selected incidence database.

Step 1:  Create a Rate Session

  • Start SEER*Stat.
  • From the File menu select New > Rate Session or use the Rate button on the toolbar.

Step 2:  Select a Database (Data Tab)

  • It is extremely important that you select the database as the first step in order to see the correct list of variables. In this problem, we need to select an incidence database with county attribute data.
  • On the Data Tab select "Incidence - SEER 21 Regs Limited-Field Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2018 Sub (2000-2016) <Katrina/Rita Population Adjustment>".
  • Make sure the Age Variable is set to "Age recode with <1 year olds".

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • In the Statistics box, select Rates (Age-Adjusted).
  • In the Parameters box:
    • Make sure that the Standard Population is set to "2000 U.S. Std Population (19 age groups - Census P25-1130)".
    • Make sure the Age Variable is set to "Age recode with <1 year olds".
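
The tutorial does not show how an age-adjusted rate is computed. As a rough illustration, the sketch below assumes the standard direct method: each age-specific rate is weighted by that age group's share of the standard population. The function name and the three-group figures are hypothetical and are not SEER data; a real session uses the 19 age groups of the 2000 U.S. standard population.

```python
def age_adjusted_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates.

    Assumes the direct method; all three inputs hold one entry per age group.
    """
    if not (len(cases) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, n, w in zip(cases, population, standard_weights):
        # (w / total_weight) is the age group's share of the standard population.
        rate += (w / total_weight) * (d / n)
    return rate * per


# Hypothetical three-age-group example (illustrative numbers only).
cases = [2, 10, 30]
population = [100_000, 100_000, 100_000]
weights = [0.5, 0.3, 0.2]
example_rate = age_adjusted_rate(cases, population, weights)
```

Records with unknown age cannot be placed in any age group, which is why they cannot contribute to a rate of this kind.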

Step 4:  Defining the Analysis Cohort (Selection Tab)

Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create your selection statement.

Make sure that the Malignant Behavior and the Cases in Research Database options are checked in the Select Only box. The Known Age option is always checked and disabled in rate sessions because all records must have values that are included in the U.S. Population and Standard Population data. Unknown age is not a valid value, so records with unknown ages are excluded from the analysis.

For this problem you should create selection statements based on year of diagnosis, sex, and cancer site.

Make the following selections in the "Race, Sex, Year Dx, Registry, County (Pop, Case Files)" box:

{Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '2012','2013','2014','2015','2016'
AND {Race, Sex, Year Dx, Registry, County.Sex} = ' Female'

Make the following selection in the "Other (Case Files)" box:

{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Cervix Uteri'
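
To make the logic of these selection statements explicit, here is a minimal sketch of the same filter written as a Python predicate. The record layout and field names (`year_dx`, `sex`, `site`) are hypothetical and do not reflect how SEER*Stat stores cases; the Select Only options (Malignant Behavior, Known Age, Cases in Research Database) are not modeled.

```python
SELECTED_YEARS = {"2012", "2013", "2014", "2015", "2016"}


def in_cohort(record):
    """Return True when a record satisfies all three selection statements."""
    return (
        record["year_dx"] in SELECTED_YEARS   # Year of diagnosis 2012-2016
        and record["sex"] == "Female"         # Sex = Female
        and record["site"] == "Cervix Uteri"  # Site recode = Cervix Uteri
    )


# Hypothetical records, for illustration only.
records = [
    {"year_dx": "2014", "sex": "Female", "site": "Cervix Uteri"},
    {"year_dx": "2011", "sex": "Female", "site": "Cervix Uteri"},
    {"year_dx": "2015", "sex": "Female", "site": "Breast"},
]
cohort = [r for r in records if in_cohort(r)]
```

Only the first record passes all three conditions; the other two fail on year of diagnosis and cancer site, respectively.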

Step 5:  Calculate Quintiles of Counties

Use the results from Case Listing Exercise 3 as a guide to calculate quintiles based on all U.S. counties in 2016. In the exercise, we created a table showing percentages of less than a high school education by county. View the results of this SEER*Stat matrix file: key.case3.slm.

Since there are 3142 valid counties for 2016 (shown as the 3142 rows in the case listing matrix), to create 20% groupings we will assign 628 counties to three quintiles and 629 to two (3142/5 = 628 with a remainder of 2, so two quintiles each receive one additional county).

  • The 1st quintile will include counties 1 - 628
  • The 2nd quintile will include counties 629 - 1256
  • The 3rd quintile will include counties 1257 - 1884
  • The 4th quintile will include counties 1885 - 2513 (includes extra county)
  • The 5th quintile will include counties 2514 - 3142 (includes extra county)
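
The row ranges above can be reproduced with a short calculation. This is a sketch only: the function name `rank_ranges` is hypothetical, and placing the extra counties in the last groups simply mirrors the choice made in this exercise.

```python
def rank_ranges(n_items, n_groups):
    """Split ranks 1..n_items into n_groups consecutive (start, end) ranges.

    Any remainder is spread one item at a time over the last groups.
    """
    base, extra = divmod(n_items, n_groups)
    ranges = []
    start = 1
    for g in range(n_groups):
        # The last `extra` groups each take one additional item.
        size = base + (1 if g >= n_groups - extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


quintile_ranks = rank_ranges(3142, 5)
for number, (first, last) in enumerate(quintile_ranks, start=1):
    print(f"Quintile {number}: counties {first} - {last}")
```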

To determine quintile cut points for a user-defined variable based on what percentage of the county population had less than a high school education, use the case listing results. You will see that this matrix is sorted by percent with less than high school education. The rows are numbered on the left side to use as a guide.

  • The 1st quintile begins at (1) NE: Blaine County (31009) - 00128 (1.28%)
    and ends at (628) MI: Benzie County (26019) - 00855 (8.55%)
  • The 2nd quintile begins at 00855+1=00856 (8.56%)
    and ends at (1256) SD: Jerauld County (46073) - 01129 (11.29%)
  • The 3rd quintile begins at 01129+1=01130 (11.30%)
    and ends at (1884) TN: Loudon County (47105) - 01470 (14.70%)
  • The 4th quintile begins at 01470+1=01471 (14.71%)
    and ends at (2513) TX: Marion County (48315) - 01975 (19.75%)
  • The 5th quintile begins at 01975+1=01976 (19.76%)
    and ends at (3142) TX: Starr County (48427) - 05148 (51.48%)
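
Each quintile starts one coded unit above the previous quintile's last value, so the value ranges follow directly from the five end values. The sketch below assumes the five-digit codes carry two implied decimal places (as in 00855 = 8.55%); the helper names are hypothetical.

```python
def value_ranges(first_value, end_values, width=5):
    """Build zero-padded 'start-end' strings from each group's last coded value."""
    ranges = []
    start = first_value
    for end in end_values:
        ranges.append(f"{start:0{width}d}-{end:0{width}d}")
        start = end + 1  # the next group begins one coded unit higher
    return ranges


def as_percent(coded):
    """Convert a coded value with two implied decimals to a percentage."""
    return coded / 100


# Lowest coded value (row 1) and the coded value at rows 628, 1256, 1884, 2513, 3142.
quintile_values = value_ranges(128, [855, 1129, 1470, 1975, 5148])
for text in quintile_values:
    print(text)
```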

Step 6:  Create a User-Defined Variable

Now use the information from Step 5 to create the user-defined variable:

  • Return to the rate session you created in step 1.
  • Open the Data Dictionary by clicking the Dictionary button on the toolbar.
  • Expand the "County Attributes ACS - 2012-2016" folder and highlight % < High school education ACS 2012-2016.
  • Click the Create... button.
  • In the Name field, edit the variable name to read, "% < HS Educ ACS 2012-2016 (non-weighted quints)".
  • Delete the existing groupings in the Groupings box on the left by selecting each grouping and clicking the Delete button.
  • In the box marked Unlabeled Values, enter each quintile's grouping in the Selected textbox as follows:
    • For the 1st quintile, type "00128-00855" in the textbox, and then click Add.
    • The grouping you entered will be added to the groupings box. Change its name to "First Quintile (1.28%-8.55%)".
    • Repeat these instructions for each remaining quintile with the following information:

      "Second Quintile (8.56%-11.29%)": values 00856-01129
      "Third Quintile (11.30%-14.70%)": values 01130-01470
      "Fourth Quintile (14.71%-19.75%)": values 01471-01975
      "Fifth Quintile (19.76%-51.48%)": values 01976-05148

    • Click OK to save the user-defined variable.
    • Click Close to exit the data dictionary.
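
Conceptually, the user-defined variable maps each county's coded percentage to one of the five groupings. A minimal sketch of that mapping follows; the function name is hypothetical, and returning `None` for values outside 00128-05148 is an assumption made for this illustration, not a description of SEER*Stat's behavior.

```python
from bisect import bisect_left

LOWEST_VALUE = 128
# Last coded value of each quintile, taken from the groupings above.
UPPER_BOUNDS = [855, 1129, 1470, 1975, 5148]
LABELS = [
    "First Quintile (1.28%-8.55%)",
    "Second Quintile (8.56%-11.29%)",
    "Third Quintile (11.30%-14.70%)",
    "Fourth Quintile (14.71%-19.75%)",
    "Fifth Quintile (19.76%-51.48%)",
]


def quintile_label(coded_value):
    """Return the grouping label for a coded percentage, or None if out of range."""
    if coded_value < LOWEST_VALUE or coded_value > UPPER_BOUNDS[-1]:
        return None
    # bisect_left finds the first upper bound that is >= coded_value.
    return LABELS[bisect_left(UPPER_BOUNDS, coded_value)]


print(quintile_label(855))
print(quintile_label(856))
```

Because the groupings are contiguous integer ranges, a single sorted list of upper bounds is enough to place any in-range value.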

Step 7:  Set Table Variables (Table Tab)

Use the Table Tab to choose variables to include in the output matrix. For this exercise, you want to show the incidence rate for malignant cervical cancers by county quintiles for the percentage of the population with less than a high school diploma.

  • On the Table Tab, the variables are listed in categories in the Available Variables box at the bottom of the screen.
  • Use the "+" to expand the "User-Defined" category.
  • Select the new variable, "% < HS Educ ACS 2012-2016 (non-weighted quints)".
  • Click Row on the right-hand side of the screen to add this variable to the row dimension in the list of Display Variables at the top of the window.

Step 8:  Specify a Title (Output Tab)

  • Move to the Output Tab.
  • Enter the following title:
    Age-Adjusted Incidence Rates for Female Cervical Cancer, SEER 21 Registries, 2012-2016
    By Non-Weighted Quintiles (Based on Total U.S. Counties) of Percentage of Population (Ages 25+) without a High School Degree or Equivalent (ACS 2012-2016)
    Rate Exercise 5

Step 9:  Execute SEER*Stat and Save the Matrix

  • At this point, you have made all the necessary selections on the session tabs. Use the Execute button or select Execute from the Session menu to execute the session.
  • A new window will be opened containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. You can print the matrix, export the results to a text file, and copy-and-paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
  • Use the Save As command on the File menu to save the matrix. Enter "Rate Exercise 5" as the filename. SEER*Stat will assign the "srm" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
  • Compare your results to this SEER*Stat matrix file: Rate Exercise 5 Matrix Results.