County attribute data from the U.S. Census such as median income or educational attainment are linked to SEER incidence, U.S. mortality, and population data by state-county FIPS codes.
Create a table showing incidence rates for the SEER 21 registries by high school education quintiles. Assign quintiles of counties so that 20% of the counties are in each group. To create the quintiles use the results from the Case Listing Exercise 3. The quintiles should be based on data from all U.S. counties. Results should be based on data for female malignant cervical cancer cases for the years 2014-2018 from the SEER 21 registries.
Key Points
- This exercise illustrates the use of SEER*Stat to view incidence rates by county attribute variables.
- You must have access to the Research Plus data to complete this exercise because it requires use of a county attribute variable only available in Research Plus databases.
- Starting with the 1975-2017 SEER Data, there are two data products available: SEER Research and SEER Research Plus. The Research Plus databases provide access to additional variables and require a more rigorous authorization process. Refer to Comparison of SEER Data Products for more information.
- Using the results from Case Listing Exercise 3, you will determine the quintile cut points for counties by a county attribute. In Rate Exercise 6 you will produce U.S. mortality rates by the same quintiles. For consistency, the same quintile cut points are used for both analyses.
- Create a user-defined variable based on a county attribute variable available in the selected incidence database.
Step 1: Create a Rate Session
- Start SEER*Stat.
- From the File menu select New > Rate Session or use the corresponding button on the toolbar.
Step 2: Select a Database (Data Tab)
- It is extremely important that you select the database as the first step in order to see the correct list of variables. In this problem, we need to select an incidence database with county attribute data.
- On the Data Tab select "Incidence - SEER Research Plus Limited-Field Data, 21 Registries, Nov 2020 Sub (2000-2018)".
- Make sure the Age Variable is set to "Age recode with <1 year olds".
Step 3: Choose the Statistics to Display (Statistic Tab)
- In the Statistics box, select Rates (Age-Adjusted).
- In the Parameters box:
  - Make sure that the Standard Population is set to "2000 U.S. Std Population (19 age groups - Census P25-1130)".
  - Make sure the Age Variable is set to "Age recode with <1 year olds".
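For readers unfamiliar with age adjustment, the sketch below illustrates direct standardization, the general technique behind an age-adjusted rate: each age group's crude rate is weighted by that group's share of a standard population, and the total is expressed per 100,000 (the usual convention for cancer rates). It is not SEER*Stat's implementation. The three age groups and all numbers are hypothetical; the session itself uses the 19 age groups of the 2000 U.S. standard population.

```python
def age_adjusted_rate(cases, population, standard_population, per=100_000):
    """Directly standardized rate.

    cases, population, standard_population are parallel lists,
    one entry per age group.
    """
    total_standard = sum(standard_population)
    rate = 0.0
    for count, pop, std in zip(cases, population, standard_population):
        crude_rate = count / pop          # age-specific rate
        weight = std / total_standard     # share of the standard population
        rate += crude_rate * weight
    return rate * per


# Hypothetical data for three age groups (NOT real SEER values).
toy_cases = [2, 10, 30]
toy_population = [100_000, 200_000, 150_000]
toy_standard = [300, 500, 200]

toy_rate = age_adjusted_rate(toy_cases, toy_population, toy_standard)
print(round(toy_rate, 2))
```

Because the weights come from a fixed standard population, rates computed this way can be compared across groups of counties with different age structures.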
Step 4: Defining the Analysis Cohort (Selection Tab)
Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create your selection statement.
Make sure that the Malignant Behavior option is checked in the Select Only box. The Known Age option is always checked and disabled in rate sessions because all records must have values that are included in the U.S. Population and Standard Population data. Unknown age is not a valid value, so records with unknown ages are excluded from the analysis.
For this problem you should create selection statements based on year of diagnosis, sex, and cancer site.
Make the following selections in the "Race, Sex, Year Dx, Registry, County (Pop, Case Files)" box:
{Race, Sex, Year Dx.Year of diagnosis} = '2014','2015','2016','2017','2018'
AND {Race, Sex, Year Dx.Sex} = ' Female'
Make the following selection in the "Other (Case Files)" box:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Cervix Uteri'
Step 5: Calculate Quintiles of Counties
Use the results from Case Listing Exercise 3 as a guide to calculate quintiles based on all U.S. counties in 2018. In the exercise, we created a table showing percentages of less than a high school education by county. View the results of this SEER*Stat matrix file: key.case3.slm.
Since there are 3143 valid counties for 2018 (shown as the 3143 rows in the case listing matrix), to create 20% groupings we will assign 628 counties to two quintiles and 629 counties to three (3143/5 = 628 with a remainder of 3, so three quintiles each receive one additional county).
- The 1st quintile will include counties 1 - 628
- The 2nd quintile will include counties 629 - 1256
- The 3rd quintile will include counties 1257 - 1885 (includes extra county)
- The 4th quintile will include counties 1886 - 2514 (includes extra county)
- The 5th quintile will include counties 2515 - 3143 (includes extra county)
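The rank ranges above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of SEER*Stat; the function name `quintile_rank_ranges` is hypothetical. It follows this exercise's convention of giving the extra counties to the last groups.

```python
def quintile_rank_ranges(n_counties, n_groups=5):
    """Return 1-based, inclusive (start, end) row ranges for each group."""
    base, extra = divmod(n_counties, n_groups)
    ranges = []
    start = 1
    for group in range(n_groups):
        # The last `extra` groups each receive one additional county.
        size = base + (1 if group >= n_groups - extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


ranges = quintile_rank_ranges(3143)
for number, (start, end) in enumerate(ranges, start=1):
    print(f"Quintile {number}: counties {start} - {end}")
```

Assigning the extra counties to the first groups instead would be equally valid; what matters is that the same ranges are used wherever the quintiles are applied.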
To determine quintile cut points for a user-defined variable based on the percentage of the county population with less than a high school education, use the case listing results. You will see that this matrix is sorted by percent with less than high school education. The rows are numbered on the left side to use as a guide.
- The 1st quintile begins at (1) MT: Petroleum County (30069) - 00118 (1.18%) and ends at (628) MI: Iron County (26071) - 00798 (7.98%)
- The 2nd quintile begins at 00798+1=00799 (7.99%) and ends at (1256) IN: Ripley County (18137) - 01065 (10.65%)
- The 3rd quintile begins at 01065+1=01066 (10.66%) and ends at (1885) WA: Ferry County (53019) - 01390 (13.90%)
- The 4th quintile begins at 01390+1=01391 (13.91%) and ends at (2514) LA: St. Mary Parish (22101) - 01868 (18.68%)
- The 5th quintile begins at 01868+1=01869 (18.69%) and ends at (3143) TX: Kenedy County (48261) - 06634 (66.34%)
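The rule used above (each quintile ends at the value found in its last row, and the next quintile begins at that value plus 1) can be sketched as follows. This is an illustration only: `cut_points` and `as_percent` are hypothetical helper names, and the ten-value list is toy data, not the actual case listing. The five-digit codes are treated as integers with two implied decimal places, as in the values shown above (00798 = 7.98%).

```python
def cut_points(sorted_codes, rank_ranges):
    """Derive (low, high) value ranges from an ascending list of coded values.

    rank_ranges holds 1-based, inclusive (start, end) row numbers.
    """
    groups = []
    low = sorted_codes[0]
    for _, end_rank in rank_ranges:
        high = sorted_codes[end_rank - 1]  # value in the group's last row
        groups.append((low, high))
        low = high + 1                     # next group starts one code higher
    return groups


def as_percent(code):
    """Convert a coded value such as 798 to a percentage (7.98)."""
    return code / 100


# Toy data: ten coded values split into five groups of two rows each.
toy_codes = [118, 300, 450, 798, 900, 1065, 1200, 1390, 1868, 6634]
toy_ranges = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
toy_groups = cut_points(toy_codes, toy_ranges)

for low, high in toy_groups:
    print(f"{low:05d}-{high:05d}  ({as_percent(low):.2f}%-{as_percent(high):.2f}%)")
```

One consequence of cutting on values rather than rows: if several counties share the value at a boundary, they all fall into the same group, so the realized group sizes can differ slightly from the rank-based counts.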
Step 6: Create a User-Defined Variable
Now use the information from Step 5 to create the user-defined variable:
- Return to the rate session you created in step 1.
- Open the Data Dictionary by clicking its button on the toolbar.
- Expand the "County Attributes ACS - 2014-2018" folder and highlight % < High school education ACS 2014-2018.
- Click the Create... button.
- In the Name field, edit the variable name to read, "% < HS Educ ACS 2014-2018 (non-weighted quints)".
- Delete the existing groupings in the Groupings box on the left by selecting each grouping and clicking the Delete button.
- In the box marked Unlabeled Values, enter each quintile's grouping in the Selected textbox as follows:
  - For the 1st quintile, type "00118-00798" in the textbox, and then click Add.
  - The grouping you entered will be added to the groupings box. Change its name to "First Quintile (1.18%-7.98%)".
  - Repeat these instructions for each remaining quintile with the following information:
    - "Second Quintile (7.99%-10.65%)": values 00799-01065
    - "Third Quintile (10.66%-13.90%)": values 01066-01390
    - "Fourth Quintile (13.91%-18.68%)": values 01391-01868
    - "Fifth Quintile (18.69%-66.34%)": values 01869-06634
- Click OK to save the user-defined variable.
- Click Close to exit the data dictionary.
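Conceptually, the user-defined variable is a lookup from a county's coded percentage to a quintile label. The sketch below mirrors the groupings entered in this step; it is an illustration outside SEER*Stat, and the names `QUINTILES` and `quintile_label` are hypothetical.

```python
# (label, lowest code, highest code), taken from the groupings in Step 6.
QUINTILES = [
    ("First Quintile (1.18%-7.98%)", 118, 798),
    ("Second Quintile (7.99%-10.65%)", 799, 1065),
    ("Third Quintile (10.66%-13.90%)", 1066, 1390),
    ("Fourth Quintile (13.91%-18.68%)", 1391, 1868),
    ("Fifth Quintile (18.69%-66.34%)", 1869, 6634),
]


def quintile_label(code):
    """Map a five-digit code such as "00798" to its quintile label.

    Returns None when the value lies outside every grouping.
    """
    value = int(code)
    for label, low, high in QUINTILES:
        if low <= value <= high:
            return label
    return None


print(quintile_label("00798"))  # last value of the first grouping
print(quintile_label("00799"))  # first value of the second grouping
```

Checking the boundary values this way is a quick guard against gaps or overlaps between adjacent groupings.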
Step 7: Set Table Variables (Table Tab)
Use the Table Tab to choose variables to include in the output matrix. For this exercise, you want to show the incidence rate for malignant cervical cancers by county quintiles for the percentage of the population with less than a high school diploma.
- On the Table Tab, the variables are listed in categories in the Available Variables box at the bottom of the screen.
- Use the "+" to expand the "User-Defined" category.
- Select the new variable, "% < HS Educ ACS 2014-2018 (non-weighted quints)".
- Click Row on the right-hand side of the screen to add this variable to the row dimension in the list of Display Variables at the top of the window.
Step 8: Specify a Title (Output Tab)
- Move to the Output Tab.
- Enter the following title:
Age-Adjusted Incidence Rates for Female Cervical Cancer, SEER 21 Registries, 2014-2018
By Non-Weighted Quintiles (Based on Total U.S. Counties) of Percentage of Population (Ages 25+) without a High School Degree or Equivalent (ACS 2014-2018)
Rate Exercise 5
Step 9: Execute SEER*Stat and Save the Matrix
- At this point, you have made all the necessary selections on the session tabs. Use the Execute button on the toolbar or select Execute from the Session menu to execute the session.
- A new window will be opened containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. You can print the matrix, export the results to a text file, and copy-and-paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
- Use the Save As command on the File menu to save the matrix. Enter "Rate Exercise 5" as the filename. SEER*Stat will assign the "sim" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
- Compare your results to this SEER*Stat matrix file: Rate Exercise 5 Matrix Results.