Use a frequency session to learn the basic concepts needed to work through the options in all session types.
SEER*Stat has several session types, each designed for calculating specific statistics: Frequency, Rate, Survival, Limited-Duration Prevalence, MP-SIR, and Case Listing. This tutorial explores Frequency Sessions and basic session concepts.
Exercise
Create a table showing the frequency of lung and bronchus cancer by stage at diagnosis (use the "Combined Summary Stage (2004+)" variable). Include cases diagnosed in the SEER 22 registries from 2000 through 2021 and exclude cases with unknown age. Show percent cases per stage in a column of the table.
When you finish this exercise, save the matrix as "frequency_exercise1a.sfm". The saved matrix file will be used in Frequency Exercise 1b.
Key Points
- When creating a new session, the database must be selected first. The correct database must be selected in all SEER*Stat sessions so that the correct variables are available throughout the session, including in selection statements, as trend variables, and on the Table options.
- Once you have selected the database, it is recommended that you work through the sidebar menu options in order from top to bottom (Statistic, Selection, Table, and then Output) and work from top to bottom on each option set.
- When using a variable in a table, the format labels (groupings) shown are determined by the variable's definition in the database. As you will see, the databases provided with SEER*Stat contain variables that are formatted for common use. You will often need to add or remove groupings for your particular analysis.
Instructions
Step 1: Create a New Frequency Session and Select a Database
- Start SEER*Stat.
- Select the Frequency button from the New Session menu toolbar.
- On the Select Database dialog, select "Incidence - SEER Research Limited-field Data, 22 Registries, Nov 2023 Sub (2000-2021)" and press the OK button. To find the database quickly, copy and paste the database name, or start typing it, into the filter at the top of the Select Database dialog.
- Press the OK button on the Linked Database Change warning, if it opens.
- The variables shown in the dictionary and throughout the session are the variables in the database selected on the Select Database dialog.
- The databases listed on the Select Database dialog are the databases that are appropriate for frequency sessions and also are available from the Data Locations defined in the Preferences of the selected Profile. There are two types of data locations, server and local:
- A server data location is the address of a dedicated computer that stores databases and can perform SEER*Stat analyses.
- A local data location is the address of a directory on your computer or local network in which SEER*Stat databases are stored.
- Suggested Citations:
- Data Citation: The citation for each database provided by SEER should include information about the data submission and release date. A suggested citation for each database can be viewed at the bottom of the Data sidebar menu options and is included in print-outs of sessions and results.
- Citation for SEER*Stat Software: Version information should be included in the citation for the SEER*Stat software. To view the software citation, select the Citations button from the Help menu.
- Check "Do not show this message in the future" on a warning prompt to hide that warning in future sessions. All warnings can be turned back on from the Profile Preferences.
- For more information regarding the linked database change warning, select the For More Information... link on the warning prompt.
Step 2: Choose the Statistics to Display
- Select Statistic from the sidebar menu to view statistic options.
- In the Percentages box, select Column.
- It can be difficult to remember whether you want Column Percentages or Row Percentages. Here is a tip that may help: column percentages sum to 100 down each column, and row percentages sum to 100 across each row, so selecting Column Percentages generates the percent distribution of the row variable. Later in this exercise stage will be defined as a row variable; therefore, the stage distribution must be generated using Column Percentages. As a rule of thumb, put your percentages in one dimension of the table and the variable in the other.
- Percentages can only be calculated on variables with unique groupings, that is, the values in the groupings cannot overlap. For example, to calculate the percent of cases for males versus females, you would have to create a variable that does not include the "Males and Females" combined grouping.
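The column-versus-row distinction can be illustrated outside SEER*Stat with a short Python sketch. The counts below are made up for illustration (they are not SEER data), the second variable (sex) is a hypothetical column variable that this exercise does not use, and the function names are assumptions of this example, not part of SEER*Stat.

```python
# Toy counts only; these are NOT SEER data.
# Rows are stage groupings; columns are a hypothetical second variable (sex).
counts = {
    "Localized": {"Male": 20, "Female": 30},
    "Regional": {"Male": 30, "Female": 20},
    "Distant": {"Male": 50, "Female": 50},
}


def column_percentages(table):
    """Express each cell as a percent of its column total (columns sum to 100)."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100.0 * v / totals[c] for c, v in row.items()}
        for r, row in table.items()
    }


def row_percentages(table):
    """Express each cell as a percent of its row total (rows sum to 100)."""
    return {
        r: {c: 100.0 * v / sum(row.values()) for c, v in row.items()}
        for r, row in table.items()
    }


col_pct = column_percentages(counts)
row_pct = row_percentages(counts)

# Column percentages describe how stage is distributed within each column.
print(col_pct["Localized"]["Male"])
# Row percentages describe how each stage splits across the columns.
print(row_pct["Localized"]["Male"])
```

With stage as the row variable, the column percentages give the stage distribution within each column, which is the quantity this exercise asks for.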
Step 3: Define the Analysis Cohort Selection
When using the Selection options to define the analysis cohort it is important to note that:
- There are two basic mechanisms for making selections: adding selection statements and checking standard options. Selection statements reduce the number of records included in an analysis based on specific variables. Selection options are checkboxes that implement commonly used selection statements. If no selections are made, your analysis cohort will include every case in the database (all cases in the 22 SEER registries for 2000-2021). In this exercise, we want the frequency of lung cancer cases of known age in the research data (regardless of behavior). Therefore, we need to add a statement selecting lung cancer cases and turn off the default behavior option in the Select Only box.
- Do not add a selection statement for every variable. The exercise statement specifies that the results be shown by stage for specific years of diagnosis and registries. The number of records analyzed is not affected by any of those variables. This analysis will include records for all stages -- the table will be shown by stage based on settings you will make in the Table selections in a later step. This analysis will also include data for all years of diagnosis and all SEER registries in the selected database. That is, the years of diagnosis and SEER registries were determined by the selected database.
- Choose Selection from the sidebar menu to view selection options.
- In the Select Only box, uncheck the Malignant Behavior option and make sure the Known Age option is checked.
- Use the Edit button to open the Case Selection dialog where you create a selection statement by adding selection lines. Only one selection line is needed for this exercise.
- Select the New Line button to open the Case Selection Line dialog. Available variables are listed in categories in the Variable box on the top left of the screen.
- In the Variable box, use the "+" to expand the "Site and Morphology" category.
- Select "Site recode ICD-O-3/WHO 2008".
- Moving to the center of the window, check that "is = to" is selected as the Operator.
- Scroll through the items in the Values box until you find and select "Lung and Bronchus".
- At this time, the following should appear in the Selection Statement Line box at the bottom of the dialog:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'
- Use the OK button to close the Case Selection Line dialog. The one-line selection statement is shown on the Case Selection dialog, where additional lines could be added if necessary.
- Select the OK button to close the Case Selection dialog.
- Advanced Selection features, including Select Only the First Matching Record for Each Person and Use Person Selection are described in the Frequency Selection section.
- Full Boolean logic can be used to create complex selection statements as described in the Selection Statement section.
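Conceptually, the Select Only options and the selection statement act together as a filter over the records in the database. The following Python sketch shows the idea under stated assumptions: the Case record, its field names, and the sample records are hypothetical and made up for illustration. SEER*Stat itself is operated through its dialogs, not through code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    """Hypothetical stand-in for one database record (field names are assumed)."""
    site: str
    age: Optional[int]  # None stands in for unknown age
    behavior: str
    stage: str


# Made-up records for illustration; these are NOT SEER data.
cases = [
    Case("Lung and Bronchus", 67, "Malignant", "Distant"),
    Case("Lung and Bronchus", None, "Malignant", "Regional"),  # unknown age
    Case("Lung and Bronchus", 54, "In situ", "In situ"),  # non-malignant behavior
    Case("Breast", 49, "Malignant", "Localized"),  # different site
    Case("Lung and Bronchus", 72, "Malignant", "Localized"),
]


def in_cohort(case: Case) -> bool:
    """Mirror the settings in this step.

    - Known Age checked: drop records with unknown age.
    - Malignant Behavior unchecked: behavior is not tested at all.
    - One selection line: site equals "Lung and Bronchus".
    Stage is deliberately not tested; it is a table variable, not a selection.
    """
    return case.site == "Lung and Bronchus" and case.age is not None


cohort = [c for c in cases if in_cohort(c)]
print(len(cohort), "of", len(cases), "records selected")
```

Note that the filter never looks at stage, year of diagnosis, or registry, which matches the advice above not to add a selection statement for every variable.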
Step 4: Set Table Variables
To successfully use SEER*Stat, you must understand the purpose of the Selection versus the Table options. These are often confused. Before you continue, please consider the following:
- The Selection options are used to reduce the number of records included in an analysis based on specific variables. In this exercise, the number of records is reduced by cancer site in order to analyze only lung cancer cases.
- The Table options are used to choose the variables to be displayed and do not affect the number of records analyzed in any way. In this exercise the frequencies are to be shown by stage. Therefore, the stage variable needs to be used as a row variable. The variables are listed in categories in the Available Variables box at the bottom of the screen.
- Select Table from the sidebar menu.
- Use the "+" to expand the "Stage - Summary/Historic" category.
- Select "Combined Summary Stage (2004+)".
- Click the Row button.
- At this time, the "Combined Summary Stage (2004+)" should be listed as a row variable in the Display Variables box at the top of the window.
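The difference between the Selection and Table options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of stage labels is a made-up, already-selected cohort (not SEER data), and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical cohort after the Selection step: one stage label per record.
cohort_stages = [
    "Localized", "Distant", "Distant", "Regional", "Blank(s)", "Distant",
]

# Table step: group the cohort by the row variable. No record is dropped.
frequency = Counter(cohort_stages)
total = sum(frequency.values())
assert total == len(cohort_stages)

# With a single count column, column percentages give the stage distribution.
percent = {stage: 100.0 * n / total for stage, n in frequency.items()}

# By contrast, a selection on stage would shrink the cohort itself.
distant_only = [s for s in cohort_stages if s == "Distant"]

for stage, n in frequency.items():
    print(f"{stage}: {n} ({percent[stage]:.1f}%)")
```

Grouping by stage keeps all six toy records and only changes how they are displayed, whereas filtering on stage leaves a smaller cohort.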
Step 5: Specify an Output Title
Additional Output features will be explored in later exercises.
- Select Output from the sidebar menu.
- Enter the following title:
Lung and Bronchus Cancer
Stage Distribution SEER 22
Years 2000-2021
Frequency Exercise 1a
Step 6: Execute SEER*Stat
- Select the Execute button from the Actions menu.
- A variable warning dialog will appear indicating that caution should be exercised when using the "Combined Summary Stage (2004+)" variable. We will address this warning in Frequency Exercise 1b. For now, click the OK button to continue.
- A dialog will display the progress of the job. When the job completes a new window will open containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. However, you do have the ability to print the matrix, export the results to a text file, and copy-and-paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
If you are running SEER*Stat in client-server mode you will also have the option to Execute Remotely. This feature is intended for jobs that take several minutes or longer to complete. SEER*Stat automatically sends an e-mail message to you as soon as the execution is complete. The e-mail message includes a link to the location of the results matrix.
Step 7: Save the Matrix
- You may want to widen the column of labels so that all text is shown. Put your cursor over the column's right border then click and drag to resize.
- Use the Save As command on the File menu to save the matrix.
- In the Save As options, select the Browse button or select a Recent Folder. The Save As dialog opens.
- Select the save location and enter "Frequency_Exercise1a" as the File name and press the Save button. SEER*Stat will assign the "sfm" extension to indicate that this is a "SEER*Stat Frequency Matrix" file.
Step 8: Print the Matrix and Check the Results
- Use the Print command on the File menu to print the matrix. You will notice that the session information is printed along with the results matrix. The session information serves as documentation for the table. Notice that the name of the matrix file is printed at the top of each page. For more information see the matrix options and print options topics.
- Compare your results to this SEER*Stat matrix file: Exercise 1a Results Matrix (sfm, 20.1 KB). Download the file and use Open from the File menu to open the example matrix.
- Notice the row labeled "Blank(s)". This grouping (format) was included in the "Combined Summary Stage (2004+)" variable definition for cases that cannot be staged for certain year ranges. The results also include a row for "N/A" that is not applicable to this cancer site. In Exercise 1b, you will learn more about the cases included in these rows.