SEER*Stat Frequency Exercise 1a: Introduction to SEER*Stat

The SEER*Stat program has several session types, each designed for specific calculations. The frequency session was designed to generate the number of records stratified by any variable in a database; the rate session was designed to calculate disease incidence and mortality rates; there are separate sessions designed specifically for survival, prevalence, and MP-SIR statistics; and the case listing session was designed to view the values of variables for individual cases (records). In all sessions, you must set options on tabs shown in the program's interface.

This exercise uses a frequency session to introduce the basic concepts needed to work through the tabs in all sessions.

Create a table showing the frequency of lung and bronchus cancer by stage at diagnosis (use the "Combined Summary Stage (2004+)" variable). Include cases diagnosed in the SEER 22 registries from 2000 through 2021 and exclude cases with unknown age. Show the percentage of cases in each stage in a column of the table.

When you finish this exercise, save the matrix as "Frequency_Exercise1a.sfm". The saved matrix file will be used in Frequency Exercise 1b.

Key Points

When creating a new session, the database must be selected first (Data Tab). The correct database must be selected in all SEER*Stat sessions so that the correct variables are available throughout the session, including in selection statements, as trend variables, and on the Table tab.
Once you have selected the database, it is recommended that you work through the remaining tabs in order from left to right (Statistic Tab, Selection Tab, Table Tab, Output Tab) and work from top to bottom on each tab.
When using a variable in a table, the format labels (groupings) shown are determined by the variable's definition in the database. As you will see, the databases provided with SEER*Stat contain variables that are formatted for common use. You will often need to add or remove groupings for your particular analysis.

Step 1: Create a New Frequency Session

Start SEER*Stat.
Start a new Frequency Session from the File menu or toolbar.

Learn More...

Each SEER*Stat session type is described in detail in the help system. Look for "Frequency Session", "Rate Session", "Survival Session", "Limited-Duration Prevalence Session", "MP-SIR Session", and "Case Listing Session" in the help system.

Step 2: Select a Database (Data Tab)

On the Data Tab, select "Incidence - SEER Research Data, 22 Registries, Nov 2023 Sub (2000-2021)".

Learn More...

The variables shown in the dictionary and throughout the session are the variables in the database selected on the Data Tab.

The only databases listed on the Data Tab are those that are appropriate for frequency sessions and available from the Data Locations defined in the Preferences of the selected Profile. There are two types of data locations, server and local:

A server data location is the address of a dedicated computer that stores databases and can perform SEER*Stat analyses.

A local data location is the address of a directory on your computer or local network in which SEER*Stat databases are stored.

Suggested Citations

Data Citation: The citation for each database provided by SEER should include information about the data submission and release date. A suggested citation for each database can be viewed at the bottom of the Data Tab and is included in printouts of sessions and results.
Citation for SEER*Stat Software: Version information should be included in the citation for the SEER*Stat software. To view the suggested citation, select Suggested Citations from the Help menu in SEER*Stat.

Step 3: Choose the Statistics to Display (Statistic Tab)

Move to the Statistic Tab.
In the Statistic box, select Frequencies.
In the Percentages box, select Column.

Learn More...

Percentages can only be selected if Frequencies are chosen.

It can be difficult to remember whether you want Column Percentages or Row Percentages. Here is a tip that may help: Column Percentages sum to 100 down each column, so they give the distribution of the row variable; Row Percentages sum to 100 across each row, so they give the distribution of the column variable. Later in this exercise, stage will be defined as a row variable; therefore, the stage distribution must be generated using Column Percentages. As a rule of thumb, put the variable whose distribution you want in one dimension of the table and sum the percentages along the other: a row variable needs Column Percentages, and a column variable needs Row Percentages.
Percentages can only be calculated on variables with unique groupings, that is, the values in the groupings cannot overlap. For example, to calculate the percent of cases for males versus females, you would have to create a variable that does not include the "Males and Females" combined grouping.
Parameters are only enabled and can be selected if Trends are chosen as the statistic.
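To see why Column Percentages give the stage distribution when stage is a row variable, and why overlapping groupings are not allowed, here is a minimal Python sketch of the arithmetic. It is only an illustration and not how SEER*Stat is implemented; the counts and the `column_percentages` helper are hypothetical.

```python
# Hypothetical stage counts for one column of a frequency table (not SEER data).
stage_counts = {
    "Localized": 120,
    "Regional": 180,
    "Distant": 500,
    "Unknown/unstaged": 200,
}

def column_percentages(column: dict[str, int]) -> dict[str, float]:
    """Return each row's share of the column total, in percent."""
    total = sum(column.values())
    return {label: 100.0 * n / total for label, n in column.items()}

stage_pct = column_percentages(stage_counts)
for label, pct in stage_pct.items():
    print(f"{label:<18}{stage_counts[label]:>6}{pct:>8.1f}%")
print(f"{'Total':<18}{sum(stage_counts.values()):>6}{sum(stage_pct.values()):>8.1f}%")

# Overlapping groupings: the combined row is counted in the denominator too,
# so every share is understated (males show 27.5% instead of 55.0%).
with_overlap = column_percentages({"Males and Females": 1000, "Male": 550, "Female": 450})
without_overlap = column_percentages({"Male": 550, "Female": 450})
print(with_overlap["Male"], without_overlap["Male"])  # 27.5 55.0
```

The column sums to 100 because each row is divided by the same column total, which is only meaningful when every record falls into exactly one grouping.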

Step 4: Defining the Analysis Cohort (Selection Tab)

Move to the Selection Tab.
Before you continue, please take time to consider the following:
1. There are two basic mechanisms for making selections: adding selection statements and checking standard options. Selection statements reduce the number of records included in an analysis based on specific variables. Selection options are checkboxes that implement commonly used selection statements. If no selections are made, your analysis cohort will include every case in the database (all cases in the 22 SEER registries for 2000-2021). In this exercise, we want the frequency of lung cancer cases of known age in the research data, regardless of behavior. Therefore, we need a statement selecting lung cancer cases, and we must turn off the default behavior option in the Select Only box.
2. Do not add a selection statement for every variable. The Problem Statement specifies that the results be shown by stage, for specific years of diagnosis, and for specific registries, but none of these requires a selection statement. This analysis will include records for all stages; the table will be broken out by stage through settings you will make on the Table Tab in a later step. This analysis will also include data for all years of diagnosis and all SEER registries in the selected database, because the years of diagnosis and the SEER registries are already determined by the database selected on the Data Tab.
In the Select Only box at the top of the Selection Tab, uncheck the "Malignant Behavior" option and make sure the "Known Age" option is checked.
Use the edit button to open the Case Selection window.
Using the controls at the top of the Case Selection window, you will create a selection statement. The variables are listed in categories in the Variable box on the top left of the screen.
In the Variable box, use the "+" to expand the "Site and Morphology" category.
Select "Site recode ICD-O-3/WHO 2008".
Moving to the center of the window, check to see that "is = to" is selected as the Operator.
Scroll through the items in the Values box until you find and select "Lung and Bronchus".
At this time, the following should appear in the Selection Statement box at the bottom of the window:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'.
Use the OK button to close the Case Selection window.

Learn More...

Advanced features of the Selection Tab, including "Select Only the First Matching Record for Each Person" and full "Person Selection" are described in the Frequency Selection Tab section of the SEER*Stat help system.
Full Boolean logic can be used to create complex selection statements as described in the Selection Statement Dialog section of the help system.
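The selection in this step behaves like a filter applied to every record before anything is tabulated. The Python sketch below models that idea with a hypothetical `Record` type, made-up records, and a hypothetical `selected` predicate; the field names are illustrative rather than SEER*Stat variable names, and the sketch does not reproduce SEER*Stat's selection engine.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    # Hypothetical fields for illustration; not SEER*Stat variable names.
    site: str
    behavior: str          # e.g. "Malignant" or "In situ"
    age: Optional[int]     # None models an unknown age
    stage: str

def selected(r: Record, malignant_only: bool = False, known_age: bool = True) -> bool:
    """Model the exercise: a site statement plus the two Select Only checkboxes."""
    if r.site != "Lung and Bronchus":          # the selection statement
        return False
    if known_age and r.age is None:            # "Known Age" checked
        return False
    if malignant_only and r.behavior != "Malignant":  # "Malignant Behavior" (unchecked here)
        return False
    return True

records = [
    Record("Lung and Bronchus", "Malignant", 67, "Distant"),
    Record("Lung and Bronchus", "In situ", 58, "In situ"),
    Record("Lung and Bronchus", "Malignant", None, "Regional"),
    Record("Breast", "Malignant", 49, "Localized"),
    Record("Lung and Bronchus", "Malignant", 72, "Localized"),
]

cohort = [r for r in records if selected(r)]
malignant_only_count = sum(1 for r in records if selected(r, malignant_only=True))
print(len(cohort), malignant_only_count)  # 3 2
```

Leaving "Malignant Behavior" unchecked keeps the in situ lung case in the cohort; checking it would drop that case and reduce the count from 3 to 2.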

Step 5: Set Table Variables (Table Tab)

Move to the Table Tab.
To successfully use SEER*Stat, you must understand the purpose of the Selection Tab versus the Table Tab. These are often confused. Before you continue, please consider the following:
- The Table Tab is used to choose the variables to be displayed and does not affect the number of records analyzed in any way. In this exercise the frequencies are to be shown by stage. Therefore, the stage variable needs to be used as a row variable.
- The Selection Tab is used to reduce the number of records included in an analysis based on specific variables. In this exercise, the number of records is reduced by cancer site in order to analyze only lung cancer cases.
The variables are listed in categories in the Available Variables box at the bottom of the screen.
Use the "+" to expand the "Stage - Summary/Historic" category.
Select "Combined Summary Stage (2004+)".
Click Row on the right-hand side of the screen.
The "Combined Summary Stage (2004+)" variable should now be listed as a row variable in the Display Variables box at the top of the window.
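The difference between the Selection Tab and the Table Tab can be sketched in Python: a table variable only regroups records that have already been selected, so the grouped counts always add back up to the size of the selected cohort. The stage values and the `frequency_table` helper are hypothetical and for illustration only.

```python
from collections import Counter

# Hypothetical stage values for a cohort that the Selection Tab has already reduced.
cohort_stages = [
    "Distant", "Regional", "Distant", "Localized",
    "Unknown/unstaged", "Distant", "Regional",
]

def frequency_table(values: list[str]) -> dict[str, int]:
    """Count records per grouping of a single row variable."""
    return dict(Counter(values))

table = frequency_table(cohort_stages)
for label, n in table.items():
    print(f"{label:<18}{n:>4}")

# A row variable splits the cohort into groupings; it never adds or drops records.
assert sum(table.values()) == len(cohort_stages)
```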

Step 6: Specify a Title (Output Tab)

Move to the Output Tab.
Enter the following title:

Lung and Bronchus Cancer
Stage Distribution SEER 22
Years 2000-2021
Frequency Exercise 1a

Learn More...

The other features of the Output Tab will be explored in later exercises. In addition, more information is available in the Output Tab section of the SEER*Stat help system.

Step 7: Execute SEER*Stat

Select Execute from the Session menu or toolbar to execute the session. (A third option, Execute Offline, is described in the Learn More section below.)
A variable warning dialog will appear indicating that caution should be exercised when using the "Combined Summary Stage (2004+)" variable. We will address this warning in Exercise 1b. For now, click OK to continue.
A dialog will display the progress of the job. When the job completes, a new window will open containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. However, you can print the matrix, export the results to a text file, and copy and paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
Notice that only the first line of the title (as entered on the Output Tab) appears on the title bar. The entire title will appear on printouts.

Learn More...

If you are running SEER*Stat in client-server mode, you will also have the option to Execute Offline. This feature is intended for jobs that take several minutes or longer to complete. SEER*Stat automatically sends an e-mail message to you as soon as the execution is complete. The e-mail message includes a link to the location of the results matrix.

Step 8: Save the Matrix

You may want to widen the column of labels so that all text is shown. Put your cursor over the column's right border, then click and drag to resize.
Use the Save As command on the File menu to save the matrix. Enter "Frequency_Exercise1a" as the filename. SEER*Stat will assign the "sfm" extension to indicate that this is a "SEER*Stat Frequency Matrix" file.

Step 9: Print the Matrix and Check the Results

Use the Print command on the File menu to print the matrix. You will notice that the session information is printed along with the results matrix. The session information serves as documentation for the table. Notice that the name of the matrix file is printed at the top of each page. More information regarding the matrix options and print options is included in the SEER*Stat help system.
Compare your results to this SEER*Stat matrix file: Exercise 1a Results Matrix.
Notice the row labeled "Blank(s)". This grouping (format) was included in the "Combined Summary Stage (2004+)" variable definition for cases that cannot be staged for certain year ranges. The results also include rows for "N/A" and "Not coded- Testis" that are not applicable to this cancer site. In Exercise 1b, you will learn more about the cases included in these rows.