Many registries around the world collect only a more limited set of variables for incidence and survival analysis. When this is the case, use the Database Description file for the Global format instead of the one for the NAACCR format. The steps to create a SEER*Stat database are the same as those described in Example 1.
In this example, we will keep the same data but exclude race. This variable is not one of the default variables in the Global format because many registries do not collect it. However, you can always create a user-specified variable for race, which may or may not be linked to the population file.
Step 1: Get Detailed Descriptions of the Case and Population Input Files
- Start SEER*Prep.
- Open the Database Description (DD) file distributed with the software. Select Open from the File menu to select the file. The name of this file is "global334.d11282022.dd" or something similar (if an update is released, the date embedded in the filename will differ).
- Once you select the DD file, SEER*Prep will load information for each variable into the box on the right side of the window. Initially, the list will be sorted by the variable location in the incidence data file (note the Case Start Col column). Click the Pop Start Col column header. The list should now be sorted by the variable location in the population data file (case-only variables will have a blank entry in this column).
- Set the variables used to link the case data with the population data. Since this example assumes your data are for 19 age groups, the age variable should be "Age recode with < 1 year olds."
- Select a specific variable and open the Edit Variable window by double-clicking or using the Edit button. The Edit window contains a complete description of the variable including its valid values. Edit the "Basis of diagnosis" variable to view an example.
- Variables can be deleted or added to match what you have in your database. Use the user-specified options to add variables that are not part of the default list. A user-specified variable can be between 1 and 6 bytes long.
- Select Generate Input File Description from the File menu to create a text file containing detailed format information for the case and population files.
Step 2: Prepare your Incidence Data Files
- Using software other than SEER*Prep, create an incidence data file according to the Database Description file specified in Step 1. The name of the file, the record length, and the variable formats must adhere to the rules described in Input File Formats. Note: you may store the data in more than one file. SEER*Prep will process the data files sequentially and combine the data into one SEER*Stat database.
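As one way to approach this step, the following Python sketch writes fixed-width records. The variable names, column positions, record length, and padding rule are all hypothetical placeholders. The real values come from the Input File Description report generated in Step 1 and from the rules in Input File Formats.

```python
# Hypothetical layout: (variable name, 1-based start column, length).
# Replace these with the values from the Input File Description report.
LAYOUT = [
    ("year_of_diagnosis", 1, 4),
    ("sex", 5, 1),
    ("age_recode", 6, 2),
]
RECORD_LENGTH = 7  # assumed total record length for this toy layout


def format_record(values):
    """Return one fixed-width line built from a dict of coded string values."""
    columns = [" "] * RECORD_LENGTH
    for name, start, length in LAYOUT:
        text = str(values[name])
        if len(text) > length:
            raise ValueError(f"{name}={text!r} does not fit in {length} columns")
        # Left-justify and pad with blanks (the padding rule is an assumption).
        padded = text.ljust(length)
        columns[start - 1:start - 1 + length] = padded
    return "".join(columns)


def write_case_file(path, records):
    """Write every record as one line of the incidence data file."""
    with open(path, "w", newline="\n") as handle:
        for record in records:
            handle.write(format_record(record) + "\n")
```

Calling `write_case_file` once per output path is one way to split the data across several files, which SEER*Prep then combines into a single database.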
Step 3: Prepare your Population Data Files
- Create population data file(s) that meet the criteria documented in the report created in Step 1. The filename(s), record length, and variable formats must also adhere to the rules described in Input File Formats.
- When making this file, be sure to include only the appropriate populations. For this example, the population data file should contain only records with populations for females in the state of Maryland for the years 2001-2018. If male populations or populations for additional years are included, extra care will be required when using SEER*Stat with this database to prevent the generation of misleading statistics.
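The restriction described above could be applied with a small filter before the population file is written. This is only a sketch: the column positions and the sex code "2" for females are assumptions made for illustration, and "24" is the FIPS code for Maryland. Check the report from Step 1 for the actual layout and codes.

```python
# Hypothetical column positions (0-based slices) in a population record:
#   year = columns 1-4, state FIPS = columns 5-6, sex = column 7.
FIRST_YEAR = 2001
LAST_YEAR = 2018
MARYLAND_FIPS = "24"
FEMALE_CODE = "2"  # assumed sex code for females


def keep_population_record(line):
    """Return True if the record is for Maryland females, 2001-2018."""
    year = int(line[0:4])
    state = line[4:6]
    sex = line[6:7]
    return (
        FIRST_YEAR <= year <= LAST_YEAR
        and state == MARYLAND_FIPS
        and sex == FEMALE_CODE
    )


def filter_population(lines):
    """Keep only the records that belong in this example's population file."""
    return [line for line in lines if keep_population_record(line)]
```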
Step 4: Create a Database Description File for Your Database
The Database Description for the Global format is meant to be used as a template. Follow these steps to create one containing the exact specifications for your database:
- Start SEER*Prep.
- Reopen the DD file used in Step 1.
- Add your incidence file or files to the Input Case Files control.
- Add your population file or files to the Input Population Files control.
- Provide a name for the SEER*Stat database to be created. Edit the text in the Database Name control. The name entered here will be shown in the list of databases on SEER*Stat's Data Tab.
- Press the Edit button next to the label "Study Cutoff Date for Survival". Enter the month and year when the study ended. This date is used to create several variables that SEER*Stat needs to perform survival analysis. If your incidence data does not contain follow-up information or if you are not interested in survival analysis with SEER*Stat, enter December of the latest year of diagnosis in your input incidence file (for this example, December 2018).
- Use the Save As function on the File menu to save the Database Description with a new name.
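When no follow-up information is available, the cutoff rule described above (December of the latest year of diagnosis) can be worked out from the incidence data. A minimal sketch, in which `default_cutoff` is a hypothetical helper name and the years are assumed to be already extracted as integers:

```python
def default_cutoff(diagnosis_years):
    """Return (month, year) for December of the latest year of diagnosis."""
    latest_year = max(diagnosis_years)
    return (12, latest_year)
```

For this example's data, which run through 2018, the function returns `(12, 2018)`, that is, December 2018.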
Step 5: Verify your Data Files
- Using the checkmark on the toolbar or Verify Data from the Execute menu, create a Verify Report. SEER*Prep will generate a one-way frequency of every variable in your incidence and population files.
- Review the Verify Report and resolve any issues identified in the report.
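To spot-check a file before running Verify Data, a one-way frequency for a single variable can be approximated outside SEER*Prep. The sketch below is not SEER*Prep's implementation. It assumes fixed-width records, and the start column and length passed in are placeholders for the values in your Input File Description.

```python
from collections import Counter


def one_way_frequency(lines, start, length):
    """Count how often each value occurs in one fixed-width column range.

    start is the 1-based start column; length is the width in columns.
    """
    counts = Counter()
    for line in lines:
        value = line[start - 1:start - 1 + length]
        counts[value] += 1
    return counts
```

Unexpected values in the result (for example, a sex code that should not be present) point to records that need correcting before the database is created.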
Step 6: Create Database
- Click the lightning bolt on the toolbar or select Create Database from the Execute menu.
- Use the default for record exclusions (see SEER*Prep help system for more information).
- Enter a name for the Create Report. SEER*Prep will now create a SEER*Stat database. This database will contain your data converted to a binary format, indices, and dictionaries (format libraries) for SEER*Stat.
- Review the Create Report, paying particular attention to any Notes/Warnings. These identify potential mismatches between your incidence and population files. For example, if your population file contained information for males, you would get a warning, since your incidence data is only for females.
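The kind of mismatch described above can also be checked by hand for any linked variable. The following sketch lists the values that appear in the population data but never in the incidence data. It is an illustration of the idea only, not how SEER*Prep produces its warnings, and `unmatched_values` is a hypothetical helper name.

```python
def unmatched_values(case_values, population_values):
    """Return population values that never occur in the incidence data."""
    return sorted(set(population_values) - set(case_values))
```

If the incidence file contains only the sex code "2" while the population file contains "1" and "2", the function returns `["1"]`, which corresponds to the male-population warning in the example above (assuming "1" is the code for males).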
Step 7: Use Your New Database in SEER*Stat
- Exit SEER*Prep and start SEER*Stat. Your new database will be available in the list of databases on the Data Tab (for the appropriate sessions).