

This database is available only to U.S.-based researchers affiliated with U.S. institutions.

SEER registries in Georgia (GA) and California (CA) linked all incident tumor cases diagnosed from 2013 to 2019 to results of genetic tests performed from 2012 to 2021. Results were provided by the three genetic testing laboratories (Ambry Genetics, Invitae/Labcorp Genetics, and Myriad Genetics) that conduct most of the genetic testing in the two states. This pilot project was conducted under an Institutional Review Board (IRB)-approved protocol at each registry. Of the 1,584,923 cancer patients in the registry cohort, 9.1% were linked to genetic test results. Approximate counts of linked tumors by primary site are provided in Tumor Counts Linked to Genetic Test Data (XLSX).

Access Requirements

  • You must already have access to the latest SEER Research Plus Data before a specialized data request can be submitted. Refer to How to Request Data Access for more information.
  • Provide the purpose and analytical plan in your request.
    • Include a section on how data will be protected.
  • Each request will first be reviewed by NCI SEER staff for provisional approval.
  • A provisionally approved request must then be reviewed by the National Cancer Institute's central Institutional Review Board (cIRB). The data requestor will receive detailed instructions on how to apply to the cIRB in the email notification of provisional approval.
  • Depending on which version of the genetic data file is requested, approval from the California Cancer Registry (CCR) IRB may also be required.

Database Details

There are two Genetic Testing Linkage Database versions available to request:

  • The first version has only the year of the sample accession date and the report date. Release of this version requires only NCI cIRB review.
  • The second version has the month and year of the sample accession date and the report date. Release of this version requires both California Cancer Registry (CCR) IRB review and NCI cIRB review.

For both versions of the database, the available data consist of two files. Both files contain a field, genelinkID, which is a masked patient ID that can be used to link records across the two files.

  1. GA-CA SEER Research Plus (2000-2021 diagnosis years)
  2. Genetic test results data file which has the following fields:
    • Gene name (approximately 100 genes)
    • Gene status (reported categorically):
      • Normal
      • Pathogenic variant
      • Variant of unknown significance
    • Accession date (month and year or only year)
    • Report date (month and year or only year)

Genes are reported individually if they were tested by at least two of the three laboratories. Genes tested by a single laboratory were collapsed into an "other" gene category. Download the data dictionary for the genetic data [XLSX].
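As a rough illustration of how the two files relate, the sketch below joins tumor records to genetic test results on genelinkID. Only the genelinkID field name comes from the description above. All other field names (primary_site, year_dx, gene, status, report_year), the ID values, and the sample rows are hypothetical placeholders; the actual names and codes are defined in the data dictionary. Because genelinkID identifies a patient, one patient may have several tumor records and several test results, so the join is treated as many-to-many.

```python
from collections import defaultdict

# Hypothetical sample rows. Only "genelinkID" is a documented field name;
# every other key and value here is an illustrative assumption.
tumor_rows = [
    {"genelinkID": "A001", "primary_site": "Breast", "year_dx": 2015},
    {"genelinkID": "A002", "primary_site": "Ovary", "year_dx": 2017},
    {"genelinkID": "A003", "primary_site": "Colon", "year_dx": 2018},
]
test_rows = [
    {"genelinkID": "A001", "gene": "BRCA1",
     "status": "Pathogenic variant", "report_year": 2016},
    {"genelinkID": "A001", "gene": "BRCA2",
     "status": "Normal", "report_year": 2016},
    {"genelinkID": "A002", "gene": "BRCA1",
     "status": "Variant of unknown significance", "report_year": 2017},
]


def link_by_genelink_id(tumors, tests):
    """Attach every test result for a patient to each of that patient's tumor records."""
    # Group test results by the masked patient ID.
    tests_by_id = defaultdict(list)
    for test in tests:
        tests_by_id[test["genelinkID"]].append(test)

    # Copy each tumor record and add its patient's tests (empty list if none).
    linked = []
    for tumor in tumors:
        matched = tests_by_id.get(tumor["genelinkID"], [])
        linked.append({**tumor, "tests": matched})
    return linked


linked = link_by_genelink_id(tumor_rows, test_rows)
for record in linked:
    summary = [f'{t["gene"]}: {t["status"]}' for t in record["tests"]]
    print(record["genelinkID"], record["primary_site"],
          summary or "no test results in sample")
```

In this sketch, tumor records with no matching genelinkID in the test file keep an empty list instead of being dropped, which makes it easy to count how many records have test results for a given primary site before committing to an analysis.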

After final approval, the two data files, along with the data dictionary and instructions, are provided through Secure File Transfer Protocol (SFTP).

Database Limitations

For some tumor sites, the sample size may not be sufficient to support research questions. Please review the Tumor Counts Linked to Genetic Test Data (XLSX) table referenced above before requesting data.