For the 1975-2018 Data (November 2020 Submission)
When non-institutional users request access to the SEER Research Plus data, they must sign a best practices assurance for securing the data. You may review the language of the agreement below (this cannot be used to request access to the data).
This document is intended to provide guidance for those requesting access to National Cancer Institute (NCI)-designated data repositories. It provides an outline of the expectations for the management and responsible conduct for the secondary use of data managed by the NCI on local storage systems or in cloud computing systems. This document is intended to ensure that NCI’s data distributed for secondary research purposes are kept secure and that only NCI-approved users have access to these data.
The information contained in this document is targeted to those individuals who are:
- independent and not affiliated with an institution of higher learning, company, or industry and do not have institutional signing officials, or
Information Related to the Use of Cloud Computing
In contrast to traditional computing on local servers and hardware, cloud computing often entails the transfer and storage of NCI registry data on systems managed by a third party. Cloud computing offers a number of advantages for authorized data requestors but also requires additional security considerations.
Typically, information security in cloud environments is the responsibility of the Requestor’s institution; the implementation of that security is shared between the institution and the cloud service provider. Thus, it is essential that institutions validate that they are partnering with a reputable cloud service provider. Since you are independent of an institution, you, as an Independent Requestor, are expected to understand the security policies and practices utilized and recommended by the cloud service provider of choice and may wish to obtain third-party reviews or audits from the cloud service provider. The Independent Requestor should utilize these best practices, work with their cloud service provider to understand and implement the best practices associated with their specific environment, and ensure that the cloud service provider can meet information security requirements. Because the use of cloud computing has the potential for being higher risk than using local infrastructure, the NIH strongly recommends that the Independent Requestor consult their Information Technology experts to ensure that an appropriate security plan is developed and that necessary technical, training, and policy controls are in place before data is migrated to cloud environments. Remember: the Independent Requestor, not the cloud service provider, is accountable for ensuring the security of this data.
Local Infrastructure Guidance
General Information Security Guidelines
- Ensure data files are not exposed to the Internet, except connections that are required to download data from source repositories. Infrastructure should be behind firewalls that block access from outside your computer system. For cloud infrastructure, Independent Requestors must restrict external access to instances and storage under the Requestor’s control (see section on cloud computing for more details).
- Do not post data on servers in any fashion that will make it publicly accessible, such as a data requestor’s website. Even files on websites can be “discovered” by Internet search engines, e.g., Google, Bing. The Independent Requestor is prohibited from making the SEER data publicly accessible.
- Do not set up web or other electronic services that publicly host data, or provide access to individuals that are not listed on the Data Access Request even if those individuals have access to the same NCI registry data. Providing such access requires that an organization be an NIH Trusted Partner, with different requirements above and beyond those required for access to NCI registry data.
- Implement and use authentication technology for access control. Two-factor authentication technologies (smart cards, hard or soft tokens, etc.) are preferred. When using single-factor password authentication, set policies that mandate the following requirements:
- Minimum length of 12 characters
- Does not contain usernames, real names or company names
- Does not contain a complete dictionary word
- Contains characters from each of the following groups: lowercase letters, uppercase letters, numerals, and special characters
- Passwords should expire every 120 days or at the rate required by institutional policies, whichever is more frequent
- Avoid allowing users to place NCI registry data on mobile devices (e.g., laptops, smartphones, tablets, mp3 players) or removable media such as USB thumb drives (except where such media are used as backups and follow appropriate physical security controls). If data must be placed on mobile devices, it must be encrypted. NIH recommends the use of encryption technologies validated by the National Institute of Standards and Technology (NIST).
- Keep all software patches up-to-date.
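The single-factor password requirements listed above can be expressed as a simple check. The following is a minimal sketch, not an official NCI tool: the function name, the `forbidden_names` parameter, and the tiny `SAMPLE_DICTIONARY` word list are assumptions made for illustration, and a real deployment would load a full dictionary and rely on the operating system's or identity provider's policy engine.

```python
import string

# Hypothetical word list for illustration only; a real check needs a full dictionary.
SAMPLE_DICTIONARY = {"password", "cancer", "registry", "welcome"}


def password_problems(password, forbidden_names, dictionary=SAMPLE_DICTIONARY):
    """Return the list of policy rules that the password violates (empty if none)."""
    problems = []
    lowered = password.lower()

    # Minimum length of 12 characters.
    if len(password) < 12:
        problems.append("shorter than 12 characters")

    # Must not contain usernames, real names, or company names.
    if any(name and name.lower() in lowered for name in forbidden_names):
        problems.append("contains a username, real name, or company name")

    # Must not contain a complete dictionary word.
    if any(word in lowered for word in dictionary):
        problems.append("contains a complete dictionary word")

    # Must contain characters from each of the four groups.
    groups = {
        "lowercase letters": string.ascii_lowercase,
        "uppercase letters": string.ascii_uppercase,
        "numerals": string.digits,
        "special characters": string.punctuation,
    }
    for label, chars in groups.items():
        if not any(ch in chars for ch in password):
            problems.append(f"missing character group: {label}")

    return problems
```

An empty list means the candidate password satisfies the listed rules; the 120-day expiry is a separate account setting and is not covered by this sketch.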
Physical Security Guidelines
- Encrypt and store data that are in hard copy or reside on portable media, e.g., on a USB stick, CD, flash drive or laptop in a locked facility with access granted to the minimum number of individuals required to efficiently carry out research.
- Restrict physical access to all servers, network hardware, storage arrays, firewalls and backup media only to those that are required for efficient operations.
- Log access to secure facilities, ideally with electronic authentication.
Controls for Servers
- Keep servers from being accessible directly from the Internet (i.e., they must be behind a firewall or not connected to a larger network), and disable unnecessary services. It is better to begin with a server image that disables all non-essential services and enable those that are needed than to start with a full-featured image and disable unnecessary services.
- Enforce the principle of Least Privilege to ensure that individuals and/or processes are granted only the rights and permissions necessary to perform their assigned tasks and functions, but no more.
- Secure NCI registry data on the systems from unauthorized users (restrict directory permissions to only the owner and group) and if exported via file sharing, ensure limited access to remote systems.
- Use encrypted data access (such as Secure Shell (SSH) or Virtual Private Network (VPN)) when accessing systems remotely. It is preferred to use a tool such as Remote Desktop Protocol (RDP), X-windows or Virtual Network Computing (VNC) that does not permit copying of data and provides “View only” support.
- Ensure that data access policies are retained throughout the processing of the data on all systems including when data is used on multiple systems (such as a compute cluster). If data is cached on local systems, directory protection must be kept, and data must be removed when processing is complete. Requestors must meet the spirit and intent of these protection requirements to ensure a secure environment 24 hours a day for the period of the agreement.
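As an illustration of restricting directory permissions to only the owner and group, the sketch below clears every permission bit for "other" users and reports whether a path is still reachable by them. It assumes a POSIX system (Windows handles permissions differently), and the function names are hypothetical.

```python
import os
import stat


def restrict_to_owner_and_group(path):
    """Remove read, write, and execute permission for 'other' users on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IRWXO)


def is_accessible_by_others(path):
    """Return True if any permission bit is set for 'other' users on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & stat.S_IRWXO)
```

The same check could be run on any local cache directory used during processing, before and after the data is removed.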
Source Data and Control of Copies of Data
- Approved data requestors must retain the original version of the encrypted data, track all copies or extracts and ensure that the information is not divulged to anyone except approved users. Therefore, NIH recommends ensuring careful control of all copies of data and providing appropriate logging on machines where such data is resident.
- Additional independent or collaborating investigators from other institutions must submit a separate Data Access Request and be approved to access the data. Approved requestors must restrict outbound access from devices that host NCI registry data.
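One way to help track copies of a data file is to compare cryptographic hashes. The sketch below is an illustrative assumption rather than a prescribed method: it finds only byte-identical copies of an original file, so extracts or re-encoded versions still need to be recorded separately, for example in a written log.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_copies(original_path, candidate_paths):
    """Return the candidate paths whose contents are byte-identical to the original."""
    target = sha256_of_file(original_path)
    return [p for p in candidate_paths if sha256_of_file(p) == target]
```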
Destruction of Data
Data downloaded from NIH-designated data repositories must be destroyed at the time of project close-out or project termination.
- Delete all data for the project from storage, virtual and physical machines, databases, and random-access archives (i.e., archival technology that allows for deletion of specified records within the context of media containing multiple records).
- Retain only encrypted copies of the minimum data necessary for publication(s). Ideally, the data will exist on backup media that is not used for other projects and can therefore be destroyed or erased without impacting other users/tenants. If retaining the data on separate backup media is not possible, as will be the case with many users, the media may be retained for the standard media retention period. However, the data may not be recovered for any purpose without a new Data Access Request approved by NCI SEER program staff. Retained data should be deleted at the end of the SEER approved access period.
- Delete electronic files securely. For personal computers, the minimum would involve deleting files and emptying the recycle bin or equivalent with equivalent procedures for servers. Optimally, use a secure method that performs a delete and overwrite of the physical media that was used to store the files.
- Ensure that backups are reused (data deleted) and any archive copies are also destroyed.
- Destroy media according to NIST Guidelines for Information Media Sanitization.
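The "delete and overwrite" approach mentioned above can be sketched as follows. This is a minimal illustration with a hypothetical function name, not a certified sanitization tool. It assumes a conventional file system on a magnetic disk; on solid-state drives, journaling or copy-on-write file systems, and cloud storage, an in-place overwrite may not reach the blocks that originally held the data, so the NIST media sanitization guidance still applies.

```python
import os


def overwrite_and_delete(path, passes=1, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes, flush to disk, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            # Push the overwritten bytes out of Python and OS buffers.
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)
```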
Additional Guidance for Cloud Computing
Independent Requestors wishing to use cloud computing must work with their cloud service provider to devise an appropriate security plan that meets the general Information Security Best Practices outlined in this document as well as additional requirements that derive from the nature of multi-tenant clouds with default access to the internet. Please refer to the specific cloud service provider for methods, processes and procedures for working with NCI registry databases.
General Cloud Computing Guidelines
- Use end-to-end encryption for network traffic whenever possible. For example, use Hypertext Transfer Protocol Secure (HTTPS) sessions between you and your virtual server instance. Ensure that your service uses only valid and up-to-date certificates.
- Encrypt data at rest with a user's own keys. For example, sensitive data should be encrypted into ciphertext while it is stored in a database and decrypted to plaintext only when it is accessed by an authorized user. The Sequence Read Archive (SRA)-toolkit includes this feature; other software providers offer tools to meet this requirement.
- Use security groups and firewalls to control inbound traffic access to your instance. Ensure that your security profile is configured to allow access only to the minimum set of ports required to provide necessary functionality for your services and limit access to specific networks or hosts. In addition, allow administrative access only to the minimum set of ports and source IP address ranges necessary.
- Be aware of the top 10 vulnerabilities for web applications and build your applications accordingly. To learn more, visit Open Web Application Security Project (OWASP) - Top 10 Web Application Security Risks. When new internet vulnerabilities are discovered, promptly update any web applications included in your Virtual Machine (VM) images. Examples of resources that include this information are Security Focus and the NIST National Vulnerability Database.
- Review the Access Control Lists (ACLs), permissions, and security perimeter to ensure consistent definition.
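The guidance above on limiting inbound ports and source address ranges can be checked mechanically. The sketch below assumes a hypothetical, provider-neutral rule format of `(port, source CIDR)` pairs, with ports 22 (SSH) and 3389 (RDP) treated as administrative; real cloud providers expose their rules through their own APIs, which are not shown here.

```python
import ipaddress


def overly_permissive_rules(rules, allowed_ports, admin_ports=(22, 3389)):
    """Flag inbound rules that open unneeded ports or expose admin ports to any address.

    rules is an iterable of (port, cidr) pairs; this format is an assumption
    made for illustration.
    """
    findings = []
    for port, cidr in rules:
        network = ipaddress.ip_network(cidr, strict=False)
        if port not in allowed_ports:
            findings.append((port, cidr, "port is not in the required set"))
        elif port in admin_ports and network.prefixlen == 0:
            # A prefix length of 0 (0.0.0.0/0 or ::/0) matches every address.
            findings.append((port, cidr, "administrative port open to any address"))
    return findings
```

For example, a rule set that allows port 22 from `0.0.0.0/0` would be flagged, while the same port limited to a specific network range would not.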
Audit and Accountability
- Ensure that data is accessible only to those approved for access, and that control over changing that access is retained by the Requestor who submitted the request for access to NCI registry data and the appropriate IT staff. A monitoring and notification mechanism needs to be in place to detect changes in permissions.
- Ensure that account access is logged along with access controls and file access. This information must be reviewed by the data requestor on a regular basis to ensure continued secure access.
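A simple form of permission-change monitoring compares snapshots of file modes and ownership taken at different times. The sketch below is one possible approach under stated assumptions: it targets a POSIX file system, the function names are hypothetical, and notification (for example, by email) is left out.

```python
import os
import stat


def snapshot_permissions(root):
    """Map each path under root (including root) to its (mode, uid, gid)."""
    snapshot = {}
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        st = os.lstat(path)
        snapshot[path] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
    return snapshot


def permission_changes(before, after):
    """Return {path: (old, new)} for paths whose entry differs between snapshots."""
    changes = {}
    for path in sorted(set(before) | set(after)):
        if before.get(path) != after.get(path):
            changes[path] = (before.get(path), after.get(path))
    return changes
```

A path that appears in only one snapshot is reported with `None` on the other side, which also surfaces newly created or deleted files.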
Image Specific Security
- Ensure images do not contain any known vulnerabilities, malware, or viruses. A number of tools are available for scanning the software, such as Chkrootkit, rkhunter, OpenVAS and Nessus.
- Ensure that Linux-based Images lock/disable root login and allow only sudo access. Additionally, root password must not be null or blank.
- Ensure that images provide end users with OS-level administration capabilities to support compliance requirements, vulnerability updates, and log file access. For Linux-based images, this is normally through Secure Shell (SSH), and for Windows-based virtual machine images, this is normally through Remote Desktop Protocol (RDP).
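The root-account requirements for Linux-based images can be spot-checked by inspecting the contents of `/etc/shadow` and `sshd_config`. The sketch below works on text passed in by the caller and uses hypothetical function names. It is a simplification that assumes standard shadow-file fields and ignores `Match` blocks and `Include` directives in the SSH configuration.

```python
def root_password_status(shadow_text):
    """Classify the root entry in /etc/shadow-format text.

    Returns "blank" (no password required), "locked" (field starts with
    '!' or '*'), "set" (a password hash is present), or "missing".
    """
    for line in shadow_text.splitlines():
        fields = line.split(":")
        if fields[0] == "root" and len(fields) > 1:
            password_field = fields[1]
            if password_field == "":
                return "blank"
            if password_field.startswith(("!", "*")):
                return "locked"
            return "set"
    return "missing"


def ssh_root_login_disabled(sshd_config_text):
    """Return True if the first PermitRootLogin directive is set to 'no'."""
    for line in sshd_config_text.splitlines():
        tokens = line.split("#", 1)[0].split()
        if len(tokens) >= 2 and tokens[0].lower() == "permitrootlogin":
            return tokens[1].lower() == "no"
    # No explicit directive: do not assume root login is disabled.
    return False
```

A status of "blank" indicates the condition the guidance prohibits; "locked" combined with disabled SSH root login is consistent with allowing only sudo access.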
Best Practices for Specific Cloud Service Providers
Examples of cloud service provider best practices are provided in the links below. Links to the best practices of additional cloud service providers will be periodically appended to this document when they become available. Please be aware that these are provided for convenience only, and do not imply endorsement by the NIH or the United States Government for any of these services, nor does the government guarantee that these links lead to the most current version of these best practices. NIH recommends that Requestors consult with their cloud service provider to ensure that they are using the most up-to-date best practice documents.
Amazon Web Services:
- Amazon Machine Images
- Amazon’s Best Practices for Elastic Compute Cloud (EC2)
- Information on Elastic Compute Cloud (EC2)
Additional Resources for Testing and Best Practices
Examples of cloud best practices from organizations that leverage the cloud are provided in the links below.
Center for Internet Security (CIS)
CIS is the only distributor of consensus best practice standards for security configuration. The benchmarks are widely accepted by U.S. government agencies for Federal Information Security Management Act (FISMA) compliance, and by auditors for compliance with:
- the International Organization for Standardization (ISO)
- the Gramm-Leach-Bliley (GLB) Act
- Sarbanes-Oxley (SOX) Act
- federal Health Insurance Portability and Accountability Act (HIPAA)
- Family Educational Rights and Privacy Act (FERPA)
- and other regulatory requirements for information security
End user organizations that build their configuration policies based on the consensus benchmarks cannot acquire them elsewhere.
National Institute of Standards and Technology (NIST)
NIST, an agency of the US Department of Commerce, provides information security standards and best practices for the federal government. The NIST Special Publications (SP) and Federal Information Processing Standards (FIPS) provide useful and concrete guidance to users of information technology systems (http://csrc.nist.gov/publications/).
United States Government Configuration Baseline (USGCB)
USGCB (http://usgcb.nist.gov) provides security configuration baselines for information technology products widely used across the federal government including desktop computers.