Identify Urinary Cancer Susceptibility with Germline Sequencing
Discover how germline whole-exome sequencing can help identify gene variants associated with urinary cancer susceptibility and immune escape, informing targeted treatment approaches.
Executive Brief
- The News: 810 patients underwent germline whole-exome sequencing for urinary cancer research.
- Clinical Win: FOXP3-related gene variants confer urinary cancer susceptibility.
- Target Specialty: Immunologists and oncologists treating urinary tract cancer patients.
Key Data at a Glance
Study Design: Case-control study
Sample Size (N): 810 patients
Primary Cancer Sites: Bladder, renal pelvis, ureter, urethra
Exclusion Criterion: Kidney cancer (unless urothelial by histology)
Discovery Set Size: 354 UTC cases, 371 healthy controls
Confirmation Set Size: 456 cases, 1983 controls
This manuscript follows the STrengthening the REporting of Genetic Association Studies (STREGA) reporting guidelines.22
The ORIEN consortium is a nationwide network of 19 cancer centers. The ORIEN Avatar program performs germline WES, tumor WES, and tumor RNA-seq for research and to match patients to clinical trials, as previously described.23 24 Details regarding data sources for the variables used in this study are listed on the ORIEN website at https://www.oriencancer.org.25 Our study included 810 patients with primary cancer sites in the urinary tract (bladder, renal pelvis, ureter, urethra) and excluded kidney cancer cases unless they were urothelial by histology. As illustrated in figure 1, we divided the germline WES data from these 810 patients into discovery and confirmation sets.
Figure 1. Study design. FOXP3, forkhead box p3; GSRVAT, gene set-based rare-variant association test; ICI, immune checkpoint inhibition; MSI, microsatellite instability; ORIEN, the Oncology Research Information Exchange Network; PFS, progression-free survival; QC, quality control; RNA-seq, RNA sequencing; TMB, tumor mutation burden; TME, tumor microenvironment; UTC, urinary tract cancer; WES, whole-exome sequencing.
The sequencing experiments and primary analysis of the sequencing data of the ORIEN Avatar program are described in Kohlmann et al.26
In the discovery set, 354 UTC cases were compared with 371 healthy, unrelated controls from the Centre d'Etudes du Polymorphisme Humain families and the University of Utah Heritage 1000 (H1K) Projects. The sequencing of H1K samples was described in Ostrander et al.27 In brief, DNA was extracted from blood or saliva. Genomes were prepared using TruSeq DNA PCR-free libraries (Illumina) and run on the Illumina HiSeq X Ten System at a minimum of ×60 median whole-genome coverage. The same bioinformatics pipeline was used to analyze case and control data, starting from BAM files. Paired-end reads were aligned to the GRCh38 reference genome using Burrows-Wheeler Aligner V.0.7.10. Variants in cases and controls were jointly called using the Sentieon software.
In the confirmation set, germline variants in 456 cases were jointly called with WES data from 1983 samples of the 1000 Genomes Project using the Sentieon software.
Tumor exome sequencing data from the ORIEN Avatar program were processed by the TNRunner workflow of the USeq software (https://github.com/HuntsmanCancerInstitute). This workflow aligned tumor and germline DNA sequencing reads to GRCh38 and called variants using Illumina’s Strelka2 Small Variant Caller. Somatic mutations were extracted by subtracting germline variants from tumor variants. Genome-wide tumor mutation burden (TMB) was calculated from non-synonymous variants. To ensure comparability of the TMB among patients, we performed joint variant calling with the GATK package and then excluded variants if the missing rate was >10% or the variant was outside the callable regions for 95% of the samples.
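The germline subtraction and the non-synonymous count described above can be illustrated with a minimal Python sketch. It is not the TNRunner or Strelka2 implementation. The variant key layout and the consequence labels in NONSYNONYMOUS are hypothetical choices made for this illustration, and the result is a raw count, since the text does not state a normalization.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This key layout is an assumption made for the illustration.
Variant = Tuple[str, int, str, str]

# Hypothetical consequence labels treated as non-synonymous in this sketch.
NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}


def somatic_variants(tumor: Dict[Variant, str],
                     germline: Set[Variant]) -> Dict[Variant, str]:
    """Keep tumor variants that are absent from the patient's germline calls."""
    return {v: effect for v, effect in tumor.items() if v not in germline}


def nonsynonymous_count(somatic: Dict[Variant, str]) -> int:
    """Count somatic variants whose consequence label is non-synonymous."""
    return sum(effect in NONSYNONYMOUS for effect in somatic.values())


tumor_calls = {
    ("chr1", 100, "A", "G"): "missense",    # also present in germline
    ("chr1", 200, "C", "T"): "synonymous",
    ("chr2", 300, "G", "A"): "nonsense",
}
germline_calls = {("chr1", 100, "A", "G")}

somatic = somatic_variants(tumor_calls, germline_calls)
burden = nonsynonymous_count(somatic)
```

In this toy input, one tumor variant is removed as germline and one of the two remaining somatic variants is non-synonymous.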
The TNRunner RnaAlignQC workflow was run on each tumor RNA-seq dataset. This workflow uses CutAdapt V.3.4 to remove adapter sequences, STAR (V.2.7.9a)28 to align reads to the human GRCh38 Ensembl 106 reference, and featureCounts (Subread V.2.0.3)29 to assign reads to annotated genes.
In patients who had multiple tumor specimens, we selected one specimen per patient to analyze tumor WES and tumor RNA-seq (online supplemental note 1).
Quality control (QC) of data
In the discovery set (n=354), the germline analyses excluded 18 cases with a high genotype missing rate (figure 1). The remaining 336 cases were compared with 364 healthy individuals (figure 1). Cryptic relatedness, or unexpected genetic correlation between two or more cases, was inferred using the program KING (Kinship-based INference for Genome-wide association studies). In cases where multiple individuals were genetically related, only one individual, the one with the lowest genotype missing rate, was included in the analysis. In the confirmation set (n=456), one case was excluded due to cryptic relatedness, and an additional 62 cases were eliminated because of a high missing rate (figure 1). Gene-based and gene set-based rare-variant association tests (GSRVATs) on the discovery set yielded p values with the expected distribution and no evidence of type 1 error inflation, affirming the high quality of the data (online supplemental figure S1).
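The relatedness rule above (keep one individual per related group, the one with the lowest genotype missing rate) can be sketched as follows. This is an illustration only: it does not parse KING output, and the input structures (a dictionary of per-sample missing rates and a list of related pairs) are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple


def prune_related(missing_rate: Dict[str, float],
                  related_pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the samples to keep: one per related cluster, lowest missing rate.

    Samples that are not in any related pair form their own cluster and are kept.
    """
    parent = {sample: sample for sample in missing_rate}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every related pair into the same cluster.
    for a, b in related_pairs:
        parent[find(a)] = find(b)

    # Within each cluster, remember the sample with the lowest missing rate.
    best: Dict[str, str] = {}
    for sample in missing_rate:
        root = find(sample)
        if root not in best or missing_rate[sample] < missing_rate[best[root]]:
            best[root] = sample
    return sorted(best.values())


rates = {"A": 0.02, "B": 0.01, "C": 0.05, "D": 0.03}
pairs = [("A", "B"), ("B", "C")]   # A, B, and C are mutually connected
kept = prune_related(rates, pairs)
```

Here A, B, and C form one cluster, so only B (lowest missing rate) is retained alongside the unrelated sample D.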
Among the 780 samples that underwent tumor WES, 730 (93.6%) came from primary sites and the remainder from metastatic sites. TMB was consistent across inferred ancestry categories, as indicated by a non-significant Kruskal-Wallis test (online supplemental figure S2).
The ORIEN data contained 554 tumor samples with RNA-seq data available. We excluded 52 samples due to missing clinical data and 11 duplicates. To mitigate technical variations and batch effects, we removed 43 non-primary tumor samples, 128 samples prepared by tagmentation, 1 sample exhibiting over 300 extreme count outliers, and 179 samples derived from formalin-fixed paraffin-embedded tissue specimens. This process resulted in the retention of 140 samples for subsequent analysis.
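The exclusion counts above can be checked with a short tally. The sketch below assumes the exclusions were applied sequentially and did not overlap, which is consistent with the reported total; the function name is hypothetical.

```python
from typing import List, Tuple


def apply_exclusions(start: int,
                     exclusions: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Subtract each exclusion in order; return (reason, removed, remaining) rows."""
    remaining = start
    rows = []
    for reason, removed in exclusions:
        if removed > remaining:
            raise ValueError(f"cannot remove {removed} samples, only {remaining} left")
        remaining -= removed
        rows.append((reason, removed, remaining))
    return rows


# Counts taken from the paragraph above.
RNASEQ_EXCLUSIONS = [
    ("missing clinical data", 52),
    ("duplicates", 11),
    ("non-primary tumor samples", 43),
    ("prepared by tagmentation", 128),
    ("over 300 extreme count outliers", 1),
    ("formalin-fixed paraffin-embedded tissue", 179),
]

rows = apply_exclusions(554, RNASEQ_EXCLUSIONS)
for reason, removed, remaining in rows:
    print(f"{reason}: -{removed} -> {remaining}")
```

The final row leaves 140 samples, matching the number retained for analysis.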
Medical Subject Headings (MeSH) terms over-represented in a gene set
To delineate the biological functions of a set of genes, we conducted a gene set over-representation analysis (GSOA) using the BioLitMine database30 to identify MeSH terms that were over-represented among these genes. The BioLitMine mesh2gene database (V.2023.2) was kindly provided by Dr Claire Yanhui Hu. From BioLitMine, we constructed a gene set database linking multiple Homo sapiens genes to each MeSH term in the Phenomena and Processes category. A total of 1157 MeSH terms were considered for GSOA after excluding 61 uninformative terms, 148 with fewer than 2 genes analyzed by WES, 112 with more than 2000 genes analyzed by WES, and 11 duplicates. We then conducted a one-sided mid-p Fisher’s exact test for each MeSH term based on a 2×2 contingency table and corrected the p value by the Šidák procedure (Pc) to control the family-wise error rate. MeSH terms with Pc<0.05 were deemed significant.
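For readers unfamiliar with the mid-p variant of Fisher's exact test, the following Python sketch shows one way to compute a one-sided (over-representation) mid-p value from a 2×2 table and then apply the Šidák correction. It is not the authors' code. The table layout and function names are assumptions for this illustration, and the mid-p value is taken as P(X > a) + 0.5 × P(X = a) under the hypergeometric distribution.

```python
from math import comb, expm1, log1p


def hypergeom_pmf(k: int, total: int, annotated: int, drawn: int) -> float:
    """P(X = k): k annotated genes among `drawn` genes taken from `total`."""
    return comb(annotated, k) * comb(total - annotated, drawn - k) / comb(total, drawn)


def midp_fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided mid-p value for over-representation in the table [[a, b], [c, d]].

    Assumed layout: a = genes in the set with the MeSH term, b = genes in the set
    without it, c = other genes with the term, d = other genes without it.
    """
    total = a + b + c + d
    annotated = a + c
    drawn = a + b
    k_max = min(annotated, drawn)
    upper_tail = sum(hypergeom_pmf(k, total, annotated, drawn)
                     for k in range(a + 1, k_max + 1))
    return upper_tail + 0.5 * hypergeom_pmf(a, total, annotated, drawn)


def sidak(p: float, n_tests: int) -> float:
    """Sidak-corrected p value: 1 - (1 - p)^n_tests, computed stably for small p."""
    if p >= 1.0:
        return 1.0
    return -expm1(n_tests * log1p(-p))


p_mid = midp_fisher_greater(2, 0, 0, 2)   # toy table, not study data
p_corrected = sidak(p_mid, 1157)          # 1157 MeSH terms were tested
```

With 1157 terms, an uncorrected mid-p value must fall below roughly 4.4×10^-5 for Pc to drop under 0.05.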
Analysis of germline exomes
Variant annotation, filtering, and statistical analyses of WES data were conducted using the Variant Interpretation for Clinical Test or Research (VICTOR, http://BJFengLab.org/) software suite31 V.1.2 beta, built on 10 August 2023, and R V.4.3.1.
VICTOR performs a sandwich QC approach consisting of three stages: variant-wise QC, sample-wise QC, and variant-wise QC. This strategy ensures that the sample-wise QC is based on high-quality variants and that the variant-wise QC is based on high-quality samples. Variants were filtered out based on Variant Quality Score Recalibration, the FILTER column of the Variant Call Format (VCF) file, the missing rate (>10%), and an exact Hardy-Weinberg Equilibrium test within each ancestry group (p<0.000001). In both the rare-variant association test (RVAT) and the epistasis test involving the comparison of rare-variant burdens, samples with missing genotypes were excluded. This approach ensured comparability between cases and controls, effectively controlling type 1 errors.
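The three-stage ordering can be sketched in a few lines of Python. This is not VICTOR's implementation. It uses the missing rate as the only QC criterion, represents genotypes as a variants-by-samples matrix with None for a missing call, and the sample-level threshold is a hypothetical parameter (the text above gives only the 10% variant-level threshold).

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Genotype = Optional[int]  # None marks a missing genotype call


def _missing_rate(values: Iterable[Genotype]) -> float:
    values = list(values)
    return sum(v is None for v in values) / len(values) if values else 0.0


def sandwich_qc(matrix: Sequence[Sequence[Genotype]],
                max_variant_missing: float = 0.10,
                max_sample_missing: float = 0.10) -> Tuple[List[int], List[int]]:
    """Return (kept variant indices, kept sample indices).

    Rows are variants and columns are samples. max_sample_missing is a
    hypothetical threshold chosen for this sketch.
    """
    n_samples = len(matrix[0]) if matrix else 0
    samples = list(range(n_samples))

    # Stage 1: variant-wise QC on all samples.
    variants = [i for i in range(len(matrix))
                if _missing_rate(matrix[i][j] for j in samples) <= max_variant_missing]
    # Stage 2: sample-wise QC using only the variants that passed stage 1.
    samples = [j for j in samples
               if _missing_rate(matrix[i][j] for i in variants) <= max_sample_missing]
    # Stage 3: variant-wise QC again, using only the samples that passed stage 2.
    variants = [i for i in variants
                if _missing_rate(matrix[i][j] for j in samples) <= max_variant_missing]
    return variants, samples


toy = [
    [0, 1, 0, None],
    [None, None, None, None],
    [0, 0, 1, None],
    [1, 0, 0, 0],
]
kept_variants, kept_samples = sandwich_qc(toy, max_variant_missing=0.25,
                                          max_sample_missing=0.5)
```

In the toy matrix, the all-missing variant is dropped first, so it does not count against any sample in stage 2; the last sample is then removed for its high missing rate among the surviving variants.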
Subsequently, VICTOR annotated and filtered variants and performed RVAT by Firth’s penalized logistic regression. To account for population stratification, the VICTOR pipeline first inferred ancestry and then conducted association analyses separately within each ancestry group (African/African American, Admixed American, East Asian, Non-Finnish European, and South Asian). Within each ancestry, population substructure was adjusted using principal components calculated from a within-ancestry principal component analysis, with the number of components determined using the elbow method. Results from different ancestry groups were combined using Stouffer’s method, with each group weighted by its effective sample size calculated as 4×S×T/(S+T), where S is the number of cases and T is the number of controls.
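The combination step can be illustrated with a small Python sketch of weighted Stouffer's method for one-sided p values. It is not VICTOR's code. The weights below are the effective sample sizes themselves, as the text states; some implementations use their square roots instead, so treat the weighting as an assumption to verify against the software. The case and control counts in the example are invented.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence


def effective_sample_size(cases: int, controls: int) -> float:
    """Effective sample size 4*S*T/(S+T) for S cases and T controls."""
    return 4.0 * cases * controls / (cases + controls)


def stouffer_combine(p_values: Sequence[float], weights: Sequence[float]) -> float:
    """Combine one-sided p values (each strictly between 0 and 1) into one p value."""
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("need one weight per p value")
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    z_combined = (sum(w * z for w, z in zip(weights, z_scores))
                  / sqrt(sum(w * w for w in weights)))
    return 1.0 - normal.cdf(z_combined)


# Invented per-ancestry results: (one-sided p value, cases, controls).
groups = [(0.01, 300, 320), (0.20, 20, 25), (0.50, 10, 15)]
weights = [effective_sample_size(s, t) for _, s, t in groups]
p_combined = stouffer_combine([p for p, _, _ in groups], weights)
```

Because the largest group dominates the weights, the combined p value stays close to that group's own result.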
Since variants were excluded if they had a low BayesDel deleteriousness score,32 a high maximum allele frequency across populations, or a ClinVar classification as Benign or Likely Benign, a protective effect conferred by the remaining variants was not expected. Therefore, RVAT was one-sided. The genome-wide significance threshold was set at p<0.05 after Šidák correction for the number of genes or gene sets in the analysis. To select gene sets for confirmation, we calculated each gene set’s false discovery rate (q value) and progressively reduced the q value threshold from 1 to 0 until fewer than 10 gene sets remained. If more than 10 sets remained, only the first 10 sets were selected.
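One possible reading of the selection rule is to rank gene sets by q value and carry forward at most 10. The sketch below follows that reading and computes q values with the Benjamini-Hochberg procedure. Both choices are assumptions, since the text does not name the FDR method or spell out the ranking. The gene set names are placeholders.

```python
from typing import List, Sequence


def bh_q_values(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def select_for_confirmation(names: Sequence[str], p_values: Sequence[float],
                            max_sets: int = 10) -> List[str]:
    """Return up to max_sets gene sets with the smallest q values (ties by p value)."""
    q = bh_q_values(p_values)
    ranked = sorted(zip(q, p_values, names))
    return [name for _, _, name in ranked[:max_sets]]


names = ["set_a", "set_b", "set_c", "set_d"]      # placeholder gene set names
p_vals = [0.01, 0.04, 0.03, 0.5]                  # invented p values
q_vals = bh_q_values(p_vals)
top_two = select_for_confirmation(names, p_vals, max_sets=2)
```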
Clinical Perspective — Dr. Aarti Ghosh, Immunology
Workflow: As I assess patients with urinary tract cancer, I'm now considering the potential impact of FOXP3-related gene variants on their susceptibility to the disease. With 810 patients in the study, I'd look for any indications of immune escape mechanisms, such as those associated with immune checkpoint inhibition. The study's use of germline whole-exome sequencing (WES) and tumor WES data informs my approach to matching patients to clinical trials.
Economics: The article doesn't address cost directly, but the use of whole-exome sequencing and tumor RNA-seq suggests a significant investment in diagnostic testing. I'd consider the cost of these tests, including sequencing on platforms such as the Illumina HiSeq X Ten System, when evaluating treatment options for my patients. Whether these tests are cost-effective for identifying FOXP3-related gene variants and informing treatment decisions is an important consideration.
Patient Outcomes: The study's findings on FOXP3-related gene variants and their association with immune escape mechanisms have significant implications for patient outcomes. For example, the use of immune checkpoint inhibition may be more effective in patients with certain variants, and I'd consider this when developing treatment plans. The study's results can inform my discussions with patients about their prognosis and treatment options, such as progression-free survival (PFS) rates.
Transparency & Corrections
HCP Connect is funded by Stravent LLC and maintains editorial independence from advertisers and pharmaceutical companies. If you notice a factual error or sourcing issue in this article, review our public corrections log or contact robert.foster@straventgroup.com.