distance and Ward’s linkage. The optimal numbers of sample and gene
clusters were identified using the GAP statistic [23]. Gene ontology
biological processes were used to determine the biological significance
of the gene clusters. Chi-square or analysis of variance tests were used to
assess association of sample clusters with clinical data. Class labels were
assigned to samples, classifying the subgroup enriched with metastatic
tumours as the ‘‘metastatic-subgroup’’ and the subgroup enriched with
normal prostate samples as the ‘‘non-metastatic-subgroup’’.
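The association test between sample clusters and a categorical clinical variable can be illustrated with a minimal sketch. It assumes a 2x2 table and a Pearson chi-square test without continuity correction; the function name and the counts are hypothetical and are not taken from the study data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With one degree of freedom the
    upper-tail probability equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one sample")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical counts: rows = sample cluster, columns = (metastatic, other)
counts = [[12, 3], [5, 20]]
statistic, p_value = chi_square_2x2(counts)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

For continuous clinical variables the text uses analysis of variance instead, which this sketch does not cover.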
A signature to identify the metastatic-subgroup was developed using
partial-least-squares (PLS) regression. All model development steps
(pre-processing, gene filtering/selection, model parameter estimation) were
nested within 10 repeats of 5-fold cross-validation (CV), including assessment of
signature score reproducibility in 5 separate FFPE sections and
repeatability across 20 resection samples from the secondary training
dataset with technical duplicates. In sum, area under the receiver
operating characteristic curve (AUC), C-index performance for metastatic
recurrence in the additional dataset of 75 resections, and assay stability
across replicates were used to guide the final number of transcripts
detected by the assay. Thresholds for dichotomising predictions were
selected at the point where sensitivity and specificity for detecting the
metastatic subgroup reached a joint maximum.
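The threshold selection step can be sketched as follows. This is an illustration only: it assumes that the "joint maximum" corresponds to maximising the sum of sensitivity and specificity (Youden's index), which the text does not state explicitly, and the scores, labels, and function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a sample positive when score >= threshold.

    labels: 1 = metastatic subgroup, 0 = non-metastatic subgroup.
    Both classes must be present, otherwise a division by zero occurs.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def select_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximising sens + spec.

    Candidate thresholds are the observed scores; ties keep the lowest one.
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if best is None or sens + spec > best[0]:
            best = (sens + spec, t, sens, spec)
    _, threshold, sens, spec = best
    return threshold, sens, spec

# Hypothetical signature scores and subgroup labels
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, sens, spec = select_threshold(scores, labels)
print(threshold, sens, spec)
```

In a nested design such as the one described, this selection would be repeated inside each training fold so that the held-out samples never influence the threshold.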
2.4. Statistical assessment of metastatic assay performance
The performance of the metastatic assay regarding biochemical and
metastatic progression was assessed by sensitivity and specificity. Cox
regression was used to investigate prognostic effects of the assay with
respect to time-to-recurrence endpoints. The estimated effect of the assay
was adjusted for PSA, age, and GS in a multivariable model. A second
multivariable analysis was performed to investigate the prognostic effect
of the assay when adjusting for CAPRA-S [13], whilst further assessing the
additional prognostic effect of a combined model generated for the assay
and CAPRA-S together. The proportional hazards assumption was verified
using a statistical test based on the Schoenfeld residuals [24]. Samples
with unknown clinical factors were excluded. All tests of
statistical significance were two sided at 5% level of significance.
2.5. Combined model development and application (metastatic assay and CAPRA-S)
A combined model using metastatic assay dichotomised calls and
CAPRA-S dichotomised into low risk (CAPRA-S: 0–5) and high risk
(CAPRA-S: 6–10) was assessed in the resection validation cohort
independently against biochemical and metastatic endpoints using
Cox regression analysis. Participants were classified as the ‘‘low risk’’
group given a combined model result of assay negative/CAPRA-S low
risk; otherwise, they were labelled as the ‘‘high risk’’ group (ie, samples
that were classified as assay negative/CAPRA-S high risk, assay positive/
CAPRA-S low risk, or assay positive/CAPRA-S high risk).
See the Supplementary material for additional experimental detail.
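The combined classification rule described in Section 2.5 can be written as a short sketch. The CAPRA-S cut-offs (0–5 low risk, 6–10 high risk) come from the text; the function names and the boolean encoding of the assay call are assumptions made for illustration.

```python
def capra_s_risk(capra_s_score):
    """Dichotomise CAPRA-S: 0-5 is low risk, 6-10 is high risk."""
    if not 0 <= capra_s_score <= 10:
        raise ValueError("CAPRA-S score must lie between 0 and 10")
    return "low" if capra_s_score <= 5 else "high"

def combined_risk(assay_positive, capra_s_score):
    """Combine the dichotomised assay call with dichotomised CAPRA-S.

    Only assay negative / CAPRA-S low risk is labelled low risk;
    the three remaining combinations are labelled high risk.
    """
    if not assay_positive and capra_s_risk(capra_s_score) == "low":
        return "low risk"
    return "high risk"

# Hypothetical participants: (assay call, CAPRA-S score)
for assay_positive, score in [(False, 3), (False, 7), (True, 2), (True, 9)]:
    print(assay_positive, score, combined_risk(assay_positive, score))
```

The resulting two-level label is what would enter the Cox regression as a single covariate.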
3. Results
3.1. Molecular subtyping and identification of a metastatic subgroup in the discovery cohort
We hypothesised that a molecular subgroup of poor-prognosis
primary prostate cancers would be transcriptionally similar to
metastatic disease. To identify this subgroup, we measured
gene expression in primary prostate cancers, primary prostate
cancers with known concomitant metastases, metastatic
lymph node samples, and histologically confirmed normal
prostate tissue (Supplementary Table 2).
Unsupervised hierarchical clustering identified two sample
groups and two gene clusters (Fig. 1A). Importantly, one of
the molecular subgroups (C1) demonstrated significant
enrichment for primary cancers with known concomitant
metastatic disease (Fig. 1A and 1B, chi-square p < 0.0001). In
addition, the C1 group contained all metastatic lymph node
samples and no normal prostate samples. We defined this
subgroup as the ‘‘metastatic subgroup’’ and the other (C2) as
the ‘‘non-metastatic subgroup’’.
3.2. Identifying metastatic-subgroup biology
A feature of the metastatic subgroup was loss of gene
expression observed in gene cluster 1 (G1) (Fig. 1A and
Supplementary Table 8). To investigate whether loss of gene
expression was due to epigenetic silencing, we measured
DNA methylation in eight metastatic- and 14 non-metastatic-subgroup
samples (Supplementary Table 9). Semi-supervised hierarchical clustering
of the methylation data of downregulated genes (G1) separated the samples
into two groups (Supplementary Fig. 2 and Supplementary
Table 10), with 7/8 metastatic-subgroup samples (88%) clustering
together in M2 and 10/14 non-metastatic-subgroup samples (71%)
clustering together in M1 (chi-square, p = 0.02). Functional analysis
demonstrated that the
metastatic subgroup had higher levels of methylation in
genes that negatively regulate pathways known to be
involved in aggressive prostate cancer such as WNT and
growth signalling (Supplementary Table 11) [25]. Together
these data suggest that epigenetic silencing is a feature of
the metastatic subgroup and may therefore be important in
metastases.
To better understand the molecular processes upregulated
in the metastatic subgroup, we performed differential gene
analysis, identifying 222 genes that were overexpressed.
Ingenuity Pathway Analysis (www.ingenuity.com) identified
two upregulated pathways in the metastatic subgroup (false
discovery rate [FDR] p < 0.05). The ToppGene Suite [26]
identified 18 upregulated pathways (FDR p < 0.05) (Supplementary
Table 12). These pathways represented mitotic
progression and Forkhead Box M1 (FOXM1) pathways.
Consistently, FOXM1 was 2.80-fold overexpressed in the
metastatic subgroup.
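The FDR-adjusted p values used to call pathways significant can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is a common choice but is not named in the text; the pathway tools cited may compute FDR differently. The raw p values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw pathway p-values
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw)
significant = [p for p, q in zip(raw, fdr) if q < 0.05]
print(fdr, significant)
```

In this hypothetical input, two raw p values below 0.05 (0.039 and 0.041) no longer pass after adjustment, which is why the FDR threshold is stricter than an unadjusted one.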
3.3. Development of a metastatic assay
Next, we developed an assay that could identify metastatic-
subgroup tumours (Supplementary Fig. 3). Computational
classification using PLS regression resulted in a 70-transcript
metastatic assay. In the training set, the AUC under CV for
detecting the metastatic-subgroup was 99.1 (98.5–99.8).
The standard deviation (SD) in assay scores using five
separate sections from the same tumour was 0.06,
representing 6.9% of the assay range and 100% agreement
in assay call. In a secondary training dataset of 75 primary
resections, the C-index for detecting the metastatic subgroup
was 90.4, with an SD in assay scores using 20 patient
samples with technical replicates of 0.02, representing 2.9%
of assay range (Supplementary Fig. 4).
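The reproducibility summary above (SD of replicate scores, SD as a percentage of the assay range, and agreement in assay call) can be sketched as follows. The replicate scores, the assay range, and the call threshold are hypothetical values chosen for illustration, and the exact SD estimator used in the study is not specified in the text.

```python
import statistics

def replicate_summary(replicate_scores, assay_min, assay_max, threshold):
    """Summarise replicate assay scores from one tumour.

    Returns (sd, sd_percent_of_range, calls_agree), where calls_agree is
    True when every replicate falls on the same side of the threshold.
    """
    sd = statistics.stdev(replicate_scores)  # sample SD (n - 1 denominator)
    sd_percent = 100.0 * sd / (assay_max - assay_min)
    calls = {score >= threshold for score in replicate_scores}
    return sd, sd_percent, len(calls) == 1

# Hypothetical scores from five sections of the same tumour
sections = [0.50, 0.56, 0.44, 0.53, 0.47]
sd, sd_percent, calls_agree = replicate_summary(sections, 0.0, 1.0, 0.40)
print(f"SD = {sd:.3f} ({sd_percent:.1f}% of range), calls agree: {calls_agree}")
```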
European Urology 72 (2017) 509–518




