CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA
Shuli Kang, Qingjiao Li, Quan Chen, Yonggang Zhou, Stacy Park, Gina Lee, Brandon Grimes, Kostyantyn Krysan, Min Yu, Wei Wang, Frank Alber, Fengzhu Sun, Steven M. Dubinett, Wenyuan Li and Xianghong Jasmine Zhou
https://doi.org/10.1186/s13059-017-1191-5
© The Author(s). 2017
Received: 13 January 2017; Accepted: 8 March 2017; Published: 24 March 2017
We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportion and the tissue of origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulations and real data, even in scenarios where tumor-derived DNA makes up only a low proportion of the cell-free DNA. CancerLocator also achieves promising results on patient plasma samples with low DNA methylation sequencing coverage.
Cell-free DNA, Liquid biopsy, DNA methylation, Next-generation sequencing, Cancer diagnosis
Cancer cells often display aberrant DNA methylation patterns, such as hypermethylation of the promoter regions of tumor suppressor genes and pervasive hypomethylation of intergenic regions [1, 2, 3, 4, 5]. Therefore, DNA methylation is an ideal target for cancer diagnosis in clinical practice [6, 7]. Hyper/hypomethylated tumor DNA fragments can be released into the bloodstream via cell apoptosis or necrosis, where they become part of the circulating cell-free DNA (cfDNA) in plasma. The non-invasive nature of cfDNA methylation profiling makes it a promising strategy for general cancer screening. Current research on cfDNA-based, non-invasive cancer detection approaches falls into two classes: the development of biomarkers for a single specific cancer type, and the characterization of circulating tumor DNA (ctDNA) for general cancer detection, without trying to predict specific cancer types.
In recent years, several studies have reported plasma methylation biomarkers for different types of cancers [9, 10, 11, 12, 13, 14, 15]. Usually, the differentially methylated marker genes are identified by comparing methylation profile data from patients with a certain cancer type to healthy controls. However, these specific biomarkers are of limited use for general cancer screening. Ideally, as a non-invasive early screening tool, a liquid biopsy test should be able to detect many types of cancers and provide tumor location information for further specific clinical investigation.
Several approaches have recently been proposed for non-invasive universal cancer detection. These methods do not rely on detecting biomarkers specific to certain tumor types. Instead, they utilize properties of ctDNA that are common to various cancer types, such as copy number aberration (CNA) [16, 17, 18, 19], pervasive hypomethylation, and DNA integrity [16, 20]. None of these methods can predict the tissue of origin after the detection of ctDNA. The nature of the liquid biopsy introduces a new challenge, in that the cancer type can remain unknown even when there is a strong signal of tumor-derived DNA fragments in the blood. Hence, a positive result from a liquid biopsy would call for comprehensive follow-up investigations using clinical, analytical, and radiological tools to identify the tumor location. Considering that non-invasive screening is usually the first step of cancer diagnosis, and could be associated with a fair rate of false positives, such follow-up would be likely to increase the burden on the medical care system. A few recent studies have proposed using cfDNA methylation [21, 22] or nucleosome footprinting to partially alleviate this problem. For example, Sun et al. estimated the proportions of cfDNAs contributed by different tissues and showed that an abnormally high proportion of cfDNA from a specific tissue can indicate the possibility of a tumor in that tissue. Their approach, though promising, has not been developed into a systematic method capable of supporting clinical diagnosis applications. Lehmann-Werman et al. tested the same rationale to diagnose pancreatic cancer, but fewer than 50% of the pancreatic cancer patients demonstrated a substantial excess of pancreas-originated cfDNA fragments compared with healthy subjects. Snyder et al. pioneered an approach of using nucleosome footprinting to predict the tissue of origin of the cfDNA, but its power in cancer diagnosis has not been demonstrated because only five plasma samples with high ctDNA burden were selected for testing from 44 late-stage cancer patients, and fewer than half had their cancer types correctly predicted.
In summary, no existing cfDNA-based method can simultaneously detect cancer and predict its tissue of origin. We therefore propose a novel method, CancerLocator, that simultaneously infers the proportion and tissue of origin of ctDNA in a blood sample using genome-wide DNA methylation data. As shown in Fig. 1, we first learn the informative features of different cancer types from the large body of DNA methylation data in The Cancer Genome Atlas (TCGA). We then model the plasma cfDNAs in cancer patients as a mixture of normal cfDNAs and ctDNAs. Finally, given the genome-wide methylation profile derived from the cfDNA sample of an unknown patient, CancerLocator uses the informative features to estimate the fraction of ctDNAs in the plasma and the likelihood that the detected ctDNAs come from each tumor type. Based on those likelihoods, CancerLocator makes the final decision on whether the patient has a tumor and, if so, the location of the primary tumor.
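The mixture idea described above can be illustrated with a minimal sketch. It is not the CancerLocator implementation: the function name `locate_tumor`, the Gaussian noise model with a fixed `sigma`, the grid search over the ctDNA fraction, and the `min_fraction` decision threshold are all assumptions made here for illustration only. The sketch treats each feature's plasma methylation level as a weighted average of a normal-plasma profile and a tumor-type profile, scores every tumor type at every candidate fraction, and reports the best-scoring pair.

```python
import numpy as np

def locate_tumor(x, normal, tumors, sigma=0.05, min_fraction=0.01):
    """Toy mixture model (illustrative assumptions, not the paper's method).

    x            : observed plasma methylation level per feature (0..1)
    normal       : reference methylation level of normal plasma cfDNA
    tumors       : dict mapping tumor type -> reference tumor methylation level
    sigma        : assumed Gaussian noise level (hypothetical parameter)
    min_fraction : assumed cutoff below which the sample is called non-cancer
    """
    x = np.asarray(x, dtype=float)
    normal = np.asarray(normal, dtype=float)
    thetas = np.linspace(0.0, 1.0, 101)  # candidate ctDNA fractions

    best_name, best_theta, best_loglik = "non-cancer", 0.0, -np.inf
    for name, profile in tumors.items():
        profile = np.asarray(profile, dtype=float)
        # expected[i, k] = (1 - theta_i) * normal_k + theta_i * tumor_k
        expected = np.outer(1.0 - thetas, normal) + np.outer(thetas, profile)
        # Gaussian log-likelihood up to an additive constant
        loglik = -0.5 * np.sum(((x - expected) / sigma) ** 2, axis=1)
        i = int(np.argmax(loglik))
        if loglik[i] > best_loglik:
            best_name, best_theta, best_loglik = name, float(thetas[i]), float(loglik[i])

    # Call the sample non-cancer when the estimated ctDNA fraction is negligible
    if best_theta < min_fraction:
        return "non-cancer", 0.0
    return best_name, best_theta

# Synthetic example: plasma made of 80% normal cfDNA and 20% liver-tumor DNA
normal = [0.2, 0.8, 0.5, 0.1]
tumors = {"liver": [0.9, 0.1, 0.5, 0.1], "lung": [0.2, 0.8, 0.9, 0.7]}
x = [0.8 * n + 0.2 * t for n, t in zip(normal, tumors["liver"])]
name, theta = locate_tumor(x, normal, tumors)
print(name, round(theta, 2))  # liver 0.2
```

In this toy setting the tumor type and the ctDNA fraction are inferred jointly, which mirrors the simultaneous inference described in the text; a sample identical to the normal profile yields an estimated fraction of zero and is reported as non-cancer.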