Cancer-specific CTCF binding sites


About the project

We systematically analyzed 771 high-quality human CTCF ChIP-seq datasets (each with at least 2,000 peaks) from the public domain. These datasets cover over 200 human cell types, including normal tissues and multiple cancer types. In total, we identified 688,429 distinct CTCF binding sites by merging shared peaks called from each dataset.
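The merging step can be illustrated with a minimal sketch. It assumes that "shared" peaks are peaks whose intervals overlap on the same chromosome; the exact merging rule used for the published site list is not stated on this page, and the function name and toy coordinates below are hypothetical.

```python
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping intervals per chromosome.

    peaks: iterable of (chrom, start, end) tuples, half-open coordinates.
    Returns a list of merged (chrom, start, end) tuples sorted by
    chromosome name and start position.
    Assumption: two peaks are merged only if they overlap by >= 1 bp.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current = None  # the interval currently being extended
        for start, end in sorted(by_chrom[chrom]):
            if current is not None and start < current[1]:
                # Overlaps the current interval: extend its right edge.
                current[1] = max(current[1], end)
            else:
                if current is not None:
                    merged.append((chrom, current[0], current[1]))
                current = [start, end]
        if current is not None:
            merged.append((chrom, current[0], current[1]))
    return merged


# Toy peak calls from two hypothetical datasets.
dataset_a = [("chr1", 100, 200), ("chr1", 500, 600)]
dataset_b = [("chr1", 150, 250), ("chr2", 10, 50)]
union_sites = merge_peaks(dataset_a + dataset_b)
```

Pooling the peaks from all datasets before merging gives one union interval per group of overlapping peaks, which is the sense in which the union list contains "distinct" sites in this sketch.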

We assigned an occupancy score to each CTCF site by counting the ChIP-seq datasets with a peak within the site. This yielded 285,467 high-confidence CTCF binding sites with occupancy score ≥ 3. We also identified 22,097 constitutive CTCF binding sites, defined as sites bound in at least 80% of all 771 datasets, a threshold determined by an empirical model.
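As a rough illustration of the occupancy score and the two cutoffs, the sketch below counts, for each site, the datasets that have at least one overlapping peak. The overlap test, the function names, and the category labels are assumptions made for this example; only the cutoffs (score ≥ 3, and at least 80% of datasets) come from the description above.

```python
def occupancy_scores(sites, datasets):
    """Count, for each site, how many datasets have a peak overlapping it.

    sites: list of (chrom, start, end) tuples.
    datasets: list of peak lists, each a list of (chrom, start, end).
    Assumption: a peak is "within" a site if the intervals overlap.
    """
    scores = []
    for chrom, s_start, s_end in sites:
        score = sum(
            any(
                p_chrom == chrom and p_start < s_end and s_start < p_end
                for p_chrom, p_start, p_end in peaks
            )
            for peaks in datasets
        )
        scores.append(score)
    return scores


def classify(score, n_datasets, min_score=3, constitutive_percent=80):
    """Label a site by occupancy (labels are hypothetical).

    Integer arithmetic avoids floating-point issues at the 80% boundary.
    Constitutive sites also satisfy the high-confidence cutoff whenever
    80% of the datasets is at least min_score.
    """
    if 100 * score >= constitutive_percent * n_datasets:
        return "constitutive"
    if score >= min_score:
        return "high-confidence"
    return "low-occupancy"


# Toy example with three hypothetical datasets.
sites = [("chr1", 100, 250), ("chr1", 500, 600)]
datasets = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250), ("chr1", 500, 600)],
    [("chr1", 120, 180)],
]
scores = occupancy_scores(sites, datasets)
```

With 771 datasets, the 80% rule in this sketch corresponds to an occupancy score of at least 617, since 80% of 771 is 616.8.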

We identified cancer-specific CTCF binding patterns in six cancer types: T-cell acute lymphoblastic leukemia (T-ALL), acute myeloid leukemia (AML), breast cancer (BRCA), colorectal cancer (CRC), lung cancer (LUAD) and prostate cancer (PRAD). We found that gain of CTCF binding in cancer is associated with increased chromatin interactions and cancer-specific gene activation, while loss of CTCF binding occurred at promoters of genes with lower expression in cancer than in normal cells. Our results substantiate CTCF binding alteration as a functional epigenomic signature of cancer.


Citation

If you use any data from this website, please cite:

Cancer-specific CTCF binding facilitates oncogenic transcriptional dysregulation
Celestia Fang*, Zhenjia Wang*, Cuijuan Han, Stephanie L. Safgren, Kathryn A. Helmin, Emmalee R. Adelman, Valentina Serafin, Giuseppe Basso, Kyle P. Eagen, Alexandre Gaspar-Maia, Maria E. Figueroa, Benjamin D. Singer, Aakrosh Ratan, Panagiotis Ntziachristos#, Chongzhi Zang#
Genome Biology 21, 247 (2020)



Download

  • Union of all CTCF binding sites (688,429)
      • BED format with chrom, start, end, id (hg38)
      • CSV format with extra annotation (CTCF sequence motif, occupancy score, genomic annotation, etc.)


  • High-confidence CTCF binding sites (285,467)
      • BED format with chrom, start, end, id (hg38)
      • CSV format with extra annotation (CTCF sequence motif, occupancy score, genomic annotation, etc.)


  • Constitutive CTCF binding sites (22,097)
      • BED format with chrom, start, end, id (hg38)
      • CSV format with extra annotation (CTCF sequence motif, occupancy score, genomic annotation, etc.)


  • Cancer-specific CTCF binding sites
      • T-ALL gained    T-ALL lost
      • AML gained      AML lost
      • BRCA gained     BRCA lost
      • CRC gained      CRC lost
      • LUAD gained     LUAD lost
      • PRAD gained     PRAD lost


  • Other supplementary data
      • List of collected CTCF ChIP-seq datasets
      • Lists of CTCF ChIP-seq datasets for identification of cancer-specific CTCF binding sites
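
The BED downloads listed above carry four columns (chrom, start, end, id) in hg38 coordinates. A minimal sketch for reading such a file follows. It assumes whitespace-delimited columns with no header line, as is usual for BED; the sample records, the site identifiers, and the names `Site` and `read_bed` are made up for illustration and are not taken from the actual files.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Site(NamedTuple):
    chrom: str
    start: int  # 0-based start, per the BED convention
    end: int    # end position, exclusive
    site_id: str


def read_bed(handle: TextIO) -> Iterator[Site]:
    """Yield one Site per BED record, skipping blank and header-like lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, site_id = line.split()[:4]
        yield Site(chrom, int(start), int(end), site_id)


# Made-up records standing in for a downloaded file.
sample = io.StringIO(
    "chr1\t100\t250\tsite_1\n"
    "chr1\t500\t600\tsite_2\n"
    "chr2\t10\t50\tsite_3\n"
)
sites = list(read_bed(sample))
total_bp = sum(site.end - site.start for site in sites)
```

To read a real download, the `io.StringIO` object would be replaced by an open file handle, for example `open(path)` with `path` pointing to the saved BED file.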




  • Zang lab

  • Ntziachristos lab

  • Source code at GitHub


  • Last modified: December 15, 2020