Research in our lab focuses on two areas: (1) developing quantitative models and computational methods for analyzing high-throughput data generated by emerging genomics technologies; and (2) using innovative computational and data science approaches to study epigenetics and the transcriptional regulation of gene expression in mammalian cell systems and in human diseases such as cancer.
How gene expression is regulated in the context of chromatin is a fundamental question in molecular biology. The transcriptional program is a major determinant of cell identity, and transcriptional regulation is involved in many biological processes and human diseases. Advanced genomics technologies, such as high-throughput sequencing and single-cell and spatial omics assays, generate massive datasets that measure the dynamic patterns of the many factors and genomic elements that affect chromatin states and gene regulation. We leverage these big data and combine computational and experimental research at the intersection of functional genomics, epigenetics, and cancer biology. Our ongoing research directions include:
1. Bioinformatics methods for emerging omics technologies
Progress in the life sciences has been accelerated by new technologies, and innovative analytical methods are essential for converting high-throughput experimental data into scientific knowledge. We develop statistical models and algorithms for analyzing data from emerging genomics technologies. As a pioneer in next-generation sequencing (NGS) bioinformatics, we developed SICER (Bioinformatics 2009), one of the most widely used methods for ChIP-seq data analysis. We are currently developing new methods for the unbiased analysis of data from epigenomic assays (ATAC-seq, CUT&RUN, CUT&Tag, etc.), single-cell multi-omics, and spatial omics techniques to study gene regulation (Nature Commun 2022).
2. Machine learning methods for regulatory factor prediction and multi-omics integration
Transcriptional regulators (TRs), including transcription factors and chromatin regulators, are key players in transcriptional regulation. Leveraging publicly available ChIP-seq data, we developed a series of machine learning-based computational methods, including MARGE (Genome Res 2016), BART (Bioinformatics 2018), BARTweb (NAR Genom Bioinform 2021), and BART3D (Bioinformatics 2021), that predict cis-regulatory profiles and functional TRs from various types of input data. By integrating public omics data with data from The Cancer Genome Atlas (TCGA), we curated BART Cancer (NAR Cancer 2021), a resource for modeling transcription factor activities across TCGA cancer types. We are currently developing new methods specifically for single-cell multi-omics data and plan to build a general framework that uses advanced machine learning for multi-omics integration and regulatory network prediction.
3. Data-inspired modeling for functional epigenetics and transcriptional condensation
Data-driven discovery has become a new paradigm of biological research. By modeling the massive amount of data available in the public domain, we can find new patterns and relationships among biological entities that are usually not apparent in individual datasets. By integrating thousands of public omics datasets, we recently identified a cancer-specific binding pattern of CTCF, an important DNA-binding protein, and characterized its function in facilitating oncogenic transcriptional activation (Genome Biol 2020). Inspired by emerging evidence of phase separation phenomena in gene regulation (e.g., Nature 2021), we will develop computational models to characterize phase-separated transcriptional condensation events from multi-omics data, with the ultimate goal of better understanding the molecular mechanisms of transcriptional regulation.
Collaborations
Computational biology is an interdisciplinary science. It cannot thrive without close collaboration among researchers with different backgrounds and expertise. We are committed to collaboration and team science with experimental biologists and clinicians, as well as statisticians, computer scientists, mathematicians, and physicists. We study functional epigenetics and transcriptional regulation in a variety of biological systems and human diseases, including T-cell immunity, malignant peripheral nerve sheath tumor (MPNST), acute myeloid leukemia (AML), T-cell acute lymphoblastic leukemia (T-ALL), colorectal cancer, coronary artery disease, development, and environmental health.
Our collaborators include:
Myles Brown, Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School
Suresh Cuddapah, Department of Environmental Medicine, New York University
Anindya Dutta, Department of Genetics, University of Alabama at Birmingham