
Abstract

Spatiotemporal control of gene expression during development requires orchestrated activities of numerous enhancers, which are cis-regulatory DNA sequences that, when bound by transcription factors, support selective activation or repression of associated genes. Proper activation of enhancers is critical during embryonic development, adult tissue homeostasis, and regeneration, and inappropriate enhancer activity is often associated with pathological conditions such as cancer. Multiple consortia [e.g., the Encyclopedia of DNA Elements (ENCODE) Consortium and the National Institutes of Health Roadmap Epigenomics Mapping Consortium] and independent investigators have mapped putative regulatory regions in a large number of cell types and tissues, but the sequence determinants of cell-specific enhancers are not yet fully understood. Machine learning approaches trained on large sets of these regulatory regions can identify core transcription factor binding sites and generate quantitative predictions of enhancer activity and of the impact of sequence variants on that activity. Here, we review these computational methods in the context of enhancer prediction and gene regulatory network models specifying cell fate.
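The machine learning approaches summarized above typically represent each candidate region by its short-subsequence (k-mer) content and assign a weight to each feature, so that a variant's impact can be estimated as the change in predicted score between alleles. The sketch below illustrates that idea under simplifying assumptions: it uses ungapped k-mers rather than the gapped k-mers of gkm-SVM, and the weight dictionary is a hypothetical toy value rather than one learned from training data (a real model would fit the weights, for example with a support vector machine trained on enhancers versus background sequences). The function names are illustrative and are not the API of any published tool.

```python
# Toy illustration (hypothetical weights): scoring a sequence with a linear
# k-mer model and estimating a variant's impact as a score difference.

# Translation table used to build the reverse complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that a motif is counted the same way on either DNA strand."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_counts(seq, k):
    """Count canonical k-mers in seq, skipping windows that contain non-ACGT bases."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            key = canonical(kmer)
            counts[key] = counts.get(key, 0) + 1
    return counts


def enhancer_score(seq, weights, k):
    """Linear score: sum over k-mers of weight * count; unlisted k-mers contribute 0."""
    return sum(weights.get(kmer, 0.0) * n for kmer, n in kmer_counts(seq, k).items())


def variant_delta(seq, pos, alt, weights, k):
    """Predicted impact of a single-base substitution at 0-based pos:
    score(alternate allele) minus score(reference allele)."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return enhancer_score(alt_seq, weights, k) - enhancer_score(seq, weights, k)


if __name__ == "__main__":
    # Hypothetical weight: a GATA-containing 4-mer is treated as activating.
    weights = {canonical("GATA"): 2.0}
    ref = "TTGATAAG"
    print(enhancer_score(ref, weights, 4))         # 2.0
    print(variant_delta(ref, 3, "C", weights, 4))  # -2.0: the substitution destroys the GATA 4-mer
```

Counting canonical k-mers makes the score strand-symmetric, so a motif contributes equally whether it appears as GATA or as its reverse complement TATC; a substitution that disrupts a heavily weighted k-mer produces a large negative score difference.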

DOI: 10.1146/annurev-genom-121719-010946
2020-08-31
2024-03-28

  • Article Type: Review Article