
Abstract

Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence–function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can share. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.
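As a concrete illustration of what a quantitative sequence–function model can look like, here is a minimal sketch of an additive model. In this model, each (position, character) pair contributes an independent term to a predicted phenotype, and the parameters are fit to measurements by ordinary least squares. Several choices here are assumptions made only for illustration and are not taken from any study reviewed here: the DNA alphabet, the one-hot encoding, the helper names `one_hot`, `additive_phenotype`, and `fit_additive`, and the synthetic, noiseless data in the demo.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed DNA alphabet for this illustration


def one_hot(seq: str) -> np.ndarray:
    """Flatten a sequence of length L into an L*4 one-hot feature vector."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, c in enumerate(seq):
        x[i, ALPHABET.index(c)] = 1.0
    return x.ravel()


def additive_phenotype(seq: str, theta0: float, theta: np.ndarray) -> float:
    """Predicted phenotype: an offset plus one term per (position, character)."""
    return theta0 + float(one_hot(seq) @ theta.ravel())


def fit_additive(seqs, y):
    """Least-squares fit of the additive model to measured phenotypes y.

    One-hot features plus an intercept are redundant (gauge freedom), so
    np.linalg.lstsq returns the minimum-norm solution. Predictions are
    determined by the fit; individual parameter values are not unique.
    """
    X = np.array([one_hot(s) for s in seqs])
    X = np.hstack([np.ones((len(seqs), 1)), X])  # prepend intercept column
    params, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return params[0], params[1:].reshape(-1, len(ALPHABET))


if __name__ == "__main__":
    # Hypothetical demo: simulate noiseless phenotypes from a random
    # additive model, refit it, and compare predictions on a new sequence.
    rng = np.random.default_rng(0)
    L = 5
    true_theta = rng.normal(size=(L, len(ALPHABET)))
    seqs = ["".join(rng.choice(list(ALPHABET), size=L)) for _ in range(200)]
    y = [additive_phenotype(s, 1.0, true_theta) for s in seqs]
    theta0_hat, theta_hat = fit_additive(seqs, y)
    test_seq = "ACGTA"
    print(additive_phenotype(test_seq, theta0_hat, theta_hat),
          additive_phenotype(test_seq, 1.0, true_theta))
```

Because the intercept and the one-hot columns are linearly dependent, the fitted parameters can differ from the simulated ones even when the predictions agree. Comparing models therefore requires fixing a convention (a "gauge") for the parameters, for example by requiring the terms at each position to sum to zero. The additive form is also only a starting point: pairwise or nonlinear models can be built on the same encoding.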

DOI: 10.1146/annurev-genom-083118-014845 (2019-08-31)
Literature Cited

  1. 1.
    Aakre CD, Herrou J, Phung TN, Perchuk BS, Crosson S, Laub MT 2015. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163:594–606
    [Google Scholar]
  2. 2.
    Adams RM, Mora T, Walczak AM, Kinney JB 2016. Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves. eLife 5:e23156
    [Google Scholar]
  3. 3.
    Alipanahi B, Delong A, Weirauch MT, Frey BJ 2015. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33:831–38
    [Google Scholar]
  4. 4.
    Araya CL, Fowler DM, Chen W, Muniez I, Kelly JW, Fields S 2012. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. PNAS 109:16858–63
    [Google Scholar]
  5. 5.
    Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A 2013. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339:1074–77
    [Google Scholar]
  6. 6.
    Arnosti D, Kulkarni M 2005. Transcriptional enhancers: intelligent enhanceosomes or flexible billboards. J. Cell. Biochem. 94:890–98
    [Google Scholar]
  7. 7.
    Ashenberg O, Gong LI, Bloom JD 2013. Mutational effects on stability are largely conserved during protein evolution. PNAS 110:21071–76
    [Google Scholar]
  8. 8.
    Atwal GS, Kinney JB 2016. Learning quantitative sequence–function relationships from massively parallel experiments. J. Stat. Phys. 162:1203–43
    [Google Scholar]
  9. 9.
    Baeza-Centurion P, Miñana B, Schmiedel JM, Valcárcel J, Lehner B 2019. Combinatorial genetics reveals a scaling law for the effects of mutations on splicing. Cell 17654963.e23
    [Google Scholar]
  10. 10.
    Bailey T, Elkan C 1995. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach. Learn. 21:51–80
    [Google Scholar]
  11. 11.
    Bank C, Matuszewski S, Hietpas RT, Jensen JD 2016. On the (un)predictability of a large intragenic fitness landscape. PNAS 113:14085–90
    [Google Scholar]
  12. 12.
    Belliveau NM, Barnes SL, Ireland WT, Jones DL, Sweredoski MJ et al. 2018. Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria. PNAS 115:E4796–805
    [Google Scholar]
  13. 13.
    Berg O, von Hippel P 1987. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193:723–50
    [Google Scholar]
  14. 14.
    Berg O, von Hippel P 1988. Selection of DNA binding sites by regulatory proteins. II. The binding specificity of cyclic AMP receptor protein to recognition sites. J. Mol. Biol. 200:709–23
    [Google Scholar]
  15. 15.
    Berger M, Philippakis A, Qureshi A, He F, Estep P, Bulyk M 2006. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 24:1429–35
    [Google Scholar]
  16. 16.
    Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T et al. 2005. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15:116–24
    [Google Scholar]
  17. 17.
    Bloom JD 2014. An experimentally determined evolutionary model dramatically improves phylogenetic fit. Mol. Biol. Evol. 31:1956–78
    [Google Scholar]
  18. 18.
    Bloom JD, Raval A, Wilke CO 2007. Thermodynamics of neutral protein evolution. Genetics 175:255–66
    [Google Scholar]
  19. 19.
    Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH 2005. Thermodynamic prediction of protein neutrality. PNAS 102:606–11
    [Google Scholar]
  20. 20.
    Browning DF, Busby SJW 2016. Local and global regulation of transcription initiation in bacteria. Nat. Rev. Microbiol. 14:638–50
    [Google Scholar]
  21. 21.
    Castel SE, Cervera A, Mohammadi P, Aguet F, Reverter F et al. 2018. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat. Genet. 50:1327–34
    [Google Scholar]
  22. 22.
    Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A et al. 2016. A survey of best practices for RNA-seq data analysis. Genome Biol. 17:13
    [Google Scholar]
  23. 23.
    Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C, Kehrer-Sawatzki H 2013. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum. Genet. 132:1077–130
    [Google Scholar]
  24. 24.
    Cox RS, Surette MG, Elowitz MB 2007. Programming gene expression with combinatorial promoters. Mol. Syst. Biol. 3:145
    [Google Scholar]
  25. 25.
    Crooks GE, Hon G, Chandonia JM, Brenner SE 2004. WebLogo: a sequence logo generator. Genome Res. 14:1188–90
    [Google Scholar]
  26. 26.
    Cuperus JT, Groves B, Kuchina A, Rosenberg AB, Jojic N et al. 2017. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences. Genome Res. 27:2015–24
    [Google Scholar]
  27. 27.
    Dantas Machado AC, Zhou T, Rao S, Goel P, Rastogi C et al. 2015. Evolving insights on how cytosine methylation affects protein-DNA binding. Brief. Funct. Genom. 14:61–73
    [Google Scholar]
  28. 28.
    de Boer C, Sadeh R, Friedman N, Regev A 2018.Deciphering eukaryotic cis-regulatory logic with 100 million random promoters. bioRxiv 224907. https://doi.org/10.1101/224907
    [Crossref]
  29. 29.
    Domingo J, Diss G, Lehner B 2018. Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature 558:117–21
    [Google Scholar]
  30. 30.
    Doniger S, Fay J 2007. Frequent gain and loss of functional transcription factor binding sites. PLOS Comput. Biol. 3:e99
    [Google Scholar]
  31. 31.
    Dvir S, Velten L, Sharon E, Zeevi D, Carey LB et al. 2013. Deciphering the rules by which 5′-UTR sequences affect protein expression in yeast. PNAS 110:E2792–801
    [Google Scholar]
  32. 32.
    Elemento O, Slonim N, Tavazoie S 2007. A universal framework for regulatory element discovery across all genomes and data types. Mol. Cell 28:337–50
    [Google Scholar]
  33. 32a.
    Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT et al. 2019. An open-source platform to distribute and interpret data from multiplexed assays of variant effect. bioRxiv 555797. https://doi.org/10.1101/555797
    [Crossref] [Google Scholar]
  34. 33.
    Feng Y, Zhang Y, Ebright RH 2016. Structural basis of transcription activation. Science 352:1330–33
    [Google Scholar]
  35. 34.
    Findlay GM, Boyle EA, Hause RJ, Klein JC, Shendure J 2014. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513:120–23
    [Google Scholar]
  36. 35.
    Findlay GM, Daza RM, Martin B, Zhang MD, Leith AP et al. 2018. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562:217–22
    [Google Scholar]
  37. 36.
    Firnberg E, Labonte JW, Gray JJ, Ostermeier M 2014. A comprehensive, high-resolution map of a gene's fitness landscape. Mol. Biol. Evol. 31:1581–92
    [Google Scholar]
  38. 37.
    Firnberg E, Ostermeier M 2012. PFunkel: efficient, expansive, user-defined mutagenesis. PLOS ONE 7:e52031
    [Google Scholar]
  39. 38.
    Foat B, Morozov A, Bussemaker H 2006. Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics 22:e141–49
    [Google Scholar]
  40. 39.
    Forcier TL, Ayaz A, Gill MS, Jones D, Phillips R, Kinney JB 2018. Measuring cis-regulatory energetics in living cells using allelic manifolds. eLife 7:e40618
    [Google Scholar]
  41. 40.
    Forsyth CM, Juan V, Akamatsu Y, DuBridge RB, Doan M et al. 2013. Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing. mAbs 5:523–32
    [Google Scholar]
  42. 41.
    Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ et al. 2010. High-resolution mapping of protein sequence-function relationships. Nat. Methods 7:741–46
    [Google Scholar]
  43. 42.
    Fowler DM, Fields S 2014. Deep mutational scanning: a new style of protein science. Nat. Methods 11:801–7
    [Google Scholar]
  44. 43.
    Gallet R, Cooper TF, Elena SF, Lenormand T 2012. Measuring selection coefficients below : method, questions, and prospects. Genetics 190:175–86
    [Google Scholar]
  45. 44.
    Gertz J, Siggia ED, Cohen BA 2009. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457:215–18
    [Google Scholar]
  46. 45.
    Ghandi M, Lee D, Mohammad-Noori M, Beer MA 2014. Enhanced regulatory sequence prediction using gapped k-mer features. PLOS Comput. Biol. 10:e1003711
    [Google Scholar]
  47. 46.
    Goldstein RA 2011. The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79:1396–407
    [Google Scholar]
  48. 47.
    Gong LI, Suchard MA, Bloom JD 2013. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2:e00631
    [Google Scholar]
  49. 48.
    Goodfellow I, Bengio Y, Courville A 2016. Deep Learning Cambridge, MA: MIT Press
  50. 49.
    Grimm DG, Azencott CA, Aicheler F, Gieraths U, MacArthur DG et al. 2015. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Hum. Mutat. 36:513–23
    [Google Scholar]
  51. 50.
    Guenther UP, Yandek LE, Niland CN, Campbell FE, Anderson D et al. 2013. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502:385–88
    [Google Scholar]
  52. 51.
    Halpern A, Bruno W 1998. Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol. Biol. Evol. 15:910–17
    [Google Scholar]
  53. 52.
    Heumann J, Lapedes A, Stormo G 1994. Neural networks for determining protein specificity and multiple alignment of binding sites. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2:188–94
    [Google Scholar]
  54. 53.
    Hietpas RT, Jensen JD, Bolon DNA 2011. Experimental illumination of a fitness landscape. PNAS 108:7896–901
    [Google Scholar]
  55. 54.
    Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M et al. 2017. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35:128–35
    [Google Scholar]
  56. 55.
    Inukai S, Kock KH, Bulyk ML 2017. Transcription factor-DNA binding: beyond binding site motifs. Curr. Opin. Genet. Dev. 43:110–19
    [Google Scholar]
  57. 56.
    Ipe J, Swart M, Burgess KS, Skaar TC 2017. High-throughput assays to assess the functional impact of genetic variants: a road towards genomic-driven medicine. Clin. Transl. Sci. 10:67–77
    [Google Scholar]
  58. 57.
    Isakova A, Groux R, Imbeault M, Rainer P, Alpern D et al. 2017. SMiLE-seq identifies binding motifs of single and dimeric transcription factors. Nat. Methods 14:316–22
    [Google Scholar]
  59. 58.
    Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E et al. 2013. Capturing the mutational landscape of the beta-lactamase TEM-1. PNAS 110:13067–72
    [Google Scholar]
  60. 59.
    Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R et al. 2011. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21:1543–51
    [Google Scholar]
  61. 60.
    Johns NI, Gomes ALC, Yim SS, Yang A, Blazejewski T et al. 2018. Metagenomic mining of regulatory elements enables programmable species-selective gene expression. Nat. Methods 15:323–29
    [Google Scholar]
  62. 61.
    Johnson A, Meyer BJ, Ptashne M 1978. Mechanism of action of the cro protein of bacteriophage λ. PNAS 75:1783–87
    [Google Scholar]
  63. 62.
    Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR et al. 2013. DNA-binding specificities of human transcription factors. Cell 152:327–39
    [Google Scholar]
  64. 63.
    Joyce AP, Zhang C, Bradley P, Havranek JJ 2015. Structure-based modeling of protein: DNA specificity. Brief. Funct. Genom. 14:39–49
    [Google Scholar]
  65. 64.
    Judson HF 1996. The Eighth Day of Creation: The Makers of the Revolution in Biology Cold Spring Harbor, NY: Cold Spring Harb. Lab. Press
  66. 65.
    Katsanis N 2016. The continuum of causality in human genetic disorders. Genome Biol. 17:233
    [Google Scholar]
  67. 66.
    Kazan H, Ray D, Chan ET, Hughes TR, Morris Q 2010. RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins. PLOS Comput. Biol. 6:e1000832
    [Google Scholar]
  68. 66a.
    Ke S, Shang S, Kalachikov SM, Morozova I, Yu L et al. 2011. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21:1360–74
    [Google Scholar]
  69. 67.
    Keefe AD, Szostak JW 2001. Functional proteins from a random-sequence library. Nature 410:715–18
    [Google Scholar]
  70. 68.
    Kelley DR, Snoek J, Rinn JL 2016. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26:990–99
    [Google Scholar]
  71. 69.
    Kinney JB, Murugan A, Callan CG, Cox EC 2010. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. PNAS 107:9158–63
    [Google Scholar]
  72. 70.
    Kinney JB, Tkacik G, Callan CG 2007. Precise physical models of protein-DNA interaction from high-throughput data. PNAS 104:501–6
    [Google Scholar]
  73. 71.
    Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M et al. 2012. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods 9:72–74
    [Google Scholar]
  74. 72.
    Kondrashov FA, Kondrashov AS 2001. Multidimensional epistasis and the disadvantage of sex. PNAS 98:12089–92
    [Google Scholar]
  75. 73.
    Koo PK, Anand P, Paul SB, Eddy SR 2018.Inferring sequence-structure preferences of RNA-binding proteins with convolutional residual networks. bioRxiv 418459. https://doi.org/10.1101/418459
    [Crossref]
  76. 74.
    Koo PK, Eddy SR 2018.Representation learning of genomic sequence motifs with convolutional neural networks. bioRxiv 362756. https://doi.org/10.1101/362756
    [Crossref]
  77. 75.
    Kowalsky CA, Faber MS, Nath A, Dann HE, Kelly VW et al. 2015. Rapid fine conformational epitope mapping using comprehensive mutagenesis and deep sequencing. J. Biol. Chem. 290:26457–70
    [Google Scholar]
  78. 76.
    Kwasnieski JC, Mogno I, Myers CA, Corbo JC, Cohen BA 2012. Complex effects of nucleotide variants in a mammalian cis-regulatory element. PNAS 109:19498–503
    [Google Scholar]
  79. 77.
    Lai X, Stigliani A, Vachon G, Carles C, Smaczniak C et al. 2019. Building transcription factor binding site models to understand gene regulation in plants. Mol. Plant12:P743–63
    [Google Scholar]
  80. 78.
    Le DD, Shimko TC, Aditham AK, Keys AM, Longwell SA et al. 2018. Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding. PNAS 115:E3702–11
    [Google Scholar]
  81. 79.
    Levo M, Segal E 2014. In pursuit of design principles of regulatory sequences. Nat. Rev. Genet. 15:453–68
    [Google Scholar]
  82. 80.
    Levy RM, Haldane A, Flynn WF 2017. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr. Opin. Struct. Biol. 43:55–62
    [Google Scholar]
  83. 81.
    Li C, Qian W, Maclean CJ, Zhang J 2016. The fitness landscape of a tRNA gene. Science 352:837–40
    [Google Scholar]
  84. 82.
    Li F, Salit ML, Levy SF 2018.Unbiased fitness estimation of pooled barcode or amplicon sequencing studies. Cell Syst. 7:521–25.e4
  85. 83.
    Liachko I, Youngblood RA, Keich U, Dunham MJ 2013. High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast. Genome Res. 23:698–704
    [Google Scholar]
  86. 84.
    Ligr M, Siddharthan R, Cross F, Siggia ED 2006. Gene expression from random libraries of yeast promoters. Genetics 172:2113–22
    [Google Scholar]
  87. 85.
    Long HK, Prescott SL, Wysocka J 2016. Ever-changing landscapes: transcriptional enhancers in development and evolution. Cell 167:1170–87
    [Google Scholar]
  88. 86.
    Love MI, Huber W, Anders S 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15:550
    [Google Scholar]
  89. 87.
    Lubliner S, Regev I, Lotan-Pompan M, Edelheit S, Weinberger A, Segal E 2015. Core promoter sequence in yeast is a major determinant of expression level. Genome Res. 25:1008–17
    [Google Scholar]
  90. 88.
    Majithia AR, Tsuda B, Agostini M, Gnanapradeepan K, Rice R et al. 2016. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48:1570–75
    [Google Scholar]
  91. 89.
    Manhart M, Morozov AV 2015. Protein folding and binding can emerge as evolutionary spandrels through structural coupling. PNAS 112:1797–802
    [Google Scholar]
  92. 90.
    Maricque BB, Chaudhari HG, Cohen BA 2018. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37:90–95
    [Google Scholar]
  93. 91.
    Maticzka D, Lange SJ, Costa F, Backofen R 2014. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol. 15:R17
    [Google Scholar]
  94. 92.
    Matreyek KA, Starita LM, Stephany JJ, Martin B, Chiasson MA et al. 2018. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50:874–82
    [Google Scholar]
  95. 93.
    Matuszewski S, Hildebrandt ME, Ghenu AH, Jensen JD, Bank C 2016. A statistical guide to the design of deep mutational scanning experiments. Genetics 204:77–87
    [Google Scholar]
  96. 94.
    McLaughlin RN, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R 2012. The spatial architecture of protein function and adaptation. Nature 491:138–42
    [Google Scholar]
  97. 95.
    Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L et al. 2012. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30:271–77
    [Google Scholar]
  98. 96.
    Miosge LA, Field MA, Sontani Y, Cho V, Johnson S et al. 2015. Comparison of predicted and actual consequences of missense mutations. PNAS 112:E5189–98
    [Google Scholar]
  99. 97.
    Miyazawa S, Jernigan R 1985. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. J. Am. Chem. Soc. 18:534–52
    [Google Scholar]
  100. 98.
    Mogno I, Kwasnieski JC, Cohen BA 2013. Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants. Genome Res. 23:1908–15
    [Google Scholar]
  101. 99.
    Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS et al. 2011. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. PNAS 108:E1293–301
    [Google Scholar]
  102. 100.
    Mustonen V, Kinney JB, Callan CG, Lässig M 2008. Energy-dependent fitness: a quantitative model for the evolution of yeast transcription factor binding sites. PNAS 105:12376–81
    [Google Scholar]
  103. 101.
    Nirenberg M, Leder P, Bernfield M, Brimacombe R, Trupin J et al. 1965. RNA codewords and protein synthesis, VII. On the general nature of the RNA code. PNAS 53:1161–68
    [Google Scholar]
  104. 102.
    Nisthal A, Wang CY, Ary ML, Mayo SL 2018. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. bioRxiv 484949 https://doi.org/10.1101/484949
    [Crossref] [Google Scholar]
  105. 103.
    Oikonomou P, Goodarzi H, Tavazoie S 2014. Systematic identification of regulatory elements in conserved 3′ UTRs of human transcripts. Cell Rep. 7:281–92
    [Google Scholar]
  106. 104.
    Olson CA, Wu NC, Sun R 2014. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24:2643–51
    [Google Scholar]
  107. 105.
    Orenstein Y, Wang Y, Berger B 2016. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data. Bioinformatics 32:i351–59
    [Google Scholar]
  108. 106.
    Otwinowski J 2018. Biophysical inference of epistasis and the effects of mutations on protein stability and function. Mol. Biol. Evol. 35:2345–54
    [Google Scholar]
  109. 107.
    Otwinowski J, McCandlish DM, Plotkin JB 2018. Inferring the shape of global epistasis. PNAS 115:E7550–58
    [Google Scholar]
  110. 108.
    Panne D 2008. The enhanceosome. Curr. Opin. Struct. Biol. 18:236–42
    [Google Scholar]
  111. 109.
    Panne D, Maniatis T, Harrison SC 2007. An atomic model of the interferon-beta enhanceosome. Cell 129:1111–23
    [Google Scholar]
  112. 110.
    Parkinson G, Wilson C, Gunasekera A, Ebright YW, Ebright RH et al. 1996. Structure of the CAP-DNA complex at 2.5 Å resolution: a complete picture of the protein-DNA interface. J. Mol. Biol. 260:395–408
    [Google Scholar]
  113. 111.
    Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP et al. 2012. Massively parallel functional dissection of mammalian enhancers in vivo. Nat. Biotechnol. 30:265–70
    [Google Scholar]
  114. 112.
    Patwardhan RP, Lee C, Litvin O, Young DL, Pe'er D, Shendure J 2009. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27:1173–75
    [Google Scholar]
  115. 113.
    Peterman N, Levine E 2016. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genom. 17:206
    [Google Scholar]
  116. 114.
    Phillips R, Kondev J, Theriot J, Garcia HG 2012. Physical Biology of the Cell New York: Garland Sci. 2nd ed.
  117. 115.
    Plesa C, Sidore AM, Lubock NB, Zhang D, Kosuri S 2018. Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Science 359:343–47
    [Google Scholar]
  118. 116.
    Podgornaia AI, Laub MT 2015. Pervasive degeneracy and epistasis in a protein-protein interface. Science 347:673–77
    [Google Scholar]
  119. 117.
    Pribnow D 1975. Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. PNAS 72:784–88
    [Google Scholar]
  120. 118.
    Puchta O, Cseke B, Czaja H, Tollervey D, Sanguinetti G, Kudla G 2016. Network of epistatic interactions within a yeast snoRNA. Science 352:840–44
    [Google Scholar]
  121. 119.
    Raraigh KS, Han ST, Davis E, Evans TA, Pellicore MJ et al. 2018. Functional assays are essential for interpretation of missense variants associated with variable expressivity. Am. J. Hum. Genet. 102:1062–77
    [Google Scholar]
  122. 120.
    Ray D, Kazan H, Chan ET, Pena-Castillo L, Chaudhry S et al. 2009. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol. 27:667–70
    [Google Scholar]
  123. 121.
    Reich LL, Dutta S, Keating AE 2015. SORTCERY—a high-throughput method to affinity rank peptide ligands. J. Mol. Biol. 427:2135–50
    [Google Scholar]
  124. 122.
    Richards S, Aziz N, Bale S, Bick D, Das S et al. 2015. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17:405–14
    [Google Scholar]
  125. 123.
    Riesselman AJ, Ingraham JB, Marks DS 2018. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15:816–22
    [Google Scholar]
  126. 124.
    Riley TR, Lazarovici A, Mann RS, Bussemaker HJ 2015. Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE. eLife 4:e06397
    [Google Scholar]
  127. 125.
    Rodenburg RJ 2018. The functional genomics laboratory: functional validation of genetic variants. J. Inherit. Metab. Dis. 41:297–307
    [Google Scholar]
  128. 126.
    Rodrigue N, Philippe H, Lartillot N 2010. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles. PNAS 107:4629–34
    [Google Scholar]
  129. 127.
    Rohs R, Jin X, West SM, Joshi R, Honig B, Mann RS 2010. Origins of specificity in protein-DNA recognition. Annu. Rev. Biochem. 79:233–69
    [Google Scholar]
  130. 128.
    Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA et al. 2018.3D protein structure from genetic epistasis experiments. bioRxiv 320721. https://doi.org/10.1101/320721
    [Crossref]
  131. 129.
    Rosenberg AB, Patwardhan RP, Shendure J, Seelig G 2015. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163:698–711
    [Google Scholar]
  132. 130.
    Ruan S, Stormo GD 2017. Inherent limitations of probabilistic models for protein-DNA binding specificity. PLOS Comput. Biol. 13:e1005638
    [Google Scholar]
  133. 131.
    Ruan S, Swamidass SJ, Stormo GD 2017. BEESEM: estimation of binding energy models using HT-SELEX data. Bioinformatics 33:2288–95
    [Google Scholar]
  134. 132.
    Rubin AF, Gelman H, Lucas N, Bajjalieh SM, Papenfuss AT et al. 2017. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 18:150
    [Google Scholar]
  135. 133.
    Sailer ZR, Harms MJ 2017. Detecting high-order epistasis in nonlinear genotype-phenotype maps. Genetics 205:1079–88
    [Google Scholar]
  136. 134.
    Salinas VH, Ranganathan R 2018. Coevolution-based inference of amino acid interactions underlying protein function. eLife 7:e34300
    [Google Scholar]
  137. 135.
    Santos-Zavaleta A, Sánchez-Pérez M, Salgado H, Velázquez-Ramírez DA, Gama-Castro S et al. 2018. A unified resource for transcriptional regulation in Escherichia coli K-12 incorporating high-throughput-generated binding data into RegulonDB version 10.0. BMC Biol. 16:91
    [Google Scholar]
  138. 136.
    Sarkisyan KS, Bolotin DA, Meer MV, Usmanova DR, Mishin AS et al. 2016. Local fitness landscape of the green fluorescent protein. Nature 533:397–401
    [Google Scholar]
  139. 137.
    Schmiedel J, Lehner B 2018.Determining protein structures using genetics. bioRxiv 303875. https://doi.org/10.1101/303875
    [Crossref]
  140. 138.
    Schneider TD, Stephens RM 1990. Sequence logos: a new way to display consensus sequences. Nucl. Acids Res. 18:6097–100
    [Google Scholar]
  141. 139.
    Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y et al. 2006. A genomic code for nucleosome positioning. Nature 442:772–78
    [Google Scholar]
  142. 140.
    Shalem O, Sharon E, Lubliner S, Regev I, Lotan-Pompan M et al. 2015. Systematic dissection of the sequence determinants of gene 3′ end mediated expression control. PLOS Genet. 11:e1005147
    [Google Scholar]
  143. 141.
    Sharon E, Chen SAA, Khosla NM, Smith JD, Pritchard JK, Fraser HB 2018.Functional genetic variants revealed by massively parallel precise genome editing. Cell 175:544–57.e16
  144. 142.
    Sharon E, Kalma Y, Sharp A, Raveh-Sadka T, Levo M et al. 2012. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30:521–30
    [Google Scholar]
  145. 143.
    Sharon E, Lubliner S, Segal E 2008. A feature-based approach to modeling protein-DNA interactions. PLOS Comput. Biol. 4:e1000154
    [Google Scholar]
  146. 144.
    Sherman MS, Cohen BA 2012. Thermodynamic state ensemble models of cis-regulation. PLOS Comput. Biol. 8:e1002407
    [Google Scholar]
  147. 145.
    Shine J, Dalgarno L 1975. Determinant of cistron specificity in bacterial ribosomes. Nature 254:34–38
    [Google Scholar]
  148. 146.
    Shrikumar A, Greenside P, Kundaje A 2017.Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, pp. 3145–53. Proc. Mach. Learn. Res. Vol. 70. N.p.: PMLR
  149. 147.
    Siggers T, Duyzend MH, Reddy J, Khan S, Bulyk ML 2011. Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol. 7:555
    [Google Scholar]
  150. 148.
    Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P et al. 2011. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–82
    [Google Scholar]
  151. 149.
    Smith RP, Taher L, Patwardhan RP, Kim MJ, Inoue F et al. 2013. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nat. Genet. 45:1021–28
    [Google Scholar]
  152. 150.
    Spitz F, Furlong EEM 2012. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13:613–26
    [Google Scholar]
  153. 151.
    Staller MV, Holehouse AS, Swain-Lenz D, Das RK, Pappu RV, Cohen BA 2018.A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 6:444–55.e6
  154. 152.
    Starita LM, Ahituv N, Dunham MJ, Kitzman JO, Roth FP et al. 2017. Variant interpretation: functional assays to the rescue. Am. J. Hum. Genet. 101:315–25
    [Google Scholar]
  155. 153.
    Starita LM, Islam MM, Banerjee T, Adamovich AI, Gullingsrud J et al. 2018. A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 103:498–508
    [Google Scholar]
  156. 154.
    Starita LM, Young DL, Islam M, Kitzman JO, Gullingsrud J et al. 2015. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200:413–22
    [Google Scholar]
  157. 155.
    Starr TN, Flynn JM, Mishra P, Bolon DNA, Thornton JW 2018. Pervasive contingency and entrenchment in a billion years of Hsp90 evolution. PNAS 115:4453–58
    [Google Scholar]
  158. 156.
    Starr TN, Picton LK, Thornton JW 2017. Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549:409–13
    [Google Scholar]
  159. 157.
    Starr TN, Thornton JW 2016. Epistasis in protein evolution. Protein Sci. 25:1204–18
    [Google Scholar]
  160. 158.
    Stormo GD 2013. Modeling the specificity of protein-DNA interactions. Quant. Biol. 1:115–30
    [Google Scholar]
  161. 159.
    Stormo GD, Fields D 1998. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem. Sci. 23:109–13
    [Google Scholar]
  162. 160.
    Stormo GD, Zhao Y 2010. Determining the specificity of protein-DNA interactions. Nat. Rev. Genet. 11:751–60
    [Google Scholar]
  163. 161.
    Struhl K, Segal E 2013. Determinants of nucleosome positioning. Nat. Rev. Microbiol. 20:267–73
  164. 162.
    Sun S, Yang F, Tan G, Costanzo M, Oughtred R et al. 2016. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res. 26:670–80
  165. 163.
    Tamuri AU, Goldman N, dos Reis M 2014. A penalized-likelihood method to estimate the distribution of selection coefficients from phylogenetic data. Genetics 197:257–71
  166. 164.
    Tang H, Thomas PD 2016. Tools for predicting the functional impact of nonsynonymous genetic variation. Genetics 203:635–47
  167. 164a.
    Tareen A, Kinney JB 2019. Logomaker: beautiful sequence logos in Python. bioRxiv 635029. https://doi.org/10.1101/635029
  168. 165.
    Thyagarajan B, Bloom JD 2014. The inherent mutational tolerance and antigenic evolvability of influenza hemagglutinin. eLife 3:e03300
  169. 166.
    Tuerk C, Gold L 1990. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249:505–10
  170. 167.
    Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X et al. 2016. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165:1530–45
  171. 168.
    Vvedenskaya IO, Zhang Y, Goldman SR, Valenti A, Visone V et al. 2015. Massively systematic transcript end readout, “MASTER”: transcription start site selection, transcriptional slippage, and transcript yields. Mol. Cell 60:953–65
  172. 169.
    Weile J, Roth FP 2018. Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum. Genet. 137:665–78
  173. 170.
    Weile J, Sun S, Cote AG, Knapp J, Verby M et al. 2017. A framework for exhaustively mapping functional missense variants. Mol. Syst. Biol. 13:957
  174. 171.
    Weingarten-Gabbay S, Nir R, Lubliner S, Sharon E, Kalma Y et al. 2019. Systematic interrogation of human promoters. Genome Res. 29:171–83
  175. 172.
    Weirauch MT, Cote A, Norel R, Annala M, Zhao Y et al. 2013. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 31:126–34
  176. 173.
    Wong MS, Kinney JB, Krainer AR 2018. Quantitative activity profile and context dependence of all human 5′ splice sites. Mol. Cell 71:1012–26.e3
  177. 174.
    Wrenbeck EE, Klesmith JR, Stapleton JA, Adeniran A, Tyo KEJ, Whitehead TA 2016. Plasmid-based one-pot saturation mutagenesis. Nat. Methods 13:928–30
  178. 175.
    Wu NC, Dai L, Olson CA, Lloyd-Smith JO, Sun R 2016. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 5:e16965
  179. 176.
    Wu NC, Olson CA, Sun R 2016. High-throughput identification of protein mutant stability computed from a double mutant fitness landscape. Protein Sci. 25:530–39
  180. 177.
    Wu NC, Young AP, Al-Mawsawi LQ, Olson CA, Feng J et al. 2014. High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution. Sci. Rep. 4:4942
  181. 178.
    Wylie CS, Shakhnovich EI 2011. A biophysical protein folding model accounts for most mutational fitness effects in viruses. PNAS 108:9916–21
  182. 179.
    Xu DJ, Noyes MB 2015. Understanding DNA-binding specificity by bacteria hybrid selection. Brief. Funct. Genom. 14:3–16
  183. 180.
    Yeo G, Burge CB 2004. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11:377–94
  184. 181.
    Zeldovich KB, Chen P, Shakhnovich EI 2007. Protein stability imposes limits on organism complexity and speed of molecular evolution. PNAS 104:16152–57
  185. 182.
    Zhang J, Kinch LN, Cong Q, Weile J, Sun S et al. 2017. Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I. Hum. Mutat. 38:1051–63
  186. 183.
    Zhao Y, Granas D, Stormo GD 2009. Inferring binding energies from selected binding sites. PLOS Comput. Biol. 5:e1000590
  187. 184.
    Zhou J, Troyanskaya OG 2015. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12:931–34
  188. 185.
    Zhou T, Shen N, Yang L, Abe N, Horton J et al. 2015. Quantitative modeling of transcription factor binding specificities using DNA shape. PNAS 112:4654–59
  189. 186.
    Zhou T, Yang L, Lu Y, Dror I, Dantas Machado AC et al. 2013. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucl. Acids Res. 41:W56–62
  190. 187.
    Zuo Z, Stormo GD 2014. High-resolution specificity from DNA sequencing highlights alternative modes of lac repressor binding. Genetics 198:1329–43

  • Article Type: Review Article