
Abstract

The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation.

/content/journals/10.1146/annurev-biodatasci-122220-112147
2022-08-10
2024-06-21
  • Article Type: Review Article