Abstract

Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge on the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has made the quality of the human reference genome increasingly important, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. In addition, new sequencing-based assays are shedding light on previously unknown functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.

/content/journals/10.1146/annurev-genom-121119-083418
2020-08-31
2024-04-18
  • Article Type: Review Article