Abstract

This article reviews recent advances and developments in neural networks, recalling relevant earlier work in the field where needed for context. We emphasize Bayesian approaches and their benefits relative to more standard maximum likelihood treatments, and we present several representative experiments using a variety of modern neural architectures.

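As a concrete illustration of the Bayesian-versus-maximum-likelihood contrast highlighted above, the following minimal Python sketch fits a small one-hidden-layer network to toy regression data, first by maximum likelihood (a single point estimate of the weights) and then with an approximate Bayesian treatment using stochastic gradient Langevin dynamics, whose weight samples yield a predictive mean together with an uncertainty estimate. This is an illustrative sketch under assumed settings (toy data, a Gaussian prior and noise model, hand-picked step sizes), not a reproduction of the experiments reported in the article.

```python
# A minimal, self-contained sketch (not taken from the article): the same small
# one-hidden-layer network fitted by maximum likelihood and, for comparison, given
# an approximate Bayesian treatment via stochastic gradient Langevin dynamics (SGLD).
# All data, hyperparameters, and settings below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: noisy sine curve.
x = np.linspace(-3.0, 3.0, 60).reshape(-1, 1)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)

H = 20             # hidden units
sigma2 = 0.1 ** 2  # assumed observation-noise variance
tau2 = 1.0         # Gaussian prior variance on every weight (Bayesian treatment only)

def init_params():
    return {"W1": 0.5 * rng.standard_normal((1, H)), "b1": np.zeros(H),
            "W2": 0.5 * rng.standard_normal((H, 1)), "b2": np.zeros(1)}

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"], h

def grad_sse(p, x, y):
    """Gradients of 0.5 * sum((f(x) - y)^2) with respect to each parameter."""
    yhat, h = forward(p, x)
    d = yhat - y
    dpre = (d @ p["W2"].T) * (1.0 - h ** 2)
    return {"W1": x.T @ dpre, "b1": dpre.sum(axis=0),
            "W2": h.T @ d, "b2": d.sum(axis=0)}

# Maximum likelihood: with Gaussian noise, minimizing the squared error by plain
# gradient descent is equivalent to maximizing the likelihood; the result is a
# single point estimate of the weights.
p_ml = init_params()
for _ in range(5000):
    g = grad_sse(p_ml, x, y)
    for k in p_ml:
        p_ml[k] -= 1e-3 * g[k]

# Bayesian treatment: SGLD follows the gradient of the log posterior (likelihood
# plus Gaussian prior) while injecting Gaussian noise, so the iterates become
# approximate posterior samples. Starting the chain at the ML fit shortens burn-in.
p = {k: v.copy() for k, v in p_ml.items()}
eps = 1e-6
samples = []
for t in range(10000):
    g = grad_sse(p, x, y)
    for k in p:
        grad_log_post = -g[k] / sigma2 - p[k] / tau2
        p[k] += 0.5 * eps * grad_log_post + np.sqrt(eps) * rng.standard_normal(p[k].shape)
    if t >= 2000 and t % 50 == 0:  # thin after burn-in
        samples.append({k: v.copy() for k, v in p.items()})

# The ML model returns one number; the posterior samples give a predictive
# distribution, i.e. a mean together with an uncertainty estimate.
x_test = np.array([[0.5]])
point = forward(p_ml, x_test)[0].item()
preds = np.array([forward(s, x_test)[0].item() for s in samples])
print(f"ML point prediction       : {point:.3f}")
print(f"Posterior predictive mean : {preds.mean():.3f} +/- {preds.std():.3f}")
```

Initializing the Langevin chain at the maximum likelihood solution is only a convenience to shorten burn-in; other standard Bayesian approximations (variational inference, Monte Carlo dropout, deep ensembles) could play the same role of turning the single point estimate into a predictive distribution.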