
Dive into Deep Learning


Testing the potential of deep learning presents unique challenges because any single application brings together various disciplines. Applying deep learning requires simultaneously understanding (i) the motivations for casting a problem in a particular way; (ii) the mathematical form of a given model; (iii) the optimization algorithms for fitting the models to data; (iv) the statistical principles that tell us when we should expect our models to generalize to unseen data and practical methods for certifying that they have, in fact, generalized; and (v) the engineering techniques required to train models efficiently, navigating the pitfalls of numerical computing and getting the most out of available hardware. Teaching the critical thinking skills required to formulate problems, the mathematics to solve them, and the software tools to implement those solutions all in one place presents formidable challenges. Our goal in this book is to present a unified resource to bring would-be practitioners up to speed.

When we started this book project, there were no resources that simultaneously (i) remained up to date; (ii) covered the breadth of modern machine learning practices with sufficient technical depth; and (iii) interleaved exposition of the quality one expects of a textbook with the clean runnable code that one expects of a hands-on tutorial. We found plenty of code examples illustrating how to use a given deep learning framework (e.g., how to do basic numerical computing with matrices in TensorFlow) or how to implement particular techniques (e.g., code snippets for LeNet, AlexNet, ResNet, etc.) scattered across various blog posts and GitHub repositories. However, these examples typically focused on how to implement a given approach, leaving out the discussion of why certain algorithmic decisions were made. While interactive resources have popped up sporadically to address particular topics, e.g., the engaging blog posts published on the website Distill, or personal blogs, they covered only selected topics in deep learning and often lacked associated code. On the other hand, while several deep learning textbooks have emerged—e.g., Goodfellow et al. (2016), which offers a comprehensive survey on the basics of deep learning—these resources do not marry the descriptions to realizations of the concepts in code, sometimes leaving readers clueless as to how to implement them. Moreover, too many resources are hidden behind the paywalls of commercial course providers.
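To make the contrast concrete, the "basic numerical computing with matrices" mentioned above amounts to snippets like the following. This is a minimal sketch using NumPy for illustration (the deep learning frameworks covered in this book, such as TensorFlow, expose a very similar array API); the specific matrices are arbitrary examples, not taken from the book.

```python
# Basic numerical computing with matrices, sketched with NumPy.
import numpy as np

A = np.arange(6).reshape(2, 3)   # 2x3 matrix [[0, 1, 2], [3, 4, 5]]
B = np.ones((3, 2))              # 3x2 matrix of ones
C = A @ B                        # matrix product, shape (2, 2)
row_sums = A.sum(axis=1)         # sum along each row of A -> [3, 12]
```

Snippets of exactly this sort are easy to find online; what they usually omit, and what this book aims to supply, is the surrounding explanation of why a model is cast in terms of such operations in the first place.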

  1. Introduction
  2. Preliminaries
  3. Linear Neural Networks for Regression
  4. Linear Neural Networks for Classification
  5. Multilayer Perceptrons
  6. Builders’ Guide
  7. Convolutional Neural Networks
  8. Modern Convolutional Neural Networks
  9. Recurrent Neural Networks
  10. Modern Recurrent Neural Networks
  11. Attention Mechanisms and Transformers
  12. Optimization Algorithms
  13. Computational Performance
  14. Computer Vision
  15. Natural Language Processing: Pretraining
  16. Natural Language Processing: Applications
  17. Reinforcement Learning
  18. Gaussian Processes
  19. Hyperparameter Optimization
  20. Generative Adversarial Networks
  21. Recommender Systems
  22. Appendix: Mathematics for Deep Learning
  23. Appendix: Tools for Deep Learning
Aston Zhang
Zachary C. Lipton
Mu Li
Alex J. Smola