We introduce to the JSM community the mathematical and conceptual basics of recent deep learning methods, emphasizing the predictive power of multi-layer neural net models that learn distributed representations. Part 1 introduces neural nets, their supervised training by stochastic gradient descent (SGD) and back-propagation, and their major successes. Part 2 focuses on (non-convex) optimization, regularization, debugging, and interpretation. Part 3 examines unsupervised learning, covering Boltzmann machines, variational autoencoders (VAEs), and particularly generative adversarial networks (GANs). Part 4 looks at applications to computer vision and human language, including object recognition, information retrieval, question answering, and machine translation, introducing convolutional nets, recurrent sequence models (LSTMs), and attention. Throughout, we try to make connections: Which neural net models can be cast as statistical models? Should statisticians now be using deep learning? Is this black box really so easy to use, and, moreover, can it be opened? Where is there room for statisticians to contribute to the understanding and further development of deep learning models?
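As an illustrative sketch of the SGD training loop mentioned in Part 1 (not part of the tutorial itself), the example below fits a one-unit linear model to synthetic data; the gradients are written out by hand, which for this tiny model coincides with what back-propagation would compute. All data and hyperparameters here are invented for illustration.

```python
import numpy as np

# Synthetic data: y = 3*x + 1 plus small noise (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 1.0 + 0.01 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.1       # initial weights and learning rate
for epoch in range(50):
    for i in rng.permutation(len(x)):   # one stochastic pass per epoch
        err = (w * x[i] + b) - y[i]     # forward pass: residual of prediction
        w -= lr * err * x[i]            # dL/dw for squared-error loss
        b -= lr * err                   # dL/db
```

After training, `w` and `b` recover roughly 3 and 1; in a real multi-layer net the same update rule is applied to every weight, with the gradients supplied by back-propagation rather than derived by hand.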