Summer Paper Reading List 2024
This is my summer reading list for 2024.
The list covers the foundational concepts and cutting-edge research in machine learning, neural networks, transformers, and distributed systems.
It is based on my own research interests and recommendations from others.
Pick a few that grab you and really dig in.
Implement the key ideas if you can.
It’s amazing how much clearer things become when you actually build them.
Any suggestions? Send me a DM on Twitter @chrisbbh.
If I find or get recommended a paper that’s relevant and insightful, I’ll add it to the list.
I’ll update this post with my progress as I work through the papers. Each paper will be marked with a colored dot indicating its status: unread (🔵), read (🟢), or abandoned (🔴).
Foundational Computer Science Papers
- Go To Statement Considered Harmful by Edsger W. Dijkstra
- Communicating Sequential Processes by C. A. R. Hoare
- Computing Machinery and Intelligence by Alan Turing
Fundamental Concepts and Techniques in Machine Learning
- Deep Learning Using Rectified Linear Units (ReLU) by Abien Fred M. Agarap
- Layer Normalization by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton
- Adam: A Method for Stochastic Optimization by Diederik P. Kingma and Jimmy Lei Ba
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting by Nitish Srivastava et al.
- Gaussian Error Linear Units (GELUs) by Dan Hendrycks and Kevin Gimpel
- Learning Internal Representations by Error Propagation by David Rumelhart, Geoffrey Hinton, and Ronald Williams
- Greedy Function Approximation: A Gradient Boosting Machine by Jerome H. Friedman
- A Few Useful Things to Know About Machine Learning by Pedro Domingos
- The Unreasonable Effectiveness of Data by Halevy et al.
- XGBoost: A Scalable Tree Boosting System by Tianqi Chen and Carlos Guestrin
- Distributed Representations of Words and Phrases and their Compositionality by Mikolov et al.
- A Cookbook of Self-Supervised Learning by Randall Balestriero et al.
Advanced Neural Network Architectures
- Long Short-Term Memory by Sepp Hochreiter and Jürgen Schmidhuber
- On the Difficulty of Training Recurrent Neural Networks by Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio
- xLSTM: Extended Long Short-Term Memory by Maximilian Beck et al.
- Deep Residual Learning for Image Recognition by Kaiming He et al.
- Understanding LLMs: A Comprehensive Overview from Training to Inference by Yiheng Liu et al.
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention by Kelvin Xu et al.
- ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky et al.
- Generative Adversarial Nets by Goodfellow et al.
Transformer Models and Innovations
- Attention Is All You Need by Vaswani et al.
- BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin et al.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu et al.
- Language Models Are Few-Shot Learners by Tom B. Brown et al.
- Memorizing Transformers by Yuhuai Wu et al.
- Transformers Can Do Arithmetic with the Right Embeddings by Sean McLeish et al.
- An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale by Alexey Dosovitskiy et al.
Retrieval-Augmented Generation (RAG)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis et al.
- RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing by Yucheng Hu and Yuxing Lu
- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge by Yikun Han, Chunjiang Liu, and Pengfei Wang
Distributed Systems and Large-Scale Machine Learning
- The Google File System by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
- Spanner: Google's Globally-Distributed Database by James C. Corbett et al.
- Pathways: Asynchronous Distributed Dataflow for ML by Paul Barham et al.
- Kafka: A Distributed Messaging System for Log Processing by Jay Kreps et al.
- TAO: Facebook's Distributed Data Store for the Social Graph by Nathan Bronson et al.
- Dynamo: Amazon's Highly Available Key-Value Store by Giuseppe DeCandia et al.
- Federated Learning: Strategies for Improving Communication Efficiency by Konečný et al.
- Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web by David Karger et al.
- Scaling Memcache at Facebook by Rajesh Nishtala et al.
- The Chubby Lock Service for Loosely-Coupled Distributed Systems by Mike Burrows
- MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat
- Spark: Cluster Computing with Working Sets by Matei Zaharia et al.
Recent Innovations and Applications
- Highly Accurate Protein Structure Prediction with AlphaFold by John Jumper et al.
- Emergent Autonomous Scientific Research Capabilities of Large Language Models by Daniil A. Boiko et al.
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman et al.
- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models by Hyungjoo Chae et al.
- Generative Agents: Interactive Simulacra of Human Behavior by Joon Sung Park et al.
- Student of Games: A unified learning algorithm for both perfect and imperfect information games by Martin Schmid et al.
- KAN: Kolmogorov-Arnold Networks by Ziming Liu et al.
Other interesting papers
- Medallion Fund: The Ultimate Counterexample? by Bradford Cornell
- Bitcoin: A Peer-to-Peer Electronic Cash System by Satoshi Nakamoto
Ilya 30u30 papers
This list is inspired by a story involving Ilya Sutskever and John Carmack. As Carmack recounts:
“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”
It’s important to note that the papers listed below are not confirmed to be the exact ones Sutskever recommended. This story comes from an interview with Carmack, but the specific papers remain unverified.
It’s an interesting collection, if true.
The kind of thing that makes you wonder what you’d put on your own list if you had to distill an entire field down to its essence.
What would be your “90% of what matters” in your area of expertise?
Of course, even if this is the real list, it’s a snapshot in time.
In a field moving as rapidly as AI, today’s 90% might be tomorrow’s 50%.
But there’s value in understanding the foundations, the papers that shaped the current landscape.
Source of list: Ilya 30u30.
- The Annotated Transformer by Alexander Rush, Austin Huang, Suraj Subramanian, Jonathan Sum, Khalid Almubarak, Stella Biderman
- The First Law of Complexodynamics by Scott Aaronson
- The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
- Understanding LSTM Networks by Christopher Olah
- Recurrent Neural Network Regularization by Wojciech Zaremba et al.
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights by Geoffrey E. Hinton, Drew van Camp
- Pointer Networks by Oriol Vinyals et al.
- ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky et al.
- Order Matters: Sequence to sequence for sets by Oriol Vinyals et al.
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism by Yanping Huang et al.
- Deep Residual Learning for Image Recognition by Kaiming He et al.
- Multi-Scale Context Aggregation by Dilated Convolutions by Fisher Yu, Vladlen Koltun
- Neural Message Passing for Quantum Chemistry by Justin Gilmer et al.
- Attention Is All You Need by Ashish Vaswani et al.
- Neural Machine Translation by Jointly Learning to Align and Translate by Dzmitry Bahdanau et al.
- Identity Mappings in Deep Residual Networks by Kaiming He et al.
- A Simple Neural Network Module for Relational Reasoning by Adam Santoro et al.
- Variational Lossy Autoencoder by Xi Chen et al.
- Relational Recurrent Neural Networks by Adam Santoro et al.
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton by Scott Aaronson, Sean M. Carroll, Lauren Ouellette
- Neural Turing Machines by Alex Graves et al.
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin by Dario Amodei et al.
- Scaling Laws for Neural Language Models by Jared Kaplan et al.
- A Tutorial Introduction to the Minimum Description Length Principle by Peter D. Grünwald
- Machine Super Intelligence by Shane Legg
- Kolmogorov Complexity (from page 434) by A. Shen, V. A. Uspensky, N. Vereshchagin
- CS231n Convolutional Neural Networks for Visual Recognition by Stanford University