Summer Paper Reading List 2024
This is my summer reading list for 2024.
It covers foundational concepts and cutting-edge research in machine learning, neural networks, transformers, and distributed systems, drawn from my own research interests and recommendations from others.
Pick a few that grab you and really dig in.
Implement the key ideas if you can.
It’s amazing how much clearer things become when you actually build them.
Any suggestions? Send me a DM on Twitter @chrisbbh.
If I find (or someone recommends) a relevant, insightful paper, I'll add it to the list.
I’ll update this post with my progress as I work through the papers. Each paper will be marked with a colored dot indicating its status: unread (🔵), read (🟢), or abandoned (🔴).
Foundational Computer Science Papers
- Go To Statement Considered Harmful by Edsger W. Dijkstra
- Communicating Sequential Processes by C. A. R. Hoare
- Computing Machinery and Intelligence by Alan Turing
Fundamental Concepts and Techniques in Machine Learning
- Deep Learning Using Rectified Linear Units (ReLU) by Abien Fred M. Agarap
- Layer Normalization by Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton
- Adam: A Method for Stochastic Optimization by Diederik P. Kingma and Jimmy Lei Ba (see the sketch after this list)
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting by Nitish Srivastava et al.
- Gaussian Error Linear Units (GELUs) by Dan Hendrycks and Kevin Gimpel
- Learning Internal Representations by Error Propagation by David Rumelhart, Geoffrey Hinton, and Ronald Williams
- Greedy Function Approximation: A Gradient Boosting Machine by Jerome H. Friedman
- A Few Useful Things to Know About Machine Learning by Pedro Domingos
- The Unreasonable Effectiveness of Data by Halevy et al.
- XGBoost: A Scalable Tree Boosting System by Tianqi Chen and Carlos Guestrin
- Distributed Representations of Words and Phrases and their Compositionality by Mikolov et al.
- A Cookbook of Self-Supervised Learning by Randall Balestriero et al.
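Several of these ideas fit in a dozen lines of code, which is exactly why they're worth implementing. As a taste, here's a minimal NumPy sketch of one Adam update, following Algorithm 1 of Kingma and Ba; the variable names and the toy driver are mine, not the paper's.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: m and v start at zero, so early estimates are deflated
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step, scaled by the corrected second-moment estimate
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy run: minimize f(theta) = theta^2 starting from theta = 5
theta, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to 0
```

Reading the paper with this in hand makes the bias-correction terms much less mysterious.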
Advanced Neural Network Architectures
- Long Short-Term Memory by Sepp Hochreiter and Jürgen Schmidhuber (see the sketch after this list)
- On the Difficulty of Training Recurrent Neural Networks by Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio
- xLSTM: Extended Long Short-Term Memory by Maximilian Beck et al.
- Deep Residual Learning for Image Recognition by Kaiming He et al.
- Understanding LLMs: A Comprehensive Overview from Training to Inference by Yiheng Liu et al.
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention by Kelvin Xu et al.
- ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky et al.
- Generative Adversarial Nets by Goodfellow et al.
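A single LSTM cell is small enough to write out by hand, and doing so makes the gating story click. Here's a minimal NumPy sketch of one forward step in the now-standard formulation (the forget gate was actually a later addition by Gers et al.); the fused weight matrix and variable names are my choices, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # One LSTM step. W has shape (4H, D + H); b has shape (4H,).
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])          # input gate: how much new info to write
    f = sigmoid(z[H:2*H])       # forget gate: how much old memory to keep
    o = sigmoid(z[2*H:3*H])     # output gate: how much memory to expose
    g = np.tanh(z[3*H:])        # candidate values to write
    c = f * c_prev + i * g      # update the cell state
    h = o * np.tanh(c)          # new hidden state
    return h, c

# Toy step: input dim D = 3, hidden dim H = 4
rng = np.random.default_rng(0)
D, H = 3, 4
W, b = rng.normal(size=(4 * H, D + H)), np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The additive cell-state update `c = f * c_prev + i * g` is the whole trick: it gives gradients a path that doesn't vanish the way plain RNN recurrences do.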
Transformer Models and Innovations
- Attention Is All You Need by Vaswani et al. (see the sketch after this list)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin et al.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu et al.
- Language Models Are Few-Shot Learners by Tom B. Brown et al.
- Memorizing Transformers by Yuhuai Wu et al.
- Transformers Can Do Arithmetic with the Right Embeddings by Sean McLeish et al.
- An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy et al.
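The core equation of "Attention Is All You Need", Attention(Q, K, V) = softmax(QKᵀ / √d_k)V, is only a few lines of NumPy. A minimal single-head sketch; the shapes and toy example are mine:

```python
import numpy as np

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each query's weights sum to 1
    return weights @ V                              # weighted average of values

# Toy example: 3 queries attending over 4 key/value pairs, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (3, 8)
```

Multi-head attention is just this function run h times over learned linear projections of Q, K, and V, with the outputs concatenated.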
Retrieval-Augmented Generation (RAG)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis et al. (see the sketch after this list)
- RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing by Yucheng Hu and Yuxing Lu
- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge by Yikun Han, Chunjiang Liu, and Pengfei Wang
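The retrieval half of RAG is conceptually just nearest-neighbor search over embeddings, followed by stuffing the hits into the prompt. A minimal sketch of that loop; the `embed` function here is a toy stand-in for a real embedding model, and all names are mine:

```python
import numpy as np

def embed(text):
    # Toy stand-in for a real embedding model: hash words into a fixed-size vector.
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def retrieve(query, docs, k=2):
    # Cosine similarity reduces to a dot product on unit-normalized vectors.
    doc_vecs = np.stack([embed(d) for d in docs])
    sims = doc_vecs @ embed(query)
    return [docs[i] for i in np.argsort(-sims)[:k]]

docs = ["the cat sat on the mat", "transformers use attention", "rust is memory safe"]
context = retrieve("how does attention work", docs)
prompt = "Context:\n" + "\n".join(context) + "\nQuestion: how does attention work?"
print(prompt)  # this augmented prompt is what gets handed to the generator model
```

Vector databases (the subject of the survey above) exist to make that `retrieve` step fast at the scale of millions of documents.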
Distributed Systems and Large-Scale Machine Learning
- The Google File System by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
- Spanner: Google's Globally-Distributed Database by James C. Corbett et al.
- Pathways: Asynchronous Distributed Dataflow for ML by Paul Barham et al.
- Kafka: A Distributed Messaging System for Log Processing by Jay Kreps et al.
- TAO: Facebook's Distributed Data Store for the Social Graph by Nathan Bronson et al.
- Dynamo: Amazon's Highly Available Key-Value Store by Giuseppe DeCandia et al.
- Federated Learning: Strategies for Improving Communication Efficiency by Konečný et al.
- Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web by David Karger et al. (see the sketch after this list)
- Scaling Memcache at Facebook by Rajesh Nishtala et al.
- The Chubby Lock Service for Loosely-Coupled Distributed Systems by Mike Burrows
- MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat
- Spark: Cluster Computing with Working Sets by Matei Zaharia et al.
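Consistent hashing is one of those ideas that looks obvious only after you've built it. Here's a minimal hash ring with virtual nodes in Python; the class name and parameters are mine, and real systems like Dynamo and Memcache layer replication and failure handling on top of this.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring (Karger et al.) with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._keys = []   # sorted hash positions on the ring
        self._ring = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node appears at many ring positions to smooth the load.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._ring[h] = node

    def lookup(self, key):
        # A key maps to the first node position clockwise from its hash.
        h = self._hash(key)
        i = bisect.bisect(self._keys, h) % len(self._keys)
        return self._ring[self._keys[i]]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("user:42"))
```

The payoff: adding or removing a node remaps only about 1/N of the keys, instead of nearly all of them as naive `hash(key) % N` sharding would.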
Recent Innovations and Applications
- Highly Accurate Protein Structure Prediction with AlphaFold by John Jumper et al.
- Emergent Autonomous Scientific Research Capabilities of Large Language Models by Daniil A. Boiko et al.
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking by Eric Zelikman et al.
- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models by Hyungjoo Chae et al.
- Generative Agents: Interactive Simulacra of Human Behavior by Joon Sung Park et al.
- Student of Games: A unified learning algorithm for both perfect and imperfect information games by Martin Schmid et al.
- KAN: Kolmogorov-Arnold Networks by Ziming Liu et al.
Other Interesting Papers
- Medallion Fund: The Ultimate Counterexample? by Bradford Cornell
- Bitcoin: A Peer-to-Peer Electronic Cash System by Satoshi Nakamoto
Ilya 30u30 Papers
This list is inspired by a story involving Ilya Sutskever and John Carmack. As Carmack recounts:
So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.
It’s important to note that the papers listed below are not confirmed to be the exact ones Sutskever recommended. This story comes from an interview with Carmack, but the specific papers remain unverified.
It's an interesting collection, if true: the kind of thing that makes you wonder what you'd put on your own list if you had to distill an entire field down to its essence.
What would be your “90% of what matters” in your area of expertise?
Of course, even if this is the real list, it’s a snapshot in time.
In a field moving as rapidly as AI, today’s 90% might be tomorrow’s 50%.
But there’s value in understanding the foundations, the papers that shaped the current landscape.
Source of list: Ilya 30u30.
- The Annotated Transformer by Alexander Rush, Austin Huang, Suraj Subramanian, Jonathan Sum, Khalid Almubarak, and Stella Biderman
- The First Law of Complexodynamics by Scott Aaronson
- The Unreasonable Effectiveness of RNNs by Andrej Karpathy
- Understanding LSTM Networks by Christopher Olah
- Recurrent Neural Network Regularization by Wojciech Zaremba et al.
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights by Geoffrey E. Hinton and Drew van Camp
- Pointer Networks by Oriol Vinyals et al.
- ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky et al.
- Order Matters: Sequence to sequence for sets by Oriol Vinyals et al.
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism by Yanping Huang et al.
- Deep Residual Learning for Image Recognition by Kaiming He et al.
- Multi-Scale Context Aggregation by Dilated Convolutions by Fisher Yu and Vladlen Koltun
- Neural Message Passing for Quantum Chemistry by Justin Gilmer et al.
- Attention Is All You Need by Ashish Vaswani et al.
- Neural Machine Translation by Jointly Learning to Align and Translate by Dzmitry Bahdanau et al.
- Identity Mappings in Deep Residual Networks by Kaiming He et al.
- A Simple Neural Network Module for Relational Reasoning by Adam Santoro et al.
- Variational Lossy Autoencoder by Xi Chen et al.
- Relational Recurrent Neural Networks by Adam Santoro et al.
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton by Scott Aaronson, Sean M. Carroll, and Lauren Ouellette
- Neural Turing Machines by Alex Graves et al.
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin by Dario Amodei et al.
- Scaling Laws for Neural Language Models by Jared Kaplan et al.
- A Tutorial Introduction to the Minimum Description Length Principle by Peter D. Grünwald
- Machine Super Intelligence by Shane Legg
- Kolmogorov Complexity and Algorithmic Randomness (from page 434) by A. Shen, V. A. Uspensky, and N. Vereshchagin
- CS231n: Convolutional Neural Networks for Visual Recognition by Stanford University