Grokking (machine learning)

In machine learning, grokking, or delayed generalization, is a transition to generalization that occurs many training iterations after the interpolation threshold (the point at which the model fits its training data), following a long period of seemingly little progress. This contrasts with the usual process, in which generalization improves slowly and progressively once the interpolation threshold has been reached.^[1]^[2]^[3]
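One way to make the definition concrete is to measure the gap between the step at which a model fits its training set and the step at which it generalizes to held-out data. The following is a minimal sketch, not a method taken from the cited papers. It assumes that accuracy is logged at regular intervals and that reaching a fixed accuracy threshold is an acceptable proxy for both events. The function names, the 0.99 threshold, and the curves are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def first_step_reaching(accuracies: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first logged step whose accuracy meets the threshold."""
    for step, accuracy in enumerate(accuracies):
        if accuracy >= threshold:
            return step
    return None


def generalization_delay(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.99,  # assumed cutoff, not a standard value
) -> Optional[int]:
    """Logged steps between fitting the training set and generalizing.

    Returns None if either curve never reaches the threshold.
    """
    fit_step = first_step_reaching(train_acc, threshold)
    gen_step = first_step_reaching(val_acc, threshold)
    if fit_step is None or gen_step is None:
        return None
    return gen_step - fit_step


# Synthetic curves, made up for illustration (not experimental data):
# training accuracy saturates early, validation accuracy jumps much later.
train_curve = [0.2, 0.6, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
val_curve = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.9, 1.0]

delay = generalization_delay(train_curve, val_curve)
print(delay)
```

A large delay relative to the number of steps needed to fit the training set is the signature described above; in the usual case the two curves rise together and the delay is small.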

The term derives from the word grok, coined by Robert A. Heinlein in his 1961 novel Stranger in a Strange Land.

Grokking can be understood as a phase transition during the training process.^[4] Although it has been thought of as largely a phenomenon of relatively shallow models, it has also been observed in deep neural networks and in non-neural models, and it is the subject of active research.^[5]^[6]^[7]^[8]

[1] Pearce, Adam; Ghandeharioun, Asma; Hussein, Nada; Thain, Nithum; Wattenberg, Martin; Dixon, Lucas. "Do Machine Learning Models Memorize or Generalize?". pair.withgoogle.com. Retrieved 2024-06-04.

[2] Power, Alethea; Burda, Yuri; Edwards, Harri; Babuschkin, Igor; Misra, Vedant (2022-01-06). "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets". arXiv:2201.02177 [cs.LG].

[3] Minegishi, Gouki; Iwasawa, Yusuke; Matsuo, Yutaka (2024-05-09). "Bridging Lottery ticket and Grokking: Is Weight Norm Sufficient to Explain Delayed Generalization?". arXiv:2310.19470 [cs.LG].

[4] Liu, Ziming; Kitouni, Ouail; Nolte, Niklas; Michaud, Eric J.; Tegmark, Max; Williams, Mike (2022). "Towards Understanding Grokking: An Effective Theory of Representation Learning". In Koyejo, Sanmi; Mohamed, S.; Agarwal, A.; Belgrave, Danielle; Cho, K.; Oh, A. (eds.). Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 – December 9, 2022. arXiv:2205.10343.

[5] Fan, Simin; Pascanu, Razvan; Jaggi, Martin (2024-05-29). "Deep Grokking: Would Deep Neural Networks Generalize Better?". arXiv:2405.19454 [cs.LG].

[6] Miller, Jack; O'Neill, Charles; Bui, Thang (2024-03-31). "Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity". arXiv:2310.17247 [cs.LG].

[7] Liu, Ziming; Michaud, Eric J.; Tegmark, Max (2023). "Omnigrok: Grokking Beyond Algorithmic Data". The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1–5, 2023. OpenReview.net. arXiv:2210.01117.

[8] Samothrakis, Spyridon; Matran-Fernandez, Ana; Abdullahi, Umar I.; Fairbank, Michael; Fasli, Maria (2022). "Grokking-like effects in counterfactual inference". International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18–23, 2022. IEEE. pp. 1–8. doi:10.1109/IJCNN55064.2022.9891910. ISBN 978-1-7281-8671-9.