Stephen Wolfram's "What Is ChatGPT Doing … and Why Does It Work?" gives an example where certain small neural nets cannot learn a function.
We'll examine how larger trained networks approximate the function, then, by handcrafting weights ourselves, show that it can be approximated arbitrarily well with fewer neurons than expected.
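The handcrafting idea can be sketched in a few lines. This is a generic piecewise-linear construction, not Wolfram's actual example or the book's target function: each ReLU unit contributes one slope change, so a one-hidden-layer net with hand-set knots and slopes traces any piecewise-linear curve exactly. Here three neurons reproduce a triangular bump.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hand-set a one-hidden-layer ReLU net: f_hat(x) = sum_i a_i * relu(x - k_i).
# Each unit adds a slope change a_i at knot k_i, so the net traces any
# piecewise-linear curve. Knots and slopes below are illustrative choices:
# they build a triangular "bump" on [0, 2] from exactly 3 neurons.
knots = np.array([0.0, 1.0, 2.0])    # hidden-layer biases (negated)
slopes = np.array([1.0, -2.0, 1.0])  # output-layer weights (slope changes)

def net(x):
    # Broadcast to shape (neurons, samples), then sum over neurons.
    return slopes @ relu(x[None, :] - knots[:, None])

x = np.linspace(-1.0, 3.0, 9)
target = np.maximum(0.0, 1.0 - np.abs(x - 1.0))  # the triangular bump
assert np.allclose(net(x), target)  # exact match: 3 neurons suffice
```

Stacking more knots refines the approximation of any continuous target, which is the intuition behind approximating a function arbitrarily well with surprisingly few hand-placed neurons.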
This will be part of my in-progress book "LLM Foundations" (working title). A YouTube video will also be produced, but likely after the jam's conclusion.