In a post from last year, On the Surprising Parameter Efficiency of Vision Models, I discussed a question which had been puzzling me at the time: image models appear to reach or exceed human parity with significantly fewer parameters than the brain seemingly uses. The same holds for audio models and many other modalities where humans compete with AI systems. I contrasted this with large language models, which need hundreds of billions of parameters to get close to human parity, and wondered why this might be. Almost a year later, the field has advanced tremendously and we are closer to a definitive answer: there is no real paradox. At the time we were simply exceptionally bad at training small language models, and we have since improved dramatically.

Now (only one year later!), we have language models at 7B parameters and below which appear able to significantly outperform the giants of 2022 and early 2023 such as GPT-3. While this does not completely close the gap with image and audio models, we have shrunk LLMs by 1-2 orders of magnitude in essentially one year. This is an incredible pace of progress and demonstrates how inefficient previous pretraining methods were. The primary reason seems to be the realization, thanks to Chinchilla, that the existing massive models were wildly undertrained, and that in fact we should not merely match the Chinchilla-optimal token count but far exceed it to obtain very potent small models which can be efficiently inferenced. There have also likely been significant improvements in dataset curation, pretraining methods, and curriculum learning among leading labs, which have enabled much better models to be trained at a given parameter count than were possible one or two years ago, although the labs are clearly unwilling to share the specifics.
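To make the Chinchilla point concrete, here is a minimal back-of-the-envelope sketch. It assumes the commonly cited ~20 training tokens per parameter heuristic from the Chinchilla paper (Hoffmann et al., 2022); the 2T-token figure for a modern 7B model is purely illustrative rather than taken from any particular lab's training report.

```python
# Rough sketch of the "undertrained vs overtrained" point above.
# Assumes the commonly cited ~20 tokens-per-parameter Chinchilla heuristic;
# the modern 7B training-token figure is illustrative, not a reported number.

def chinchilla_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a model size."""
    return params * tokens_per_param

def overtraining_factor(params: float, actual_tokens: float) -> float:
    """How far past (or short of) the Chinchilla-optimal token count a run went."""
    return actual_tokens / chinchilla_optimal_tokens(params)

# GPT-3: ~175B parameters trained on ~300B tokens (publicly reported figures).
print(f"GPT-3: {overtraining_factor(175e9, 300e9):.2f}x optimal")   # ~0.09x, i.e. wildly undertrained

# Hypothetical modern 7B model trained on ~2T tokens.
print(f"7B model: {overtraining_factor(7e9, 2e12):.1f}x optimal")   # ~14x past Chinchilla-optimal
```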

There is still a gap between the parameter counts of human-level vision and language models, but it is now perhaps 1 order of magnitude rather than 2 or 3. For vision and audio, human level seems achievable in the 1-10B parameter range. For LLMs, today's 70B SoTA models seem to subsume the majority of the human capability distribution, and they will only continue to improve. To me, this suggests that the brain is actually deeply inefficient in its parameter count compared to strong existing neural network architectures, whether because of insufficient and poorly curated data (a huge factor), asymptotically worse architectures due to biological constraints, or worse learning algorithms. The human brain generally seems to get worse bang for its buck than existing ML systems, at least for the sensory systems. Humans still dominate at long-term reasoning and planning, at maintaining coherent agency, and at continual online learning, thanks to episodic memory and a prefrontal cortex, while existing LLMs and other systems largely lack specialized systems for these capabilities; this is now the primary bottleneck to AGI. Some of this inefficiency may be specific to humanity: other, much smaller species achieve highly efficient visual and auditory capabilities with a fraction of the neurons and parameters that humans use, though the reasons for this are unclear to me.
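As a rough illustration of the scales involved, consider the sketch below. The biological figures are order-of-magnitude estimates only (roughly 10^14 cortical synapses, with visual areas taken as very roughly a quarter of cortex), and equating one synapse with one parameter is itself a crude assumption, so treat the outputs as scale indicators rather than measurements.

```python
# Order-of-magnitude comparison of model parameters vs brain "parameters".
# Biological figures are rough literature estimates; one synapse = one
# parameter is a crude simplifying assumption.
import math

# Model-side figures from the text above.
vision_model_params = 5e9     # human-level vision/audio models: 1-10B range
llm_params          = 70e9    # today's SoTA LLM scale

print(f"vision vs language model gap: ~{math.log10(llm_params / vision_model_params):.1f} OOMs")  # ~1 OOM

# Brain-side figures: rough order-of-magnitude estimates only.
cortical_synapses   = 1e14    # commonly cited estimate for the human neocortex
visual_fraction     = 0.25    # very rough share of cortex devoted to vision
brain_vision_params = cortical_synapses * visual_fraction

print(f"brain vision vs vision model: ~{math.log10(brain_vision_params / vision_model_params):.1f} OOMs")  # ~3-4 OOMs
```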

Ultimately, this points to a world where AI systems become significantly more efficient than biological human brains and thus have the potential to quickly render the majority, or all, of the human population economically obsolete. However, alignment and control of these AI systems currently looks relatively tractable, given the degree of control we can exert over their internal workings via backprop as well as the natural linearity of their representations. Such a world need not lead to the immediate extinction of humanity; however, it is vital to ensure that the gains and power arising from vast AI automation of society are shared at least reasonably equitably, so that the resulting equilibrium is one of many agents with slack to pursue their own desires rather than a collapse into a small oligarchic elite.