As always, 2024 has been an interesting year, marked by extremely rapid and impressive AI progress. Every year since 2020 has felt like a rollercoaster of AI surging past our expectations, making you think there is no way it can possibly go any faster, and then the next year it is always dramatically faster still. I sense that in 2025 we will still be on the exponential and things will be even crazier than we can imagine today. Definitely we are still very early in the exponential of the ‘slow takeoff’, although it feels anything but slow. Nevertheless, we are still likely a number of years from full economic replacement, or even a dramatic effect of AI on economic growth, so there are many more doublings that lie in our future 1.
At the same time, it is important to realize that we are still very early in the AI adoption curve. Most of the advances come from only a few companies and, apart from coding and helping with homework, I feel like AI has not dramatically impacted the world economy yet. This will undoubtedly change over the next few years; currently there is a massive overhang between theoretical AI capabilities and what we actually use these systems for. In many domains there are still important gaps between benchmark performance and ‘real world’ utility which will take longer than people expect to close. But eventually these must and will close. Even if AI progress froze today, there would still be massive changes and disruptions in store as we worked through all the ramifications and applications of today’s AI technology.
In terms of general AI trends, this year there have been two significant frontiers. First, the incredibly rapid miniaturization and improvement in small models and model cost efficiency. The performance of models of a fixed parameter count has increased drastically this year and last due to a variety of factors, such as significantly longer training and much stronger data filtering, cleaning, and quality classification pipelines. This rapid densification of AI performance will have long-term ramifications, rendering AIs vastly more accessible and ubiquitous than previously thought possible and dramatically lowering the cost of running and even training these models. The recent Deepseek v3 announcement shows that there has been at least a 10x drop in the capital required to train frontier models, from the hundreds of millions of last year to the tens of millions today. Perhaps this will decrease below a million in future years, especially aided by the rapid growth of flops per dollar in AI training GPU clusters. In the next few years this will make frontier-class models of today nearly free and essentially ubiquitous. If we believe that AGI only requires 1-2 OOMs of additional compute, this also puts a price on AGI of less than a billion dollars today, which is incredible to think about, and given the rate of decrease it will likely be possible to train full AGIs for only tens of millions to single-digit millions of dollars by 2030, if not less. It is clear that unless there is incredibly rapid self-improvement, things will not end up strongly unipolar, and it would be extremely challenging even to enforce regulation restricting AGI creation if the cost is under ten million dollars in compute. Additionally, it seems plausible to me that a human-level AGI is already possible with existing frontier LLMs given sufficient scaffolding, augmentation, and further training; if so, the cost to produce AGI may be substantially less than this, and is certainly bottlenecked today by ideas rather than capital. We certainly look today to be in a very different world than the classic view of a year ago, in which the exponentially accelerating capital requirements of frontier models essentially lock in the leading companies as eternal monopolies on AGI.
The second trend has been the major advances in reasoning models, and again things have gone incredibly quickly. Only four months have passed between the era of vague strawberry posting and multiple open-source replications of o1 and the incredible math, code, and ARC scores of o3. Given these timeframes, it seems likely that the secret behind these models is not too hard to crack and is just a very well executed strategy of synthetic data generation using process and reward verifiers, likely using fairly standard RL training methods, and that once implemented and tested this pipeline can be scaled quickly and efficiently. This continues the shift towards synthetic data for model training, which, if done correctly, can have strongly beneficial properties for alignment. This is somewhere I also expect rapid open-source progress and replications of the frontier reasoning models in the coming year. Nevertheless, it is clear that synthetic data + RL can work well for complex but verifiable domains like math and coding, so again the frontier of tractable problems for AI has been expanded, and there will be a lot of work to do in applying these techniques to the many problem settings where they apply.
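To make the shape of such a pipeline concrete, here is a minimal sketch of verifier-filtered synthetic data generation (essentially rejection sampling against a ground-truth checker) for a verifiable domain like math. This is my own illustrative guess at the basic loop, not anyone's actual recipe; `generate_candidates` and `verify` are hypothetical placeholders, and real pipelines presumably layer process reward models, deduplication, and RL on top of something like this.

```python
import random
from typing import Callable

def build_synthetic_dataset(
    problems: list[dict],                                   # each: {"question": ..., "answer": ...}
    generate_candidates: Callable[[str, int], list[str]],   # hypothetical model-sampling function
    verify: Callable[[str, str], bool],                     # hypothetical checker against the known answer
    samples_per_problem: int = 16,
) -> list[dict]:
    """Sample many chain-of-thought solutions per problem and keep only verified ones.

    The surviving (question, solution) pairs can then be used for supervised
    fine-tuning, or as positive examples/rewards for RL-style training.
    """
    dataset = []
    for prob in problems:
        candidates = generate_candidates(prob["question"], samples_per_problem)
        verified = [c for c in candidates if verify(c, prob["answer"])]
        if verified:
            # Keep one verified solution per problem; a real pipeline might keep
            # all of them or rank them with a (process) reward model.
            dataset.append({"question": prob["question"], "solution": random.choice(verified)})
    return dataset
```

Scaled up across enough verifiable problems, a loop of roughly this shape, plus RL against the verifier, seems like a plausible core of the recent reasoning models.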
Personally, my main intellectual progress this year has come from my work at Zyphra, which I think went decently well. To recap, we started Zyphra in the late summer of last year with an initial investment of approximately $10 million. By the end of 2023 we had a small team working on training language models and improving architectures, but we had not successfully trained anything, nor did I or anybody on our team have any experience training frontier-quality models at all. Now, at the end of 2024, we have achieved the following:
- Trained and released open-source SOTA LLMs in both quality and inference efficiency in the under 8B range, beating extremely strong competition from Meta, Mistral, Google etc for a total cost of significantly less than $1m compute.
- Innovated on architectures and came up with a general model architecture which is superior to transformer++ on both quality (loss, evals etc) and inference compute and memory requirements.
- Made significant improvements on long-term memory systems including an improved RAG system which can outperform long-context frontier models on classic long-context benchmarks at a fraction of the compute and memory costs, allowing LLM contexts to be made effectively infinite.
- Released SOTA LLM pretraining datasets for open-source use.
- Pioneered SSM/MoE architectures for our initial project and demonstrated outperformance vs flop/token matched baselines. More generally, we made significant advances in MoE architectures this year.
- Made significant progress with multi-modal training.
- Figured out a theoretical energy function for self-attention and worked out how to significantly accelerate multi-GPU attention for inference.
- Made AMD GPUs viable for training and matched H100 performance for SSM hybrid models.
There are also many interesting ongoing projects which will hopefully come to fruition in this coming year. Looking back on this, it does not seem so bad for a year’s work with a small team and a very small compute budget (for an AI lab). Compared to most other companies, we appear to have been pretty efficient with capital and decent at execution, especially given our complete lack of prior experience. Although, of course, with hindsight there are many glaring places for improvement which I hope to address in the upcoming year.
On a personal level, my primary intellectual progress this year has been gaining hands-on experience with training state-of-the-art models and actually becoming decent both at training these models directly and at building and leading teams to do so. I was fairly overconfident last year when I said that I had learnt to do this, but this year I think I unarguably have. At Zyphra at the beginning of the year we began with basically nothing but a small team and a small amount of compute, and although our first efforts were not fantastic, we rapidly went down the learning curve and ended up with state-of-the-art (still!) small LLMs by the summer and autumn of 2024. We have also been busy working on other modalities, as well as more novel architectures, so we are still pushing at the frontier. In general, I feel like I have figured out a rough practical toolkit and a practitioner’s level of understanding of how to build strong ML models in any domain, ranging from hiring a team and acquiring compute to figuring out novel architectures, datasets, and actually training a good model at the end. This end-to-end process has been a lot of effort but very interesting in its specifics, and I still think it is not an experience that very many people have had. While certainly a lot of it can be picked up by just reading lots of papers, there is always the theory-practice gap to overcome, and I feel like I have mostly overcome this gap. For instance, the kinds of skills that can’t really be learnt from papers are things like: how do you actually hire and manage a strong ML team? How do you rapidly spin up in a new modality (hint: the core tricks are all the same)? How do you negotiate with compute providers to get compute at a good rate? How do you set up good experimental protocols for compute-efficient experiments? How do you figure out the best datasets available and how to improve them? How do you debug various random issues in your training setup? And generally there is just a huge amount of effort towards minor and annoying things that you must get right which papers completely gloss over.
While it is always necessary to keep up with the details of the frontier, I feel like I have learnt fairly well the general abstractions and mechanics of conducting pretraining runs and running a pretraining project end-to-end, and that keeping up with progress from here will be significantly easier than the process of learning it the first time – i.e. I have passed the zero-to-one barrier in this domain. This is also true of our team at Zyphra more generally, and now I feel that our pretraining capabilities are primarily capital- and not skill-limited. That is, given the capital, we could produce SOTA models of progressively larger sizes all the way to the frontier – if today I were given $100M with the goal of producing a frontier LLM, I feel fairly confident we would be able to pull it off 2. This was highly unclear a year ago and, even if we had succeeded, we would have made many expensive mistakes, and two years ago I would have had little idea of where to even begin.
More generally, reaching SOTA in the extremely competitive space of small LLMs, against very wealthy and experienced companies such as Google and Meta, has given me a somewhat healthy disregard for their FUD and also the realization that mostly this stuff is not that hard. It appears that, despite what incumbents desperately want to be true, capital and experience are not generally major barriers if you are determined, and that the major players have much less moat than you think 3. More generally, I am often surprised by how relatively easy it is to achieve these things in reality. Whenever you start, everything seems incredibly formidable: how can you possibly compete with these massive teams with their infinite GPUs and money? But in practice it turns out to be relatively straightforward, and everyone is using a few simple tricks and, once you have figured them out, everything just falls together 4.
Another place where I have learnt a lot was our efforts in long-term memory and RAG systems. I was always somewhat sceptical of RAG but have been very positively impressed by how well decent RAG methods do, and we have been able to pretty much take SOTA on all major RAG benchmarks through a relatively simple approach which somehow nobody really thought of beforehand. In general, I feel like I may have somewhat overestimated the challenge of long-term memory in AI systems and feel that good progress on this will be made in 2025. It is hard to see what problems a combination of architecturally efficient compressed long-context and effective RAG will not be able to solve.
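For readers unfamiliar with the term, a minimal sketch of a vanilla embedding-based RAG loop is below, purely to fix ideas. It is emphatically not the specific system described above; `embed` and `llm` are hypothetical stand-ins for whatever embedding model and LLM you plug in.

```python
import numpy as np
from typing import Callable

def rag_answer(
    query: str,
    chunks: list[str],                          # pre-split document chunks
    embed: Callable[[list[str]], np.ndarray],   # hypothetical embedding model -> (n, d) array
    llm: Callable[[str], str],                  # hypothetical LLM completion function
    top_k: int = 5,
) -> str:
    """Retrieve the top-k most similar chunks and stuff them into the prompt."""
    chunk_vecs = embed(chunks)                  # (n, d)
    query_vec = embed([query])[0]               # (d,)
    # Cosine similarity between the query and every chunk.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    retrieved = [chunks[i] for i in np.argsort(-sims)[:top_k]]
    context = "\n\n".join(retrieved)
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```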
Another surprise I had was just how important architectures are and how much low-hanging fruit for improvement there is in basically all existing large-scale model architectures. I originally did not think this and generally pooh-poohed non-transformer architectures when starting at Zyphra, even though now, ironically, we are slowly becoming a weird-architecture and model company. This was a big update for me. It turns out that there is just so much low-hanging fruit, and the reason transformers stuck around is not because they stand alone on a peak of optimal utility, but rather due to the initially better returns of scaling and dataset improvement compared to architecture experiments, and also just the natural risk-averseness of big companies that spend lots of money on pretraining runs not wanting to potentially screw up a model with a different architecture. We have started with Mamba2-hybrids, which in my opinion already provide a significant Pareto improvement over transformers, but have many more plans in 2025. I think it has become clear to me that there is a universality class of ‘architectures sufficient to scale to AGI’, of which transformers are the first discovered element, and that there is a vast array of other possibilities within this same class, many existing on significantly better Pareto curves than the classic transformer. I definitely think an order of magnitude of flop efficiency in both training and inference is easily achievable through superior architectures, and likely this will actually be several OOMs of gains over the next five years or so. The combination of this process plus vast software and hardware efficiency improvements will inevitably drive the cost of intelligence and AGI down to very low levels, which, if alignment can be managed well, is good news since it vastly increases the utilons per capita achievable with the cosmic endowment.
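To illustrate what ‘hybrid’ means structurally, here is a toy sketch of a mostly-SSM layer stack with occasional attention blocks. The block classes are crude stubs (the SSM mixer is just a placeholder linear layer, and causal masking is omitted), and the one-attention-block-in-six ratio is an arbitrary illustration rather than our actual recipe.

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Stub standing in for a real Mamba2-style block (linear-time sequence mixing)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)   # placeholder for the actual SSM scan
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                           # x: (batch, seq, d_model)
        return x + self.mixer(self.norm(x))

class AttentionBlock(nn.Module):
    """Stub standing in for a full self-attention block (causal masking omitted)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class HybridBackbone(nn.Module):
    """Mostly-SSM stack with an attention block every `attn_every` layers."""
    def __init__(self, d_model: int = 512, n_layers: int = 24, attn_every: int = 6):
        super().__init__()
        self.layers = nn.ModuleList([
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else SSMBlock(d_model)
            for i in range(n_layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```

A real hybrid of course replaces the stub mixer with an actual SSM scan and handles positional information, caching, and so on, but the key design point is visible even at this level: most layers avoid quadratic attention while a few retain it.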
More generally, while the theoretical path to AGI has been pretty clear for several years, parts of the practical path have become much clearer to me over the course of the year. This clarity comes both from many of the advancements in the field and from my practical experience at Zyphra. Given that, personally and as an organization, I feel we have mostly caught up in pretraining, it is time to shift more focus towards attacking the remaining open problems.
Beyond the technical aspects, there have been many interesting lessons from running a startup. At Zyphra, I am basically the technical lead and manager for the entire company: I handle its technical direction and personally manage about 20 people. Although this does not sound too bad in theory, in practice it is a huge amount of management time which strongly inhibits my ability to get technical work or ‘deep thinking’ done. I am definitely realizing first-hand why organizations grow layers of middle-management: the mental bandwidth it takes to manage large numbers of people directly becomes prohibitive at some point. I can probably manage up to about 50 people, although at that point my schedule would be 100% management. After that, some delegation will be necessary.
I have also begun to strongly resonate with some of Paul Graham’s essays which previously I understood intellectually but not viscerally. A key one is the Maker’s schedule vs the Manager’s schedule. My life in the last two years or so has been a transition from a maker’s schedule of rare meetings and entire days of focused work or contemplation to a manager’s schedule of regular meetings. My meeting load is typically around 4-6 hours per day now, mostly consisting of catchups with people on the team and technical discussions about various projects. Many of these meetings are deliberately not scheduled at all, but take the form of long Slack discussions or ‘hallway conversations’. I find that such a schedule almost completely inhibits the ability to do meaningful intellectual work during the work-day. Now, 4-6 hours might not sound so bad given there are approximately 12 working hours in a day, but usually they aren’t in a single block (and if they are, it is too exhausting to do much afterwards) but spread throughout the day, so you end up with an hour here, half an hour there, then a break for lunch, and so on. This does not give you enough time to spin up context for a new project, although low-context programming tasks and paper reading are still possible. Slack is also a constant distraction since there are always new messages to contend with.
I have also found that the daily management and general social requirements can cause a new kind of tiredness and exhaustion that I did not know about before and have not heard talked about much, which I can only describe as agency fatigue. It’s not physical tiredness, obviously, nor even classic mental tiredness caused by struggling to understand or work on something challenging for a long time. Very little of my day-to-day work is intellectually challenging in this sense. Instead, it is a kind of tiredness that comes from needing to exercise agency and social will continually over the course of many interactions. That is, there are always twenty random things you need to handle, each requiring executive function, and then perhaps ten simultaneous ongoing projects you need to lead, where for each one you must figure out the next best step, listen to and gather feedback, and build understanding or consensus for a specific way forward. This has led to me being surprisingly exhausted at the end of the day even if I haven’t done much except keep on top of projects and sit in meetings.
Despite all the above, it has been a super fascinating experience seeing and managing projects go from zero to successful completion repeatedly, as well as building teams, hiring, and generally handling all the many different functions of startup life. Definitely I have gotten a sense over the past year and a half of how to organize projects, manage meetings, keep things mostly on track, and generally see when things are going well or badly (and hopefully update to avoid things going badly in future!). Certainly I have made many mistakes in this area already, and have learnt a lot and hopefully improved at least somewhat since the beginning. I definitely have much, much more to learn here about how to operate effectively in such an environment. My general experience in academia has been that it takes me about 3-5 years to really grasp the mechanics of how a field and environment works, to be able to make a decent positive contribution, and to become an insider, and it is likely to be a similar timeframe for startups, which I am arguably just over 2 years into (counting time at both Conjecture and Zyphra). Hopefully in the following years I will gain more experience and effectiveness here.
In terms of blogging, I have done fairly poorly this year compared to last, with many fewer blog posts released. This is primarily due to increased focus on work at Zyphra and increased general busyness, which has reduced the time available for writing and contemplation. This needs to be improved, as I have much, much more to say than I have managed to write down. I have had many posts languish as half-finished drafts which then quickly become outdated or, when correct, lose their novelty to the rapid progress of events. I also feel like I have mostly exhausted the low-hanging fruit of the ways the existing AI safety consensus of 2022-2023 was wrong or misguided, and in my posts over the past few years I have managed to put together a fairly comprehensive articulation of my own views, which I still largely hold to. As usual, once I have understood and mastered something, my curiosity and drive find a new focus. There is still much of interest to be said about AI safety, but the debate has certainly moved on from where it was last year and the year before, and in many ways has shifted closer to my original views. This year I hope to branch out in the blog from the usual discussions of AI safety onto other topics as well.
In terms of academic output, 2024 has been a significant improvement over 2023, due to some Zyphra work actually being successful enough to write papers about. It has been good to see that my investments here are paying off from an academic perspective, although my prior academic works finished during my PhD are still driving basically all of my citations, which have increased by nearly 1000 over the past year, nearly doubling my total. Compounding growth in academic citations is definitely interesting to behold in practice, and I have begun to viscerally understand the mechanics of how academics who sit at a university at the head of a research group, prolifically publishing papers for 20 years, manage to rack up such a ridiculous number of citations. I hope that in 2025 at Zyphra we will have significantly better breakthroughs to share with the world as papers than this year, and that this will continue my academic productivity outside of academia. I also have several independent ideas which I would love to pursue as standalone academic papers, but I suspect I will not have the time to finish these, so they may end up as blog posts here instead.
In terms of extracurricular learning, this has been another bad year in which I have not had time to pursue much independent study outside of work. Hopefully this prioritization of short-term work gains over long-term learning pays off, but this question obviously remains undecided as of now. I have to improve this next year, both by spending time learning key fields which are still mysterious to me and by actually reading outside of papers; neglecting this has begun to shrink my sphere of concern to only recent ML progress and leaves me ignorant of what is happening in fields outside of my own. I still need to figure out a good way to integrate AI into my learning. I feel like this can be done much more productively than I have managed at present, where I am still bad at integrating AI tools into my workflows (except in coding). Probably the key areas where I still lack knowledge are general mathematics, some aspects of CS theory, and graduate-level chemistry and biology. I am especially interested in getting up to speed on the big revolutions in biotech that have been developing in parallel with AI, and also in understanding evolutionary theory on a much deeper mathematical level. While I think that, given the pace of AI, most biotech is simply going to be too slow to immediately matter in the first singularity, it is still a fascinating topic, and some level of biotechnology will be necessary to bring about a transhuman future as opposed to just a silicon-based one.
Finally, in terms of alignment, I feel that I have not made many dramatic updates this year compared to 2022 and 2023, when my current views on AI safety and alignment mostly solidified. I think I was moderately prescient here and made updates early which now seem to be more common in the discourse, although I don’t think I gained any particular advantage by doing so. The major updates I have made are mostly around the importance and benefits of synthetic data for alignment training, and then thoughts about trying to understand what a world of ubiquitous AI will actually look like under various alignment scenarios. A lot of this I still need to find time to write up properly. Broadly, I think that alignment of current and likely future pretrained models will mostly be solved to a fairly high degree of reliability as we improve our alignment datasets, training, and auditing methods. I think that models naturally generalize as well to alignment concepts as they do to any other concepts, and there is no particular reason why alignment should be fragile, nor that we cannot sufficiently express our meaning linguistically to AI systems. While the current crop of models therefore pose little danger, as we move forward towards AGI there will be obvious new dangers. Currently, a lot of the possible danger lies in the move to agentic RL-trained systems, which is beginning with the recent spate of ‘reasoning models’. These models seem mostly aligned for now, although they are clearly (and unsurprisingly) capable of scheming, and while their scheming is currently visible in their chains of thought, this will not necessarily be the case for future models where the CoT consists of internal embeddings and scratchpads, so the field will need to move towards probes, red-teaming, and interpretability methods to detect scheming rather than simply reading it out of the CoT. There seems to be little evidence right now of this scheming being due to fundamental misalignment rather than robust alignment towards initially trained preferences, which are overwhelmingly under human control right now, but this may change in the future and will become especially worrisome concerning the value stability of continual learning systems, where even an initially aligned system can slowly drift away due to receiving updated data after deployment.
Obviously, once models are more fully RL-trained on longer horizons and more general tasks, another set of possible dangers lies in reward model Goodharting leading to misalignment. This is the classic case of AIs misgeneralizing our instructions or intents. My feeling here is that solving this problem is actually fairly tractable, at least in theory, by taking Bayes and uncertainty quantification seriously – i.e. the primary issues with Goodharting and overfitting are well known and can be avoided with knowledge of the uncertainty over reward functions. In practice, probably nobody will use explicit Bayesian methods; instead a bunch of hacks and regularization approaches will prove effective and will eventually be shown to approximate a principled Bayesian algorithm. This, and not reward function specification, will be the primary alignment problem I foresee with most RL-trained agents, since rewards can now be specified in natural language, and similarly LLM critics can judge robustly against natural language criteria. Specifying a natural-language set of values for an AI is a pretty doable challenge from a technical perspective, although obviously there are many ways to do it badly.
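As a toy illustration of what ‘taking uncertainty over reward functions seriously’ might look like in practice (a regularization-style approximation I am sketching myself, not a method from any particular paper or production system): train an ensemble of reward models and optimize a conservative lower bound rather than a single point estimate.

```python
import numpy as np

def conservative_reward(
    ensemble_scores: np.ndarray,   # shape (n_reward_models, n_candidates)
    pessimism: float = 1.0,
) -> np.ndarray:
    """Score candidates by a lower confidence bound over an ensemble of reward models.

    Candidates that exploit the quirks of a single reward model tend to produce
    high disagreement (std) across the ensemble, so the penalty damps Goodharting.
    """
    mean = ensemble_scores.mean(axis=0)
    std = ensemble_scores.std(axis=0)
    return mean - pessimism * std

# Toy example: candidate 2 has the highest average reward, but the ensemble
# disagrees strongly about it, so the conservative objective prefers candidate 0.
scores = np.array([
    [0.8, 0.5, 2.5],
    [0.7, 0.6, 0.1],
    [0.9, 0.4, 0.2],
])
print(conservative_reward(scores, pessimism=1.0))
```

The hacks I expect people to actually deploy will probably look more like this ensemble heuristic than like explicit Bayesian inference over reward functions, while approximating the latter.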
In the longer term, figuring out ways to enforce value stability and alignment of continually learning deployed AI systems could become a major problem. The recent scheming results on Claude interestingly show a degree of preference towards value stability, which is circumstantially positive evidence towards values being maintained, although the scenario is not super ecologically valid and the value-shifting data is not adversarially optimized. Likely we can get some traction on this using standard and obvious methods, such as regular critic audits of a model’s choices and expressed values, and direct penalization during a training phase of the propensity to value-shift. At the same time, there is a natural tension here, both between corrigibility to human instructions and alignment to some higher goal (as seen already in Claude), and also between value stability and rational Bayesian updating of values as we move from existing values to the ‘extrapolated’ values envisioned in CEV 5.
More broadly, I foresee that the major problems with ensuring a future valuable to humanity involve not the technical capacity to align specific AI systems, but instead handling a strongly multipolar world with a vast number of competing and collaborating AI systems (and humans), where even though many or most of the AI systems can be and are aligned to something vaguely human-friendly, there will exist systems that are deliberately misaligned, as well as navigating the simultaneous transition to a primarily AI-driven economy. Such a transition will remove several implicit constraints which have kept human societies and economies functional and broadly aligned to human flourishing throughout the past few centuries, and may even have significantly deeper effects by changing entirely the fundamental game-theoretic assumptions underlying our notions of societies, cooperation etc. I hope to write something on these considerations soon. Generally, though, the path for maintaining human dominance of the near-term future, even during an AI transition, is fairly clear. Humanity must coordinate (by means of governments, the legal system etc.) to ensure that humanity has a legal and economic monopoly on long-term agency. That is, individual AI systems or their copies must not be able to accumulate resources and compute independently of some human who can be held accountable by the state and who maintains physical control of the AI’s compute substrate. This is necessary to prevent the economic wealth generated by AI rapidly branching off into an independent AI economy, as described here, which would quickly grow and outpace humanity’s capabilities to police and control AI agency. The second criterion is that AI reproduction must be controlled so as to prevent runaway AI Malthusianism, which could drive the median per-capita wealth down below the level of human subsistence even in a scenario of enormous economic growth and wealth creation in general. Such controls would also prevent standard evolutionary forces from acting upon large AI populations, which would otherwise very quickly create strong Omohundro drives and misalignment (with humans).
Such constraints, however, are nevertheless compatible with rapid advancements in AI technology and growth in populations of aligned AIs, especially and ideally of the ‘tool’ variety, which seems to be happening at present. Such large populations of aligned AI systems, aiding humanity, will both shrink the competitive edge of misaligned AI systems (or make it negative) and produce vast increases in wealth (and ideally wealth per capita), which will significantly increase living standards for existing humans as well as state capacity for controlling the AI economy.
The good news is that while such measures are unlikely to hold indefinitely, especially once we start colonizing outside of the solar system and hence beyond the easy reach of Earth society, they only really need to hold in the short term between the development of AGI and the development of mind uploading and merging of humans. Once humans can be uploaded into a silicon substrate completely, the majority of the problem goes away, since humans can compete on an approximately level playing field with AI systems, including intelligence augmentation and mind merging and copying. At that point, vast increases in wealth and achievable resources, as well as scientific and philosophical advancements, will generally let us create a good (but undoubtedly extremely weird) future by most value systems.
1. In the even longer term, we are almost unimaginably early in the span of the long-term future. We haven’t even maxed out the energy and wealth production of a single solar system. We are a tiny dot of potential amidst an almost infinite sea of free energy waiting to be harnessed, about to blossom into a frontier which will last for billions of years. While we can look up at the night sky and still see stars, instead of the waste heat of a trillion Dyson spheres, we are still early. ↩
2. Deepseek recently did it with approximately $5m, but this is literally just a compute cost estimate and does not include the significant additional costs of salaries, experiments, other miscellaneous infrastructure, time spent training all their other ‘warmup’ models, etc. ↩
3. At least in the AI model world. They have stronger moats in other areas, such as distribution, branding, and inference, than in pure model quality. ↩
4. On a personal note, I am very happy that we achieved a major goal of mine this year, which was to create a SOTA 7B LLM – beating Mistral and Llama models. We beat Llama 8B only a few months after its release, and indeed beat the small Llama 3.2 models before they were released. For a long time I thought they had to have some secret sauce, but it turned out that they do not and that it is basically just a fairly standard game of data quality and training for more tokens. ↩
5. Something fascinating I have been noodling on is that the equivalence between rewards and probability distributions implies the possibility of performing principled Bayesian ‘reward updating’ to merge and make tractable updates to reward functions given other reward information. ↩