In my original Grokking Grokking post, I argued that grokking could be caused simply by diffusive dynamics on the optimal manifold. The idea is that, while an overparametrized network is trained down to zero loss, the weight dynamics minimize the loss until they hit an optimal manifold of solutions...
[Read More]
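The excerpt above already states the core dynamic, so here is a minimal sketch of it in a toy setting of my own invention: a hypothetical 2-parameter model whose training loss (w1 + w2 - 1)^2 is zero on a whole line of solutions, plus an invented "test loss" measuring distance to one generalizing point on that line. Gradient descent keeps the weights pinned to the manifold while SGD-style noise diffuses them along it.

```python
# Toy illustration of the "diffusion on the optimal manifold" picture.
# The model, losses, and constants are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def train_loss(w):
    # Zero on the whole manifold {w1 + w2 = 1}, positive elsewhere.
    return (w[0] + w[1] - 1.0) ** 2

def grad(w):
    g = 2.0 * (w[0] + w[1] - 1.0)
    return np.array([g, g])

def test_loss(w):
    # Hypothetical test loss: distance to the one "generalizing" solution.
    return np.sum((w - np.array([0.5, 0.5])) ** 2)

w = np.array([1.5, -0.5])   # on the manifold, but far from the generalizing point
lr, noise_std = 0.1, 0.02

for step in range(20000):
    # The gradient step pulls w back onto the manifold; the isotropic
    # "SGD noise" keeps perturbing it.  The normal component of the noise
    # is damped by the gradient, so the net effect is a random walk
    # (diffusion) along the manifold while train loss stays near zero.
    w = w - lr * grad(w) + noise_std * rng.normal(size=2)
    if step % 2000 == 0:
        print(f"step {step:6d}  train {train_loss(w):.5f}  test {test_loss(w):.3f}")
```

Whether and when the walk actually reaches the generalizing region is stochastic; the point of the sketch is only the separation the post describes, with train loss pinned near zero while the test loss wanders along the manifold.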
Strong infohazard norms lead to predictable failure modes
Obligatory disclaimer: This post is meant to argue against overuse of infohazard norms in the AI safety community and to demonstrate failure modes that I have personally observed. It is not an argument that infohazard norms should never be used anywhere, or that true infohazards do not exist. None of this is meant to...
[Read More]
Preference Aggregation as Bayesian Inference
A fundamental problem in AI alignment, as well as in many of the social sciences, is preference aggregation. Given a number of different actors, each with their own preferences, what is a consistent way of making decisions that ensures the outcome is fair and, ideally, that all of the...
[Read More]
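The body of the post is not excerpted here, but the title suggests treating aggregation as a Bayesian update. One standard way to cash that framing out (not necessarily the post's exact construction) is to treat each actor's preference weights over outcomes as a likelihood and update a shared prior with each actor in turn; the outcomes and numbers below are made up for illustration.

```python
# Sketch: preference aggregation as a sequence of Bayesian updates.
# Each actor's (unnormalized) preference strengths act as a likelihood
# over outcomes; the aggregate decision is the maximum-posterior outcome.
import numpy as np

outcomes = ["policy_A", "policy_B", "policy_C"]

# Each row: one actor's preference strength for each outcome (illustrative numbers).
preferences = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

posterior = np.ones(len(outcomes))      # uniform prior over outcomes
for likelihood in preferences:
    posterior *= likelihood             # update with this actor's "evidence"
    posterior /= posterior.sum()        # renormalize

for name, p in zip(outcomes, posterior):
    print(f"{name}: {p:.3f}")
print("aggregate choice:", outcomes[int(np.argmax(posterior))])
```

One property of this multiplicative rule is that an outcome any single actor assigns near-zero weight to ends up with near-zero posterior, which is one possible reading of "fair to all actors"; whether that is the desirable notion of fairness is exactly the kind of question the post raises.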
Thoughts on loss landscapes and why deep learning works
Epistemic status: Pretty uncertain. I don't have an expert-level understanding of current views in the science of deep learning on why optimization works; I just read papers as an amateur. Some of the arguments I present here may already be known or disproven. If so, please let me...
[Read More]
My path to prosaic alignment and open questions
One of the big updates I have made in the past six months is a strong shift towards the belief that alignment for current LLM-like agents is not only solvable but actually fairly straightforward, with a good chance of being solved by standard research progress over the next ten...
[Read More]