There are many facets to the alignment problem, but one is to view it as a computer security problem. We want to design a secure system in which to test our AGIs to ensure they are aligned, one which they cannot ‘break out of’. Having such a secure AGI box is necessary to have any...
[Read More]
Probabilities multiply in our favour for AGI containment
This is a short post for a short point. One thing I just realized, which should have been obvious, is that for prosaic AGI containment mechanisms like various boxing variants, simulation, airgapping, adding regularizers like low impact, automatic interpretability checking for safe vs unsafe thoughts, constraining the training data, automatic booby-traps...
[Read More]
Alignment needs empirical evidence
There has recently been a lot of discussion on LessWrong about whether alignment is a uniquely hard problem because of the intrinsic lack of empirical evidence. Once we have an AGI, it seems unlikely we could safely experiment on it for a long time (potentially decades) until we crack alignment....
[Read More]
Empathy as a natural consequence of learnt reward models
Empathy, the ability to feel another’s pain or to ‘put yourself in their shoes’, is often considered to be a fundamental human cognitive ability, and one that undergirds our social abilities and moral intuitions. As so much of humans’ success at becoming the dominant species comes down to our...
[Read More]
The ultimate limits to alignment determine the shape of the long-term future
The alignment problem is not new. We have been grappling with the fundamental core of alignment – making an agent optimize for the beliefs and values of another – for the entirety of human history. Any time anybody tries to get multiple people to work together in a coherent way...
[Read More]