A quick update on my thinking.
Intellectual progress in 2022
2022 has been an interesting year. Perhaps the biggest change is that I left academia and started getting serious about AI safety. I am now head of research at Conjecture, a London-based startup with the mission of solving alignment. We are serious about this and we are giving it our...
Integer tokenization is insane
After spending a lot of time with language models, I have come to the conclusion that tokenization in general is insane and it is a miracle that language models learn anything at all. To drill down into one specific example of silliness which has been bothering me recently, let’s look...
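To make the flavor of the problem concrete, here is a minimal sketch (mine, not from the post) that runs a few integers through the GPT-2 BPE tokenizer via the Hugging Face transformers library. The exact splits depend on the learned vocabulary, but nearby integers routinely break into different numbers of pieces:

```python
# Minimal sketch (not from the original post): inspect how GPT-2's BPE
# tokenizer splits integers. The exact splits depend on the learned
# vocabulary, but adjacent numbers often tokenize very differently.
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")

for n in [100, 999, 1000, 2023, 12345]:
    ids = tok.encode(str(n))
    pieces = [tok.decode([i]) for i in ids]
    print(f"{n:>6} -> {len(ids)} token(s): {pieces}")
```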
Gradient hacking is extremely difficult
Epistemic Status: This started out as a comment on this post but expanded enough to become its own post. My view has been formed by spending a reasonable amount of time trying and failing to construct toy gradient hackers by hand, but this could just reflect me being insufficiently creative...
Creating worlds where iterative alignment succeeds
A major theorized difficulty of the alignment problem is its zero-shot nature. The idea is that any AGI system we build will rapidly be able to outcompete its creators (us) in accumulating power, and hence if it is not aligned right from the beginning then we won’t be able to...