It is often argued that any alignment technique that works primarily by constraining the capabilities of an AI system to lie within some bounds cannot work, because it imposes too high an ‘alignment tax’ on the ML system. The argument is that people will either refuse to apply any method that carries an alignment tax or, if they do apply it, be outcompeted by those who refuse. I think that this argument is applied too liberally, and often without consideration for several key points:
1.) ‘Capabilities’ is not always a dial with two settings, ‘more’ and ‘less’. Capabilities are highly multifaceted, and certain aspects of capabilities can be taxed or constrained without affecting others. Often, it is precisely these constraints that make the AI system economically valuable in the first place. We have seen this story play out very recently with language models, where techniques that strongly constrain capabilities, such as instruction finetuning and RLHF, are in fact what create the economic value. Base LLMs are pretty much useless in practice for most economic tasks, and RLHF'd and finetuned LLMs are much more useful even though the universe of text that they can generate has been massively constrained. It just so happens that the constrained universe has a much greater proportion of useful text than the unconstrained universe of the base LLM. People are often, rationally, very willing to trade off capability and generalizability for reliability in practice.
2.) ‘Capabilities’ are not always good from our perspective economically. Many AGI doom scenarios require behaviour and planning that are extremely far from anything that would have economic value to any current actor. As an extreme case, the classic paperclipper scenario typically arises because the model calculates that if it kills all humans it gets to tile the universe with paperclips in billions of years. Effectively, it Pascal’s mugs itself over the dream of universal paperclips. An AGI that can plan billions of years into the future is no more valuable to anybody today than one with a much, much shorter planning horizon. Constraining this ‘capability’ has an essentially negligible alignment tax.
3.) The claim that even small alignment taxes are intolerable is an efficient-market argument, and the near-term AGI market is likely to be extremely inefficient. Specifically, it appears likely to be dominated by a few relatively conservative tech behemoths. The currently brewing arms race between Google and Microsoft/OpenAI cuts against this, but notably it is a transition from there being literally no competition to there being any competition at all. Economic history also shows us that the typical result of setups like this is that the arms race quickly defuses into a cosy and slow oligopoly. Even now there is still apparently huge slack. OpenAI have almost certainly been sitting on GPT-4 for many months before partially releasing it as Bing. Google have many unreleased large language models, almost certainly including SOTA ones.
4.) Alignment taxes can (and should) be mandated by governments. Having regulations slow development and force safety protocols to be implemented is not a radical proposal; it is in fact the case in many other industries, where it can completely throttle progress (e.g. nuclear power, with much less reason for concern). This should clearly be a focus for policy efforts.