Here I want to bring attention to what I think is an extremely impressive case of evolution’s ability to ‘align’ humans in the wild: the development of human sexuality.
Reasons why this is an interesting thing to study through the lens of alignment, and why it is a highly non-trivial accomplishment:
1.) Evolution has been very successful here: almost all humans end up wanting to have sex and typically with opposite-gender partners in a way that would result in children in the evolutionary environment.
2.) Sexuality, unlike many other drives, is not something built into the brain from the beginning. Instead there is a sudden ‘on switch’ around puberty. What happens in the brain during this time? How does evolution exert such fine-grained control of brain development so long (a decade or more) after birth?
3.) It is mostly independent of initial training data before puberty, i.e. evolution can largely ignore a decade of data input and representation learning, which it cannot control and which happens while the brain is undergoing extremely large changes, and still reliably instills a new drive.
4.) It seems to occur mostly without RL. People start wanting to have sex before they have had sex at all. If sexuality developed by some RL mechanism, it would look like you go around doing your normal things, then at some point you have sex, and realize it is highly rewarding, and you slightly update your behaviours and/or values to get more sex or to want more sex. This is not what happens in humans. Instead, humans often want to start having sex before they have had it, and even before they really know what sex is.
5.) Evolution has solved some variant of the pointers problem: it gets humans to assign high value to a previously unknown and mostly non-represented state, and also to translate this desire onto specific other agents in the world, i.e. crushes etc. This is done, presumably, in an entirely genetically mediated way without requiring specific experience.
6.) Sexuality is, usually, a very strong drive which has large influences over behaviour and long term goals. If we could create an ‘alignment’ drive as strong in our AGI we would be in a good position.
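To make the contrast in point 4 concrete, here is a minimal toy sketch of the two pictures. It is only an illustration under my own assumptions, not a model of how the brain actually does this: `learned_value`, `learning_rate`, and `initial_value` are hypothetical names, and the update rule is a plain running-average value estimate standing in for 'some RL mechanism'.

```python
def learned_value(rewards_experienced, learning_rate=0.1, initial_value=0.0):
    """Toy value estimate for a single activity.

    Starts at `initial_value` and is nudged towards each reward that is
    actually experienced. Both parameters are illustrative assumptions.
    """
    value = initial_value
    for reward in rewards_experienced:
        # Standard incremental update: move a small step towards the reward.
        value += learning_rate * (reward - value)
    return value


# Pure experience-driven picture: with no experience, there is no desire.
value_before_experience = learned_value([])

# The same learner only starts to value the activity after trying it.
value_after_first_experience = learned_value([1.0])

# 'Innate drive' picture (hypothetical): the value is already high at
# switch-on, before any experience at all.
innate_value_before_experience = learned_value([], initial_value=1.0)

# A bad first experience lowers the innate value but does not erase it.
innate_value_after_bad_experience = learned_value([-1.0], initial_value=1.0)

print(value_before_experience, value_after_first_experience)
print(innate_value_before_experience, innate_value_after_bad_experience)
```

In the pure RL picture the value is zero until the activity has been tried, which is exactly what we do not observe in humans. Setting a nonzero `initial_value` by hand is of course a cheat: the interesting question is how evolution manages to set it, and to attach it to the right representation, without access to the learned representations themselves.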
Some other aspects of the phenomenon that may be interesting to alignment:
1.) Clearly, the ‘alignment’ in this case is not perfect. Assuming that what evolution ‘wants’ is child-bearing heterosexual sex, human sexuality has a large number of deviations from this in practice, including homosexuality, asexuality, and various paraphilias [1].
2.) As the worldwide demographic shift shows, the link between sex and children has largely been broken off-distribution, but our desire to have sex has largely stayed aligned even significantly off-distribution. This may change in the future with fully realistic simulated sex, but the drive has so far been remarkably resilient to ubiquitous pornography.
3.) Further evidence against the RL explanation is that people often still desire sex even if their initial experiences are negative. However, severe abuse etc. can often (but not always) have significant and long-lasting effects, which suggests that the intrinsic drive can be modulated by RL-ish effects.
4.) Specific sexual behaviours can be significantly influenced by ‘culture’ and hence by environmental ‘training data’. This means that strong intrinsic drives can still be strongly modulated.
5.) While evolution gives us a strong aligned ‘desire’ to have sex, this is clearly not coupled with a strong ability to do so from scratch; we must learn the required skills with a standard RL-ish approach. This, to me, implies that the information content of the drive is relatively low (much lower than that of all the relevant skills), which is why it can be genetically encoded so well. In other words, such a drive must be relatively ‘simple’.
The fact that evolution has managed to figure out a way to give humans such a reliable sex drive under these circumstances is rather remarkable, and a reasonable test case of alignment. Understanding how this mechanism works, as well as where it goes wrong (from an evolutionary perspective), seems like it could provide one potential mechanistic route for aligning our own systems. We also have a good deal more control over our systems than evolution does over the brain, both during design and training and especially after deployment, so there are reasons to be hopeful in this regard. Moreover, it gives an existence proof that developing such a relatively aligned and robust drive is possible even in relatively black-box RL systems like the brain.
[1] It is unclear to me to what extent these actually affect inclusive genetic fitness (IGF) in the ancestral environment. Modern conceptions of homosexuality are extremely recent by evolutionary standards (really just < 5 generations), and other historical forms of homosexuality, such as in the military/during wartime and in the ancient world (the Greeks etc.), seem to have had relatively little effect on the ability/desire to also have children.