A common assumption about AGI is the orthogonality thesis, which argues that the goals or utility function of an AGI system and its core intelligence are orthogonal, or can be cleanly factored apart. More concretely, this perfect factoring occurs in model-based planning algorithms, where it is assumed that we have a world model, a planner, and a reward function as standalone, orthogonal components. The planner uses the world model to predict the consequences of actions and the reward function to rank those consequences, then selects the plan with the best predicted consequences. This is a fully factored model of intelligence: the world model, planner, and reward function can each be swapped out for others without issue. This was also how intelligence was assumed to work in pre-DL thinking about AGI.
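To make the factoring concrete, here is a minimal sketch in Python. All names and the toy 1-D environment are illustrative assumptions, not anything from a real system: the point is only that the planner interacts with the world model and reward function purely through their call signatures, so either can be swapped out freely.

```python
import itertools

def plan(world_model, reward_fn, state, actions, horizon=3):
    """Exhaustively search over action sequences. The planner only calls the
    world model and reward function through their interfaces, so either
    component can be swapped without touching the others."""
    best_seq, best_return = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = world_model(s, a)      # predict the consequence of the action
            total += reward_fn(s)      # rank the consequence
        if total > best_return:
            best_seq, best_return = seq, total
    return best_seq

# Toy components: a 1-D world where actions shift the state.
world_model = lambda s, a: s + a
reward_a = lambda s: -abs(s - 10)      # goal A: get close to +10
reward_b = lambda s: -abs(s + 10)      # goal B: get close to -10

print(plan(world_model, reward_a, 0, [-1, 0, 1]))   # plans towards +10
print(plan(world_model, reward_b, 0, [-1, 0, 1]))   # same planner, new goal
```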
However, it has recently become obvious that current RL systems are not always so well factored. For instance, model-free RL typically learns value functions or amortized policies whose weights directly predict values or actions, without going through a planner (or exhaustively computing values via world-model simulation). In these cases, the cognitive architecture is not orthogonal or factored. Core components of the policy-selector (the planner) depend on details of the reward function: the policy learned for reward function A may be very bad if you switch to reward function B. Such agents are much less flexible than their full model-based planning equivalents.
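By contrast, here is a toy, hypothetical sketch of the model-free case, again with made-up names and the same 1-D style of environment: tabular Q-learning folds the reward directly into the learned values, so the greedy policy read off the table is specific to the reward function it was trained on.

```python
import random
from collections import defaultdict

def q_learning(env_step, reward_fn, actions, episodes=500,
               alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning: the reward signal is folded directly into the
    learned values, so Q (and its greedy policy) is tied to reward_fn."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = env_step(s, a)
            r = reward_fn(s_next)
            # The update entangles reward and transition information in Q.
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions)
                                  - Q[(s, a)])
            s = s_next
    return Q

env_step = lambda s, a: max(-10, min(10, s + a))    # same toy 1-D world
Q_a = q_learning(env_step, lambda s: -abs(s - 10), [-1, 1])
# The greedy policy from Q_a chases +10; for the reward -abs(s + 10) it is
# useless until retrained, unlike a planner simply handed a new reward_fn.
```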
Why, then, do we build and use such non-factored agents? Because they are much more efficient. Full model-based planning at every step is computationally prohibitive. Instead, we tend to amortize the cost of planning into policies or value functions, which comes at an inevitable cost in flexibility. However, if we only want to use the agent for one task, or a small range of similar tasks, this does not matter and the trade-off is a good one.
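Continuing the hypothetical planning sketch above (and reusing its plan, world_model, and reward_a), amortization can be as simple as paying the planning cost once per state and caching the chosen action, leaving a policy that is cheap to query but only valid for the reward function it was computed against.

```python
def amortize(plan_fn, world_model, reward_fn, states, actions):
    """Run the (expensive) planner once per state and cache its first action.
    The resulting policy is cheap to query but specific to reward_fn."""
    return {s: plan_fn(world_model, reward_fn, s, actions)[0] for s in states}

# Reuses plan, world_model and reward_a from the earlier sketch.
policy_a = amortize(plan, world_model, reward_a, range(-10, 11), [-1, 0, 1])
action = policy_a[3]   # O(1) lookup at decision time instead of a fresh search
```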
We can go even further and remove the orthogonality and factoredness of the world model. This is implicitly what we do when we use only end-to-end reward-trained policies. Here the ‘world model’ learns only the information relevant to optimizing the specific reward function it was trained on. Information irrelevant to this reward function (but potentially relevant to other reward functions, or to other action-selection mechanisms) is ignored and cannot be recovered. This further specializes the agent towards optimizing for one goal over any others.
In general, full orthogonality, and the full flexibility it brings, is expensive. It requires you to learn and retain information (in the limit, all information) that is not relevant to the current goal but could be relevant to some possible goal, where the space of possible goals is extremely wide. It requires you not to take advantage of structure in the problem space, nor to specialize your algorithms to exploit that structure. It requires you not to amortize specific recurring patterns for one task, since doing so sacrifices generality across tasks.
This is a special case of the tradeoff between specificity and generality, and a consequence of the no-free-lunch theorem. Specializing to do really well at one or a few things can be done relatively cheaply. Full generality over everything is prohibitively expensive. The question is the shape of the Pareto frontier in a specific capability region, which depends on the natural shape of the solution space as well as on which constraints are most active.
Because of this, it does not really make sense to think of full orthogonality as the default case we should expect, nor as the ideal case to strive for. Instead, full factoring sits at one end of a Pareto tradeoff curve, and different architectures, depending on their constraints, will sit at different points along it. It also makes sense that both humans and powerful DL systems do not exhibit full orthogonality, but instead show differing degrees of modularity between these components and, as a result, differing degrees of behavioural flexibility.
The important question for determining the shape of the future is what the slope of the Pareto frontier looks like over the range of general capabilities an AGI might have. This will determine whether we end up with fully general AGI singletons, multiple general systems, or a very large number of much smaller hyper-specialized systems. The likely outcome depends on the shape of the Pareto frontier as well as on which constraints are most active in this regime.