This is broadly the thesis I have been working on for my dissertation. Now that it’s time to start seriously writing it up, I thought I’d try writing it down in an informal blog setting first, before translating it into science-ese.
Okay. A “Predictive Processing Model of Autism”. There are two components to that title: predictive processing, and autism. Nobody on the internet should need an introduction to autism, so I will focus on predictive processing.
Predictive processing is all the rage in certain circles these days, and at heart it is a pretty simple theory. Basically, it’s a theory of how the brain deals with and processes perception. The “classical” view is that the brain passively receives sensory input and implicitly does a bunch of statistics on it to extract regularities; then, by some mostly unknown process, these regularities are built up into features and higher levels of representation through some hierarchical system, until we end up with abstract representations of objects, concepts, and so forth.
Predictive processing, however, asserts that rather than passively extracting statistics, the brain is a prediction machine. It makes predictions about its inputs and constantly tries to adjust those predictions to better match what the inputs actually are. To do this it needs to build “multilayer generative models”, which are the hierarchical layers of features and abstractions, going all the way up to objects and concepts, mentioned previously. The process used to build these “hierarchical generative models” is basically just as mysterious as its equivalent in the classical version, but perhaps a little less so, because we at least have a candidate mechanism: the brain slowly trains itself by minimising prediction error, i.e. the difference between what it predicted and what actually happened. This seems like a fairly reasonable way to go about things, and it is also how supervised learning works in AI. The brain builds/trains these hierarchical generative models by starting off with some random system and slowly tweaking it until it can predict its future input from its current input pretty well. Each layer in the brain is thought to be one such model, and they exist at increasing levels of abstraction. Why we need multiple levels at all is interesting, and probably has to do with combinatorial explosions and the ability of abstraction to make them tractable, but really we don’t know. The brain does, as do ML systems. Empirically, it works.
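To make that a bit more concrete, here’s a toy sketch (mine, invented for this post, and nothing like real cortical machinery) of what “learning by minimising prediction error” looks like: a single made-up linear predictor learns to guess the next sensory sample from the current one by nudging its parameters whenever it gets things wrong.

```python
import numpy as np

# A toy predictor: one linear unit learns to predict the next sensory sample
# from the current one by nudging its parameters to shrink the prediction
# error. (Illustrative only -- real predictive processing models are
# hierarchical and far richer than this.)

rng = np.random.default_rng(0)

# A fake sensory stream with some temporal structure: a noisy sine wave.
t = np.arange(1000)
x = np.sin(0.1 * t) + 0.1 * rng.standard_normal(t.size)

w, b = 0.0, 0.0   # parameters of the predictor
lr = 0.01         # learning rate

for epoch in range(50):
    total_error = 0.0
    for cur, nxt in zip(x[:-1], x[1:]):
        pred = w * cur + b        # the "top-down" prediction of the next input
        err = nxt - pred          # prediction error
        w += lr * err * cur       # tweak parameters in the direction that
        b += lr * err             # reduces the squared error
        total_error += err ** 2

print(f"mean squared prediction error after training: {total_error / (t.size - 1):.4f}")
```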
So we minimise prediction error. That’s basically it. Friston dresses this up in a bunch of complicated and mostly impenetrable maths, which even people who’ve written books on predictive processing often don’t fully understand, but it basically boils down to the brain minimising the “free energy functional”. This sounds esoteric but is basically just the KL divergence between the predicted and observed distributions. So: prediction error.
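If you want the esoteric bit in toy form, here are a few lines of Python computing the KL divergence between a made-up “observed” distribution and a made-up “predicted” one. This is a drastic simplification of Friston’s actual formalism, but it conveys the flavour of “prediction error as a mismatch between distributions”.

```python
import numpy as np

# KL divergence between an "observed" distribution p and a "predicted"
# distribution q -- a heavily simplified stand-in for the free energy
# functional, but it captures the "mismatch between distributions" idea.
# The distributions themselves are made up for illustration.

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

observed  = [0.7, 0.2, 0.1]   # what actually happened
predicted = [0.3, 0.4, 0.3]   # what the model expected

print(kl_divergence(observed, predicted))   # positive: plenty of prediction error
print(kl_divergence(observed, observed))    # 0.0: a perfect prediction
```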
Interestingly, this scheme is probably Bayes-optimal in some sense, which lets it incorporate the research done on “the Bayesian brain”. Each layer in the brain’s hierarchy, or “generative model”, receives two sets of inputs: bottom-up inputs coming from the layer below, and top-down inputs coming from the layer above. The top-down inputs are effectively the predictions output by the layer above, which tries to predict the inputs coming up from the layer below. The layer does some fancy calculations, including minimising the prediction error between these two differing inputs, and then outputs its own set of predictions, to be propagated down the hierarchy. Thus, at every level there is a combination of bottom-up observations and top-down predictions. If you squint at it, this is isomorphic to the Bayesian procedure of combining a prior (the top-down prediction) with a likelihood (the bottom-up observations) to obtain a posterior (the final activation of the layer).
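Here’s the squinting made explicit: a minimal sketch of that prior-plus-likelihood combination for the Gaussian case, where each input is weighted by its precision (inverse variance). All the numbers are invented purely for illustration.

```python
# Precision-weighted combination of a Gaussian top-down prediction (prior)
# with a Gaussian bottom-up observation (likelihood). Precision = 1/variance.
# Numbers are invented for illustration.

def combine(prior_mean, prior_var, obs_mean, obs_var):
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_precision = prior_precision + obs_precision
    post_mean = (prior_precision * prior_mean + obs_precision * obs_mean) / post_precision
    return post_mean, 1.0 / post_precision   # posterior mean and variance

# The layer above predicts "about 10"; the layer below reports "about 20".
print(combine(prior_mean=10.0, prior_var=1.0, obs_mean=20.0, obs_var=1.0))
# -> (15.0, 0.5): with equal precisions the posterior sits halfway between them.
```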
This combination of bottom-up observations and top-down predictions modulating brain state can explain a bunch of cool things, like visual illusions such as the Kanizsa triangle, where the top-down priors interfere with the bottom-up observations so that we see things not as they are but as some weird mash-up of how they are and how we expect them to be. So, where does autism come into this?
Lawson et al. (2014) argue that autism is caused by aberrant precision weighting of sensory stimuli. Basically, this means that the brain tends to weight incoming evidence more heavily than the top-down priors. Unsurprisingly, this means that autistic people tend to pay close attention to sensory stimuli, may be worse at generalising across them, and are more sensitive to small differences between them (differences which would normally be smoothed out by the priors). They are also less sensitive to visual illusions (which is empirically verified and pretty cool!). This work expanded upon Van de Cruys et al., who theorised that autism could be caused by attenuated priors, which would produce the same effect, since if the prior is weaker, the sensory evidence effectively counts for more.
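Going back to the toy Gaussian combination from earlier, you can see what “aberrant precision” or “attenuated priors” does to the posterior - and why the two stories come out nearly equivalent. Again, all the numbers are made up.

```python
# The same toy Gaussian combination as before, but with the precisions skewed
# the way the aberrant-precision / attenuated-prior accounts suggest.
# Numbers are purely illustrative.

def combine(prior_mean, prior_var, obs_mean, obs_var):
    prior_precision, obs_precision = 1.0 / prior_var, 1.0 / obs_var
    return (prior_precision * prior_mean + obs_precision * obs_mean) / (prior_precision + obs_precision)

prior_mean, obs_mean = 10.0, 20.0

print(combine(prior_mean, 1.0, obs_mean, 1.0))    # balanced precisions -> 15.0
print(combine(prior_mean, 1.0, obs_mean, 0.1))    # boosted sensory precision -> ~19.1
print(combine(prior_mean, 10.0, obs_mean, 1.0))   # attenuated (flattened) prior -> ~19.1
# Either tweak drags perception towards the raw observation, so the smoothing
# effect of the prior (and with it susceptibility to many illusions) shrinks.
```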
Hold onto that thought.
Of course, long before predictive processing came onto the scene there were other theories of autism. One venerable and important theory is that of weak central coherence, postulated by Frith et al. Basically, this theory tries to explain why autistic people are often better than average at some sensory discrimination tasks while being worse at a large range of other things, especially social cognition. Effectively, the theory states that people with autism have very finely tuned local processing, especially sensory local processing, but that they struggle to integrate all this information together in higher brain regions and use it effectively. This is why, for instance, they often struggle with social interaction, which requires a huge range of different pools of information to be integrated: sensory information about the person’s behaviour, gaze, and speech; language centres handling what they say; theory-of-mind models about them; past behavioural characteristics retrieved from memory; and so on, all of which must be integrated in real time.
Another theory, which is related but sits at a lower level, is that of abnormal connectivity - especially underconnectivity - in autism. This arises from evidence from various fMRI and diffusion tensor imaging studies, as well as neurophysiological experiments, which generally find that functional connectivity between and within regions in autism differs from that of neurotypical controls. Although the details are complex and vary from region to region, the general theme is that autistic people have relatively weak long-range connectivity between regions while possessing large amounts of local connectivity within regions. If this is true, then the weak central coherence theory makes a lot of sense: weak connectivity means the bandwidth for communication between regions is low and insufficient, so integrating information is hard. As for why autistic people’s brains develop in this way, it’s hard to say. One interesting hypothesis is that it is due to overaggressive synaptic pruning in the first few years of life. Infants are born with a very densely connected brain which expands rapidly over the first year; from about age three, however, synaptic pruning starts, dramatically reducing the number of synaptic connections in the brain. This pruning likely has some useful effect, but it’s still unclear exactly how it is beneficial in neural circuits; it could simply be reducing the metabolic cost of maintaining all these synapses, many of which are probably redundant.
Anyhow, in autism this process seems to be more extreme. Autistic children are statistically more likely to be born with bigger heads, and hence bigger brains. The pattern continues: their early neural efflorescence is, on average, greater than that of neurotypical children. The synaptic pruning that follows may also be more extreme, with many more connections pruned than normal. This may be why autistic development plateaus at this stage, or sometimes even goes backwards, as in the pretty awful childhood disintegrative disorder.
Anyway, this pruning seems to disproportionately affect long range connections in the brain. Local connections seem much less affected. This then explains why we see the pattern of connectivity in autism that we do - weak long range, plentiful short range connections. The long range ones are pruned away, but the short range connections much less so, so a shadow of the early efflorescence remains.
Okay, that’s great. But what about predictive processing? It’s time to integrate the two theories.
Consider this: what would be the effect under a predictive processing model of the connectivity pattern seen in autism?
Well, the generative models each comprise a layer in the brain, and poor long-range connectivity would impair the transfer of information between them. This means the priors that the higher levels convey to the lower ones would be sparser, more interrupted, and noisier, so naturally the lower levels would weight them less heavily - providing a simple mechanism for attenuated priors in autism.
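Here’s a toy illustration of that logic (invented numbers, not a model of any real circuit): if the top-down prediction has to travel over a noisy, low-bandwidth connection, the statistically sensible thing for the lower layer to do is to down-weight it.

```python
# If the top-down prediction arrives through a noisy long-range connection,
# its effective uncertainty grows, so the optimal weight the lower layer
# should give it falls. Entirely illustrative numbers.

def prior_weight(prior_var, channel_noise_var, obs_var):
    effective_prior_var = prior_var + channel_noise_var   # noise added in transit
    prior_precision = 1.0 / effective_prior_var
    obs_precision = 1.0 / obs_var
    return prior_precision / (prior_precision + obs_precision)

for noise in [0.0, 1.0, 5.0, 20.0]:
    weight = prior_weight(prior_var=1.0, channel_noise_var=noise, obs_var=1.0)
    print(f"channel noise {noise:5.1f} -> weight given to the prior: {weight:.2f}")
# Intact connectivity gives the prior a 50% say here; a very noisy connection
# drops it towards zero -- i.e. attenuated priors.
```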
At the same time, the sensory information conveyed up to the higher levels will also be of worse quality. This means the generative models up there will probably be worse, so the predictions they propagate down will be worse too. People with autism will therefore experience considerably more prediction error than they otherwise would, because they are dealing with sparser, more underdeveloped generative models. This can explain why they often find complex situations so confusing and uncomfortable - the huge amount of prediction error generated by poorer high-level generative models that cannot cope with the situation - and why they may prefer stereotyped, repetitive movements and routines, which are simple enough for those models to handle, providing some relief from the prediction error.
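As a cartoon of that last point, here’s the same kind of low-capacity toy predictor as earlier, faced with a repetitive, routine-like signal versus a rich, unpredictable one. Purely illustrative, of course.

```python
import numpy as np

# A low-capacity predictor (one linear weight, predicting the next value from
# the current one) faced with a repetitive "routine" versus a rich,
# unpredictable stream. Its prediction error vanishes on the former and stays
# large on the latter.

rng = np.random.default_rng(1)

def mean_prediction_error(signal, lr=0.05, epochs=20):
    w = 0.0
    for _ in range(epochs):
        errs = []
        for cur, nxt in zip(signal[:-1], signal[1:]):
            err = nxt - w * cur
            w += lr * err * cur
            errs.append(err ** 2)
    return float(np.mean(errs))   # squared error over the final pass

routine = np.tile([1.0, -1.0], 500)        # perfectly repetitive "routine"
complex_world = rng.standard_normal(1000)  # rich input this model cannot capture

print("routine:      ", mean_prediction_error(routine))        # ~0
print("complex world:", mean_prediction_error(complex_world))  # ~1, and it stays there
```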
We can also tell an interesting story in terms of overfitting and regularisation. Introducing priors into a system is a means of regularisation. In a very simple case from machine learning, L2 (ridge) regularisation is mathematically equivalent to placing a Gaussian prior on the model’s weights (and L1, or lasso, regularisation to a Laplace prior). If priors are attenuated, then the lower levels will have less regularisation. They also have richer local connectivity. As anybody who has trained a neural net knows, this is a recipe for overfitting. And many symptoms of autism resemble those of overfitting: difficulty generalising to new stimuli; excessive attention and sensitivity to minute differences between known datapoints. Indeed, one of the earliest neural network models of autism (Cohen, 1994) modelled it as overfitting in an associator network. Our account explains why that overfitting occurs.
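For a concrete (and entirely toy) version of the overfitting story: fit a high-degree polynomial to a handful of noisy points with and without the L2 penalty - i.e. with and without the Gaussian prior on the weights - and compare training versus test error.

```python
import numpy as np

# Fit a high-degree polynomial to a handful of noisy points, with and without
# an L2 (ridge) penalty -- the penalty being the "Gaussian prior on the
# weights". A toy demonstration of regularisation versus overfitting.

rng = np.random.default_rng(2)
DEGREE = 9

def design(x):
    return np.vander(x, DEGREE + 1, increasing=True)

def fit(x, y, ridge=0.0):
    X = design(x)
    # closed-form regularised least squares: w = (X'X + ridge*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

def mse(w, x, y):
    return float(np.mean((design(x) @ w - y) ** 2))

x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.25 * rng.standard_normal(x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for ridge in [0.0, 1e-3]:
    w = fit(x_train, y_train, ridge)
    print(f"ridge={ridge:g}  train mse={mse(w, x_train, y_train):.4f}  "
          f"test mse={mse(w, x_test, y_test):.4f}")
# Without the penalty the polynomial typically nails the training points while
# wobbling badly between them (overfitting); with the weight prior it trades a
# little training error for much better generalisation.
```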
Of course, overfitting only occurs in the lower sensory regions with direct access to the data. The higher regions, because of the poor long-range connectivity and hence poor communication, do not receive enough data. They therefore underfit the signal and create impoverished generative models, as described above.
We argue that this theory can explain many of the seemingly unrelated facets of the autistic condition, and that it integrates nicely both with current neurophysiological theories of autism and with Friston et al.’s theory within the predictive processing paradigm, by explaining exactly why attenuated priors/greater sensory precision arise. Our theory thus straddles the algorithmic-implementational boundary in Marr’s levels of description and so serves as a useful complement to higher-level predictive processing theories, which primarily reside in the computational stratum. It also provides new insights into the interactions of the physical and computational systems, and of their abnormalities, that ultimately produce the constellation of related effects we call autism.