As we interact with the world around us, we process loads of sensory information across different modalities. But this isn’t done passively: the overwhelming consensus is that we proactively interact with the environment, using priors or predictions about the world and about sensory consequences to optimise and streamline perception and action. For instance, when something appears in the environment many times, we no longer need to take in every detail of it. It becomes a regularity. We store a prediction of its occurrence, and it effectively requires just a yes/no signal to process, until a significant change occurs and we devote more resources to it. In this Nature Perspective, Christoph Teufel and Paul Fletcher ask some really interesting questions about predictive processing, and put forward a case for the existence of a more embedded prior that is in place before any regular exposure to a stimulus and remains constant. They argue predictive information can be divided into two broad forms, constraints and expectations, aiming to:
“…inspire models that retain the computational benefits of predictive processing but are mechanistically more precise and more powerful in their capacity to elucidate neural mechanisms.”
These are the key take-away messages:
- priors can be separated into two forms, embedded constraints and high level expectations, which are mechanistically distinct
- predictive coding theory, coming from computational neuroscience and using a Bayesian decision framework, only considers top-down predictions and ignores mechanisms, but priors can act in a bottom-up manner too
- borrowing a broader approach from cybernetics, mechanisms underpinning certain visual illusions and somewhat paradoxical disease symptoms can be better explained
And these are my critiques:
- despite good evidence for low level constraints, there is no evidence for actual bottom-up predictive processing
- important questions raised about how low level and high level predictions interact
- only examples of visual perception are considered, which means potentially interesting dynamics such as self-generated vs externally generated stimuli are not considered
This division between two forms of prediction is the heart of the article. The categorisation draws inspiration from cybernetics, a field concerned with an agent and its interaction with the environment. The agent aims to maintain internal stability in the face of environmental changes, which means some knowledge or model of that environment must be retained (W.R. Ashby’s good regulator theorem), and the agent should have a repertoire of states it can adopt (Ashby’s law of requisite variety).
Take the steersman analogy the authors describe. A boat’s material and shape form a constant structure. The boat also has sails, which can be flexibly changed depending on the context (wind, tide, current). Both affect the boat’s interaction with the environment, and both are considered priors, but they work in distinct ways. The authors thus propose two prediction types:
- Constraint – constant, global (spatially and temporally), non-hierarchical and supposedly not top-down
- Expectation – contextual (triggered by environment), local (spatially and temporally), hierarchical and top-down
The Bayesian decision framework is concerned with the overall objective of the prediction rather than how it is implemented: the route rather than the mode of transport, as the authors put it. It helps explain how a prior (the prediction) and the current input (the environmental stimulus) are combined, with their relative weighting based on the precision of each. Computational neuroscience posits an internal model of the world that is used to predict incoming information. Here I will explore three examples the authors chose to pose a challenge to the computational approach to predictive processing, and I’ll finish with a consideration of mechanisms that underpin disease symptoms.
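To make the precision weighting concrete, here is a minimal sketch of how a prior and a current input might be combined. It assumes both are Gaussian, which is my simplification for illustration and not something the article specifies; the function name `combine_gaussian` and the numbers are hypothetical.

```python
def combine_gaussian(prior_mean, prior_var, input_mean, input_var):
    """Precision-weighted combination of a Gaussian prior and a Gaussian input.

    Precision is the inverse of variance. The more precise source
    pulls the combined estimate closer to its own mean.
    (Illustrative assumption: both sources are Gaussian.)
    """
    prior_precision = 1.0 / prior_var
    input_precision = 1.0 / input_var
    combined_precision = prior_precision + input_precision
    combined_mean = (
        prior_precision * prior_mean + input_precision * input_mean
    ) / combined_precision
    return combined_mean, 1.0 / combined_precision


# A precise prior (variance 1) meets a noisy input (variance 4):
# the combined estimate stays closer to the prior than to the input.
mean, var = combine_gaussian(prior_mean=0.0, prior_var=1.0,
                             input_mean=10.0, input_var=4.0)
print(mean, var)
```

Swapping the two variances would pull the estimate toward the input instead, which is the sense in which the framework describes the objective of the computation without saying anything about where or how it is implemented.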
Drosophila
Prepared learning argues that some associations are learned more readily than others based on ancestral experiences. The classic evolutionary example is taste being a more reliable predictor of food quality than for example light, and so food quality and taste associations are made more readily than food quality and light.
One study paired presentation of an aversive chemical with either an odour or a colour using a population of Drosophila flies. Forty generations later, there was a slightly altered probability in the new population to learn either the colour or odour pairing, based on what was learnt in the original population.
This is quite compelling evidence that there are more embedded, more biologically ingrained, internal models or constraints. The Drosophila example demonstrates there are some forms of memory or constraint that need not be learned in the short term in a specific context, but are instead more expansive or constant. It led me to wonder whether the force overcompensation demonstrated in the force-matching task (see Shergill et al., 2003) used to examine sensory attenuation would be present from birth in humans. It is no surprise to me though that there are either low level constraints or generalisable (cross-context) predictions. The force-matching task is likely a novel task for participants, but for the force overcompensation to occur in the self-generated condition (sensory attenuation), it is surely due to the presence of a more general (and also low level) internal model of sensory consequences of movement and accurate body state estimation.
Cornsweet illusion
When viewing the image above on the left, we tend to perceive the bottom grey of the central object as lighter than the top grey. This misjudgment supposedly occurs because the visual information has been informed by a prior. We have a certain model of how reflectiveness and shading work with 3D objects and a light source, and so the white luminance of the top edge of the bottom section leads us to misjudge the grey colour, until it’s removed as in the right image.
The authors then argue that there have been theoretical links between the retinal ganglion cells and the priors involved in this illusion, and that there is evidence for the involvement of subcortical structures such as the lateral geniculate nucleus, with a mention of lateral inhibition, which is not top-down. Ultimately the point is that priors need not be high level, conscious predictions, but can sit low down in the CNS or even in its periphery in some form, if the retinal ganglion cells have any sort of predictive power. It’s important to understand that this doesn’t mean cells in the retina are able to somehow guess upcoming sensory information, only that they have something embedded in their structure that helps them engage with the environment actively rather than just passively.
Light from above
Another visual phenomenon. In the absence of explicit information about the direction of a light source, humans assume the light is coming from above and judge object shape accordingly. This prediction can be modified through experience, but the modification is exclusive to the lab context and doesn’t generalise to other environments. This suggests it’s a constraint that is unchanged by short-term experience, and that a new high level prediction may have been formed rather than an existing constraint modified.
Chickens from birth, apparently, have a constraint predicting light from above and are unable to acquire any new prediction about this, even just for a specific context. This suggests predictions are not just based on an individual’s world experience. This is fairly compelling, but I still see no evidence of pure bottom-up processing. Pure bottom-up processing involves going from the input in the environment, travelling up the hierarchy of the CNS. There is no evidence that this prediction travels upstream – it still acts on incoming sensory information.
They also argue here that a new high level prior cannot overwrite an embedded constraint, whereas traditional predictive coding theory implies that it can. Evidence from this light-from-above example does support their argument, and breaking away from the idea that a high level prediction aligns all priors hierarchically beneath it has interesting implications for the proposed mechanisms of disease symptoms, as we’ll see now.
Disease
Predictive coding theory has been used in an attempt to explain many disease symptoms. The authors argue that recognition of only one form of prediction has limited its ability to account for a diversity of symptoms. A mechanistic understanding is also required by clinicians to make medical decisions.
There are different types of psychosis: antibody-mediated and others. Some people with psychosis have also been shown to be resistant to visual illusions (which are typically a result of predictive processing). Clinically, and even as part of a psychedelic drug experience, psychosis is believed to be caused by an under-reliance on high level predictions. However, hallucinations are hypothesised to happen because of an over-reliance on high level predictions. The two can co-occur, which suggests there is greater intricacy to predictive coding.
When different forms of prediction are recognised it is possible to conceive how both can co-exist. It’s proposed that there is a weakening of embedded constraints, leading to psychotic experiences (and supposedly a resistance to visual illusions!?), and then high level predictions have increased influence to compensate which then increases proneness to hallucinations.
This section was confusing – resistance to illusions was thrown in at one point, whilst also citing a reference which finds there is no change in illusion experience in schizophrenia (see Sterzer et al. for a more cohesive explanation of the challenges posed to traditional predictive coding theory). The process of thinking is interesting though, and recognising different forms of prediction appears to go a long way.
The authors still insist on arguing that a low vs high level distinction doesn’t properly account for the inconsistency, and in two consecutive paragraphs basically repeat the same explanation but substitute “top-down low level” for “embedded constraint” and “top-down high level” for “top-down influences”. I realise high level and low level do not specify where exactly in the nervous system the prior exists, but neither do the authors’ terms offer a cut-off point for where an embedded constraint becomes an expectation: even subcortical structures are talked about as being involved in providing constraints rather than expectations. Where’s the cut-off? And how does the new terminology aid with identifying underlying structures? I do like the terminology, but low vs high is still the fundamental distinction, even if it’s a spectral rather than categorical divide.
Concluding remarks
You may have noticed that I have used a high vs low level distinction throughout, which is what I think is the accurate way of describing the so-called top-down vs bottom-up differentiation the authors have made. A bottom-up prediction defies the very definition of a prior within an agent affecting its interaction with the outside world. Whilst lateral inhibition is mentioned, and there are surely two-way interactions within the brain of cortical and subcortical regions involved in predictive control, there is no evidence that a prior works in the opposite direction to top-down predictions. A prediction is used to engage with environmental stimuli that travel from the peripheral nervous system toward the brain, so even if that prior is implemented really low down in the nervous system, even if it’s implemented peripherally, it is still acting on incoming information. The evidence discussed in the article does not pose a challenge to the canonical predictive coding account in terms of the directionality of predictive processing, only in the intricacy it misses and the potentially interdependent relationship of multiple priors influencing behaviour, rather than a single consciously-determined internal model enforcing authoritative rule on all structures hierarchically beneath it.
High level and low level, I believe, are better descriptors of the distinction they have made; very low level priors don’t fit especially well into the traditional computational framework in terms of the updating of internal models, but they do still fit. Arguably it might still be possible to update low level priors in the same way internal models are updated, just not on the same time scale. What would happen, for example, if you took those flies and attempted to reverse the effect of the pairing exercise across another 40 generations? The fact the changes occurred and held for 40 generations in the first place is reason enough to believe there is some plasticity in low level priors. They are just more constant and more global (so much so, they transcend generations!), but they are still able to be updated.
Where I think this article does excel is in showing the potential for distinct priors at different levels of the nervous system to act independently. And when they do, the change in their relative influence can be used to explain a multitude of seemingly paradoxical symptoms. This increases the power of a predictive framework and, while the separation of parts of the internal model isn’t new, a strong case is made for constraints vs expectations to be the key division. Furthermore, some really interesting questions are raised on how constraints and expectations interact. It is suggested they could compensate for each other, which is a neat idea that offers a new intricacy in its problem-solving ability with only slightly increased complexity in its framework. Whilst not much consideration is given to learning processes, nor to other sensory domains, there is the interesting concept connected to their discussion that constraints and expectations are updated at different rates.