Do LLMs have agency? What is agency, and where does it come from? When we get down to it, most questions of AI risk and benefit seem to be questions of agency. What can an AI actually do in the world? This is the bit that matters.
In Pond brains and GPT-4, we explored whether LLMs are intelligent, using a pragmatic definition of intelligence offered to us by cybernetics:
To some, the critical test of whether a machine is or is not a ‘brain’ would be whether it can or cannot ‘think.’ But to the biologist the brain is not a thinking machine, it is an acting machine; it gets information and then it does something about it.
(W. Ross Ashby)
Ashby’s delightful definition grounds intelligence in actual behavior, because—from an evolutionary perspective—it is the actual behavior that matters. When we look through this lens, we see intelligence can emerge in many systems, even systems that are not conscious, like anthills, or ponds, or thermostats.
What about agency? Here too, cybernetics offers a useful understanding of agency grounded in actual behavior.
So what is agency? Cybernetics gives us an unexpected answer: agency is a loop. The bar for agency is not human-level intelligence, or desire, or will, or consciousness, or a soul. Feedback is all you need.
Feedback means that each step becomes the integration of all previous steps. A memory, or model, of the system’s interaction with its environment accumulates, allowing future actions to be influenced by past experiences. What emerges from this model is goal-seeking behavior.
We can see this more clearly when we look at a simple feedback system, like a thermostat. The thermostat senses the ambient temperature. Too low? Turn on the heater. Too high? Turn it off. Check again. Too low? On. Too high? Off. Repeat. The thermostat dynamically adapts its behavior at each step, generating an oscillating wave that seeks toward the desired temperature, even as the environment around it changes.
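Here is a minimal sketch of that loop in code. The drift numbers and the 20-degree setpoint are made up for illustration; the point is only the shape of the loop: sense, compare to the goal, act, repeat.

```python
import random

def thermostat_step(temperature, setpoint=20.0):
    """One pass of the loop: sense the temperature, compare it to the goal, act."""
    return temperature < setpoint  # True means "heater on"

def simulate(steps=50, setpoint=20.0, temperature=15.0):
    for _ in range(steps):
        heater_on = thermostat_step(temperature, setpoint)
        # the environment responds to the action...
        temperature += 0.4 if heater_on else -0.3
        # ...plus a little noise, so the loop keeps correcting
        temperature += random.uniform(-0.1, 0.1)
    return temperature

print(f"temperature after 50 steps: {simulate():.1f}")
```

Run it and the temperature climbs toward the setpoint, then oscillates around it.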
If it’s on, it’s off; if it’s off, it’s on. If yes, then no; if no, then yes. Yes-no-yes-no-yes-no. You see, the cybernetic equivalent of logic is oscillation.
(Gregory Bateson, in Uncommon Wisdom, 1989)
This is all rather surprising. It seems that goal-seeking does not require a conscious goal-seeker! Even so, a thermostat has very limited agency. It seeks toward a goal, but it can’t change its goal. It needs us to do that.
Well, but one way we could see this is that the thermostat is part of a larger loop, a loop which includes us. This thermostat-system has agency through us. This reframing might seem like a cheap trick, but it’s the same cheap trick viruses use to survive.
Getting outside systems to do your bidding this way is a well-established phenomenon in biology. One way it is often formulated is by extending the concept of a biological “self” from just your physical body to all those aspects of the surrounding environment that you can influence or control. This perspective on what the “self” is in biology, tracing back to Richard Dawkins, is known as the extended phenotype…
If you happen to have a biological system around, it’s often far easier to get it to perform computation and expend energy on your behalf rather than do those yourself…
In particular, a virus can be viewed this way… Any single such package of molecules hijacks human cells to make more such packages of molecules. By doing that, the virus gets the human cell to expend a huge amount of energy (involved in duplicating the virus many times over) and to perform an elaborate computation (namely, the biochemical computation involved in that manifold duplication). Both of those expenditures are far beyond what the virus could do itself.
(Wolpert 2020, “The concept of the extended phenotype provides a way to circumvent Landauer’s bound”)
See, there are no hard boundaries between systems in real life. If you can get something into your loop, it may as well be a part of you.
So but that’s not all. Goals can also emerge from the bottom up, without any conscious goal-setters in the loop. Check out this neutrophil chasing down bacteria:
Agency! But where is it coming from? Nobody told the neutrophil to do this. Nobody set its thermostat. It is not conscious. It has no brain or nervous system! Yet it chases prey, navigates obstacles, adapts, changes strategy. It does all of this without conscious will, desire, or intention. How?
After seeing the thermostat, we can make a good guess. Some feedback loops dynamically adapt toward a goal, while others dynamically adapt the goals themselves. Both the goal-seeking and the goal-setting emerge from the bottom up, through a complex network of feedback loops that are ultimately surfing a chemical gradient. Feedback-on-feedback.
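To make that concrete, here is a toy sketch of gradient-surfing. The concentration field and the sampling rule are invented for illustration, and real chemotaxis is vastly richer, but the shape is the same: sense locally, move toward the stronger signal, repeat.

```python
import random

def concentration(x, y):
    """Toy chemical field: strongest at the origin, where the 'bacterium' sits."""
    return -(x ** 2 + y ** 2)

def chase(x, y, steps=200, step_size=0.1):
    """Surf the gradient: sample a few nearby points, move toward the strongest signal."""
    for _ in range(steps):
        candidates = [(x + random.uniform(-step_size, step_size),
                       y + random.uniform(-step_size, step_size))
                      for _ in range(5)]
        # feedback: the last position shapes where the next samples are taken
        x, y = max(candidates, key=lambda p: concentration(*p))
    return x, y

print(chase(3.0, -4.0))  # wanders in toward (0, 0), the 'prey'
```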
A thermostat has just one loop to work with, and limited agency. A neutrophil has more. The rich feedback networks of a forest embody millions of years of experience within a changing environment. The richer your network of feedback loops, the wider your range of agency. When feedback loops entwine, you get complex systems which can adapt, change strategy, evolve.
So, I have agency, you have agency, forests and ponds have agency. Even very simple systems can have agency. A slime mold has agency, a thermostat has agency.
Feedback is all you need for agency to emerge. Cybernetics discovered this in the 1940s. Then cybernetics invented AI, with McCulloch and Pitts laying the mathematical groundwork, and Rosenblatt making it real with the first artificial neural net, The Perceptron.
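As an aside, the training rule Rosenblatt used is itself a tiny feedback loop: guess, compare the guess to the target, fold the error back into the weights. A minimal sketch of that rule, on a toy AND problem of my own choosing rather than Rosenblatt’s original setup:

```python
import random

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # learn logical AND
w1, w2, bias, lr = random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1), 0.1

for _ in range(100):                               # each epoch is another turn of the loop
    mistakes = 0
    for (x1, x2), target in data:
        output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
        error = target - output                    # feedback: compare behavior to the goal
        w1 += lr * error * x1                      # fold the error back into the model
        w2 += lr * error * x2
        bias += lr * error
        mistakes += error != 0
    if mistakes == 0:                              # behavior matches the goal; the loop settles
        break

print(w1, w2, bias)
```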
Now here we are, with GPT-4.
As an AI language model, I do not have personal motivations or desires, and I cannot predict the future with certainty. However, I can provide you with information and insights based on the data and patterns available to me at this time.
(ChatGPT, Mar 23 Version)
Do LLMs have agency? Well, agency is feedback, so where do we see feedback?
I’m not an expert in LLMs or transformer architecture, but let’s give this a shot. After going through Wolfram’s walkthrough, speaking with a few domain experts, and reading some papers, here’s the high-level picture I’ve come away with…
An LLM is trained on a large amount of text, producing an algorithm that models statistical regularities in that text.
Usually this base model is further tuned through Reinforcement Learning from Human Feedback (RLHF) to make it “nice”.
When we use ChatGPT, we’re using a snapshot of this LLM+RLHF model to guess the next token in a string. Each guess executes a fixed number of computational steps.
Guess-next-token is recursively repeated until the model generates a completed response or reaches a predefined maximum length.
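To make the shape of that loop concrete, here’s a rough sketch using Hugging Face’s transformers library, with a small GPT-2 standing in for a much larger model. The greedy argmax decoding is a simplification (production systems sample from the distribution), but the loop is the point: each guessed token is fed back in as input, while the weights never change.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # small stand-in for a large LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()                                            # a frozen snapshot: nothing below updates it

ids = tokenizer("Do language models have agency?", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):                                 # predefined maximum length
        logits = model(ids).logits                      # a fixed number of computational steps
        next_id = logits[0, -1].argmax().view(1, 1)     # greedy guess at the next token
        ids = torch.cat([ids, next_id], dim=1)          # feed the guess back in as input
        if next_id.item() == tokenizer.eos_token_id:    # or stop when the response completes
            break

print(tokenizer.decode(ids[0]))
```

The only memory inside this loop is the growing string of tokens; the weights are read-only.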
If the above picture is right, the objective functions are all defined at training time. There is plenty of feedback during that training process. At runtime, though, we’re interacting with a snapshot, and the weights never change. No feedback. Guess-next-token is a recursive loop, but it halts once the model emits a completed response or hits its predefined maximum length.
So, do LLMs have agency? My best guess is no. It seems that LLMs are intelligent, but do not have agency. Or, at least, like a thermostat’s, an LLM’s agency is attenuated by the humans in the loop who drive retraining events. Humans are a rate limiter on the speed of LLM evolution.
Will LLMs gain agency? Yes, in about 5 minutes. All you need to do is hook them up to a feedback loop.
What will this look like?
An LLM could get hooked into a continual retraining loop. This might look like continually retraining on user behavior, or putting an LLM into a self-improvement loop, or both.
An LLM could use the internet as a scratchpad. LLMs are trained on internet data, and more and more internet data will be generated by LLMs—the vast majority, probably. This will allow loops to emerge across training runs, giving LLMs an evolving memory. Even if the feedback cycle time is very slow, the potential bandwidth for memory is huge (no upper limit).
An LLM could get more humans into its loop. We're part of the feedback loop producing GPT-N+1, so we might say it has agency through us. We're part of its extended phenotype. Getting more humans into that loop creates richer networks of feedback. Picture what happens if/when Twitter or Facebook hook LLMs into social algorithms. This could be direct (integrating an LLM into the algorithm), or indirect (LLM-generated content outperforms poasters in driving viral engagement). You might say “I would simply not become part of the loop”. That’s what I tried to tell COVID too.
An LLM could get other systems into its loop. Whoops, we already did this with the OpenAI Plugin Store. ChatGPT’s Zapier plugin lets an LLM plug into just about any other loop, allowing an open-ended number of feedback loops to form.
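In code, the simplest version of getting other systems into the loop is only a few lines. A minimal sketch, where call_llm and run_tool are hypothetical placeholders rather than any particular vendor’s plugin API: the model’s output causes something to happen in the world, and the result flows back into its next input.

```python
def agent_loop(call_llm, run_tool, goal, max_steps=10):
    """call_llm and run_tool are placeholder callables, not a real API."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # the model sees everything that has happened so far...
        action = call_llm("\n".join(history))
        if action.startswith("DONE"):
            return history
        # ...its output makes something happen in the world...
        observation = run_tool(action)
        # ...and the result of that action flows back into its next prompt
        history.append(f"Action: {action}")
        history.append(f"Observation: {observation}")
    return history
```

The model inside this loop is still a frozen snapshot, but the loop as a whole now senses, acts, and adapts.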
Ready or not, we are sharing our world with LLM agents.