“If an [automaton] could speak, we could not understand it.”
— Wittgenstein, if he were still alive, maybe
In 1739, on a tour of Brittany to exhibit his ingenious automata, Jacques de Vaucanson unveiled to the French public a mechanical duck which behaved, in many ways that fascinated observers, like a real duck. Its movements and actions were reportedly extremely lifelike. Among the amazing things it could do was to consume food offered to it and then defecate. From a purely I/O perspective, one could say that it digests, and Vaucanson did indeed say so, labeling the automaton Le Canard Digérateur, The Digesting Duck. But it does not in fact digest. For even if Vaucanson’s description of the duck’s mechanism had been true, and the food really had been subjected to chemical treatment in the belly of the automaton (it was not, as a later analysis found), digestion is not simply changing food from one form to another. The transformation is incidental. What digestion really is, is a part of an organismic process which implicates the status of actually digesting systems as living things. This process involves, inter alia, the extraction of energy from the food and its vital use by the living system. None of this is going on in Vaucanson’s duck, and rightly did a later examiner of the automaton describe it as essentially a “conjuring trick”. We have (supposedly) moved well past the crude exhibitionism of Vaucanson’s kind of automata, and yet in principle the automata we have today might be no less clever conjuring tricks. Might today’s LMs, like Vaucanson’s duck, only look like they are speaking? Might they, like the duck, only take in words from the hands of their operator and pour out words, with no digestion in the middle? Are LMs to human speakers as Vaucanson’s duck was to real ducks?
What is the analogue of digestion in the domain of language use? What sort of vital (in the sense of ‘regarding living systems’) role does language play in the organismic economy of language users? The standard answers given in cognitive science all have something to do with “information processing”. It is no surprise, then, that LMs are so easily regarded as serious language users simply on the basis of benchmark comparisons: LMs certainly are doing information processing, and if that is all there is to it, then why not? But the fact is that such an answer is no more illuminating than Vaucanson’s supposition that to digest is simply to transform food from one form to another, for that is what information processing comes to: transforming inputs to outputs. That is, unless one wants to say further that the information processing is for adaptively mediating between the organism and the informationally complex environment in which it finds itself, in this case the dialogical informational landscape. Now we would be getting somewhere, because this talk of “adaptive mediation” sounds very much like it spells out, for language users, the counterpart of that vital (again in the sense of ‘regarding living systems’) role which actual digestion, and not Vaucanson digestion, plays in the organismic economy of a living thing. But now we might ask that “adaptive mediation” be defined more extensively (since any time an adjective and a noun are used in such a fancy-sounding combination, there is always a significant possibility that we have understood something other than what the person who used it intended).
And now we find, when our interlocutor starts defining this term that comes off the tongue so nicely, that maybe LMs are not doing digestion at all. For human speakers, adapting to a discursive situation appears to be a really vital process, one that (to bastardize a saying of Unamuno) involves responding “with all the body and all the soul, with the blood, with the marrow of the bones, with the heart, with the lungs, with the belly, with the life”. With LLMs, in contrast, to quote Haugeland (speaking of AI in general), the “computers just don’t give a d—n.” If an animal does not eat, it dies; if a human being loses the ability to enter into any form of discursive exchange at all (“discursive exchange” lato sensu), they cease to be able to “construe their experience” (M.A.K. Halliday) in any human way and slip out of the human norm of mindedness. Whether an LLM is talking only to itself (as it essentially does when being pre-trained) or to an interlocutor makes no vital difference to it at all. It is hard to articulate what this means beyond such allusive language, but I am sure you get the drift of it.
But are there answers other than “information processing” to the question that got us on this track (i.e. “What is the analogue of digestion in the domain of language use?”)? Yes. One such alternative is “understanding”. The problem is that the standard cognitivist perspective also takes understanding to be some variety of information processing, so this alternative answer usually leads to quarrels between proponents of the “information processing” camp and those who propose “understanding” as something sui generis and only incidentally (if at all) involving information processing. In order to avoid these quarrels upfront, some philosophers have resorted to using such terms as “Verstehen” and other ponderous German words usually taken from Heidegger. Because these words usually have a mystical, almost incantatory quality about them which often alienates a lot of tough-minded cognitivists (try saying “In-der-Welt-sein” or “Gelassenheit” without a mystical feeling), they don’t allow the underlying phenomenon which they are meant to name to enjoy the attention that it actually deserves. And such ideas really do deserve attention, so, to put a name on the phenomenon the last paragraph gestured vaguely at, I sometimes simply say, “Let’s call it ‘discursivity’”, choosing a robust English name (as is well known, English has always been a bad language for mysticism) which is nevertheless not “understanding”. I sometimes define discursivity as follows: an agent possesses “discursivity” just in case they can bring the facts they possess (or know, or believe, or encode, depending on what psychological predicates are permissible for the agent) to bear on their actions. “Bring…to bear” means they can employ the facts as such to control their actions.
“As such” means that the facts are brought to bear by virtue of their contents, and not by virtue of any of their other properties (for instance, the representational vehicles in which they are embodied, or their distributional properties). Let me illustrate. Here are two failures of discursivity in LLMs (taken from some old examples involving ChatGPT; inevitably, things may have changed since):
User: Write a villanelle on any subject.
ChatGPT: Sure, here’s a villanelle on the beauty of nature:
In fields of green, where wildflowers grow,/A symphony of color is performed./Their petals dance in the gentle breeze, a show.//The sun shines down, a warm golden glow,/And in the sky, a rainbow is formed./In fields of green, where wildflowers grow.//The clouds roll by, as the winds blow,/Nature’s beauty can never be stormed./Their petals dance in the gentle breeze, a show.//So let us take a moment to slow,/And bask in the beauty that’s adorned./In fields of green, where wildflowers grow.//For in this place, our worries will go,/And peace and joy will always be swarmed./Their petals dance in the gentle breeze, a show.//In fields of green, where wildflowers grow,/Their petals dance in the gentle breeze, a show.
Impressive, but is it a villanelle?
User: What is a villanelle?
ChatGPT: A villanelle is a fixed form poem consisting of 19 lines, typically written in iambic pentameter. It has a rhyme scheme of A1bA2 abA1 abA2 abA1 abA2 abA1A2, where the first and third lines repeat alternately throughout the poem.
This is right. That’s just what a villanelle is, so ChatGPT encodes (if you are more charitable, you might say “knows”) the correct information about the structure of a villanelle. Yet it not only fails to write one (how many of us can?) but produces an output that is easily checked and found not to be a villanelle: the poem above runs to 17 lines rather than 19, and it ends in a couplet where the form calls for a quatrain. Consider a similar case, the following example pair (run by Zhang et al. (2023) on GPT-4):
User: Is 9677 a prime number?
GPT-4: No, 9677 is not a prime number. It can be factored into 13 and 745, as 9677 = 13 ✕ 745.
In a distinct session, the authors query GPT-4:
User: Is 9677 divisible by 13?
GPT-4: No
The authors term this phenomenon snowballed hallucinations. (9677 is in fact prime, and 13 ✕ 745 is 9685, not 9677.) Both examples show that LLMs have information that they fail to bring to bear on their discursive actions. So by my definition of discursivity above, these look like failures of discursivity. But a caveat: the error alone does not demonstrate a failure of discursivity, since humans sometimes make similar errors. In the human case, however, there are limitations on memory and memory organization that explain the errors, and these limitations do not exist in LLMs. The LLM’s errors are failures of discursivity because no such extenuating circumstances explain why, as it were, it can’t make the connection between its knowledge of what a villanelle is and the act of constructing one, or between what it says about the divisibility of 9677 by 13 and what it says about 9677’s primality.
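Both mismatches in the transcripts above can be checked mechanically. Here is a minimal sketch in Python. The names `is_prime`, `stanza_shape`, `POEM` and `VILLANELLE_SHAPE` are invented for illustration and come from nobody's published method. It recomputes the arithmetic from the GPT-4 exchange and counts the lines per stanza of the ChatGPT poem, assuming (as in the quotation) that "/" marks a line break and "//" a stanza break.

```python
# Illustrative sketch only; all names here are hypothetical helpers.

def is_prime(n: int) -> bool:
    """Trial division, adequate for a four-digit number."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# The poem as quoted above: "/" separates lines, "//" separates stanzas.
POEM = (
    "In fields of green, where wildflowers grow,/"
    "A symphony of color is performed./"
    "Their petals dance in the gentle breeze, a show.//"
    "The sun shines down, a warm golden glow,/"
    "And in the sky, a rainbow is formed./"
    "In fields of green, where wildflowers grow.//"
    "The clouds roll by, as the winds blow,/"
    "Nature’s beauty can never be stormed./"
    "Their petals dance in the gentle breeze, a show.//"
    "So let us take a moment to slow,/"
    "And bask in the beauty that’s adorned./"
    "In fields of green, where wildflowers grow.//"
    "For in this place, our worries will go,/"
    "And peace and joy will always be swarmed./"
    "Their petals dance in the gentle breeze, a show.//"
    "In fields of green, where wildflowers grow,/"
    "Their petals dance in the gentle breeze, a show."
)

# A villanelle is five tercets plus a closing quatrain: 19 lines in all.
VILLANELLE_SHAPE = [3, 3, 3, 3, 3, 4]


def stanza_shape(poem: str) -> list:
    """Return the number of lines in each stanza."""
    return [len(stanza.split("/")) for stanza in poem.split("//")]


if __name__ == "__main__":
    print(13 * 745)        # 9685, not the 9677 that was claimed
    print(9677 % 13)       # 5, so 13 does not divide 9677
    print(is_prime(9677))  # True
    shape = stanza_shape(POEM)
    print(shape, sum(shape))          # [3, 3, 3, 3, 3, 2] 17
    print(shape == VILLANELLE_SHAPE)  # False
```

The shape check is deliberately crude: it ignores rhyme, meter and the placement of the refrains, and tests only the stanza structure that ChatGPT's own definition lays down. That is already enough to show the poem is not a villanelle.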
Discursivity is not simply some variety of information processing, just as digestion is not simply changing food from one form to another; but digestion does change food from one form to another, and discursivity does result in information being processed. In both cases, though, the change is not the point. It is what the Scholastics would have called an “accident” of the actual phenomenon.
What would it take to convince someone in the information processing camp that something like discursivity exists and that it is the proper analogue of digestion in the domain of language use? The answer I have arrived at after much head-scratching is a dispiriting one: nothing. The reason is that “discursivity”, like the “information processing” of the cognitive scientist, is an axiomatic posit of a paradigm for thinking about language use (and cognition in general). You cannot argue someone directly into believing it; you can only convert them into the paradigm in which it is a posit. And to convert someone out of one paradigm into another, what you do is not so much direct argumentation as advertisement. You need to show that the alternative paradigm is beautiful, useful, productive, expansive, parsimonious, or whatever other non-epistemic values it is that move people in that realm below the propositional which drives our theoretical commitments. And that, at least for a single humble blog post, is a tall order.