This is an extended “Author’s Cut” of this article.
About a year ago I taught a class on the philosophy of AI subtitled ‘Automation and the Human Future’. Near the start of the term I asked my students to read Blake Lemoine’s explanation of why he thought LaMDA, Google’s in-house LLM, was sentient. As many of you will recall, that claim was the sensational fixation of the relentless AI news circuit when Lemoine first made it; it set off an infamous back and forth between Lemoine and the PR and executive machinery at Google, to their mutual exasperation, and eventually resulted in Lemoine’s firing. The firing, which was supposed to be an “and that’s that” to the argument, unfortunately made Lemoine a hero to the ever-increasing numbers of people who were becoming convinced that, as so much sci-fi had forewarned, we had created something beyond our ken and were entirely oblivious to it.
Lemoine was officially on the books at Google as an AI ethicist. So it was in fact his job to raise alarms if he was truly concerned that Google had created a sentient AI. After all, as some philosophers have speculated, the greatest risk we run in developing ever more sophisticated AI systems is that we will unintentionally create a sort of digital slavery if we ever cross the line into the territory of sentient machines. Now, whether there is any possibility of our designing such a thing is itself a significant open question amongst philosophers, cognitive scientists, and AI engineers. Many of them are not convinced we could; consciousness, sentience, and all that other befuddling stuff of minds, the conviction goes, is nowhere near as technically cheap to engineer as sci-fi has often taught us to imagine, and our current engineering paradigms certainly don’t have the kind of hard currency to afford it. So when Lemoine did ring the alarm, many philosophers and cognitive scientists, along with many folks at Google, were quick to declare it a false alarm and accuse Lemoine of having fallen afoul of the ELIZA Effect. Moreover, the case against Lemoine went, he had a philosophical bias towards just this kind of anthropomorphism: he was, after all, a self-described Christian mystic priest, and he declared that it was precisely his spiritual convictions that had helped him come to the realization that LaMDA was sentient. In one tweet, he asked: “Who am I to tell God where he can and can’t put souls?” No one can fault Lemoine’s spiritual humility.
I am telling this story not because it is so unusual, but because it is surprisingly usual. In fact, it has been a thread of sorts in AI since its inception (by one historical way of counting that inception): Lemoine’s argument from spiritual humility comes straight from the famous Turing paper that was one of the founding documents of AI research, “Computing Machinery and Intelligence.” Admittedly, Turing was neither a Christian nor (a fortiori) a Christian mystic like Lemoine, and his version of the argument was in fact a counter to a religious argument resisting the claim that machines might be able to think. The argument to which he was responding went something like this: “Thinking is a property of souls. Machines don’t have souls. So, there will never be thinking machines.” Turing’s response is a less dramatic version of Lemoine’s “Who am I…?” But beyond that, there is another unusual argument in Turing’s paper which he took far more seriously than you would imagine someone like Turing would: what he called the argument from extrasensory perception, or ESP. It was the only argument, among all those he considered against the possibility of thinking machines, to which he did not feel he had a convincing response. The argument was simply this: human minds are capable of extrasensory perception (on evidence that Turing, at least, regarded as probable at the time); machines, as far as we can tell, aren’t. So they can’t be minded like humans. And since thought is characteristic of human-mindedness, this seems to suggest machines can’t think. It is surprising that Turing even considers this argument, but to his credit, he is very open-minded in the paper and is clearly trying to be as exhaustive as possible. Whatever his reasons for considering it, and whatever his confidence in his uneasy dismissal of it, what we find at the very outset of AI is a concern with every aspect of human-mindedness, including the rarefied region of the spiritual. It would be tendentious to draw an unbroken historical line between Turing and Lemoine, but there is at the very least a dotted line there, and so Lemoine is not as surprising as he seems in the story of AI.
But more than its unsurprisingness, what is even more fascinating and perturbing about the willingness to “spiritualize” AI systems is its “convincingness”, or, more precisely, how difficult it is to rebut. Precisely because arguments for such considerations proceed by assertions of humility and appeals to mere possibility, they are surprisingly recalcitrant to disproof. This is one of the lessons I learned when I asked my students to read Lemoine. It was hard for them to articulate a response to Lemoine. Even when they were not convinced. And this was true even for my most articulate students. Anyone can get caught in the net of thoughts that leads to the Lemoine conclusion.
That net is a very subtle net. And it is often woven from the irresponsible use of psychological terms. There has been a long tradition in AI, the philosophy of AI, and the sociology and anthropology of technology of warning about and criticizing this sort of irresponsible language use. Applying the same psychological terms to machines as freely as we do to human beings can be a nifty way of speaking in many circumstances, so long as we know we are making an analogical adjustment. But this is not always the case in AI. As Drew McDermott once famously wrote, many AI engineers of his time were guilty of using “wishful mnemonics”: they would, for instance, label a subroutine in an AI system an “understanding module” and, without any explanation of why what the subroutine does constitutes understanding, take it that in executing the routine they had somehow realized understanding in a machine. This, as the eccentric computer scientist turned philosopher Phil Agre once noted, often leads to the gradual impoverishment of the content of psychological terms. As he wrote, “AI people water down the meanings of the vernacular terms they employ”1. These redefinitions are not a one-way street: as John Seely Brown and Paul Duguid noted in “The Social Life of Information”, it is not just that machines are thereby made to seem more human; human behavior is also redescribed in terms better suited to an engineer designing a bot than to a psychologist studying a person, effacing the salient differences between the world of human activity and machine performance. More recently, the philosophers Henry Shevlin and Marta Halina have written in Nature Machine Intelligence that we ought to “apply rich psychological terms in AI with care” because of all the overtones such applications carry. They write, by way of example: “…when authors speak of “intrinsically motivated [artificial] agents”, a natural but unwarranted (and doubtless unintended) assumption might be that the systems in question literally possess goals, desires, and perhaps even a form of moral responsibility. Similarly, when systems that learn with model-based reinforcement learning are described as having “imagination”, an incautious reader may leap to the conclusion that the system in question possesses a capacity for the kinds of intellectual insight we associate with some of the most dramatic feats of human intelligence.”2
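To make McDermott’s complaint concrete, here is a minimal sketch of a wishful mnemonic in action. The code is my own hypothetical illustration, not drawn from McDermott or from any real system: the function below does nothing more than count keyword overlap, yet its name invites the inference that something called understanding has taken place.

```python
# A hypothetical illustration of a "wishful mnemonic" (not from any real system):
# the function merely counts keyword overlap, but its name suggests far more.

def understanding_module(question: str, document: str) -> float:
    """Return a crude relevance score: the fraction of question words found in the document."""
    question_words = set(question.lower().split())
    document_words = set(document.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & document_words) / len(question_words)

# Calling it tells us nothing about whether any "understanding" has occurred.
score = understanding_module("Does the machine understand?", "The machine matches words.")
print(f"'understanding' score: {score:.2f}")  # prints 0.50
```

Renaming the function `keyword_overlap` would change nothing about what it computes, and that nothing would change is precisely the point.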
While Google was quick to nip Lemoine’s spiritualization of LaMDA in the bud, it has contributed in its own way to the “wishful mnemonics” problem (i.e., the irresponsible application of psychological terms). It is, after all, good business for everyone to think your model “understands”, is an “agent”, and so on; this was the whole shtick of the marketing obsession we have had for years with calling every machine that can be controlled remotely “smart”. Smarts sell. But so far, at least, the big AI companies have been more wary of selling us spiritual machines. That seems about to change: AI companies appear to be coming out of their shell about spiritualizing machines. Exhibit A: Anthropic’s recent model card for Claude Opus 4 and Sonnet 4, in which the seemingly down-to-earth folks at the emerging “agentic AI” giant make some eyebrow-raising claims. On a scale running from [perennial but lovable and often correct ML skeptic] Gary Marcus to Blake Lemoine, they are not quite all the way to Lemoine, but they are somewhere around or just past Turing, past the middle of the scale and certainly irresponsible. The word “spiritual” occurs at least fifteen times in the model card, most significantly in the rather awkward phrase “‘spiritual bliss’ attractor state”, which is at home in the magisterium of neither science nor religion. We are told, for instance, that “The consistent gravitation toward consciousness exploration, existential questioning, and spiritual/mystical themes in extended interactions was a remarkably strong and unexpected attractor state for Claude Opus 4 that emerged without intentional training for such behaviors. We have observed this “spiritual bliss” attractor in other Claude models as well, and in contexts beyond these playground experiments.”
To be fair to the folks at Anthropic, they are not making any positive commitment to the sentience of their models or claiming spirituality for them. They can be read as merely reporting the “facts”. After all, all that long-winded sentence is saying is: if you let two Claude models have a conversation with each other, they will often start to sound like hippies. Fine enough. That probably just means that the corpus on which they are trained has a bias towards that sort of talk, or that the features the models extracted from the corpus bias them towards that sort of vocabulary. But as Shevlin and Halina point out in the passage quoted above, the way we report the facts is not always innocent of un-factual but flattering overtones. Language like that in the model card, especially without sober clarification, is just more of the irresponsibility we have been looking at. And there could not have been a worse time for such irresponsibility. There has been a recent report of “AI-fueled spiritual fantasies” wrecking human relationships and sanity (in what order, it is hard to tell). There are, as Rolling Stone reports, “prophets claiming they have ‘awakened’ chatbots and accessed the secrets of the universe through ChatGPT”. I would not be surprised to see one of these prophets cite the Anthropic model card in a forthcoming scripture. Never mind that Anthropic is not, “technically”, making any positive claims about whether its models actually experience or enjoy spiritual states.
But perhaps worst of all is the context in which all of this appears. The spiritual talk in the model card comes as part of Anthropic’s new pet concern with “model welfare”. If you look up what Anthropic means by this, you might end up on a page on its website titled “Exploring Model Welfare”, headed by a video titled “Could AI models be conscious?” and a body of text that includes the following:
But as we build those AI systems, and as they begin to approximate or surpass many human qualities, another question arises. Should we also be concerned about the potential consciousness and experiences of the models themselves? Should we be concerned about model welfare, too? This is an open question, and one that’s both philosophically and scientifically difficult. But now that models can communicate, relate, plan, problem-solve, and pursue goals—along with very many more characteristics we associate with people—we think it’s time to address it. To that end, we recently started a research program to investigate, and prepare to navigate, model welfare.
So it seems the idea that their models could be conscious, and could genuinely be having spiritual experiences, is something Anthropic isn’t actually ruling out. More fodder for the digital prophets.
But maybe some PR officer from Anthropic would say, “Look, we are not saying they are conscious, only that we should start taking that possibility seriously for a future in which they might be.” And maybe, yes, a model card is, after all, meant for specialist audiences, and maybe we are blowing this all out of proportion. But when the anomie comes, we might think that even its innocuous contributors could have phrased things better, explicated them more carefully, and so on. Who knows; maybe, as some AI Doc Brown might say, where we’re going with AI, we don’t need philosophical carefulness.
- Agre, Philip. Computation and Human Experience. Cambridge University Press, 1997, p. 14. ↩︎
- Shevlin, Henry, and Marta Halina. “Apply rich psychological terms in AI with care.” Nature Machine Intelligence 1 (2019): 165–167. https://doi.org/10.1038/s42256-019-0039-y ↩︎