THR Web Features / December 14, 2021

Where Turing Tests Go Wrong

Intelligence, Credulity, and Charity in the Age of AI

Alan Jacobs


William Hasselberger, writing in The New Atlantis, offers a thoughtful assessment of computer scientist and tech entrepreneur Erik J. Larson’s recent book The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do. Hasselberger’s reflection is more than a review; it is a useful contribution to the debate over whether artificial general intelligence is likely to be achieved. And it raises vital questions.

Hasselberger, a philosopher and professor of politics at the Catholic University of Portugal, admires Larson’s book but insists that “while the critique of AI hype points us in the right direction, it is not radical enough. For Larson is fixated on intelligence’s logical aspects”—but “in defending the human in this way he misses the broader picture.” Hasselberger approaches that “broader picture” by reflecting on what it means for human beings to converse. Extending Larson’s argument, he points out that the definition of intelligence generally operative in the world of AI research “ignores the reflective aspect of human intelligence—how we discover, imagine, question, and commit to our objectives in the first place, the judgments we make about which objectives really matter in life, and which are trivialities, distractions, irrational cravings. The constricted definition of intelligence also ignores activities with no objective, forms of human mental life that we do for their own sake, like free-ranging conversation.”

Hasselberger’s further meditation on this point needs to be quoted at length:

Why do humans converse? What’s the objective? We converse to share information, to share a laugh, to be polite and make someone feel welcome, to be rude and put others in their place, to bribe, to threaten, to seduce, to spy, to gossip, to philosophize, to plan, to celebrate, to change someone’s mind about religion or morality, to sell someone drugs or a new refrigerator, and many other things besides. Many objectives might be involved in any given conversation, including objectives that arise while talking and that could not be spelled out at the outset. You and I start a conversation, at first just to pass the time in the elevator, and then we discover a mutual interest and, walking down the hall, our conversation opens up into fresh and unanticipated territory.

The open-ended and potentially transformative nature of human conversation has made it a serious challenge for AI and, as Larson rightly points out, systems tend to only do well on Turing-style tests by fooling human participants with trickery and evasion, for example by repeating the content of the person’s statements in the form of a question, changing the subject, being evasive instead of flexibly responsive, or otherwise ensuring that humans cannot go “off script.”
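
A minimal sketch may make the point vivid. The following toy program (my own illustration, offered in the spirit of Joseph Weizenbaum’s famous ELIZA rather than drawn from Larson or Hasselberger) performs exactly the sort of trickery described above: it understands nothing, yet it can keep a credulous interlocutor “on script” by repeating a statement back as a question or by changing the subject.

    import random

    # A toy ELIZA-style responder: it understands nothing, but it can
    # mirror a statement back as a question or simply change the subject.
    REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
                   "you": "i", "your": "my", "are": "am"}

    DEFLECTIONS = [
        "Why do you say that?",
        "Interesting. Tell me more.",
        "Let's talk about something else. What do you do for fun?",
    ]

    def reflect(statement):
        # Turn "I am worried about my job" into "you are worried about your job."
        words = statement.lower().rstrip(".!?").split()
        return " ".join(REFLECTIONS.get(word, word) for word in words)

    def respond(statement):
        # Half the time, repeat the speaker's statement back as a question;
        # otherwise, deflect with a canned change of subject.
        if random.random() < 0.5:
            return reflect(statement).capitalize() + "?"
        return random.choice(DEFLECTIONS)

    print(respond("I am worried about my job"))
    # Possible output: "You are worried about your job?"

Nothing in these few lines even gestures at understanding; the program’s entire repertoire is mimicry and evasion. Yet, as Larson documents, exchanges built from just such moves have repeatedly passed for conversation.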

I wholly endorse Hasselberger’s argument here, but I want to try to take one step beyond it and ask: Why do such tricks work? Why can computers sometimes pass a Turing Test? Erik Larson, in his book, points out that in one test a few years ago people were told that the computer was a human who was not a native speaker of English—a claim that didn’t fool everyone who interacted with it but fooled enough people to worry some of us. Why were the deceived deceived? I suggest that there are two likely answers, neither of which excludes the other.

The first was offered some years ago by big tech critic Jaron Lanier in his book You Are Not a Gadget. Lanier writes that the Turing Test doesn’t just test machines—it also tests us. It “cuts both ways. You can’t tell if a machine has gotten smarter or if you’ve just lowered your own standards of intelligence to such a degree that the machine seems smart. If you can have a conversation with a simulated person presented by an AI program, can you tell how far you’ve let your sense of personhood degrade in order to make the illusion work for you?” That is, many of us have interacted with apparently thoughtful machines often enough (when on the telephone trying, often fruitlessly, to reach a customer service representative, for instance) that we have gradually lowered our standards for intelligence. And surely this erosion of standards is furthered by situations in which, even when by some miracle we do get to speak to another human being, we find that they merely read from a script in a way not demonstrably different from the behavior of a bot. Lanier says flatly that “the exercise of treating machine intelligence as real requires people to reduce their mooring to reality.”

The second potential explanation for our credulity towards machines is perhaps less distressing than Lanier’s: It involves what some philosophers, most notably Donald Davidson, have called the “principle of charity.” “Principle” is perhaps not the best word here, and maybe not “charity” either: Davidson et al. simply mean that we habitually, or rather inevitably, participate in conversation with the assumption that our interlocutor makes sense—that whatever he or she says is construable by us as meaningful discourse. Indeed, we only with great reluctance abandon that assumption, as anyone knows who has ever tried to converse with a genuinely delusional person. We look for sense, we expect sense, and sometimes we manage to find sense even when it’s not actually present.

Hasselberger in his review of Larson does not directly invoke the principle of charity, but I think that principle or habit undergirds much of what he says about conversation: “When we interpret the world around us, we do so with the help of an expansive range of concepts, rich in emotions and values: love, trust, betrayal, longing, hope, grief, remorse, shame, passion, abandonment, commitment, deception, guilt, generosity, brutality, humor, bravery, selfishness, wisdom, and countless others.” All of those emotions and values are closely related to our need to converse with others and the assumption of sense-making that follows from that need. Turing Tests at their worst are a cheap exploitation of some of the habits most deeply characteristic of our humanity.

But they don’t really tell us anything about intelligence—and not just because of attempts at deception. As Hasselberger notes, conversation among humans is not always directly motivated and does not typically even have an immediate cause. It flows from us instinctively, and in that sense is—well, it is something like prayer.

Many years ago, C. S. Lewis wrote an essay called “The Efficacy of Prayer” in which he responded to the suggestion that scientists should set up tests in which some sick people are prayed for and others are not, in order to discover whether those prayed for had better rates of recovery. But the essential problem with such experiments—and some such have indeed been conducted—is clearly seen by Lewis:

I do not see how any real prayer could go on under such conditions. “Words without thoughts never to heaven go,” says the King in Hamlet. Simply to say prayers is not to pray; otherwise a team of properly trained parrots would serve as well as men for our experiment. You cannot pray for the recovery of the sick unless the end you have in view is their recovery. But you can have no motive for desiring the recovery of all the patients in one hospital and none of those in another. You are not doing it in order that suffering should be relieved; you are doing it to find out what happens. The real purpose and the nominal purpose of your prayers are at variance. In other words, whatever your tongue and teeth and knees may do, you are not praying.

Just as there is an essential difference between “praying” and “saying prayers,” so, I suspect, there is an essential difference between “conversing” and “programming a plausible imitation of conversation.” It may well be that conversation—because of what conversation is—simply cannot be set up and evaluated in experimental form; and if that is the case, then a useful Turing Test is an impossibility.

Again, I suspect that that conclusion is correct; but whether it is or not, Turing Tests will continue to be used, and used propagandistically by AI evangelists. As we listen to them, we should heed Jaron Lanier’s warning against the degradation of our sense of personhood; and we should simultaneously be grateful for our employment of the “principle of charity” and strive always to use that gift wisely.