
Language Machinery

Who will attend to the machines’ writing?

Richard Hughes Gibson

THR illustration depicting Claude Shannon, Warren Weaver, and Italo Calvino (Alamy Stock Photos).

Generative artificial intelligence is a headspace and a technology—as much an event playing out in our minds as it is a material reality emerging at our fingertips. Fast and fluent, AI writing and image-making machines inspire in us visions of doomsday or a radiant posthuman future. They raise existential questions about themselves and ourselves. And, not least, they should lead us to reconsider certain neglected thinkers of recent intellectual history.

Consider a few of the bolder claims made by experts. Two years ago, Blaise Agüera y Arcas, vice president of Google Research, had already declared the end of the animal kingdom’s monopoly on language on the strength of Google’s experiments with large language models. LLMs, he argued, “illustrate for the first time the way that language understanding and intelligence can be dissociated from all the embodied and emotional characteristics we share with each other and with many other animals.” [1: Blaise Agüera y Arcas, “Do Large Language Models Understand Us?” Medium, December 16, 2021, https://medium.com/@blaisea/do-large-language-models-understand-us-6f881d6d8e75.] In a similar vein, the Stanford University computer scientist Christopher Manning has argued that if “meaning” constitutes “understanding of the network of connections between linguistic form and other things,” be they “objects in the world or other linguistic forms,” then “there can be no doubt” that LLMs can “learn meanings.” [2: Christopher Manning, “Human Language Understanding and Reasoning,” Daedalus 151, no. 2 (Spring 2022), 134.] Again, the point is that humans have company. The philosopher Tobias Rees (among many others) has gone further, arguing that LLMs constitute a “far-reaching, epoch-making philosophical event” on par with the shift from the premodern conception of language as a divine gift to the modern notion of language as a distinctly human trait, even our defining one. On Rees’s telling, engineers at OpenAI, Google, and Facebook have become the new Descartes and Locke, “[rendering] untenable the idea that only humans have language” and thereby undermining the modern paradigm those philosophers inaugurated. LLMs, for Rees at least, signal modernity’s end. [3: Tobias Rees, “Non-Human Words: On GPT-3 as a Philosophical Laboratory,” Daedalus 151, no. 2 (Spring 2022), 169.]

Rees calls the AI developers “philosophical laboratories” because “they disrupt the old concepts/ontologies we live by.” [4: Ibid., 168.] That characterization is somewhat misleading. Those disruptive engineers do not constitute a philosophical school in a traditional sense, since they aren’t advancing a positive philosophical program (such as explicit new theories of language or consciousness). And by their own admission, they lack important answers about how and why LLMs work. Yet unquestionably, the technology is blazing some kind of trail—whither, no one knows for sure—leaving us to philosophize in its wake, just as Manning, Agüera y Arcas, and Rees have done.

In this respect, current debates about writing machines are not as fresh as they seem. As is quietly acknowledged in the footnotes of scientific papers, much of the intellectual infrastructure of today’s advances was laid decades ago. In the 1940s, the mathematician Claude Shannon demonstrated that language use could be both described by statistics and imitated with statistics, whether those statistics were in human heads or a machine’s memory. Shannon, in other words, was the first statistical language modeler, which makes ChatGPT and its ilk his distant brainchildren. Shannon never tried to build such a machine, but some astute early readers of his work recognized that computers were primed to translate his paper-and-ink experiments into a powerful new medium. In writings now discussed largely in niche scholarly and computing circles, these readers imagined—and even made preliminary sketches of—machines that would translate Shannon’s proposals into reality. These readers likewise raised questions about the meaning of such machines’ outputs and wondered what the machines revealed about our capacity to write.

The current barrage of commentary has largely neglected this backstory, and our discussions suffer for forgetting that issues that appear novel to us belong to the mid-twentieth century. Shannon and his first readers were the original residents of the headspace in which so many of us now find ourselves. Their ambitions and insights have left traces on our discourse, just as their silences and uncertainties haunt our exchanges. If writing machines constitute a “philosophical event” or a “prompt for philosophizing,” then I submit that we are already living in the event’s aftermath, which is to say, in Shannon’s aftermath. Amid the rampant speculation about a future dominated by writing machines, I propose that we turn in the other direction to listen to field reports from some of the first people to consider what it meant to read and write in Shannon’s world.

Prediction Game

It is a great historical irony that the forefather of chatty machines would be a man so sparing of speech. Shannon was a reluctant spokesman for his work on language, and he was disinclined to write a textbook that would explain his thinking on the subject and its potential ramifications for the dawning digital age. (That task would fall to others, as I discuss below.) But there is another way to get at his ideas, and that is to meet the Shannons, Claude and his second wife, Mary, at home. We need only watch them play a game.

The game begins when Claude pulls a book down from the shelf, concealing the title in the process. After selecting a passage at random, he challenges Mary to guess its contents letter by letter. Since the text consists of modern printed English, the space between words will count as a twenty-seventh symbol in the set. If Mary fails to guess a letter correctly, Claude promises to supply the right one so that the game can continue. Her first guess, “T,” is spot-on, and she translates it into the full word “The” followed by a space. She misses the next three letters (“roo”), however, before filling in the ensuing six slots (“m_was_”). That rhythm of stumbles and runs will persist throughout the game. In some cases, a corrected mistake allows her to fill in the remainder of the word; elsewhere a few letters unlock a phrase. All in all, she guesses 89 of 129 possible letters correctly—69 percent accuracy.

In his 1951 paper “Prediction and Entropy of Printed English,” Claude Shannon reported the results as follows, listing the target passage—clipped from Raymond Chandler’s 1936 detective story “Pickup on Noon Street”—above his wife’s guesses, indicating a correct guess with a bespoke system of dashes, underlining, and ellipses (which I’ve simplified here):

(1) THE ROOM WAS NOT VERY LIGHT A SMALL OBLONG
(2) - - - - ROO - - - - - - NOT-V- - - - -I- - - - - - SM- - - - OBL- - - -
(1) READING LAMP ON THE DESK SHED GLOW ON
(2) REA- - - - - - - - - - -O- - - - - - D- - - -SHED-GLO - -O - -
(1) POLISHED WOOD BUT LESS ON THE SHABBY RED CARPET
(2) P-L-S- - - - - - O - - - BU- -L-S- -O- - - - - - SH- - - - - RE- -C- - - - - - [5: Claude Shannon, “Prediction and Entropy of Printed English,” Bell System Technical Journal 30, no. 1 (January 1951), 54.]

What does this prove? The game may seem a perverse exercise in misreading (or even nonreading), but Shannon argued that the exercise was in fact not so outlandish. It illustrated, in the first place, that a proficient speaker of a language possesses an “enormous” but implicit knowledge of the statistics of that language. Shannon would have us see that we make similar calculations regularly in everyday life—such as when we “fill in missing or incorrect letters in proof-reading” or “complete an unfinished phrase in conversation.” [6: Ibid.] As we speak, read, and write, we are regularly engaged in prediction games.

But the game works, Shannon further observed, only because English itself is predictable—and so amenable to statistical modeling. Some letters and words show up more frequently than others, of course, and Mary’s knowledge of those frequencies clearly aided her guesswork. In the Chandler passage, there are (as we would expect) several e’s, a’s, and o’s but only one v (and no z’s). The words “the” and “on” appear three times each but “oblong” just once. Mary’s performance didn’t just reflect general patterns in the language, though. She also benefited from the fact that the mental labor of this exercise lightens as letters accumulate and familiar words, phrases, and grammatical sequences coalesce. Mary’s correct guess of “t” for the first letter made “h” an obvious next choice, and then “e” even more so. Once Claude supplied the “d,” the rest of “desk” slid easily into place, since that’s where reading lamps belong in small oblong rooms (not on a divan, diving board, or dinosaur egg).

Shannon’s next point was the most striking of all. Counterintuitively, he argued that Chandler’s complete text (line 1) and the “reduced text” consisting of letters and dashes (line 2) “actually…contain the same information” under certain conditions. [7: Ibid., 55.] How can this be? (Surely, our eyes tell us, the first line contains more information!) The answer depends on the peculiar notion about information that Shannon had hatched in his 1948 paper “A Mathematical Theory of Communication” (hereafter “MTC”), the founding charter of information theory. [8: Claude Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal 27, no. 3 (July 1948): 379–423, and 27, no. 4 (October 1948): 623–656.] In that article, Shannon asserted that an engineer ought to be agnostic about a message’s “meaning” (or “semantic aspects”). The message could be nonsense, and the engineer’s problem—to transfer its components faithfully—would be the same. The technician should worry only about the material, particularly about whether the message’s innards are haphazard or patterned. On Shannon’s reckoning, a highly predictable message (“Twinkle, twinkle…”) contains less information than an unconventional one (“villapleach, vollapluck”). Thus, to return to the guessing game, more information was at stake in the uncertain situations Mary encountered—in which she was faced with multiple viable options.
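Shannon’s measure can be stated compactly. The sketch below gives the formula from “MTC” along with one piece of arithmetic for the twenty-seven-symbol alphabet of the guessing game; the figure for perfectly unpredictable text is simple arithmetic, not an estimate of real English.

```latex
% Shannon's information measure for a source that emits symbol i with probability p_i:
H = -\sum_i p_i \log_2 p_i \quad \text{(bits per symbol)}

% If all 27 symbols (26 letters plus the space) were equally likely:
H_{\max} = \log_2 27 \approx 4.75 \text{ bits per symbol}

% Any statistical structure (skewed letter frequencies, digram constraints, the pull
% of familiar words) drives H below this maximum: the more predictable the text,
% the less information each new symbol carries.
```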

This thinking lies behind Shannon’s equation of Chandler’s text and the results of Mary’s guessing. Dashes in line 2 indicate the most predictable letters, and, in turn, the ones increasingly redundant for the purpose of decryption. Shannon then proposes an illuminating thought experiment: Imagine that Mary has a truly identical twin (call her “Martha”). If we supply Martha with the “reduced text,” she should be able to recreate the entirety of Chandler’s passage, since she possesses the same statistical knowledge of English as Mary. Martha would make Mary’s guesses in reverse. Of course, Shannon admitted, there are no “mathematically identical twins” to be found, “but we do have mathematically identical computing machines.” [9: Claude Shannon, “Prediction and Entropy of Printed English,” 55.] Those machines could be given a model for making informed predictions about letters, words, maybe larger phrases and messages. In one fell swoop, Shannon had demonstrated that language use has a statistical side, that languages are, in turn, predictable, and that computers too can play the prediction game.
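The twin experiment is easy to mechanize, which is Shannon’s point. The sketch below uses a deliberately crude predictor (the most common successor of the previous character in a training text) in place of Mary’s intuitions; everything except the Chandler sentence is an invention for illustration. Because “Mary” and “Martha” share the same deterministic model, whatever one reduces the other can restore.

```python
# A minimal sketch of the "mathematically identical twins" argument. The predictor
# here (pick the most common successor of the previous character, ties broken
# alphabetically) is a crude stand-in for Mary's knowledge of English, not
# Shannon's actual procedure.

from collections import Counter, defaultdict

def train(text):
    """For each character, count which characters follow it."""
    followers = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        followers[a][b] += 1
    return followers

def predict(model, prev):
    """Deterministic guess for the character after `prev`."""
    counts = model.get(prev)
    if not counts:
        return " "
    return min(counts, key=lambda c: (-counts[c], c))

def reduce_text(text, model):
    """Mary's side: write a dash wherever the predictor would have guessed right."""
    out, prev = [], " "
    for ch in text:
        out.append("-" if predict(model, prev) == ch else ch)
        prev = ch
    return "".join(out)

def reconstruct_text(reduced, model):
    """Martha's side: replay the same guesses to recover the original."""
    out, prev = [], " "
    for ch in reduced:
        ch = predict(model, prev) if ch == "-" else ch
        out.append(ch)
        prev = ch
    return "".join(out)

passage = "the room was not very light a small oblong reading lamp on the desk shed glow on polished wood"
model = train(passage)
reduced = reduce_text(passage, model)
assert reconstruct_text(reduced, model) == passage  # same model, same guesses, same text
print(reduced)
```

The assert is the whole argument in miniature: identical models make identical guesses, so the reduced text plus the model carries everything the original did.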

The immediate applications of information theory were in telecommunications. Shannon’s package of ideas was a boon to anyone encoding and compressing signs in order to send them speedily over long, noisy distances—such as AT&T, the parent company of the R&D outfit where Shannon worked, Bell Labs. But you could also do something else with this theory, at least in principle: You could build a system that exploited the statistical properties of a language—particularly in its written form—in order to generate plausible text. As the experiment with Mary demonstrated, the inputs could be astonishingly minimal, since so much of what “comes next” in a language is baked in by grammatical rules and cultural conventions. An independent writing machine—a natural language generator—was no longer the stuff of science fiction.

Suitably enough, science-fiction fans were among the first to be put on alert.

Semantic Selector

Readers leafing through the October 1949 issue of Astounding Science Fiction encountered the expected fare of stories—including a novella by L. Ron Hubbard, soon to found Scientology—before coming on a nonfictional article titled “Chance Remarks” by occasional contributor J.J. Coupling. [10: J.J. Coupling, “Chance Remarks,” Astounding Science Fiction 44, no. 2 (October 1949): 104–111.] Breathlessly, the editors boosted the piece as a nonfictional improvement on its own science fiction: “Remember ‘Fifty Million Monkeys’? There was a basic idea in that, and this pure fact article recounts an actual analysis of the problem of a ‘semantic selector’—and how it might work!”

The Astounding Science Fiction editors were referring to a story that had run in the magazine six years earlier, in which a scientist employs the “infinite monkey theorem”—yes, he actually sits monkeys at typewriters—in hopes that their output will accidentally suggest a strategy to prevent the universe’s imminent collapse. But then the scientist faces the problem of locating the proverbial needle in the haystack of “meaningless garble.” His answer is a machine, the “semantic selector,” designed to detect the guidance he seeks amid all that random simian pecking. [11: Ibid., 104.]

How can Coupling top that? He begins by suggesting that the semantic selector was not so farfetched; in fact, “something very like that has appeared in the most respectable sort of print”—Shannon’s “MTC.” After touting Shannon’s credentials and his theory’s importance, Coupling quickly zeroes in on an early section of the paper that “deals with something which seems right up the science fiction alley”: “the statistical structure of written English.” Following Shannon, Coupling sets the stage by observing that we regularly deploy “our unconscious knowledge of such statistical rules” in our own reading and writing. [12: Ibid., 106.] Those statistics, Coupling then explains, are a blessing to the engineer zipping countless messages over a wire because they allow for the reduction of redundancies. (You can cut the easy guesses, as discussed above.) But there is also a truth-is-stranger-than-fiction implication to that same fact, and this is the real occasion for Coupling’s dispatch: Shannon has demonstrated that a nonhuman process can produce humanlike language without direction. You can forgo the monkeys: Statistics will suffice.

To understand how this is possible, one must look at language as Shannon did, and here Coupling quotes Shannon directly: “To a mathematician, language is a ‘stochastic—i.e., statistical—process which generates a discrete sequence of symbols from a finite set.’” [13: Shannon, “A Mathematical Theory,” quoted in Coupling, 107.] “Stochastic process,” from the ancient Greek stokhos, meaning “aim” or “guess,” was a term gaining momentum in the sciences in the thirties and forties to describe phenomena neither wholly arbitrary nor fully determined (“the meeting ground between ‘chance’ and ‘law’,” as a mathematician wrote in this period). [14: Anatol Rapoport, “Stochastic, Mechanical, and Teleological Views of Homeostasis,” Homeostatic Mechanisms (Upton, NY: Brookhaven National Laboratory, 1958), 247.]

Shannon was dealing with a special class of these phenomena called Markov chains, in which the current state of a process influences where it goes next. This subspecies is perfectly suited to written language, which indeed had been part of its origin story. To repudiate a rival Russian mathematician’s claim that math could prove the existence of free will, Andrey Markov had analyzed 20,000 letters in Alexander Pushkin’s classic poem Eugene Onegin and shown that if any given letter was a vowel, the next letter was more likely to be a consonant than a vowel, and vice versa. Markov’s deeper point was that what seems utterly random at the local level may in fact operate according to statistical patterns discernible over time.
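Markov’s bookkeeping is easy to restage in a few lines. The sketch below tallies vowel-to-consonant transitions in a sample string; the sample reuses the Chandler sentence quoted above rather than Pushkin, and the crude vowel set is an assumption of the sketch.

```python
# A toy restaging of Markov's Eugene Onegin exercise: estimate how often a vowel
# is followed by a consonant and vice versa. Markov did this by hand across
# 20,000 letters of Pushkin; here the sample is just the Chandler sentence.

from collections import Counter

VOWELS = set("aeiouy")  # a rough-and-ready vowel set for the sketch

def transition_counts(text):
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter()
    for a, b in zip(letters, letters[1:]):
        counts[("V" if a in VOWELS else "C", "V" if b in VOWELS else "C")] += 1
    return counts

sample = "the room was not very light a small oblong reading lamp on the desk"
counts = transition_counts(sample)
for (prev, nxt), n in sorted(counts.items()):
    total = sum(v for (p, _), v in counts.items() if p == prev)
    print(f"P({nxt} | {prev}) = {n / total:.2f}")
```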

Shannon turned Markov’s project around: Rather than analyze texts to reveal underlying frequencies, he illustrated the stochastic character of language by assembling sequences of his own. Shannon became a text generator, first using a book of random numbers and a table of letter frequencies appended to a history of cryptography and then approximating a stochastic process with a procedure that was equal parts page flipping and word search. Coupling eagerly quotes outputs that Shannon included in “MTC” (originally generated for its 1945 precursor, A Mathematical Theory of Cryptography, once classified by the War Department but now available online). Shannon observed that “the resemblance to ordinary English text increases quite noticeably” as one increases the length of the Markov chain. For Coupling, meanwhile, Shannon was demonstrating how one can produce language through stochastic processes alone.

To set the tone, Coupling offers the gibberish Shannon amassed when picking letters entirely at random: “XFOML RXKHRJFFJUJ…” Things barely improved when letters were selected according to general frequencies in printed English: “OCRO HLI RGWR…” Yet when the simplest connection between letters was applied—a “digram” in which the probability of the next letter is determined by the preceding one—actual English words and word-like strings began to materialize: “ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY…” That trend continues into Shannon’s example of what happens when the previous two letters affect the next choice (a “trigram”): “IN NO IST WHEY CRATICT FROURE BIRS GROCID PONDEROME OF DEMONSTURES OF THE REPTAGIN IS…” [15: Shannon, “A Mathematical Theory,” quoted in Coupling, 108.]
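Shannon built these approximations with a random-number book and a lot of page flipping; a computer can do the same bookkeeping directly. The sketch below builds an order-k character model from whatever sample text it is given (the short sample string is a placeholder; a book-length text gives output closer to Shannon’s). With k set to 2, each new letter depends on the previous two, the analogue of Shannon’s trigram approximation.

```python
# A machine version of Shannon's letter-level approximations: condition each new
# character on the previous k characters generated so far. Not Shannon's code,
# just an automation of the procedure described above.

import random
from collections import Counter, defaultdict

def build_model(text, k):
    """Map each k-character context to a count of the characters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - k):
        model[text[i:i + k]][text[i + k]] += 1
    return model

def generate(model, k, length, seed=0):
    rng = random.Random(seed)
    out = list(rng.choice(list(model)))            # start from any observed context
    for _ in range(length):
        counts = model.get("".join(out[-k:]))
        if not counts:                             # dead end: jump to a new context
            out.extend(rng.choice(list(model)))
            continue
        chars, weights = zip(*counts.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

# Placeholder sample; feed in a long plain text for more interesting output.
sample = ("the room was not very light a small oblong reading lamp on the desk "
          "shed glow on polished wood but less on the shabby red carpet ")
print(generate(build_model(sample, k=2), k=2, length=120))
```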

Shannon then switched to whole words because letter tetragrams and pentagrams were simply too difficult to construct by hand. And, again, choosing elements merely on the basis of their individual probabilities—as in, unconnected to their neighbors—yielded a word salad: “REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN…” The more intriguing possibility, obviously, would be to see what happened when there was a statistical connection between adjacent words—a word-level digram. No tables were sufficiently comprehensive, however, so Shannon fabricated a “laborious process” that simulated those probabilities on his own. Skimming through a novel in search of some word, noting its immediate successor, then heading off to find the next occurrence of that new word, he compiled this sequence:

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED. [16: Shannon, “A Mathematical Theory,” 385.]

This assemblage was nowhere close to being a respectable English sentence, but clusters of intelligible language had nonetheless begun to form without Shannon’s conscious direction. Statistics could (almost) make sense on its own. Shannon was especially pleased with “the particular sequence of ten words ‘attack on an English writer that the character of this.’” [17: Ibid.] Coupling wryly notes that we may experience a “strange feeling that we have seen something like this before,” observing that passages in James Joyce’s Ulysses and Finnegans Wake are “scarcely more intelligible.” [18: Coupling, “Chance Remarks,” 109.]
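Shannon’s page-flipping routine can also be followed to the letter by a machine: jump to a random spot in a text, scan forward until the current word reappears, and take whatever follows it. In the sketch below, “novel.txt” is a placeholder for any long plain-text file; nothing else is drawn from Shannon’s paper.

```python
# A literal mechanization of the word-digram procedure described above.

import random

def load_words(path):
    return open(path, encoding="utf-8").read().lower().split()

def next_word(words, current, rng):
    """Start at a random position, scan forward (wrapping around) for `current`,
    and return the word that follows the first occurrence found."""
    start = rng.randrange(len(words))
    for offset in range(len(words)):
        i = (start + offset) % len(words)
        if words[i] == current and i + 1 < len(words):
            return words[i + 1]
    return None

def ramble(words, start="the", length=25, seed=0):
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        nxt = next_word(words, out[-1], rng)
        if nxt is None:                  # current word is never followed by anything
            break
        out.append(nxt)
    return " ".join(out)

print(ramble(load_words("novel.txt")))   # "novel.txt": any long text will do
```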

Here Shannon called his text generator to a halt. He had no need to continue—he had made his point about written language and stochastic processes—and the labor involved in assembling a trigram appeared overwhelming. Yet he added in his paper that “it would be interesting if further approximations could be constructed.” [19: Shannon, “A Mathematical Theory of Communication,” 386.] Coupling seems to have taken those words as a challenge. “Can we,” he asks, “by the use of a more elaborate statistical choice of words, rule out all word combinations that don’t make sense?” [20: Coupling, “Chance Remarks,” 109.] How far could probability take you?

At this point, I should drop the façade and acknowledge what you’ve been suspecting all along: that this guy Coupling was no hack scratching out a living as a sci-fi writer. The writer’s insider knowledge of Shannon’s work betrays that “J.J. Coupling”—a term borrowed from atomic physics—was one of Shannon’s colleagues at Bell Labs, John R. Pierce in fact, a pioneer in space-based telecommunications. Pierce all but admits the truth toward the close of the piece when he points out that his own text generations were compiled over “a couple of hours…in a conference room with two mathematicians and two engineers.” [21: Ibid., 110.] (There is reason to suspect that Shannon was in that room. He was thus testing his wife’s language statistics at home and having his own tested by Pierce at work.)

Pierce sought to assemble a more coherent sequence by enlarging the number of words shown to the subject before the next word was chosen. His experiments took the form of a writing and drawing game popular in the 1920s, exquisite corpse: One by one, subjects were shown the most recent three words of the text in development, then asked (1) to imagine a sentence in which those three words made sense and (2) to write down the next word in that sentence before moving the passage along to the next participant. The process was far from scientific, and one can see why Pierce initially published his results in Astounding Science Fiction rather than a technical journal. But his scheme did yield largely grammatical, if odd, constructions—for example, “When cooked asparagus has a delicious flavor suggesting apples” and “It happened one frosty look of trees waving gracefully against the wall.” To improve coherence, he provided guessers with an overarching theme. “Salaries” begins “Money isn’t everything. However, we need considerably more incentive to produce efficiently.” [22: Ibid., 111.]

These experiments were enough to convince Pierce that something like a “semantic selector” was conceivable in real life. Indeed, he seems to have come to regard the generative aspect of Shannon’s work as a core element of information theory: he included all of the stochastic texts I’ve quoted here, as well as several others, in Symbols, Signals and Noise (1961), a popular introduction to information theory that Pierce wrote because Shannon was disinclined to write it himself. [23: John R. Pierce, Symbols, Signals and Noise: The Nature and Process of Communication (New York, NY: Harper and Brothers, 1961). Pierce’s book was written for use by nonspecialists. More technical introductions had already been published by Solomon Kullback (Information Theory and Statistics, 1959) and David Bell (Information Theory and its Engineering Applications, 1953). Later editions of Pierce’s book bore the title An Introduction to Information Theory: Symbols, Signals and Noise.] That book also offered multiple sketches of how a “sentence-producing” machine might function, such as by proceeding from word to word according to transition probabilities or, drawing inspiration from Noam Chomsky’s work, by beginning with a “kernel sentence” and then assembling its output part by part (noun phrase, verb phrase, etc.).
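Pierce’s second scheme, building a sentence part by part from a skeleton, can be gestured at with a toy grammar. The rules and word lists below are invented for illustration and are not taken from Symbols, Signals and Noise.

```python
# A toy "kernel sentence" generator: expand a sentence symbol into a noun phrase
# and a verb phrase, then fill each slot by rule. Grammar and vocabulary are
# invented for this sketch.

import random

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "Adj", "N"], ["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "Adj": [["shabby"], ["oblong"], ["polished"]],
    "N":   [["lamp"], ["detective"], ["carpet"], ["machine"]],
    "V":   [["lit"], ["watched"], ["wrote"]],
}

def expand(symbol, rng):
    if symbol not in GRAMMAR:                      # terminal word: emit it
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])       # pick one rewrite rule
    return [word for part in production for word in expand(part, rng)]

rng = random.Random(7)
for _ in range(3):
    print(" ".join(expand("S", rng)))
```

The outputs are grammatical but aimless, which is roughly the quality gap Pierce was worrying over.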

Although Pierce was not sure which method would prove successful (he could not foresee that the work of his contemporaries Warren McCulloch and Walter Pitts on neural nets would be decisive), he recognized writing machines as a distant but inevitable feature of the digital age. He pointed to the highly conventional genre of detective fiction as a likely candidate for mechanization—such a machine could even be “equipped with settings for hard-boiled, puzzle, character, suspense, and so on.” The open question for Pierce was quality—whether a computer could “produce text of any literary merit by means of grammatical rules and a sequence of random numbers” (italics added). [24: Ibid., 264.]

Shannon, once again, was officially agnostic about “meaning.” Pierce was among the first to argue that questions of meaning and value could not be so easily set aside. In “Chance Remarks” and other writings on computer-generated art over the ensuing decades, he testified to the psychological difficulty of confronting machine-generated texts and images whose structures were based not on thought or artistic vision but statistics. He confessed in 1949 that “it is a little disturbing to think that an elaborate machine, taking longer-range statistics into account, would have done still better” than the human subjects he enlisted. He felt, though, that a reader might derive “meaning” from and experience “aesthetic appreciation” for a machine’s textual output even if it meant nothing to the machine. He wondered if the machine’s unexpected and even oddball outputs would be creatively useful. (For example, he liked “deamy” in Shannon’s letter digram sequence. [25: Coupling, “Chance Remarks,” 108, 111.])

In so many ways, Pierce anticipated the mental condition in which we now find ourselves. How does language work? What does “meaning” mean? What counts as art? What is the nature of creativity? The electrical engineer was asking all of these questions decades ago. He knew that he didn’t have persuasive answers, but he believed, as he would argue in Symbols, Signals and Noise, that he could at least bring a “scientifically informed ignorance” to bear on them. That seemed to him an improvement on the “confusions and puzzlements” thrown up by philosophers from Plato to Locke. He saw clearly only that technology had outpaced received ideas about language. Sound familiar?

In 1949, perhaps the most difficult question of all pertained not to machines but to humans. We presume, Pierce writes, that writing is “coherent over long stretches…because of some overriding purpose in the writer’s mind.” Yet Shannon’s work troubled this presumption. If we possess a vast but tacit knowledge of the statistics of our language, could it be that our writing is coherent simply because we are “unconsciously” obeying those statistics? Do we possess the statistics or do the statistics possess us? Pierce—as J.J. Coupling—asks, “How many times does a person let his pen or tongue, started by some initial impetus, merely run through a sequence of probable words?” [26: Coupling, “Chance Remarks,” 110.] Are we also writing machines?

Semantic Receiver

“Chance Remarks” was, in fact, the second popular report on “MTC,” the first having appeared in the pages of the venerable Scientific American a few months earlier. [27: Warren Weaver, “The Mathematics of Communication,” Scientific American 181, no. 1 (July 1949): 11–15.] This piece was no less speculative, though the presentation was of course more scientific. The author in this case was the mathematician Warren Weaver (no need for a pseudonym this time), then the Rockefeller Foundation’s director for the natural sciences.

From that station, Weaver was running a philanthropic booster club for postwar science, and the future of computing was one of his interests. He was particularly intrigued by the role computers had played in decrypting messages during the recent war, and he seems to have known of—if not read—Shannon’s classified work on cryptography. In particular, he wondered if the wartime advances in deciphering messages by statistically analyzing their makeup could be applied to the field of translation—a true “machine translation,” a phrase first used at this time.

Weaver was thus already thinking about language as a statistical problem before “MTC” landed on his desk, and his article for Scientific American immediately signals that Weaver has a bigger picture in view than Shannon’s direct concerns in his paper. Weaver begins not with Shannon’s narrow understanding of “information” but with a “very broad” approach to communication that encompasses “all of the procedures by which one mind can affect another.” From that vantage point, Weaver proposes a tripartite understanding of “communication” adapted from the work of the semiotician Charles Morris: “1) technical, 2) semantic, and 3) influential.” (Morris called these levels “syntactic,” “semantic,” and “pragmatic.”) The first, base level, Weaver explains, addresses the “accuracy of the transference of information from sender to receiver,” the second the relationship between the sender’s intended meaning and the receiver’s interpretation of that meaning, and the third, top level whether the “meaning conveyed to the receiver leads to the desired conduct on his part.” [28: Ibid., 11.]

The technical level would seem to belong to the engineer and the other two to the philosopher, Weaver grants, but he attempts to show that “MTC” “reveals facts about the statistical structure of the English language…which must seem significant to students of every phase of language and communication.” This claim should surprise us, since Shannon had quite resolutely planted his flag in level one and declared that he had no business with level two (much less level three). After explaining “MTC” in layman’s terms, Weaver acknowledges as much toward the end of the piece, admitting that Shannon’s approach to information “at first seems disappointing and bizarre—disappointing because it has nothing to do with meaning, and bizarre because it deals not with a single message but rather with the statistical character of a whole ensemble of messages.” [29: Ibid., 14.]

Weaver insists, however, that Shannon has undersold his achievement. To put it another way, Weaver did not share Shannon’s semantic agnosticism: He was an eager believer! In Shannon’s work, Weaver perceived a new paradigm for communication at level two (semantics), in particular: Shannon’s “analysis has so penetratingly cleared the air that one is now perhaps for the first time ready for a real theory of meaning.” [30: Ibid.]

This proposal has prompted much headshaking and eye-rolling since the publication of Weaver’s article in Scientific American and then as the first section of The Mathematical Theory of Communication (notice the shift from “a” to “the”), a book-length work he and Shannon published in late 1949. [31: Claude Shannon and Warren Weaver, The Mathematical Theory of Communication (Urbana, IL: University of Illinois Press, 1949).] Poor old Weaver, commentators lament, didn’t fully grasp Shannon’s theory, was willfully misreading “MTC,” was out of his depth in two fields—information theory and linguistics—at the same time. There is some truth to all these claims, but Weaver’s critics also frequently miss the point that Weaver was trying to make about “meaning” because they neglect the fact that Weaver was coming at Shannon’s work from the particular angle of machine translation.

In that endeavor, Weaver had run into some conceptual roadblocks, and he appears to have found encouragement for his thinking while reading Shannon. We can surmise this because in the same month that his article in Scientific American ran, Weaver distributed a memorandum (titled simply “Translation”) on the feasibility of machine translation that would set the tone for the first decade (at least) of work in the emerging field. There Weaver acknowledges that any machine translation scheme must solve the “problem of multiple meaning,” or what would subsequently be called “word sense disambiguation.” [32: Warren Weaver, “Translation,” in Machine Translation of Languages: Fourteen Essays, ed. William Locke and Donald Booth (New York, NY: John Wiley, 1955): 15–23.] Weaver cites the example of “fast.” It functions in diametrically opposed ways: it can mean “rapid”—as in, you are driving too fast—or “motionless”—stand fast! Humans resolve this difficulty by looking at a word’s surroundings. As elementary school teachers say, you look for “contextual clues.”

Perhaps a computer, Weaver suggests, could imitate this strategy. Rather than decode a text word by word—which he likens to advancing through a book while wearing “an opaque mask with a hole in it one word wide”—you could widen the slit so that the computer could peek at a word’s neighborhood. “If one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side,” he writes, “then, if N is large enough one can unambiguously decide the meaning of the central word.” [33: Ibid., 21.] When Weaver speaks of the application of probability theory to semantics in his Scientific American piece, he has situations like this in mind. Sense A of some polysemous word could be differentiated from sense B by determining the words likeliest to appear in the vicinity of each usage. A computer could, in turn, decide on the “meaning” of any given instance of a word by noting adjacent items and making a probability-driven guess. The problem of meaning becomes yet another problem of prediction.
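Weaver’s widened slit translates naturally into a little code. The sketch below decides between two senses of “fast” by counting which sense’s typical neighbors fall within n words of it; the cue lists and the example sentence are invented for illustration, not drawn from Weaver’s memorandum.

```python
# A cartoon of context-window disambiguation: score each sense of "fast" by how
# many of its characteristic neighbors appear within n words. Cue lists and the
# sample sentence are invented for this sketch.

CUES = {
    "rapid":      {"driving", "drive", "speed", "car", "train", "ran"},
    "motionless": {"stand", "held", "hold", "stuck", "anchor", "moorings"},
}

def disambiguate(words, index, n=3):
    window = words[max(0, index - n):index] + words[index + 1:index + 1 + n]
    scores = {sense: sum(w in cues for w in window) for sense, cues in CUES.items()}
    return max(scores, key=scores.get), scores

sentence = "the ship held fast to its moorings in the storm".split()
print(disambiguate(sentence, sentence.index("fast")))
# -> ('motionless', {'rapid': 0, 'motionless': 2})
```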

Weaver’s longed-for “real theory of meaning,” then, would set aside the traditional concerns of semantics—truth and reference—in favor of patterns of usage that could be translated into statistics. In suggesting this way forward, Weaver anticipates Christopher Manning’s argument, discussed in the opening pages of the present article, that meaning consists of grasping the network of connections between linguistic forms and other linguistic forms. And that resemblance shouldn’t surprise us because Manning’s definition of meaning has its origins in the “distributional semantics” of the linguist Zellig Harris, a recipient of Weaver’s memorandum.

But Weaver also seems to have perceived the flipside of the disambiguation problem: If you could differentiate words according to patterns of usage, you could conceivably also use patterns to detect words that mean roughly the same thing—statistics as synonym detector. Harris would make just this point in his important 1954 paper “Distributional Structure.” Using the example of “oculist” and “eye-doctor,” he argued that words occurring “in almost the same environments” can be taken to mean the same thing. [34: Zellig Harris, “Distributional Structure,” Word 10, nos. 2–3 (1954): 146–162.] (Harris’s contemporary J.R. Firth gave this point a pithy ring in a widely quoted aphorism: “You shall know a word by the company it keeps.”) This conjecture led Weaver to propose two additions to Shannon’s famous diagram (see below) of a “general communication system.” [35: Shannon, “A Mathematical Theory of Communication,” 381.]

Claude Shannon’s diagram of a general communication system.
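Before turning to the additions Weaver proposed, it is worth seeing how little machinery Harris’s conjecture requires: build a co-occurrence vector for each word and compare the vectors. The five-sentence “corpus” below is fabricated so that “oculist” and “eye-doctor” keep identical company; real distributional semantics needs vastly more text.

```python
# Harris's conjecture in miniature: words that keep the same company end up with
# similar co-occurrence vectors. Corpus, window size, and word choices are all
# invented for this sketch.

import math
from collections import Counter, defaultdict

corpus = [
    "the oculist examined my eyes",
    "the eye-doctor examined my eyes",
    "the oculist prescribed new glasses",
    "the eye-doctor prescribed new glasses",
    "the carpenter repaired my chair",
]

def cooccurrence(sentences, window=2):
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

vecs = cooccurrence(corpus)
print(cosine(vecs["oculist"], vecs["eye-doctor"]))   # close to 1.0
print(cosine(vecs["oculist"], vecs["carpenter"]))    # noticeably lower
```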

Weaver called his first add-on the “Semantic Receiver” and the second “Semantic Noise.” The first was a device to be “interposed” between the receiver—which converted the encoded signal into a message—and the destination or audience. It was to perform two functions. First, it would “[subject] the message to a second decoding,” “[matching] the statistical semantic characteristics of the message to the statistical semantic capacities of the totality of receivers, or that subset of receivers which constitutes the audience one wishes to affect.” In other words, it would rephrase the message in terms better suited to one’s specific audience. Second, the Semantic Receiver was to deal with Semantic Noise, which Weaver defined as the inevitable lapses on the part of a speaker or writer that introduce unintentional and potentially hazardous “distortions of meaning” into a message. [36: Shannon and Weaver, The Mathematical Theory of Communication, 14.]

These ideas surely seemed like unwarranted leaps to many observers at the time (including, one suspects, to Shannon), but the Semantic Receiver makes sense within Weaver’s larger ambitions. Here he was expanding the horizons of machine translation: Rather than move between languages, the Semantic Receiver would translate between discourse communities—whose habits and jargon would themselves be evaluated statistically. At the same time, Weaver realized that errors were simply another disambiguation problem: The machine could be called upon to determine what we meant to say—or what should have been there, statistically speaking. The Semantic Receiver was the original autocorrect. Transmission was blending with translation, machine reading with machine writing; the mathematician was imagining a system in which messages could be not only faithfully preserved by statistical means (Shannon’s concern) but improved.
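Reading the Semantic Receiver as “the original autocorrect” suggests a familiar modern pattern, sketched below: for a word that is not in the vocabulary, propose the common word closest to it in spelling. The vocabulary, its counts, and the whole framing are a present-day gloss on Weaver’s idea, not a device he specified.

```python
# A toy "semantic noise" corrector in the spirit of the paragraph above: replace
# an out-of-vocabulary word with the most frequent near-miss. Vocabulary counts
# are invented for this sketch.

VOCAB = {"the": 500, "desk": 40, "dusk": 15, "disk": 20, "lamp": 30, "shed": 25}

def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(word):
    if word in VOCAB:
        return word
    candidates = [(w, n) for w, n in VOCAB.items() if edit_distance(word, w) <= 1]
    if not candidates:
        return word
    return max(candidates, key=lambda wn: wn[1])[0]   # most frequent near-miss

print(correct("desj"))   # -> "desk"
```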

The Literature Machine’s Attendants

I have pitched this backward glance as a chance to reassess the present moment. To watch the Shannons play a guessing game, to inspect the “exquisite corpses” generated by Pierce and his colleagues, to witness Weaver assemble the Semantic Receiver out of Shannon’s ideas, is to be reminded that writing machines passed from science fiction into real life a generation ago. These people, and many more in the ensuing decades, occupied the generative AI headspace long before millions of us wandered in. That is not an insignificant historical fact, given, as I noted at the outset, the current urge to chisel LLM into a new philosophical landmark. But beyond questions of historical precision and revision, there is a deeper benefit to returning to the early dispatches from Shannon World: the chance to consider possible futures.

That is what Pierce and Weaver were doing in 1949. Starting from the same point, they were moving in different directions. Pierce—the engineer moonlighting as sci-fi writer—was pondering the future of creativity and what Shannon had revealed about language’s hold over us; writing machines appear a more ambivalent prospect in his account. Pierce was eager to figure out how to build one, but he was surprisingly honest in his admissions that he did not know what writing machines would mean culturally speaking. He had questions but no sure answers. Weaver, meanwhile, was happily inviting computers into our conversations because he viewed them as means to clear up human communication on multiple levels. Above all, Weaver hoped the machines would enhance our collective precision. (Notably, Weaver revised the core question of the second level of communication as follows: “How precisely do the transmitted symbols convey the desired meaning?” [37: Ibid.]) Weaver dreamed that statistically powered semantic machines would help speakers and writers say exactly what they meant, free of errors, and in terms the intended audience could easily grasp.

That is a laudable ambition, without question. Yet a threat lurks in Weaver’s program, one important to acknowledge as generative AI creeps into ordinary word processing software: that the statistics governing the machine could become prescriptive, telling us what we should or even must say. Or even worse, modifying our language without consulting us at all. I may seem to be overreacting here, but we must bear in mind that Weaver was writing at a time when many philosophers and scientists were calling for language reform on the grounds that language was not just inexact but permeated with the misleading cant and mysticisms of past ages. The house of language was seen as needing a thorough cleaning. Weaver knew that discourse, and its spirit is detectable in his Scientific American article.

As that piece is winding down, Weaver makes his most overt gesture in this direction, calling on “language”—curiously disembodied—to become more statistically minded: “Language must be designed, or developed, with a view to the totality of things that man may wish to say; but not being able to accomplish everything, it should do as well as possible as often as possible. That is to say, it too should deal with its task statistically.” [38: Ibid., 15.] A writing machine would do just that; it would be a model communicator. But what about the humans hooked up to the machine? De mortuis nil nisi bonum dicendum est—“Of the dead, nothing but good is to be said.” Clearly, Weaver wished for the Semantic Receiver and its descendants to be our assistants. Yet you do not need to strain your eyes to see the potential for such a device to become our overseer.

Pierce and Weaver were not the only ones wondering about the future. As word got out about information theory, other engineers and artists tinkered with Shannon’s ideas, mulled the future, and built the first (stuttering) generation of natural language generators, some integrating chance and statistics into their design.

Coming from a more imaginative direction was a writer who allowed himself to entertain the thought that a machine could rival his own considerable powers. In his 1967 lecture “Cybernetics and Ghosts,” Italo Calvino reviews much of the terrain covered in this article. He recognizes the digital age as an existential condition as well as a technological one, in which the ideas of Claude Shannon, John von Neumann, and Alan Turing were changing the way humans perceived themselves and their machines. “The world in its various aspects is increasingly looked upon as discrete rather than continuous,” Calvino says. “I am using the term ‘discrete’ in the sense it bears in mathematics, a discrete quantity being one made up of separate parts.” As a result, thought (“which until the other day appeared to us as something fluid”) was being reconceived as “a series of discontinuous states, of combinations of impulses acting on a finite (though enormous) number of sensory and motor organs.” Under the terms of this Discrete Age, the new “electronic brains” (i.e., computers) appeared to be “a convincing theoretical model for the most complex processes of our memory, our mental associations, our imagination, our conscience.” [39: Italo Calvino, “Cybernetics and Ghosts,” in The Uses of Literature, trans. Patrick Creagh (New York, NY: Harcourt Brace, 1986), 3–27.]

Fields of human endeavor previously closed off from mechanization now seemed to be in play, including “the most complex and unpredictable of all [humanity’s] machines: language.” Calvino bears witness to the advances in natural language processing already occurring in the sixties. Computers were “dismantling and reassembling” language—undertaking translations, performing linguistic analysis, summarizing passages, engaging in their own modes of reading. He could not help but wonder if computers would soon arrive at his own door: “Will we also have machines capable of conceiving and composing poems and novels?” [40: Ibid., 10.]

Remarkably, Calvino quickly concedes that a “literature machine” (his phrase) could be a successful writer. While granting that this might seem like a strange stance, he explains that he has always been skeptical of the vision of writers as the “Voice of the Times” or the chosen vessels of the “Spirit of the World.” Literature, he argues, is not an inspired activity but a “combinatorial art”: “a constant series of attempts to make one word stay put after another by following certain definite rules; or more often, rules that were neither definite nor definable, but that might be extracted from a series of examples.” [41: Ibid., 15.] And if writing is a practice of piecing things together, then, he reasons, a computer could do it, and likely do it well. He was, moreover, untroubled by the thought that he might be a “writing machine”—indeed, he thought that that is exactly what happens to a writer “when things are going well.” [42: Ibid., 14, 15.]

But while Calvino yields composition to the machines without a struggle, he clings tightly to reading. In fact, he argues that the rise of writing machines may even benefit reading, since now “the decisive moment of literary life will be that of reading.” Why? Calvino’s immediate answer is that reading is the site of “unexpected meanings,” which he explains in loose psychoanalytic terms as the moment when “a meaning that is not patent on the linguistic plane on which we were working” “[slips] in” from another, unconscious level. [43: Ibid., 22.] In layman’s terms, Calvino is granting the reader the “aha moment,” the flash of insight, the shock of disclosure, the instant of recollection (of something perhaps known to an earlier self or to past generations). Traditionally, this meaning has been the writer’s charge; but Calvino argues that the electricity of literary discovery belongs equally to the reader. The writing machine’s entrance clarifies the priority of human reading.

This claim of reading’s priority would be enriched again and again in Calvino’s subsequent writings—prompted in part, I suspect, by his recognition of the increasing power of the machines. The crescendo comes in If on a Winter’s Night a Traveler (1979), a metafictional novel that records the diverse desires and postures we experience in reading. In the eleventh chapter, the second-person main character (you) encounters seven representative readers. One reader can take in only a few words before getting lost in the thoughts they inspire: “having seized on a thought that the text suggests to it, or a feeling, or a question, or an image, [my mind] goes off on a tangent and springs from thought to thought, from image to image, in an itinerary of reasonings and fantasies.” Another likewise “isolates some minimal segments” but then stays with them, mining their “extremely concentrated density of meaning.” The third is captivated by re-reading, wondering aloud whether we change between readings or if reading itself is an activity that “cannot be repeated twice according to the same pattern.” For the fourth, every book becomes part of an “overall and unitary book,” while the fifth reads in every book echoes of an urtext “that comes before all other stories.” The sixth loves the moment of infinite potential when facing the title page, and the seventh “the spaces that extend beyond the words ‘the end.’” [44: Italo Calvino, If on a Winter’s Night a Traveler, trans. William Weaver (San Diego, CA: Harcourt Brace, 1979).]

Machines may, Calvino argued, fit together “all the permutations possible on a given material,” but they cannot replicate the myriad and often unpredictable operations that occur within our reading. We can be any and all of these readers, and our manner of practicing these operations will be distinct, as will be their results. Calvino grasped that we are as much misreaders as readers, and that our most profound flashes of insight often happen when we stray from the strict protocols of reading—when the words we encounter are mediated by the experiences of our senses and by our personal and collective memories. We incorporate what we read into ourselves, transforming it and ourselves in the process, in a manner that mind-and-body-less machines, no matter how wide and finely woven their neural nets, cannot.

For Calvino, the “literature machine” was no more than a conjecture; he doubted that it would be “worth the trouble” to build one. But now that such a machine exists, Calvino’s meditations are crucial, because he calls attention to a dimension largely ignored in the current commotion. Generative AI dazzles us with its ability to write in an array of genres and, having absorbed Wikipedia and Google Books, to do so with surprising acumen on countless topics. And the machines, we are told, will only get better. But Calvino’s writing prods us to remember that the ultimate semantic receivers, selectors, and transmitters are still us. In the global network of communicants, machine and human, we enjoy—and will continue to enjoy—a privileged position as nodes mediating word and world, past and present, matter and spirit. Amid the staggering investment now taking place in the next generation of loquacious artificial intelligences, should we not be equally concerned about training the next generation of humans—the ones who will solicit, evaluate, edit, and circulate the machines’ outputs? You may build a better writing machine, but it will be worthless unless you build better readers. Calvino anticipated the urgent question of our time: Who will attend to the machines’ writing?