What is Metadata, and Why Should We Care?

In June 2013, when Edward Snowden began to unveil the nature and extent of the National Security Agency’s activities, the NSA’s telephone metadata program was the first big revelation. Metadata, no doubt, was an unfamiliar notion to many at the time. Indeed, it typically appeared in early news reports on the NSA program in scare quotes, as “metadata,” suggesting both its unfamiliarity and its technical application. Since those first Snowden revelations, we have learned that the NSA’s “metadata” activities have not been restricted to phone calls, but have extended to the farthest reaches of the internet, from Angry Birds games to webcams. All this metadata intelligence might lead us to ask, what is metadata?


The most recent entry on “metadata” in Wikipedia puts the matter about as clearly as it can be stated: “Metadata is ‘data about data.’ The term is ambiguous.” Indeed. “Metadata” is a word bestowed on us by the information sciences and their close collaborators, the computer sciences. It is a key word of the information age.

The most common form of metadata for most of us is found in library catalogs. When I search “Adventures of Huckleberry Finn” in the Library of Congress online catalog, I will eventually get to a page that tells me not only where the book is located in the library, but also such information as its author, its publisher, its length, its ISBN, and so on. Such information is metadata: “data” (author, publisher, etc.) about “data” (the published book called “Adventures of Huckleberry Finn”). But “metadata” is found in other contexts, too. If you were to look at the source code behind this blog page, for example, you’d find metadata, in this case data that reveals what publishing platform is being used, what kind of devices the blog page is designed to work with, and the title of the blog. Here again we have “data” about “data.” In these instances, “metadata” is a means of organizing bits of what can be thought of as “primary” information (a book or a blog post) by means of “secondary” bits of information: categories, descriptions, and so on. “Meta” in metadata lays out this primary-secondary structure, referring to the fact that the metadata is “about” other data.
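To make the blog example concrete, here is a minimal sketch of how the “secondary” data in a page’s source can be read separately from the “primary” post. It uses Python’s standard html.parser module. The sample page, the platform name “ExamplePress,” and the helper names are all hypothetical, invented only for illustration; they are not taken from this blog’s actual source.

```python
from html.parser import HTMLParser

# Hypothetical page source. The values are illustrative only and are
# not taken from any real blog.
SAMPLE_PAGE = """<html><head>
<title>An Example Blog</title>
<meta name="generator" content="ExamplePress 1.0">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head><body><p>The post itself: the "primary" data.</p></body></html>"""


class MetadataParser(HTMLParser):
    """Collects <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "name" in attributes:
            # e.g. name="generator" names the publishing platform
            self.metadata[attributes["name"]] = attributes.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <title> is kept; the body text is ignored entirely.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return the page's metadata as a dictionary, ignoring its content."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


page_metadata = extract_metadata(SAMPLE_PAGE)
for key, value in page_metadata.items():
    print(f"{key}: {value}")
```

Note that the sketch never reads the paragraph in the body of the page. Everything it reports is “about” the post rather than the post itself, which is the primary-secondary structure described above.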

In the case of the NSA activities, however, “metadata” means something a bit more slippery, and is arguably a misnomer. Whereas in the library catalog and the blog post metadata is secondary data about primary data, in the NSA’s program metadata is, for all intents and purposes, primary data. In the case of the phone metadata program, the NSA has engaged in the dragnet collection and storage of a wide range of data about particular phone calls: what number was called, when, for how long, the nearest cell tower, and more technical information like the International Mobile Subscriber Identity. While such information is technically and justifiably called “metadata,” as it is secondary data “about” the primary data (the voice content of a particular phone conversation), practically speaking such metadata functions as primary data. Even if the NSA were given sanction by the Foreign Intelligence Surveillance Court to record the voice conversations of phone calls, we could not conclude that such data would be any more “primary” than the sort of metadata it is already collecting. As David Cole has recently argued in The New York Review of Books, the sort of “metadata” that can be gathered about our various communications, whether on the phone or through the Internet, can tell intelligence analysts as much about us as eavesdropping on our phone calls. The metadata may be of a different order (more about daily patterns, social networks, and financial transactions than about intimate words), but as anthropologists and sociologists have known for decades, with regard to learning about a life, such information is just as valuable as listening to the intimations of a person’s voice.
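A small sketch may help show why call metadata can function as primary data. The record fields below follow the kinds of information listed above (number called, time, duration, nearest cell tower). Everything else is an assumption made for illustration: the phone numbers, tower labels, and dates are invented, the helper names are hypothetical, and the 10 p.m. to 6 a.m. window is an arbitrary choice. None of it describes how any agency actually stores or analyzes such records.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One call's metadata. No voice content is stored anywhere."""
    caller: str
    callee: str
    start: datetime
    duration_seconds: int
    cell_tower: str


# Invented sample records, for illustration only.
RECORDS = [
    CallRecord("555-0100", "555-0199", datetime(2013, 6, 3, 23, 15), 1200, "TOWER-A"),
    CallRecord("555-0100", "555-0199", datetime(2013, 6, 4, 23, 40), 900, "TOWER-A"),
    CallRecord("555-0100", "555-0142", datetime(2013, 6, 5, 9, 5), 60, "TOWER-B"),
    CallRecord("555-0100", "555-0199", datetime(2013, 6, 5, 23, 30), 1500, "TOWER-A"),
    CallRecord("555-0177", "555-0100", datetime(2013, 6, 6, 12, 0), 300, "TOWER-C"),
]


def contact_frequency(records, caller):
    """Count how often a given caller dialed each number."""
    return Counter(r.callee for r in records if r.caller == caller)


def late_night_calls(records, caller):
    """Return the caller's calls that began between 10 p.m. and 6 a.m."""
    return [
        r for r in records
        if r.caller == caller and (r.start.hour >= 22 or r.start.hour < 6)
    ]


frequency = contact_frequency(RECORDS, "555-0100")
night_calls = late_night_calls(RECORDS, "555-0100")
closest_contact, call_count = frequency.most_common(1)[0]

print(f"Most frequent contact: {closest_contact} ({call_count} calls)")
print(f"Late-night calls: {len(night_calls)}")
```

Without a single recorded word, the sketch surfaces a closest contact and a nightly routine. These are the daily patterns and social networks that the paragraph above describes.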

“Metadata,” however, matters greatly to the NSA as a term because it is the means by which it has legally and politically justified its activities, both in secret before the Foreign Intelligence Surveillance Court and publicly through proxies in Congress, the Department of Defense, and indeed the White House. We are not listening in on people’s conversations, the NSA has said, but merely gathering broad swaths of data about data, “metadata.”

But we should stop and ask whether it is correct to call all this data “metadata” at all. If, for a company like Amazon, the most valuable and revealing information about its customers is not the specific words a person reads in the books she buys, or the particular images she views in the television shows she watches through Amazon Prime, but rather her more general patterns of reading and viewing, then “metadata” has the status and significance of “primary” data. That is, metadata is the moral equivalent of personal information, the kind of personal information that, historically speaking, privacy laws have sought to protect.

There has been ample talk about revising privacy laws to correct the apparent overreach of the NSA and other intelligence agencies. But perhaps we need first to consider whether the vocabulary of the information age is adequate to the texture of our social and political lives within it.

Ned O'Gorman is an Associate Professor in the Department of Communication at the University of Illinois, Urbana-Champaign, working at the intersections of the history of rhetoric, rhetorical theory, and political thought, with special interest in the crises and tensions of modernity, especially in the Cold War and in early modernity. He is the author of Spirits of the Cold War: Contesting Worldviews in the Classical Age of American Security Strategy (2011, Michigan State University Press).