Now that everything can be faked, how will we know what’s real?
In 2011, Hany Farid, a photo-forensics expert, received an e-mail from a bereaved father. Three years earlier, the man’s son had found himself on the side of the road with a car that wouldn’t start. When some strangers offered him a lift, he accepted. A few minutes later, for unknown reasons, they shot him. A surveillance camera had captured him as he walked toward their car, but the video was of such low quality that key details, such as faces, were impossible to make out. The other car’s license plate was visible only as an indecipherable jumble of pixels. The father could see the evidence that pointed to his son’s killers, just not clearly enough.
Farid had pioneered the forensic analysis of digital photographs in the late nineteen-nineties, and gained a reputation as a miracle worker. As an expert witness in countless civil and criminal trials, he explained why a disputed digital image or video had to be real or fake. Now, in his lab at Dartmouth, where he was a professor of computer science, he played the father’s video over and over, wondering if there was anything he could do. On television, detectives often “enhance” photographs, sharpening the pixelated face of a suspect into a detailed portrait. In real life, this is impossible. As the video had flowed through the surveillance camera’s “imaging pipeline” (the lens, the sensor, the compression algorithms), its data had been “downsampled,” and, in the end, very little information remained. Farid told the father that the degradation of the image couldn’t be reversed, and the case languished, unsolved.
A few months later, though, Farid had a thought. What if he could use the same surveillance camera to photograph many, many license plates? In that case, patterns might emerge: correspondences between the jumbled pixels and the plates from which they derived. The correspondences would be incredibly subtle: the particular blur of any degraded image would depend not just on the plate numbers but also on the light conditions, the design of the plate, and many other variables. Still, if he had access to enough images (hundreds of thousands, perhaps millions), patterns might emerge.
Such an undertaking seemed impractical, and for a while it was. But a new field, “image synthesis,” was coming into focus, in which computer graphics and A.I. were combined. Progress was accelerating. Researchers were discovering new ways to use neural networks—software systems based, loosely, on the architecture of the brain—to analyze and create images and videos. In the emerging world of “synthetic media,” the work of digital-image creation—once the domain of highly skilled programmers and Hollywood special-effects artists—could be automated by expert systems capable of producing realism on a vast scale.
In a media environment saturated with fake news, such technology has disturbing implications. Last fall, an anonymous Redditor with the username Deepfakes released a software tool kit that allows anyone to make synthetic videos in which a neural network substitutes one person’s face for another’s, while keeping their expressions consistent. Along with the kit, the user posted pornographic videos, now known as “deepfakes,” that appear to feature various Hollywood actresses. (The software is complex but comprehensible: “Let’s say for example we’re perving on some innocent girl named Jessica,” one tutorial reads. “The folders you create would be: ‘jessica; jessica_faces; porn; porn_faces; model; output.’”) Around the same time, “Synthesizing Obama,” a paper published by a research group at the University of Washington, showed that a neural network could create believable videos in which the former President appeared to be saying words that were really spoken by someone else. In a video voiced by Jordan Peele, Obama seems to say that “President Trump is a total and complete dipshit,” and warns that “how we move forward in the age of information” will determine “whether we become some kind of fucked-up dystopia.”
Not all synthetic media is dystopian. Recent top-grossing movies (“Black Panther,” “Jurassic World”) are saturated with synthesized images that, not long ago, would have been dramatically harder to produce; audiences were delighted by “Star Wars: The Last Jedi” and “Blade Runner 2049,” which featured synthetic versions of Carrie Fisher and Sean Young, respectively. Today’s smartphones digitally manipulate even ordinary snapshots, often using neural networks: the iPhone’s “portrait mode” simulates what a photograph would have looked like if it had been taken by a more expensive camera. Meanwhile, for researchers in computer vision, A.I., robotics, and other fields, image synthesis makes whole new avenues of investigation accessible.
Farid started by sending his graduate students out on the Dartmouth campus to photograph a few hundred license plates. Then, based on those photographs, he and his team built a “generative model” capable of synthesizing more. In the course of a few weeks, they produced tens of millions of realistic license-plate images, each one unique. Then, by feeding their synthetic license plates through a simulated surveillance camera, they rendered them indecipherable. The aim was to create a Rosetta Stone, connecting pixels to plate numbers.
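The pairing of synthetic plates with their degraded versions can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method Farid’s team used: the `render_plate` function is a toy stand-in for a real plate renderer, the block-averaging and Gaussian noise in `degrade` are an assumed, much-simplified camera model, and every name and parameter here is hypothetical.

```python
import random
import string


def render_plate(text, size=8):
    """Toy stand-in for a plate renderer (hypothetical).

    Each character becomes a size x size block of pixels; row i of the
    block is white (1.0) when bit i of the character code is set.
    """
    rows = [[] for _ in range(size)]
    for ch in text:
        code = ord(ch)
        for i in range(size):
            rows[i].extend([float((code >> i) & 1)] * size)
    return rows


def degrade(image, factor=4, noise=0.05, rng=random):
    """Assumed camera model: average factor x factor blocks, then add noise."""
    out = []
    for r in range(0, len(image), factor):
        row = []
        for c in range(0, len(image[0]), factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block) + rng.gauss(0.0, noise))
        out.append(row)
    return out


def make_training_pairs(count, seed=0):
    """Return (degraded image, plate text) pairs for random six-character plates."""
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + string.digits
    pairs = []
    for _ in range(count):
        label = "".join(rng.choice(alphabet) for _ in range(6))
        pairs.append((degrade(render_plate(label), rng=rng), label))
    return pairs


pairs = make_training_pairs(1000)
```

Each pair holds a tiny, noisy image alongside the plate text that produced it, which is the kind of pixels-to-characters correspondence a network could be trained on.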
Next, they began “training” a neural network to interpret those degraded images. Modern neural networks are multilayered, and each layer juggles millions of variables; tracking the flow of information through such a system is like following drops of water through a waterfall. Researchers, unsure of how their creations work, must train them by trial and error. It took Farid’s team several attempts to perfect theirs. Eventually, though, they presented it with a still from the video. “The license plate was like ten pixels of noise,” Farid said. “But there was still a signal there.” Their network was “pretty confident about the last three characters.”
This summer, Farid e-mailed those characters to the detective working the case. Investigators had narrowed their search to a subset of blue Chevy Impalas; the network pinpointed which one. Someone connected to the car turned out to have been involved in another crime. A case that had lain dormant for nearly a decade is now moving again. Farid and his team, meanwhile, published their results in a computer-vision journal. In their paper, they noted that their system was a free upgrade for millions of low-quality surveillance cameras already in use. It was a paradoxical outcome typical of the world of image synthesis, in which unreal images, if they are realistic enough, can lead to the truth.
Farid is in the process of moving from Dartmouth to the University of California, Berkeley, where his wife, the psychologist Emily Cooper, studies human vision and virtual reality. Their modernist house, perched in the hills above the Berkeley campus, is enclosed almost entirely in glass; on a clear day this fall, I could see through the living room to the Golden Gate Bridge. At fifty-two, Farid is gray-haired, energized, and fit. He invited me to join him on the deck. “People have been doing synthesis for a long time, with different tools,” he said. He rattled off various milestones in the history of image manipulation: the transposition, in a famous photograph from the eighteen-sixties, of Abraham Lincoln’s head onto the body of the slavery advocate John C. Calhoun; the mass alteration of photographs in Stalin’s Russia, designed to purge his enemies from the history books; the convenient realignment of the pyramids on the cover of National Geographic, in 1982; the composite photograph of John Kerry and Jane Fonda standing together at an anti-Vietnam demonstration, which incensed many voters after the Times credulously reprinted it, in 2004, above a story about Kerry’s antiwar activities.
“In the past, anybody could buy Photoshop. But to really use it well you had to be highly skilled,” Farid said. “Now the technology is democratizing.” It used to be safe to assume that ordinary people were incapable of complex image manipulations. Farid recalled a case—a bitter divorce—in which a wife had presented the court with a video of her husband at a café table, his hand reaching out to caress another woman’s. The husband insisted it was fake. “I noticed that there was a reflection of his hand in the surface of the table,” Farid said, “and getting the geometry exactly right would’ve been really hard.” Now convincing synthetic images and videos were becoming easier to make.
Farid speaks with a technologist’s enthusiasm and a lawyer’s wariness. “Why did Stalin airbrush those people out of those photographs?” he asked. “Why go to the trouble? It’s because there is something very, very powerful about the visual image. If you change the image, you change history. We’re incredibly visual beings. We rely on vision—and, historically, it’s been very reliable. And so photos and videos still have this incredible resonance.” He paused, tilting back into the sun and raising his hands. “How much longer will that be true?”
One of the world’s best image-synthesis labs is a seven-minute drive from Farid’s house, on the north side of the Berkeley campus. The lab is run by a forty-three-year-old computer scientist named Alexei A. Efros. Efros was born in St. Petersburg; he moved to the United States in 1989, when his father, a winner of the U.S.S.R.’s top prize for theoretical physics, got a job at the University of California, Riverside. Tall, blond, and sweetly genial, he retains a Russian accent and sense of humor. “I got here when I was fourteen, but, really, one year in the Soviet Union counts as two,” he told me. “I listened to classical music—everything!”
As a teenager, Efros learned to program on a Soviet PC, the Elektronika BK-0010. The system stored its programs on audiocassettes and, every three hours, overheated and reset; since Efros didn’t have a tape deck, he learned to code fast. He grew interested in artificial intelligence, and eventually gravitated toward computer vision—a field that allowed him to watch machines think.
In 1998, when Efros arrived at Berkeley for graduate school, he began exploring a problem called “texture synthesis.” “Let’s say you have a small patch of visual texture and you want to have more of it,” he said, as we sat in his windowless office. Perhaps you want a dungeon in a video game to be made of moss-covered stone. Because the human visual system is attuned to repetition, simply “tiling” the walls with a single image of stone won’t work. Efros developed a method for intelligently sampling bits of an image and probabilistically recombining them so that a texture could be indefinitely and organically extended. A few years later, a version of the technique became a tool in Adobe Photoshop called “content-aware fill”: you can delete someone from a pile of leaves, and new leaves will seamlessly fill in the gap.
From the front row of CS 194-26, Image Manipulation and Computational Photography, I watched as Efros, dressed in a blue shirt, washed jeans, and black boots, explained to about a hundred undergraduates how the concept of “texture” could be applied to media other than still images. Efros started his story in 1948, with the mathematician Claude Shannon, who invented information theory. Shannon had envisioned taking all the books in the English language and analyzing them in order to discover which words tended to follow which other words. He thought that probability tables based on this analysis might enable the construction of realistic English sentences.
“Let’s say that we have the words ‘we’ and ‘need,’ ” Efros said, as the words appeared on a large screen behind him. “What’s the likely next word?”
The students murmured until Efros advanced to the next slide, revealing the word “to.”
“Now let’s say that we move our contextual window,” he continued. “We just have ‘need’ and ‘to.’ What’s next?”
“Sleep!” one student said.
“Eat!” another said.
“Eat” appeared onscreen.
“If our data set were a book about the French Revolution, the next word might be ‘cake,’ ” Efros said, chuckling. “Now, what is this? You guys use it all the time.”
“Autocomplete!” a young man said.
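The classroom exchange can be sketched in Python as a table of word counts. This is a minimal illustration of the idea of predicting the next word from the previous two, not Shannon’s or Efros’s own code; the function names and the tiny corpus are invented for the example.

```python
from collections import Counter, defaultdict


def build_table(text, context_size=2):
    """Count which word follows each run of `context_size` words."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for i in range(len(words) - context_size):
        context = tuple(words[i:i + context_size])
        table[context][words[i + context_size]] += 1
    return table


def predict_next(table, context):
    """Return the most frequent follower of `context`, or None if unseen."""
    followers = table.get(tuple(w.lower() for w in context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = ("we need to eat . we need to eat . "
          "we need to sleep . they need to eat cake")
table = build_table(corpus)

print(predict_next(table, ["we", "need"]))   # to
print(predict_next(table, ["need", "to"]))   # eat
```

With a different corpus the counts, and so the predictions, would change, which is the point Efros makes about a book on the French Revolution.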
November 12, 2018