A newly released research paper outlines a tool artists can use to make their work act like “poison” if it is ingested by an AI image generator
Hello again, dear readers. My name is Jon Keegan, and I’m an investigative data journalist here at The Markup. You may have read my reporting on how to read privacy policies, the companies that hoover up your personal data, and how you are packaged up as data for the online ad targeting industry.
Before my career pivoted to writing the words in news stories, I used to draw the illustrations that ran alongside them. As someone who comes from a background in visuals, I’ve been fascinated by the rise of generative AI text-to-image tools like Stable Diffusion, DALL-E, and Midjourney.
When I learned about how these tools were trained by ingesting literally billions of images from the web, I was surprised to see that some of my own images were part of the training set, included without any compensation to or approval by yours truly.
I’m far from alone. By and large, artists are not happy about how their work (and their signature styles) has been turned into prompts that deprive them of control over, and compensation for, their artwork. But now, a team of computer science researchers at the University of Chicago wants to level the playing field and arm artists with the tools they need to fight back against unauthorized use of their work in training new AI models.
The paper describes a new tool called “Nightshade” that can be used against these powerful image generators. Named after the deadly plant, Nightshade allows anyone to invisibly alter the pixels of an image to “poison” it. Combined with mislabeled metadata, the “poisoning attack” can cause image generators to produce incorrect results—such as making the prompt “photo of a dog” generate a photo of a cat.
I spoke with Shawn Shan, a graduate researcher and lead student author of the paper, titled “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” The paper was covered extensively in the media when it dropped on arXiv.org late last month. But I wanted to learn more about what Nightshade means for the fight over artists’ rights online, and the potential for the tool to touch off an arms race between creators and the developers of AI image generators, whose voracious appetite for data doesn’t figure to be sated any time soon.
This interview has been edited for clarity and brevity.
Jon Keegan: Can you tell me a little bit about the work that your team has been doing and what led you to build Nightshade?
Shawn Shan: We think at this time, there is really kind of a huge power asymmetry between artists or individual creators and big companies, right?
A big company just takes your data, and there’s nothing artists can really do. OK. So, how can we help? If you take my data, that’s fine, I can’t stop that, but I’ll inject a certain type of malicious or crafted data, so it will poison or damage your model if you take my data. And we designed it in such a way that it is very hard to separate the bad data from the good data on artists’ websites. So this can really give some incentives to both companies and artists to work together on this, right? Rather than a company just taking everything from artists because it can.
Keegan: It sounds like all the attacks that you lay out require the attacker to leave poisoned data in the path of the model that’s collecting data. So it’s too late for images that have already been scraped and fed into models, right? And it only works if someone uses Nightshade, posts an image online, and the image gets scraped at some point in the future?
Shan: That’s correct.
Keegan: Can you describe what one singular piece of poisoned data might look like?
Shan: So we discussed two types of attacks. One is very trivial: all I need to do is post a cat image and change the alt text to “a picture of a dog.” If you have enough of this, it kind of makes sense that the model will start associating “dog” with cat images.
But that’s fairly easy to remove, right? It’s very clear to a human, but also to many machine systems, that this is not correct. So we did some work where we tried to make an image that looks like a cat to a human, but to the model, it will look like it is actually a dog.
Keegan: Your paper describes how Nightshade could be used by artists as a defense against unauthorized use of their images. But it also proposes some fascinating examples of possible uses by companies. One example that you mention in the paper is how Nightshade could be used for advertising by manipulating a model to produce pictures of Tesla cars, for instance, when somebody types in “luxury cars” as a prompt. And you also suggest the idea that a company like Disney might use this to defend its intellectual property by replacing Disney characters in prompts with generic replacement characters. Has your team considered where this is all headed?
Shan: Yeah, absolutely. There are probably many use cases. But I think perhaps similar to the DRM [digital rights management] case, you know, you can protect copyright, but there are also tons of misuses of safeguarding people’s content using copyright in the past.
My take on this space is that it’s kind of about the power asymmetry. Right now, artists really have very limited power and anything will just help tremendously, right? There may be some collateral damage or some side effects of a certain company doing things, but what we think is this is worth it, just to give artists a tool to fight back.
Another take on this is that some entertainment companies—perhaps not Disney, but a small or medium-size game company—are also very concerned about AI taking their work. So this can probably help in those cases as well.
Keegan: What countermeasures might AI companies deploy to thwart tools like Nightshade?
Shan: We looked at quite a few kinds of detector mechanisms. Even though we’re trying to make the images look the same, there perhaps are ways to tell the difference, and [the companies that develop image generators] of course have tons of people to do this.
So you know, it’s possible for them to filter them out, say, OK, these are malicious data, let’s not train on them. In some sense, we also win in those cases because they remove the data that we don’t want them to train on, right?
So that’s also kind of a benefit in that case. But I feel like there may be some ways [companies] can train their model to be robust against attacks like that. It’s really unclear what they are doing these days, because they don’t really talk too much about it—whether this is actually a really big concern to them, or if they have ways to circumvent it.
But once we deploy, once we start exploring a little bit more, perhaps we’ll see how these companies feel about it.
Keegan: That leads me to my next question, which is, we’re seeing large companies like Adobe and Getty release AI tools that come with the reassurance that they have only been trained on licensed imagery. This week, OpenAI (creator of ChatGPT and DALL-E 3) announced that it is offering to help pay for any copyright lawsuits that its business-tier customers might face as a result of using its products. Considering the legal uncertainty, and now the potential for adversarial sabotage with tools like Nightshade, have we seen the last of the large-scale scraping efforts to train AI models on the open web?
Shan: So I think companies are definitely a lot more careful about what they do and what their services do. It’s kind of like, we don’t know where they’re getting the data at this point. But I was just playing with OpenAI’s tool yesterday. In their new model, they’re very careful. You’re not going to be able to use any artist’s name to prompt it unless they were born before the 20th century or something, and it is not able to generate any face images of anyone. [OpenAI says its latest model will not allow public figures in prompts.] So there are things they definitely are concerned about. And of course, it’s because of these lawsuits, because of these concerns.
So I wouldn’t be surprised if they stopped—perhaps temporarily—scraping these data sets, because they probably just have way too much data. But I think longer term, they kind of have to adapt their model, right? Your model can’t just be stuck in 2023; at some point you need to learn something new. So I would say they probably will still keep scraping these websites, perhaps a little bit more carefully. But we don’t know at this point.
Image: Gabriel Hongsdusit