Meet Nightshade—A Tool Empowering Artists to Fight Back Against AI


A newly released research paper outlines a tool artists can use to make their work act like “poison” if it is ingested by an AI image generator

Hello again, dear readers. My name is Jon Keegan, and I’m an investigative data journalist here at The Markup. You may have read my reporting on how to read privacy policies, the companies that hoover up your personal data, and how you are packaged up as data for the online ad targeting industry.

Before my career pivoted to writing the words in news stories, I used to draw the illustrations that ran alongside them. As someone who comes from a background in visuals, I’ve been fascinated by the rise of generative AI text-to-image tools like Stable Diffusion, DALL-E, and Midjourney.

When I learned about how these tools were trained by ingesting literally billions of images from the web, I was surprised to see that some of my own images were part of the training set, included without any compensation to or approval by yours truly.

I’m far from alone. By and large, artists are not happy about how their work (and their signature styles) has been turned into prompts that deprive them of control over and compensation for their artwork. But now, a team of computer science researchers at the University of Chicago wants to level the playing field and arm artists with the tools they need to fight back against unauthorized use of their work in training new AI models.

The paper describes a new tool called “Nightshade,” which can be used against these powerful image generators. Named after the deadly herb, Nightshade allows anyone to invisibly alter the pixels of an image to “poison” it. Combined with mislabeled metadata, this “poisoning attack” can push image generators toward incorrect results, such as making the prompt “photo of a dog” generate a photo of a cat.
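To make the “mislabeled metadata” part of that concrete, here is a minimal, hypothetical illustration of what a single dirty-label poisoned training pair could look like once a scraper collects it. The field names and URL are invented for this example and do not come from the Nightshade paper.

```python
# Hypothetical dirty-label poisoned sample: the image actually shows a cat,
# but the caption a scraper would collect claims it is a dog.
# Field names and the URL are invented for illustration only.
poisoned_sample = {
    "image_url": "https://example.com/uploads/my-cat-photo.png",
    "alt_text": "a photo of a dog",  # deliberately wrong label
}

# If enough pairs like this end up in a training set, the model begins to
# associate the word "dog" with cat-like images.
```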

I spoke with Shawn Shan, a graduate researcher and lead student author of the paper, titled “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” The paper was covered extensively in the media when it dropped on arXiv.org late last month. But I wanted to learn more about what Nightshade means for the fight over artists’ rights online, and the potential that the tool could touch off an arms race between creators and the developers of AI image generators, whose voracious appetite for data doesn’t figure to be sated any time soon.

The interview was edited for clarity and brevity.

Jon Keegan: Can you tell me a little bit about the work that your team has been doing and what led you to build Nightshade?

Shawn Shan: We think at this time, there is really kind of a huge power asymmetry between artists or individual creators and big companies, right?

A big company just takes your data and there’s nothing artists can really do. OK. So, how can we help? If you take my data, that’s fine. I can’t stop that, but I’ll inject a certain type of malicious or crafted data, so it will poison or damage your model if you take my data. And we designed it in such a way that it is very hard to separate what is bad data and what is good data from artists’ websites. So this can really give some incentives to both companies and artists to work together on this thing, right? Rather than just a company taking everything from artists because they can.

Keegan: It sounds like all the attacks that you lay out require the attacker to leave poisoned data in the path of the model that’s collecting data. So it’s too late for images that have already been scraped and fed into models, right? And it only works if someone uses Nightshade, posts an image online, and the image gets scraped at some point in the future?

Shan: That’s correct. 

Keegan: Can you describe what a single piece of poisoned data might look like?

Shan: So we discussed two types of attacks. One is just very trivial: all I need to do is post a cat image and change the alt text to “a picture of a dog,” and, if you have enough of this, it makes sense that the model will start associating “dog” with, you know, cat images.

But that’s fairly easy to remove, right? It’s very clear to a human, but also to many machine systems, that this is not correct. So we did some work where we tried to make a cat image that still looks like a cat to a human, but that the model will think is actually a dog.
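For readers who want a rough picture of that second, harder-to-detect approach, the sketch below shows the general idea of optimizing a small, nearly invisible perturbation so an image’s features drift toward a different concept. It is a simplified illustration in PyTorch, not the algorithm from the Nightshade paper; the encoder, step count, and pixel budget are stand-ins chosen for this example.

```python
# Simplified sketch of a perturbation-based ("clean label") poisoning step.
# NOT the published Nightshade method: the encoder is a stand-in for whatever
# feature extractor a real attack would align against.
import torch
import torch.nn as nn

def poison_image(image, target_image, encoder, steps=200, lr=0.01, budget=8 / 255):
    """Return `image` plus a small perturbation whose features resemble `target_image`."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    with torch.no_grad():
        target_features = encoder(target_image)      # features of the "wrong" concept (e.g., dog)

    for _ in range(steps):
        optimizer.zero_grad()
        poisoned = (image + delta).clamp(0.0, 1.0)   # keep pixels in a valid range
        loss = nn.functional.mse_loss(encoder(poisoned), target_features)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)            # keep the change visually imperceptible

    return (image + delta).clamp(0.0, 1.0).detach()

# Toy usage with random tensors and a dummy encoder, just to show the shapes.
if __name__ == "__main__":
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
    cat_image = torch.rand(1, 3, 64, 64)
    dog_image = torch.rand(1, 3, 64, 64)
    poisoned_cat = poison_image(cat_image, dog_image, encoder)
```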

Keegan: Your paper describes how Nightshade could be used by artists as a defense against unauthorized use of their images. But it also proposes some fascinating examples of possible uses by companies. One example that you mentioned in the paper is how Nightshade could be used for advertising by manipulating a model to produce pictures of Tesla cars, for instance, when somebody types in “luxury cars” as a prompt. And you also suggest that a company like Disney might use this to defend its intellectual property by replacing Disney characters in prompts with generic replacement characters. Has your team considered where this is all headed?

Shan: Yeah, absolutely. There are probably many use cases. But I think it’s perhaps similar to the DRM [digital rights management] case: you can protect copyright, but in the past there have also been tons of misuses of copyright to safeguard people’s content.

My take on this space is that it’s kind of about the power asymmetry. Right now, artists really have very limited power, and anything will just help tremendously, right? There may be some collateral damage or some side effects of a certain company doing things, but we think it is worth it, just to give artists a tool to fight back.

Another take on this is that some of those entertainment companies, perhaps not Disney, but a small or medium-size game company, are also very concerned about AI taking their work. So this can probably help in those cases as well.

Keegan: What countermeasures might AI companies deploy to thwart tools like Nightshade?

Shan: We looked at quite a few kinds of detection mechanisms. Even though we’re trying to make the images look the same, there perhaps are ways to tell the difference, and [the companies that develop image generators] of course have tons of people to do this.

So you know, it’s possible for them to filter them out and say, OK, these are malicious data, let’s not train on them. In some sense, we also win in those cases, because they remove the data that we don’t want them to train on, right?

So that’s also kind of a benefit in that case. But I feel like there may be some ways [companies] can train their model to be robust against attacks like that. It’s really unclear what they are doing these days, because they don’t really talk much about it, so it’s hard to see whether this is actually a really big concern to them or if they have ways to circumvent it.

But once we deploy, once we start exploring a little bit more, perhaps we’ll see how these companies feel about it.

Keegan: That leads me to my next question, which is: we’re seeing large companies like Adobe and Getty release AI tools that come with the reassurance that they have only been trained on licensed imagery. This week OpenAI (creator of ChatGPT and DALL-E 3) announced that it is offering to help pay for any copyright lawsuits that its business-tier customers might face as a result of using its products. Considering the legal uncertainty, and now the potential for adversarial sabotage with tools like Nightshade, have we seen the last of the large-scale scraping efforts to train AI models on the open web?

Shan: So I think companies are definitely a lot more careful about what they do and what their services do. It’s kind of like, we don’t know where they are getting the data at this point. But I was just playing with OpenAI yesterday. In their new model, they’re very careful. Like, you’re not going to be able to use any artist’s name to prompt it unless they were born before the 20th century or something, and it is not able to generate face images of anyone [OpenAI says its latest model will not allow public figures in prompts]. So there are things they definitely are concerned about. And of course, it’s because of these lawsuits, because of these concerns.

So I wouldn’t be surprised if they stopped, perhaps temporarily, scraping these data sets, because they probably just have way too much data already. But I think longer term, they kind of have to adapt their model, right? Your model can’t just be stuck in 2023; at some point you need to learn something new. So I would say they probably will still keep scraping these websites, perhaps a little bit more carefully. But we don’t know at this point.

Image: Gabriel Hongsdusit