The adtech industry tracks most of what you do on the Internet. This file shows just how much.

Survei­llance adver­ti­sing in Europe

The adver­ti­sing industry has more than 650,000 labels to target people. Reading through them reve­als how even the most sensi­tive aspects of our life are moni­to­red. EU-based data brokers play a vital role in this system.

Everyt­hing we do on the Inter­net is being recor­ded and analy­zed in order to achi­eve one goal: to show us targe­ted adver­ti­sing. This is a reality to which many people have become accus­to­med in exchange for free servi­ces. Howe­ver, very few people unders­tand exac­tly where our data ends up when we visit websi­tes, use apps or make digi­tal payments. Targe­ted adver­ti­sing moves in myste­ri­ous ways. That’s anot­her fact we’ve become accus­to­med to.

An inves­ti­ga­tion by netz­po­li­tik.org is set to change this funda­men­tal imba­lance between the adtech industry and inter­net users. In June, we publis­hed a series of arti­cles shining a light on the collec­tion, trade and use of perso­nal data in the global adtech industry. We analy­zed an inven­tory file from a US-based data market­place called Xandr. The file contains more than 650,000 so-called audi­ence segments. These are used by adver­ti­sing compa­nies to cate­go­rize and target billi­ons of people.

The scope and detail of this data collec­tion is stag­ge­ring. There is hardly a human charac­te­ris­tic that adver­ti­sers do not want to exploit for their purpo­ses. Want to reach people in Denmark who have bought a Toyota? No problem. Itali­ans with finan­cial problems? No problem. Minors in Austria? Hard­core Chris­ti­ans in Portu­gal? Preg­nant women in Poland? Fragile seni­ors in France? Queers in Spain? No problem.

“Unacceptably intrusive”

“It’s the largest piece of evidence I’ve ever seen about what I call today’s distri­bu­ted survei­llance economy”, says Wolfie Christl, a privacy rese­ar­cher at Crac­ked Labs. He disco­ve­red the file and shared it with netz­po­li­tik.org and US-based non-profit news website The Markup. “The repor­ting confirms that the global survei­llance ads industry is unac­cep­tably intru­sive and poses a threat to demo­cracy”, comments Jan Penfrat from Euro­pean Digi­tal Rights orga­ni­za­tion EDRi.

The file, which dates back to May 2021, shows meta­data for a total of 651,463 audi­ence segments. It inclu­des a name and number for each segment, as well as the company that provi­ded it to Xandr and an ID for that data provi­der. It looks like this (extract):

  • Lotame | 422 | 4073004 | International_EU – France Alcoholic Beverages
  • Lotame | 422 | 4073353 | International_EU – France Automobile Brands – Land Rover
  • Lotame | 422 | 4073669 | International_EU – France Browser Language – Arabic
  • Lotame | 422 | 4073677 | International_EU – France Credit Level – Poor
  • Lotame | 422 | 4073768 | International_EU – France Dads
  • Lotame | 422 | 4072781 | International_EU – France Forum Readers
  • Lotame | 422 | 4072930 | International_EU – France Military
  • Lotame | 422 | 4073729 | International_EU – France Relationship Status – Divorced
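
For readers who want to work with rows like these themselves, here is a minimal Python sketch that splits one row of the extract into its four fields. It assumes the pipe-separated layout shown in the extract; the actual layout of the archived file may differ, and the field names (provider_name, provider_id, segment_id, segment_name) are our own labels for illustration, not Xandr's.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    # Field names are illustrative labels, not taken from the file itself.
    provider_name: str
    provider_id: int
    segment_id: int
    segment_name: str


def parse_row(line: str) -> Segment:
    """Parse one 'provider | provider ID | segment ID | name' row."""
    # Drop an optional leading bullet and surrounding whitespace.
    cleaned = line.strip().lstrip("•").strip()
    # Split on the first three pipes only, so the name is kept whole.
    provider_name, provider_id, segment_id, segment_name = (
        part.strip() for part in cleaned.split("|", 3)
    )
    return Segment(provider_name, int(provider_id), int(segment_id), segment_name)


# Two rows copied from the extract above.
rows = [
    "  • Lotame | 422 | 4073004 | International_EU – France Alcoholic Beverages",
    "  • Lotame | 422 | 4073677 |International_EU – France Credit Level – Poor",
]
segments = [parse_row(row) for row in rows]
for segment in segments:
    print(segment.segment_id, segment.segment_name)
```

Stripping each field after the split means that rows with a missing space after a pipe are parsed the same way as the others.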

Audience segments work like giant containers for groups of people who are likely to share a common characteristic. That might be demographics, interests, consumer behavior or personality traits. Segments can also carry additional information about what apps and websites we use, where we go, what we believe and what illnesses we have. Adtech companies collect and trade these segments like commodities, meaning that people’s data often passes through the hands of dozens or hundreds of companies.

40,000 segments on EU coun­tries

Targe­ted adver­ti­sing is an industry worth more than 550 billion US dollars. Xandr is one of the most impor­tant infras­truc­tu­res in this ecosys­tem for those who don’t want to depend on the walled adver­ti­sing gardens of Google, Meta or Amazon. In 2022, Micro­soft acqui­red Xandr from U.S. tele­com­mu­ni­ca­ti­ons provi­der AT&T. Neit­her Xandr nor Micro­soft respon­ded to multi­ple press inqui­ries from netz­po­li­tik.org and The Markup.

The file in question, containing over 650,000 segments, provides a rare insight into the global advertising surveillance economy, not only in the US but also in Europe. It was hidden on a documentation page for advertising clients, but was accessible to anyone via the open web. It was taken down shortly after our initial email to Microsoft and Xandr. An archived version of the website and the file [23 MB] can still be found at the Internet Archive.

Our analysis is far from complete. We encourage everyone to make their own assessment of the file. Researchers or journalists wishing to discuss ideas or findings can contact Ingo Dachwitz at ingo[dot]dachwitz[at]netzpolitik[dot]org.

While the vast majo­rity of segments do not refer to a speci­fic country, tens of thou­sands do have such a refe­rence. Some have a country code such as „ES“ for Spain in their name. The segments in the file cover most regi­ons of the world, showing that adtech survei­llance is global.

The largest group of segments with an explicit country reference relates to the European Union. According to our analysis, about 40,000 segments mention EU countries or nationalities, while the file contains about 30,000 segments that explicitly mention the United States. Thousands of segments refer to Australia and South America, and some even mention China or African countries like Nigeria.

However, as Xandr is a US-based data marketplace, it can be assumed that the majority of segments without an explicit country reference target people in the United States.

Data on almost every Euro­pean citi­zen

Our analysis of the file shows that within the European Union, the countries mentioned most often are France and Spain, each with around 9,000 segments. They are followed by segments mentioning Germany (about 6,000), Portugal (about 4,500), Italy (about 3,500), the Netherlands (about 3,000), Sweden (about 1,500) and Denmark (about 1,000).
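
Anyone who wants to attempt a similar count could start with simple keyword matching on segment names. The Python sketch below is one possible approach under our own assumptions, not necessarily the method behind the figures above. The keyword lists are illustrative, and two of the sample names are invented to show how short country codes are handled.

```python
import re
from collections import Counter

# Illustrative keyword lists; a real analysis would need many more terms.
COUNTRY_KEYWORDS = {
    "France": ["France", "FR"],
    "Spain": ["Spain", "ES"],
}


def mentions(term: str, name: str) -> bool:
    # Match the term only when it is not embedded in a longer word,
    # so that "ES" does not match inside "SALES".
    pattern = rf"(?<![A-Za-z]){re.escape(term)}(?![A-Za-z])"
    return re.search(pattern, name) is not None


def count_country_mentions(names, keywords):
    counts = Counter()
    for name in names:
        for country, terms in keywords.items():
            # Count each segment at most once per country.
            if any(mentions(term, name) for term in terms):
                counts[country] += 1
    return counts


names = [
    # Copied from the extract earlier in this article.
    "International_EU – France Alcoholic Beverages",
    "International_EU – France Dads",
    "International_EU – France Military",
    # Hypothetical names, invented for this example only.
    "ES_Example Segment",
    "SALES Leads",
]
counts = count_country_mentions(names, COUNTRY_KEYWORDS)
print(dict(counts))
```

Two-letter codes are ambiguous even with this check, so results from such a count should be treated as rough estimates and spot-checked by hand.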

The file does not contain infor­ma­tion on how many diffe­rent entries each segment contains. Howe­ver, it is known that there can be hundreds of thou­sands or even milli­ons of diffe­rent IDs in one segment. Oracle alone, the largest provi­der in the Xandr list with more than 200,000 segments, claims to have data on more than five billion people. It is there­fore reaso­na­ble to assume that the adtech industry holds data on most of the citi­zens of the Euro­pean Union.

It is important to note that segments typically do not contain people’s names, but individual IDs linked to people’s devices. Those could be mobile ad IDs, IP addresses, browser fingerprints or cookie IDs. Adtech companies stress that this means they are working with pseudonymized data, sometimes even falsely claiming that their data is “anonymized”. Nevertheless, the IDs allow ad companies to recognize the devices associated with people with certain characteristics anywhere in the online advertising ecosystem.
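
To illustrate why pseudonymous IDs still allow profiling, the following Python sketch models segments as sets of IDs and then inverts them into a per-ID list of labels. It is a simplified model under our own assumptions, not a description of any company's actual system. The two segment names come from the extract earlier in this article; the IDs are invented.

```python
from collections import defaultdict

# Segment name -> set of pseudonymous IDs. The IDs are hypothetical.
segment_members = {
    "International_EU – France Dads": {"cookie-a1", "maid-b2"},
    "International_EU – France Credit Level – Poor": {"maid-b2", "cookie-c3"},
}


def labels_per_id(members):
    """Invert segment -> IDs into ID -> segments."""
    by_id = defaultdict(set)
    for segment, ids in members.items():
        for pseudo_id in ids:
            by_id[pseudo_id].add(segment)
    return dict(by_id)


labels_by_id = labels_per_id(segment_members)
# A device seen with the ID "maid-b2" carries both labels, without a name.
print(sorted(labels_by_id["maid-b2"]))
```

No name is needed at any point: whoever sees the same ID again can look up every label attached to it.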

Adver­ti­sers can use the segments through so-called demand-side plat­forms to target the audi­en­ces they want, but they don’t usually get access to this raw data. That’s why many compa­nies in the industry reject the term „data broker“ to describe their busi­ness. They prefer to call them­sel­ves tech­no­logy plat­forms, adver­ti­sing infras­truc­ture service provi­ders or loca­tion inte­lli­gence plat­forms. But these compa­nies take data from dispa­rate sour­ces, reor­ga­nize and repac­kage that data, help track and reach people across diffe­rent devi­ces, and offer it to other compa­nies for use in exchange for money or other econo­mic bene­fit. That’s why we chose to call these compa­nies data brokers.

Time and again, it also comes to light that some of these compa­nies do sell raw data after all, for exam­ple to the FBI or to the United States Immi­gra­tion and Customs Enfor­ce­ment.

Merce­des, mothers and mili­tary

It is diffi­cult to unders­tand the file in its enti­rety. This is not only due to the size of the segment collec­tion, but also because the cate­gory names are struc­tu­red very diffe­rently from one data vendor to anot­her. The list also contains some segments crea­ted speci­fi­cally for indi­vi­dual adver­ti­sers. Over 50,000 segments are labe­lled „cus­tom“. Accor­ding to the Xandr docu­men­ta­tion, these are segments that cannot be used by all adver­ti­sers. Instead, the provi­der only unlocks them for speci­fic clients.

Despite this comple­xity, The Markup has perfor­med a data analy­sis that shows at least a rough frequency distri­bu­tion of some higher level cate­go­ries. Accor­ding to this, segments rela­ted to the auto­mo­tive sector are the largest group. Adver­ti­sers can use Xandr’s data to target, for exam­ple, fans or owners of a parti­cu­lar make of car, or people whose house­hold has more than two cars and who drive more than 32,000 kilo­me­ters a year. More than 1,000 segments can be found for the keyword „Mer­ce­des“ alone.

The second largest group is demo­grap­hics. Adver­ti­sers can select not only by gender or age, but also, for exam­ple, parents of teena­gers, single mothers with small chil­dren or people who are about to get divor­ced. Lifestyle infor­ma­tion is often inclu­ded, such as „con­ser­va­tive reti­rees“, „urban elites“ or even „mul­ti­cul­tu­ral fami­lies“. Mothers seem to be a parti­cu­larly inter­es­ting group; there are segments for „soc­cer moms“, „big city moms“, „busy moms“ or even „moms who shop like crazy“.

Accor­ding to the rough analy­sis, the third largest group of segments is based on infor­ma­tion about people’s profes­sion or industry. Segments then have names like „beauty centre owner“, „lawyer“ or „poli­ti­cian“. This cate­gory can also refer to employees of speci­fic compa­nies, such as „Aldi compe­ti­tor“ or „Volvo SUV compe­ti­tor“. Members of the mili­tary and police, jour­na­lists, lawma­kers and poli­ti­ci­ans can also be targe­ted.

Cancer, Depres­sion and Eating Disor­ders

Hundreds of segment labels point to highly sensitive data such as health information. Advertisers can choose from categories such as breast cancer, bladder cancer and depression. Many segments also refer to reproductive health, period tracking, menopause or heavy buyers of pregnancy test kits. Some segment names even refer to visitors to individual clinics. Here are some examples from US supplier LiveRamp:

  • Live­Ramp Data Store | 8082 | 16237485 | Health­Ran­kings > BPD
  • Live­Ramp Data Store | 8082 | 16237395 | Health­Ran­kings > BPH
  • Live­Ramp Data Store | 8082 | 16237478 | Health­Ran­kings > Breast Cancer
  • Live­Ramp Data Store | 8082 | 24900788 | Health­Ran­kings > Breast Cancer Care­gi­vers
  • Live­Ramp Data Store | 8082 | 16237416 | Health­Ran­kings > Choles­te­rol
  • Live­Ramp Data Store | 8082 | 16237450 | Health­Ran­kings > Cough/Cold
  • Live­Ramp Data Store | 8082 | 16237432 | Health­Ran­kings > Diabe­tes
  • Live­Ramp Data Store | 8082 | 16237508 | Health­Ran­kings > Diabe­tes Type II
  • Live­Ramp Data Store | 8082 | 16237498 | Health­Ran­kings > Eating Disor­der

In addi­tion to health-rela­ted segments, there are many segments that refer to reli­gion, such as „Mus­lim“ or „Jewish“, as well as those that refer to people’s sexual orien­ta­tion or their origin and ethni­city. The list also inclu­des poli­ti­cal issues: Who is for and who is against Donald Trump? Who is for or against Black Lives Matter and who is against abor­tion rights?

According to tracking expert Wolfie Christl, targeting not only influences how we perceive the world and ourselves. It is also used to exploit people’s vulnerabilities, as he demonstrated in a study on the targeting of gambling addicts in 2022. In the Xandr file we found many segments referring to gambling, as well as segments targeting people who are “always getting a raw deal out of life”, are considered to be “fragile seniors”, are labelled “opiate addiction” or who want to consume less tobacco, fast food or alcohol.

LGBT in Spain, multi­cul­tu­ral fami­lies in Sweden

Some of the sensi­tive segments have a clear link to the US. Even people in the vici­nity of mili­tary bases or visi­tors to certain elec­tion campaign events appear to be targe­ted there. Howe­ver, many criti­cal segments do not have a clear country of origin. We asked Xandr and Micro­soft if they could guaran­tee that the IDs of EU citi­zens would not be inclu­ded. We did not receive an answer.

There are some sensi­tive segments with a clear refe­rence to EU coun­tries in their names. These include segments with infor­ma­tion about casino visits, sports betting habits or even gambling addic­tion. Also, many segments rela­ted to low income, poverty, preg­nancy, or inter­est in loss-making or specu­la­tive finan­cial products have an expli­cit refe­rence to EU coun­tries. We found seve­ral EU segments refer­ring to minors under the age of 16.

For Germany, we found several segments referring to health issues such as sleep disorders. We also found segments referring to strong believers in Christianity in Portugal, “multicultural families” in Sweden and “LGBT” in Spain, short for lesbian, gay, bisexual and transgender people. Some of the companies we confronted with the findings answered that certain segments were no longer offered.

93 data provi­ders with hundreds of sour­ces

There’s another aspect tying the data to the European Union: According to the file, European companies are part of the network of data brokers that buy, refine and distribute the segment data.

In total, 93 companies are listed as data providers, meaning that they have apparently offered their audience data for use in targeted advertising via Xandr. The names of the segments often include information about where these 93 data brokers obtained their data. Sources range from advice websites, weather apps and credit card companies such as Mastercard to other data brokers and market research companies, amounting to hundreds of data sources.

Most of the 93 data providers offering their data on Xandr are based in the US, but we were also able to identify several European companies. The largest of these is the previously Dutch and now London-based market research giant Nielsen. Its adtech division, Nielsen Marketing Cloud, is listed with more than 65,000 segments in the file, making it the third largest data provider after US companies Oracle and LiveRamp.

Data brokers from Germany, France, Italy, Spain, Denmark and the Nether­lands

Adsquare is another large data provider, based in Berlin. It is listed with more than 15,000 segments in the file. Six other German data brokers are listed in the file: DataXTrade; Emetriq, a company owned by the German telecom giant Deutsche Telekom; Roq.ad; Semasio; Zeoptap; and The ADEX, which is owned by the media company ProSiebenSat1. Together they offer more than 5,000 segments not only on German citizens, but also on people in other European countries and the US.

There are at least four major data brokers from France in the file, toget­her offe­ring more than 4,500 segments. There is also a data provi­der called Orange Private Data Market­place with 2215 segments, which seems to be linked to the French tele­com giant Orange.

With GroupM NL and Green­house Group B.V. we also find two data brokers based in the Nether­lands in the list. Accor­ding to the file, toget­her they had more than 2100 segments on Xandr. The Italian company Audi­ens S.R.L. is listed with more than 1300 segments and the Spanish company DatMean with more than 600.

A Danish company, Digi­seg, is also listed in the file with about 400 segments. Audi­enzz is a Swiss data broker owned by the Neue Zürcher Zeitung news­pa­per and is listed with 29 segments.

The above list represents the status quo in May 2021, the date of the file. We cannot say anything about the current situation.

How the adtech system works

We asked seve­ral civil soci­ety experts to comment on our findings.

“The investigation is another strong signal confirming the problematic nature of the current online advertising ecosystem,” says Dorota Glowacka of the Polish digital rights NGO Fundacja Panoptykon. “In our opinion, such a model can easily lead to the exploitation of users’ vulnerability for advertising purposes. This may not only lead to excessive shopping, but also, as we already know, influence our political choices or contribute to mental health problems.” In 2021, Panoptykon published a study showing how targeting based on health information can fuel serious anxiety disorders.

Jan Penfrat of Euro­pean Digi­tal Rights agrees: „The industry is sorting us all into data cate­go­ries to sell our atten­tion and scre­ens to the highest bidder. Worse, the survei­llance ad industry, inclu­ding EU-based compa­nies, provi­des a system that allows all kinds of actors to target and mani­pu­late people, and to discri­mi­nate against margi­na­li­zed people.“ Penfrat points out that survei­llance ads are suspec­ted of influ­en­cing Brexit, as well as nume­rous demo­cra­tic elec­ti­ons over the years. A detai­led report by EDRi explains the damage survei­llance ads can do to people every day.

“The personal data are so intimate, and they are shared so widely, and with so little care, the harm is potentially enormous,” says Johnny Ryan of the Irish Council for Civil Liberties (ICCL). “By exposing everyone in Europe to continuous profiling by virtually any company, the industry is putting Europe’s security, political stability, and economy at risk.”

We asked Ryan, himself a former adtech executive, what role European companies play in global ad surveillance. His answer: “As far as I can see, EU companies are fully integrated into the industry.” However, Ryan adds that the industry’s irresponsible rules have been set in the United States. “Europeans are standard takers rather than standard makers. The result is that the industry has no respect for European values.” Ryan, who has sued major players in the industry, points to a lawsuit the ICCL is currently pursuing in Hamburg against the industry organization IAB TechLab.

For our German repor­ting, seve­ral data protec­tion experts told us that data collec­tion of this enor­mous scale and comple­xity can hardly comply with the Euro­pean Gene­ral Data Protec­tion Regu­la­tion (GDPR). Among others, the head of Berlin’s data protec­tion autho­rity, Meike Kamp, said that it is almost impos­si­ble for people to unders­tand the impli­ca­ti­ons of giving their consent to this kind of data proces­sing, making it unli­kely to meet the GDPR requi­re­ments of infor­med and freely given consent.

To improve the situ­a­tion, Dorota Glowacka, Jan Penfrat and Johnny Ryan agree that poli­ti­cal action is needed. In the words of Johnny Ryan: „First, enfor­ce­ment has failed at the nati­o­nal level. The enfor­ce­ment failure in Ireland is parti­cu­larly dange­rous, because Ireland is respon­si­ble for super­vi­sing Google, Meta, Micro­soft, and others. Second, the Euro­pean Commis­sion has not put pres­sure on Euro­pean Member States to correct this.“

Jan Penfrat from EDRi adds: „After years of data protec­tion enfor­ce­ment, we know pretty well that obtai­ning valid consent for survei­llance ads is near impos­si­ble. Rather than requi­ring civil soci­ety and data protec­tion agen­cies to sue and fine every single infrin­ging data broker, the next EU Commis­sion should propose a ban of survei­llance ads in Europe.“

With the coope­ra­tion of Johan­nes Gille.

AI trans­pa­rency notice: This arti­cle was in part trans­la­ted by DeepL and linguis­ti­cally enhan­ced by DeepL Write.

Image: Every click leaves a trail that hundreds of adtech compa­nies are happy to pick up. – Public Domain Midjour­ney