A guide to the EU’s new rules for researcher access to platform data

Thanks to the Digital Services Act (DSA), public interest researchers in the EU have a new legal framework to access and study internal data held by major tech platforms. What does this framework look like, and how can it be put into practice?

For years, major social media platforms like Instagram, TikTok, and YouTube have offered only limited transparency into how they design and manage their algorithmic systems, systems with unchecked power to influence what we see and how we interact with one another online. Platforms’ efforts to identify and combat risks like disinformation have likewise been practically impossible for third parties to scrutinize without reliable access to the data they need.

The Digital Services Act (DSA) promises to change this status quo, in part through a new transparency regime that will enable vetted public interest researchers to access the data of very large online platforms with more than 45 million active users in the EU (Article 40). Now that the law has been published in the EU’s official journal, we can visualize the blueprint for the DSA’s data access rules, which are essential to holding platforms accountable for the risks they pose to individuals and society.



In this visual guide, we’ll walk through the data access regime envisioned by the DSA by following a hypothetical research team in their quest to access internal data held by “Platform X”. Along the way, we try to point out important elements that still need to be fleshed out before the promise of this new data access regime can be realized in practice.

 

How do researchers get access to platform data?

Application process

First, let’s imagine a team of public interest researchers. These researchers are concerned about the influence that, let’s say, the very large online platform “Platform X” may be having on political polarization. It has been suggested that Platform X exacerbates political polarization, a process that poses risks to democracy and fundamental rights by pushing people toward more extreme political views (indeed, there is a vast scholarship exploring the link between social media and political dysfunction). Our researchers want to study whether the design of Platform X’s recommender algorithm encourages users to interact with more polarizing political content, and to assess whether the platform has put adequate (and legal) safeguards in place to stop pushing this kind of content to vulnerable users.

Now that our researchers have a question tied to a particular systemic risk (in this case, risks to democracy stemming from political polarization), they need access to Platform X’s internal data to prove or disprove their hypothesis. Specifically, they want to analyze the algorithms and training data that created the model for Platform X’s recommender system.

Now we have a scenario: a researcher (or research team) wants to study a particular systemic risk, and they need access to Platform X’s data to understand it. To gain such access, step one, according to Article 40 of the DSA, is for the researchers to create a detailed research application that lays out, among other things, why they need the data, their methodology, and their concept for protecting data security, and that commits them to sharing their research results publicly and free of charge.
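To make those required elements concrete, here is a minimal sketch of how such an application could be represented as a data structure. This is purely illustrative: the DSA prescribes the substance of the application, not any particular format, and every field name below is our own invention.

```python
from dataclasses import dataclass

# Illustrative only: the DSA specifies what an application must contain,
# not how it is structured. All field names here are hypothetical.
@dataclass
class ResearchApplication:
    applicants: list[str]        # researchers and their affiliations
    systemic_risk: str           # e.g. risks to democracy from polarization
    data_requested: list[str]    # e.g. recommender algorithms, training data
    justification: str           # why this data is needed for the research
    methodology: str             # how the data will be analyzed
    security_concept: str        # safeguards for handling sensitive data
    results_public_and_free: bool = True  # commitment required by Article 40
```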

The researchers also need to demonstrate that they are affiliated with either a university or a non-academic research institute or civil society organization that conducts “scientific research with the primary goal of supporting their public interest mission.” This might extend to consortia of researchers, including researchers not based in the EU and journalists, so long as the main applicant is a European researcher who fits the above criteria. So, let’s say that one of the main applicants in our research team is a post-doctoral researcher at a university in Spain.
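Stated as a rule of thumb, the eligibility check might look something like the sketch below. Again, this is our own illustration, assuming a consortium structure; in reality the assessment is a legal judgment made by the regulator, not a mechanical test.

```python
# Hypothetical eligibility rule of thumb, mirroring the criteria above.
ELIGIBLE_AFFILIATIONS = {
    "university",
    "non-academic research institute",  # must conduct scientific research
    "civil society organization",       # supporting a public interest mission
}

def consortium_is_eligible(members: list[dict]) -> bool:
    """A consortium may include non-EU researchers and journalists,
    provided a main applicant is an EU-based researcher with an
    eligible affiliation."""
    return any(
        m["role"] == "main applicant"
        and m["based_in_eu"]
        and m["affiliation"] in ELIGIBLE_AFFILIATIONS
        for m in members
    )
```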

 

Vetting process

The next step is for the researchers to file their application with the responsible national regulator, the so-called Digital Services Coordinator (DSC) of Establishment, which will ultimately approve or reject the researchers’ application following an extensive vetting process. This process is designed to ensure that researchers are legitimately working in the public interest, meet the necessary criteria, and have adequate technical and organizational safeguards in place to protect sensitive data once they gain access.

Given that most of the largest platforms and search engines are based in Ireland, the Digital Services Coordinator of Establishment tasked with vetting researchers will in many cases be the Irish DSC. Let’s assume this is the case for Platform X.

Researchers may also send their applications to the national Digital Services Coordinator in the country where they are based – in our fictional case, this would be the Spanish DSC. This national DSC would then issue an opinion to the Irish DSC about whether to grant the data access request – but ultimately, the Irish DSC will be the final decision-maker in vetting most research proposals.
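Putting the routing together, the flow looks roughly like this. The sketch below is purely illustrative (the DSA defines a legal procedure, not code), and it assumes, as in our example, that Platform X’s DSC of Establishment is the Irish DSC.

```python
# Illustrative model of the vetting route described above.
def vetting_route(establishment_dsc: str, researcher_dsc: str,
                  filed_locally: bool = True) -> list[str]:
    """List the procedural steps; the DSC of Establishment decides."""
    steps = []
    if filed_locally and researcher_dsc != establishment_dsc:
        # Filing with the researchers' own national DSC is optional and
        # triggers a non-binding opinion to the DSC of Establishment.
        steps.append(f"{researcher_dsc} DSC issues an opinion to the "
                     f"{establishment_dsc} DSC")
    steps.append(f"{establishment_dsc} DSC vets the application "
                 "and makes the final decision")
    return steps

# Our fictional case: Platform X is established in Ireland, and the
# lead applicant is based in Spain.
for step in vetting_route("Irish", "Spanish"):
    print(step)
```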

To be clear, these Digital Services Coordinators don’t exist yet – EU member states will have to appoint them and make sure they’re up and running by the time the DSA becomes applicable across the EU in early 2024. Even at this nascent stage, the DSA’s drafters anticipated that vetting data access requests may prove to be a complex task for some national regulators, especially those with limited capacities, resources, or expertise in this domain – and the stakes are high, especially when it comes to vetting requests to access sensitive datasets to study complex research questions. That’s why the DSA provides, in Article 40(13), a potential role for an “independent advisory mechanism” to help support the sharing of platform data with researchers.

 

Independent advisory mechanism

The role of this advisory mechanism, as well as the technical conditions under which data may be safely shared and the purposes for which they may be used, still needs to be clarified by the Commission in forthcoming delegated acts. But it’s worth noting that the major platforms already committed to developing, funding, and cooperating with a similar advisory mechanism, an “independent, third-party intermediary body”, precisely for this purpose when they signed onto the EU’s revised Code of Practice on Disinformation in June 2022. A recommendation to establish such an intermediary is affirmed in the European Digital Media Observatory’s (EDMO) report on researcher access to platform data, which details how platforms may share data with independent researchers in a way that protects users’ rights in compliance with the General Data Protection Regulation (GDPR). AlgorithmWatch also published a similar recommendation in our Governing Platforms project, using examples from other industries to show how research access could be operationalized in platform governance (in our view, this intermediary body could even play a more central role, e.g., by maintaining access infrastructure or auditing disclosing parties).



So far, it’s unclear whether platforms are taking their commitments in the new Code of Practice on Disinformation seriously. We don’t yet know when to expect the independent intermediary body they promised in the Code, or whether this body will dovetail with the independent advisory mechanism referenced in the DSA. But its role in the proposed governance structure for platform-to-researcher data access means it is potentially an important puzzle piece in the overall data access regime – if it is equipped with a relevant and clear mandate, adequate resources, sufficient expertise, and genuine independence.

To recap: our research team has filed their application with the Irish Digital Services Coordinator. If the Irish DSC decides that the researchers and their application fulfill the necessary criteria, then the DSC may award them the status of “vetted researchers” for the purpose of carrying out the specific research detailed in their application. If the researchers filed their application with the DSC of the country where they are based (in this case, the Spanish DSC), then the decision of the Irish DSC may be informed by the assessment of the Spanish DSC. The vetting process may also be aided by an independent, third-party intermediary body.

Let’s say now that the DSC (again, in our case and probably in most cases the Irish DSC) has finally approved the research application. Hurrah! Now the hard part is over, right?

 

Official request

Not necessarily. Once the research application is approved, the DSC will submit an official data access request to Platform X on behalf of the vetted researchers. The platform will then have 15 days to respond to the DSC’s request. If the platform says yes, it can provide the data as requested, and we keep moving right along. But if the platform says that it can’t provide the data – either because it doesn’t have access to the requested data, or because providing access carries a security risk or could compromise “confidential information” like trade secrets – then the platform can seek to amend the data access request. If it goes this route, the platform needs to propose alternative means of providing the data or suggest other data that can satisfy the research purpose of the initial request. The DSC will then have another 15 days to either confirm or decline the platform’s amendment request.
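This back-and-forth has a simple shape. Here is a rough sketch of the flow, with the 15-day windows taken from the text above; the function, its names, and its return values are our own illustration, not an official specification.

```python
# Hypothetical model of the official request flow under Article 40.
def official_request_outcome(platform_complies: bool,
                             dsc_confirms_amendment: bool) -> str:
    # The DSC submits the request on the vetted researchers' behalf;
    # the platform has 15 days to respond.
    if platform_complies:
        return "Platform provides the data as requested"
    # Otherwise (no access to the data, a security risk, or a risk to
    # "confidential information" such as trade secrets), the platform
    # may seek to amend the request by proposing alternative means or
    # alternative data that still serve the research purpose.
    if dsc_confirms_amendment:
        # The DSC has another 15 days to confirm or decline.
        return "Platform provides data under the amended request"
    return "DSC declines the platform's amendment request"
```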

Serious concerns were raised during the DSA negotiations that this vague, so-called “trade secret exemption” would make it too easy for platforms to routinely deny data access requests – something legislators tried to account for in the DSA’s final text by clarifying that the law must be interpreted in such a way that platforms can’t simply abuse the clause as an excuse to deny data access to vetted researchers. Just as with the vetting process, the challenge of parsing these amendment requests suggests that an independent body could have another key role to play here in advising the DSCs.

 

Data sharing

The process is nearly complete. Once Platform X has either complied with the initial data access request or had its amendment request approved, we’ve finally reached the point where it must actually provide our researchers with access to the data specified in the research application. Now it’s up to our researchers to put their skills to work and show the public whether Platform X’s recommender algorithm is indeed pushing users to engage with more politically polarizing content, and to scrutinize whether the platform is taking the appropriate measures to mitigate that risk.

There are still many technical and procedural details that the European Commission needs to work out over the next year before the DSA’s data access regime can be fully implemented and enforced. EU member states still need to designate their Digital Services Coordinators, and the independent, third-party intermediary that would help facilitate data access requests has yet to materialize.

In the meantime, independent researchers at AlgorithmWatch and elsewhere will continue trying to understand the algorithmic decision-making systems behind online platforms and how they influence society, often using research methods that don’t rely on platforms’ internal data, such as data donation projects and other adversarial audits. Adversarial audits have shown, for example, that Instagram disproportionately pushed far-right political content via its recommender system, and that TikTok promoted supposedly banned wartime content to users in Russia. This kind of research will remain an important complement to the DSA’s regulated data access in exposing systemic risks, and it must be protected, given platforms’ track record of hostility toward such external scrutiny.



The potential of the DSA’s new transparency and data access rules is that they will allow public interest researchers to dig even deeper and help us gain a more comprehensive understanding of how platforms’ algorithmic systems work. Only once we have that level of transparency and public scrutiny can we really hold platforms accountable for the risks they pose to individuals and to society.