reCAPTCHA Privacy — Is it an Oxymoron Now? | dones i noves tecnologies

11.10.2023 10:10

Google’s reCAPTCHA is by far the most popular CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). According to BuiltWith, it’s currently used by more than 15 million websites, and according to Slintel, reCAPTCHA has a market share of 97.08%.

Given that reach, the recent finding by the French data protection authority, CNIL, that reCAPTCHA uses excessive personal data for purposes other than security comes as a wake-up call. It means that website owners who want to guarantee the safety of their users’ personal data may struggle to do so while relying on it.

reCAPTCHA Privacy Concerns

This isn’t the first time that Google has run into trouble with the French regulator. CNIL fined the company 150 million euros in 2021 (90 million for GOOGLE LLC and 60 million for GOOGLE IRELAND LIMITED) because google.fr and youtube.com users couldn’t refuse cookies as easily as they could accept them, which French data protection law requires.

It’s a similar story this time, but in this case CNIL wasn’t initially looking at Google. It only discovered the reCAPTCHA privacy concerns, namely that the tool was sending European users’ data to Google’s US servers, as part of its investigation into an e-scooter company called Cityscoot. The firm was using reCAPTCHA on its website and app, but it wasn’t asking for users’ consent and didn’t give them any information about what was happening to their data.

It should have done both because the latest version of reCAPTCHA relies on cookies, and under the EU ePrivacy Directive, you need to tell users about what cookies you are using and why, as well as gain their consent.

The reCAPTCHA Privacy Cookie

There are exceptions to this rule, though, and Cityscoot tried to argue that it wasn’t responsible for the reCAPTCHA privacy issue because its use of cookies was “strictly necessary (to provide a) service explicitly requested by the user…” However, this exception comes with a caveat: “The act of authentication must not be taken as an opportunity to use the cookie for other secondary purposes…”

Because reCAPTCHA sends application and device data to Google for analysis, the regulator took the view that the act of authentication was being used as an opportunity to serve additional purposes, so Cityscoot couldn’t claim this exception. CNIL found that Cityscoot should have informed its users about the reCAPTCHA cookie and given them the chance to refuse it.

This creates quite a problem for reCAPTCHA as a security tool. If it requires consent, as CNIL says, then spambots can simply decline it, effectively turning it into a security door that opens for anyone.
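Setting the bot problem aside, the consent requirement itself can be sketched in a few lines of server-side Python. This is a minimal illustration, not Cityscoot’s or Google’s implementation: the cookie name `captcha_consent` and its value `granted` are hypothetical, and a real site would read whatever its consent tool actually stores. The only real-world detail is the reCAPTCHA script URL.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name and value; a real site would use whatever its
# consent-management tool writes.
CONSENT_COOKIE = "captcha_consent"
CONSENT_GRANTED = "granted"

RECAPTCHA_SCRIPT = (
    '<script src="https://www.google.com/recaptcha/api.js" async defer></script>'
)
CONSENT_NOTICE = (
    "<p>This form uses reCAPTCHA, which sends device data to Google. "
    '<button id="allow-captcha">Allow</button></p>'
)


def has_captcha_consent(cookie_header: str) -> bool:
    """Return True only if the visitor has explicitly opted in."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(CONSENT_COOKIE)
    return morsel is not None and morsel.value == CONSENT_GRANTED


def captcha_markup(cookie_header: str) -> str:
    """Emit the reCAPTCHA script tag only after consent, otherwise a notice."""
    if has_captcha_consent(cookie_header):
        return RECAPTCHA_SCRIPT
    return CONSENT_NOTICE
```

The sketch also makes the weakness visible: a client that never sends the consent cookie never receives the script, so the site needs some other defense for those requests.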

Website Tracking Technologies

The table below summarizes the privacy risks of reCAPTCHA and of other widely used website components that can behave in problematic ways.

Name	Description	Additional Information	Privacy Risks
reCAPTCHA	Automatically differentiates between humans and internet bots.	Often includes image challenges that humans can complete but bots struggle with.	Data collection, including IP addresses, user agent strings, and browser info. Can be used to put together a profile of users’ online activities. Google can track user browsing habits to construct behavior profiles. If users are signed into their Google accounts while using reCAPTCHA, Google could link collected data to their personal profiles. Google’s reCAPTCHA data collection practices are not made clear to users.
Cookie	Small text files placed on a user’s device when they visit a website. The website server creates and uses them to track user behavior and store information.	Session cookies are a temporary type. They are deleted when the browser is closed. Persistent cookies remain even after the browser is closed. They remember user settings and preferences for the next visit. Third-party cookies are set by external domains (not the ones being visited). Often used for tracking and advertising.	Can be used to track users’ online activities across different websites. Builds a profile of browsing behavior, preferences and interests. Can share info with third parties, including advertisers for targeted ads. Can collect IP addresses, device info, browsing history, demographic information and more.
Pixel	Tiny transparent images or snippets of code embedded in emails or web pages. When a user visits a site or opens an e-mail, the pixel triggers a request to a server, sending info about the visitor’s interaction with the content.	Often used for tracking how effective ad campaigns are, gathering analytics data, and measuring conversions. Good for gathering user metrics such as page views, clicks, and conversions. Pixels are often used for retargeting, which means readvertising a product or service to visitors who showed an interest in it before.	Pixels can be embedded in multiple websites and used to create a detailed profile of user browsing behavior. This cross-site tracking can help build detailed profiles that grow into digital fingerprints of their interests that may intrude on privacy. Pixels can collect IP addresses, device info, browsing history, and interactions with particular content. May lead to the collection of personally identifiable information without explicit user consent.
Tag	Also referred to as tracking tags or script tags. Code snippets embedded in a webpage’s HTML or placed in its header or footer. Frequently used to collect data, track analytics, serve ads, and integrate with social media and other third-party tools.	JavaScript, HTML, and other tags allow a website to communicate with external platforms or services. Tags can be used to track user behavior, measure website performance, personalize content, and perform various marketing and analysis roles.	Can potentially gather sensitive or personally identifiable information without users’ explicit knowledge or consent. Can share data with ad networks, analytics providers, and social media platforms, potentially without users’ explicit consent. Can track user behavior and interactions across platforms and websites, potentially creating detailed profiles of users’ online activities, interests, and preferences.
iFrame	Creates a region on a webpage where the content of another site can be displayed.	Stands for ‘inline frame’.	Can be exploited for cross-site scripting (XSS) attacks, to access or manipulate sensitive information within the host page, or to perform unauthorized actions on behalf of the user. Embedded content could gather user cookies, IP addresses, and browsing patterns for tracking and profiling without users’ knowledge.
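The cookie categories in the table can be made concrete with a short Python sketch that labels a single `Set-Cookie` header as session or persistent and as first-party or third-party. It rests on stated assumptions: the caller supplies both the host that set the cookie and the host of the page being visited, and "same site" is approximated by comparing the last two labels of each hostname, which is wrong for suffixes such as `co.uk` (a real tool would consult the Public Suffix List). The function names are illustrative.

```python
from http.cookies import SimpleCookie


def registrable_guess(host: str) -> str:
    """Naive stand-in for a Public Suffix List lookup: keep the last two labels."""
    return ".".join(host.lower().split(".")[-2:])


def classify_set_cookie(header_value: str, setting_host: str, page_host: str) -> dict:
    """Classify one Set-Cookie header value observed while loading page_host."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    # A cookie with neither Expires nor Max-Age lasts only until the browser closes.
    persistent = bool(morsel["expires"] or morsel["max-age"])
    # Third-party means the cookie was set by a different site than the one visited.
    same_site = registrable_guess(setting_host) == registrable_guess(page_host)
    return {
        "name": name,
        "lifetime": "persistent" if persistent else "session",
        "party": "first" if same_site else "third",
    }
```

For example, `classify_set_cookie("sid=abc123; Path=/; HttpOnly", "shop.example", "www.shop.example")` is labeled a first-party session cookie, while the same call with a `Max-Age` attribute and a setting host on another domain is labeled persistent and third-party.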

Your website may rely on all these tools to provide essential functionality, but you don’t want them to be misused, so how do you square that circle? The best answer may be external monitoring. Unlike embedded solutions, external monitoring can’t be blinded to the behaviors of third-party website components.

A case in point: the Reflectiz platform recently detected the TikTok pixel trying to access the login forms on a financial service company’s website. It was trying to pass sensitive user input data on to TikTok’s servers. The Reflectiz investigation team immediately sent the company clear steps to remedy this behavior, sparing it the financial, legal, and reputational damage of a data breach.
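As a far simpler starting point than behavioral monitoring, a site owner can at least list which outside hosts a page pulls scripts, pixels, and iframes from. The Python sketch below does that with the standard library only. It is an assumption-laden illustration and not how Reflectiz works: it inspects static markup, so it cannot see what a script does once it runs (such as reading a login form), and it treats any host other than the page’s own as third-party. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Elements from the table above that can load external content, mapped to
# the attribute that holds the URL.
WATCHED = {"script": "src", "img": "src", "iframe": "src"}


class ThirdPartyInventory(HTMLParser):
    """Collect (tag, host) pairs for resources served from other hosts."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host.lower()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_name = WATCHED.get(tag)
        if attr_name is None:
            return
        url = dict(attrs).get(attr_name)
        if not url:
            return
        host = (urlparse(url).hostname or "").lower()
        # Relative URLs have no hostname and are served by the page itself.
        if host and host != self.page_host:
            self.found.append((tag, host))


def third_party_hosts(html: str, page_host: str) -> list:
    """Return every watched element in html that loads from another host."""
    parser = ThirdPartyInventory(page_host)
    parser.feed(html)
    parser.close()
    return parser.found
```

Running `third_party_hosts` over a page’s HTML gives a list such as `("script", "www.google.com")` for an embedded reCAPTCHA, which is a reasonable first inventory to review before deciding what needs consent or closer monitoring.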