
Mark Colley, Toronto Star:

For more than 60 years, Canadian Tire money staked a legitimate claim as Canada’s unofficial second currency. At one point, a study showed half the country collected it. The coupons even made the Canadian Oxford dictionary.

[…]

Ultimately, experts speculate, Canadian Tire money fell victim to a world moving faster and smarter than ever. The coupon return on a purchase, once five per cent, depreciated to 0.4 per cent by the 2010s. Cash, at Canadian Tire and everywhere else, fell out of favour. And a digital rewards program gave the company an asset even more valuable than loyalty: data, on what and when and how often you buy.

This story yanked me right back to my childhood: finding Canadian Tire money jammed in the glove box and centre console of the family car. It was a loyalty program, sure, but one with a unique charm and tactility that one does not get from a plastic card or from tapping a phone.

Thomas Germain, for BBC News, looked at the efforts of Ethan Zuckerman and others to study YouTube. Their findings are sometimes expected — most videos have been viewed fewer than 500 times — but often notable. Most videos are not edited, are not monetized, and make no requests for viewers to like, comment, and subscribe.

So, what is YouTube, anyway? A place for people like you and me to watch a relatively small number of headliners? Germain:

This narrative misses a critical piece of the picture, says Ryan McGrady, the senior researcher in Zuckerman’s lab, who participated in the scraping project. YouTube is a free service that was built from the ground up by a private company, and it could be argued that Google should be able to run the platform as such. But when you examine how people are actually using YouTube, it looks less like TV and more like infrastructure, McGrady says.

[…]

YouTube is one of the internet’s de facto repositories, the first place many of us go when we have videos we want to post or store online. It’s also a place where local authority meetings are broadcast, for example, providing a vital opportunity for public accountability in ways that weren’t possible before it existed. It isn’t just a “platform”, McGrady says, it’s a critical piece of infrastructure, and that’s how it should be regulated. “For companies that own so much of our public sphere, there are some minimum expectations we should have about transparency.”

I think it is generally a mistake to use the popularity of corporations like these as a basis for treating them as infrastructure. They are at the very top of a deep stack. Going by the letter of the law, the answer to this question should be “no, YouTube is not infrastructure”.

Even so, there is something appealing about this argument because video is special. It is cumbersome; it requires complex arrangements to serve it efficiently and reliably. But some of those barriers are becoming less forbidding, giving us more places to post and watch videos. It was not so long ago that YouTube was the only name in general-purpose video hosting. Yet you can now publish video to just about any social network. Instagram and TikTok host a different type of video but, for lots of people, they are just as relevant as YouTube. Alternatives like Rumble and X are appearing for the perpetually aggrieved set who are convinced their broadcasts would be censored elsewhere. Yet there is nothing else quite like YouTube.

I still believe it would be difficult and unwise to govern YouTube as though it is infrastructure, even if it seems to have that role. And, so, the best thing we can do is to stop treating YouTube like infrastructure. It should not be the place to stream or archive government or board meetings. It should not be treated as a video host by other businesses. It is not a good destination for your important family video. It is a place to put those things to share them, if you would like, but it is not an archival choice. YouTube needs to have the ability to moderate videos; the neutrality expected of infrastructure is not appropriate for a general-purpose entertainment platform. As such, we need to stop seeing it as a video repository.

Update: As if to prove YouTube is still unique, Facebook says it is going to begin deleting old live broadcasts. For comparison, YouTube archives live videos indefinitely. It would be so great if there were alternatives not focused on boosting mindless reactionaries.

Sponsor: How Comics Were Made

How Comics Were Made follows the intensive, magical, and industrial journey a newspaper comic takes from a cartoonist’s hand through the transformative production process to make it ready to appear as ink on newsprint — and beyond, onto digital displays. The book richly illustrates the stages through never-before-seen printing artifacts dating back as far as the 1890s and original artwork from cartoonists like Bill Watterson, Lynn Johnston, Garry Trudeau, and Charles Schulz.

Author Glenn Fleishman uses historic images, preserved items, and industrial films to reconstruct nearly forgotten processes of the metal relief era of printing, as well as more modern periods, as newspapers and printing in general shifted to photographic reproduction using flat metal offset printing and then, ultimately, digital processes from drawing tablets to laser-etched plates. The story is told through historic and modern interviews. Glenn interviewed over 40 cartoonists and others in the comics world, including rare talks with Trudeau and Watterson.

If you love comics and history or know someone who does, How Comics Were Made provides a unique lens from the dawn of newspaper cartoons to the present.

Priced originally at $65, it’s currently on sale for $39 (plus shipping) — no coupon is required.

(The current deep discount stems from the book being printed and warehoused in Canada. With tariffs possible by March, the author is eager to have the book reach an audience before it becomes unaffordable to import.)

Joseph Cox and Dhruv Mehrotra, in an article jointly published by 404 Media and Wired:

Last year, a media investigation revealed that a Florida-based data broker, Datastream Group, was selling highly sensitive location data that tracked United States military and intelligence personnel overseas. At the time, the origin of that data was unknown.

Now, a letter sent to US senator Ron Wyden’s office that was obtained by an international collective of media outlets — including WIRED and 404 Media — claims that the ultimate source of that data was Eskimi, a little-known Lithuanian ad-tech company. Eskimi, meanwhile, denies it had any involvement.

The letter was apparently sent by Datastream, which means it either has no idea where it got this extremely precise location information, or Eskimi is being dishonest. That is kind of the data broker industry in a nutshell: a vast sea of our personal information being traded indiscriminately by businesses worldwide — whose names we have never heard of — with basically no accountability or limitations.

Something very useful from the Atlas of Type: a huge list of type foundries. Only a handful of Canadian designers on this list, including the legendary Canada Type and Pangram Pangram, but I was particularly excited to learn about Tiro Typeworks. They have a vast library of type for scientific and scholarly works — if you are reading this on MacOS, you probably have STIX Two installed — and they have also produced typefaces with vast language support, including for syllabics. Given their contributions to type design and the OpenType format, I feel like I used the word “legendary” above much too soon.

(Via Robb Knight.)

Chris Welch, the Verge:

Netflix spokesperson MoMo Zhou has told The Verge that this morning’s window where Netflix appeared as a “participating” service in Apple TV — including temporary support for the watchlist and “continue watching” features — was an error and has now been rolled back. That’s a shame. The jubilation in our comments on the original story was palpable.

Netflix sincerely apologizes for giving people what they want.

Online privacy isn’t just something you should be hoping for – it’s something you should expect. You should ensure your browsing history stays private and is not harvested by ad networks.

Magic Lasso Adblock: No ads, no trackers, no annoyances, no worries

By blocking ad trackers, Magic Lasso Adblock stops you being followed by ads around the web.

As an efficient, high performance and native Safari ad blocker, Magic Lasso blocks all intrusive ads, trackers and annoyances on your iPhone, iPad, and Mac. And it’s been designed from the ground up to protect your privacy.

Users rely on Magic Lasso Adblock to:

  • Remove ad trackers, annoyances and background crypto-mining scripts

  • Browse common websites 2.0× faster

  • Block all YouTube ads, including pre-roll video ads

  • Double battery life during heavy web browsing

  • Lower data usage when on the go

With over 5,000 five-star reviews, it’s simply the best ad blocker for your iPhone, iPad, and Mac.

And unlike some other ad blockers, Magic Lasso Adblock respects your privacy, doesn’t accept payment from advertisers and is 100% supported by its community of users.

So, join over 350,000 users and download Magic Lasso Adblock today.

In November 2023, two researchers at the University of California, Irvine, and their supervisor published “Dazed and Confused”, a working paper about Google’s reCAPTCHAv2 system. They write mostly about how irritating and difficult it is to use, and also explore its privacy and labour costs — and it is that last section about which I had some doubts when I first noticed the paper being passed around in July.

I was content to leave it there, assuming this paper would be chalked up as one more curiosity on a heap of others on arXiv. It has not been subjected to peer review at any journal, as far as I can figure out, nor can I find another academic article referencing it. (I am not counting the dissertation by one of the paper’s authors summarizing its findings.) Yet parts of it are on their way to becoming zombie statistics. Mike Elgan, writing in his October Computerworld column, repeated the paper’s claim that “Google might have profited as much as $888 billion from cookies created by reCAPTCHA sessions”. Ted Litchfield of PC Gamer included another calculation alleging solving CAPTCHAs “consum[ed] 7.5 million kWhs of energy[,] which produced 7.5 million pounds of CO2 pollution”; the article is headlined reCAPTCHAs “[…] made us spend 819 million hours clicking on traffic lights to generate nearly $1 trillion for Google”. In a Boing Boing article earlier this month, Mark Frauenfelder wrote:

[…] Through analyzing over 3,600 users, the researchers found that solving image-based challenges takes 557% longer than checkbox challenges and concluded that reCAPTCHA has cost society an estimated 819 million hours of human time valued at $6.1 billion in wages while generating massive profits for Google through its tracking capabilities and data collection, with the value of tracking cookies alone estimated at $888 billion.

I get why these figures are alluring. CAPTCHAs are heavily studied; a search of Google Scholar for “CAPTCHA” returns over 171,000 results. As you might expect, most are adversarial experiments, but there are several examining usability and others examining privacy. However, I could find just one previous paper correlating, say, emissions and CAPTCHA solving, and it was a joke paper (PDF) from the 2009 SIGBOVIK conference, “the Association for Computational Heresy Special Interest Group”. Choice excerpt: “CAPTCHAs were the very starting point for human computation, a recently proposed new field of Computer Science that lets computer scientists appear less dumb to the world”. Excellent.

So you can see why the claims of the U.C. Irvine researchers have resonated in the press. For example, here is what they — Andrew Searles, Renascence Tarafder Prapty, and Gene Tsudik — wrote in their paper (PDF) about emissions:

Assuming un-cached scenarios from our technical analysis (see Appendix B), network bandwidth overhead is 408 KB per session. This translates into 134 trillion KB or 134 Petabytes (194 x 1024 Terrabytes [sic]) of bandwidth. A recent (2017) survey estimated that the cost of energy for network data transmission was 0.06 kWh/GB (Kilowatt hours per Gigabyte). Based on this rate, we estimate that 7.5 million kWh of energy was used on just the network transmission of reCAPTCHA data. This does not include client or server related energy costs. Based on the rates provided by the US Environmental Protection Agency (EPA) and US Energy Information Administration (EIA), 1 kWh roughly equals 1-2.4 pounds of CO2 pollution. This implies that reCAPTCHA bandwidth consumption alone produced in the range of 7.5-18 million pounds of CO2 pollution over 9 years.

Obviously, any emissions are bad — but how much is 7.5–18 million pounds of CO2 over nine years in context? A 2024 working paper from the U.S. Federal Housing Finance Agency estimated residential properties each produce 6.8 metric tons of CO2 emissions annually from electricity and heating, or about 15,000 pounds. That means CAPTCHAs produced as much CO2 as providing utilities to 55–133 U.S. houses per year. Not good, sure, but not terrible — at least, not when you consider the 408 kilobyte session transfer against, say, Google’s homepage, which weighs nearly 2 MB uncached. Realistically, CAPTCHAs are not a meaningful burden on the web or our environment.
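If you want to check my arithmetic, it only takes a few lines. This is my own back-of-the-envelope calculation using just the figures quoted above, not anything further from the paper:

    // Back-of-the-envelope check of the household comparison above,
    // using only the figures quoted in this post.
    const captchaCo2Lbs = [7.5e6, 18e6]; // paper's low and high CO2 estimates, over nine years
    const years = 9;
    const houseCo2LbsPerYear = 6.8 * 2204.62; // FHFA estimate: 6.8 metric tons ≈ 15,000 pounds
    const houses = captchaCo2Lbs.map((lbs) => lbs / years / houseCo2LbsPerYear);
    console.log(houses); // ≈ [55.6, 133.4], roughly 55–133 houses' worth per year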

The numbers in this discussion section are suspect. From these CO2 figures to the value of reCAPTCHA cookies — apparently responsible for nearly half of Google’s revenue since it acquired reCAPTCHA — I find the evidence for them lacking. Yet they continue to circulate in print and, now, in a Vox-esque mini documentary.

The video, on the CHUPPL “investigative journalism” YouTube channel, was created by Jack Joyce. I found it via Frauenfelder, of Boing Boing, and it was also posted by Daniel Sims at TechSpot and Emma Roth at the Verge. The roughly 17-minute mini-doc has been watched nearly 200,000 times, and the CHUPPL channel has over 350,000 subscribers. Neither number is massive for YouTube, but it is not a small number of viewers, either. Four of the ten videos from CHUPPL have achieved over a million views apiece. This channel has a footprint. But watching the first half of its reCAPTCHA video is what got me to open BBEdit and start writing this thing. It is a masterclass in how the YouTube video essay format and glossy production can mask bad journalism. I asked CHUPPL several questions about this video and did not receive a response by the time I published this.

Let me begin at the beginning:

How does this checkbox know that I’m not a robot? I didn’t click any motorcycles or traffic lights. I didn’t even type in distorted words — and yet it knew. This infamous tech is called reCAPTCHA and, when it comes to reach, few tools rival its presence across the web. It’s on twelve and a half million websites, quietly sitting on pages that you visit every day, and it’s actually not very good at stopping bots.

While Joyce provides sources for most claims in this video, there is not one for this specific number. According to BuiltWith, which tracks technologies used on websites, the claim is pretty accurate — it sees reCAPTCHA used on about twelve million websites, where it is the most popular CAPTCHA script.

But Google has far more popular products than reCAPTCHA if it wants to track you across the web. Google Maps, for example, is on over 15 million live websites, Analytics is on around 31 million, and AdSense is on nearly 49 million. I am not saying that we should not be concerned about reCAPTCHA because it is on only twelve million sites, but that number needs context. Google Maps is more popular, according to BuiltWith, than reCAPTCHA. If Google wants to track user activity across the web, AdSense is explicitly designed for that purpose. Yes, it is probably true that “few tools rival its presence across the web”, but you can say that of just about any technology from Google, Meta, Amazon, Cloudflare, and a handful of other giants — especially Google.

Back to the intro:

It turns out reCAPTCHA isn’t what we think it is, and the public narrative around reCAPTCHA is an impossibly small sliver of the truth. And by accepting that sliver as the full truth, we’ve all been misled. For months, we followed the data, we examined glossed over research, and uncovered evidence that most people don’t know exists. This isn’t the story of an inconsequential box. It’s the story of a seemingly innocent tool and how it became a gateway for corporate greed and mass surveillance. We found buried lawsuits, whispers of the NSA, and echoes of Edward Snowden. This is the story of the future of the Internet and who’s trying to control it.

The claims in this introduction vastly oversell what will be shown in this video. The lawsuits are not “buried”; they were linked from the reCAPTCHA Wikipedia article as it appeared before the video was published. The “whispers” and “echoes” of mass surveillance disclosures will prove to be based on almost nothing. There are real concerns with reCAPTCHA, and this video does justice to almost none of them.

The main privacy problems with reCAPTCHA are found in its ubiquity and its ownership. Google swears up and down it collects device and user behaviour data through reCAPTCHA only for better bot detection. It issued a statement saying as much to Techradar in response to the “Dazed and Confused” paper circulating again. In a 2021 blog post announcing reCAPTCHA Enterprise — the latest version combining V2, V3, and the mobile SDKs under a single brand — Google says:

Today, reCAPTCHA Enterprise is a pure security product. Information collected is used to provide and improve reCAPTCHA Enterprise and for general security purposes. We don’t use this data for any other purpose.

[…] Additionally, none of the data collected can be used for personalized advertising by Google.

Google goes on to explain that it collects data as a user navigates through a website to help determine if they are a bot without having to present a challenge. Again, it is adamant none of this data is used to feed its targeted advertising machine.

There are a couple of problems with this. First, because Google does not disclose exactly how reCAPTCHA works, its promise requires that you trust the company. It is not a great idea to believe the word of corporations in general. Specifically, in Google’s case, a leak of its search ranking signals last year directly contradicted its public statements. But, even though Google was dishonest then, there is currently no evidence reCAPTCHA data is being misused in the way Joyce’s video suggests. Coyly asking questions with sinister-sounding music underneath is not a substitute for evidence.

The second problem is the way Google’s privacy policy can be interpreted, as reported by Thomas Claburn in 2020 in the Register:

Zach Edwards, co-founder of web analytics biz Victory Medium, found that Google’s reCAPTCHA’s JavaScript code makes it possible for the mega-corp to conduct “triangle syncing,” a way for two distinct web domains to associate the cookies they set for a given individual. In such an event, if a person visits a website implementing tracking scripts tied to either those two advertising domains, both companies would receive network requests linked to the visitor and either could display an ad targeting that particular individual.

You will hear from Edwards later in Joyce’s video making a similar argument. Just because Google can do this does not mean it is actually doing so. It has the far more popular AdSense for that.

ReCAPTCHA interacts with three Google cookies when it is present: AEC, NID, and OGPC. According to Google, AEC is “used to detect spam, fraud, and abuse” including for advertising click fraud. I could not find official documentation about OGPC, but it and NID appear to be used for advertising for signed-out users. Of these, NID is most interesting to me because it is also used to store Google Search preferences, so someone who uses Google’s most popular service is going to have it set regardless, and its value is fixed for six months. Therefore, it is possible to treat it as a unique identifier for that time.
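To make that concrete, here is a rough sketch (with entirely hypothetical values, and obviously not Google’s code) of why any cookie with a long-lived, fixed value can be joined against server logs like a pseudonymous identifier:

    // Hypothetical illustration: a cookie whose value stays constant across requests
    // lets whoever receives it group those requests into one visitor's trail.
    type LogEntry = { timestamp: string; path: string; nid: string };

    const logs: LogEntry[] = [
      { timestamp: "2025-01-03T10:00Z", path: "/search?q=passport+renewal", nid: "abc123" },
      { timestamp: "2025-02-14T18:22Z", path: "/recaptcha/api2/anchor", nid: "abc123" },
      { timestamp: "2025-03-02T09:10Z", path: "/maps/embed", nid: "xyz789" },
    ];

    // Grouping by the stable cookie value reconstructs a single browser's activity,
    // no account or login required, for as long as the value stays fixed.
    const byVisitor = new Map<string, LogEntry[]>();
    for (const entry of logs) {
      byVisitor.set(entry.nid, [...(byVisitor.get(entry.nid) ?? []), entry]);
    }
    console.log(byVisitor.get("abc123")); // two requests from the same browser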

I could not find a legal demand of Google specifically for reCAPTCHA history. But I did find a high-profile request to re-identify NID cookies. In 2017, the first Trump administration began seizing records from reporters, including those from the New York Times. The Times uses Google Apps for its email system. That administration and then the Biden one tried obtaining email metadata, too, while preventing Times executives from disclosing anything about it. In the warrant (PDF), the Department of Justice demands of Google:

PROVIDER is required to disclose to the United States the following records and other information, if available, for the Account(s) for the time period from January 14, 2017, through April 30, 2017, constituting all records and other information relating to the Account(s) (except the contents of communications), including:

[…]

Identification of any PROVIDER account(s) that are linked to the Account(s) by cookies, including all PROVIDER user IDs that logged into PROVIDER’s services by the same machine as the Account(s).

And by “cookies”, the government says that includes “[…] cookies related to user preferences (such as NID), […] cookies used for advertising (such as NID, SID, IDE, DSID, FLC, AID, TAID, and exchange_uid) […]” plus Google Analytics cookies. This is not the first time Google’s cookies have been used in intelligence or law enforcement matters — the NSA has, of course, been using them that way for years — but it is notable for being an explicit instance of tying the NID cookie, which is among those used with reCAPTCHA, to a user’s identity. (Google says site owners can use a different reCAPTCHA domain to disassociate its cookies.) Also, given the effort of the Times’ lawyers to release this warrant, it is not surprising I was unable to find another public document containing similar language. I could not find any other reporting on this cookie-based identification effort, so I think this is news. In this case, Google successfully fought the government’s request for email metadata.

Assuming Google retains these records, what the Department of Justice was demanding would be enough to connect a reCAPTCHA user to other Google product activity and to a Google account holder via the shared NID cookie. Furthermore, it is a problem that so much of the web relies on a relative handful of companies. Google has long treated the open web as its de facto operating system, coercing site owners to use features like AMP or to make updates to comply with new search ranking guidelines. It is not just Google that is overly controlling, to be fair — I regularly cannot access websites on my iMac because Cloudflare believes I am a robot and it will not let me prove otherwise — but it is the most significant example. Having its fingers in every pie — from site analytics, to fonts, to advertising, to maps, to browsers, to reCAPTCHA — means Google has a unique vantage point from which to see how billions of people use the web.

These are actual privacy concerns, but you will learn none of them from Joyce’s video. You will instead be on the receiving end of a scatterbrained series of suggestions of reCAPTCHA’s singularly nefarious quality, driven by just-asking-questions conspiratorial thinking, without reaching a satisfying destination.

From here on, I am going to use timecodes as reference points. 1:56:

Journalists told you such a small sliver of the truth that I would consider it to be deceptive.

Bad news: Joyce is about to be fairly deceptive while relying on the hard work of journalists.

At 3:24:

Okay, you’re probably thinking “why does any of this matter?”, and I agree with you.

I did agree with you. I actually halted this investigation for a few weeks because I thought it was quite boring — until I went to renew my passport. (Passport status dot state dot gov.)

I got a CAPTCHA — not a checkbox, not fire hydrants, but the old one. And I clicked it. And it took me here.

The “here” Joyce mentions is a page at captcha.org, which is redirected from its original destination at captcha.com. The material is similar on both. The ownership of the .org domain is unclear, but the .com is run by Captcha, Inc., and it sells the CAPTCHA package used by the U.S. Department of State among other government departments. I have a sneaking suspicion the .org retains some ties to Captcha, Inc. given the DNS records of each. Also, the list of CAPTCHA software on the .org site begins with all the packages offered by Captcha, Inc., and its listing for reCAPTCHA is outdated — it does not display Google as its owner, for example — but the directory’s operators found time to add the recaptcha.sucks website.

About that. 4:07:

An entire page dedicated to documenting the horrors of reCAPTCHA: alleging national security implications for the U.S. and foreign governments, its ability to doxx users, mentioning secret FISA orders — the same type of orders that Edward Snowden risked his life to warn us about. […]

Who put this together? “Anonymous”.

if you are a web-native journalist, wishing to get in touch, we doubt you are going to have a hard-time figuring out who we are anyway.

This felt like a key left in plain sight, whispering there’s a door nearby and it’s meant to be opened. This is what we’re good at. This is what we do.

The U.S. “national security implications” are, as you can see on screen as these words are being said, not present: “stay tuned — it will be continued”, the message from ten years ago reads. The FISA reference, meanwhile, is a quote from Google’s national security requests page acknowledging the types of data it can disclose under these demands. It is a note that FISA exists and that, under that law, Google can be compelled to disclose user data — an obligation that applies to every company.

This all comes from the ReCAPTCHA Sucks website. On the About page, the site author acknowledges they are a competitor and maintains their anonymity is due to trademark concerns:

a free-speech / gripe-site on trademarked domains must not be used in a ‘bad faith’ — what includes promotion of competing products and services.

and under certain legal interpretations disclosing of our identity here might be construed as a promotion of our own competing captcha product or service.

it frustrates us indeed, but those are the rules of the game.

The page concludes, as Joyce quoted:

if you are a web-native journalist, wishing to get in touch, we doubt you are going to have a hard-time figuring out who we are anyway.

Joyce reads this as a kind of spooky challenge yet, so far as I can figure out, did not attempt to contact the site’s operators. I asked CHUPPL about this and I have not heard back. It is not very difficult to figure out who they are. The site shares technical infrastructure with captcha.com, including a historic Google Analytics account. It feels less like the work of a super careful anonymous tipster, and more like an open secret from an understandably cheesed off competitor.

5:05:

Okay, let’s get this out of the way: reCAPTCHA is not and really has never been very good at stopping bots.

Joyce points to the success rate of a couple of reCAPTCHA breakers here as evidence of its ineffectiveness, though does not mention they were both against the audio version. What Joyce does not establish is whether these programs were used much in the real world.

In 2023, Trend Micro published research into the way popular CAPTCHA solving services operate. Despite the seemingly high success rate of automated techniques, “they break CAPTCHAs by farming out CAPTCHA-breaking tasks to actual human solvers” because there are many more CAPTCHA services out there than just reCAPTCHA. That is exactly how many CAPTCHA solvers market their services, though some are now saying they use A.I. instead. Also, it is not as though other types of CAPTCHAs are not subject to similar threats. In 2021, researchers solved hCAPTCHA (PDF) with a nearly 96% success rate. Being only okay at stopping bot traffic is not unique to reCAPTCHA, and these tools are just one of several technologies used to minimize automated traffic. And, true enough, none of these techniques is perfect, or even particularly successful. But that does not mean their purpose is nefarious, as Joyce suggests later in the video, at 11:45:

Google has said that they don’t use the data collected from reCAPTCHA for targeted advertising, which actually scares me a bit more. If not for targeted ads, which is their whole business model, why is Google acting like an intelligence agency?

Joyce does not answer this directly, instead choosing to speculate about a way reCAPTCHA data could be used to identify people who submit anonymous tips to the FBI — yes, really. More on that later.

5:49:

2018 was the launch of V3. According to researchers at U.C. Irvine, there’s practically no difference between V2 and V3.

Onscreen, Joyce shows an excerpt from the “Dazed and Confused” paper, and the sentence fragment “there is no discernable difference between reCAPTCHAv2 and reCAPTCHAv3” is highlighted. But just after that, you can see the sentence continues: “in terms of appearance or perception of image challenges and audio challenges”.

Screenshot from CHUPPL video showing an excerpt from the academic paper.

Remember: these researchers were mainly studying the usability of these CAPTCHAs. This section is describing how users perceive the similar challenges presented by both versions. They are not saying V2 and V3 have “practically no difference” in general terms.

At 6:56:

ReCAPTCHA “takes a pixel-by-pixel fingerprint” of your browser. A real-time map of everything you do on the internet.

This part contains a quote from a 2015 Business Insider article by Lara O’Reilly. O’Reilly, in turn, cites research by AdTruth, then — as now — owned by Experian. I can find plenty of references to O’Reilly’s article but, try as I might, I have not been able to find a copy of the original report. But, as a 2017 report from Cracked Labs (PDF) points out, Experian’s AdTruth “provides ‘universal device recognition’”, “creat[ing] a ‘unique user ID’ for each device, by collecting information such as IP addresses, device models and device settings”. To the extent “pixel-by-pixel fingerprint” means anything in this context — it does not, but it misleadingly sounds to me like it is taking screenshots — Experian’s offering also fits that description. It is a problem that there are so many things which quietly monitor users’ activity across their entire digital footprint.

Unfortunately, at 7:41, Joyce whiffs hard while trying to make this point:

If there’s any part of this video you should listen to, it’s this. Stop making dinner, stop scrolling on your phone, and please listen.

When I tell you that reCAPTCHA is watching you, I’m not saying that in some abstract, metaphorical way. Right now, reCAPTCHA is watching you. It knows that you’re watching me. And it doesn’t want you to know.

This stumbles in two discrete ways. First, reCAPTCHA is owned by Google, but so is YouTube. Google, by definition, knows what you are doing on YouTube. It does not need reCAPTCHA to secretly gather that information, too.

Second, the evidence Joyce presents for why “it doesn’t want you to know” is that Google has added some CSS to hide a floating badge, a capability it documents. This applies to one presentation of reCAPTCHAv2, which runs as invisible background validation and shows a checkbox only to suspicious users.

Screenshot from CHUPPL video.

I do not think Google “does not want you to know” about reCAPTCHA on YouTube. I think it thinks it is distracting. Google products using other Google technologies has not been a unique concern since the company merged user data and privacy policies in 2012.

The second half of the video, following the sponsor read, is a jumbled mess of arguments. Joyce spends time on a 2015 class action lawsuit filed against Google in Massachusetts alleging that completing the old-style word-based reCAPTCHA unfairly used unpaid labour to transcribe books. It was tossed in 2016 because the plaintiff (PDF) “failed to identify any statute assigning value to the few seconds it takes to transcribe one word”, and “Google’s profit is not Plaintiff’s damage”.

Joyce then takes us on a meandering journey through the way Google’s terms of use document is written — this is where we hear from Edwards reciting the same arguments as appeared in that 2020 Register article — and he touches briefly on the U.S. v. Google antitrust trial, none of which concerned reCAPTCHA. There is a mention of a U.K. audit in 2015 specifically related to its 2012 privacy policy merger. This is dropped with no explanation into the middle of Edwards’ questioning of what Google deems “security related” in the context of its current privacy policy.

Then we get to the FBI stuff. Remember earlier when I told you Joyce has a theory about how Google uses reCAPTCHA to unmask FBI tipsters? Here is when that comes up again:

Check this out: if you want to submit a tip to the FBI, you’re met with this notice acknowledging your right to anonymity. But even though the State Department doesn’t use reCAPTCHA, the FBI and the NSA do. […] If they want to know who submitted the anonymous report, Google has to tell them.

This is quite the theory. There is video of Edward Snowden and clips from news reports about the mysteries of the FISA court. Dramatic music. A chart of U.S. government requests for user data from Google.

But why focus on reCAPTCHA when the FBI and NSA — and a whole bunch of other government sites — also use Google Analytics? Though Google says Analytics cookies are distinct from those used by its advertising services, site owners can link them together, which would not be obvious to users. There is no evidence the FBI or any other government agency is doing so. The actual problem here is that sensitive and ostensibly anonymous government sites are using any Google services whatsoever, probably because Google is a massive corporation with lots of widely used products and services.

Even so, many federal sites use the product offered by Captcha, Inc. and it seems to respect privacy by being self-hosted. All of them should just use that. The U.S. government has its own analytics service; the stats are public. The reason for inconsistencies is probably the same reason any massive organization’s websites are fragmented: it is a lot of work to keep them unified.

Non-U.S. government sites are not much better. RCMP Alberta also uses Google Analytics, though not reCAPTCHA, as does London’s Metropolitan Police.

Joyce juxtaposes this with the U.S. Secret Service’s use of Babel Street’s Locate X data. He does not explain any direct connection to reCAPTCHA or Google, and there is a very good reason for this: there is none. Babel Street obtained some of its location data from Venntel, which is owned by Gravy Analytics, which obtained it from personalized ads.

Joyce ultimately settles on a good point near the end of the video, saying Google uses various browsing signals “before, during, and after” clicking the CAPTCHA to determine whether you are likely human. If it does not have enough information about you — “you clear your cookies, you are browsing Incognito, maybe you are using a privacy-focused browser” — it is more likely to challenge you.

None of this is actually news. It has all been disclosed by Google itself on its website and in a 2014 Wired article by Andy Greenberg, linked from O’Reilly’s Business Insider story. This is what Joyce refers to at 7:24 in the video in saying “reCAPTCHA doesn’t need to be good at stopping bots because it knows who you are. The new reCAPTCHA runs in the background, is invisible, and only shows challenges to bots or suspicious users”. But that is exactly how reCAPTCHA stops bots, albeit not perfectly: it either knows who you are and lets you through without a challenge, or it asks you for confirmation.

This is the very frustration I have as I try to protect my privacy while still using the web. I hit reCAPTCHA challenges frequently, especially when working on something like this article, for which I often relied on Google’s superior historical index and advanced search operators to look up stories from ten years ago. As I wrote earlier, I run into Cloudflare’s bot wall constantly on one of my Macs but not the other, and I often cannot bypass it without restarting my Mac or, ironically enough, using a private browsing window. Because I use Safari, website data is deleted more frequently, which means I am constantly logging into services I use all the time. The web becomes more cumbersome to use when you want to be tracked less.

There are three things I want to leave you with. First, there is an interesting video to be made about the privacy concerns of reCAPTCHA, but this is not it. It is missing evidence, does not put findings in adequate context, and drifts conspiratorially from one argument to another while only gesturing at conclusions. Joyce is incorrect in saying “journalists told you such a small sliver of the truth that I would consider it to be deceptive”. In fact, they have done the hard work over many years to document Google’s many privacy failures — including in reCAPTCHA. That work should bolster understandable suspicions about massive corporations ruining our right to privacy. This video is technically well produced, but it is of shoddy substance. It does not do justice to the better journalists whose work it relies upon.

Second, CAPTCHAs offer questionable utility. As iffy as I find the data in the discussion section of the “Dazed and Confused” paper, its other findings seem solid: people find it irritating to label images or select boxes containing an object. A different paper (PDF) with two of the same co-authors and four other researchers found people like reCAPTCHA’s checkbox-only presentation most — the one that necessarily compromises user privacy — but also found some people will abandon tasks rather than solve a CAPTCHA. Researchers in 2020 (PDF) found CAPTCHAs were an impediment to people with visual disabilities. This is bad. Unfortunately, we are in a new era of mass web scraping — one reason I was able to so easily find many CAPTCHA solving services. Site owners wishing to control that kind of traffic have options like identifying user agents or I.P. address strings, but all of these can be defeated. CAPTCHAs can, too. Sometimes, all you can do is pile together a bunch of bad options and hope the result is passable.

Third, this is yet another illustration of how important it is for there to be strong privacy legislation. Nobody should have to question whether checking a box to prove they are not a robot is, even in a small way, feeding a massive data mining operation. We are never going to make progress on tracking as long as it remains legal and lucrative.

Natasha Lomas, TechCrunch:

Germany’s antitrust watchdog has been investigating Apple’s app privacy framework since 2022. On Thursday, releasing preliminary findings from this probe, the Bundeskartellamt (FCO) said it suspects the iPhone maker may not be treating third-party app developers as equally as the law requires.

The antitrust watchdog said it believes Apple’s behavior could amount to self-preferencing. Apple is banned from preferring its own services and products in Germany since April 2023, when it became subject to special abuse controls aimed at regulating big tech’s market power.

The Bundeskartellamt says a ruling on Apple’s appeal of the April 2023 decision is expected in March. So far, its findings about App Tracking Transparency remain “preliminary”; it says “Apple now has the opportunity to comment on the allegations”.

Zac Hall, of 9to5Mac, received a statement from Apple reading, in part:

Apple has led the way in developing industry leading technologies to provide users great features without compromising privacy. App Tracking Transparency gives users more control of their privacy through a required, clear, and easy-to-understand prompt about one thing: tracking. That prompt is consistent for all developers, including Apple, and we have received strong support for this feature from consumers, privacy advocates, and data protection authorities around the world.

This does not meaningfully address the German authority’s concerns, which are based on the way Apple defines “tracking” as exclusively a third-party phenomenon. Apple collects highly granular data about users’ interactions with its internet services and associates it with their Apple ID; that data allows precise ad targeting. I would expect people to be more comfortable with first-party collection than third-party, but Apple’s own definition of “tracking” excludes these behaviours.

I hope the resolution here creates better privacy protections for all users, rather than a relaxing of App Tracking Transparency.

Joe Rossignol, MacRumors:

Apple this month started advertising on X for the first time in more than a year. The company had stopped advertising on the social media platform in November 2023 following controversial remarks made by its owner Elon Musk.

Translation: Apple is participating in a nakedly corrupt government by giving money to co-president Elon Musk, an increasing level of cooperation which will continue to be justified on the basis it is a disproportionately influential public corporation with shares held by retirement funds and, therefore, should continue blurring the lines between diplomacy and obsequiousness.

Jason Snell, Macworld:

Apple doesn’t have to end up with the best large language model around to win the AI wars. It can be in the ballpark of the best or partner with the leaders to get what it needs. But it can’t fail at the part that is uniquely Apple: Making those features a pleasure to use, in the way we all expect from Apple. Right now, that’s where Apple is failing.

I get why Apple wanted to rush these things out. I disagree with it since it betrays a lack of confidence in the time it takes to make thoughtful and polished software — but I get it. Yet we can only judge the products that have shipped, and what we can use right now is disappointing because it feels sloppy.

As Snell writes, Apple has a chance to move A.I. features beyond a blinking cursor in a chat bot — like a plain language command line. Very little of what is out today is a thoughtful implementation of these features. Cleanup in Photos is pretty good. Most of the other stuff — summaries of phone calls, Notification Summaries, Writing Tools, Memory Movies in Photos, and response suggestions in Mail and Messages — is more cumbersome than it is elegant.

James Temperton and Murad Hemmadi, the Logic:

Shopify’s general counsel said the company took down musician Kanye West’s online store because of the potential for fraud, not because it was selling a Nazi T-shirt, an internal staff announcement obtained by The Logic reveals.

In the message, which was posted on Shopify’s Slack Tuesday morning, general counsel Jess Hertz said the swastika-emblazoned T-shirt listed for sale by West was “a stunt” and “not a good faith attempt to make money.” This, Hertz added, “brought with it the real risk of fraud.” It was for this reason, she added, that the store had been closed.

Here is the thing: choosing not to support Nazis, even tacitly, is a pretty comfortable stance to take. You are in good company if you just say no to Nazi stuff. Nobody gets points for providing infrastructure to sell Nazi merch. What Shopify’s general counsel is indicating is that it would be happy to operate this store so long as orders for these shirts would actually be fulfilled.

Mark Gurman, Bloomberg:

Apple Inc. is renaming the Gulf of Mexico to Gulf of America on its Maps app, following an executive order signed by US President Donald Trump on his first day in office.

[…]

Apple is making the change Tuesday for customers in the US, but said it would soon roll out the shift for all users globally. Apple offers its Maps app on most of its devices, including the iPhone, iPad and Mac, and recently launched a web version to better compete with Google Maps.

In the United States, Google Maps labels it “Gulf of America”; in nearly every other region, it is shown as “Gulf of Mexico (Gulf of America)”. However, in Mexico, it is displayed as “Gulf of Mexico” only. So far, Bing and MapQuest have not updated their maps, as Gurman writes. Neither has Mapbox. OpenStreetMap is currently displaying “Gulf of Mexico” everywhere but in the U.S., but that has been a contentious choice.

All of these digital map distributors have choices. The best choice, given the circumstances, is to display “Gulf of America” only in the U.S. and, I suppose, in any other country pledging loyalty to this jingoistic change. Google’s decision seems like an acceptable alternative, and it is what I hope Gurman means in reporting the change will be seen by “all users globally”. (Update: I like Steve Jamieson’s suggestion to “localize it [in the U.S.] as ‘Gulf of America (Gulf of Mexico)'”. Then, everywhere else, reversing the order or dropping the “Gulf of America” part makes sense. But I fear a compromise is not what this president has in mind, and it will put any company that attempts this at risk of being singled out.)

I do not think it makes sense to be mad at mapmakers updating their labels to correctly reflect official naming changes, nor do I think it is helpful to file bug reports against the name. I do think people should continue mocking the stupidity of this renaming as it is a minor symptom of nationalistic fervour.

By the way, the big mountain in Alaska is still showing as “Denali”, even in U.S. Google Maps. The National Park Service has fully removed its page on the history of the mountain’s name; it simply redirects to a page that makes no mention of it. Is it good when a country is desperately burying its history? Asking for a neighbour.

Update: Parker Molloy:

Yesterday, the Associated Press found itself locked out of an Oval Office press event for refusing to bow to presidential pressure to change its style guide. The reason? The AP won’t refer to the Gulf of Mexico exclusively as the “Gulf of America,” as newly-renamed by executive order.

This may seem like a relatively minor dispute on the surface. After all, what’s in a name? But that’s exactly what makes this such a perfect example of how authoritarianism creeps into our lives — it starts with something that might feel insignificant before snowballing into something much worse.

This is far more troubling.

Kate Knibbs, Wired:

In 2020, the media and technology conglomerate filed an unprecedented AI copyright lawsuit against the legal AI startup Ross Intelligence. In the complaint, Thomson Reuters claimed the AI firm reproduced materials from its legal research firm Westlaw. Today, a judge ruled in Thomson Reuters’ favor, finding that the company’s copyright was indeed infringed by Ross Intelligence’s actions.

“None of Ross’s possible defenses holds water. I reject them all,” wrote US District Court of Delaware judge Stephanos Bibas, in a summary judgement.

I am still unsure whether copyright law or, for that matter, robots.txt is the best tool for creators to control A.I. training, but this ruling sure seems to complicate the fair use justification upon which the entire field is currently based.

Last week, the Washington Post broke the news that the U.K. government is demanding access to iCloud accounts with Advanced Data Protection enabled. Joseph Menn, the Post:

Security officials in the United Kingdom have demanded that Apple create a back door allowing them to retrieve all the content any Apple user worldwide has uploaded to the cloud, people familiar with the matter told The Washington Post.

The British government’s undisclosed order, issued last month, requires blanket capability to view fully encrypted material, not merely assistance in cracking a specific account, and has no known precedent in major democracies. […]

This phrasing is, it turns out, somewhat ambiguous, as Myke Hurley points out in the latest episode of “Upgrade”, starting at about 39:15; this transcript is adapted from David Smith’s:

The BBC is the only outlet that, from what I can see, has done their own reporting on this. I’ve been reading a bunch of them, and everybody’s reporting the same thing. The BBC’s reporting is different. They are saying that the U.K. wants to have access to the data in Advanced Data Protection if it was needed in the same way that a law enforcement agency can request iCloud data from anyone where needed.

Zoe Kleinman, in the BBC News article Hurley references:

It’s also important to note that the government notice does not mean the authorities are suddenly going to start combing through everybody’s data.

It is believed that the government would want to access this data if there were a risk to national security – in other words, it would be targeting an individual, rather than using it for mass surveillance.

Authorities would still have to follow a legal process, have a good reason and request permission for a specific account in order to access data – just as they do now with unencrypted data.

A small point of correction to Hurley: the Financial Times story also relies on its own reporting. The Times plays it down the middle, without reference to either mass surveillance or targeted unlocking.

Reading all three stories is actually a good exercise in interpreting what each outlet’s sources disclosed and what each deemed important. Yet the varying interpretations strike me as a distinction without much difference. Hurley is likely correct in understanding the BBC story as more accurate, but complying with those demands necessarily creates the “blanket capability”. What I believe to be the case — reading between the lines and without a copy of the technical capability notice — is that the U.K. government is asking Apple to create a back door in the Advanced Data Protection process and then, if it has a warrant for one of those accounts, it can ask Apple to decrypt that data. This is both technically a “blanket capability” and still, in policy, individually targeted.

Regardless of specifics, this demand — as noted by both Hurley and co-host Jason Snell — is still very bad. It applies to users’ data globally, meaning Apple cannot simply turn off Advanced Data Protection for U.K. users, and there is only a narrow path by which Apple may dispute it.

Mike Masnick, Techdirt:

The UK government’s approach here is particularly insidious. While Apple can appeal the order, their appeal rights are bizarrely limited: They can only argue about the cost of implementing the backdoor, not the catastrophic privacy and security implications for billions of users worldwide. This reveals the UK government’s complete indifference to the fundamental right to privacy.

The best case scenario is for the U.K. government to drop this demand. But these demands for encrypted data will keep coming. I expect the businesses I entrust with my data — like Apple and Backblaze — to stand by their end-to-end encryption promises. In this case, however, I am not sure what that looks like. It is hard to imagine arguing anything is too costly for one of the richest companies in the world.

Even though a 2023 class action suit filed by authors against Meta has been shaky so far, some of the details in what is left of the suit are stunning. Apparently, Meta downloaded a hundred terabytes of pirated books, according to recently unsealed documents.

Ashley Belanger, Ars Technica:

Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to “avoid” the “risk” of anyone “tracing back the seeder/downloader” from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as in “stealth mode.” Meta also allegedly modified settings “so that the smallest amount of seeding possible could occur,” a Meta executive in charge of project management, Michael Clark, said in a deposition.

Now that new information has come to light, authors claim that Meta staff involved in the decision to torrent LibGen must be deposed again, because allegedly the new facts “contradict prior deposition testimony.”

Mark Zuckerberg, for example, claimed to have no involvement in decisions to use LibGen to train AI models. But unredacted messages show the “decision to use LibGen occurred” after “a prior escalation to MZ,” authors alleged.

It should surprise nobody that A.I. is trained on illicit material. Even if you believe A.I. training through bulk web scraping is a perfectly legitimate expression of fair use, it is obviously going to run across things which are posted illegally. There are entire blockbuster movies on video platforms; photos and books get reshared without permission constantly.

If Meta or any other A.I. company had bothered to license this data from its copyright holders, it would be less likely to ingest pirated material. That would, of course, be expensive and slow. But Meta, as of writing, posted the world’s seventh highest earnings in its 2024 fiscal year: over $71 billion. I think it can afford to pay for the data it harvests.

Patty Winsa, Toronto Star:

The federal government has reversed its advertising boycott of Meta, spending nearly $300,000 for campaigns on the company’s Facebook and Instagram social media platforms.

The reversal comes despite Meta’s continued ban on posting news from Canadian media sites.

This was retaliation for Meta’s restriction on news links in Canada, which was, itself, a response to link tax legislation. But what a time to resume spending on a platform publicly and loudly aligning itself with a government that really does want to take over our country. It sure sucks that one of the most effective ways for the Canadian government to advertise to Canadians is necessarily through the U.S. duopoly.

Jeff Johnson:

To use Kagi as your default search engine in Safari, you have to install Kagi’s Safari extension.

So I installed the extension and entered a search in the Safari address bar. Note below how Safari says “Search Google” and “Google Search”, even though I’m supposed to be using Kagi.

[…]

Why does this happen? It turns out that Safari has no extension API to set a new search engine. The workaround for the lack of an API is a kind of hack: Safari extensions instead use the webNavigation onBeforeNavigate API to detect a connection to your default search engine, and then they redirect to your custom search engine using the tabs update() API. This technique is not unique to the Kagi extension. Other Safari extensions such as xSearch must do the same thing, because there’s no better way.
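As I understand the WebExtensions APIs Johnson describes, the trick looks roughly like this. This is a sketch of the general technique, not Kagi’s or xSearch’s actual code, and the specific URLs and structure are my assumptions:

    // Minimal sketch of redirecting the default engine's results page to another engine.
    declare const browser: any; // WebExtensions namespace provided by Safari

    const GOOGLE_SEARCH = "https://www.google.com/search";
    const KAGI_SEARCH = "https://kagi.com/search";

    browser.webNavigation.onBeforeNavigate.addListener((details: any) => {
      // Only rewrite top-level navigations to the default engine's results page.
      if (details.frameId !== 0 || !details.url.startsWith(GOOGLE_SEARCH)) {
        return;
      }
      const query = new URL(details.url).searchParams.get("q");
      if (query) {
        // Send the tab to the preferred engine instead.
        browser.tabs.update(details.tabId, {
          url: `${KAGI_SEARCH}?q=${encodeURIComponent(query)}`,
        });
      }
    });

Because Google remains the nominal default engine, Safari’s interface keeps saying “Search Google”, and the query at least starts heading toward Google before the extension redirects it, which is exactly what Johnson’s screenshots show.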

Michael Tsai:

Even though Chrome is made by Google, it lets you pick another search engine. Even though Edge is made by Microsoft, it doesn’t lock you into Bing, and you can add any search URL template that you want. Apple is not encumbered with its own search engine to push, yet it seems to be constrained by its desire for revenue sharing, so Safari users get stuck with fewer choices that are arguably lower quality and less private.

One other possibility is that Apple’s nominal desire for simplicity in preferences led to the company ignoring requests for an arguably niche feature like a custom search engine. Yet Safari preferences are complex and messy in other ways, and the company has — thankfully — retained legacy features like user stylesheets. Even if revenue sharing discouraged Apple from developing this feature, how many people are actually going to set a custom search engine, and would they have a meaningful impact on its beloved Google revenue stream? My guesses: very few, and I doubt it. Yet here we are, over twenty years after Safari’s launch, and we can generously choose between five search engines, of which three — Bing, DuckDuckGo, and Yahoo — are dependent on the same index.

Apple’s reluctance to add this feature to Safari is one of the main reasons I am so thankful for DuckDuckGo’s bang operators, of which there are hundreds just for other search engines. They are not identical to configuring a custom search engine — a query is still passed through DuckDuckGo before being sent to the third-party engine — but they are frequently useful.

Want to experience twice as fast load times in Safari on your iPhone, iPad and Mac?

Then download Magic Lasso Adblock — the ad blocker designed for you.

Magic Lasso Adblock: browse 2.0x faster

As an efficient, high performance, and native Safari ad blocker, Magic Lasso blocks all intrusive ads, trackers, and annoyances – delivering a faster, cleaner, and more secure web browsing experience.

By cutting down on ads and trackers, common news websites load 2× faster and browsing uses less data while saving energy and battery life.

Rely on Magic Lasso Adblock to:

  • Improve your privacy and security by removing ad trackers

  • Block all YouTube ads, including pre-roll video ads

  • Block annoying cookie notices and privacy prompts

  • Double battery life during heavy web browsing

  • Lower data usage when on the go

With over 5,000 five-star reviews, it’s simply the best ad blocker for your iPhone, iPad, and Mac.

And unlike some other ad blockers, Magic Lasso Adblock respects your privacy, doesn’t accept payment from advertisers, and is 100% supported by its community of users.

So, join over 350,000 users and download Magic Lasso Adblock today.

One of the odder Apple rumours remaining unresolved is its robotics project.

Mark Gurman, Bloomberg, in April:

Engineers at Apple have been exploring a mobile robot that can follow users around their homes, said the people, who asked not to be identified because the skunk-works project is private. The iPhone maker also has developed an advanced table-top home device that uses robotics to move a display around, they said.

At first glance, these ideas are weird, right? I can see the appeal of things like these, especially for people with disabilities or who are older. But they do not really fit my expectations of a typical Apple product, which is usually designed for a mass market and to recede into a lived environment instead of being so conspicuous. Yet Gurman followed up in August with news that this is something the company is actually interested in.

Then last month, on its Machine Learning Research blog, Apple published a post describing “ELEGNT: Expressive and Functional Movement Design for Non-Anthropomorphic Robot”, and a companion paper that helps explain the forced acronym. Embedded in the post is a video that, indeed, shows a table-mounted lamp that responds to a user’s gestures. It is really quite something.

This is nominally research about making a robot’s movements less — uh — robotic. The result is a lamp that more than one publication has compared to the charming Pixar intro. It is very cool — but it is still very weird. Apple almost never shows works-in-progress, and what is posted to its research blog does not necessarily correlate to real-world products. Also, I am not accustomed to this much whimsy in anything Apple has released for at least a decade. It is refreshing.