
Reduce load of preview fetching on third-party servers #23662

Open
ddevault opened this issue Feb 17, 2023 · 25 comments
Labels
bug Something isn't working

Comments

@ddevault
Contributor

Steps to reproduce the problem

Posting a link on Mastodon causes hundreds or thousands of federated servers to fetch the preview details all at once, causing high load on the target server and effectively creating the by now well-known "Mastodon DDoS" effect. For static websites this may not be much of an issue, but it routinely brings down larger pages or those with dynamic server-rendered content.

Expected behaviour

No DDoS

Actual behaviour

DDoS

Detailed description

The discussion at #4486 has been going on for six years and is quite a mess. I'm opening this ticket to start fresh and collect the information scattered throughout the thread into one place so that the discussion doesn't have to keep going in circles.

Rationale

It is the responsibility of software like Mastodon to be a good neighbor on the internet. DDoSing others is not being a good neighbor! It's important to figure out how to prevent this issue from occurring.

IMPORTANT: Let's make this discussion better than the last one. For Mastodon devs: don't blame the victims. It's not the website's fault that Mastodon is DDoSing them. For server operators: Mastodon is a volunteer-run free software project. Be mindful of that.

Present mitigations

Currently, a random jitter of between 0 and 60 seconds is added after a federated server becomes aware of a post that includes a link, before it fetches the preview details. This does not appear to be sufficient to prevent the DDoS effect from occurring.

Suggested mitigations

The discussion has focused on two main suggestions for a fix.

Federating previews

The original poster's instance can fetch the preview details and attach them to the message, federating the preview details without requiring other servers to fetch it themselves. Criticism of this solution is mainly focused around the fact that Mastodon is a zero-trust environment, so instance A cannot trust instance B's word that a preview accurately represents the URL.

Because the original post is always fetched when a post is federated, the trust space can be reduced to the origin server alone; intermediates need not be trusted. Opinions on the depth of this problem have ranged from "it's no different from posting an image" to "we absolutely cannot trust anyone ever for any reason".

Answers proposed to the objections of trust have included random sampling, wherein the preview is fetched 1 in N times; if it's found to be inconsistent with the federated preview, some action can be taken, such as setting a flag on that instance which causes future (and past?) previews to be fetched unconditionally from that server, automatically reporting suspected foul play to the instance admins, or federating the flag so that other servers can force a sample from that instance when foul play is suspected.
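To make the sampling idea concrete, here is a minimal sketch in Python of how a receiving server might spot-check federated previews. The sample rate, the fetch_preview helper, and the flag storage are all assumptions for illustration, not part of any proposal above.

import random

SAMPLE_RATE = 10                      # assumption: verify 1 preview in 10
flagged_instances: set[str] = set()   # origins caught serving bad previews

def fetch_preview(url: str) -> dict:
    """Placeholder for a real OpenGraph fetch of `url`."""
    raise NotImplementedError

def accept_federated_preview(origin: str, url: str, federated: dict) -> dict:
    """Use the preview federated by `origin`, spot-checking it 1 in N times."""
    must_verify = origin in flagged_instances or random.randrange(SAMPLE_RATE) == 0
    if not must_verify:
        return federated
    local = fetch_preview(url)
    if local != federated:
        # Suspected foul play: fetch unconditionally from this origin from
        # now on; a real implementation might also report to admins or
        # federate the flag, as suggested above.
        flagged_instances.add(origin)
        return local
    return federated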

Any of these changes would likely involve a slow roll-out across the fediverse, and link previews might sometimes not work for older clients as a consequence. My take: I believe this is quite acceptable; it's a small price to pay for correcting this behavior. User experience does not outweigh "don't DDoS people". Furthermore, a slow roll-out naturally implies that the problem does not get fixed overnight, but rather that the behavior corrects gradually as the fix spreads across the fediverse -- not an issue imo.

Reducing load

No federated previews, but instances don't immediately fetch the preview. Most to least effective mitigations along this line of thought:

  1. Add a "show preview" button (or similar) to the UI (or otherwise detect when the user is "interacting" with a post) and fetch the preview only then. Note that the preview only needs to be fetched once: if one user on an instance interacts with a post, the preview becomes available to all users without further interaction.
  2. Fetch robots.txt (and cache it) for that remote URL. Link previews should respect robots.txt #21738
  3. Lazily fetch the preview only when it's shown in the UI. For smaller instances, this would reduce the load if no one is actively using the server (e.g. idling Mastodon in an unfocused tab). It would also likely reduce the need to fetch previews for posts that appear on the federated timeline.
  4. Policy-based approach, e.g. if a post comes in because a user follows the poster, fetch the preview; otherwise don't.
  5. Increase the random jitter for fetching previews from 60 seconds to some higher number.

Some combination of these mitigations is also possible; for instance, the jitter could be increased to five minutes, but the fetch performed immediately if the post shows up in the UI.
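As a rough illustration of that combination, here is a sketch in Python of a five-minute jitter that is cut short the moment the post first appears in the UI. The fetch_and_store_preview helper is hypothetical; this is not how Mastodon's actual job queue works.

import random
import threading

JITTER_SECONDS = 300  # assumption: jitter widened from 60 s to five minutes

class PreviewJob:
    """Fetch a link preview after a random delay, or at once on display."""

    def __init__(self, url: str):
        self.url = url
        self.lock = threading.Lock()
        self.done = False
        delay = random.uniform(0, JITTER_SECONDS)
        threading.Timer(delay, self.fetch_once).start()

    def on_displayed(self) -> None:
        """Call when the post first shows up in someone's UI."""
        self.fetch_once()

    def fetch_once(self) -> None:
        with self.lock:  # ensure the preview is fetched at most once
            if self.done:
                return
            self.done = True
        fetch_and_store_preview(self.url)  # hypothetical helper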

Specifications

n/a, all versions affected

@ddevault ddevault added the bug Something isn't working label Feb 17, 2023
@ddevault
Contributor Author

For my part, I think the mitigation that best strikes a balance of non-controversial, effective, and easy to implement is robots.txt. It allows server operators fine-grained control, so they can indicate which routes are expensive to render. On my servers, I have long excluded expensive routes in robots.txt to keep crawlers away; this has been a well-known mitigation for automated requests to expensive routes for some time now.
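For what it's worth, the check itself is nearly free to implement. A minimal sketch using Python's standard library; the user-agent string and the lack of per-host caching are simplifications:

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def preview_allowed(url: str, user_agent: str = "Mastodon") -> bool:
    """Honor the target site's robots.txt before fetching a link preview."""
    parts = urlparse(url)
    robots = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()  # in practice, cache the parsed file per host
    return robots.can_fetch(user_agent, url)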

I also think that someone within Mastodon should be willing to take responsibility for implementing these mitigations. I am generally sympathetic to the incentives of FOSS projects and the nature of the volunteer labor pool, but when software is causing disruptions to other services, it's important for the maintainers of that software to accept responsibility for it -- expecting the victims of a DDoS to put in the labor to mitigate the DDoS is just rubbing salt into the wound. If no one is available to do the work, fetching previews should simply be disabled entirely until such time as a mitigation can be implemented.

@flashymittens

As a random passerby, I can say that the server the author of a message belongs to should be responsible for retrieving this information (subject to a timeout) and only then propagating the message along with this data. Other servers should not retrieve the data themselves.

You have to trust it, sure, but you already have to trust it's not altering messages… And you might as well block bad actors on your server when that happens.

@renchap
Sponsor Member
renchap commented Feb 17, 2023

I wrote a document on the topic a few months ago, which I think is quite extensive on the subject: https://gist.github.com/renchap/3ae0df45b7b4534f98a8055d91d52186

I tried to highlight the challenges related to this, as well as the possible implementations and their drawbacks.

@jimrhiz
jimrhiz commented Feb 18, 2023

random sampling wherein the preview is fetched 1 in N times, and if it's found to be inconsistent with the federated preview, some action can be taken

Perhaps a small point, but what happens if the original web page linked to is updated in the meantime so that the preview changes (as sometimes happens with news articles for example)?

@ThisIsMissEm
Contributor
ThisIsMissEm commented Feb 19, 2023

Another path that could be taken here is to have a link preview service, which can cache the preview. That'd mean rather than every mastodon server going to the origin server to prepare a preview, the mastodon servers delegate that to a trusted preview service.

E.g., a lot of large companies use a service like https://embed.ly for generating previews. Perhaps there's room for a setup based on low-cost infrastructure (e.g., Cloudflare workers or similar on-demand compute environments) that could be deployed?

That way, a server admin could choose to trust one of the preview services instead of having their instance do the fetching (of course, you could always opt out, with the caveat that your instance would then potentially amplify traffic).

The logic here is that there would be fewer preview services deployed than Mastodon servers, and that would prevent the thundering herd. Trust would be between Mastodon server admins and the preview services; a preview service that doesn't provide accurate previews would quickly find its reputation trashed, and instances would switch to a different provider.

Edit: this is Solution 5 in @renchap's gist
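A sketch of the delegation, assuming a hypothetical preview-service endpoint that takes a URL parameter and returns JSON; no such standard API exists today:

import json
import urllib.parse
import urllib.request

# Hypothetical trusted preview service chosen by the server admin.
PREVIEW_SERVICE = "https://preview.example.org/api/preview"

def fetch_preview_via_service(url: str) -> dict:
    """Ask the shared preview service instead of hitting the origin server."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{PREVIEW_SERVICE}?{query}") as resp:
        return json.load(resp)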

@ClearlyClaire
Contributor

A few more thoughts:

  1. the trust issue is particularly important because currently, all posts sharing the same link will reuse the same preview, and that preview is what's used in the “News” tab of the “Explore” section (trending links), so poisoning can have a pretty wide effect
  2. not every link is worth a preview, so even without trust, I think it would be an improvement for the author to signal whether a preview card is expected or not, so that if the author decided not to include a preview, or the preview could not be fetched, other servers would not attempt fetching it

To lessen the impact of an untrustworthy server, something that might make sense is to pre-fetch the link while the author is writing their post, offer them a preview, and have them select it. Then, somehow serialize that preview when federating it. On the receiving end, if you receive a post with a preview:

  • if the link is already known and its preview trusted, discard the attached preview and use the already-stored one
  • if the link is not known, store the attached preview as non-trusted
  • if the link is already known but its preview is not trusted, compare it with the attached preview; if it's the same, re-use it; if it's different, schedule the preview to be fetched

Whenever a local user authors a post with a link to a known but non-trusted preview, fetch the link yourself to get a trusted preview. Whenever a non-trusted preview is eligible for trending, fetch the link yourself and update the trusted preview.
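A condensed sketch of that receiving-end decision table, in Python; schedule_trusted_fetch stands in for whatever job queue would perform the local fetch:

# url -> (preview, trusted); stands in for the real preview-card storage
known_previews: dict[str, tuple[dict, bool]] = {}

def receive_attached_preview(url: str, attached: dict) -> dict:
    """Decide what to do with a preview attached to an incoming post."""
    if url not in known_previews:
        known_previews[url] = (attached, False)  # store as non-trusted
        return attached
    stored, trusted = known_previews[url]
    if trusted:
        return stored               # discard the attached preview
    if stored == attached:
        return stored               # consistent with what we have: re-use it
    schedule_trusted_fetch(url)     # conflicting previews: fetch it ourselves
    return stored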

Compared to the current state, this would require (possibly breaking, in that clients not implementing this change may not get previews generated) changes in the REST API to choose which link to preview (if any at all), and S2S protocol changes to communicate that information. It would introduce an attack vector in which a remote user could generate a misleading preview for their own posts, but without introducing a cache-poisoning issue. An attacker could still purposefully send two conflicting previews to a server to cause it to fetch the link, but it should greatly reduce the impact of normal usage.

@ryanfb
ryanfb commented Jun 1, 2023

One option for @renchap's "Solution 6: Design and implement a protocol for websites to provide a signed preview" would be for Open Graph to define an "integrity" structured property on image/video/audio properties to allow for zero-trust redistribution of bandwidth-heavy media resources (similar to what's used for Subresource Integrity). e.g.:

<meta property="og:image" content="https://ia.media-imdb.com/images/rock.jpg" />
<meta property="og:image:integrity" content="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC" />

Unfortunately, the Facebook & Google groups linked for discussion from the official Open Graph page seem to be derelict and defunct respectively, so I'm not sure of the best way to get this proposed and adopted.

Edit: Another option would be to use the HTTP Digest header for this purpose.
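Verifying such a value on the receiving side would be straightforward. A sketch, assuming the SRI convention of an algorithm name, a dash, and a base64-encoded digest:

import base64
import hashlib

def verify_integrity(body: bytes, integrity: str) -> bool:
    """Check fetched media bytes against an SRI-style 'sha384-...' value."""
    algorithm, _, expected = integrity.partition("-")
    digest = hashlib.new(algorithm, body).digest()  # e.g. 'sha384'
    return base64.b64encode(digest).decode("ascii") == expected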

@jonaharagon
Sponsor

I think it would be an improvement for the author to signal whether a preview card is expected or not, so that if the author decided not to include a preview, or the preview could not be fetched, other servers would not attempt fetching it

+1. Also, having this information and federating it would improve compatibility with federated news aggregator platforms like Kbin/Lemmy, which don't have a reliable way of extracting the "best link" from an incoming Mastodon post (LemmyNet/lemmy#2889 (comment)). If this information was added to a status as an explicitly attached Link, e.g.:

{
...
	"attachment": [{
		"href": "https://example.com",
		"type": "Link"
	}],
...
}

...this would be helpful in determining both whether a link preview should be generated or not, and which link the author intended to have shown as a preview (in the case of multiple links in a post).

(Presumably the preview property could also be added to an attached link for federating this other card preview information?)

@alexshpilkin
alexshpilkin commented Jun 2, 2023

@ryanfb FWIW, in theory Open Graph uses a subset of RDFa, though which subset exactly was never clear to me (full RDFa is a bear to implement, duplicating body content in meta tags seems silly when RDFa was specifically made to reuse it, what real-world parsers support isn’t written down anywhere that I can see). And in theory, it’s an open system—as long as you use valid URIs you control and not ones under og: (aka https://ogp.me/ns#), nothing stops you from defining whatever relations you want, including “has a preview image with SRI hash of”, or perhaps you could get one registered with Schema.org or similar.

Whether there’s a good place to talk to people interested in such things, I don’t know—Facebook’s relationship with the wider Web has gotten a lot less cuddly since 2010. Perhaps those who once maintained the spec repo could be contacted?

@callionica

Some questions for the group:

  1. Do all post authors want all their posts and all links in their posts to be displayed with link previews?

  2. Do post authors expect link previews to be visible to them when they author a post?

  3. If link previews are visible during authoring, do posters expect the link preview seen by readers of that post to be unchanged from what the author saw during posting?

  4. If link previews shown to readers are different than what the poster saw during authoring (either poster did not see a link preview or poster saw different preview than reader), does the poster expect this difference to be obvious to the reader of the post?

  5. Do posters want servers to have the ability to add/modify content displayed with their post without drawing a clear distinction between authored content and generated content?

  6. If a reader receives posts where the author hasn't included a link preview or has requested that link previews not be shown, should readers have the ability to override this and generate link previews automatically in bulk?

  7. If a poster, a reader, or a server admin believes that link previews are wasteful (of energy) or misleading (misrepresenting authoring intent or knowledge) or essential (for accessibility/understanding where a link goes) or important (for aesthetics or engagement), what controls should be in place for these competing views to be negotiated?

There are many more questions that could be asked, particularly from the perspectives of readers and admins, but I thought I'd start with these.

The point is that the discussions so far seem a little focused on the current implementation, which appears based on an understanding of link previews as an automatic aid to readers, whereas I believe link previews should primarily be driven by posters during authoring, since this is the path to full trust in the content.

As a post creator, I want to see link previews, and I don't want them to change after I have posted: they should be authored content.

As other people on this thread have indicated, some of the answers to the technical questions are clearer when this is the starting point of the feature.

@dpk
dpk commented Sep 5, 2023

Yet another site – this time a major bookmarking service – is considering blocking Mastodon because of the DDoS effect: https://twitter.com/pinboard/status/1698775481292832978

Please fix this.

@brendanjones
brendanjones commented Oct 4, 2023

Some questions for the group:

1. Do all post authors want all their posts and all links in their posts to be displayed with link previews?

Moot. It should be up to the preference of the person viewing the post, and the client they use. Also, can you stop a client from creating a link preview if it wants to?

2. Do post authors expect link previews to be visible to them when they author a post?

Expect? No, it depends on whether the client I'm using has the capability. It's certainly a nice-to-have, though.

3. If link previews are visible during authoring, do posters expect the link preview seen by readers of that post to be unchanged from what the author saw during posting?

Yes.

4. If link previews shown to readers are different than what the poster saw during authoring (either poster did not see a link preview or poster saw different preview than reader), does the poster expect this difference to be obvious to the reader of the post?

Moot, because of (3).

  5. Do posters want servers to have the ability to add/modify content displayed with their post without drawing a clear distinction between authored content and generated content?

If by non-authored content you mean previews, then I think it's clear that link previews aren't part of the authored content. Right? Link previews are a thing across the web; they display what's on the linked page. They're not part of the post.

  6. If a reader receives posts where the author hasn't included a link preview or has requested that link previews not be shown, should readers have the ability to override this and generate link previews automatically in bulk?

See (1).

@renchap renchap added this to the 4.3.0 milestone Oct 8, 2023
@ddevault
Contributor Author

After 6 years of no movement on Mastodon's DDoS-by-design, and a string of outages caused by people just tooting links to SourceHut, I have blacklisted Mastodon User-Agents across SourceHut's services.

@renchap
Sponsor Member
renchap commented Oct 13, 2023

FYI, this is on our roadmap for the next version; expect some news soon.

@ddevault
Contributor Author

That is excellent news, will be keeping an eye out. Thanks!

@nemobis
Contributor
nemobis commented Nov 25, 2023

I wrote a document on the topic a few months ago, which I think is quite extensive on the subject: https://gist.github.com/renchap/3ae0df45b7b4534f98a8055d91d52186

Solutions 2–6 are all variants of the same idea, a shared cache.

Solution 1 ("On-demand generation of previews") is a variant of the idea of just adding a delay. Further variants are simple to come up with and implement. For example, the most obvious solution to reduce the intensity of the load is to introduce a random delay where each instance would wait e.g. between 0 seconds and 30 minutes before doing exactly what it's doing now. (I'm assuming this is rather simple to implement with scheduled jobs but I might be wrong.)

It's possible to come up with other triggers and thresholds which would consider the instance's specific circumstances to reduce the number of "wasteful" preview generations (for example, on many single-user instances a preview may never be seen). It might be possible to collect statistics to measure the extent of the current waste and estimate the impact of the alternatives. (For example, instrument Mastodon by adding metrics for the hit rates of the previews, as you'd do for a cache.)
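For instance, two counters would already give a cache-style hit rate; where exactly to hook them into Mastodon is left open here:

from collections import Counter

preview_stats: Counter = Counter()

def record_preview_fetched() -> None:
    preview_stats["fetched"] += 1    # call once per preview fetch

def record_preview_displayed() -> None:
    preview_stats["displayed"] += 1  # call on a preview's first display only

def waste_ratio() -> float:
    """Fraction of fetched previews that no local user ever saw."""
    fetched = preview_stats["fetched"]
    if not fetched:
        return 0.0
    return 1.0 - preview_stats["displayed"] / fetched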

@gunchleoc
Contributor

I run a small server and use relays to fill the federated timeline. Most of those posts are probably never displayed, so only fetching previews for posts for the other timelines (home, trending, lists, local) and replies to those posts would greatly reduce the load caused by servers like mine. For the federated timeline, fetch previews for the last 200 messages or so when a user requests its display.

@fabiscafe

For the shared cache idea, the server would still need to check the original source. The cached version could be

  • outdated
  • in the wrong localization (based on the original server the cache comes from)
  • a completely different website (based on the original servers location)
  • manipulated
  • a cookie banner (preview cache was done in the EU)

How about a different take: Warm preview generation based on active users, cold grey cache distribution to less active instances.

What I mean with this:

  1. The source instance will generate the preview, as well as a grey "cache version" of this preview to share.
  2. Another instance receiving this posting will check whether it has many more¹ active users than the previous instance. If yes, it will generate its own preview ASAP without even pulling the cached preview; if not, it will show the grey cache version.
  3. On instances that only display the grey cache version: if a certain % of users click on the link/preview, the instance will also generate its own preview

¹ By "many more", I'm thinking of a scale. For example, an instance with 1 active user and an instance with 100 active users would be in "the same range of active users". An instance with 101 would be considered 'many more' and so would generate its own preview.

With this, we would have a pretty limited first series of requests, as mostly bigger instances will do the first request. Other instances will have a preview ASAP, as a grey preview that changes to an actual preview as soon as someone clicks on it. Smaller instances also won't go down from cache requests, as only same-sized and smaller instances will even pull the cache. We still face the problems of a distributed cache, but as the cache version is shown grey and will probably only last for a while, this shouldn't be a huge problem, especially because only smaller instances with fewer active users will see them.
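Reading the footnote as buckets of 100 active users, the comparison in step 2 might look like this sketch; the bucket size is my assumption, not part of the proposal:

BUCKET_SIZE = 100  # assumption: "same range" = same bucket of 100 users

def bucket(active_users: int) -> int:
    return (active_users - 1) // BUCKET_SIZE

def should_generate_own_preview(local_active: int, origin_active: int) -> bool:
    """Step 2 above: only an instance in a larger active-user bucket than
    the origin generates its own preview; others show the grey cache."""
    return bucket(local_active) > bucket(origin_active)

With these numbers, an instance with 100 active users shows the grey cache for a post from a 1-user instance, while an instance with 101 generates its own preview, matching the footnote.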

@Neustradamus

It would be nice to have a solution for this problem...

@duaneking
duaneking commented May 5, 2024

Based on the available data, in my humble opinion, the jitter delay is simply a defect that was added to try to game the system and slow it down, and it should be fully removed, as it distracts from the real issue: creating an effective caching solution and not re-rendering data that has not changed.

The core issue here seems to be that the generated data shouldn't be rendered more than once before being added to a cache with a long TTL, so that it doesn't need to be rendered again and requests for it can simply be served from that cache.

Any thoughts about a global cache that spans multiple TLDs or domains also need to consider using a strong HMAC and an algorithm table that can be independently validated by third parties, to protect against injection and enforce a chain of trust on the data content. That way, cached records/data blobs fetched from it can be validated as really coming from the true origin via the HMAC and shown to be what they claim to be, or marked as corrupted/edited/malicious. At that point, simply adding the original content's last-updated timestamp would let you create a distributed cache able to help the entire system scale (domain, resource URL, HMAC, byte size, timestamp), because your primary key would include the origin's version data, the HMAC data for its identity, the byte size, and the most recent timestamp. This would also make expiration of old records easier.

However, I don't believe that a distributed cache in this instance is really needed for the bulk of servers, and if it did exist, solving the initial issue of not generating the data multiple times on the local instance would allow that cache to be more effective long term because then it could simply consume those records, instead of forcing the server to recreate them.

The system shouldn't be hitting your database to get this data or generate that data blob more than once if it hasn't been edited anyway.

tl;dr: Wouldn't just adding an X-second TTL-based cache, then adding every new post to it so the fediverse gets the data from that cache, be enough? Ensure that cache has the resources it needs based on the number of new posts on your server, and when you generate links that get pumped out to the fediverse, direct requests to the cache.
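A sketch of such a record and its validation, using Python's standard hmac module; how the key is distributed and trusted is exactly the open chain-of-trust question and is not addressed here:

import hashlib
import hmac
import time

def make_cache_record(domain: str, url: str, blob: bytes, key: bytes) -> dict:
    """Build a cache entry keyed as suggested above:
    (domain, resource url, hmac, byte size, timestamp)."""
    return {
        "domain": domain,
        "url": url,
        "hmac": hmac.new(key, blob, hashlib.sha256).hexdigest(),
        "size": len(blob),
        "updated": int(time.time()),
        "blob": blob,
    }

def verify_cache_record(record: dict, key: bytes) -> bool:
    """Reject records whose blob does not match the origin's HMAC."""
    expected = hmac.new(key, record["blob"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["hmac"])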

@jefft0
jefft0 commented May 6, 2024

The AT protocol (Bluesky) uses IPFS to fetch content by hash from anywhere. In its foundation, IPFS is a distributed caching system.
https://atproto.com/specs/data-model

@p37307
p37307 commented May 6, 2024

I run a small server and use relays to fill the federated timeline. Most of those posts are probably never displayed, so only fetching previews for posts for the other timelines (home, trending, lists, local) and replies to those posts would greatly reduce the load caused by servers like mine. For the federated timeline, fetch previews for the last 200 messages or so when a user requests its display.

I'd go farther than that. Don't fetch a preview unless a post in the federated timeline that doesn't meet your criteria has been opened. I would also add fetching of preview cards for saved hashtags, although that should already be included in the home timeline fetch by default.

I'm fetching so many posts a day and will never see most of them on my self-hosted instance unless I do a search. I have come to rely on my instance being my first stop for searching, even over Google. Can't wait for the ability to do query-string searches so I can set it up in my browser as a default search provider. See Help Wanted: Use /search for search

@renchap renchap removed this from the 4.4.0 milestone Aug 14, 2024
@dpk
dpk commented Aug 26, 2024

After being pushed back version after version, this was quietly removed from the roadmap last week.

This is very disappointing, to say the least. The thundering herd behaviour is obviously wrong.

@ThisIsMissEm
Contributor

@dpk the event here is a bit misleading; all milestones were removed: https://github.com/mastodon/mastodon/milestones?state=open

@renchap
Sponsor Member
renchap commented Aug 27, 2024

I removed the milestones as they were not indicative of our real roadmap.

We did more research on this topic in the last few weeks, and, if possible with the time we have, will try to move forward on this topic for 4.4.
