docs(trustless-gateway): Add p2p usage section by aschmahmann · Pull Request #520 · ipfs/specs · GitHub

Conversation

@aschmahmann
Contributor

Clarify caveats around utilizing trustless gateways within p2p networks.

Closes #519

cc @lidel

@aschmahmann aschmahmann requested a review from lidel as a code owner November 11, 2025 19:55
@github-actions
github-actions bot commented Nov 11, 2025

🚀 Build Preview on IPFS ready

Comment on lines 556 to 557
Gateways serving data to non-LAN peers SHOULD support HTTPS and HTTP/2 or greater.
Similarly, it is RECOMMENDED that clients restrict to HTTPS and HTTP/2 or greater.
Contributor Author

Note: Technically #519 asks for "why is http/2 recommended" rather than just documenting that it's the case. IIUC generally we don't specify rationale within the spec document, but if it makes sense we can.

As I was quoted in the above issue, my understanding of how we ended up with HTTP/2 as required within boxo (and recommended in general) is:

  • HTTP/1.1 does not support multiplexing, which means that if clients want to send many download requests (e.g. for various resources, to handle parallel block requests, optimistic queries to avoid having to round-trip through the content routing system, etc.) this will be quite painful
  • Browsers in particular have a limited number of concurrent HTTP/1.1 requests they can make at a time to a given origin (6-8, IIRC). This means that supporting HTTP/1.1 would mean users would sometimes see reasonable performance but frequently get pretty bad performance
  • The result is it seemed better to not support HTTP/1.1 at all than to put users in the patchwork "but it works on my small test, oh well I guess IPFS is just slow" regime

Note: "Recommended" is present in the spec because sometimes a developer understands the caveats and decides to support HTTP/1.1 anyway. For example, within browsers there's no control over which HTTP versions are used, so the client will support them all even if a server using HTTP/1.1 ultimately performs poorly. Similarly, in LAN environments getting a TLS certificate set up for HTTPS may be painful, h2c may not be easily accessible across platforms / languages, and performance criteria are more controllable, which makes the downsides more manageable.

Member

Thanks for adding this. Rationale sgtm, but should be in the document imo: I've pushed b155d57 which expands the P2P usage section with detailed rationale for HTTP/2 and TLS requirements. Since this is in "Appendix: Notes for implementers", I think it is fine to provide extra context (as :::note) explaining WHY these are SHOULD requirements:

  • HTTP/2: explain head-of-line blocking, connection overhead, and impact on DAG fetching with many block requests
  • TLS: clarify privacy vs integrity (multihash provides integrity, TLS provides confidentiality and is required for HTTP/2 in browsers)
  • Add security considerations for both gateway operators and clients
  • Include practical defaults (30s timeout, 2MiB blocks)
  • Reference RFC 9113 for HTTP/2 specifications

If not for humans, this "WHY" will be even more important for LLMs that will generate client/server code for this spec.

@aschmahmann @BigLep thoughts?

Clarify caveats around utilizing trustless gateways within p2p networks
@aschmahmann aschmahmann force-pushed the feat/trustless-gateway-p2p-notes branch from ab86146 to f7daba8 Compare November 11, 2025 20:10
…nale

Expand the P2P usage section with detailed rationale for HTTP/2 and TLS
requirements. Since this is in "Appendix: Notes for implementers", we
provide extra context explaining why these are SHOULD requirements:

- HTTP/2: explain head-of-line blocking, connection overhead, and impact
  on DAG fetching with many block requests
- TLS: clarify privacy vs integrity (multihash provides integrity, TLS
  provides confidentiality and is required for HTTP/2 in browsers)
- Add security considerations for both gateway operators and clients
- Include practical defaults (30s timeout, 2MiB blocks)
- Reference RFC 9113 for HTTP/2 specifications

Addresses #519 request to document why HTTP/2 is recommended rather than
just stating the requirement.

Clients SHOULD NOT download unbounded amounts of data before being able to validate that data.

Clients SHOULD limit the maximum block size to 2MiB. This value aligns with the maximum block size used in UnixFS chunking and provides a reasonable balance between transfer efficiency and resource constraints.
Contributor Author

How about something like this?

Suggested change
Clients SHOULD limit the maximum block size to 2MiB. This value aligns with the maximum block size used in UnixFS chunking and provides a reasonable balance between transfer efficiency and resource constraints.
Clients SHOULD limit the maximum block size to 2MiB. This value aligns with the maximum block size used in Bitswap, and throughout much of the ecosystem.

UnixFS really has nothing to do with this; the 2MiB limit isn't in the UnixFS spec (implementations like kubo won't even let you get that high when creating a UnixFS DAG). This is just where much of the ecosystem has set its limits so that individual storage providers, pinning services, etc. don't ingest blocks they can't serve back to clients, because either the clients will reject the blocks as too big or the protocols won't support them. See the similar comment in the Bitswap spec:

## Block Sizes
Bitswap implementations MUST support sending and receiving individual blocks of
sizes less than or equal to 2MiB. Handling blocks larger than 2MiB is not recommended
so as to keep compatibility with implementations which only support up to 2MiB.

If you want we can use language closer to this

Additionally, should the "This value aligns with the..." section move into the note below?


:::note

Blocks larger than 2MiB can cause memory pressure on resource-constrained clients and increase the window for incomplete transfers. Since blocks must be validated as a unit, smaller blocks allow for more granular verification and easier retries on failure.
Contributor Author

As alluded to above, there are a whole host of issues beyond just these, some of which are bigger deals. Notably, working with larger blocks is in general, unfortunately, not yet ecosystem-safe. If you don't know what you're doing (i.e. you fall under the category of overriding the RECOMMENDED / SHOULD), you're likely to cause yourself problems by building tooling that doesn't work with much else in the ecosystem.

Also, the DoS risks and the latency / excessive-bandwidth-consumption tradeoffs generally become more difficult.

Trustless Gateways serve two primary deployment models:

1. **Verifiable bridges**: Gateways that provide trustless access from HTTP clients into IPFS networks, where the gateway operator is distinct from content providers
2. **P2P retrieval endpoints**: Gateways embedded within P2P networks where they serve as HTTP interfaces to peer-operated block stores
Contributor Author

nit: I get we're stuck with this phrasing to some extent, but IMO if we can shy away from referring to implementations of the non-recursive trustless IPFS HTTP gateway API used in a p2p network as gateways that'd be great. They're not really gateways into the network as much as part of it 😅.


To work around this limitation, clients must open multiple parallel TCP connections to achieve concurrent requests. However, each additional connection incurs significant overhead: TCP handshake latency, memory buffers, bandwidth competition, and increased implementation complexity. Browsers limit concurrent connections per origin (typically 6-8) to manage these costs, but this limitation affects all HTTP/1.1 clients, not just browsers, as the overhead of maintaining many connections becomes prohibitive.

When fetching a DAG that requires many block requests, HTTP/1.1's lack of multiplexing creates a critical bottleneck. Clients face a difficult trade-off: either serialize requests (severely limiting throughput) or maintain many parallel connections (incurring substantial overhead). Users may experience acceptable performance with small test cases, but real-world IPFS content with deep DAG structures will encounter significant slowdowns. HTTP/2's stream multiplexing (:cite[rfc9113]) eliminates this bottleneck by allowing many concurrent requests over a single connection without head-of-line blocking at the application layer.
Contributor Author

This is true, but even if not relying on block requests (e.g. CARs with queries sufficient to describe what the client needs) there are other constraints within p2p networks that can cause issues here too (e.g. optimistic queries which also eat up requests).


Trustless Gateways operating in P2P contexts SHOULD NOT recursively search for content.

In P2P networks, gateways typically serve as block stores for specific peers or content, rather than attempting to locate content across the entire network. Recursive content discovery is handled by the P2P layer (e.g., Amino DHT, IPFS routing), not by individual HTTP gateways.
Contributor Author

Recursive content discovery

This isn't quite right. Recursion implies it happens again (e.g. a gateway that returns data itself doing data lookups, or a routing endpoint itself doing routing lookups); the Amino DHT, etc. don't do that. Additionally, as this is a somewhat longer justification, it should probably move into the note section as well.

Does this sentence need to exist given the note below seems to cover the same content anyway?



Development

Successfully merging this pull request may close these issues.

Gateway spec: add a "p2p" section

3 participants
