docs(trustless-gateway): Add p2p usage section #520
Conversation
| Gateways serving data to non-LAN peers SHOULD support HTTPS and HTTP/2 or greater.
| Similarly, it is RECOMMENDED that clients restrict to HTTPS and HTTP/2 or greater.
Note: Technically #519 asks for "why is HTTP/2 recommended" rather than just documenting that it's the case. IIUC we generally don't specify rationale within the spec document, but if it makes sense we can.
As I was quoted in the above issue, my understanding of how we ended up with HTTP/2 as required within boxo (and recommended in general) is:
- HTTP/1.1 does not have multiplexing, which means that if clients want to send many download requests (e.g. for various resources, to handle parallel block requests, optimistic queries to avoid having to round-trip through the content routing system, etc.) this will be quite painful
- Browsers in particular have a limited number of concurrent requests they can make at a time with HTTP/1.1 to a given origin (6-8 IIRC). This means that supporting HTTP/1.1 would mean users would sometimes see reasonable performance but frequently get pretty bad performance
- The result is that it seemed better not to support HTTP/1.1 at all than to put users in the patchwork "but it works on my small test, oh well I guess ipfs is just slow" regime
Note: "Recommended" is present in the spec because sometimes the developer understands the caveats and decides to support HTTP/1.1 anyway. For example, within browsers there's no control over which HTTP versions are used, so the client will support them all anyway even if a server using HTTP/1.1 ultimately performs poorly. Similarly, in LAN environments getting a TLS certificate set up with which to use HTTPS may be painful, h2c may not be easily accessible across platforms / languages, and performance criteria are more controllable, which makes the downsides more manageable.
Thanks for adding this. Rationale sgtm, but it should be in the document imo. I've pushed b155d57, which expands the P2P usage section with detailed rationale for HTTP/2 and TLS requirements. Since this is in "Appendix: Notes for implementers", I think it is fine to provide extra context (as :::note) explaining WHY these are SHOULD requirements:
- HTTP/2: explain head-of-line blocking, connection overhead, and impact on DAG fetching with many block requests
- TLS: clarify privacy vs integrity (multihash provides integrity, TLS provides confidentiality and is required for HTTP/2 in browsers)
- Add security considerations for both gateway operators and clients
- Include practical defaults (30s timeout, 2MiB blocks)
- Reference RFC 9113 for HTTP/2 specifications
If not for humans, this "WHY" will be all the more important for LLMs that will generate client/server code from this spec.
@aschmahmann @BigLep thoughts?
Clarify caveats around utilizing trustless gateways within p2p networks
ab86146 to f7daba8
…nale
Expand the P2P usage section with detailed rationale for HTTP/2 and TLS requirements. Since this is in "Appendix: Notes for implementers", we provide extra context explaining why these are SHOULD requirements:
- HTTP/2: explain head-of-line blocking, connection overhead, and impact on DAG fetching with many block requests
- TLS: clarify privacy vs integrity (multihash provides integrity, TLS provides confidentiality and is required for HTTP/2 in browsers)
- Add security considerations for both gateway operators and clients
- Include practical defaults (30s timeout, 2MiB blocks)
- Reference RFC 9113 for HTTP/2 specifications
Addresses #519 request to document why HTTP/2 is recommended rather than just stating the requirement.
| Clients SHOULD NOT download unbounded amounts of data before being able to validate that data.
| Clients SHOULD limit the maximum block size to 2MiB. This value aligns with the maximum block size used in UnixFS chunking and provides a reasonable balance between transfer efficiency and resource constraints.
How about something like this?
| Clients SHOULD limit the maximum block size to 2MiB. This value aligns with the maximum block size used in UnixFS chunking and provides a reasonable balance between transfer efficiency and resource constraints.
| Clients SHOULD limit the maximum block size to 2MiB. This value aligns with the maximum block size used in Bitswap, and throughout much of the ecosystem.
UnixFS really has nothing to do with this; the 2MiB limit isn't in the UnixFS spec (implementations like kubo won't even let you get that high when creating a UnixFS DAG). This is just where much of the ecosystem has set its limits so that individual storage providers, pinning services, etc. don't ingest blocks they can't serve back to clients, because either the clients will reject the blocks as too big or the protocols won't support them. See the similar comment in the Bitswap spec
Lines 67 to 71 in 110bf46
| ## Block Sizes
| Bitswap implementations MUST support sending and receiving individual blocks of
| sizes less than or equal to 2MiB. Handling blocks larger than 2MiB is not recommended
| so as to keep compatibility with implementations which only support up to 2MiB.
If you want, we can use language closer to this.
Additionally, should the "This value aligns with the..." section move into the note below?
| :::note
| Blocks larger than 2MiB can cause memory pressure on resource-constrained clients and increase the window for incomplete transfers. Since blocks must be validated as a unit, smaller blocks allow for more granular verification and easier retries on failure.
As alluded to above, there are a whole host of issues beyond just these, some of which are bigger deals. Notably, working with larger blocks is in general, unfortunately, not yet ecosystem-safe. If you don't know what you're doing (i.e. you fall under the category of overriding the RECOMMENDED / SHOULD), you're likely to cause yourself problems by building tooling that doesn't work with much else in the ecosystem.
Also, the DoS risks and the latency / excessive-bandwidth-consumption tradeoffs generally become more difficult to manage.
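A hedged sketch of how a client might enforce this in practice: cap the raw-block response at 2MiB before buffering it, and only hand the bytes onward after re-hashing them against the requested CID. The use of go-cid, the helper name `fetchBlock`, and the exact 2MiB constant are illustrative choices based on this thread, not normative spec requirements.

```go
// Sketch only: bounded download followed by hash verification.
package p2pclient

import (
	"fmt"
	"io"
	"net/http"

	"github.com/ipfs/go-cid"
)

const maxBlockSize = 2 << 20 // 2 MiB, the limit discussed in this thread

// fetchBlock downloads a single raw block and refuses to buffer more than
// maxBlockSize bytes before hash verification.
func fetchBlock(client *http.Client, gatewayURL string, c cid.Cid) ([]byte, error) {
	resp, err := client.Get(fmt.Sprintf("%s/ipfs/%s?format=raw", gatewayURL, c))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Read at most maxBlockSize+1 bytes so an oversized block is detected
	// without downloading an unbounded amount of data.
	data, err := io.ReadAll(io.LimitReader(resp.Body, maxBlockSize+1))
	if err != nil {
		return nil, err
	}
	if len(data) > maxBlockSize {
		return nil, fmt.Errorf("block %s exceeds %d bytes", c, maxBlockSize)
	}

	// Blocks must be validated as a unit: re-hash with the CID's own prefix
	// and compare before handing the bytes to the caller.
	got, err := c.Prefix().Sum(data)
	if err != nil {
		return nil, err
	}
	if !got.Equals(c) {
		return nil, fmt.Errorf("block %s failed hash verification", c)
	}
	return data, nil
}
```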
| Trustless Gateways serve two primary deployment models:
| 1. **Verifiable bridges**: Gateways that provide trustless access from HTTP clients into IPFS networks, where the gateway operator is distinct from content providers
| 2. **P2P retrieval endpoints**: Gateways embedded within P2P networks where they serve as HTTP interfaces to peer-operated block stores
nit: I get we're stuck with this phrasing to some extent, but IMO if we can shy away from referring to implementations of the non-recursive trustless IPFS HTTP gateway API used in a p2p network as gateways, that'd be great. They're not really gateways into the network so much as part of it 😅.
| To work around this limitation, clients must open multiple parallel TCP connections to achieve concurrent requests. However, each additional connection incurs significant overhead: TCP handshake latency, memory buffers, bandwidth competition, and increased implementation complexity. Browsers limit concurrent connections per origin (typically 6-8) to manage these costs, but this limitation affects all HTTP/1.1 clients, not just browsers, as the overhead of maintaining many connections becomes prohibitive.
| When fetching a DAG that requires many block requests, HTTP/1.1's lack of multiplexing creates a critical bottleneck. Clients face a difficult trade-off: either serialize requests (severely limiting throughput) or maintain many parallel connections (incurring substantial overhead). Users may experience acceptable performance with small test cases, but real-world IPFS content with deep DAG structures will encounter significant slowdowns. HTTP/2's stream multiplexing (:cite[rfc9113]) eliminates this bottleneck by allowing many concurrent requests over a single connection without head-of-line blocking at the application layer.
This is true, but even if not relying on block requests (e.g. CARs with queries sufficient to describe what the client needs), there are other constraints within p2p networks that can cause issues here too (e.g. optimistic queries, which also eat up requests).
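Purely illustrative: once HTTP/2 multiplexing is available, a client can fan out many block (or optimistic) requests over a single shared connection with a simple in-flight limit, instead of juggling 6-8 HTTP/1.1 connections per origin. `fetchBlock` is the hypothetical helper from the previous sketch; the concurrency cap and result shape are assumptions, not spec requirements.

```go
// Sketch only: fan out block requests over one shared HTTP/2 client.
package p2pclient

import (
	"net/http"
	"sync"

	"github.com/ipfs/go-cid"
)

// fetchMany issues up to maxInFlight concurrent block requests. With HTTP/2
// these are multiplexed as streams on one connection instead of competing
// for a handful of HTTP/1.1 connections.
func fetchMany(client *http.Client, gatewayURL string, cids []cid.Cid, maxInFlight int) map[cid.Cid][]byte {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[cid.Cid][]byte, len(cids))
		sem     = make(chan struct{}, maxInFlight)
	)
	for _, c := range cids {
		wg.Add(1)
		sem <- struct{}{} // block when maxInFlight requests are outstanding
		go func(c cid.Cid) {
			defer wg.Done()
			defer func() { <-sem }()
			data, err := fetchBlock(client, gatewayURL, c)
			if err != nil {
				return // a real client would retry or fall back to another provider
			}
			mu.Lock()
			results[c] = data
			mu.Unlock()
		}(c)
	}
	wg.Wait()
	return results
}
```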
| Trustless Gateways operating in P2P contexts SHOULD NOT recursively search for content.
| In P2P networks, gateways typically serve as block stores for specific peers or content, rather than attempting to locate content across the entire network. Recursive content discovery is handled by the P2P layer (e.g., Amino DHT, IPFS routing), not by individual HTTP gateways.
Recursive content discovery
This isn't quite right. Recursion implies it happens again (e.g. a gateway that returns data itself doing data lookups, or a routing endpoint itself doing routing lookups); the Amino DHT, etc. don't do that. Additionally, as this is a longer justification, it should probably move into the note section as well.
Does this sentence need to exist, given that the note below seems to cover the same content anyway?
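A hedged server-side sketch of the non-recursive behavior described above: the handler answers only from its local block store and returns 404 on a miss, leaving content discovery to the P2P routing layer. The `BlockStore` interface and handler shape are assumptions for illustration, not an API defined by this spec.

```go
// Sketch only: a non-recursive raw block endpoint. BlockStore is a
// hypothetical local-store interface; real implementations (e.g. boxo) differ.
package p2pgateway

import (
	"net/http"
	"strings"
)

type BlockStore interface {
	// Get returns the raw block bytes for a CID string, or ok=false if the
	// block is not held locally.
	Get(cidStr string) (data []byte, ok bool)
}

func rawBlockHandler(store BlockStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		cidStr := strings.TrimPrefix(r.URL.Path, "/ipfs/")
		data, ok := store.Get(cidStr)
		if !ok {
			// Do NOT go looking for the block elsewhere: in a p2p deployment,
			// content discovery belongs to the routing layer, not this endpoint.
			http.Error(w, "block not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/vnd.ipld.raw")
		w.Write(data)
	}
}
```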
Clarify caveats around utilizing trustless gateways within p2p networks.
Closes #519
cc @lidel