10000 [HttpKernel] Be smarter when merging cache directives in HttpCache/ResponseCacheStrategy by nicolas-grekas · Pull Request #26352 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[HttpKernel] Be smarter when merging cache directives in HttpCache/ResponseCacheStrategy #26352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

nicolas-grekas
Copy link
Member
Q A
Branch? 3.4
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #26245
License MIT
Doc PR -

Allows a better default behavior now that we fixed session handling and thus made cache-control stricter.

@nicolas-grekas nicolas-grekas force-pushed the http-cache-merge-directives branch from fef1a4c to a243bca Compare March 1, 2018 12:59

if (!$this->cacheable) {
$response->headers->set('Cache-Control', 'no-cache, must-revalidate');
$response->headers->set('Cache-Control', implode(', ', array_keys(array_filter($this->cacheDirectives))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the max age allowing client-side caching ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's say we don't care, we don't have to be that smart?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, allowing private caching without telling the browser how long it can cache it is broken: it won't allow caching in well-behaving browsers (and may allow infinite caching if a crappy browser treats the absence of age on the response as an invitation to cache forever instead of for 0s).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client side caching can have 2 modes: expiration-based (caching for a timestamp) and validation-based (asking whether the last-modified or the etag they have is still the right one).
When using ESI or SSI (which this code is about), we don't send an Etag or LastModified (which this class is handling) because we cannot build them properly when the final response is built based on embedded responses. So this only leaves the expiration-based caching client-side. If you don't care about it, close your PR as it brings nothing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about to create unit tests to validate that case so we can implement the required checks.

we don't send an Etag or LastModified (which this class is handling) because we cannot build them properly when the final response is built based on embedded responses.

Actually we could do that by generating an ETag based on the complete response. Something like a hash of the whole content. But I think that's for another PR if we want to add that at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not helpful, as we could still not use Etag validation of the different sub-requests as we would not have Etags for each of them (unless we build the final Etag as a reversible combination of info about each response), and so we could not perform Etag validation in the controllers generating each subresponse.

@nicolas-grekas
Copy link
Member Author
nicolas-grekas commented Mar 1, 2018

Basically, @stof is right (as usual ;) ):

  • ETag-based validation cannot work, unless we store the Etags of all fragments somewhere next to the master request
  • expiration-based validation needs a duration, otherwise sending a "private" instead of "no-cache" is useless

I'm going to close here, as I don't have the time nor the use case to finish the PR.
@aschempp would you like to take over? @Toflar maybe?

@nicolas-grekas nicolas-grekas deleted the http-cache-merge-directives branch March 1, 2018 15:14
@Toflar
Copy link
Contributor
Toflar commented Mar 1, 2018

We will, because we need it. That's why I asked for the preferred way to do it in the original bug report. Nobody expected you're going to do the work 😄

@aschempp
Copy link 8000
Contributor
aschempp commented Mar 2, 2018

ETag-based validation cannot work, unless we store the Etags of all fragments somewhere next to the master request

My idea was to generate the ETag based on the generated content. It would not save the server from generating the whole response (which potentially is cached), but it would prevent the transmission of the data to the client. Not an ideal solution I agree.

@nicolas-grekas
Copy link
Member Author
nicolas-grekas commented Mar 2, 2018

@aschempp this wouldn't really help in practice, not sure we want to maintain a non-solution :)

fabpot added a commit that referenced this pull request Feb 25, 2019
…he/ResponseCacheStrategy (aschempp)

This PR was squashed before being merged into the 3.4 branch (closes #26532).

Discussion
----------

[HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #26245, #26352, #28872
| License       | MIT
| Doc PR        | -

This PR is a first draft to fix the incorrect merging of private and other cache-related headers that are not meant for the shared cache but the browser (see mentioned issues).

The existing implementation of `HttpFoundation\Response` is very much tailored to the `HttpCache`, for example `isCacheable` returns `false` if the response is `private`, which is not true for a browser cache. That is why my implementation does not longer use much of the response methods. They are however still used by the `HttpCache` and we should keep them as-is. FYI, the `ResponseCacheStrategy` does **not** affect the stored data of `HttpCache` but is only applied to the result of multiple merged subrequests/ESI responses.

I did read up a lot on RFC2616 as a reference. [Section 13.4](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4) gives an overall view of when a response MAY be cached. [Section 14.9.1](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1) has more insight into the `Cache-Control` directives.

Here's a summary of the relevant information I applied to the implementation:

 - > Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation.

    A response without cache control headers is totally fine, and it's up to the cache (shared or private) to decide what to do with it. That is why the implementation does not longer set `no-cache` if no `Cache-Control` headers are present.

 - > A response received with a status code of 200, 203, 206, 300, 301 or 410 MAY be stored […] unless a cache-control directive prohibits caching.

    > A response received with any other status code (e.g. status codes 302 and 307) MUST NOT be returned […] unless there are cache-control directives or another header(s) that explicitly allow it.

    This is what `ResponseCacheStrategy::isUncacheable` implements to decide whether a response is not cacheable at all. It differs from `Response::isCacheable` which only returns true if there are actual `Cache-Control` headers.

 - > [Section 13.2.3](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.3): When a response is generated from a cache entry, the cache MUST include a single Age header field in the response with a value equal to the cache entry's current_age.

    That's why the implementation **always** adds the `Age` header. It takes the oldest age of any of the responses as common denominator for the content.

 - > [Section 14.9.3](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3): If a response includes an s-maxage directive, then for a shared cache (but not for a private cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header.

    This effectively means that `max-age`, `s-maxage` and `Expires` must all be kept on the response. My implementation assumes that we can only do that if they exist in **all** of the responses, and then takes the lowest value of any of them. Be aware the implementation might look confusing at first. Due to the fact that the `Age` header might come from another subresponse than the lowest expiration value, the values are stored relative to the current response date and then re-calculated based on the age header.

The Symfony implementation did not and still does not implement the full RFC. As an example, some of the `Cache-Control` headers (like `private` and `no-cache`) MAY actually have a string value, but the implementation only supports boolean. Also, [Custom `Cache-Control` headers](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.6) are currently not merged into the final response.

**ToDo/Questions:**

 1. [Section 13.5.2](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.2) specifies that we must add a [`Warning 214 Transformation applied`](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.46) if we modify the response headers.

 2. Should we add an `Expires` headers based on `max-age` if none is explicitly set in the responses? This would essentially provide the same information as `max-age` but with support for HTTP/1.0 proxies/clients.

 3. I'm not sure about the implemented handling of the `private` directive. The directive is currently only added to the final response if it is present in all of the subresponses. This can effectively result in no cache-control directive, which does not tell a shared cache that the response must not be cached. However, adding a `private` might also tell a browser to actually cache it, even though non of the other responses asked for that.

 4. > [Section 14.9.2](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.2): The purpose of the `no-store` directive is to prevent the inadvertent release or retention of sensitive information […]. The `no-store` directive applies to the entire message, and MAY be sent either in a response or in a request. If sent in a request, a cache MUST NOT store any part of either this request or any response to it. If sent in a response, a cache MUST NOT store any part of either this response or the request that elicited it.

    I have not (yet) validated whether the `HttpCache` implementation respects any of this.

 5. As far as I understand, the current implementation of [`ResponseHeaderBag::computeCacheControlValue`](https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpFoundation/ResponseHeaderBag.php#L313) is incorrect. `no-cache` means a response [must not be cached by a shared or private cache](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1), which overrides `private` automatically.

 5. The unit tests are still very limited and I want to add plenty more to test and sort-of describe the implementation or assumptions on the RFC.

/cc @nicolas-grekas

#SymfonyConHackday2018

Commits
-------

893118f [HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0