8000 [Cache] Add `MarshallerInterface` allowing to change the serializer, providing a default one that automatically uses igbinary when available by nicolas-grekas · Pull Request #27645 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[Cache] Add MarshallerInterface allowing to change the serializer, providing a default one that automatically uses igbinary when available #27645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nicolas-grekas
Copy link
Member
@nicolas-grekas nicolas-grekas commented Jun 19, 2018
Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? yes
Tests pass? yes
Fixed tickets #19895
License MIT
Doc PR -

With this PR, when igbinary is available, it is automatically used to serialize values.
This provides faster and smaller cache payloads.
The unserializing logic is autoadaptative:

  • when an igbinary-serialized value is unserialized but the extension is missing, a cache miss is triggered
  • when a natively-serialized value is unserialized and the extension is available, the native unserialize() is used

Ping @palex-fpt since you provided very useful comments on the topic and might be interested in reviewing here also.

* Serializes a list of values using igbinary_serialize() if available, serialize() otherwise.
*/
protected static function serialize(array $values, ?array &$failed): array
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it placed into AbstractTrait? It didn't used not by AbstractTrait nor by AbstractCache. It would be surprise to override serialize method and got nothing. Only 3 of 7 AbstractCache subclasses requires serialization: FilesystemCache, PdoCache and RedisCache.

@@ -194,6 +194,10 @@ public static function createConnection($servers, array $options = array())
*/
protected function doSave(array $values, $lifetime)
{
if (!$values = parent::serialize($values, $failed)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcache API allows to store plain PHP vars. And its serialization method can be configured in connection strings. Do we need double serialization here?

return $value;
}
throw new \DomainException('Failed to unserialize cached value');

throw new \DomainException(error_get_last() ? error_get_last()['message'] : 'Failed to unserialize cached value');
} catch (\Error $e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need that complex logic here? Why simple call to unserialization function and result check would not be enough?

@@ -4,6 +4,7 @@ CHANGELOG
4.2.0
-----

* automatically use igbinary when available, with graceful error handling
Copy link
@palex-fpt 10000 palex-fpt Jun 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it is OK to have meaningful defaults for framework. But it is not OK to force settings for components.
[edit: see answer below]

@palex-fpt
Copy link

Opps. I'm late with review.

Just checked our issue board: there was tickets when we was forced to revert igbinary serialization for some components. Having ability to select serialization method without rebuilding operation environment is critical.

@nicolas-grekas
Copy link
Member Author

Having ability to select serialization method without rebuilding operation environment is critical

💯 that's exactly what is provided here: when data is read from the backend, we adaptatively unserialize it using igbinary or native-serialize depending on the payload. Of course, if igbinary is not loaded but the data had been serializing with it, we report a cache miss - there is no other way around.
This allows seamless migration from native-serialize to igbinary-serialize, with data preservation in this direction and without in the other direction. Do you see something else?

memcache API allows to store plain PHP vars. Do we need double serialization here?

That's actually required to provide the auto-adaptative behavior just described above. I'd recommend to not use igbinary at the extension level for this reason also. Maybe even deprecate having it enabled?

not OK to force settings for components.

What's the downside? If there is none, better keep this as an internal detail. If you have igbinary, there is no downside to using it, and since this extension is opt-in, you can always remove it.

@nicolas-grekas nicolas-grekas force-pushed the cache-igbinary branch 2 times, most recently from 36820be to 9a5c2ff Compare June 20, 2018 11:20
@@ -110,13 +110,13 @@ public function clear()
if ($cleared = $this->versioningIsEnabled) {
$namespaceVersion = 2;
try {
foreach ($this->doFetch(array('@'.$this->namespace)) as $v) {
foreach ($this->doFetch(array('/'.$this->namespace)) as $v) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this creates a separate set of keys when using 3.4 to 4.1 vs 4.2 & up. This is required for Memcached adapters because of the added serialization that was delegated to the extension before. Not doing so breaks loading values using 3.4 with a cache that has been populated with 4.2 (can happen during migrations.) Let's be safe. Since versioning is enabled only for Memcached by default, this is nicely selective.

@palex-fpt
Copy link

This allows seamless migration from native-serialize to igbinary-serialize, with data preservation in this direction and without in the other direction.

This fallback is required due fact that from application point 'nothing' changed. When I reconfigure application cache (change file to memcache, change directory or change serialization mode) I'm pretty OK with repopulating cache. And I prefer fast fail when igbinary is configured for application but is not available on deployment node, than slow running application with hard to detect reasons.

What's the downside? If there is none, better keep this as an internal detail. If you have igbinary, there is no downside to using it, and since this extension is opt-in, you can always remove it.

It is now igbinary is faster. It can be changed. When some other component requires igbinary you would not be able to disable it for cache. Moreover - it can not be easy tested cause enabling/disabling extension is hard.

Well, can we have selectable modes: adaptive, igbinary_only, php_only?

@nicolas-grekas
Copy link
Member Author
nicolas-grekas commented Jun 21, 2018

I prefer fast fail when igbinary is configured for application but is not available

No way: that explicitly conflicts with PSR-6 spirit and wordings (for good reasons IMHO.)

Well, can we have selectable modes?

Done with a new constructor argument: $allowIgbinarySerialize, set to true by default, and forcing native-serialize() when set to false.

@nicolas-grekas nicolas-grekas changed the title [Cache] automatically use igbinary when available, with graceful error handling [Cache] Add MarshallerInterface allowing to change the serializer, providing a default one that automatically uses igbinary when available Jun 22, 2018
@nicolas-grekas
Copy link
Member Author

Now with marshaller injection instead of boolean option.

@nicolas-grekas nicolas-grekas force-pushed the cache-igbinary branch 2 times, most recently from 99f0534 to fd8d65c Compare June 22, 2018 13:15
@Toflar
Copy link
Contributor
Toflar commented Jun 25, 2018

Just a general question: I haven't checked out the code in depth but it seems to me like the Marshaller could be used outside of the Cache component to dump stuff in a performant way, no? Wouldn't it make more sense to have that somewhere else?

nicolas-grekas added a commit that referenced this pull request Jun 25, 2018
…lon" controller service refs (nicolas-grekas)

This PR was merged into the 4.1 branch.

Discussion
----------

[TwigBundle] bump lowest deps to fix issue with "double-colon" controller service refs

| Q             | A
| ------------- | ---
| Branch?       | 4.1
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #27645
| License       | MIT
| Doc PR        | -

Bumping http-kernel to ~4.1 and conflicting with framework-bundle <4.1.
The rest is transitivity.

Commits
-------

f75723f [TwigBundle] bump lowest deps to fix issue with "double-colon" controller service refs
@nicolas-grekas
Copy link
Member Author

ping @symfony/deciders

*
* @author Nicolas Grekas <p@tchwork.com>
*/
class DefaultMarshaller implements MarshallerInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this has been asked, but why do we have a DefaultMarshaller with some internal logic in all methods to differentiate between PHP and IgBinary ... instead of defining a PhpMarshaller and IgBinaryMarshaller (moreover given that we define a MarshallerInterface).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there is only one implementation of MarshallerInterface: DefaultMarshaller, which auto-adaptatively uses serialize() or igbinary_serialize().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that looks to me like creating a DefaultMailer which makes if($use_smtp) { ... } elseif ($use_gmail) { ...} inside its methods. In other parts of the framework we define 1 interface and multiple different implementation classes. Here it looks like we have multiple implementations inside a single class. It looks confusing to me, sorry.

Copy link
Member Author
@nicolas-grekas nicolas-grekas Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javiereguiluz it's not the same. Here, by dealing with both serialization methods (and there are none others for PHP objects), we provide an important feature: auto-adaptative unserialization, providing safe forward and backward compatibility at the data level. This is not possible without having something that knows about both formats.

Copy link
Contributor
@teohhanhui teohhanhui Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IgBinaryMarshaller could take PhpMarshaller as a fallback marshaller (which is optional), no?

Copy link
Contributor
@teohhanhui teohhanhui Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the MarshallerInterface then if there are no other reasonable implementations?

Copy link
Member Author
@nicolas-grekas nicolas-grekas Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to the interface, the serialized string itself can still be processed, e.g compressed and/or encrypted.
One could also build a serializer that would be dedicated to and optimized for very specific data structures (eg only numbers, etc.), etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... But actually it's not true that igbinary is the only serialization format that might benefit from a fallback to PHP serialization. There are / were other contenders like msgpack (MessagePack) and Protobuf, both of which have PHP extensions.

Copy link
Member Author
@nicolas-grekas nicolas-grekas Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, another reason to have the interface. Let's keep the topic: should we split the default marshaller in two. I think we would do a misfeature by doing so: things would be more complex to understand, more complex to wire, less capable, and less performant while in a critical code path. Would the individual envisioned marshallers empower our users in comparison to these downsides? Not at all in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, and those other serializer formats are not PHP-specific, so it wouldn't make sense for them to fallback to PHP serialization, actually. 😆

@joelwurtz
Copy link
Contributor

What will happen if someone install the igbinary extension on their server but already have a lot of cache ?

@nicolas-grekas
Copy link
Member Author
nicolas-grekas commented Jun 28, 2018

@joelwurtz see test cases: native unserialize() will be used.

{
private $allowIgbinarySerialize = true;

public function __construct(bool $allowIgbinarySerialize = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way to force only igbinary (disable the PHP serialization fallback)?

Copy link
Contributor
@teohhanhui teohhanhui Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palex-fpt said:

I prefer fast fail when igbinary is configured for application but is not available on deployment node, than slow running application with hard to detect reasons.

To which @nicolas-grekas you replied:

No way: that explicitly conflicts with PSR-6 spirit and wordings (for good reasons IMHO.)

But if the fail-fast behavior means a cache miss, that's not violating PSR-6 in any way. (Bad idea, as it'll result in exactly "slow running application with hard to detect reasons".) It could fail-fast by throwing an exception in the constructor when igbinary is forced but the extension is not available. That would not violate PSR-6.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Not convinced yet. Really worth it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I prefer fail-fast feature over auto-adaptive feature. Reconfigured application cache can and should be warmed up. But having spoiled node that writes to shared cache in php format and forces all other properly configured nodes to use slow method is not good. If someone concerning about preserving old cache data he can use workaround with distinct namespaces and chain cache instances with igbinary an php-serialize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are: the default mode is auto-adaptative, but if you pass true, you'll get an exception when igbinary is not loaded.

@nicolas-grekas nicolas-grekas force-pushed the cache-igbinary branch 2 times, most recently from 73ea8dd to 3a7517d Compare June 29, 2018 11:19
@nicolas-grekas
Copy link
Member Author

Any other comment here?
Time to vote @symfony/deciders :)

@nicolas-grekas nicolas-grekas force-pushed the cache-igbinary branch 2 times, most recently from 3cff315 to 9c328c4 Compare July 8, 2018 08:55
…providing a default one that automatically uses igbinary when available
@fabpot
Copy link
Member
fabpot commented Jul 9, 2018

Thank you @nicolas-grekas.

fabpot added a commit that referenced this pull request Jul 9, 2018
…he serializer, providing a default one that automatically uses igbinary when available (nicolas-grekas)

This PR was merged into the 4.2-dev branch.

Discussion
----------

[Cache] Add `MarshallerInterface` allowing to change the serializer, providing a default one that automatically uses igbinary when available

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #19895
| License       | MIT
| Doc PR        | -

With this PR, when igbinary is available, it is automatically used to serialize values.
This provides faster and smaller cache payloads.
The unserializing logic is autoadaptative:
- when an igbinary-serialized value is unserialized but the extension is missing, a cache miss is triggered
- when a natively-serialized value is unserialized and the extension is available, the native `unserialize()` is used

Ping @palex-fpt since you provided very useful comments on the topic and might be interested in reviewing here also.

Commits
-------

9c328c4 [Cache] Add `MarshallerInterface` allowing to change the serializer, providing a default one that automatically uses igbinary when available
@nicolas-grekas nicolas-grekas deleted the cache-igbinary branch July 9, 2018 15:49
@Koc
Copy link
Contributor
Koc commented Jul 15, 2018

Is it possible decoration for serialization? Create MarshallerCacheItemPool which will decorate any other pool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0