8000 [Serializer] Add support for auto generated custom normalizers by Nyholm · Pull Request #52905 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[Serializer] Add support for auto generated custom normalizers #52905

New issue < 8000 /summary>

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: 7.3
Choose a base branch
from

Conversation

Nyholm
Copy link
Member
@Nyholm Nyholm commented Dec 5, 2023
Q A
Branch? 7.3
Bug fix? no
New feature? yes
Deprecations? no
Issues
License MIT

This is an opt-in performance boost for the serializer component. The trick is to move work from runtime to build-time by generating custom normalizers.


The Serializer is slow because every time we serialize or deserialize an object we are trying to figure out what this object really is. By marking the classes we are interested in serializeing, we can do all this "figure out" at build time. This saves LOADs of time.

The more complex the value objects are, the more time we will save. On a simple "hello word" object, we reduce the time with about 80%.

This feature is not complete. We do not yet support the Serializer attributes like Ignore and SerializerName. But since this is opt-in AND experimental I think it is safe to review, try and maybe even merge this PR. Because many large PRs on this component have stalled the passed few years.

Usage

# config/package/serializer.yaml
framework:
    serializer:
        auto_normalizer:
            paths:
                App\Model: 'src/Model'
                App\Entity: 'src/Entity'
App\Model value objects
# src/Model/Foo/Blog.php

namespace App\Model\Foo;

use Symfony\Component\Serializer\Attribute\Serializable;

#[Serializable]
class Blog
{
    public string $name;
}
# src/Model/User.php

namespace App\Model;

use App\Model\Foo\Blog;
use Symfony\Component\Serializer\Attribute\Serializable;

#[Serializable]
class User
{
    public string $name;
    public Blog $blog;
}
#[AsCommand('app:debug')]
class DebugCommand extends Command
{
    public function __construct(
        private SerializerInterface $serializer
    ){
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $blog = new Blog();
        $blog->name = 'My Blog';

        $user = new User();
        $user->name = 'Tobias';
        $user->blog = $blog;

        $start = microtime(true);
        for($i = 0; $i < 200000; $i++) {
            $string = $this->serializer->serialize($user, 'json');
        }
        $end = microtime(true);
        echo 'Time: '.round(($end - $start) * 1000).' ms'.\PHP_EOL;
        $output->writeln($string);

        return Command::SUCCESS;
    }
}

Output:

➜  example-app git:(7.1) ✗ sf app:debug                                 
Time: 2557 ms
{"name":"Tobias","blog":{"name":"My Blog"}}

➜  example-app git:(better-serializer) ✗ sf app:debug
Time: 310 ms
{"name":"Tobias","blog":{"name":"My Blog"}}

Generated classes

The tests have a lot of example classes and expected outputs. See https://github.com/Nyholm/symfony/tree/better-serializer/src/Symfony/Component/Serializer/Tests/Fixtures/CustomNormalizer

See generated classes for PR example
namespace Symfony\Serializer\Normalizer;

use App\Model\Foo\Blog;
use Symfony\Component\Serializer\Normalizer\NormalizerInterface;
use Symfony\Component\Serializer\Normalizer\DenormalizerInterface;

class App_Model_Foo_Blog implements NormalizerInterface, DenormalizerInterface
{
    public function getSupportedTypes(?string $format): array
    {
        return [Blog::class => true];
    }

    public function supportsNormalization(mixed $data, ?string $format = NULL, array $context = []): bool
    {
        return $data instanceof Blog;
    }

    public function supportsDenormalization(mixed $data, string $type, ?string $format = NULL, array $context = []): bool
    {
        return $type === Blog::class;
    }

    /**
     * @param Blog $object
     */
    public function normalize(mixed $object, ?string $format = NULL, array $context = []): array|string|int|float|bool|\ArrayObject|null
    {
        return [
            'name' => $object->name,
        ];
    }

    public function denormalize(mixed $data, string $type, ?string $format = NULL, array $context = []): mixed
    {
        
        $output = new Blog();
        if (array_key_exists('name', $data)) {
            $output->name = $data['name'];
        }
        
        return $output;
    }

}
namespace Symfony\Serializer\Normalizer;

use App\Model\User;
use Symfony\Component\Serializer\Normalizer\NormalizerInterface;
use Symfony\Component\Serializer\Normalizer\DenormalizerInterface;
use Symfony\Component\Serializer\Normalizer\NormalizerAwareInterface;
use Symfony\Component\Serializer\Exception\DenormalizingUnionFailedException;
use Symfony\Component\Serializer\Normalizer\DenormalizerAwareInterface;

class App_Model_User implements NormalizerInterface, DenormalizerInterface, NormalizerAwareInterface, DenormalizerAwareInterface
{
    private null|NormalizerInterface $normalizer = NULL;
    private null|DenormalizerInterface $denormalizer = NULL;

    public function getSupportedTypes(?string $format): array
    {
        return [User::class => true];
    }

    public function supportsNormalization(mixed $data, ?string $format = NULL, array $context = []): bool
    {
        return $data instanceof User;
    }

    public function supportsDenormalization(mixed $data, string $type, ?string $format = NULL, array $context = []): bool
    {
        return $type === User::class;
    }

    /**
     * @param User $object
     */
    public function normalize(mixed $object, ?string $format = NULL, array $context = []): array|string|int|float|bool|\ArrayObject|null
    {
        return [
            'name' => $object->name,
            'blog' => $this->normalizeChild($object->blog, $format, $context, false),
        ];
    }

    public function setNormalizer(NormalizerInterface $normalizer): void
    {
        $this->normalizer = $normalizer;
    }

    private function normalizeChild(mixed $object, ?string $format, array $context, bool $canBeIterable): mixed
    {
        if (is_scalar($object) || null === $object) {
            return $object;
        }
        
        if ($canBeIterable === true && is_iterable($object)) {
            return array_map(fn($item) => $this->normalizeChild($item, $format, $context, true), $object);
        }
        
        return $this->normalizer->normalize($object, $format, $context);
        
    }

    public function denormalize(mixed $data, string $type, ?string $format = NULL, array $context = []): mixed
    {
        
        $output = new User();
        if (array_key_exists('name', $data)) {
            $output->name = $data['name'];
        }
        if (array_key_exists('blog', $data)) {
            $setter0 = $this->denormalizeChild($data['blog'], \App\Model\Foo\Blog::class, $format, $context, false);
            $output->blog = $setter0;
        }
        
        return $output;
    }

    public function setDenormalizer(DenormalizerInterface $denormalizer): void
    {
        $this->denormalizer = $denormalizer;
    }

    private function denormalizeChild(mixed $data, string $type, ?string $format, array $context, bool $canBeIterable): mixed
    {
        if (is_scalar($data) || null === $data) {
            return $data;
        }
        
        if ($canBeIterable === true && is_iterable($data)) {
            return array_map(fn($item) => $this->denormalizeChild($item, $type, $format, $context, true), $data);
        }
        
        return $this->denormalizer->denormalize($data, $type, $format, $context);
        
    }

}

@yceruto
Copy link
Member
yceruto commented Dec 5, 2023

Thanks Tobias for this proposal! It's nice to see some performance improvements for this component.

I did just a quick check of the code and I wonder if the code generator building blocks can be a separate component or consider https://github.com/nette/php-generator. It looks decoupled from this feature.

@yceruto
Copy link
Member
yceruto commented Dec 5, 2023

Another alternative for PHP code generation nikic/php-parser https://github.com/nikic/PHP-Parser/blob/HEAD/doc/component/AST_builders.markdown we already use this library in the Translation component.

@Nyholm
Copy link
Member Author
Nyholm commented Dec 5, 2023

That are some good points.
Im not in love with Nette because of its ugly generated PHP code, awkward API, it is opinionated and lack of extensibility. This is of course 100% my opinion and it is from a few years back.

Nikic is way too complex and verbose.

I also looked into re-use the code generator from Config component. But that one is also very opinionated and not generic. I've also heard @mtarld is using a code generator for upcoming PR... I really think a group of contributors should look over Symfony's use of code generators, but I dont think that is a blocker for this PR. (The CodeGenerator here is marked as internal)

@RobertMe
Copy link
Contributor
RobertMe commented Dec 6, 2023

Just a quick glance over the description and example, and I noticed this in the generated code:

        if (array_key_exists('name', $data)) {
            $output->name = $data['name'];
        }

To my understanding isset() checks have always been (way) faster than array_key_exists(). Obviously there is the difference in null handling, but over the years I've seen quite a lot of code where this is optimized into isset(...) || array_key_exists(...) where normally the (much) quicker isset() check is enough. Might this be an optimization which could be applied here as well?

Great idea though, and looking forward for this 😄

@mtarld
Copy link
Contributor
mtarld commented Dec 6, 2023

That are some good points. Im not in love with Nette because of its ugly generated PHP code, awkward API, it is opinionated and lack of extensibility. This is of course 100% my opinion and it is from a few years back.

Nikic is way too complex and verbose.

I also looked into re-use the code generator from Config component. But that one is also very opinionated and not generic. I've also heard @mtarld is using a code generator for upcoming PR... I really think a group of contributors should look over Symfony's use of code generators, but I dont think that is a blocker for this PR. (The CodeGenerator here is marked as internal)

I finally used nikic/php-parser in #51718, and I think that's the way to go.
IMHO we can rely on such a stable library. While it is indeed verbose, it'll lighten the codebase a lot, allowing to get rid of builders.
Plus, having an AST instead of a string feels more secure to me, as it avoids typos and can be compatible to any PHP version.

@staabm
Copy link
Contributor
staabm commented Dec 13, 2023

To my understanding isset() checks have always been (way) faster than array_key_exists(). Obviously there is the difference in null handling, but over the years I've seen quite a lot of code where this is optimized into isset(...) || array_key_exists(...) where normally the (much) quicker isset() check is enough. Might this be an optimization which could be applied here as well?

Thats only correct for older PHP versions. In recent versions the opposite is true.

@RobertMe
Copy link
Contributor

Thats only correct for older PHP versions. In recent versions the opposite is true.

You'll learn something new every day 😄. Ran a quick PHPBench (for loop with 10M iterations doing isset(...) || array_key_exists(...) vs array_key_exists(...) and split over "key exists" and "key doesn't exist", so 4 benches in total). And array_key_exists with a non existing key is even quickest of all. Then comes isset || array_key_exists with an existing key quickly followed by array_key_exists with an existing key, while isset || array_key_exists for a non existing key is over 50% slower compared to all of the others.

@fabpot
Copy link
Member
fabpot commented Dec 13, 2023

Using nikic/php-parser is the way to go IMHO, which is what the maker bundle uses IIRC.

< 8000 /svg>

@Korbeil
Copy link
Contributor
Korbeil commented Dec 13, 2023

Hey, we spoke a bit together during SymfonyCon hackday. I am working on JanePHP components & AutoMapper. We do a lot of code generation and I love seeing such feature coming to Symfony Serializer ! I didn't saw your PR before some tweet I saw today that's why I didn't understand well all you were talking with Mathias.
I'll also recommend using nikic/php-parser since it allows you to validate part of the input and will make this code safer ! Also you're not obligated to work only with AST, the library comes with a builder that makes some tasks easier, you can see some examples there: https://github.com/Korbeil/jane-v8/blob/main/src/Component/JsonSchemaGenerator/Generator/ModelGenerator.php#L56-L66 (this is part of the new JanePHP version that I'm working on 😉).

In your implementation I would like to see some sort of hash corresponding to the model your generated the normalizer from, the idea is to know that you generated the normalizer with a specific version of that model and if the model changes, you will generate a new version of it. You can find this logic here if you want an example: https://github.com/jolicode/automapper/blob/9c43e0d3710f202097d30656667041c3cbf8382c/src/MapperMetadata.php#L138-L155.

I'm not sure I am a fan of having the model that holds information about what is going to be generated having also the role to generate stuff, but it fits well in your pull request. Also, do you plan on replacing stuff like ArrayDenormalizer with this PR or a later PR ? I would love being able to have all the normalizer code being generated and not only some part.

@dbu
Copy link
Contributor
dbu commented Dec 13, 2023

this has crazy performance impacts.

me and some collegues wrote something like this a while ago for JMS serializer. if you need more inspiration: https://github.com/liip/serializer

we generate code to convert between object and array and back. we did not use any code generation tool but twig :-D (but using a code generator is probably the better way to do this). for our complex models, we generate 50k or more lines of PHP that is all dead simple, and thanks to opcache it is quite ok to have such large code files.

depending on event system during the on-the-fly conversion, our generated code will not do the same. it should be possible to trigger the events from the generated code, but we did not use events so we did not make the effort.

Copy link
Contributor
@soyuka soyuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice it reminds me of the work by @GuilhemN #22051. I'm definitely in favor of nikic/php-parser for code generation.

@@ -17,8 +17,6 @@
* Infers the constructor argument type.
*
* @author Dmitrii Poddubnyi <dpoddubny@gmail.com>
*
* @internal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan, you did this because it's used in another component? Just ignore the error for internal use between components it's fine no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is because I used it in another component.

Do you know the reason why it is internal in the first place?

@joelwurtz
Copy link
Contributor

Really nice, would love to see that in symfony, i think it's simple enough for a first implementation and other features can be added in subsequent PR.

More optimizations can be done also to increase performance (like directly inject the child normalizer when needed, to have done a similar thing having this direct injection normally provide between 10 to 20% performance in some cases, like a property that is an array of objects) but all of this can be added further.

Also ATM there is now 3 different pull request with a similar goal with the Mapper component and the JsonEncoder component, i don't know which way the symfony team want to go but IMO there is a path were all of this PR can use the same implementation and code base which will drasticly reduce code maintenance and burden to the core team by doing something like :

  • create an internal implementation to generate a mapper that read data from an array / object and write it to an array / object, if the implementation use nikic/php-parser it's really easy to change the way the property is read / write on the fly during generation

With this implementation you will be able to achieve the same thing as those 3 PR :

  • Normalization Generation -> object to array
  • Denormalization Generation -> array to object
  • Mapper Generation -> object to object

For the JsonEncode PR this may require some more works to be able to use only specific part of the implementation but i think this is totally doable

Having this will also allow each of this component to share new features (supporting groups, excluding properties, renaming properties, circular loop, etc ...)

@dbu
Copy link
Contributor
dbu commented Dec 14, 2023
* create an internal implementation to generate a mapper that read data from an array / object and write it to an array / object, if the implementation use nikic/php-parser it's really easy to change the way the property is read / write on the fly during generation

this is also the approach we took for our liip serializer-generator: https://github.com/liip/metadata-parser

because we wanted to build a drop-in replacement for JMS, we specifically support JMS annotations, but such a metadata parser could be really generic and provide metadata from the code or any kind of attributes/annotations/metadata mapping files in yaml/xml like doctrine does etc.

@Nyholm Nyholm force-pushed the A3E2 better-serializer branch from 753f197 to d10be79 Compare January 15, 2024 02:08
Copy link
Member Author
@Nyholm Nyholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you plan on replacing stuff like ArrayDenormalizer with this PR or a later PR ?

That is a later PR =)

me and some collegues wrote something like this a while ago for JMS serializer. if you need more inspiration: https://github.com/liip/serializer

Yeah. I think I've seen a conference talk about that.


PR is updated. We now use nikic/PHP-Parser

@@ -17,8 +17,6 @@
* Infers the constructor argument type.
*
* @author Dmitrii Poddubnyi <dpoddubny@gmail.com>
*
* @internal
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is because I used it in another component.

Do you know the reason why it is internal in the first place?

@Nyholm
Copy link
Member Author
Nyholm commented Jan 15, 2024

Tests are green.
The Psalm error is a false negative.

@xabbuh xabbuh modified the milestones: 7.1, 7.2 May 15, 2024
@andreybolonin
Copy link
Contributor
andreybolonin commented Jun 4, 2024

Any chances to merge into 7.2?

@Korbeil
Copy link
Contributor
Korbeil commented Jun 4, 2024

Any chances to merge into 7.2?

I do think it will overlap with coming ObjectMapper (see #51741). First implementation will use Reflection but it's planned (see #54476) to make a second implementation in order to generated mappers that will act as normalizers.

@tsantos84
Copy link
Contributor

Hi @Nyholm, I've tried to develop my own serializer component based on generated code, exactly the same way you're proposing to Symfony Serializer component. In that moment (5 years back), I made a lot of performance tests with the generated code and without any surprise, the benchmark showed that this is the way to go. If I can contribute with your PR, I'd say to take a look on the code generator class to see how I generated the normalizer code. Each decision on the generated code was implemented with the light of Blackfire to minimize any bottleneck. You'll notice that to increase even more the performance, some Symfony Serializer logic should be refactored to allow them to be executed by the generated code, like ignored attributes, normalized name, class inheritance and etc. All of this feature has an overhead when used through reflection at runtime and can be improved on the generated code.

This tool helped me a lot when I wrote the component.

@dunglas
Copy link
Member
dunglas commented Jun 17, 2024

How would this interact with #51718?

@Nyholm
Copy link
Member Author
Nyholm commented Jun 17, 2024

How would this interact with #51718?

#51718 is for specific (but very common) scenarios. Ie DTOs to/form JSON (and maybe XML). It also skips the normalizer step.

This PR is for the generic use case, to/form all kind of data and does not change the architecture of the serialization process. However, it does not give as impressive results (but almost). Me and @mtarld believes they can/should co-exists.

@Korbeil
Copy link
Contributor
Korbeil commented Jun 19, 2024

I made a quick schema to summarize how each PR interact with the Serializer concepts:

image

ping #51741
ping #51718

@fabpot fabpot modified the milestones: 7.2, 7.3 Nov 20, 2024
@joelwurtz
Copy link
Contributor

ObjectMapper was merged recently and @soyuka tell me that it would be nice to have generated code to improve performance of such mapper.

I do believe it's possible to have the same implementation that does :

  • Normalizer
  • Denormalizer
  • ObjectMapper
  • (Maybe JsonEncoder ?)

As the only difference in this is the way to write / read data

The main problem is : where to put this implementation, as other components will have to require this implementation and this is something that should be fully internal (no public extensions points, they are provided by other components as their are specific to the wanted behavior)

I'm not sure what your status are on this actually @Nyholm ?

@soyuka
Copy link
Contributor
soyuka commented Mar 30, 2025

The main problem is : where to put this implementation, as other components will have to require this implementation and this is something that should be fully internal (no public extensions points, they are provided by other components as their are specific to the wanted behavior)

I see a few ways:

  • create a new "internal" component that's used to generate code (could re-use some of the JsonStream component for example)
  • create a public component (just like [AstGenerator] [WIP] New Component, add normalizer generator #17516)
  • each component does its own thing and some code will be duplicated (actually we need to see if there's really something reusable across components)

Anyways this probably needs an RFC issue to prevent polluting this topic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0