8000 [Translation] Speed up TranslationDebugCommand by enumag · Pull Request #49585 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[Translation] Speed up TranslationDebugCommand #49585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

enumag
Copy link
Contributor
@enumag enumag commented Mar 2, 2023
Q A
Branch? 6.3
Bug fix? yes
New feature? no
Deprecations? no
Tickets
License MIT
Doc PR

The TranslationDebugCommand is currently very very slow. I was digging in Symfony codebase and it turns out that Symfony has quite a few optimizations to limit the search paths for this command to only files where translator is used and twig templates:

But all of this hard work is then thrown out of the window by the command itself adding $kernel->getProjectDir().'/src' to the list - making it impossible to search only the optimized list and searching the entire project anyway. In my case the command takes over 5 minutes with this line but less then 5 seconds when I remove it while keeping the exact same result.

@welcoMattic
Copy link
Member

Hi @enumag, thank you for this contribution. As you said you get the very same result with or without this line in TranslationDebugCommand, could you add a test to cover the case?

At first sight, I tend to agree with your fix, as TranslationDebugCommand should act the same way as TranslationUpdateCommand about the scanned paths.

@lyrixx lyrixx changed the title Speed up TranslationDebugCommand [Translation] Speed up TranslationDebugCommand Mar 3, 2023
@enumag
Copy link
Contributor Author
enumag commented Mar 3, 2023

Hi @enumag, thank you for this contribution. As you said you get the very same result with or without this line in TranslationDebugCommand, could you add a test to cover the case?

@welcoMattic Not sure what test you mean. There is https://github.com/symfony/symfony/blob/6.3/src/Symfony/Bundle/FrameworkBundle/Tests/Functional/TranslationDebugCommandTest.php and this one is still passing.

I wasn't using TranslationUpdateCommand but not that you mention it it actually has the exact same problem:

$codePaths[] = $kernel->getProjectDir().'/src';

@welcoMattic
Copy link
Member

If we take a step back, those 2 commands should find translations keys from your code. However, translation keys could be located anywhere in src/ directory (flash messages in controllers, FormTypes, form validation, etc).

If you need to narrow the search location, there is --bundle argument you can use.

I'm afraid that if we remove src/ from codePaths by default, many users will miss some translations keys (to extract or to debug).

@enumag
Copy link
Contributor Author
enumag commented Mar 3, 2023

If we take a step back, those 2 commands should find translations keys from your code. However, translation keys could be located anywhere in src/ directory (flash messages in controllers, FormTypes, form validation, etc).

That's true but as far as I can tell the point of analyzing which services are using TranslatorInterface in TranslatorPass is to limit the paths which will be searched. There is no point in analyzing the entire src directory when we already have a much narrower list to search.

If you need to narrow the search location, there is --bundle argument you can use.

IIRC bundles are no longer recommended. My app isn't using them so this isn't an option.

I'm afraid that if we remove src/ from codePaths by default, many users will miss some translations keys (to extract or to debug).

Even if you want to keep the src there for compatibility with some obscure cases that TranslatorPass would miss shouldn't there be an option to disable it?

@welcoMattic
Copy link
Member

TranslatorPathsPass is a bit hard to understand, but if I understand correctly, it only looks for services with Translator as dependancy. But translation key could be located anywhere else (using t() function for instance).

Could you test something with the t() function call in some file without Translator as dependency, and your fix, to check if the translation key in t() is correctly extracted?

@enumag
Copy link
Contributor Author
enumag commented Mar 3, 2023

@welcoMattic I had no clue such function existed. How does it even locate a translator instance? Anyway yeah there is no way those calls would be found without the global search.

@welcoMattic
Copy link
Member

The t() function does not really translate, it creates a TranslatableMessage, containing the translation key. But like any classic key, it must be extracted (to local file, or to debug output).

@enumag
Copy link
Contributor Author
enumag commented Mar 6, 2023

@welcoMattic Ah I see. Yeah I'm pretty sure those wouldn't be discovered in that case and kinda make the inclusion of /src necessary. That said anynone concerned with performance of these commands would likely want to disable it anyway and configure the paths where this function is used in their project manually. So I still think there should be an option to disable this behavior even if it stays enabled by default.

@nicolas-grekas
Copy link
Member

What is the slow part? Is it parsing with PhpParser? Or just opening them all? If it's PhpParser, can we fuzzy search the content before parsing it using some clever str_contains() calls?

@enumag
Copy link
Contributor Author
enumag commented Mar 6, 2023

@nicolas-grekas I don't have a way to easily check the bottleneck since blackfire.io removed the free tier.

@enumag
Copy link
Contributor Author
enumag commented Mar 9, 2023

That said I believe that in my case the main issue is the sheer number of files in /src directory. The project is large and only a few files are using translations.

@nicolas-grekas
Copy link
Member

Can you maybe give my idea a try to see if that provides anything measurable for you?

@enumag
Copy link
Contributor Author
enumag commented Mar 17, 2023

@nicolas-grekas I tried your way by changing PhpExtractor like this:

    public function extract($resource, MessageCatalogue $catalog)
    {
        $files = $this->extractFiles($resource);
        foreach ($files as $file) {
            $contents = file_get_contents($file);
            if (strpos($contents, 'trans') !== false) {
                $this->parseTokens(token_get_all($contents), $catalog, $file);

                gc_mem_caches();
            }
        }
    }

Current implementation: 302 seconds
Your idea: 62 seconds
Removal of /src: 5 seconds

While your idea does provide significant improvement it's still nowhere near fast enough.

@nicolas-grekas
Copy link
Member

I would still to it. Dividing the time by 5 is still worth it.
Up for a PR?

@enumag
Copy link
Contributor Author
enumag commented Mar 17, 2023

Unfortunately your idea isn't good enough for me so I won't spend time on implementing it. I'm forced to decorate the extractor to ignore the src directory no matter if your idea is implemented or not.

@stof
Copy link
Member
stof commented Mar 17, 2023

Removing the whole src directory is a no-go, as this would break the feature (as explained before)

@enumag
Copy link
Contributor Author
enumag commented Mar 17, 2023

I'm aware but I'd still like to see an option to suppress it.

@nicolas-grekas
Copy link
Member

Let's close then. PR welcome if someone wants to submit the perf improvement that has been discussed. And PR welcome also of course if another approach could work for what you're trying to do @enumag

@welcoMattic welcoMattic removed the Bug label Mar 24, 2023
fabpot added a commit that referenced this pull request Mar 25, 2023
…or big code bases (welcoMattic)

This PR was merged into the 6.3 branch.

Discussion
----------

[Translation] Improve message extraction performance for big code bases

| Q             | A
| ------------- | ---
| Branch?       | 6.3
| Bug fix?      | no
| New feature?  | yes <!-- please update src/**/CHANGELOG.md files -->
| Deprecations? | no <!-- please update UPGRADE-*.md and src/**/CHANGELOG.md files -->
| Tickets       | Fix #49585 <!-- prefix each issue number with "Fix #", no need to create an issue if none exists, explain below instead -->
| License       | MIT
| Doc PR        |

As discussed in #49585, we try to extract messages from all src/ directory in `translation:debug` and `translation:extract` commands. But, it's not necessary to look into all files, as we can easily detect if a file contains something about translation (with some regexes).

This PR filter files before messages extraction, and it divide by at least 3 the time used to extract messages from an app with 100 000 php files.

Commits
-------

ae6c25b fix(translation): Improve performance of debug and extract command for big code base
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0