8000 Replace ArrayList in DataFormats with List<DataFormat> and refactor the class by h3xds1nz · Pull Request #9199 · dotnet/wpf · GitHub
[go: up one dir, main page]

Skip to content

Replace ArrayList in DataFormats with List<DataFormat> and refactor the class #9199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

h3xds1nz
Copy link
Member
@h3xds1nz h3xds1nz commented Jun 4, 2024

Description

DataFormats is one of those classes that still use ArrayList instead of generic List<DataFormat>.

There are no functional changes besides removing the Ordinal -> OrdinalIgnoreCase compare and just using case-insensitive one by default as in .NET 8 this path is vectorized and I believe decreasing the code-size outweights the now immeasurable benefit of two subsequent compares (not to mention, should such a case occur, this solution will come on top) as this isn't a hot path regularly.

While refactoring, I've also removed obsolete CAS remarks and fixed some grammar mistakes.

As a last step, I removed the unnecessary double lock on InternalGetDataFormat/GetDataFormat methods by creating an internal static class, running the initialization of predefined formats in the static constructor, saving all the unnecessary overhead (the internal static class and the formats are not initialized if only the fields are used, as beforefieldinit is set, so there's no regression in that sense as well.)

Benchmark for id 16 and "locale" string

Method Mean [ns] Error [ns] StdDev [ns] Code Size [B] Allocated [B]
Original_int 33.11 ns 0.230 ns 0.204 ns 684 B -
Original_string 91.79 ns 0.635 ns 0.594 ns 1,064 B -
PR_EDIT_int 12.31 ns 0.089 ns 0.079 ns 465 B -
PR_EDIT_string 26.76 ns 0.171 ns 0.151 ns 617 B -

Customer Impact

This will increase performance 3x and almost 4x times increase can be observed should you query data formats with wrong casing input (which shall not be the case of WPF itself but if used externally, it might).

Regression

No.

Risk

I believe there are none as the static collection is used internally as a storage and access is done via defined GetDataFormat overloads.

Microsoft Reviewers: Open in CodeFlow

@h3xds1nz h3xds1nz requested a review from a team as a code owner June 4, 2024 15:46
@dotnet-policy-service dotnet-policy-service bot added PR metadata: Label to tag PRs, to facilitate with triage Community Contribution A label for all community Contributions labels Jun 4, 2024
@miloush
Copy link
Contributor
miloush commented Jun 4, 2024

@singhashish-wpf there is a lot of code style changes departing from the current codebase. Perhaps we should have some rules about what is expected going forward.

@h3xds1nz Can you provide any numbers demonstrating the performance/size impact? How did you test the changes?

@miloush
Copy link
Contributor
miloush commented Jun 4, 2024

The last change with static constructor would penalize anyone just using the static properties which I would think are the most common usage of this class.

@h3xds1nz
Copy link
Member Author
h3xds1nz commented Jun 4, 2024

@miloush I believe that WPF should move forward in the future while not introducing any breaking-changes. I'm actually testing waters to be entirely honest. As for code style, I'm happy to adapt if we get proper guidelines.

As for the static constructor, I do agree with you it will "penalize", but I view it as very cheap. It is a point for discussion though, I do not feel strongly about it, I'm happy to remove it if we feel it should be a blocker for the PR itself.

Running a very dumb benchmark on .NET 9 preview, generating some work so that it is not optimized out:

Method Mean Error StdDev
GetDataFormat 60.35 ms 0.176 ms 0.156 ms
GetDataFormat_Int 17.95 ms 0.114 ms 0.107 ms

The same, with the PR improvements (without static constructor) - PresentationCore swapped:

Method Mean Error StdDev
GetDataFormat 20.10 ms 0.079 ms 0.070 ms
GetDataFormat_Int 15.34 ms 0.124 ms 0.110 ms

The same, with the PR improvements (including static constructor) - PresentationCore swapped:

Method Mean Error StdDev Median
GetDataFormat 13.86 ms 0.080 ms 0.066 ms 13.875 ms
GetDataFormat_Int 10.58 ms 0.536 ms 1.580 ms 9.542 ms

Do note that this targets the case-insensitive compare hence the big difference between string and int compare. The regression between CS -> CI and going straight for CI compare only is about 1ms for 1mil passes.

   public partial class BenchmarkDemo
   {
       public StringBuilder sb = new StringBuilder(10_000_000);

       [Benchmark]
       public StringBuilder GetDataFormat()
       {
           sb.Clear();
           for (int i = 0; i < 1_000_000; i++)
           {
               DataFormat format = System.Windows.DataFormats.GetDataFormat("bitmap");
               sb.Append(format.Id);
           }

           return sb;
       }

       [Benchmark]
       public StringBuilder GetDataFormat_Int()
       {
           sb.Clear();
           for (int i = 0; i < 1_000_000; i++)
           {
               DataFormat format = System.Windows.DataFormats.GetDataFormat(2);
               sb.Append(format.Id);
           }

           return sb;
       }
   }

@lindexi
Copy link
Member
lindexi commented Jun 5, 2024

If the change doesn't involve a behavioral change, that's a particularly good thing.

@h3xds1nz
Copy link
Member Author
h3xds1nz commented Jun 5, 2024

@miloush @lindexi As per your very good points, I've changed the implementation to behave the same way as it did before with the last commit while keeping the benefits of all the changes. The internal list/class won't be initialized unless it is actually meant to be used (with one of the Get*) methods.

The call methods are likely to get inlined by the JIT and with tiered compilation this can all go smooth like butter.

Also the lock now belongs to the internal class.

@ThomasGoulet73
Copy link
Contributor

I'm personally not a big fan of moving code to static constructor just for performance, especially when the static constructor runs arbitrary code (Like UnsafeNativeMethods.RegisterClipboardFormat that could have side effects depending on when a static constructor ran).

You can probably improve the performance of the method EnsurePredefined by also checking if _formatList is null before locking (It would be: If _formatList is null -> Lock -> If _formatList is still null -> Initialize). This is a very common pattern and a cheap way to avoid locking for no reason.

Other than that, the other changes LGTM.

@miloush
Copy link
Contributor
miloush commented Jun 6, 2024

I'm actually testing waters to be entirely honest. As for code style, I'm happy to adapt if we get proper guidelines.

I am not saying these cannot or should not change, but indeed guidelines should be in place.

I have to admit I am a bit puzzled by the original code, I am having difficult time imagining a scenario where case sensitive search through data formats would provide a performance improvement worth the complexity but not a hardcoded switch. The only situation I can think of this needs to be done quickly is if it is executed on every mouse move during dragging.

I will also note that in general, this change would cause different format to match if they differ by case only, however RegisterClipboardFormat is documented as case-insensitive, so this seems like a fair change.

@h3xds1nz
Copy link
Member Author
h3xds1nz commented Jun 6, 2024

Earlier in the days the CI search essentially meant two different compares where CI was way more expensive as you couldn't just load a register and OR the two masks. However I think the main intention was to optimize for the internal library use where the formats are pre-defined. Now I find this irrelevant and as you've perfectly noted, formats are CI on their own.

As per the static constructor, I understand that the PInvoke call might be of concern (was originally mine as well) however we do not depend on any state of the call there, either it registers or it doesn't. Even though 'popular', I've always found the double checking a bit of an anti-pattern, where static constructor is clear and expressive. It also removes _formatList from being readonly, which means that currently the null check itself can never be optimized out unless I've missed something.

@h3xds1nz h3xds1nz changed the title [DataFormats.cs] Replace ArrayList with List<DataFormat> and refactor the class Replace ArrayList in DataFormats with List<DataFormat> and refactor the class Sep 1, 2024
@h3xds1nz h3xds1nz force-pushed the remove-arraylist-from-dataformats branch from 6538173 to 2960b60 Compare October 16, 2024 21:23
@h3xds1nz h3xds1nz requested a review from a team as a code owner October 16, 2024 21:23
@h3xds1nz h3xds1nz mentioned this pull request Jan 3, 2025
< 9AA7 div class="flex-shrink-0 my-1 my-md-0 ml-md-3"> Merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Contribution A label for all community Contributions Included in test pass PR metadata: Label to tag PRs, to facilitate with triage Status:Completed Status:Proposed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0