ICU-20392 Split the Locale payload into nested and heap allocated by roubert · Pull Request #3518 · unicode-org/icu
ICU-20392 Split the Locale payload into nested and heap allocated #3518

Draft · roubert wants to merge 3 commits into main

roubert (Member) commented Jun 6, 2025

The most commonly used Locale objects carry very little payload: most of them use no extensions, no language subtag longer than 3 characters, and no more than a single variant.

There's room for all that data in a simple 32-byte payload object, which can be nested directly in the Locale object.

Any payload larger than that can instead be heap allocated as needed, in order to save storage for the most commonly used objects while retaining the ability to create arbitrarily large and complex Locale objects.
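
To illustrate the split described above, here is a minimal, self-contained sketch. It is not the code from this PR: `NestedPayload`, `HeapPayload`, `SketchLocale`, and the field sizes are all hypothetical names and assumptions chosen for the example, and the real layout in `locid.h` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical compact layout for the common case; NOT the layout used in the PR.
struct NestedPayload {
    char language[4] = {};  // up to 3 characters + NUL
    char script[5] = {};    // up to 4 characters + NUL
    char region[4] = {};    // up to 3 characters + NUL
    char variant[9] = {};   // a single variant of up to 8 characters + NUL
};
static_assert(sizeof(NestedPayload) <= 32, "nested payload must fit in 32 bytes");

// Hypothetical fallback for anything that does not fit; always heap allocated.
struct HeapPayload {
    std::string language, script, region, variants, extensions;
};

class SketchLocale {
public:
    SketchLocale(std::string_view language, std::string_view script,
                 std::string_view region, std::string_view variants,
                 std::string_view extensions) {
        // Assumed rule: short subtags, at most one variant, no extensions.
        const bool fits = language.size() <= 3 && script.size() <= 4 &&
                          region.size() <= 3 && variants.size() <= 8 &&
                          variants.find('_') == std::string_view::npos &&
                          extensions.empty();
        if (fits) {
            NestedPayload p;
            copyInto(p.language, language);
            copyInto(p.script, script);
            copyInto(p.region, region);
            copyInto(p.variant, variants);
            payload_ = p;  // stored inline, no allocation
        } else {
            payload_ = std::make_unique<HeapPayload>(HeapPayload{
                std::string(language), std::string(script), std::string(region),
                std::string(variants), std::string(extensions)});
        }
    }

    bool isNested() const { return std::holds_alternative<NestedPayload>(payload_); }

    std::string language() const {
        if (const auto* p = std::get_if<NestedPayload>(&payload_)) return p->language;
        return std::get<std::unique_ptr<HeapPayload>>(payload_)->language;
    }

private:
    template <std::size_t N>
    static void copyInto(char (&dst)[N], std::string_view src) {
        assert(src.size() < N);  // the caller has already checked the length
        const std::size_t n = src.copy(dst, N - 1);
        dst[n] = '\0';
    }

    std::variant<NestedPayload, std::unique_ptr<HeapPayload>> payload_;
};
```

In this sketch, `SketchLocale("en", "", "US", "", "")` stays nested, while passing a non-empty extensions string or an over-long language forces the heap branch.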

This reduces the storage requirements for all Locale objects.

For nested payloads, this reduction is from 224 bytes to 48 bytes.
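
As a quick worked example of what those two figures mean in practice, the arithmetic can be checked at compile time. The object count of 1000 used in the comments is an arbitrary assumption for illustration; only the 224-byte and 48-byte sizes come from the description above.

```cpp
#include <cstddef>

constexpr std::size_t kOldSize = 224;  // bytes per Locale before this change
constexpr std::size_t kNewSize = 48;   // bytes per Locale with a nested payload

// Total bytes saved for `count` Locale objects that all use the nested payload.
constexpr std::size_t savedBytes(std::size_t count) {
    return count * (kOldSize - kNewSize);
}

// Reduction as a whole percentage, rounded down.
constexpr std::size_t reductionPercent() {
    return (kOldSize - kNewSize) * 100 / kOldSize;
}

static_assert(savedBytes(1) == 176, "176 bytes saved per object");
static_assert(savedBytes(1000) == 176000, "roughly 172 KiB per 1000 objects");
static_assert(reductionPercent() == 78, "a reduction of about 78%");
```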

For payloads that need to be heap allocated, the reduction depends on several factors, but in most cases there's still some reduction. There are also cases where this refactoring actually increases the storage used, because CharString allocates more storage than necessary. This could be improved upon in a number of ways, such as optimizing CharString to not allocate more than necessary when copying a string of known length, not allocating any empty CharString objects, or possibly replacing CharString with a new class for fixed-length strings.
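
One of the ideas above, a class for fixed-length strings that allocates exactly what it needs and nothing at all for empty strings, could look roughly like the following. `FixedString` is a hypothetical name for this sketch and is not an existing ICU class or part of this PR.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>

// Hypothetical fixed-length string; NOT an existing ICU class.
class FixedString {
public:
    explicit FixedString(std::string_view src) : length_(src.size()) {
        if (length_ == 0) return;  // empty strings own no heap storage
        // Exactly length + 1 bytes; make_unique zero-initializes, so the
        // final byte already acts as the NUL terminator.
        buffer_ = std::make_unique<char[]>(length_ + 1);
        std::memcpy(buffer_.get(), src.data(), length_);
    }

    const char* data() const { return buffer_ ? buffer_.get() : ""; }
    std::size_t length() const { return length_; }

    // Heap bytes owned by this object (excluding allocator bookkeeping).
    std::size_t allocatedBytes() const { return buffer_ ? length_ + 1 : 0; }

private:
    std::size_t length_;
    std::unique_ptr<char[]> buffer_;
};
```

The trade-off in this design is that the string cannot grow after construction, which is acceptable only if the payload is built once from strings of known length.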

The public API remains unchanged, but the set of operations that can lead to U_MEMORY_ALLOCATION_ERROR changes.
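
The following sketch shows why the set of failing operations shifts once small payloads are stored inline: only the heap branch can run out of memory. Everything here is a stand-in so the example compiles without ICU headers: `SketchErrorCode` mimics the role of ICU's `UErrorCode`, and `LanguageStorage`, `setLanguage`, and the allocator hooks are hypothetical names, not functions from this PR.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for ICU's UErrorCode so the sketch compiles without ICU headers.
enum SketchErrorCode { SKETCH_ZERO_ERROR, SKETCH_MEMORY_ALLOCATION_ERROR };

// Hypothetical allocator hook (must be malloc-compatible), used only to make
// the failure path easy to exercise.
using AllocFn = void* (*)(std::size_t);
inline void* workingAlloc(std::size_t n) { return std::malloc(n); }
inline void* failingAlloc(std::size_t) { return nullptr; }

struct LanguageStorage {
    LanguageStorage() = default;
    LanguageStorage(const LanguageStorage&) = delete;
    LanguageStorage& operator=(const LanguageStorage&) = delete;
    ~LanguageStorage() { std::free(heap); }

    const char* get() const { return heap ? heap : nested; }

    char nested[4] = {};   // used when the language fits in 3 characters
    char* heap = nullptr;  // used otherwise; owned by this object
};

// Returns true on success. Only the heap branch can report an allocation error.
bool setLanguage(LanguageStorage& out, const char* language, AllocFn alloc,
                 SketchErrorCode& status) {
    if (status != SKETCH_ZERO_ERROR) return false;  // do nothing if already failed
    const std::size_t length = std::strlen(language);
    if (length < sizeof(out.nested)) {
        std::free(out.heap);  // drop any earlier heap copy
        out.heap = nullptr;
        std::memcpy(out.nested, language, length + 1);  // no allocation, cannot fail
        return true;
    }
    void* p = alloc(length + 1);
    if (p == nullptr) {
        status = SKETCH_MEMORY_ALLOCATION_ERROR;
        return false;
    }
    std::memcpy(p, language, length + 1);
    std::free(out.heap);
    out.heap = static_cast<char*>(p);
    return true;
}
```

With `failingAlloc`, storing `"en"` still succeeds because the allocator is never called, while storing an 8-character language sets the error status.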

Checklist

  • Required: Issue filed: ICU-20392
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

ALLOW_MANY_COMMITS=true

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/common/locid.cpp is different
  • icu4c/source/common/unicode/locid.h is different
  • icu4c/source/test/depstest/dependencies.txt is now changed in the branch
  • icu4c/source/test/depstest/depstest.py is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/common/unicode/locid.h is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot
