Revert "Test width = sum(grapheme cluster widths)" · unicode-rs/unicode-width@ded852c · GitHub

Commit ded852c

Revert "Test width = sum(grapheme cluster widths)"
This reverts commit a7a1056.
1 parent 6edfc60 commit ded852c

4 files changed, +5 -1291 lines changed


scripts/unicode.py

Lines changed: 1 addition & 2 deletions
@@ -754,9 +754,8 @@ def main(module_path: str):
 {EffectiveWidth.NARROW, EffectiveWidth.AMBIGUOUS},
 )

-# Download files for use by tests
+# Download normalization test file for use by tests
 fetch_open("NormalizationTest.txt", "../tests/")
-fetch_open("auxiliary/GraphemeBreakTest.txt", "../tests/")

 print("------------------------")
 total_size = 0

src/lib.rs

Lines changed: 4 additions & 19 deletions
@@ -30,8 +30,7 @@
 //! # Rules for determining width
 //!
 //! This crate currently uses the following rules to determine the width of a
-//! character or string, in order of decreasing precedence. These may be tweaked in the future;
-//! however see [guarantees](#guarantees) below.
+//! character or string, in order of decreasing precedence. These may be tweaked in the future.
 //!
 //! 1. [Emoji presentation sequences] have width 2.
 //! 2. Outside of an East Asian context, [text presentation sequences] have width 1
@@ -77,16 +76,10 @@
 //!
 //! [Enclosed Ideographic Supplement]: https://unicode.org/charts/PDF/U1F200.pdf
 //!
-//! ## Guarantees
+//! ## Canonical equivalence
 //!
-//! - Any two canonically equivalent strings have the same non-CJK width.
-//! This will not change in any future semver-compatible version.
-//! (This guarantee does not currently hold for the CJK width variants.)
-//! - The width of any string equals the sum of the widths of its [extended grapheme clusters].
-//! This is unlikely to change in any future semver-compatible version.
-//! (This guarantee holds for both CJK and non-CJK width.)
-//!
-//! [extended grapheme clusters]: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
+//! The non-CJK width methods guarantee that canonically equivalent strings are assigned the same width.
+//! However, this guarantee does not currently hold for the CJK width variants.

 #![forbid(unsafe_code)]
 #![deny(missing_docs)]
@@ -102,14 +95,6 @@ pub use tables::UNICODE_VERSION;
 mod tables;

 /// Methods for determining displayed width of Unicode characters.
-///
-/// **NB:** the width of a string may differ from the sum of the widths of its characters;
-/// see the [crate-level documentation](crate#rules-for-determining-width) for more.
-/// Instead of working with individual characters, consider using [extended grapheme clusters],
-/// perhaps with the [`unicode-segmentation`] crate.
-///
-/// [extended grapheme clusters]: https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
-/// [`unicode-segmentation`]: https://docs.rs/unicode-segmentation/latest/unicode_segmentation/trait.UnicodeSegmentation.html#tymethod.graphemes
 pub trait UnicodeWidthChar {
 /// Returns the character's displayed width in columns, or `None` if the
 /// character is a control character.
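
Note on the "Canonical equivalence" section retained by this revert: canonically equivalent strings can contain different numbers of `char`s, so a width computed by simply counting characters could not satisfy that guarantee. The sketch below is an illustration only. It uses just the Rust standard library and does not call any unicode-width API; the function name `char_counts` is hypothetical.

```rust
// Illustration only: standard library, no unicode-width calls.
// Returns the `char` counts of two canonically equivalent spellings of "é".
fn char_counts() -> (usize, usize) {
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed form)
    let precomposed = "\u{00E9}";
    // U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT
    let decomposed = "e\u{0301}";
    (precomposed.chars().count(), decomposed.chars().count())
}

fn main() {
    let (a, b) = char_counts();
    // The counts differ although both strings are canonically equivalent.
    println!("precomposed: {} char(s), decomposed: {} char(s)", a, b);
}
```

The two spellings differ in `char` count (1 versus 2), which is why a per-string width rule has to treat a base letter plus combining mark the same as its precomposed form for the non-CJK guarantee to hold.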
