GitHub - cmyr/unicode-segmentation at 8bac7c72ddd70426acfe1e58545cdd1694c61d88


    Iterators which split strings on Grapheme Cluster or Word boundaries, according to the Unicode Standard Annex #29 rules.

    extern crate unicode_segmentation;
    
    use unicode_segmentation::UnicodeSegmentation;
    
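    fn main() {
        // Extended grapheme clusters (second argument `true`): combining marks
        // stay with their base character, and "\r\n" is a single cluster.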
        let s = "a̐éö̲\r\n";
        let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
        let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
        assert_eq!(g, b);
    
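        // unicode_words() keeps only the segments that contain alphanumeric
        // characters, so whitespace and punctuation are dropped.
        let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";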
        let w = s.unicode_words().collect::<Vec<&str>>();
        let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
        assert_eq!(w, b);
    
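        // split_word_bounds() returns every segment between word boundaries,
        // including whitespace and punctuation.
        let s = "The quick (\"brown\")  fox";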
        let w = s.split_word_bounds().collect::<Vec<&str>>();
        let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", " ", "fox"];
        assert_eq!(w, b);
    }
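
    To make the relationship between these calls more concrete, here is a minimal std-only sketch. It is not how unicode-segmentation is implemented: `is_combining_mark`, `simple_clusters`, and `words_from_segments` are hypothetical helpers written for illustration. The clustering covers only two cases (keeping "\r\n" together, and attaching marks from the Combining Diacritical Marks block, U+0300 to U+036F, to the preceding character), whereas UAX #29 defines many more rules. The word filter assumes its input has already been split on word boundaries, for example by `split_word_bounds()`.

```rust
/// Hypothetical helper, not part of unicode-segmentation.
/// Assumption: only the Combining Diacritical Marks block counts as a mark.
fn is_combining_mark(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Simplified approximation of grapheme clustering: a new cluster starts at
/// every char unless it is '\n' directly after '\r', or a combining mark
/// following something other than a line-break character.
fn simple_clusters(s: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0; // byte offset where the current cluster begins
    let mut prev: Option<char> = None;
    for (i, c) in s.char_indices() {
        if let Some(p) = prev {
            let crlf = p == '\r' && c == '\n';
            let mark = is_combining_mark(c) && p != '\r' && p != '\n';
            if !(crlf || mark) {
                // Boundary found: emit the cluster that just ended.
                out.push(&s[start..i]);
                start = i;
            }
        }
        prev = Some(c);
    }
    if start < s.len() {
        out.push(&s[start..]);
    }
    out
}

/// Keeps only the segments that contain at least one alphanumeric character,
/// mirroring how "words" relate to the full list of word-boundary segments.
fn words_from_segments<'a>(segments: &[&'a str]) -> Vec<&'a str> {
    segments
        .iter()
        .copied()
        .filter(|seg| seg.chars().any(char::is_alphanumeric))
        .collect()
}
```

    For the input "a\u{0310}e\u{0301}o\u{0308}\u{0332}\r\n", `chars()` yields nine code points while `simple_clusters` returns four slices, which is why splitting on `char` boundaries is not enough for user-perceived characters. For real text, prefer the crate's own iterators, which follow the full UAX #29 rule set.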

    no_std

    unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.

    crates.io

    You can use this package in your project by adding the following to your Cargo.toml:

    [dependencies]
    unicode-segmentation = "0.1.3"

    License

    Apache-2.0 and MIT (see LICENSE-APACHE and LICENSE-MIT).