Port/rewrite builtins/string to Rust by henrikhorluck · Pull Request #9854 · fish-shell/fish-shell

henrikhorluck · 2023-06-20T12:13:23Z

- For regex replace/subsitution, I'm planning on going through rust-pcre2
- This requires a patched rust-pcre2 that makes Captures public to build (so far)
- I'm not a particular fan of how each subcommand was set up before or with this new setup, still a lot of boilerplate/duplication

Individual sub-commands to review:

Please feel free to only look at the overall setup/some individual subcommands and not review everything at once, this is quite a lot

Merging is blocked on fish-shell/rust-pcre2#1, unless you want to always point to my fork

fish-rust/src/builtins/string.rs

faho · 2023-06-22T07:12:58Z

builtin_string is still used in test-setup https://github.com/fish-shell/fish-shell/blob/master/src/fish_tests.cpp#L5088, replacing it is a bit convoluted

It's fine to remove those if we have equivalent tests elsewhere - either as a unit test or as an integration test, for everything that's still sensible to test.

A bunch of these cases are already covered by the littlecheck tests.

This is now partly blocked on #9739, as most of the remaining failing tests are because of it

Ah. It crashes when it tries to output a NUL, which we need to allow.

I had so far only seen it try that when it got a NUL passed through as an argument, which in the C++ version is impossible (because we pass args as a C-style NUL-delimited array).

But string can also get a NUL from its stdin, and is supposed to handle that sensibly, so we need to allow that (and then, possibly, to avoid a behavior change truncate after a NUL for arguments passed to a builtin).

The error is in

fish-shell/fish-rust/src/builtins/shared.rs

Lines 90 to 92 in a75de42

    
               pub fn append<Str: AsRef<wstr>>(&mut self, s: Str) -> bool { 
        
                   self.ffi().append1(c_str!(s)) 
        
               }

, which tries to append a "C string" (with NUL-terminator).

That's bogus, the output stream can and should get NULs. I'll see if I can figure it out - the c_str! needs to go and be replaced with something that gets us a CxxWString with the NULs intact.

It's also fine to disable these tests for now, so you're unblocked.

Thoughts on using PCRE2_LITERAL to handle the literal case of StringReplacer::Literal? It would remove quite a bit of code, and quite trivial to implement support in rust-pcre2

We've been trying to first do as direct a port as possible. It's fine to do this later, but divide-and-conquer - first port what we have, then think about making it nice.

faho · 2023-06-22T08:12:25Z

Try this for append():

use crate::wchar_ffi::ToCppWString;

// ...

pub fn append<Str: AsRef<wstr> + ToCppWString>(&mut self, s: Str) -> bool {
    self.ffi().append(&s.into_cpp())
}

This builds and runs for me, and causes echo a\x00b to print a, a NUL and then a b.

Edit: Now available as #9859.

fish-rust/src/builtins/tests/string_tests.rs

ridiculousfish · 2023-06-26T02:11:51Z

This is an incredible amount of work! Thank you!!

I'd very much appreciate some feedback/potential alternatives to the whole subcommand-setup, I'm not a fan or either the old one or the new, old because it mixed all the command-options, and new because it's quite a bit of boilerplate.

One possibility to avoid the whole object-safe nonsense is to make it concrete, via a generic handler, and then maybe use match:

impl SubCommandRunner {

     fn run<T: StringSubCommand>(&mut self, cmd: T) -> Option<c_int> { ... }

    fn run_subcommand(&mut self, cmd: &str) {
        match cmd {
            "collect" => self.run(Collect::default()),
            "escape" => self.run(Escape::default()),
            ...
        }
    }

The advantage is that you don't have to sweat "object safe;" it does require an awkward conversion to &str but that's not a big deal since we are matching ASCII only. I'm not sure how much boilerplate it removes though.

fish-rust/src/builtins/tests/string_tests.rs

fish-rust/src/builtins/string/split.rs

src/builtins/string.cpp

fish-rust/src/builtins/string/match.rs

krobelus · 2023-07-15T07:48:00Z

fish-rust/src/builtins/string.rs

+        let mut args_read = Vec::with_capacity(args.len());
+        args_read.extend_from_slice(args);
+
+        let mut w = wgetopter_t::new(Self::SHORT_OPTIONS, Self::LONG_OPTIONS, args);


The copy of args into args_read is ugly.
wgetopter_t wants exclusive access to args (because it will rearrange args so positionals are next to each other).
How about giving wgetopt accessor methods like w.cmd() and w.argv()?

There's another issue: args: &mut [&wstr] has the wrong type (for all builtins).
In practice it is always underpinned by a &mut Vec<WString> we should change
it to args: &mut [WString] to avoid the need to allocate a slice-of-slices;

Here's some of that change I made a while ago:

Details

diff --git a/fish-rust/src/wgetopt.rs b/fish-rust/src/wgetopt.rs index bc4224c98..0ff56323f 100644 --- a/fish-rust/src/wgetopt.rs +++ b/fish-rust/src/wgetopt.rs @@ -23,7 +23,9 @@ License along with the GNU C Library; see the file COPYING.LIB. If not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -use crate::wchar::{wstr, WExt, L}; +use std::ops; + +use crate::wchar::{wstr, WExt, WString, L}; /// Describe how to deal with options that follow non-option ARGV-elements. /// @@ -64,14 +66,14 @@ fn empty_wstr() -> &'static wstr { Default::default() } -pub struct wgetopter_t<'opts, 'args, 'argarray> { +pub struct wgetopter_t<'opts, 'argarray> { /// Argv. - argv: &'argarray mut [&'args wstr], + argv: &'argarray mut [WString], /// For communication from `getopt` to the caller. When `getopt` finds an option that takes an /// argument, the argument value is returned here. Also, when `ordering` is RETURN_IN_ORDER, each /// non-option ARGV-element is returned here. - pub woptarg: Option<&'args wstr>, + woptarg_idx: Option<(usize, usize)>, shortopts: &'opts wstr, longopts: &'opts [woption<'opts>], @@ -80,7 +82,13 @@ pub struct wgetopter_t<'opts, 'args, 'argarray> { /// returned was found. This allows us to pick up the scan where we left off. /// /// If this is empty, it means resume the scan by advancing to the next ARGV-element. + nextchar_slice: ops::Range<usize>, /// Index in ARGV of the next element to be scanned. This is used for communication to and from /// the caller and for communication between successive calls to `getopt`. @@ -153,11 +161,11 @@ pub const fn wopt(name: &wstr, has_arg: woption_argument_t, val: char) -> woptio woption { name, has_arg, val } } -impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { +impl<'opts, 'argarray> wgetopter_t<'opts, 'argarray> { pub fn new( shortopts: &'opts wstr, longopts: &'opts [woption], - argv: &'argarray mut [&'args wstr], + argv: &'argarray mut [WString], ) -> Self { return wgetopter_t { woptopt: '?', @@ -168,9 +176,9 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { initialized: false, last_nonopt: 0, missing_arg_return_colon: false, - nextchar: Default::default(), + nextchar_slice: 0..0, ordering: Ordering::PERMUTE, - woptarg: None, + woptarg_idx: None, woptind: 0, }; } @@ -185,6 +193,23 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { return self._wgetopt_internal(opt_index, false); } + pub fn woptarg(&self) -> Option<&wstr> { + self.woptarg_idx + .map(|(woptind, start)| &self.argv[woptind][start..]) + } + + pub fn cmd(&self) -> &wstr { + &self.argv[0] + } + + pub fn argv(&self) -> &[WString] { + self.argv + } + + pub fn nextchar(&self) -> &wstr { + &self.argv[self.woptind][self.nextchar_slice.clone()] + } + /// \return the number of arguments. fn argc(&self) -> usize { return self.argv.len(); @@ -241,7 +266,7 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { self.first_nonopt = 1; self.last_nonopt = 1; self.woptind = 1; - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; let mut optstring = self.shortopts; @@ -321,7 +346,7 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { if self.ordering == Ordering::REQUIRE_ORDER { return None; } - self.woptarg = Some(self.argv[self.woptind]); + self.woptarg_idx = Some((self.woptind, 0)); self.woptind += 1; return Some(char::from(1)); } @@ -332,15 +357,15 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { } else { 1 }; - self.nextchar = self.argv[self.woptind][skip..].into(); + self.nextchar_slice = skip..(self.argv[self.woptind].len()); return Some(char::from(0)); } /// Check for a matching short opt. fn _handle_short_opt(&mut self) -> char { // Look at and handle the next short option-character. - let mut c = self.nextchar.char_at(0); - self.nextchar = &self.nextchar[1..]; + let mut c = self.nextchar().char_at(0); + self.nextchar_slice = self.nextchar_slice.start + 1..self.nextchar_slice.end; let temp = match self.shortopts.chars().position(|sc| sc == c) { Some(pos) => &self.shortopts[pos..], @@ -348,14 +373,14 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { }; // Increment `woptind' when we start to process its last character. - if self.nextchar.is_empty() { + if self.nextchar().is_empty() { self.woptind += 1; } if temp.is_empty() || c == ':' { self.woptopt = c; - if !self.nextchar.is_empty() { + if !self.nextchar().is_empty() { self.woptind += 1; } return '?'; @@ -367,17 +392,17 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { if temp.char_at(2) == ':' { // This is an option that accepts an argument optionally. - if !self.nextchar.is_empty() { - self.woptarg = Some(self.nextchar); + if !self.nextchar().is_empty() { + self.woptarg_idx = Some((self.woptind, 0)); self.woptind += 1; } else { - self.woptarg = None; + self.woptarg_idx = None; } - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; } else { // This is an option that requires an argument. - if !self.nextchar.is_empty() { - self.woptarg = Some(self.nextchar); + if !self.nextchar().is_empty() { + self.woptarg_idx = Some((self.woptind, 0)); // If we end this ARGV-element by taking the rest as an arg, we must advance to // the next element now. self.woptind += 1; @@ -391,10 +416,10 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { } else { // We already incremented `woptind' once; increment it again when taking next // ARGV-elt as argument. - self.woptarg = Some(self.argv[self.woptind]); + self.woptarg_idx = Some((self.woptind, 0)); self.woptind += 1; } - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; } return c; @@ -409,21 +434,23 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { retval: &mut char, ) { self.woptind += 1; - assert!(self.nextchar.char_at(nameend) == '\0' || self.nextchar.char_at(nameend) == '='); - if self.nextchar.char_at(nameend) == '=' { + assert!( + self.nextchar().char_at(nameend) == '\0' || self.nextchar().char_at(nameend) == '=' + ); + if self.nextchar().char_at(nameend) == '=' { if pfound.has_arg != woption_argument_t::no_argument { - self.woptarg = Some(self.nextchar[(nameend + 1)..].into()); + self.woptarg_idx = Some((self.woptind, nameend + 1)); } else { - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; *retval = '?'; return; } } else if pfound.has_arg == woption_argument_t::required_argument { if self.woptind < self.argc() { - self.woptarg = Some(self.argv[self.woptind]); + self.woptarg_idx = Some((self.woptind, 0)); self.woptind += 1; } else { - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; *retval = if self.missing_arg_return_colon { ':' } else { @@ -433,7 +460,7 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { } } - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; *longind = option_index; *retval = pfound.val; } @@ -451,7 +478,7 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { // Test all long options for either exact match or abbreviated matches. for (option_index, p) in self.longopts.iter().enumerate() { // Check if current option is prefix of long opt - if p.name.starts_with(&self.nextchar[..nameend]) { + if p.n 4D24 ame.starts_with(&self.nextchar()[..nameend]) { if nameend == p.name.len() { // The current option is exact match of this long option pfound = Some(*p); @@ -483,14 +510,14 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { let mut indfound: usize = 0; let mut nameend = 0; - while self.nextchar.char_at(nameend) != '\0' && self.nextchar.char_at(nameend) != '=' { + while self.nextchar().char_at(nameend) != '\0' && self.nextchar().char_at(nameend) != '=' { nameend += 1; } let pfound = self._find_matching_long_opt(nameend, &mut exact, &mut ambig, &mut indfound); if ambig && !exact { - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; self.woptind += 1; *retval = '?'; return true; @@ -509,9 +536,9 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { || !self .shortopts .as_char_slice() - .contains(&self.nextchar.char_at(0)) + .contains(&self.nextchar().char_at(0)) { - self.nextchar = empty_wstr(); + self.nextchar_slice = 0..0; self.woptind += 1; *retval = '?'; return true; @@ -541,7 +568,7 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { /// If a char in OPTSTRING is followed by a colon, that means it wants an arg, so the following text /// in the same ARGV-element, or the text of the following ARGV-element, is returned in `optarg`. /// Two colons mean an option that wants an optional arg; if there is text in the current - /// ARGV-element, it is returned in `w.woptarg`, otherwise `w.woptarg` is set to zero. + /// ARGV-element, it is returned in `w.woptarg()`, otherwise `w.woptarg()` is set to zero. /// /// If OPTSTRING starts with `-` or `+', it requests different methods of handling the non-option /// ARGV-elements. See the comments about RETURN_IN_ORDER and REQUIRE_ORDER, above. @@ -563,9 +590,9 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { if !self.initialized { self._wgetopt_initialize(); } - self.woptarg = None; + self.woptarg_idx = None; - if self.nextchar.is_empty() { + if self.nextchar().is_empty() { let narg = self._advance_to_next_argv(); if narg != Some(char::from(0)) { return narg; @@ -586,7 +613,7 @@ impl<'opts, 'args, 'argarray> wgetopter_t<'opts, 'args, 'argarray> { // // This distinction seems to be the most useful approach. if !self.longopts.is_empty() && self.woptind < self.argc() { - let arg = self.argv[self.woptind]; + let arg = &self.argv[self.woptind]; #[allow(clippy::if_same_then_else)] #[allow(clippy::needless_bool)]

I agree the cloning is ugly, but this is independent of this PR and present in other subcommands as well, I would prefer to not batch that into this already quite large change

fish-rust/src/builtins/string.rs

fish-rust/src/builtins/string/collect.rs

fish-rust/src/builtins/string/length.rs

fish-rust/src/builtins/string/match.rs

krobelus · 2023-07-15T18:17:29Z

I've reviewed everything except match and replace.
Github rejects my prr comments so I'm pasting my comments here.
It's in standard diff format so it should be easy to go-to-source etc.

Details

> diff --git a/fish-rust/src/builtins/string.rs b/fish-rust/src/builtins/string.rs
> new file mode 100644
> index 00000000000..e32a46ee2d9
> --- /dev/null
> +++ b/fish-rust/src/builtins/string.rs
> @@ -0,0 +1,493 @@
> +use std::borrow::Cow;
> +use std::fs::File;
> +use std::io::{BufRead, BufReader, Read};
> +use std::os::fd::FromRawFd;
> +
> +use crate::builtins::shared::{
> +    builtin_missing_argument, builtin_print_error_trailer, builtin_print_help, io_streams_t,
> +    BUILTIN_ERR_INVALID_SUBCMD, BUILTIN_ERR_MISSING_SUBCMD, BUILTIN_ERR_NOT_NUMBER,
> +    BUILTIN_ERR_TOO_MANY_ARGUMENTS, BUILTIN_ERR_UNKNOWN, STATUS_CMD_OK, STATUS_INVALID_ARGS,
> +};
> +use crate::common::str2wcstring;
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, WString, L};
> +use crate::wchar_ext::WExt;
> +use crate::wcstringutil::fish_wcwidth_visible;
> +use crate::wgetopt::{wgetopter_t, woption};
> +use crate::wutil::{sprintf, wgettext_fmt};
> +use libc::c_int;
> +
> +mod collect;
> +mod escape;
> +mod join;
> +mod length;
> +mod r#match;
> +mod pad;
> +mod repeat;
> +mod replace;
> +mod shorten;
> +mod split;
> +mod sub;
> +mod transform;
> +mod trim;
> +mod unescape;
> +
> +macro_rules! string_error {
> +    (
> +    $streams:expr,
> +    $string:expr
> +    $(, $args:expr)+
> +    $(,)?
> +    ) => {
> +        $streams.err.append(L!("string "));
> +        $streams.err.append(wgettext_fmt!($string, $($args),*));
> +    };
> +}
> +pub(self) use string_error;
> +
> +fn string_unknown_option(
> +    parser: &mut parser_t,
> +    streams: &mut io_streams_t,
> +    subcmd: &wstr,
> +    opt: &wstr,
> +) {
> +    string_error!(streams, BUILTIN_ERR_UNKNOWN, subcmd, opt);
> +    builtin_print_error_trailer(parser, streams, L!("string"));
> +}
> +
> +trait StringSubCommand<'args> {
> +    const SHORT_OPTIONS: &'static wstr;
> +    const LONG_OPTIONS: &'static [woption<'static>];
> +
> +    /// Parse and store option specificed by the associated short or long option.
> +    fn parse_opt(&mut self, optarg: Option<&'args wstr>, c: char) -> Result<(), StringError>;
> +
> +    fn parse_opts(
> +        &mut self,
> +        args: &mut [&'args wstr],
> +        parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +    ) -> Result<usize, Option<c_int>> {
> +        let cmd = args[0];
> +        let mut args_read = Vec::with_capacity(args.len());
> +        args_read.extend_from_slice(args);
> +
> +        let mut w = wgetopter_t::new(Self::SHORT_OPTIONS, Self::LONG_OPTIONS, args);
> +        while let Some(c) = w.wgetopt_long() {
> +            match c {
> +                ':' => {
> +                    streams.err.append(L!("string ")); // clone of string_error
> +                    builtin_missing_argument(parser, streams, cmd, args_read[w.woptind - 1], false);
> +                    return Err(STATUS_INVALID_ARGS);
> +                }
> +                '?' => {
> +                    string_unknown_option(parser, streams, cmd, args_read[w.woptind - 1]);
> +                    return Err(STATUS_INVALID_ARGS);
> +                }
> +                c => {
> +                    let retval = self.parse_opt(w.woptarg, c);
> +                    if let Err(e) = retval {
> +                        let long_name = Self::LONG_OPTIONS.iter().find_map(|o| {
> +                            if o.val == c {
> +                                Some(o.name.to_owned())
> +                            } else {
> +                                None
> +                            }
> +                        });
> +                        let opt_name = long_name.unwrap_or_else(|| sprintf!("-%c", c));
> +                        e.print_error(&args_read, parser, streams, w.woptarg, w.woptind, &opt_name);
> +                        return Err(e.retval());
> +                    }
> +                }
> +            }
> +        }
> +
> +        return Ok(w.woptind);
> +    }
> +
> +    /// Take any positional arguments after options have been parsed.
> +    #[allow(unused_variables)]
> +    fn take_args(
> +        &mut self,
> +        optind: &mut usize,
> +        args: &[&'args wstr],
> +        streams: &mut io_streams_t,
> +    ) -> Option<c_int> {
> +        STATUS_CMD_OK
> +    }
> +
> +    /// Perform the business logic of the command.
> +    fn handle(
> +        &mut self,
> +        parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&'args wstr],
> +    ) -> Option<c_int>;
> +
> +    fn run(
> +        &mut self,
> +        parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        args: &mut [&'args wstr],
> +    ) -> Option<c_int> {
> +        if args.len() >= 3 && (args[2] == "-h" || args[2] == "--help") {
> +            let string_dash_subcmd = WString::from(args[0]) + L!("-") + args[1];
> +            builtin_print_help(parser, streams, &string_dash_subcmd);
> +            return STATUS_CMD_OK;
> +        }
> +
> +        let args = &mut args[1..];
> +
> +        let mut optind = match self.parse_opts(args, parser, streams) {
> +            Ok(optind) => optind,
> +            Err(retval) => return retval,
> +        };
> +
> +        let retval = self.take_args(&mut optind, args, streams);
> +        if retval != STATUS_CMD_OK {
> +            return retval;
> +        }
> +
> +        if streams.stdin_is_directly_redirected() && args.len() > optind {
> +            string_error!(streams, BUILTIN_ERR_TOO_MANY_ARGUMENTS, args[0]);
> +            return STATUS_INVALID_ARGS;
> +        }
> +
> +        return self.handle(parser, streams, &mut optind, args);
> +    }
> +}
> +
> +/// This covers failing argument/option parsing
> +enum StringError {
> +    InvalidArgs(&'static wstr),
> +    /// Like above but automatically `Invalid <opt_name> value '<arg>'`
> +    InvalidValue,
> +    NotANumber,
> +    UnknownOption,
> +}
> +
> +enum RegexError {
> +    Compile(WString, pcre2::Error),
> +    InvalidCaptureGroupName(WString),
> +    InvalidEscape(WString),
> +}
> +
> +impl RegexError {
> +    fn print_error(&self, args: &[&wstr], streams: &mut io_streams_t) {
> +        let cmd = args[0];
> +        use RegexError::*;
> +        match self {
> +            Compile(pattern, e) => {
> +                string_error!(
> +                    streams,
> +                    "%ls: Regular expression compile error: %ls\n",
> +                    cmd,
> +                    &WString::from(e.error_message())
> +                );
> +                string_error!(streams, "%ls: %ls\n", cmd, pattern);
> +                string_error!(streams, "%ls: %*ls\n", cmd, e.offset().unwrap(), "^");
> +            }
> +            InvalidCaptureGroupName(name) => {
> +                streams.err.append(wgettext_fmt!(
> +                    "Modification of read-only variable \"%ls\" is not allowed\n",
> +                    name
> +                ));
> +            }
> +            InvalidEscape(pattern) => {
> +                string_error!(
> +                    streams,
> +                    "%ls: Invalid escape sequence in pattern \"%ls\"\n",
> +                    cmd,
> +                    pattern
> +                );
> +            }
> +        }
> +    }
> +}
> +
> +impl From<crate::wutil::wcstoi::Error> for StringError {
> +    fn from(_: crate::wutil::wcstoi::Error) -> Self {
> +        StringError::NotANumber
> +    }
> +}
> +
> +impl From<std::num::TryFromIntError> for StringError {
> +    fn from(_: std::num::TryFromIntError) -> Self {
> +        // the result of `try_into` failing on the result of fish_wcsto{i,l}
> +        StringError::InvalidValue
> +    }
> +}
> +
> +impl From<&'static wstr> for StringError {
> +    fn from(s: &'static wstr) -> Self {
> +        StringError::InvalidArgs(s)
> +    }
> +}
> +
> +impl StringError {
> +    fn print_error(
> +        &self,
> +        args: &[&wstr],
> +        parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optarg: Option<&wstr>,
> +        optind: usize,
> +        option_name: &wstr,
> +    ) {
> +        let cmd = args[0];
> +        use StringError::*;
> +        match self {
> +            InvalidArgs(_) | InvalidValue => {
> +                let msg = match *self {
> +                    InvalidArgs(s) => s.to_owned(),
> +                    _ => sprintf!("Invalid %s value", option_name),
> +                };
> +                let error_msg = wgettext_fmt!("%ls: %ls '%ls'\n", cmd, msg, optarg.unwrap());

This translates only `"%ls: %ls '%ls'\n"`, not the dynamic messages.
We should rather put the already translated message inside `InvalidArgs`.
In practice it will probably work fine to store the translated string as `&'static wstr` in `InvalidArgs`, and format it here,
however that doesn't feel safe so I think we should store a WString that is already formatted.

> +
> +                streams.err.append(L!("string "));
> +                streams.err.append(error_msg);
> +            }
> +            NotANumber => {
> +                string_error!(streams, BUILTIN_ERR_NOT_NUMBER, cmd, optarg.unwrap());
> +            }
> +            UnknownOption => {
> +                string_unknown_option(parser, streams, cmd, args[optind - 1]);
> +            }
> +        }
> +    }
> +
> +    fn retval(&self) -> Option<c_int> {
> +        STATUS_INVALID_ARGS
> +    }
> +}
> +
> +#[derive(Default, PartialEq, Clone, Copy)]
> +enum Direction {
> +    #[default]
> +    Left,
> +    Right,
> +}
> +
> +pub(self) fn width_without_escapes(ins: &wstr, start_pos: usize) -> usize {
> +    let mut width = 0i32;
> +    for c in ins[start_pos..].chars() {
> +        let w = fish_wcwidth_visible(c);
> +        // We assume that this string is on its own line,
> +        // in which case a backslash can't bring us below 0.
> +        if w > 0 || width > 0 {
> +            width += w;
> +        }
> +    }
> +    // ANSI escape sequences like \e\[31m contain printable characters. Subtract their width
> +    // because they are not rendered.
> +    let mut pos = start_pos;
> +    while let Some(ec_pos) = ins.slice_from(pos).find_char('\x1B') {
> +        pos += ec_pos;
> +        if let Some(len) = escape_code_length(ins.slice_from(pos)) {
> +            let sub = ins.slice_from(pos).slice_to(len);

I vote for using bracket indexing for the sake of consistency

> diff --git a/fish-rust/src/builtins/string/join.rs b/fish-rust/src/builtins/string/join.rs
> new file mode 100644
> index 00000000000..19dae7f9060
> --- /dev/null
> +++ b/fish-rust/src/builtins/string/join.rs
> @@ -0,0 +1,106 @@
> +use super::{string_error, Arguments, StringError, StringSubCommand};
> +use crate::builtins::shared::{
> +    io_streams_t, BUILTIN_ERR_ARG_COUNT0, STATUS_CMD_ERROR, STATUS_CMD_OK, STATUS_INVALID_ARGS,
> +};
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, L};
> +use crate::wgetopt::{wopt, woption, woption_argument_t::*};
> +use crate::wutil::wgettext_fmt;
> +
> +pub struct Join<'args> {
> +    quiet: bool,
> +    no_empty: bool,
> +    pub is_join0: bool,
> +    sep: &'args wstr,
> +}
> +
> +impl Default for Join<'_> {
> +    fn default() -> Self {
> +        Self {
> +            quiet: false,
> +            no_empty: false,
> +            is_join0: false,
> +            sep: L!("\0"),
> +        }
> +    }
> +}
> +
> +impl<'args> StringSubCommand<'args> for Join<'args> {
> +    const LONG_OPTIONS: &'static [woption<'static>] = &[
> +        wopt(L!("quiet"), no_argument, 'q'),
> +        wopt(L!("no-empty"), no_argument, 'n'),
> +    ];
> +    const SHORT_OPTIONS: &'static wstr = L!(":qn");
> +
> +    fn parse_opt(&mut self, _optarg: Option<&wstr>, c: char) -> Result<(), StringError> {
> +        match c {
> +            'q' => self.quiet = true,
> +            'n' => self.no_empty = true,
> +            _ => return Err(StringError::UnknownOption),
> +        }
> +        return Ok(());
> +    }
> +
> +    fn take_args(
> +        &mut self,
> +        optind: &mut usize,
> +        args: &[&'args wstr],
> +        streams: &mut io_streams_t,
> +    ) -> Option<libc::c_int> {
> +        if self.is_join0 {
> +            return STATUS_CMD_OK;
> +        }
> +
> +        let Some(arg) = args.get(*optind).copied() else {
> +           string_error!(streams, BUILTIN_ERR_ARG_COUNT0, args[0]);
> +           return STATUS_INVALID_ARGS;
> +       };
> +        *optind += 1;
> +        self.sep = arg;
> +
> +        STATUS_CMD_OK
> +    }
> +
> +    fn handle(
> +        &mut self,
> +        _parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&wstr],
> +    ) -> Option<libc::c_int> {
> +        let sep = &self.sep;
> +        let mut nargs = 0usize;
> +        let mut print_newline = true;

`print_newline` is false only in a really special case (when a final newline is missing
from stdin). Let's give it a better name (everywhere) to communicate that it only applies
to the last line. maybe `print_trailing_newline` or `terminate_last_line`.

> diff --git a/fish-rust/src/builtins/string/pad.rs b/fish-rust/src/builtins/string/pad.rs
> new file mode 100644
> index 00000000000..4f1ca2fffa3
> --- /dev/null
> +++ b/fish-rust/src/builtins/string/pad.rs
> @@ -0,0 +1,104 @@
> +use std::borrow::Cow;
> +
> +use super::{width_without_escapes, Arguments, Direction, StringError, StringSubCommand};
> +use crate::builtins::shared::{io_streams_t, STATUS_CMD_OK};
> +use crate::fallback::fish_wcwidth;
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, WString, L};
> +use crate::wgetopt::{wopt, woption, woption_argument_t::*};
> +use crate::wutil::fish_wcstol;
> +
> +pub struct Pad {
> +    char_to_pad: char,
> +    pad_char_width: usize,
> +    pad_from: Direction,
> +    width: usize,
> +}
> +
> +impl Default for Pad {
> +    fn default() -> Self {
> +        Self {
> +            char_to_pad: ' ',
> +            pad_char_width: 1,
> +            pad_from: Direction::Left,
> +            width: 0,
> +        }
> +    }
> +}
> +
> +impl StringSubCommand<'_> for Pad {
> +    const LONG_OPTIONS: &'static [woption<'static>] = &[
> +        wopt(L!("quiet"), no_argument, 'q'),
> +        wopt(L!("right"), no_argument, 'r'),
> +        wopt(L!("width"), required_argument, 'w'),
> +    ];
> +    const SHORT_OPTIONS: &'static wstr = L!(":c:qrw:");
> +
> +    fn parse_opt(&mut self, optarg: Option<&wstr>, c: char) -> Result<(), StringError> {
> +        match c {
> +            'c' => {
> +                let [pad_char] = optarg.unwrap().as_char_slice() else {
> +                    return Err(L!("Padding should be a character").into());

This error message is missing both translation support, see my other comment about wgettext_fmt. Maybe we should keep using string_error here.

> +                };
> +                let pad_char_width = fish_wcwidth(*pad_char);
> +                if pad_char_width <= 0 {
> +                    return Err(L!("Invalid padding character of width zero").into());
> +                }
> +                self.pad_char_width = pad_char_width as usize;
> +                self.char_to_pad = *pad_char;
> +            }
> +            'r' => self.pad_from = Direction::Right,
> +            'w' => self.width = fish_wcstol(optarg.unwrap())?.try_into()?,
> +            _ => return Err(StringError::UnknownOption),
> +        }
> +        return Ok(());
> +    }
> +
> +    fn handle<'args>(
> +        &mut self,
> +        _parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&'args wstr],
> +    ) -> Option<libc::c_int> {
> +        let mut max_width = 0usize;
> +        let mut inputs: Vec<(Cow<'args, wstr>, usize)> = Vec::new();
> +        let mut print_newline = true;
> +
> +        for (arg, went_newline) in Arguments::new(args, optind, streams, true) {

typo:

```suggestion
       for (arg, want_newline) in Arguments::new(args, optind, streams, true) {
```


> diff --git a/fish-rust/src/builtins/string/repeat.rs b/fish-rust/src/builtins/string/repeat.rs
> new file mode 100644
> index 00000000000..b1278b66f6c
> --- /dev/null
> +++ b/fish-rust/src/builtins/string/repeat.rs
> @@ -0,0 +1,142 @@
> +use super::{Arguments, StringError, StringSubCommand};
> +use crate::builtins::shared::{io_streams_t, STATUS_CMD_ERROR, STATUS_CMD_OK};
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, WString, L};
> +use crate::wchar_ext::WExt;
> +use crate::wgetopt::{wopt, woption, woption_argument_t::*};
> +use crate::wutil::fish_wcstol;
> +
> +#[derive(Default)]
> +pub struct Repeat {
> +    count: usize,
> +    max: usize,
> +    quiet: bool,
> +    no_newline: bool,
> +}
> +
> +impl StringSubCommand<'_> for Repeat {
> +    const LONG_OPTIONS: &'static [woption<'static>] = &[
> +        wopt(L!("count"), required_argument, 'n'),
> +        wopt(L!("max"), required_argument, 'm'),
> +        wopt(L!("quiet"), no_argument, 'q'),
> +        wopt(L!("no-newline"), no_argument, 'N'),
> +    ];
> +    const SHORT_OPTIONS: &'static wstr = L!(":n:m:qN");
> +
> +    fn parse_opt(&mut self, optarg: Option<&wstr>, c: char) -> Result<(), StringError> {
> +        match c {
> +            'n' => self.count = fish_wcstol(optarg.unwrap())?.try_into()?,
> +            'm' => self.max = fish_wcstol(optarg.unwrap())?.try_into()?,
> +            'q' => self.quiet = true,
> +            'N' => self.no_newline = true,
> +            _ => return Err(StringError::UnknownOption),
> +        }
> +        return Ok(());
> +    }
> +
> +    fn handle(
> +        &mut self,
> +        _parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&wstr],
> +    ) -> Option<libc::c_int> {
> +        if self.max == 0 && self.count == 0 {
> +            // XXX: This used to be allowed, but returned 1.
> +            // Keep it that way for now instead of adding an error.
> +            // streams.err.append(L"Count or max must be greater than zero");
> +            return STATUS_CMD_ERROR;
> +        }
> +
> +        let mut all_empty = true;
> +        let mut first = true;
> +        let mut print_newline = true;
> +
> +        for (w, want_newline) in Arguments::new(args, optind, streams, true) {
> +            print_newline = want_newline;
> +            if w.is_empty() {
> +                continue;
> +            }
> +
> +            all_empty = false;
> +
> +            if self.quiet {
> +                // Early out if we can - see #7495.
> +                return STATUS_CMD_OK;
> +            }
> +
> +            if !first {
> +                streams.out.append1('\n');
> +            }
> +            first = false;
> +
> +            // The maximum size of the string is either the "max" characters,
> +            // or it's the "count" repetitions, whichever ends up lower.
> +            let max = if self.max == 0 || (self.count > 0 && w.len() * self.count < self.max) {
> +                w.len() * self.count
> +            } else {
> +                self.max
> +            };
> +
> +            // Reserve a string to avoid writing constantly.
> +            // The 1500 here is a total gluteal extraction, but 500 seems to perform slightly worse.
> +            let chunk_size = 1500;
> +            // The + word length is so we don't have to hit the chunk size exactly,
> +            // which would require us to restart in the middle of the string.
> +            // E.g. imagine repeating "12345678". The first chunk is hit after a last "1234",
> +            // so we would then have to restart by appending "5678", which requires a substring.
> +            // So let's not bother.
> +            //
> +            // Unless of course we don't even print the entire word, in which case we just need max.
> +            let mut chunk = WString::with_capacity(max.min(chunk_size + w.len()));
> +
> +            let mut i = max;
> +            loop {
> +                if i >= w.len() {
> +                    chunk.push_utfstr(&w);
> +                } else {
> +                    chunk.push_utfstr(w.slice_to(i));
> +                    break;
> +                }
> +
> +                if chunk.len() >= chunk_size {
> +                    // We hit the chunk size, write it repeatedly until we can't anymore.
> +                    streams.out.append(&chunk);
> +                    while i >= chunk.len() {
> +                        streams.out.append(&chunk);
> +                        // We can easily be asked to write *a lot* of data,
> +                        // so we need to check every so often if the pipe has been closed.
> +                        // If we didn't, running `string repeat -n LARGENUMBER foo | pv`
> +                        // and pressing ctrl-c seems to hang.
> +                        if streams.out.flush_and_check_error() != STATUS_CMD_OK.unwrap() {
> +                            return STATUS_CMD_ERROR;
> +                        }
> +                        i -= chunk.len();
> +                    }
> +                    chunk.clear();
> +                }
> +
> +                let Some(new_i) = i.checked_sub(w.len()) else {
> +                    break
> +                };
> +                i = new_i;

I don't think there's the possibility of overflow here because otherwise the C++ code  would have a possible infinite loop, no?
Can we keep the loop with `while i > 0`?

> diff --git a/fish-rust/src/builtins/string/shorten.rs b/fish-rust/src/builtins/string/shorten.rs
> new file mode 100644
> index 00000000000..17611c1df37
> --- /dev/null
> +++ b/fish-rust/src/builtins/string/shorten.rs
> @@ -0,0 +1,244 @@
> +use super::{
> +    escape_code_length, width_without_escapes, Arguments, Direction, StringError, StringSubCommand,
> +};
> +use crate::builtins::shared::{io_streams_t, STATUS_CMD_ERROR, STATUS_CMD_OK};
> +use crate::common::get_ellipsis_str;
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, WString, L};
> +use crate::wchar_ext::WExt;
> +use crate::wcstringutil::fish_wcwidth_visible;
> +use crate::wcstringutil::split_string;
> +use crate::wgetopt::{wopt, woption, woption_argument_t::*};
> +use crate::wutil::fish_wcstol;
> +
> +pub struct Shorten<'args> {
> +    ellipsis: &'args wstr,
> +    ellipsis_width: usize,
> +    max: Option<usize>,
> +    no_newline: bool,
> +    quiet: bool,
> +    shorten_from: Direction,
> +}
> +
> +impl Default for Shorten<'_> {
> +    fn default() -> Self {
> +        Self {
> +            ellipsis: get_ellipsis_str(),
> +            ellipsis_width: width_without_escapes(get_ellipsis_str(), 0),
> +            max: None,
> +            no_newline: false,
> +            quiet: false,
> +            shorten_from: Direction::Right,
> +        }
> +    }
> +}
> +
> +impl<'args> StringSubCommand<'args> for Shorten<'args> {
> +    const LONG_OPTIONS: &'static [woption<'static>] = &[
> +        // FIXME: documentation says it's --char
> +        wopt(L!("chars"), required_argument, 'c'),

right. Maybe it's better to call it `--ellipsis` but that can be fixed later

> +        wopt(L!("max"), required_argument, 'm'),
> +        wopt(L!("no-newline"), no_argument, 'N'),
> +        wopt(L!("left"), no_argument, 'l'),
> +        wopt(L!("quiet"), no_argument, 'q'),
> +    ];
> +    const SHORT_OPTIONS: &'static wstr = L!(":c:m:Nlq");
> +
> +    fn parse_opt(&mut self, optarg: Option<&'args wstr>, c: char) -> Result<(), StringError> {
> +        match c {
> +            'c' => {
> +                self.ellipsis = optarg.unwrap();
> +                self.ellipsis_width = width_without_escapes(self.ellipsis, 0);
> +            }
> +            'm' => self.max = Some(fish_wcstol(optarg.unwrap())?.try_into()?),
> +            'N' => self.no_newline = true,
> +            'l' => self.shorten_from = Direction::Left,
> +            'q' => self.quiet = true,
> +            _ => return Err(StringError::UnknownOption),
> +        }
> +        return Ok(());
> +    }
> +
> +    fn handle(
> +        &mut self,
> +        _parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&wstr],
> +    ) -> Option<libc::c_int> {
> +        let mut min_width = usize::MAX;
> +        let mut inputs = Vec::new();
> +
> +        let iter = Arguments::new(args, optind, streams, true);
> +
> +        if Some(0) == self.max {

not a fan of yoda condition

> +            // Special case: Max of 0 means no shortening.
> +            // This makes this more reusable, so you don't need special-cases like
> +            //
> +            // if test $shorten -gt 0
> +            //     string shorten -m $shorten whatever
> +            // else
> +            //     echo whatever
> +            // end
> +            for (arg, _) in iter {
> +                streams.out.append(arg);
> +                streams.out.append1('\n');
> +            }
> +            return STATUS_CMD_OK;
> +        }
> +
> +        for (arg, _) in iter {
> +            // Visible width only makes sense line-wise.
> +            // So either we have no-newlines (which means we shorten on the first newline),
> +            // or we handle the lines separately.
> +            let mut splits = split_string(&arg, '\n').into_iter();
> +            if self.no_newline && splits.len() > 1 {
> +                let mut s = match self.shorten_from {
> +                    Direction::Right => splits.next(),
> +                    Direction::Left => splits.last(),
> +                }
> +                .unwrap();
> +                s.push_utfstr(self.ellipsis);
> +                let width = width_without_escapes(&s, 0);
> +
> +                if width > 0 && width < min_width {
> +                    min_width = width;
> +                }
> +                inputs.push(s);
> +            } else {
> +                for s in splits {
> +                    let width = width_without_escapes(&s, 0);
> +                    if width > 0 && width < min_width {
> +                        min_width = width;
> +                    }
> +                    inputs.push(s);
> +                }
> +            }
> +        }
> +
> +        let ourmax: usize = self.max.unwrap_or(min_width);

nit regarding the earlier `// TODO: Can we have negative width` comment.
If easily possible, I'd have preferred putting the behavior fix commit first, so that such questions never arise.

> diff --git a/fish-rust/src/builtins/string/trim.rs b/fish-rust/src/builtins/string/trim.rs
> new file mode 100644
> index 00000000000..ed36dbbedd7
> --- /dev/null
> +++ b/fish-rust/src/builtins/string/trim.rs
> @@ -0,0 +1,99 @@
> +use super::{Arguments, StringError, StringSubCommand};
> +use crate::builtins::shared::{io_streams_t, STATUS_CMD_ERROR, STATUS_CMD_OK};
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, L};
> +use crate::wchar_ext::WExt;
> +use crate::wgetopt::{wopt, woption, woption_argument_t::*};
> +
> +pub struct Trim<'args> {
> +    chars_to_trim: &'args wstr,
> +    left: bool,
> +    right: bool,
> +    quiet: bool,
> +}
> +
> +impl Default for Trim<'_> {
> +    fn default() -> Self {
> +        Self {
> +            // from " \f\n\r\t\v"
> +            chars_to_trim: L!(" \x0C\n\r\x09\x0B"),
> +            left: false,
> +            right: false,
> +            quiet: false,
> +        }
> +    }
> +}
> +
> +impl<'args> StringSubCommand<'args> for Trim<'args> {
> +    const LONG_OPTIONS: &'static [woption<'static>] = &[
> +        wopt(L!("chars"), required_argument, 'c'),
> +        wopt(L!("left"), no_argument, 'l'),
> +        wopt(L!("right"), no_argument, 'r'),
> +        wopt(L!("quiet"), no_argument, 'q'),
> +    ];
> +    const SHORT_OPTIONS: &'static wstr = L!(":c:lrq");
> +
> +    fn parse_opt(&mut self, optarg: Option<&'args wstr>, c: char) -> Result<(), StringError> {
> +        match c {
> +            'c' => self.chars_to_trim = optarg.unwrap(),
> +            'l' => self.left = true,
> +            'r' => self.right = true,
> +            'q' => self.quiet = true,
> +            _ => return Err(StringError::UnknownOption),
> +        }
> +        return Ok(());
> +    }
> +
> +    fn handle(
> +        &mut self,
> +        _parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&wstr],
> +    ) -> Option<libc::c_int> {
> +        // If neither left or right is specified, we do both.
> +        if !self.left && !self.right {
> +            self.left = true;
> +            self.right = true;
> +        }
> +
> +        let mut ntrim = 0;
> +
> +        let to_trim_end = |str: &wstr| -> usize {
> +            str.chars()
> +                .rev()
> +                .take_while(|&c| self.chars_to_trim.contains(c))
> +                .count()
> +        };
> +
> +        let to_trim_start = |str: &wstr| -> usize {
> +            str.chars()
> +                .take_while(|&c| self.chars_to_trim.contains(c))
> +                .count()
> +        };

yeah I guess we don't have `trim_end_matches()` yet

> +
> +        for (arg, want_newline) in Arguments::new(args, optind, streams, true) {
> +            let trim_start = self.left.then(|| to_trim_start(&arg)).unwrap_or(0);
> +            // collision is only an issue if the whole string is getting trimmed
> +            let trim_end = (self.right && trim_start != arg.len())
> +                .then(|| to_trim_end(&arg))
> +                .unwrap_or(0);
> +
> +            ntrim += trim_start + trim_end;
> +            if !self.quiet {
> +                streams.out.append(&arg[trim_start..arg.len() - trim_end]);
> +                if want_newline {
> +                    streams.out.append1('\n');
> +                }
> +            } else if ntrim > 0 {
> +                return STATUS_CMD_OK;
> +            }
> +        }
> +
> +        if ntrim > 0 {
> +            STATUS_CMD_OK
> +        } else {
> +            STATUS_CMD_ERROR
> +        }
> +    }
> +}
> diff --git a/fish-rust/src/builtins/string/unescape.rs b/fish-rust/src/builtins/string/unescape.rs
> new file mode 100644
> index 00000000000..7f982848582
> --- /dev/null
> +++ b/fish-rust/src/builtins/string/unescape.rs
> @@ -0,0 +1,54 @@
> +use super::{Arguments, StringError, StringSubCommand};
> +use crate::builtins::shared::{io_streams_t, STATUS_CMD_ERROR, STATUS_CMD_OK};
> +use crate::common::{unescape_string, UnescapeStringStyle};
> +use crate::ffi::parser_t;
> +use crate::wchar::{wstr, L};
> +use crate::wgetopt::{wopt, woption, woption_argument_t::*};
> +
> +#[derive(Default)]
> +pub struct Unescape {
> +    no_quoted: bool,
> +    style: UnescapeStringStyle,
> +}
> +
> +impl StringSubCommand<'_> for Unescape {
> +    const LONG_OPTIONS: &'static [woption<'static>] = &[
> +        wopt(L!("no-quoted"), no_argument, 'q'),
> +        wopt(L!("style"), required_argument, '\u{1}'),
> +    ];
> +    const SHORT_OPTIONS: &'static wstr = L!(":q");
> +
> +    fn parse_opt(&mut self, optarg: Option<&wstr>, c: char) -> Result<(), StringError> {
> +        match c {
> +            'q' => self.no_quoted = true,
> +            '\u{1}' => self.style = optarg.unwrap().try_into()?,
> +            _ => return Err(StringError::UnknownOption),
> +        }
> +        return Ok(());
> +    }
> +
> +    fn handle(
> +        &mut self,
> +        _parser: &mut parser_t,
> +        streams: &mut io_streams_t,
> +        optind: &mut usize,
> +        args: &[&wstr],
> +    ) -> Option<libc::c_int> {
> +        let mut nesc = 0;
> +        for (arg, want_newline) in Arguments::new(args, optind, streams, true) {
> +            if let Some(res) = unescape_string(&arg, self.style) {
> +                streams.out.append(res);
> +                if want_newline {
> +                    streams.out.append1('\n');
> +                }
> +                nesc += 1;
> +            }
> +        }
> +
> +        if nesc > 0 {
> +            STATUS_CMD_OK
> +        } else {
> +            STATUS_CMD_ERROR
> +        }
> +    }
> +}
> diff --git a/fish-rust/src/builtins/tests/mod.rs b/fish-rust/src/builtins/tests/mod.rs
> index d718bc4f77c..b
4D1C
25db0384a0 100644
> --- a/fish-rust/src/builtins/tests/mod.rs
> +++ b/fish-rust/src/builtins/tests/mod.rs
> @@ -1 +1,2 @@
> +mod string_tests;
>  mod test_tests;
> diff --git a/fish-rust/src/builtins/tests/string_tests.rs b/fish-rust/src/builtins/tests/string_tests.rs
> new file mode 100644
> index 00000000000..5927246536c
> --- /dev/null
> +++ b/fish-rust/src/builtins/tests/string_tests.rs
> @@ -0,0 +1,305 @@
> +use crate::ffi_tests::add_test;
> +
> +add_test! {"test_string", || {
> +    use crate::ffi::parser_t;
> +    use crate::ffi;
> +    use crate::builtins::string::string;
> +    use crate::wchar_ffi::WCharFromFFI;
> +    use crate::common::{EscapeStringStyle, escape_string};
> +    use crate::wchar::wstr;
> +    use crate::wchar::L;
> +    use crate::builtins::shared::{STATUS_CMD_ERROR,STATUS_CMD_OK, STATUS_INVALID_ARGS};
> +
> +    use crate::future_feature_flags::{scoped_test, FeatureFlag};
> +
> +    // avoid 1.3k L!()'s
> +    macro_rules! test_cases {
> +        ([$($x:expr),*], $rc:expr, $out:expr) => { (vec![$(L!($x)),*], $rc, L!($out)) };
> +        [$($x:tt),* $(,)?] => { [$(test_cases!$x),*] };
> +    }
> +
> +    // TODO: these should be individual tests, not all in one, port when we can run these with `cargo test`
> +    fn string_test(mut args: Vec<&wstr>, expected_rc: Option<i32>, expected_out: &wstr) {
> +        let parser: &mut parser_t = unsafe { &mut *parser_t::principal_parser_ffi() };
> +        let mut streams = ffi::make_test_io_streams_ffi();
> +        let mut io = crate::builtins::shared::io_streams_t::new(streams.pin_mut());
> +
> +        let rc = string(parser, &mut io, args.as_mut_slice()).expect("string failed");
> +
> +        assert_eq!(expected_rc.unwrap(), rc, "string builtin returned unexpected return code");
> +
> +        let string_stream_contents = &ffi::get_test_output_ffi(&streams);
> +        let actual = escape_string(&string_stream_contents.from_ffi(), EscapeStringStyle::default());
> +        let expected = escape_string(expected_out, EscapeStringStyle::default());
> +        assert_eq!(expected, actual, "string builtin returned unexpected output");
> +    }
> +
> +    let tests = test_cases![
> +        (["string", "escape"], STATUS_CMD_ERROR, ""),
> +        (["string", "escape", ""], STATUS_CMD_OK, "''\n"),
> +        (["string", "escape", "-n", ""], STATUS_CMD_OK, "\n"),
> +        (["string", "escape", "a"], STATUS_CMD_OK, "a\n"),
> +        (["string", "escape", "\x07"], STATUS_CMD_OK, "\\cg\n"),
> +        (["string", "escape", "\"x\""], STATUS_CMD_OK, "'\"x\"'\n"),
> +        (["string", "escape", "hello world"], STATUS_CMD_OK, "'hello world'\n"),
> +        (["string", "escape", "-n", "hello world"], STATUS_CMD_OK, "hello\\ world\n"),
> +        (["string", "escape", "hello", "world"], STATUS_CMD_OK, "hello\nworld\n"),
> +        (["string", "escape", "-n", "~"], STATUS_CMD_OK, "\\~\n"),
> +
> +        (["string", "join"], STATUS_INVALID_ARGS, ""),
> +        (["string", "join", ""], STATUS_CMD_ERROR, ""),
> +        (["string", "join", "", "", "", ""], STATUS_CMD_OK, "\n"),
> +        (["string", "join", "", "a", "b", "c"], STATUS_CMD_OK, "abc\n"),
> +        (["string", "join", ".", "fishshell", "com"], STATUS_CMD_OK, "fishshell.com\n"),
> +        (["string", "join", "/", "usr"], STATUS_CMD_ERROR, "usr\n"),
> +        (["string", "join", "/", "usr", "local", "bin"], STATUS_CMD_OK, "usr/local/bin\n"),
> +        (["string", "join", "...", "3", "2", "1"], STATUS_CMD_OK, "3...2...1\n"),
> +        (["string", "join", "-q"], STATUS_INVALID_ARGS, ""),
> +        (["string", "join", "-q", "."], STATUS_CMD_ERROR, ""),
> +        (["string", "join", "-q", ".", "."], STATUS_CMD_ERROR, ""),
> +
> +        (["string", "length"], STATUS_CMD_ERROR, ""),
> +        (["string", "length", ""], STATUS_CMD_ERROR, "0\n"),
> +        (["string", "length", "", "", ""], STATUS_CMD_ERROR, "0\n0\n0\n"),
> +        (["string", "length", "a"], STATUS_CMD_OK, "1\n"),
> +
> +    // #if WCHAR_T_BITS > 16
> +        (["string", "length", "\u{2008A}"], STATUS_CMD_OK, "1\n"),
> +    // #endif

This #if we can drop because we have stronger guarantees now.
In Rust, we only deal with platform-specific wchar_t when converting to and from OS strings using mbrtowc which is not what we test here.

fish-rust/src/parse_util.rs

src/builtins/string.cpp

henrikhorluck

The prr-comments got kinda mangled

fish-rust/src/builtins/string.rs

henrikhorluck · 2023-07-15T19:09:30Z

fish-rust/src/builtins/string.rs

+    while let Some(ec_pos) = ins.slice_from(pos).find_char('\x1B') {
+        pos += ec_pos;
+        if let Some(len) = escape_code_length(ins.slice_from(pos)) {
+            let sub = ins.slice_from(pos).slice_to(len);


I vote for using bracket indexing for the sake of consistency

I agree, but then we should probably just remove slice_{to,from} from WExt

Suggested change

let sub = ins.slice_from(pos).slice_to(len);

let sub = &ins[pos..pos + len];

I agree, but then we should probably just remove slice_{to,from} from WExt

yeah. I'm not sure what others think

The logical distinction of slice_to and slice_from is that they operate on char counts, instead of byte counts.

In a hypothetical future where we try to switch to UTF-8, that distinction will become important. I worry that the bracket indexing will introduce bugs at that point. But I'm open to other ideas for how to handle that.

If that's the case we should warn for all usage of wstr-slicing and indexing with a clippy lint. I think the point here is that we want to write code that's portable across string representations, and that's not necessarily that simple. I am very much in support of full unicode-support with graphemes and UTF8, so we should prepare for it.

But: this should have an API closer to slicing, e.g. like

fn slice_chars<I>(&self, index: I) -> Self where I: RangeBounds<usize> + SliceIndex<[char], Output = [char]>,

so we can still use slice-like operations

Interesting; a clippy lint banning str slicing seems easy and robust enough

fish-rust/src/builtins/string/join.rs

fish-rust/src/builtins/string/pad.rs

fish-rust/src/builtins/string/shorten.rs

henrikhorluck · 2023-07-15T19:30:02Z

fish-rust/src/builtins/string/shorten.rs

+            }
+        }
+
+        let ourmax: usize = self.max.unwrap_or(min_width);


nit regarding the earlier // TODO: Can we have negative width comment.
If easily possible, I'd have preferred putting the behavior fix commit first, so that such questions never arise.

One issue here is that the earlier Rust code before the fix panicked with a capacity overflow, and backporting to C++ so you have to rebase with a C++ bugfix is weird, so you kinda need to squash the fix with the port, which is not ideal. I don't like this situation. In this case it was easier to fix the bug than port the bug

and backporting to C++ so you have to rebase with a C++ bugfix is weird, so you kinda need to squash the fix with the port, which is not ideal. I don't like this situation. In this case it was easier to fix the bug than port the bug

one recipe is to rebase, breaking before the rewrite to add a commit that fixes the bug in C++. Then revert that commit and continue the rebase. There are no conflicts because of the revert. Of course the bug fix needs to be applied again on the Rust side (before or after the rebase), and squashed (along with the revert; both need to be squashed into the rewrite commit to minimize noise)

I think the conclusion here is to squash this all into one commit, I don't think this needs to be backported for a 3.6.2 release, as fixing the C++ code is not as trivial as #9900, and less important. If we want to backport this please inform me and make a PR towards master so I can rebase on that when I rebase for #9900

any order is fine here (hence nit) but in general it's better to put "smaller" commits first because the larger ones are more likely to be changed further.
Squashing the behavior change into the port is not something we should do (but there is no need to right)

Just to make it clear: I do not want to write a C++ version of the behavior fix. The initial porting-commit still has bugged behavior (but different, as in crashing!).

My understanding is that we do not want to backport the fix, which will leave the "broken" commit in place if we do not squash it. But as you are telling me, that is what we want? If so please mark this as resolved.

I'd have preferred putting the behavior fix commit first, so that such questions never arise.

I agree that we should ideally have the fix separately, but without writing a C++ version of the fix there is very little I can do to improve the commit ordering.

fish-rust/src/builtins/string/trim.rs

fish-rust/src/builtins/tests/string_tests.rs

krobelus · 2023-07-15T20:01:29Z

The prr-comments got kinda mangled

fixed

ridiculousfish · 2023-07-22T03:46:10Z

FYI I plan on finishing reviewing this weekend.

ridiculousfish

So once again this is an incredible amount of work and it looks overall fantastic. Thank you!!

Regarding splitting subcommands into separate modules: either seems fine to me, and in general I defer to the person doing the actual work. While this diverges from the C++ I understand why you chose to do it (string is pretty weird) and I'm very pleased with the new structure.

Regarding bracket-slicing [start..] vs slice_from(start): I'm not going to be a stickler about the difference for now. If/when we later move to UTF-8 we will need to audit all the slices; the fact that this can be done with a custom clippy lint rule is really good info (I didn't know you could do that!) so no reason to sweat it now. Same story for len() / char_count().

I spotted some small bugs, see my comments.

The biggest missing piece here is localization - a lot of the C++ error messages were wrapped in _(...) to plug them into wgettext and much of that has been lost. You can fix that now, or defer it and I or others can go back and add them after merge.

I don't expect to have any further comments after the small bugs are fixed. Once again I think is awesome.

fish-rust/src/builtins/string.rs

ridiculousfish · 2023-07-22T19:17:49Z

fish-rust/src/builtins/string/length.rs

+                    }
+                }
+            } else {
+                let n = arg.len();


This is an example of a place we'd have to audit and fix should we switch to UTF-8 strings, since string length gets the number of chars, not number of bytes. You can switch to char_count() now, or defer it, since we really will do that audit.

fish-rust/src/common.rs

fish-rust/src/builtins/string/pad.rs

fish-rust/src/builtins/string/split.rs

ridiculousfish · 2023-07-23T21:30:06Z

fish-rust/src/builtins/string/unescape.rs

+
+impl StringSubCommand<'_> for Unescape {
+    const LONG_OPTIONS: &'static [woption<'static>] = &[
+        wopt(L!("no-quoted"), no_argument, 'q'),


no-quoted has short option 'n'. Distressing that this wasn't caught by the tests.

I think this is rather a fault in that no-quoted does not mean anything for unescape — it did nothing and does nothing, the NO_QUOTED flag is not defined for unescapeflags

fish-shell/fish-rust/src/common.rs

Line 134 in ade6650

pub struct UnescapeFlags: u32 {

. It should probably rather be removed, but that is a potentially breaking change.

It is also not documented that it should support no-quoted https://fishshell.com/docs/current/cmds/string-unescape.html#synopsis

I fixed this so it is the correct n, but left the option still there

- Add test to verify piped string replace exit code Ensure fields parsing error messages are the same. Note: C++ relied upon the value of the parsed value even when `errno` was set, that is defined behaviour we should not rely on, and cannot easilt be replicated from Rust. Therefore the Rust version will change the following error behaviour from: ```shell > string split --fields=a "" abc string split: Invalid fields value 'a' > string split --fields=1a "" abc string split: 1a: invalid integer ``` To: ```shell > string split --fields=a "" abc string split: a: invalid integer > string split --fields=1a "" abc string split: 1a: invalid integer ```

Padding with an unprintable character is now disallowed, like it was for other zero-length characters. `string shorten` now ignores escape sequences and non-printable characters when calculating the visible width of the ellipsis used (except for `\b`, which is treated as a width of -1). Previously `fish_wcswidth` returned a length of -1 when the ellipsis-str contained any non-printable character, causing the command to poentially print a larger width than expected. This also fixes an integer overflows in `string shorten`'s `max` and `max2`, when the cumulative sum of character widths turned negative (e.g. with any non-printable characters, or `\b` after the changes above). The overflow potentially caused strings containing non-printable characters to be truncated. This adds test that verify the fixed behaviour.

henrikhorluck

Now rebased and squashed. Also reduced the number of import statements. There are some minor nits still open:

whether we want to backport the non-printable-fixes.
For the UTF8-compat-changes I would much rather do that separately.
I still think the string-translation-stuff is messy, but I can't do anything about it until we have an xgettext-like thing working for Rust code

There are a lot of TODO's that are better suited for later PR's, this should be mergeable as is

henrikhorluck · 2023-07-24T07:38:44Z

fish-rust/src/builtins/string/split.rs

+                self.fields = arg.unwrap().try_into().map_err(|e| match e {
+                    FieldParseError::Number => StringError::NotANumber,
+                    FieldParseError::Range => {
+                        invalid_args!("%ls: Invalid range value for field '%ls'\n", name, arg)


These should now all support translation — how we extract the translation strings I leave as a future work as I don't know whether that will happen before/after macro-expansion

henrikhorluck · 2023-07-24T07:39:54Z

fish-rust/src/builtins/string/match.rs

+            streams.err.append(wgettext_fmt!(
+                BUILTIN_ERR_COMBO2,
+                cmd,
+                wgettext!("--invert and --groups-only are mutually exclusive")


These also now have translation support added back

- The dermination is from commit 7988cff - See PR fish-shell#9139

ridiculousfish · 2023-07-28T05:00:35Z

Merged. What can I say, except thank you.

henrikhorluck force-pushed the riir/builtins/string branch from 0635cf8 to be98fb2 Compare June 20, 2023 12:15

henrikhorluck mentioned this pull request Jun 21, 2023

Add (very basic) support for replace fish-shell/rust-pcre2#1

Closed

Xiretza reviewed Jun 21, 2023

View reviewed changes

fish-rust/src/builtins/string.rs Outdated Show resolved Hide resolved

This comment was marked as outdated.

Sign in to view

henrikhorluck force-pushed the riir/builtins/string branch from 78cd564 to cfdd217 Compare June 22, 2023 18:10

faho mentioned this pull request Jun 22, 2023

Make NULs work for builtins #9859

Merged

3 tasks

henrikhorluck force-pushed the riir/builtins/string branch 2 times, most recently from 3eb99c2 to ca5a147 Compare June 23, 2023 16:40

henrikhorluck marked this pull request as ready for review June 23, 2023 23:42

henrikhorluck commented Jun 24, 2023

View reviewed changes

fish-rust/src/builtins/tests/string_tests.rs Outdated Show resolved Hide resolved

zanchey added the rust label Jun 24, 2023

zanchey added this to the fish 3.7.0 milestone Jun 24, 2023

henrikhorluck force-pushed the riir/builtins/string branch 3 times, most recently from ca16bd5 to 5aa0b21 Compare June 30, 2023 01:38

henrikhorluck mentioned this pull request Jul 1, 2023

Fully migrate string escaping #9869

Closed

henrikhorluck force-pushed the riir/builtins/string branch 3 times, most recently from ad8eb3f to 5a04a9b Compare July 4, 2023 22:41

henrikhorluck commented Jul 8, 2023

View reviewed changes

fish-rust/src/builtins/tests/string_tests.rs Outdated Show resolved Hide resolved

henrikhorluck force-pushed the riir/builtins/string branch from d0afdac to ed62f66 Compare July 9, 2023 01:47

henrikhorluck mentioned this pull request Jul 9, 2023

Clean up Feature Flags API, add support for scoped feature tests #9881

Merged

henrikhorluck commented Jul 9, 2023

View reviewed changes

fish-rust/src/builtins/string/split.rs Show resolved Hide resolved

henrikhorluck commented Jul 10, 2023

View reviewed changes

src/builtins/string.cpp Show resolved Hide resolved

henrikhorluck force-pushed the riir/builtins/string branch from f1dfa8c to 8d925f4 Compare July 11, 2023 21:39

henrikhorluck commented Jul 12, 2023

View reviewed changes

fish-rust/src/builtins/string/match.rs Show resolved Hide resolved

henrikhorluck force-pushed the riir/builtins/string branch from aec45b4 to 8d2fa70 Compare July 13, 2023 09:52

henrikhorluck force-pushed the riir/builtins/string branch from 8d2fa70 to 81e36a5 Compare July 13, 2023 09:56

zanchey requested a review from ridiculousfish July 13, 2023 10:10

krobelus reviewed Jul 15, 2023

View reviewed changes

fish-rust/src/parse_util.rs Show resolved Hide resolved

krobelus reviewed Jul 15, 2023

View reviewed changes

src/builtins/string.cpp Show resolved Hide resolved

henrikhorluck commented Jul 15, 2023

View reviewed changes

henrikhorluck force-pushed the riir/builtins/string branch 2 times, most recently from 6d37e91 to b9b840f Compare July 15, 2023 20:14

henrikhorluck mentioned this pull request Jul 17, 2023

string repeat has an infinite loop #9899

Closed

faho mentioned this pull request Jul 17, 2023

Fix integer overflow in string repeat #9900

Merged

ridiculousfish requested changes Jul 23, 2023

View reviewed changes

henrikhorluck force-pushed the riir/builtins/string branch 4 times, most recently from fefcb4e to d8b406d Compare July 24, 2023 07:09

henrikhorluck added 2 commits July 24, 2023 09:24

henrikhorluck force-pushed the riir/builtins/string branch from d8b406d to a8d6806 Compare July 24, 2023 07:25

henrikhorluck commented Jul 24, 2023

View reviewed changes

henrikhorluck added 2 commits July 27, 2023 08:09

Add back well-backed comment

8ee7780

- The dermination is from commit 7988cff - See PR fish-shell#9139

Very minor leftover codereview var-renaming

23e7b5b

ridiculousfish merged commit 2ec3633 into fish-shell:master Jul 28, 2023

henrikhorluck deleted the riir/builtins/string branch July 28, 2023 08:13

henrikhorluck mentioned this pull request Aug 14, 2023

Add appendln #9952

Merged

github-actions bot locked as resolved and limited conversation to collaborators Jan 20, 2025

	let sub = ins.slice_from(pos).slice_to(len);
	let sub = &ins[pos..pos + len];

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!