stringmagic/man/string_split.Rd at master · cran/stringmagic


% Generated by roxygen2: do not edit by hand

% Please edit documentation in R/string_tools.R

\name{string_split}

\alias{string_split}

\alias{stsplit}

\title{Splits a character string with respect to a pattern}

\usage{
string_split(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)

stsplit(
  x,
  split,
  simplify = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame()
)
}

\arguments{

\item{x}{A character vector.}

\item{split}{A character scalar: the pattern at which the elements of \code{x} are split. By default
this is a regular expression. Flags can be added to the pattern in the form \verb{flag1, flag2/pattern}.
Available flags are \code{ignore} (case-insensitive), \code{fixed} (no regex), \code{word} (add word boundaries)
and \code{magic} (add interpolation with \code{"{}"}). Example:
with the pattern \code{"ignore/hello"}, a text containing "Hello" is split at "Hello".
Shortcut: use the first letters of the flags. For example, \code{"iw/one"} splits at the word
"one" (flags \code{ignore} + \code{word}).}

\item{simplify}{Logical scalar, default is \code{TRUE}. If \code{TRUE} and the input vector \code{x}
is of length 1, a character vector is returned instead of a list.}

\item{fixed}{Logical scalar, default is \code{FALSE}. If \code{TRUE}, the argument \code{split}
is treated as a fixed string and not as a regular expression.}

\item{ignore.case}{Logical scalar, default is \code{FALSE}. If \code{TRUE}, the pattern is matched case-insensitively.}

\item{word}{Logical scalar, default is \code{FALSE}. If \code{TRUE}, then a) word boundaries are added to the pattern,
and b) several patterns can be chained by separating them with a comma; they are combined with a logical OR.
Example: if \code{word = TRUE}, then \code{split = "The, mountain"} splits the text at the word
'The' or at the word 'mountain'.}

\item{envir}{Environment in which to evaluate the interpolations if the flag \code{"magic"} is provided.

Default is \code{parent.frame()}.}

}

\value{

If \code{simplify = TRUE} (default), the object returned is:
\itemize{
\item a character vector if \code{x}, the input vector, is of length 1: this vector contains
the pieces resulting from the split.
\item otherwise, a list of the same length as \code{x}. The ith element of the list is a character vector
containing the pieces resulting from the split of the ith element of \code{x}.
}

If \code{simplify = FALSE}, the object returned is always a list.

}

\description{

Splits the elements of a character vector with respect to a pattern.

}

\section{Functions}{

\itemize{

\item \code{stsplit()}: Alias to \code{string_split}

}}

\section{Generic regular expression flags}{

All \code{stringmagic} functions support generic flags in regular-expression patterns.

The flags are useful to quickly give extra instructions, similarly to \emph{usual}

\href{https://javascript.info/regexp-introduction}{regular expression flags}.

Here the syntax is "flag1, flag2/pattern": the flags form a comma-separated list of flag names,
separated from the pattern by a slash (\code{/}). Example: \code{string_which(c("hello...", "world"), "fixed/.")} returns \code{1}.
Here the flag "fixed" removes the regular expression meaning of ".", which would otherwise mean \emph{"any character"}.
The no-flag version \code{string_which(c("hello...", "world"), ".")} returns \code{1:2}.

Alternatively, and this is the recommended form, you can concatenate the initials of the flags instead of using a
comma-separated list. For example, "if/dt[" applies the flags "ignore" and "fixed" to the pattern "dt[".

The four flags always available are: "ignore", "fixed", "word" and "magic".

\itemize{

\item "ignore" makes the matching case-insensitive. Technically, it adds the Perl flag "(?i)"
at the beginning of the pattern.

\item "fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "["

(among others) lose their special meaning and are treated for what they are: simple characters.

\item "word" adds word boundaries (\code{"\\\\b"} in regex language) to the pattern. Further, the comma (\code{","})

becomes a word separator. Technically, "word/one, two" is treated as "\\b(one|two)\\b". Example:

\code{string_clean("Am I ambushed?", "wi/am")} leads to " I ambushed?" thanks to the flags "ignore" and "word".

\item "magic" interpolates variables into the pattern before it is interpreted as a regular expression.
For example, if \code{letters = "aiou"}, then \code{string_clean("My great goose!", "magic/[{letters}] => e")}
leads to \code{"My greet geese!"}.

}
}

\examples{

time = "This is the year 2024."

# split the sentence at each space

string_split(time, " ")

# simplify = FALSE leads to a list

string_split(time, " ", simplify = FALSE)
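# with an input vector of length 2, a list of length 2 is returned,
# whatever the value of `simplify`: it only matters when x is of length 1
# (two_texts is an illustrative input)
two_texts = c("a b", "c d e")
split_two = string_split(two_texts, " ")
split_two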

# let's break at "is"

string_split(time, "is")
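# "." is a regex metacharacter (any character): the flag `fixed` (`f/`)
# makes it a literal dot, like the argument `fixed = TRUE`
dotted = "a.b.c"
string_split(dotted, "f/.")
string_split(dotted, ".", fixed = TRUE)
# flags can be combined by concatenating their initials:
# "if/" means ignore + fixed, so "[" is literal and the case is ignored
string_split("x DT[1] y", "if/dt[")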

# now breaking at the word "is"

# NOTE: we use the flag `word` (`w/`)

string_split(time, "w/is")
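# with the flag `word` (`w/`), a comma separates alternative words:
# "w/and, or" splits at the whole word "and" or at the whole word "or"
string_split("tea and milk or sugar", "w/and, or")
# the arguments `word` and `ignore.case` act like the flags `w` and `i`:
# "Is" at the start is matched, but not the "is" inside "this"
string_split("Is this it?", "is", word = TRUE, ignore.case = TRUE)
string_split("Is this it?", "iw/is")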

# same but using a pattern from a variable

# NOTE: we use the `magic` flag

pat = "is"

string_split(time, "mw/{pat}")
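# `envir` sets where the interpolations of the flag `magic` are evaluated:
# here the illustrative variable `target_word` only exists in `my_env`
my_env = new.env()
assign("target_word", "year", envir = my_env)
string_split("This is the year 2024.", "m/{target_word}", envir = my_env)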

}
