% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/string_tools.R
\name{string_split}
\alias{string_split}
\alias{stsplit}
\title{Splits a character string with respect to a pattern}
\usage{
string_split(
x,
split,
simplify = TRUE,
fixed = FALSE,
ignore.case = FALSE,
word = FALSE,
envir = parent.frame()
)
stsplit(
x,
split,
simplify = TRUE,
fixed = FALSE,
ignore.case = FALSE,
word = FALSE,
envir = parent.frame()
)
}
\arguments{
\item{x}{A character vector.}
\item{split}{A character scalar: the pattern at which the elements of \code{x} are split. By default
it is interpreted as a regular expression. You can add flags to the pattern in the form \verb{flag1, flag2/pattern}.
Available flags are \code{ignore} (case), \code{fixed} (no regex), \code{word} (add word boundaries),
and \code{magic} (add interpolation with \code{"{}"}). Example:
with the pattern "ignore/hello", a text containing "Hello" is split at "Hello".
Shortcut: use the first letters of the flags. Example: "iw/one" splits at the word
"one" (flags 'ignore' + 'word').}
\item{simplify}{Logical scalar, default is \code{TRUE}. If \code{TRUE}, then when the vector input \code{x}
is of length 1, a character vector is returned instead of a list.}
\item{fixed}{Logical, default is \code{FALSE}. Whether to consider the argument \code{split}
as fixed (and not as a regular expression).}
\item{ignore.case}{Logical scalar, default is \code{FALSE}. If \code{TRUE}, the search is case insensitive.}
\item{word}{Logical scalar, default is \code{FALSE}. If \code{TRUE}, then a) word boundaries are added to the pattern,
and b) patterns can be chained by separating them with a comma; they are combined with a logical OR.
Example: if \code{word = TRUE}, then the pattern "The, mountain" matches either the word
'The' or the word 'mountain', and the string is split at each match.}
\item{envir}{Environment in which to evaluate the interpolations if the flag \code{"magic"} is provided.
Default is \code{parent.frame()}.}
}
\value{
If \code{simplify = TRUE} (default), the object returned is:
\itemize{
\item a character vector if the input vector \code{x} is of length 1: the character vector contains
the result of the split.
\item otherwise, a list of the same length as \code{x}. The ith element of the list is a character vector
containing the result of the split of the ith element of \code{x}.
}
If \code{simplify = FALSE}, the object returned is always a list.
}
\description{
Splits a character string with respect to a pattern.
}
\section{Functions}{
\itemize{
\item \code{stsplit()}: Alias to \code{string_split}
}}
\section{Generic regular expression flags}{
All \code{stringmagic} functions support generic flags in regular-expression patterns.
The flags are useful to quickly give extra instructions, similarly to \emph{usual}
\href{https://javascript.info/regexp-introduction}{regular expression flags}.
Here the syntax is "flag1, flag2/pattern". That is: flags are a comma-separated list of flag names
separated from the pattern by a slash (\code{/}). Example: \code{string_which(c("hello...", "world"), "fixed/.")} returns \code{1}.
Here the flag "fixed" removes the regular expression meaning of ".", which would otherwise have meant \emph{"any character"}.
The no-flag version \code{string_which(c("hello...", "world"), ".")} returns \code{1:2}.
Alternatively, and this is recommended, you can collate the initials of the flags instead of using a
comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".
The four flags always available are: "ignore", "fixed", "word" and "magic".
\itemize{
\item "ignore" makes the pattern case insensitive. Technically, it adds the perl-flag "(?i)"
at the beginning of the pattern.
\item "fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "["
(among others) lose their special meaning and are treated for what they are: simple characters.
\item "word" adds word boundaries (\code{"\\\\b"} in regex language) to the pattern. Further, the comma (\code{","})
becomes a word separator. Technically, "word/one, two" is treated as "\\b(one|two)\\b". Example:
\code{string_clean("Am I ambushed?", "wi/am")} leads to " I ambushed?" thanks to the flags "ignore" and "word".
\item "magic" interpolates variables inside the pattern before the regex is interpreted.
For example if \code{letters = "aiou"} then \code{string_clean("My great goose!", "magic/[{letters}] => e")}
leads to \code{"My greet geese!"}
}
}
\examples{
time = "This is the year 2024."
# we break the sentence
string_split(time, " ")
# simplify = FALSE leads to a list
string_split(time, " ", simplify = FALSE)
# let's break at "is"
string_split(time, "is")
# now breaking at the word "is"
# NOTE: we use the flag `word` (`w/`)
string_split(time, "w/is")
# same but using a pattern from a variable
# NOTE: we use the `magic` flag
pat = "is"
string_split(time, "mw/{pat}")
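
# The calls below are a sketch derived from the argument descriptions above.
# The input strings are made up for illustration.

# with an input of length greater than 1, a list is returned:
# one character vector per element of `x`
string_split(c("one two", "three four five"), " ")

# the flag `fixed` (`f/`) treats the pattern literally,
# so "." is a plain dot and not "any character"
string_split("a.b.c", "f/.")

# same split, using the argument `fixed` instead of the flag
string_split("a.b.c", ".", fixed = TRUE)

# the flag `ignore` (`i/`) makes the split case insensitive
string_split("one AND two and three", "i/ and ")

# same split, using the argument `ignore.case` instead of the flag
string_split("one AND two and three", " and ", ignore.case = TRUE)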
}