% Generated by roxygen2: do not edit by hand

% Please edit documentation in R/string_tools.R

\name{string_get}

\alias{string_get}

\alias{stget}

\title{Gets elements of a character vector}

\usage{

string_get(

x,

...,

fixed = FALSE,

ignore.case = FALSE,

word = FALSE,

or = FALSE,

seq = FALSE,

seq.unik = FALSE,

pattern = NULL,

envir = parent.frame()

)

stget(

x,

...,

fixed = FALSE,

ignore.case = FALSE,

word = FALSE,

or = FALSE,

seq = FALSE,

seq.unik = FALSE,

pattern = NULL,

envir = parent.frame()

)

}

\arguments{

\item{x}{A character vector.}

\item{...}{Character scalars representing the patterns to be found. By default they are (perl) regular-expressions.

Use ' & ' or ' | ' to chain patterns and combine their result logically (ex: \code{'[[:alpha:]] & \\\\d'} gets strings

containing both letters and numbers). You can negate by adding a \code{!} first (ex: \code{"!sepal$"} will

return \code{TRUE} for strings that do not end with \code{"sepal"}).

Add flags with the syntax 'flag1, flag2/pattern'. Available flags are: 'fixed', 'ignore', 'word' and 'magic'.

Ex: "ignore/sepal" would get "Sepal.Length" (which would not be the case without 'ignore').

Shortcut: use the first letters of the flags. Ex: "if/dt[" would get \code{"DT[i = 5]"} (flags 'ignore' + 'fixed').

For 'word', it adds word boundaries to the pattern. The \code{magic} flag first interpolates

values directly into the pattern with "{}".}

\item{fixed}{Logical scalar, default is \code{FALSE}. Whether to trigger a fixed search instead of a

regular expression search (default).}

\item{ignore.case}{Logical scalar, default is \code{FALSE}. If \code{TRUE}, then case insensitive search is triggered.}

\item{word}{Logical scalar, default is \code{FALSE}. If \code{TRUE} then a) word boundaries are added to the pattern,

and b) patterns can be chained by separating them with a comma; they are then combined with a logical OR.

Example: if \code{word = TRUE}, then pattern = "The, mountain" will select strings containing either the word

'The' or the word 'mountain'.}

\item{or}{Logical, default is \code{FALSE}. In the presence of two or more patterns,

whether to combine them with a logical "or" (the default is to combine them with a logical "and").}

\item{seq}{Logical, default is \code{FALSE}. The argument \code{pattern} accepts a vector of

patterns which are combined with an \code{and} by default. If \code{seq = TRUE}, then it is as

if \code{string_get} were called sequentially, once per pattern, with the results stacked. See examples.}

\item{seq.unik}{Logical, default is \code{FALSE}. The argument \code{...} (or the argument \code{pattern}) accepts

a vector of patterns which are combined with an \code{and} by default. If \code{seq.unik = TRUE}, then

\code{string_get} is called sequentially with its results stacked, and \code{unique()} is

applied in the end. See examples.}

\item{pattern}{(If provided, elements of \code{...} are ignored.) A character vector representing the

patterns to be found. By default a (perl) regular-expression search is triggered.

Use ' & ' or ' | ' to chain patterns and combine their result logically (ex: \code{'[[:alpha:]] & \\\\d'} gets strings

containing both letters and numbers). You can negate by adding a \code{!} first (ex: \code{"!sepal$"} will

return \code{TRUE} for strings that do not end with \code{"sepal"}).

Add flags with the syntax 'flag1, flag2/pattern'. Available flags are: 'fixed', 'ignore', 'word' and 'magic'.

Ex: "ignore/sepal" would get "Sepal.Length" (which would not be the case without 'ignore').

Shortcut: use the first letters of the flags. Ex: "if/dt[" would get \code{"DT[i = 5]"} (flags 'ignore' + 'fixed').

For 'word', it adds word boundaries to the pattern. The \code{magic} flag first interpolates

values directly into the pattern with "{}".}

\item{envir}{Environment in which to evaluate the interpolations if the flag \code{"magic"} is provided.

Default is \code{parent.frame()}.}

}

\value{

It always returns a character vector.

}

\description{

Convenient way to get elements from a character vector.

}

\details{

This function is a wrapper to \code{\link[=string_is]{string_is()}}.

}

\section{Functions}{

\itemize{

\item \code{stget()}: Alias to \code{string_get}

}}

\section{Caching}{

In an exploratory stage, it can be useful to quickly get values from a vector with

as little hassle as possible. Hence \code{string_get} implements caching, so that users do not need

to repeat the value of the argument \code{x} in successive function calls, and can concentrate

only on the selection patterns.

Caching is a feature only available when the user calls \code{string_get} from the global environment.

If that feature were available in regular code, it would be too dangerous, likely leading to bugs that are hard to track down.

Hence caching is disabled when used within code (i.e. inside a function or inside an

automated script), and function calls without the main argument will lead to errors in such scripts.

}

\section{Generic regular expression flags}{

All \code{stringmagic} functions support generic flags in regular-expression patterns.

The flags are useful to quickly give extra instructions, similarly to \emph{usual}

\href{https://javascript.info/regexp-introduction}{regular expression flags}.

Here the syntax is "flag1, flag2/pattern". That is: flags are a comma separated list of flag-names

separated from the pattern with a slash (\code{/}). Example: \code{string_which(c("hello...", "world"), "fixed/.")} returns \code{1}.

Here the flag "fixed" removes the regular expression meaning of "." which would have otherwise meant \emph{"any character"}.

The no-flag version \code{string_which(c("hello...", "world"), ".")} returns \code{1:2}.

Alternatively, and this is recommended, you can collate the initials of the flags instead of using a

comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".

The four flags always available are: "ignore", "fixed", "word" and "magic".

\itemize{

\item "ignore" instructs to ignore the case. Technically, it adds the perl-flag "(?i)"

at the beginning of the pattern.

\item "fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "["

(among others) lose their special meaning and are treated for what they are: simple characters.

\item "word" adds word boundaries (\code{"\\\\b"} in regex language) to the pattern. Further, the comma (\code{","})

becomes a word separator. Technically, "word/one, two" is treated as "\\b(one|two)\\b". Example:

\code{string_clean("Am I ambushed?", "wi/am")} leads to " I ambushed?" thanks to the flags "ignore" and "word".

\item "magic" allows variables to be interpolated into the pattern before regex interpretation.

For example if \code{letters = "aiou"} then \code{string_clean("My great goose!", "magic/[{letters}] => e")}

leads to \code{"My greet geese!"}

}

}

\examples{

x = rownames(mtcars)

# find all Mazda cars

string_get(x, "Mazda")

# same with ignore case flag

string_get(x, "i/mazda")

# all cars containing a single digit (we use the 'word' flag)

string_get(x, "w/\\\\d")

# finds car names without numbers AND containing `u`

string_get(x, "!\\\\d", "u")

# equivalently

string_get(x, "!\\\\d & u")
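
# Added sketch (not part of the original examples): combining patterns with OR.
# Per the argument descriptions, `or = TRUE` switches the default AND to OR,
# and " | " chains patterns with OR inside a single pattern string.
# `cars_or` and `cars_pipe` are illustrative names.
cars_or = string_get(rownames(mtcars), "Mazda", "Volvo", or = TRUE)
cars_pipe = string_get(rownames(mtcars), "Mazda | Volvo")
cars_or
# both calls are expected to select the same elements
identical(cars_or, cars_pipe)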

# Stacks all Mazda and Volvo cars. Mazda first

string_get(x, "Mazda", "Volvo", seq = TRUE)

# Stacks all Mazda and Volvo cars. Volvo first

string_get(x, "Volvo", "Mazda", seq = TRUE)
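
# Added sketch (not part of the original examples): the same stacking, this
# time passing the patterns as one vector through the documented argument
# `pattern` (elements of `...` are then ignored).
# `stacked` is an illustrative name.
stacked = string_get(rownames(mtcars), pattern = c("Mazda", "Volvo"), seq = TRUE)
stacked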

# let's get the first word of each car name

car_first = string_ops(x, "extract.first")

# we select car brands ending with 'a', then ending with 'i'

string_get(car_first, "a$", "i$", seq = TRUE)

# seq.unik is similar to seq but applies unique()

string_get(car_first, "a$", "i$", seq.unik = TRUE)

# flags

# you can combine the flags

x = string_magic("/One, two, one... Two!, Microphone, check")

# regular

string_get(x, "one")

# ignore case

string_get(x, "i/one")

# + word boundaries

string_get(x, "iw/one")
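
# Added sketch (not part of the original examples): the 'fixed' flag.
# The pattern "dt[" is not a valid regular expression, so it is searched
# literally with 'f' (fixed), here combined with 'i' (ignore case).
# The input vector `code_lines` is made up for illustration.
code_lines = c("DT[i = 5]", "dt = 3", "data")
fixed_flag = string_get(code_lines, "if/dt[")
fixed_flag
# the argument form should behave the same way
string_get(code_lines, "dt[", fixed = TRUE, ignore.case = TRUE)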

# you can escape the meaning of ! with backslashes

string_get(x, "\\\\!")
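
# Added sketch (not part of the original examples): the 'magic' flag.
# It interpolates variables into the pattern with "{}" before the search;
# the variables are looked up in `envir` (the calling environment by default).
# `brand` and `merc_cars` are illustrative names.
brand = "Merc"
merc_cars = string_get(rownames(mtcars), "magic/{brand}")
merc_cars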

# Caching

# Caching is enabled when the function is used interactively

# so you don't need to repeat the argument 'x'

# Mostly useful at an exploratory stage

if(interactive() && is.null(sys.calls())){

# first run, the data is cached

string_get(row.names(mtcars), "i/vol")

# now you don't need to specify the data

string_get("i/^m & 4")

}

}

\seealso{

String operations: \code{\link[=string_is]{string_is()}}, \code{\link[=string_get]{string_get()}}, \code{\link[=string_clean]{string_clean()}}, \code{\link[=string_split2df]{string_split2df()}}.

Chain basic operations with \code{\link[=string_ops]{string_ops()}}. Clean character vectors efficiently

with \code{\link[=string_clean]{string_clean()}}.

Use \code{\link[=string_vec]{string_vec()}} to create simple string vectors.

String interpolation combined with operation chaining: \code{\link[=string_magic]{string_magic()}}. You can change \code{string_magic}

default values with \code{\link[=string_magic_alias]{string_magic_alias()}} and add custom operations with \code{\link[=string_magic_register_fun]{string_magic_register_fun()}}.

Display messages while benefiting from \code{string_magic} interpolation with \code{\link[=cat_magic]{cat_magic()}} and \code{\link[=message_magic]{message_magic()}}.

Other tools with aliases:

\code{\link{cat_magic_alias}()},

\code{\link{string_magic}()},

\code{\link{string_magic_alias}()},

\code{\link{string_ops_alias}()},

\code{\link{string_vec_alias}()}

}

\author{

Laurent R. Berge

}
