Chapter 15

Dimple K Patel

2025-01-07

Chapter 15 Intro
#install.packages("tidyverse")
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(babynames)

# The 2nd argument is the regular expression to match. str_view() puts <> around each match.
str_view(fruit, "berry")

## [6] │ bil<berry>
## [7] │ black<berry>
## [10] │ blue<berry>
## [11] │ boysen<berry>
## [19] │ cloud<berry>
## [21] │ cran<berry>
## [29] │ elder<berry>
## [32] │ goji <berry>
## [33] │ goose<berry>
## [38] │ huckle<berry>
## [50] │ mul<berry>
## [70] │ rasp<berry>
## [73] │ salal <berry>
## [76] │ straw<berry>

# The period after the a is a METAcharacter: it matches any single character.

str_view(c("a", "ab", "ae", "bd", "ea", "eab"), "a.")
## [2] │ <ab>
## [3] │ <ae>
## [6] │ e<ab>

# The 3 periods match any 3 characters between an a and an e (even a space, as in salal berry).

str_view(fruit, "a...e")

## [1] │ <apple>
## [7] │ bl<ackbe>rry
## [48] │ mand<arine>
## [51] │ nect<arine>
## [62] │ pine<apple>
## [64] │ pomegr<anate>
## [70] │ r<aspbe>rry
## [73] │ sal<al be>rry

Quantifiers control the number of times a pattern can match:

• ? makes a pattern optional (i.e. it matches 0 or 1 times). Mnemonic: the question mark is shaped like a 0 or 1.

• + lets a pattern repeat (i.e. it matches at least once), just like the addition sign. It resembles the number 1, so it has to match at least one time.

• * lets a pattern be optional or repeat (i.e. it matches any number of times, including 0). Mnemonic: draw a circle around the star, like in those children’s hand games growing up. An infinite circle means ANY number qualifies.
# ab? matches an "a", optionally followed by a "b".
str_view(c("a", "ab", "abb"), "ab?")

## [1] │ <a>
## [2] │ <ab>
## [3] │ <ab>b

# ab+ matches an "a", followed by at least one "b".

str_view(c("a", "ab", "abb"), "ab+")

## [2] │ <ab>
## [3] │ <abb>

# ab* matches an "a", followed by any number of "b"s.

str_view(c("a", "ab", "abb"), "ab*")

## [1] │ <a>
## [2] │ <ab>
## [3] │ <abb>

Character classes are denoted by []. [Nemo] matches the letters N, e, m, or o. [^Nemo] matches any character except N, e, m, or o.
l <- str_view(words, "[aeiou]x[aeiou]")
m <- str_view(words, "[^aeiou]y[^aeiou]")

Alternation, written with the | sign, picks between two or more alternative patterns.

k <- str_view(fruit, "apple|melon|nut")
j <- str_view(fruit, "aa|ee|ii|oo|uu")
j

## [9] │ bl<oo>d orange

## [33] │ g<oo>seberry
## [47] │ lych<ee>
## [66] │ purple mangost<ee>n

Chapter 15.2 Key Functions

Detect matches with str_detect(), which returns TRUE or FALSE for each string. Think of a detective like Sherlock Holmes.
str_detect(c("a", "b", "c"), "[aeiou]")

## [1] TRUE FALSE FALSE

o <- babynames |>

filter(str_detect(name, "x")) |>
count(name, wt = n, sort = TRUE)

head(o)

## # A tibble: 6 × 2
## name n
## <chr> <int>
## 1 Alexander 665492
## 2 Alexis 399551
## 3 Alex 278705
## 4 Alexandra 232223
## 5 Max 148787
## 6 Alexa 123032

babynames |>
group_by(year) |>
summarize(prop_x = mean(str_detect(name, "x"))) |>
ggplot(aes(x = year, y = prop_x)) +
geom_line()
str_count() counts the matches in each string. str_view() highlights the matches. Regular expressions are case-sensitive!!!!! str_to_lower() converts all the words to lower case. Mnemonic: TL for to_lower, as in TL;DR (too long, so make it short).
babynames |>
count(name) |>
mutate(
name = str_to_lower(name),
vowels = str_count(name, "[aeiou]"),
consonants = str_count(name, "[^aeiou]")
)

## # A tibble: 97,310 × 4
## name n vowels consonants
## <chr> <int> <int> <int>
## 1 aaban 10 3 2
## 2 aabha 5 3 2
## 3 aabid 2 3 2
## 4 aabir 1 3 2
## 5 aabriella 5 5 4
## 6 aada 1 3 1
## 7 aadam 26 3 2
## 8 aadan 11 3 2
## 9 aadarsh 17 3 4
## 10 aaden 18 3 2
## # ℹ 97,300 more rows
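There are 3 ways to fix the case problem (upper case vowels are otherwise not counted): add the upper case vowels to the character class ("[aeiouAEIOU]"), set the flag with regex("[aeiou]", ignore_case = TRUE), or convert the names with str_to_lower() as above. Mnemonic: Count Chocula experienced the ick, and ignore_case was invoked to prevent upper case fiascos.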
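The three fixes for case sensitivity can be compared on a small vector. This is a minimal sketch: the names in x are made up for illustration (not taken from babynames), and all three approaches should give the same vowel counts.

```r
library(stringr)

# Hypothetical example names; two of them start with a capital vowel.
x <- c("Alex", "Eve", "bob")

# 1. Add the upper case vowels to the character class.
by_class <- str_count(x, "[aeiouAEIOU]")

# 2. Tell the regex engine to ignore case.
by_flag <- str_count(x, regex("[aeiou]", ignore_case = TRUE))

# 3. Convert to lower case first, then count.
by_lower <- str_count(str_to_lower(x), "[aeiou]")

by_class  # 2 2 1
```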
str_replace_all() replaces every match with a new string. str_remove_all() deletes every match; it is shorthand for str_replace_all(x, pattern, "").
x <- c("apple", "pear", "banana")
str_replace_all(x, "[aeiou]", "-")

## [1] "-ppl-" "p--r" "b-n-n-"

x <- c("apple", "pear", "banana")

str_remove_all(x, "[aeiou]")

## [1] "ppl" "pr" "bnn"

Extract variables with separate_wider_regex(). It is a relative of separate_wider_position() and separate_wider_delim(). You give it a vector of patterns: each named piece becomes a column, and unnamed pieces are matched but dropped from the output. If a match fails, too_few = "debug" can be added as an argument to see what went wrong.
df <- tribble(
~str,
"<Sheryl>-F_34",
"<Kisha>-F_45",
"<Brandon>-N_33",
"<Sharon>-F_38",
"<Penny>-F_58",
"<Justin>-M_41",
"<Patricia>-F_84",
)

df2 <- df |>

separate_wider_regex(
str,
patterns = c(
"<",
name = "[A-Za-z]+",
">-",
gender = ".",
"_",
age = "[0-9]+"
)
)
15.3.5 Exercises
1. What baby name has the most vowels? What name has the highest
proportion of vowels? (Hint: what is the denominator?)

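Counting lower case vowels only, “Mariaguadalupe” has the most vowels and “Louie” has the highest proportion of vowels. The denominator is the total number of letters in the name. Note that the pattern below is case-sensitive, so a capital vowel at the start of a name (such as the A in Ava) is counted as a consonant; use "[aeiouAEIOU]" or str_to_lower() to count it as a vowel.
stuff <- babynames |>
mutate(vow = str_count(name, "[aeiou]"),   # lower case vowels
cons = str_count(name, "[^aeiou]"),        # everything else, including capitals
denom = vow + cons,                        # total letters in the name
prop = vow / denom) |>
arrange(desc(prop))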

2. Replace all forward slashes in "a/b/c/d/e" with backslashes. What

happens if you attempt to undo the transformation by replacing all
backslashes with forward slashes? (We’ll discuss the problem very
soon.)

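Doing the reverse with the pattern "\\" throws an error, because a single backslash is the regex escape character and needs something after it to escape. To match a literal backslash, the pattern has to be "\\\\".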
y <- "a/b/c/d/e"
str_replace_all(y, pattern = "/", replacement = "\\\\") |> str_view()

## [1] │ a\b\c\d\e

3. Implement a simple version of str_to_lower() using

str_replace_all().
test_string3 <- "Other branches opened in Floral City in 1958, and
Hernando in
1959, as well as the freestanding Crystal River and Homosassa
Libraries."
str_replace_all(test_string3,
pattern = "[A-Z]",  # upper case letters only; a | inside [] would be a literal |
replacement = tolower)

## [1] "other branches opened in floral city in 1958, and hernando

in \n1959, as well as the freestanding crystal river and homosassa
libraries."

4. Create a regular expression that will match telephone numbers as

commonly written in your country.
telephone_numbers = c(
"555-123-4567",
"(555) 555-7890",
"888-555-4321",
"(123) 456-7890",
"555-987-6543",
"(555) 123-7890"
)
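telephone_numbers |>
str_replace(" ", "-") |>    # "(555) 555-7890" becomes "(555)-555-7890"
str_replace("\\(", "") |>   # drop the opening parenthesis
str_replace("\\)", "") |>   # drop the closing parenthesis
as_tibble()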

## # A tibble: 6 × 1
## value
## <chr>
## 1 555-123-4567
## 2 555-555-7890
## 3 888-555-4321
## 4 123-456-7890
## 5 555-987-6543
## 6 555-123-7890
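
The exercise asks for a regular expression, so here is one possible sketch covering the two formats that appear in telephone_numbers above (555-123-4567 and (555) 555-7890). The pattern and the name phone_pattern are assumptions for illustration, and the last two test strings are made-up non-matches; real-world numbers come in more formats than this.

```r
library(stringr)

# Two formats from the notes plus two hypothetical strings that should not match.
candidates <- c("555-123-4567", "(555) 555-7890", "5551234567", "555-1234")

# Area code as "(ddd) " or "ddd-", then ddd-dddd, anchored to the whole string.
phone_pattern <- "^(\\(\\d{3}\\) |\\d{3}-)\\d{3}-\\d{4}$"

is_phone <- str_detect(candidates, phone_pattern)
is_phone  # TRUE TRUE FALSE FALSE
```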

15.4.1 Escaping
A period is a metacharacter (it matches any character). Use a backslash to escape a metacharacter, so the regex \. matches a literal period. A backslash also escapes a backslash. Because a backslash must also be escaped inside an R string, use 4 backslashes in the string ("\\\\") to match 1 literal backslash. Alternatively, use a raw string like r"{\\}" to write the same pattern.
# To create the regular expression \., we need to use \\.
dot <- "\\."

# But the expression itself only contains one \

str_view(dot)

## [1] │ \.


# And this tells R to look for an explicit .

str_view(c("abc", "a.c", "bef"), "a\\.c")

## [2] │ <a.c>

x <- "a\\b"
str_view(x)

## [1] │ a\b

str_view(x, "\\\\")

## [1] │ a<\>b

str_view(x, r"{\\}")

## [1] │ a<\>b

To match a literal ., $, |, *, +, ?, {, }, (, ), you can also use a character class: [.], [$], [|], … all match the literal values.
str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")

## [2] │ <a.c>

str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")

## [3] │ <a*c>

Anchors: ^ matches at the start (to START going down a rabbit hole, dangle the CARROT, or caret). $ matches at the end, because when the world comes to an end, it’s probably over money. Either way, directionally speaking, the anchor PROTECTS the letter. Use both the caret and the dollar sign to force a pattern to match the complete string only.
str_view(fruit, "^a")

## [1] │ <a>pple
## [2] │ <a>pricot
## [3] │ <a>vocado

str_view(fruit, "a$")

## [4] │ banan<a>
## [15] │ cherimoy<a>
## [30] │ feijo<a>
## [36] │ guav<a>
## [56] │ papay<a>
## [74] │ satsum<a>

str_view(fruit, "apple")

## [1] │ <apple>
## [62] │ pine<apple>

str_view(fruit, "^apple$")

## [1] │ <apple>

"\\b" matches a word boundary, i.e. the start or end of a word.
y <- c("summary(x)", "summarize(df)", "rowsum(x)", "sum(x)")
z <- c("summary(x)", "summarize(df)", "rowsum(x)", "sum(x)")
z1 <- str_view(z, "sum")
y1 <- str_view(y, "\\bsum\\b")
y1

## [4] │ <sum>(x)
By themselves, anchors yield a zero-length match.
str_view("abc", c("$", "^", "\\b"))

## [1] │ abc<>
## [2] │ <>abc
## [3] │ <>abc<>

str_replace_all("abc", c("$", "^", "\\b"), "--")

## [1] "abc--" "--abc" "--abc--"

Character classes are a.k.a. character sets. Special meanings inside the brackets [] include:
• - defines a range, e.g., [a-z] matches any lower case letter and [0-9] matches any digit.
• \ escapes special characters, so [\^\-\]] matches ^, -, or ].

x <- "abcd ABCD 12345 -!@#%."

x1 <- str_view(x, "[abc]+")
x1

## [1] │ <abc>d ABCD 12345 -!@#%.

x2 <- str_view(x, "[a-z]+")

## [1] │ <abcd> ABCD 12345 -!@#%.

x3 <- str_view(x, "[^a-z0-9]+")

## [1] │ abcd< ABCD >12345< -!@#%.>

# You need an escape to match characters that are otherwise special inside of []
x4 <- str_view("a-b-c", "[a-c]")
x4

## [1] │ <a>-<b>-<c>

x5 <- str_view("a-b-c", "[a\\-c]")

## [1] │ <a><->b<-><c>

DSW - Designer Shoe Warehouse. Digits are toes which are in shoes.
• \d matches any digit; \D matches anything that isn’t a digit.
• \s matches any whitespace (e.g., space, tab, newline); \S matches anything that isn’t whitespace.
• \w matches any “word” character, i.e. letters, numbers, and the underscore; \W matches any “non-word” character.

x <- "abcd ABCD 12345 -!@#%."

str_view(x, "\\d+")

## [1] │ abcd ABCD <12345> -!@#%.

str_view(x, "\\D+")

## [1] │ <abcd ABCD >12345< -!@#%.>

str_view(x, "\\s+")

## [1] │ abcd< >ABCD< >12345< >-!@#%.

str_view(x, "\\S+")

## [1] │ <abcd> <ABCD> <12345> <-!@#%.>

str_view(x, "\\w+")

## [1] │ <abcd> <ABCD> <12345> -!@#%.

str_view(x, "\\W+")

## [1] │ abcd< >ABCD< >12345< -!@#%.>

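Quantifiers -> {n} matches exactly n times. {n,} matches at least n times. {n,m} matches between n and m times.
PEMDAS -> regexes have an order of operations too, like parentheses, exponents, multiply, divide, add, subtract. Quantifiers have high precedence and alternation has low precedence, so ab+ means a(b+) and ^a|b$ means (^a)|(b$). Use parentheses to override the default.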
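The quantifier forms and the precedence rules can be checked on toy strings. This is a small sketch; the vectors x and y are made up for illustration.

```r
library(stringr)

x <- c("a", "aa", "aaa", "aaaa")
exactly_two  <- str_detect(x, "^a{2}$")    # FALSE TRUE FALSE FALSE
at_least_two <- str_detect(x, "^a{2,}$")   # FALSE TRUE TRUE TRUE
two_to_three <- str_detect(x, "^a{2,3}$")  # FALSE TRUE TRUE FALSE

# Quantifiers bind tightly: b+ repeats only the b, (ab)+ repeats the pair.
tight   <- str_extract("ababbb", "ab+")    # "ab"
grouped <- str_extract("ababbb", "(ab)+")  # "abab"

# Alternation binds loosely: "^a|b$" means "starts with a" OR "ends with b".
y <- c("ax", "xb", "ab", "a")
loose    <- str_detect(y, "^a|b$")    # TRUE TRUE TRUE TRUE
anchored <- str_detect(y, "^(a|b)$")  # FALSE FALSE FALSE TRUE
```

Adding the parentheses changes which strings match, which is why it helps to test a pattern on a few known cases.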
Capturing groups -> parentheses create capturing groups (like a pseudo net) that catch a sub-match.
Back reference: \1 refers to the match in the 1st set of parentheses. \2 refers to the match in the 2nd set.
#Match repeated letter pairs.
str_view(fruit, "(..)\\1")

## [4] │ b<anan>a
## [20] │ <coco>nut
## [22] │ <cucu>mber
## [41] │ <juju>be
## [56] │ <papa>ya
## [73] │ s<alal> berry

#Match words that start and end with the same pair of letters.

str_view(words, "^(..).*\\1$")
## [152] │ <church>
## [217] │ <decide>
## [617] │ <photograph>
## [699] │ <require>
## [739] │ <sense>

#Switches the 2nd and 3rd word.

sentences |>
str_replace("(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2") |>
str_view()

## [1] │ The canoe birch slid on the smooth planks.

## [2] │ Glue sheet the to the dark blue background.
## [3] │ It's to easy tell the depth of a well.
## [4] │ These a days chicken leg is a rare dish.
## [5] │ Rice often is served in round bowls.
## [6] │ The of juice lemons makes fine punch.
## [7] │ The was box thrown beside the parked truck.
## [8] │ The were hogs fed chopped corn and garbage.
## [9] │ Four of hours steady work faced us.
## [10] │ A size large in stockings is hard to sell.
## [11] │ The was boy there when the sun rose.
## [12] │ A is rod used to catch pink salmon.
## [13] │ The of source the huge river is the clear spring.
## [14] │ Kick ball the straight and follow through.
## [15] │ Help woman the get back to her feet.
## [16] │ A of pot tea helps to pass the evening.
## [17] │ Smoky lack fires flame and heat.
## [18] │ The cushion soft broke the man's fall.
## [19] │ The breeze salt came across from the sea.
## [20] │ The at girl the booth sold fifty bonds.
## ... and 700 more

# str_match() returns a matrix: convert it to a TIBBLE & name the columns.
# separate_wider_regex() does a similar job for data frame columns.
sentences |>
str_match("the (\\w+) (\\w+)") |>
as_tibble(.name_repair = "minimal") |>
set_names("match", "word1", "word2")

## # A tibble: 720 × 3
## match word1 word2
## <chr> <chr> <chr>
## 1 the smooth planks smooth planks
## 2 the sheet to sheet to
## 3 the depth of depth of
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## 7 the parked truck parked truck
## 8 <NA> <NA> <NA>
## 9 <NA> <NA> <NA>
## 10 <NA> <NA> <NA>
## # ℹ 710 more rows

Use (?:...) to create a non-capturing group.

x <- c("a gray cat", "a grey dog")
str_match(x, "gr(e|a)y")

## [,1] [,2]
## [1,] "gray" "a"
## [2,] "grey" "e"

str_match(x, "gr(?:e|a)y")

## [,1]
## [1,] "gray"
## [2,] "grey"

15.4.7 Exercises
1. How would you match the literal string "'\? How about "$^$"?
input_string <- "\"'\\"
str_view(input_string)

## [1] │ "'\

# Pattern to match the literal string

match_pattern <- "\"\'\\\\"
str_view(match_pattern)

## [1] │ "'\\

input_string <- "\"$^$\""

str_view(input_string)

## [1] │ "$^$"

# Pattern to match the literal string

match_pattern <- "\"\\$\\^\\$\""
str_view(match_pattern)

## [1] │ "\$\^\$"

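2. Explain why each of these patterns don’t match a \: "\", "\\", "\\\".
"\": the backslash escapes the closing quote, so the string is never terminated. "\\": the string holds a single backslash, which the regex engine reads as an escape character with nothing after it, so it is an error. "\\\": the first two backslashes make one backslash and the third escapes the closing quote, so the string is again unterminated. To match a single literal backslash, the string needs 4 backslashes ("\\\\"), which gives the regex \\.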
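
A short sketch of what does work, next to the failing two-backslash pattern. The variable names are introduced here for illustration, and tryCatch() is used only so the error does not stop the script.

```r
library(stringr)

x <- "a\\b"          # the string a\b
n_chars <- nchar(x)  # 3

# Four backslashes in the R string give the regex \\, i.e. one literal backslash.
with_regex <- str_detect(x, "\\\\")       # TRUE

# fixed() skips regex escaping, so two backslashes in the string are enough.
with_fixed <- str_detect(x, fixed("\\"))  # TRUE

# Two backslashes used as a regex is a dangling escape, reported as an error.
dangling <- tryCatch(str_detect(x, "\\"), error = function(e) "error")
```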

3. Given the corpus of common words in stringr::words, create regular

expressions that find all words that:
a. Start with “y”.
words |>
str_view(pattern = "^y")

## [975] │ <y>ear
## [976] │ <y>es
## [977] │ <y>esterday
## [978] │ <y>et
## [979] │ <y>ou
## [980] │ <y>oung

b. Don’t start with “y”.

words |>
str_view(pattern = "^(?!y)")

## [1] │ <>a
## [2] │ <>able
## [3] │ <>about
## [4] │ <>absolute
## [5] │ <>accept
## [6] │ <>account
## [7] │ <>achieve
## [8] │ <>across
## [9] │ <>act
## [10] │ <>active
## [11] │ <>actual
## [12] │ <>add
## [13] │ <>address
## [14] │ <>admit
## [15] │ <>advertise
## [16] │ <>affect
## [17] │ <>afford
## [18] │ <>after
## [19] │ <>afternoon
## [20] │ <>again
## ... and 954 more

c. End with “x”.

words |> str_view(pattern = "x$")

## [108] │ bo<x>
## [747] │ se<x>
## [772] │ si<x>
## [841] │ ta<x>

d. Are exactly three letters long. (Don’t cheat by using

str_length()!)
words |>
str_subset(pattern = "\\b\\w{3}\\b")
## [1] "act" "add" "age" "ago" "air" "all" "and" "any" "arm"
"art" "ask" "bad"
## [13] "bag" "bar" "bed" "bet" "big" "bit" "box" "boy" "bus"
"but" "buy" "can"
## [25] "car" "cat" "cup" "cut" "dad" "day" "die" "dog" "dry"
"due" "eat" "egg"
## [37] "end" "eye" "far" "few" "fit" "fly" "for" "fun" "gas"
"get" "god" "guy"
## [49] "hit" "hot" "how" "job" "key" "kid" "lad" "law" "lay"
"leg" "let" "lie"
## [61] "lot" "low" "man" "may" "mrs" "new" "non" "not" "now"
"odd" "off" "old"
## [73] "one" "out" "own" "pay" "per" "put" "red" "rid" "run"
"say" "see" "set"
## [85] "sex" "she" "sir" "sit" "six" "son" "sun" "tax" "tea"
"ten" "the" "tie"
## [97] "too" "top" "try" "two" "use" "war" "way" "wee" "who"
"why" "win" "yes"
## [109] "yet" "you"

e. Have seven letters or more.

words |>
str_subset(pattern = "\\b\\w{7,}\\b")

## [1] "absolute" "account" "achieve" "address"

"advertise"
## [6] "afternoon" "against" "already" "alright"
"although"
## [11] "america" "another" "apparent" "appoint"
"approach"
## [16] "appropriate" "arrange" "associate" "authority"
"available"
## [21] "balance" "because" "believe" "benefit"
"between"
## [26] "brilliant" "britain" "brother" "business"
"certain"
## [31] "chairman" "character" "Christmas" "colleague"
"collect"
## [36] "college" "comment" "committee" "community"
"company"
## [41] "compare" "complete" "compute" "concern"
"condition"
## [46] "consider" "consult" "contact" "continue"
"contract"
## [51] "control" "converse" "correct" "council"
"country"
## [56] "current" "decision" "definite" "department"
"describe"
## [61] "develop" "difference" "difficult" "discuss"
"district"
## [66] "document" "economy" "educate" "electric"
"encourage"
## [71] "english" "environment" "especial" "evening"
"evidence"
## [76] "example" "exercise" "expense" "experience"
"explain"
## [81] "express" "finance" "fortune" "forward"
"function"
## [86] "further" "general" "germany" "goodbye"
"history"
## [91] "holiday" "hospital" "however" "hundred"
"husband"
## [96] "identify" "imagine" "important" "improve"
"include"
## [101] "increase" "individual" "industry" "instead"
"interest"
## [106] "introduce" "involve" "kitchen" "language"
"machine"
## [111] "meaning" "measure" "mention" "million"
"minister"
## [116] "morning" "necessary" "obvious" "occasion"
"operate"
## [121] "opportunity" "organize" "original" "otherwise"
"paragraph"
## [126] "particular" "pension" "percent" "perfect"
"perhaps"
## [131] "photograph" "picture" "politic" "position"
"positive"
## [136] "possible" "practise" "prepare" "present"
"pressure"
## [141] "presume" "previous" "private" "probable"
"problem"
## [146] "proceed" "process" "produce" "product"
"programme"
## [151] "project" "propose" "protect" "provide"
"purpose"
## [156] "quality" "quarter" "question" "realise"
"receive"
## [161] "recognize" "recommend" "relation" "remember"
"represent"
## [166] "require" "research" "resource" "respect"
"responsible"
## [171] "saturday" "science" "scotland" "secretary"
"section"
## [176] "separate" "serious" "service" "similar"
"situate"
## [181] "society" "special" "specific" "standard"
"station"
## [186] "straight" "strategy" "structure" "student"
"subject"
## [191] "succeed" "suggest" "support" "suppose"
"surprise"
## [196] "telephone" "television" "terrible" "therefore"
"thirteen"
## [201] "thousand" "through" "thursday" "together"
"tomorrow"
## [206] "tonight" "traffic" "transport" "trouble"
"tuesday"
## [211] "understand" "university" "various" "village"
"wednesday"
## [216] "welcome" "whether" "without" "yesterday"

f. Contain a vowel-consonant pair.

words |>
str_view(pattern = "[aeiou][^aeiou]")

## [2] │ <ab>le
## [3] │ <ab>o<ut>
## [4] │ <ab>s<ol><ut>e
## [5] │ <ac>c<ep>t
## [6] │ <ac>co<un>t
## [7] │ <ac>hi<ev>e
## [8] │ <ac>r<os>s
## [9] │ <ac>t
## [10] │ <ac>t<iv>e
## [11] │ <ac>tu<al>
## [12] │ <ad>d
## [13] │ <ad>dr<es>s
## [14] │ <ad>m<it>
## [15] │ <ad>v<er>t<is>e
## [16] │ <af>f<ec>t
## [17] │ <af>f<or>d
## [18] │ <af>t<er>
## [19] │ <af>t<er>no<on>
## [20] │ <ag>a<in>
## [21] │ <ag>a<in>st
## ... and 924 more

g. Contain at least two vowel-consonant pairs in a row.

words |>
str_view(pattern = "[aeiou][^aeiou][aeiou][^aeiou]")

## [4] │ abs<olut>e
## [23] │ <agen>t
## [30] │ <alon>g
## [36] │ <amer>ica
## [39] │ <anot>her
## [42] │ <apar>t
## [43] │ app<aren>t
## [61] │ auth<orit>y
## [62] │ ava<ilab>le
## [63] │ <awar>e
## [64] │ <away>
## [70] │ b<alan>ce
## [75] │ b<asis>
## [81] │ b<ecom>e
## [83] │ b<efor>e
## [84] │ b<egin>
## [85] │ b<ehin>d
## [87] │ b<enef>it
## [119] │ b<usin>ess
## [143] │ ch<arac>ter
## ... and 149 more

h. Only consist of repeated vowel-consonant pairs.

words |>
str_view(pattern = "^(?:[aeiou][^aeiou]){2,}$")

## [64] │ <away>
## [265] │ <eleven>
## [279] │ <even>
## [281] │ <ever>
## [436] │ <item>
## [573] │ <okay>
## [579] │ <open>
## [586] │ <original>
## [591] │ <over>
## [905] │ <unit>
## [911] │ <upon>

4. Create 11 regular expressions that match the British or American

spellings for each of the following words: airplane/aeroplane,
aluminum/aluminium, analog/ analogue, ass/arse, center/centre,
defense/defence, donut/doughnut, gray/grey, modeling/modelling,
skeptic/sceptic, summarize/summarise. Try and make the shortest
possible regex!
exp1 <-"The airplane is made of aluminum. The analog signal is
stronger. Don't
be an ass. The center is closed for defense training. I prefer a
donut, while
she likes a doughnut. His hair is gray, but hers is grey. We're
modeling a new
project. The skeptic will not believe it. Please summarize the
report."

# One pattern per word pair (one possible set, not necessarily the shortest).
patterns_to_detect <- c(
"a(ir|ero)plane", "alumini?um", "analog(ue)?", "a(ss|rse)",
"cent(er|re)", "defen[sc]e", "do(ugh)?nut", "gr[ae]y",
"modell?ing", "s[ck]eptic", "summari[sz]e"
)

# Wrap every whole-word match in ** so it stands out.
for (pattern in patterns_to_detect) {
exp1 <- str_replace_all(exp1,
str_c("\\b(", pattern, ")\\b"),
"**\\1**")
}

exp1

## [1] "The **airplane** is made of **aluminum**. The **analog**
signal is stronger. Don't\nbe an **ass**. The **center** is closed for
**defense** training. I prefer a **donut**, while \nshe likes a
**doughnut**. His hair is **gray**, but hers is **grey**. We're
**modeling** a new \nproject. The **skeptic** will not believe it.
Please **summarize** the report."

5. Switch the first and last letters in words. Which of those strings are still
words?
new_words = words |>
str_replace_all(pattern = "\\b(\\w)(\\w*)(\\w)\\b",
replacement = "\\3\\2\\1")
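
To answer the second half of the question, one option is to intersect the swapped strings with the original list. This is a sketch, assuming stringr::words and the same pattern as above; still_words is a name introduced here.

```r
library(stringr)

# Swap the first and last letter of every word with two or more letters.
new_words <- str_replace_all(words, "\\b(\\w)(\\w*)(\\w)\\b", "\\3\\2\\1")

# Swapped strings that are still in the original list.
still_words <- intersect(new_words, words)
head(still_words)
```

Words that start and end with the same letter (such as "dad") and one-letter words (such as "a") come through unchanged, so they always appear in the result.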

6. Describe in words what these regular expressions match: (read

carefully to see if each entry is a regular expression or a string that
defines a regular expression.)

a. ^.*$ Matches an entire string (of any characters other than a newline).

b. "\\{.+\\}" Matches an expression in curly braces with at least one character inside, like {abc}.

c. \d{4}-\d{2}-\d{2} Matches 4 digits, 2 digits, and 2 digits with hyphens between, like “1989-02-18.”

d. "\\\\{4}" Matches 4 backslashes in a row.

e. \..\..\.. Matches a period followed by any character, three times over. It matches strings like “.a.b.c” or “.1.2.3”

f. (.)\1\1 The parentheses (.) capture any single character, and \1 refers to the captured character. So, it matches the same character three times in a row, like “aaa” or “111.”

g. "(..)\\1" Matches a pair of characters that is immediately repeated, like “abab” or “1212.”
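
A few of these descriptions can be checked directly. This is a small sketch on made-up test strings; the variable names are only for illustration.

```r
library(stringr)

# d. Four backslashes in a row (written with eight in an R string).
four_slashes  <- str_detect("\\\\\\\\", "\\\\{4}")  # TRUE
three_slashes <- str_detect("\\\\\\", "\\\\{4}")    # FALSE

# f. The same character three times in a row.
triple <- str_detect(c("aaa", "aab"), "(.)\\1\\1")  # TRUE FALSE

# g. A pair of characters that is immediately repeated.
pairs <- str_detect(c("abab", "aa", "aaaa"), "(..)\\1")  # TRUE FALSE TRUE
```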

7. Solve the beginner regexp crosswords at

https://regexcrossword.com/challenges/beginner. No bueno!

Regex flags control the specifics (general Pacifics!) of how the regexp matches. Set them through regex(). The coolest flag is ignore_case = TRUE.
bananas <- c("banana", "Banana", "BANANA")
str_view(bananas, "banana")

## [1] │ <banana>

str_view(bananas, regex("banana", ignore_case = TRUE))

## [1] │ <banana>
## [2] │ <Banana>
## [3] │ <BANANA>

2ND regex flag: dotall = TRUE lets . match everything, including \n:
x <- "Line 1\nLine 2\nLine 3"
str_view(x, ".Line")
str_view(x, regex(".Line", dotall = TRUE))

## [1] │ Line 1<

## │ Line> 2<
## │ Line> 3

multiline = TRUE makes ^ and $ match the start and end of each line rather
than the start and end of the complete string:
x <- "Line 1\nLine 2\nLine 3"
str_view(x, "^Line")

## [1] │ <Line> 1
## │ Line 2
## │ Line 3

str_view(x, regex("^Line", multiline = TRUE))

## [1] │ <Line> 1
## │ <Line> 2
## │ <Line> 3
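comments = TRUE ignores whitespace and anything after #, so a pattern can be spread over several lines and annotated. To match a literal space, escape it with a backslash.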
phone <- regex(
r"(
\(? # optional opening parens
(\d{3}) # area code
[)\-]? # optional closing parens or dash
\ ? # optional space
(\d{3}) # another three numbers
[\ -]? # optional space or dash
(\d{4}) # four more numbers
)",
comments = TRUE
)

str_extract(c("514-791-8141", "(123) 456 7890", "123456"), phone)

## [1] "514-791-8141" "(123) 456 7890" NA

Opt out of regex rules with the fixed() function, which can also ignore case.
str_view(c("", "a", "."), fixed("."))

## [3] │ <.>

str_view("x X", "X")

## [1] │ x <X>

str_view("x X", fixed("X", ignore_case = TRUE))

## [1] │ <x> <X>

NOTES!:
str_view(sentences, "^The") matches ANYTHING that starts with ‘The’, including sentences that begin with “There is no way home”, and not just sentences that start with the word “The”.
str_view(sentences, "^The\\b") matches ONLY sentences that start with the word “The”.
str_view(sentences, "^She|He|It|They\\b") looks like it matches sentences that start w/ a pronoun, but because alternation has low precedence it also matches He or It anywhere in the sentence. ADD parentheses like here:
str_view(sentences, "^(She|He|It|They)\\b")
Try testing patterns to spot mistakes!
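A quick test on made-up sentences shows the difference the parentheses make. The vector x is hypothetical, not taken from stringr::sentences.

```r
library(stringr)

x <- c("She left.", "Then He came.", "They won.")

# Without parentheses, "He" may match anywhere in the string.
without_parens <- str_detect(x, "^She|He|It|They\\b")    # TRUE TRUE TRUE

# With parentheses, ^ and \\b apply to every alternative.
with_parens <- str_detect(x, "^(She|He|It|They)\\b")     # TRUE FALSE TRUE
```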
str_view(words, "^[^aeiou]+$") matches words with ONLY consonants.
str_view(words[!str_detect(words, "[aeiou]")]) ALSO finds words with ONLY consonants, by negating a simpler match.
str_view(words, "a.*b|b.*a") matches words containing a and b in either order.
words[str_detect(words, "a") & str_detect(words, "b")] is an easier way to find words that contain both letters.
words[str_detect(words, "a.*e.*i.*o.*u")] only finds words with ALL five vowels in that exact order. To allow any order, a single pattern would need every ordering, so it is simpler to combine 5 str_detect() calls with &.
words[ str_detect(words, "a") & str_detect(words, "e") & str_detect(words, "i") & str_detect(words, "o") & str_detect(words, "u")]
str_flatten() and str_c() can create strings to use inside regex functions.
rgb <- c("red", "green", "blue")
j <- str_c("\\b(", str_flatten(rgb, "|"), ")\\b")
str_view(sentences, j)

## [2] │ Glue the sheet to the dark <blue> background.

## [26] │ Two <blue> fish swam in the tank.
## [92] │ A wisp of cloud hung in the <blue> air.
## [148] │ The spot on the blotter was made by <green> ink.
## [160] │ The sofa cushion is <red> and of light weight.
## [174] │ The sky that morning was clear and bright <blue>.
## [204] │ A <blue> crane is a tall wading bird.
## [217] │ It is hard to erase <blue> or <red> ink.
## [224] │ The lamp shone with a steady <green> flame.
## [247] │ The box is held by a bright <red> snapper.
## [256] │ The houses are built of <red> clay bricks.
## [274] │ The <red> tape bound the smuggled food.
## [288] │ Hedge apples may stain your hands <green>.
## [302] │ The plant grew large and <green> in the window.
## [330] │ Bathe and relax in the cool <green> grass.
## [368] │ The lake sparkled in the <red> hot sun.
## [372] │ Mark the spot with a sign painted <red>.
## [452] │ The couch cover and hall drapes were <blue>.
## [491] │ A man in a <blue> sweater sat at the desk.
## [551] │ The small <red> neon lamp went out.
## ... and 6 more

15.6.4 Exercises
1. For each of the following challenges, try solving it by using both a
single regular expression, and a combination of multiple
[str_detect()] (https://stringr.tidyverse.org/reference/str_detect.html)
calls.

a. Find all words that start or end with x.

start_r <- str_detect(words, "^x")
end_r <- str_detect(words, "x$")
words[start_r | end_r]

## [1] "box" "sex" "six" "tax"

b. Find all words that start with a vowel and end with a consonant.
start_r <- str_detect(words, "^[aeiou]")
end_r <- str_detect(words, "[^aeiou]$")

words[start_r & end_r]

## [1] "about" "accept" "account" "across"

"act"
## [6] "actual" "add" "address" "admit"
"affect"
## [11] "afford" "after" "afternoon" "again"
"against"
## [16] "agent" "air" "all" "allow"
"almost"
## [21] "along" "already" "alright" "although"
"always"
## [26] "amount" "and" "another" "answer"
"any"
## [31] "apart" "apparent" "appear" "apply"
"appoint"
## [36] "approach" "arm" "around" "art"
"as"
## [41] "ask" "at" "attend" "authority"
"away"
## [46] "awful" "each" "early" "east"
"easy"
## [51] "eat" "economy" "effect" "egg"
"eight"
## [56] "either" "elect" "electric" "eleven"
"employ"
## [61] "end" "english" "enjoy" "enough"
"enter"
## [66] "environment" "equal" "especial" "even"
"evening"
## [71] "ever" "every" "exact" "except"
"exist"
## [76] "expect" "explain" "express" "identify"
"if"
## [81] "important" "in" "indeed" "individual"
"industry"
## [86] "inform" "instead" "interest" "invest"
"it"
## [91] "item" "obvious" "occasion" "odd"
"of"
## [96] "off" "offer" "often" "okay"
"old"
## [101] "on" "only" "open" "opportunity"
"or"
## [106] "order" "original" "other" "ought"
"out"
## [111] "over" "own" "under" "understand"
"union"
## [116] "unit" "university" "unless" "until"
"up"
## [121] "upon" "usual"

c. Are there any words that contain at least one of each different
vowel?
vowels <-
str_detect(words, "a") & str_detect(words, "e") &
str_detect(words, "i") &
str_detect(words, "o") & str_detect(words, "u")

words[vowels]

## character(0)
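
The answers above use the multiple str_detect() route. A sketch of the single-regex versions, compared against the multi-call results, could look like this. The variable names are introduced here, and the lookahead pattern for part c is one possible approach rather than the book's solution.

```r
library(stringr)

# a. Start or end with x.
a_single <- str_subset(words, "^x|x$")
a_multi  <- words[str_detect(words, "^x") | str_detect(words, "x$")]

# b. Start with a vowel and end with a consonant.
b_single <- str_subset(words, "^[aeiou].*[^aeiou]$")
b_multi  <- words[str_detect(words, "^[aeiou]") & str_detect(words, "[^aeiou]$")]

# c. Contain every vowel, in any order: one lookahead per vowel.
c_single <- str_subset(words, "^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)")
c_multi  <- words[str_detect(words, "a") & str_detect(words, "e") &
                    str_detect(words, "i") & str_detect(words, "o") &
                    str_detect(words, "u")]

identical(a_single, a_multi)
identical(b_single, b_multi)
identical(c_single, c_multi)
```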

2. Construct patterns to find evidence for and against the rule “i before e
except after c”?
rule <- str_detect(words, "[A-Za-z]*(cei|[^c]ie)[A-Za-z]*")

pattern_1a = "\\b\\w*ie\\w*\\b"
pattern_1b = "\\b\\w+ei\\w*\\b"

pattern_2a = "\\b\\w*cei\\w*\\b"
pattern_2b = "\\b\\w*cie\\w*\\b"
words[str_detect(words, pattern_1a)]

## [1] "achieve" "believe" "brief" "client" "die"

## [6] "experience" "field" "friend" "lie" "piece"

## [11] "quiet" "science" "society" "tie" "view"

# Words which contain "e" before an "i", thus giving evidence against
# the rule, unless there is a preceding "c"
words[str_detect(words, pattern_1b)]

## [1] "receive" "weigh"

# Words which contain "e" before an "i" after "c", thus following the rule.
# That is, evidence in favour of the rule
words[str_detect(words, pattern_2a)]
## [1] "receive"

# Words which contain an "i" before "e" after "c", thus violating the rule.
# That is, evidence against the rule
words[str_detect(words, pattern_2b)]

## [1] "science" "society"

3. colors() contains a number of modifiers like “lightgray” and

“darkblue”. How could you automatically identify these modifiers?
(Think about how you might detect and then remove the colors that
are modified).
col_vec = colours(distinct = TRUE)
col_vec = col_vec[!str_detect(col_vec, "\\b\\w*\\d\\w*\\b")]

col_vec[str_detect(col_vec, "\\b(?:light|dark)\\w*\\b")]

## [1] "darkgoldenrod" "darkgray" "darkgreen"

## [4] "darkkhaki" "darkmagenta" "darkolivegreen"

## [7] "darkorange" "darkorchid" "darkred"

## [10] "darksalmon" "darkseagreen" "darkslateblue"

## [13] "darkslategray" "darkturquoise" "darkviolet"

## [16] "lightblue" "lightcoral" "lightcyan"

## [19] "lightgoldenrod" "lightgoldenrodyellow" "lightgray"

## [22] "lightgreen" "lightpink" "lightsalmon"

## [25] "lightseagreen" "lightskyblue" "lightslateblue"

## [28] "lightslategray" "lightsteelblue" "lightyellow"

4. Create a regular expression that finds any base R dataset. You can get
a list of these datasets via a special use of the data() function:
data(package = "datasets")$results[, "Item"]. Note that a number
of old datasets are individual vectors; these contain the name of the
grouping “data frame” in parentheses, so you’ll need to strip those off.
# Extract all base R datasets into a character vector
base_r_packs = data(package = "datasets")$results[, "Item"]

# Remove all the names of grouping data.frames in parenthesis

base_r_packs = str_replace_all(base_r_packs,
pattern = "\\([^()]+\\)",
replacement = "")
# Remove the whitespace, i.e., " " left after removing the parenthesised words
base_r_packs = str_replace_all(base_r_packs,
pattern = "\\s+$",
replacement = "")

# Create the regular expression

huge_regex = str_c("\\b(", str_flatten(base_r_packs, "|"), ")\\b")

15.7 Regex in other places

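3 cool tidyverse places that use regex: matches(pattern) for selecting columns, the names_pattern argument of pivot_longer(), and the delim argument of separate_longer_delim() and separate_wider_delim(). MS DP!!!!!! lol my name.
Base R: apropos("replace") lists all objects available from the global environment whose names match the pattern replace.
list.files() can also use REGEX, through its pattern argument, to match files with specific names.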
head(list.files(pattern = "\\.Rmd$"))

## [1] "Chapter-15.Rmd" "Chapter 15.Rmd"
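
Other base R functions take regex patterns as well. A small sketch with grepl() on made-up file names (the names in files are hypothetical), using the same pattern as the list.files() call above:

```r
# Hypothetical file names, so the check does not depend on the working directory.
files <- c("Chapter-15.Rmd", "notes.Rmd.bak", "README.md")

# \\. is a literal period and $ anchors the match to the end of the name.
is_rmd <- grepl("\\.Rmd$", files)
is_rmd  # TRUE FALSE FALSE
```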

4 pages
Base R
No ratings yet
Base R
9 pages
DR - Pierpaolo-Delser - Introduction R
No ratings yet
DR - Pierpaolo-Delser - Introduction R
83 pages
Intro 2 R
No ratings yet
Intro 2 R
206 pages
Regular Expressions
No ratings yet
Regular Expressions
9 pages
R Fundamentals
No ratings yet
R Fundamentals
41 pages
Advanced String Patterns: Wolfram Mathematica ® Tutorial Collection
No ratings yet
Advanced String Patterns: Wolfram Mathematica ® Tutorial Collection
40 pages
UNIT-4 (Regular Expressions)
No ratings yet
UNIT-4 (Regular Expressions)
25 pages
Data Editor
No ratings yet
Data Editor
6 pages
13B RegExp
No ratings yet
13B RegExp
38 pages
Module-2 String, Date and Time, Data Preparation Example Code
No ratings yet
Module-2 String, Date and Time, Data Preparation Example Code
18 pages
Package GRR': Topics Documented
No ratings yet
Package GRR': Topics Documented
11 pages
Bio503 Version
No ratings yet
Bio503 Version
256 pages
Lec 07-II-DSFa23
No ratings yet
Lec 07-II-DSFa23
44 pages
R Basic and Advanced
No ratings yet
R Basic and Advanced
9 pages
Unit4 Advanced R Programming
No ratings yet
Unit4 Advanced R Programming
34 pages
Regular Expression HOWTO: Guido Van Rossum Fred L. Drake, JR., Editor
No ratings yet
Regular Expression HOWTO: Guido Van Rossum Fred L. Drake, JR., Editor
18 pages
Intro 2 R
No ratings yet
Intro 2 R
206 pages
Regular Expression HOWTO: Guido Van Rossum Fred L. Drake, JR., Editor
100% (1)
Regular Expression HOWTO: Guido Van Rossum Fred L. Drake, JR., Editor
18 pages
R Comandos
No ratings yet
R Comandos
13 pages
Regular Expressions Cheat Sheet
No ratings yet
Regular Expressions Cheat Sheet
5 pages
Regular Expression HOWTO: Guido Van Rossum Fred L. Drake, JR., Editor
No ratings yet
Regular Expression HOWTO: Guido Van Rossum Fred L. Drake, JR., Editor
18 pages
Introduction To Rlogistic
No ratings yet
Introduction To Rlogistic
135 pages
Working With Text Data in R
No ratings yet
Working With Text Data in R
1 page
Advanced - Regular Expressions Tutorial
No ratings yet
Advanced - Regular Expressions Tutorial
8 pages
Lec 07 II Dsfa23
No ratings yet
Lec 07 II Dsfa23
44 pages
Python Unit 3
No ratings yet
Python Unit 3
46 pages
The R Inferno: Patrick Burns 30th April 2011
No ratings yet
The R Inferno: Patrick Burns 30th April 2011
126 pages
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
150+ C Pattern Programs
From Everand
150+ C Pattern Programs
Hernando Abella
No ratings yet
Learn Python through Nursery Rhymes and Fairy Tales: Classic Stories Translated into Python Programs (Coding for Kids and Beginners)
From Everand
Learn Python through Nursery Rhymes and Fairy Tales: Classic Stories Translated into Python Programs (Coding for Kids and Beginners)
Shari Eskenas
5/5 (1)
Computer Engineering Laboratory Solution Primer
From Everand
Computer Engineering Laboratory Solution Primer
Karan Bhandari
No ratings yet
Astronomy Distances
No ratings yet
Astronomy Distances
23 pages
Trial Test 1
No ratings yet
Trial Test 1
2 pages
Arun Resume Final
No ratings yet
Arun Resume Final
1 page
Sample Thesis Civil Engineering Students
100% (3)
Sample Thesis Civil Engineering Students
5 pages
Sample Product Evaluation Questionnaire PDF
No ratings yet
Sample Product Evaluation Questionnaire PDF
15 pages
Solution Manual For Applied Partial Differential Equations With Fourier Series and Boundary Value Problems, 5/E Richard Haberman
100% (9)
Solution Manual For Applied Partial Differential Equations With Fourier Series and Boundary Value Problems, 5/E Richard Haberman
42 pages
Transition Words or Connectors Categories Cause or Reason Effect/result/consequence Contrast Condition
No ratings yet
Transition Words or Connectors Categories Cause or Reason Effect/result/consequence Contrast Condition
2 pages
Generation Alpha Student Behaviour Research
No ratings yet
Generation Alpha Student Behaviour Research
18 pages
Stator Core Vibration and Temperature Analysis of Hydropower Generation Unit at 100 HZ Frequency
No ratings yet
Stator Core Vibration and Temperature Analysis of Hydropower Generation Unit at 100 HZ Frequency
6 pages
Amanuel Ashenafi
No ratings yet
Amanuel Ashenafi
85 pages
Revised Time-Table - BCA I III V Sem Main ATKT M.Sc. I III Sem Main ATKT Exam - 20.11.2019
No ratings yet
Revised Time-Table - BCA I III V Sem Main ATKT M.Sc. I III Sem Main ATKT Exam - 20.11.2019
1 page
TDR 240 Spec
No ratings yet
TDR 240 Spec
4 pages
Homework For Students With Disabilities Should Be Used Quizlet
100% (1)
Homework For Students With Disabilities Should Be Used Quizlet
5 pages
STTP - Brochure - RTICE-24 - 27.1.24 - 2
No ratings yet
STTP - Brochure - RTICE-24 - 27.1.24 - 2
2 pages
12th Maths Half Yearly Exam Original Question Paper 2022 Tirunelveli DT
No ratings yet
12th Maths Half Yearly Exam Original Question Paper 2022 Tirunelveli DT
4 pages
Les Valeurs Du Dirigeant Et La Croissance de La PME: Véra Ivanaj and Sybil Géhin
No ratings yet
Les Valeurs Du Dirigeant Et La Croissance de La PME: Véra Ivanaj and Sybil Géhin
29 pages
Ds-Syllabus of Pa2 Term II, STD Viii, 23-24
No ratings yet
Ds-Syllabus of Pa2 Term II, STD Viii, 23-24
1 page
CET Sail Mathematics, Algebra (2023)
No ratings yet
CET Sail Mathematics, Algebra (2023)
22 pages
Pre Submission Review - 0
No ratings yet
Pre Submission Review - 0
9 pages
Cambridge IGCSE: PHYSICS 0625/63
No ratings yet
Cambridge IGCSE: PHYSICS 0625/63
16 pages
23 1ere Compo Nov P2
No ratings yet
23 1ere Compo Nov P2
20 pages
Automata Revision
No ratings yet
Automata Revision
6 pages
04.02 Cells and The Structure of Life Guided Notes: Objectives
100% (1)
04.02 Cells and The Structure of Life Guided Notes: Objectives
2 pages
Lesson Plan 1
No ratings yet
Lesson Plan 1
2 pages
Viva, Oral For Sem Exam-Comm. Skills
No ratings yet
Viva, Oral For Sem Exam-Comm. Skills
2 pages
JS Mill's View On Liberty
No ratings yet
JS Mill's View On Liberty
4 pages

Chapter 15

Uploaded by

Chapter 15

Uploaded by

Chapter 15

## ── Attaching core tidyverse packages ────────────────────────

#The period after the a is a METAcharacter.

#The 3 periods match any 3 letters inside the fruit.

Quantifiers control # of times a pattern can match:

 + lets a pattern repeat (i.e. it matches at least once)—just like the

 * lets a pattern be optional or repeat (i.e. it matches any number of

# ab+ matches an "a", followed by at least one "b".

# ab* matches an "a", followed by any number of "b"s.

Character classes are denoted by []. [Nemo] matches the letters N, e, m,

Alternation picks between n>1 alternate patterns w | sign.

## [9] │ bl<oo>d orange

Chapter 15.2 Key Functions

## [1] TRUE FALSE FALSE

o <- babynames |>

## [1] "-ppl-" "p--r" "b-n-n-"

x <- c("apple", "pear", "banana")

## [1] "ppl" "pr" "bnn"

Extract variables with separate_wider_regex(). It’s the 3rd cousin (so

df2 <- df |>

“Mariaguadalupe” has the most amount of vowels. “Louie” has the

2. Replace all forward slashes in "a/b/c/d/e" with backslashes. What

Doing the reverse throws an error bc “\” throws an error as an escape

3. Implement a simple version of str_to_lower() using

## [1] "other branches opened in floral city in 1958, and hernando

4. Create a regular expression that will match telephone numbers as

# But the expression itself only contains one \

# And this tells R to look for an explicit .

To match a literal ., $, |, *, +, ?, {, }, (, ), use a character class: [.], [$], [|],

str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")

str_replace_all("abc", c("$", "^", "\\b"), "--")

## [1] "abc--" "--abc" "--abc--"

x <- "abcd ABCD 12345 -!@#%."

## [1] │ <abc>d ABCD 12345 -!@#%.

x2 <- str_view(x, "[a-z]+")

## [1] │ <abcd> ABCD 12345 -!@#%.

x3 <- str_view(x, "[^a-z0-9]+")

## [1] │ abcd< ABCD >12345< -!@#%.>

# You need an escape to match characters that are otherwise special

x5 <- str_view("a-b-c", "[a\\-c]")

 \w matches any “word” character, i.e. letters and numbers;

x <- "abcd ABCD 12345 -!@#%."

## [1] │ abcd ABCD <12345> -!@#%.

## [1] │ <abcd ABCD >12345< -!@#%.>

## [1] │ abcd< >ABCD< >12345< >-!@#%.

## [1] │ <abcd> <ABCD> <12345> <-!@#%.>

## [1] │ <abcd> <ABCD> <12345> -!@#%.

## [1] │ abcd< >ABCD< >12345< -!@#%.>

#Match words that start and end with repeated letters.

#Switches the 2nd and 3rd word.

## [1] │ The canoe birch slid on the smooth planks.

#Convert the expressions to a TIBBLE & name the columns. Form of

Use ?: to create a non-capturing group.

# Pattern to match the literal string

input_string <- "\"$^$\""

# Pattern to match the literal string

3. Given the corpus of common words in stringr::words, create regular

b. Don’t start with “y”.

c. End with “x”.

d. Are exactly three letters long. (Don’t cheat by using

e. Have seven letters or more.

## [1] "absolute" "account" "achieve" "address"

f. Contain a vowel-consonant pair.

g. Contain at least two vowel-consonant pairs in a row.

h. Only consist of repeated vowel-consonant pairs.

4. Create 11 regular expressions that match the British or American

for (pattern in patterns_to_detect) {

## [1] "The **airplane** is made of **aluminum**. The **analog**

6. Describe in words what these regular expressions match: (read

a. ^.*$ Match an entire string.

b. "\\{.+\\}" Matches an expression like {abc}.

c. \d{4}-\d{2}-\d{2} Matches digits of specified lengths with

e. \..\..\.. It matches strings like “.a.b.c” or “.1.2.3”

f. (.)\1\1 The parentheses (.) capture any single character, and \1

g. "(..)\\1" Matches strings with 2 identical characters repeated.

7. Solve the beginner regexp crosswords at

str_view(c("abc", "a.c", "ac", "a c"), ".[]c")

## [1] "The airplane is made of aluminum. The analog