Regular expressions (C++)

08/06/2021

The C++ standard library supports multiple regular expression grammars. This topic discusses the grammar variations available when using regular expressions.

Regular expression grammar

The regular expression grammar to use is specified by one of the std::regex_constants::syntax_option_type enumeration values. These regular expression grammars are defined in std::regex_constants:

ECMAScript: This is closest to the grammar used by JavaScript and the .NET
languages.
basic: The POSIX basic regular expressions or BRE.
extended: The POSIX extended regular expressions or ERE.
awk: This is extended, but it has more escapes for non-printing characters.
grep: This is basic, but it also allows newline (\n) characters to separate alternations.
egrep: This is extended, but it also allows newline characters to separate alternations.

By default, if no grammar is specified, ECMAScript is assumed. Only one grammar may be specified.

Several flags can also be applied:

icase: Ignore case when matching.

nosubs: Ignore marked matches (that is, expressions in parentheses); no substitutions are stored.

optimize: Make matching faster, at the possible expense of greater construction time.

collate: Use locale-sensitive collation sequences (for example, ranges of the form [a-z]).

Zero or more flags may be combined with the grammar to specify the regular expression
engine behavior. If only flags are specified, ECMAScript is assumed as the grammar.
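The sketch below shows how a grammar and flags combine when a std::regex is constructed. The helper `matches` is an illustrative name and is not part of the library. The grammar is spelled out as `std::regex::ECMAScript | std::regex::icase` rather than `icase` alone so the grammar stays visible.

```cpp
#include <regex>
#include <string>

// Illustrative helper: true when the whole of `target` matches `pattern`,
// compiled with the given grammar and flags (ECMAScript if none are given).
bool matches(const std::string& pattern, const std::string& target,
             std::regex_constants::syntax_option_type options =
                 std::regex_constants::ECMAScript) {
    const std::regex re(pattern, options);
    return std::regex_match(target, re);
}

// Usage:
//   matches("ABC", "abc")                                   -> false
//   matches("ABC", "abc", std::regex::ECMAScript | std::regex::icase)
//                                                           -> true
//   matches("a+b", "aab", std::regex::extended)             -> true ('+' repeats)
//   matches("a+b", "a+b", std::regex::basic)                -> true ('+' is ordinary)
```

The same pattern text can mean different things under different grammars. This is why exactly one grammar should be chosen.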

Element
An element can be one of the following:

An ordinary character that matches the same character in the target sequence.

A wildcard character '.' that matches any character in the target sequence except a
newline.

A bracket expression of the form [expr], which matches a character or a collation element in the target sequence that is also in the set defined by the expression expr, or of the form [^expr], which matches a character or a collation element in the target sequence that isn't in the set defined by the expression expr.

The expression expr can contain any combination of the following:

An individual character. Adds the character to the set defined by expr .

A character range of the form ch1-ch2 . Adds the characters that are represented
by values in the closed range [ch1, ch2] to the set defined by expr .

A character class of the form [:name:] . Adds the characters in the named class to
the set defined by expr .

An equivalence class of the form [=elt=] . Adds the collating elements that are
equivalent to elt to the set defined by expr .

A collating symbol of the form [.elt.] . Adds the collation element elt to the set
defined by expr .

An anchor. Anchor ^ matches the beginning of the target sequence. Anchor $ matches the end of the target sequence.

A capture group of the form (subexpression), or \(subexpression\) in basic and grep, which matches the sequence of characters in the target sequence that is matched by the pattern between the delimiters.

An identity escape of the form \k , which matches the character k in the target
sequence.

Examples:

a matches the target sequence "a" but doesn't match the target sequences "B", "b", or "c".

. matches all the target sequences "a", "B", "b", and "c".

[b-z] matches the target sequences "b" and "c" but doesn't match the target sequences "a" or "B".

[[:lower:]] matches the target sequences "a", "b", and "c" but doesn't match the target sequence "B".

(a) matches the target sequence "a" and associates capture group 1 with the subsequence "a", but doesn't match the target sequences "B", "b", or "c".

In ECMAScript, basic, and grep, an element can also be a back reference of the form \dd ,
where dd represents a decimal value N that matches a sequence of characters in the target
sequence that is the same as the sequence of characters that is matched by the Nth
capture group.
For example, (a)\1 matches the target sequence "aa" because the first (and only)
capture group matches the initial sequence "a" and then the \1 matches the final
sequence "a" .
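The element and back-reference examples above can be checked with a short sketch that uses the default ECMAScript grammar. The helpers `full_match` and `captured` are illustrative names, not library functions.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Illustrative helper: whole-sequence match with the default ECMAScript grammar.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative helper: text captured by group `n` after a whole-sequence
// match, or an empty string when the target doesn't match.
std::string captured(const std::string& pattern, const std::string& target,
                     std::size_t n) {
    std::smatch m;
    const std::regex re(pattern);
    if (!std::regex_match(target, m, re)) return {};
    return m[n].str();
}

// Usage:
//   full_match("[[:lower:]]", "b")  -> true
//   captured("(a)", "a", 1)         -> "a"
//   full_match("(a)\\1", "aa")      -> true (\1 repeats what group 1 captured)
```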

In ECMAScript, an element can also be one of the following:

A non-capture group of the form (?:subexpression). Matches the sequence of characters in the target sequence that is matched by the pattern between the delimiters.

A limited file format escape of the form \f, \n, \r, \t, or \v. These match a form feed, newline, carriage return, horizontal tab, and vertical tab, respectively, in the target sequence.

A positive assert of the form (?=subexpression). Matches the sequence of characters in the target sequence that is matched by the pattern between the delimiters, but doesn't change the match position in the target sequence.

A negative assert of the form (?!subexpression). Succeeds only if the pattern between the delimiters doesn't match at the current position, and doesn't change the match position in the target sequence.

A hexadecimal escape sequence of the form \xhh . Matches a character in the target
sequence that is represented by the two hexadecimal digits hh .

A unicode escape sequence of the form \uhhhh . Matches a character in the target
sequence that is represented by the four hexadecimal digits hhhh .

A control escape sequence of the form \ck . Matches the control character that is
named by the character k .

A word boundary assert of the form \b . Matches when the current position in the
target sequence is immediately after a word boundary.

A negative word boundary assert of the form \B . Matches when the current position
in the target sequence isn't immediately after a word boundary.

A dsw character escape of the form \d, \D, \s, \S, \w, or \W. Provides a short name for a character class.

Examples:

(?:a) matches the target sequence "a" , but "(?:a)\1" is invalid because there's
no capture group 1.

(?=a)a matches the target sequence "a". The positive assert matches the initial sequence "a" in the target sequence, and the final "a" in the regular expression then matches that same initial sequence "a" in the target sequence.

(?!a)a doesn't match the target sequence "a".

a\b. matches the target sequence "a~" , but doesn't match the target sequence
"ab" .

a\B. matches the target sequence "ab" , but doesn't match the target sequence
"a~" .

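Here is a sketch that exercises the ECMAScript-only elements listed above. The helpers `full_match` and `is_invalid` are illustrative. An invalid pattern is detected by catching std::regex_error from the std::regex constructor.

```cpp
#include <regex>
#include <string>

// ECMAScript is the default grammar, so no syntax flag is passed here.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative helper: true when constructing the pattern throws std::regex_error.
bool is_invalid(const std::string& pattern) {
    try {
        std::regex re(pattern);
    } catch (const std::regex_error&) {
        return true;
    }
    return false;
}

// Usage:
//   full_match("(?=a)a", "a")   -> true  (lookahead doesn't consume)
//   full_match("(?!a)a", "a")   -> false
//   full_match("a\\b.", "a~")   -> true  (boundary between 'a' and '~')
//   is_invalid("(?:a)\\1")      -> true  (no capture group 1)
```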
In awk, an element can also be one of the following:

A file format escape of the form \\, \a, \b, \f, \n, \r, \t, or \v. These match a backslash, alert, backspace, form feed, newline, carriage return, horizontal tab, and vertical tab, respectively, in the target sequence.

An octal escape sequence of the form \ooo . Matches a character in the target
sequence whose representation is the value represented by the one, two, or three
octal digits ooo .
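The following sketch shows the awk-only escapes and assumes ASCII character encoding. The helper `awk_match` is illustrative. std::regex::awk is the standard flag that selects this grammar.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence match using the awk grammar, which
// adds C-style file format escapes and octal escapes.
bool awk_match(const std::string& pattern, const std::string& target) {
    const std::regex re(pattern, std::regex::awk);
    return std::regex_match(target, re);
}

// Usage (ASCII assumed):
//   awk_match("\\101", "A")     -> true (octal 101 is 65, the code of 'A')
//   awk_match("a\\tb", "a\tb")  -> true (\t is a horizontal tab)
```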

Repetition
Any element other than a positive assert, a negative assert, or an anchor can be followed
by a repetition count. The most general kind of repetition count takes the form {min,max},
or \{min,max\} in basic and grep. An element that is followed by this form of repetition
count matches at least min successive occurrences and no more than max successive
occurrences of a sequence that matches the element.

For example, a{2,3} matches the target sequence "aa" and the target sequence "aaa", but not the target sequence "a" or the target sequence "aaaa".

A repetition count can also take one of the following forms:

{min} or \{min\} in basic and grep. Equivalent to {min,min}.

{min,} or \{min,\} in basic and grep. Equivalent to {min,unbounded}.

* is equivalent to {0,unbounded}.

Examples:

a{2} matches the target sequence "aa" but not the target sequence "a" or the
target sequence "aaa" .

a{2,} matches the target sequence "aa" , the target sequence "aaa" , and so on,
but doesn't match the target sequence "a" .

a* matches the target sequence "" , the target sequence "a" , the target sequence
"aa" , and so on.

For all grammars except basic and grep, a repetition count can also take one of the
following forms:

? is equivalent to {0,1}.

+ is equivalent to {1,unbounded}.

Examples:

a? matches the target sequence "" and the target sequence "a" , but not the target
sequence "aa" .

a+ matches the target sequence "a" , the target sequence "aa" , and so on, but not
the target sequence "" .
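The repetition examples above can be checked with this sketch. The helper `full_match` is illustrative. The last usage line shows the basic grammar, where the braces must be escaped.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence match under the given grammar
// (ECMAScript by default).
bool full_match(const std::string& pattern, const std::string& target,
                std::regex_constants::syntax_option_type grammar =
                    std::regex_constants::ECMAScript) {
    const std::regex re(pattern, grammar);
    return std::regex_match(target, re);
}

// Usage:
//   full_match("a{2,3}", "aaa")                         -> true
//   full_match("a{2,}", "a")                            -> false
//   full_match("a?", "")                                -> true
//   full_match("a\\{2,3\\}", "aa", std::regex::basic)  -> true (BRE braces)
```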

In ECMAScript, all the forms of repetition count can be followed by the character ? which
designates a non-greedy repetition.

Concatenation
Regular expression elements, with or without repetition counts, can be concatenated to
form longer regular expressions. The resulting expression matches a target sequence that
is a concatenation of the sequences that are matched by the individual elements.

For example, a{2,3}b matches the target sequence "aab" and the target sequence
"aaab" , but doesn't match the target sequence "ab" or the target sequence "aaaab" .

Alternation
In all regular expression grammars except basic and grep, a concatenated regular
expression can be followed by the character | (pipe) and another concatenated regular
expression. Any number of concatenated regular expressions can be combined in this
manner. The resulting expression matches any target sequence that matches one or more
of the concatenated regular expressions.

When more than one of the concatenated regular expressions match the target sequence,
ECMAScript chooses the first of the concatenated regular expressions that matches the
sequence as the match, which will be referred to as the first match. The other regular
expression grammars choose the one that achieves the longest match.

For example, ab|cd matches the target sequence "ab" and the target sequence "cd" , but
doesn't match the target sequence "abd" or the target sequence "acd" .

In grep and egrep, a newline character (\n) can be used to separate alternations.
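This sketch checks the concatenation and alternation examples above. The helper `full_match` is illustrative. The final usage line assumes that, because basic has no alternation, | is an ordinary character there.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence match under the given grammar
// (ECMAScript by default).
bool full_match(const std::string& pattern, const std::string& target,
                std::regex_constants::syntax_option_type grammar =
                    std::regex_constants::ECMAScript) {
    const std::regex re(pattern, grammar);
    return std::regex_match(target, re);
}

// Usage:
//   full_match("a{2,3}b", "aaab")                    -> true  (concatenation)
//   full_match("ab|cd", "cd")                        -> true  (alternation)
//   full_match("ab|cd", "abd")                       -> false
//   full_match("ab|cd", "ab|cd", std::regex::basic)  -> true  ('|' is literal)
```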

Subexpression
In basic and grep, a subexpression is a concatenation. In the other regular expression
grammars, a subexpression is an alternation.

Grammar summary
The following list shows, for each feature, the regular expression grammars that support it:

alternation using |: extended, ECMAScript, egrep, awk

alternation using \n: grep, egrep

anchor: basic, extended, ECMAScript, grep, egrep, awk

back reference: basic, ECMAScript, grep

bracket expression: basic, extended, ECMAScript, grep, egrep, awk

capture group using (): extended, ECMAScript, egrep, awk

capture group using \(\): basic, grep

control escape sequence: ECMAScript

dsw character escape: ECMAScript

file format escape: ECMAScript, awk

hexadecimal escape sequence: ECMAScript

identity escape: basic, extended, ECMAScript, grep, egrep, awk

negative assert: ECMAScript

negative word boundary assert: ECMAScript

non-capture group: ECMAScript

non-greedy repetition: ECMAScript

octal escape sequence: awk

ordinary character: basic, extended, ECMAScript, grep, egrep, awk

positive assert: ECMAScript

repetition using {}: extended, ECMAScript, egrep, awk

repetition using \{\}: basic, grep

repetition using *: basic, extended, ECMAScript, grep, egrep, awk

repetition using ? and +: extended, ECMAScript, egrep, awk

unicode escape sequence: ECMAScript

wildcard character: basic, extended, ECMAScript, grep, egrep, awk

word boundary assert: ECMAScript

Semantic details
Anchor
An anchor matches a position in the target string, not a character. A ^ matches the
beginning of the target string, and a $ matches the end of the target string.
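Because anchors match positions rather than characters, they are easiest to see with a search. In the sketch below, `found` is an illustrative helper that wraps std::regex_search.

```cpp
#include <regex>
#include <string>

// Illustrative helper: true when some subsequence of `target` matches.
bool found(const std::string& pattern, const std::string& target) {
    return std::regex_search(target, std::regex(pattern));
}

// Usage:
//   found("^ab", "abc")  -> true  (ab sits at the beginning)
//   found("^ab", "cab")  -> false
//   found("bc$", "abc")  -> true  (bc sits at the end)
```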

Back reference
A back reference is a backslash that is followed by a decimal value N. It matches the
contents of the Nth capture group. The value of N must not be more than the number of
capture groups that precede the back reference. In basic and grep, the value of N is
determined by the decimal digit that follows the backslash. In ECMAScript, the value of N is
determined by all the decimal digits that immediately follow the backslash. Therefore, in
basic and grep, the value of N is never more than 9, even if the regular expression has
more than nine capture groups. In ECMAScript, the value of N is unbounded.

Examples:

((a+)(b+))(c+)\3 matches the target sequence "aabbbcbbb". The back reference \3 matches the text in the third capture group, that is, the "(b+)". It doesn't match the target sequence "aabbbcbb".

(a)\2 isn't valid.

(b(((((((((a))))))))))\10 has different meanings in basic and in ECMAScript. In basic, the back reference is \1. The back reference matches the contents of the first capture group (that is, the one that begins with (b and ends with the final ) and comes before the back reference), and the final 0 matches the ordinary character 0. In ECMAScript, the back reference is \10. It matches the tenth capture group, that is, the innermost one.
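Here is a sketch of the ECMAScript back-reference rules above. The helpers are illustrative. `ten_groups_then_backref` builds the nested pattern programmatically so the parentheses can't be miscounted. Only the ECMAScript reading of \10 is shown.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence ECMAScript match.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative helper: true when std::regex rejects the pattern.
bool is_invalid(const std::string& pattern) {
    try {
        std::regex re(pattern);
    } catch (const std::regex_error&) {
        return true;
    }
    return false;
}

// Builds (b(((((((((a))))))))))\10: ten capture groups, where the tenth
// (innermost) one captures "a", followed by the back reference \10.
std::string ten_groups_then_backref() {
    return "(b" + std::string(9, '(') + "a" + std::string(10, ')') + "\\10";
}

// Usage:
//   full_match("((a+)(b+))(c+)\\3", "aabbbcbbb")   -> true
//   is_invalid("(a)\\2")                           -> true
//   full_match(ten_groups_then_backref(), "baa")   -> true (\10 is group 10)
```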

Bracket expression
A bracket expression defines a set of characters and collating elements. When the bracket
expression begins with the character ^ the match succeeds if no elements in the set match
the current character in the target sequence. Otherwise, the match succeeds if any one of
the elements in the set matches the current character in the target sequence.

The set of characters can be defined by listing any combination of individual characters,
character ranges, character classes, equivalence classes, and collating symbols.

Capture group
A capture group marks its contents as a single unit in the regular expression grammar and
labels the target text that matches its contents. The label that is associated with each
capture group is a number, which is determined by counting the opening parentheses that
mark capture groups up to and including the opening parenthesis that marks the current
capture group. In this implementation, the maximum number of capture groups is 31.

Examples:

ab+ matches the target sequence "abb" , but doesn't match the target sequence
"abab" .

(ab)+ doesn't match the target sequence "abb" , but matches the target sequence
"abab" .

((a+)(b+))(c+) matches the target sequence "aabbbc" and associates capture group 1 with the subsequence "aabbb", capture group 2 with the subsequence "aa", capture group 3 with "bbb", and capture group 4 with the subsequence "c".
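The numbering rule (count opening parentheses from the left) can be observed through std::smatch. In the sketch below, `captures` is an illustrative helper that collects groups 1 through N after a whole-sequence match.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Illustrative helper: whole-sequence ECMAScript match.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative helper: the text of capture groups 1..N, in numbering order,
// or an empty vector when the target doesn't match.
std::vector<std::string> captures(const std::string& pattern,
                                  const std::string& target) {
    std::smatch m;
    std::vector<std::string> out;
    const std::regex re(pattern);
    if (std::regex_match(target, m, re))
        for (std::size_t i = 1; i < m.size(); ++i) out.push_back(m[i].str());
    return out;
}

// Usage:
//   captures("((a+)(b+))(c+)", "aabbbc")  -> {"aabbb", "aa", "bbb", "c"}
```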

Character class
A character class in a bracket expression adds all the characters in the named class to the
character set that is defined by the bracket expression. To create a character class, use [:
followed by the name of the class, followed by :] .

Internally, names of character classes are recognized by calling id = traits.lookup_classname. A character ch belongs to such a class if traits.isctype(ch, id) returns true. The default regex_traits template supports the following class names:

alnum: lowercase letters, uppercase letters, and digits

alpha: lowercase letters and uppercase letters

blank: space or tab

cntrl: the file format escape characters

digit: digits

graph: lowercase letters, uppercase letters, digits, and punctuation

lower: lowercase letters

print: lowercase letters, uppercase letters, digits, punctuation, and space

punct: punctuation

space: whitespace characters (in the "C" locale: space, tab, newline, vertical tab, form feed, and carriage return)

upper: uppercase letters

xdigit: digits, a, b, c, d, e, f, A, B, C, D, E, F

d: same as digit

s: same as space

w: same as alnum, plus the underscore character (_)
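The lookup described above can be reproduced directly with std::regex_traits<char>. This is a sketch only. `in_class` is an illustrative helper, and the default ("C") global locale is assumed.

```cpp
#include <regex>
#include <string>

// Illustrative helper: resolves a class name the way [[:name:]] does and
// asks the default traits object whether `ch` belongs to that class.
// An unrecognized name yields an empty class, so the result is then false.
bool in_class(char ch, const std::string& name) {
    const std::regex_traits<char> traits;
    const auto id = traits.lookup_classname(name.begin(), name.end());
    return traits.isctype(ch, id);
}

// Usage:
//   in_class('7', "digit")  -> true
//   in_class('_', "w")      -> true  (w adds the underscore)
//   in_class('_', "alnum")  -> false
```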

Character range
A character range in a bracket expression adds all the characters in the range to the
character set that is defined by the bracket expression. To create a character range, put the
character '-' between the first and last characters in the range. A character range puts all
characters that have a numeric value that is more than or equal to the numeric value of the
first character, and less than or equal to the numeric value of the last character, into the
set. Notice that this set of added characters depends on the platform-specific
representation of characters. If the character '-' occurs at the beginning or the end of a
bracket expression, or as the first or last character of a character range, it represents itself.

Examples:

[0-7] represents the set of characters { 0, 1, 2, 3, 4, 5, 6, 7 }. It matches the target sequences "0", "1", and so on, but not "a".

On systems that use ASCII character encoding, [h-k] represents the set of characters
{ h , i , j , k }. It matches the target sequences "h" , "i" , and so on, but not "\x8A"
or "0" .

On systems that use EBCDIC character encoding, [h-k] represents the set of
characters { h , i , '\x8A' , '\x8B' , '\x8C' , '\x8D' , '\x8E' , '\x8F' , '\x90' , j , k
} ( h is encoded as 0x88 and k is encoded as 0x92 ). It matches the target sequences
"h" , "i" , "\x8A" , and so on, but not "0" .

[-0-24] represents the set of characters { -, 0, 1, 2, 4 }.

[0-2-] represents the set of characters { 0, 1, 2, - }.

On systems that use ASCII character encoding, [+--] represents the set of characters { '+', ',', '-' }, because the comma (0x2C) lies between + (0x2B) and - (0x2D).

However, when locale-sensitive ranges are used, the characters in a range are determined
by the collation rules for the locale. Characters that collate after the first character in the
definition of the range and before the last character in the definition of the range are in the
set. The two end characters are also in the set.
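The range examples above can be checked under the default (non-collate) ECMAScript grammar, assuming ASCII encoding. `full_match` is an illustrative helper.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence ECMAScript match, no collate flag,
// so ranges compare numeric character values.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Usage (ASCII assumed):
//   full_match("[-0-24]", "-")  -> true  (leading '-' is literal)
//   full_match("[0-2-]", "-")   -> true  (trailing '-' is literal)
//   full_match("[+--]", ",")    -> true  (',' lies between '+' and '-')
```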

Collating element
A collating element is a multi-character sequence that is treated as a single character.

Collating symbol
A collating symbol in a bracket expression adds a collating element to the set that is
defined by the bracket expression. To create a collating symbol, use [. followed by the
collating element, followed by .]

Control escape sequence

A control escape sequence is a backslash followed by the letter 'c' followed by one of the
letters 'a' through 'z' or 'A' through 'Z' . It matches the ASCII control character that
is named by that letter. For example, "\ci" matches the target sequence "\x09" , because
Ctrl+I has the value 0x09 .

DSW character escape

A dsw character escape is a short name for a character class. Each escape below is followed by its equivalent named class and by its default named class:

\d: [[:d:]], by default [[:digit:]]

\D: [^[:d:]], by default [^[:digit:]]

\s: [[:s:]], by default [[:space:]]

\S: [^[:s:]], by default [^[:space:]]

\w: [[:w:]], by default [a-zA-Z0-9_] (in the ASCII character set)

\W: [^[:w:]], by default [^a-zA-Z0-9_] (in the ASCII character set)
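Here is a small sketch of the dsw escapes, both on their own and inside a bracket expression. `full_match` is an illustrative helper, and ECMAScript is the default grammar.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence ECMAScript match.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Usage:
//   full_match("\\w+", "abc_123")   -> true  (\w includes '_')
//   full_match("\\W", "_")          -> false
//   full_match("[\\d\\s]+", "1 2")  -> true  (escapes inside brackets)
```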

Equivalence class
An equivalence class in a bracket expression adds all the characters and collating elements
that are equivalent to the collating element in the equivalence class definition to the set
that is defined by the bracket expression.

To create an equivalence class, use [= followed by a collating element followed by =] .

Internally, two collating elements elt1 and elt2 are equivalent if traits.transform_primary(elt1.begin(), elt1.end()) == traits.transform_primary(elt2.begin(), elt2.end()).

File format escape

A file format escape consists of the usual C language character escape sequences, \\, \a, \b, \f, \n, \r, \t, \v. These have the usual meanings, that is, backslash, alert, backspace, form feed, newline, carriage return, horizontal tab, and vertical tab, respectively. In ECMAScript, \a and \b aren't allowed. (\\ is allowed, but it's an identity escape, not a file format escape.)

Hexadecimal escape sequence

A hexadecimal escape sequence is a backslash followed by the letter x followed by two hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence that has the value that is specified by the two digits.

For example, "\x41" matches the target sequence "A" when ASCII character encoding is used.

Identity escape
An identity escape is a backslash followed by a single character. It matches that character.
It's required when the character has a special meaning. Using the identity escape removes
the special meaning. For example:

a* matches the target sequence "aaa" , but doesn't match the target sequence
"a*" .

a\* doesn't match the target sequence "aaa" , but matches the target sequence
"a*" .

The set of characters that are allowed in an identity escape depends on the regular expression grammar, as follows:

basic, grep: { ( ) { } . [ \ * ^ $ }

extended, egrep: { ( ) { . [ \ * ^ $ + ? | }

awk: extended plus { " / }

ECMAScript: All characters except those that can be part of an identifier. Typically, this includes letters, digits, $, _, and unicode escape sequences. For more information, see the ECMAScript Language Specification.
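The sketch below shows how an identity escape removes a character's special meaning in ECMAScript and in basic. `full_match` is an illustrative helper.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence match under the given grammar
// (ECMAScript by default).
bool full_match(const std::string& pattern, const std::string& target,
                std::regex_constants::syntax_option_type grammar =
                    std::regex_constants::ECMAScript) {
    const std::regex re(pattern, grammar);
    return std::regex_match(target, re);
}

// Usage:
//   full_match("a*", "aaa")                         -> true  ('*' repeats)
//   full_match("a\\*", "a*")                        -> true  ('*' is literal)
//   full_match("a\\.b", "a.b", std::regex::basic)   -> true  ('.' is literal)
```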

Individual character
An individual character in a bracket expression adds that character to the character set that
is defined by the bracket expression. Anywhere in a bracket expression except at the
beginning, a ^ represents itself.

Examples:

[abc] matches the target sequences "a" , "b" , and "c" , but not the sequence "d" .

[^abc] matches the target sequence "d" , but not the target sequences "a" , "b" , or
"c" .

[a^bc] matches the target sequences "a" , "b" , "c" , and "^" , but not the target
sequence "d" .

In all regular expression grammars except ECMAScript, if a ] is the first character that
follows the opening [ or is the first character that follows an initial ^ , it represents itself.

Examples:

[]a is invalid because there's no ] to end the bracket expression.

[]abc] matches the target sequences "a" , "b" , "c" , and "]" , but not the target
sequence "d" .

[^]abc] matches the target sequence "d", but not the target sequences "a", "b", "c", or "]".

In ECMAScript, use \] to represent the character ] in a bracket expression.

Examples:

[]a matches the target sequence "a" because the bracket expression is empty.

[\]abc] matches the target sequences "a" , "b" , "c" , and "]" but not the target
sequence "d" .
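The following sketch covers the bracket-expression examples above, including the two ways of writing ] inside brackets. `full_match` is an illustrative helper. The extended grammar stands in for "all grammars except ECMAScript".

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence match under the given grammar
// (ECMAScript by default).
bool full_match(const std::string& pattern, const std::string& target,
                std::regex_constants::syntax_option_type grammar =
                    std::regex_constants::ECMAScript) {
    const std::regex re(pattern, grammar);
    return std::regex_match(target, re);
}

// Usage:
//   full_match("[a^bc]", "^")                          -> true ('^' not first)
//   full_match("[\\]abc]", "]")                        -> true (ECMAScript)
//   full_match("[]abc]", "]", std::regex::extended)    -> true (']' first)
```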

Negative assert
A negative assert succeeds only if its contents don't match at the current position. It doesn't consume any characters in the target sequence.

For example, (?!aa)(a*) matches the target sequence "a" and associates capture group 1 with the subsequence "a". It doesn't match the target sequence "aa" or the target sequence "aaa".
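This sketch reproduces the negative-assert example. `group1` is an illustrative helper that returns std::nullopt when the whole-sequence match fails.

```cpp
#include <optional>
#include <regex>
#include <string>

// Illustrative helper: capture group 1 after a whole-sequence ECMAScript
// match, or std::nullopt when the target doesn't match.
std::optional<std::string> group1(const std::string& pattern,
                                  const std::string& target) {
    std::smatch m;
    const std::regex re(pattern);
    if (!std::regex_match(target, m, re)) return std::nullopt;
    return m[1].str();
}

// Usage:
//   group1("(?!aa)(a*)", "a")   -> "a"
//   group1("(?!aa)(a*)", "aa")  -> std::nullopt ("aa" follows, so the assert fails)
```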

Negative word boundary assert

A negative word boundary assert matches if the current position in the target string isn't
immediately after a word boundary.

Non-capture group
A non-capture group marks its contents as a single unit in the regular expression grammar,
but doesn't label the target text.

For example, (a)(?:b)*(c) matches the target text "abbc" and associates capture group
1 with the subsequence "a" and capture group 2 with the subsequence "c" .
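The sketch below shows that a non-capture group doesn't shift the group numbering, while an ordinary group does. `captures` is an illustrative helper.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Illustrative helper: text of capture groups 1..N after a whole-sequence
// ECMAScript match, or an empty vector when the target doesn't match.
std::vector<std::string> captures(const std::string& pattern,
                                  const std::string& target) {
    std::smatch m;
    std::vector<std::string> out;
    const std::regex re(pattern);
    if (std::regex_match(target, m, re))
        for (std::size_t i = 1; i < m.size(); ++i) out.push_back(m[i].str());
    return out;
}

// Usage:
//   captures("(a)(?:b)*(c)", "abbc")  -> {"a", "c"}
//   captures("(a)(b)*(c)", "abbc")    -> {"a", "b", "c"} (last repetition kept)
```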

Non-greedy repetition
A non-greedy repetition consumes the shortest subsequence of the target sequence that matches the pattern. A greedy repetition consumes the longest. For example, both (a+)(a*b) and its non-greedy form (a+?)(a*b) match the target sequence "aaab", but they divide it differently.

When the non-greedy repetition (a+?)(a*b) is used, it associates capture group 1 with the subsequence "a" at the beginning of the target sequence and capture group 2 with the subsequence "aab" at the end of the target sequence.

When the greedy repetition (a+)(a*b) is used, it associates capture group 1 with the subsequence "aaa" and capture group 2 with the subsequence "b".
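The greedy and non-greedy splits of "aaab" can be compared directly. `two_groups` is an illustrative helper.

```cpp
#include <regex>
#include <string>
#include <utility>

// Illustrative helper: capture groups 1 and 2 after a whole-sequence
// ECMAScript match (both empty when the target doesn't match).
std::pair<std::string, std::string> two_groups(const std::string& pattern,
                                               const std::string& target) {
    std::smatch m;
    const std::regex re(pattern);
    if (!std::regex_match(target, m, re)) return {};
    return {m[1].str(), m[2].str()};
}

// Usage:
//   two_groups("(a+)(a*b)", "aaab")   -> {"aaa", "b"}   (greedy)
//   two_groups("(a+?)(a*b)", "aaab")  -> {"a", "aab"}   (non-greedy)
```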

Octal escape sequence

An octal escape sequence is a backslash followed by one, two, or three octal digits (0-7). It
matches a character in the target sequence that has the value that is specified by those
digits. If all the digits are 0 , the sequence is invalid.

For example, \101 matches the target sequence "A" when ASCII character encoding is used.

Ordinary character
An ordinary character is any valid character that doesn't have a special meaning in the
current grammar.

In ECMAScript, the following characters have special meanings:

^ $ \ . * + ? ( ) [ ] { } |

In basic and grep, the following characters have special meanings:

. [ \

Also in basic and grep, the following characters have special meanings when they're used
in a particular context:

* has a special meaning in all cases except when it's the first character in a regular
expression or the first character that follows an initial ^ in a regular expression, or
when it's the first character of a capture group or the first character that follows an
initial ^ in a capture group.

^ has a special meaning when it's the first character of a regular expression.

$ has a special meaning when it's the last character of a regular expression.

In extended, egrep, and awk, the following characters have special meanings:

. [ \ ( * + ? { |

Also in extended, egrep, and awk, the following characters have special meanings when
they're used in a particular context.

) has a special meaning when it matches a preceding (

^ has a special meaning when it's the first character of a regular expression.

$ has a special meaning when it's the last character of a regular expression.

An ordinary character matches the same character in the target sequence. By default, this
means that the match succeeds if the two characters are represented by the same value. In
a case-insensitive match, two characters ch0 and ch1 match if
traits.translate_nocase(ch0) == traits.translate_nocase(ch1) . In a locale-
sensitive match, two characters ch0 and ch1 match if traits.translate(ch0) ==
traits.translate(ch1) .

Positive assert
A positive assert matches its contents, but doesn't consume any characters in the target
sequence.

Examples:

(?=aa)(a*) matches the target sequence "aaaa" and associates capture group 1 with the subsequence "aaaa".

(aa)(a*) matches the target sequence "aaaa" and associates capture group 1 with the subsequence "aa" at the beginning of the target sequence and capture group 2 with the subsequence "aa" at the end of the target sequence.

(?=aa)(a*)|(a) matches the target sequence "a" and associates capture group 1 with an empty sequence (because the positive assert failed) and capture group 2 with the subsequence "a". It also matches the target sequence "aa" and associates capture group 1 with the subsequence "aa" and capture group 2 with an empty sequence.
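The sketch below reproduces the positive-assert examples. `groups` is an illustrative helper. It reports a group that took no part in the match as "<unmatched>" so that it can be told apart from a group that captured an empty string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Illustrative helper: for each capture group after a whole-sequence
// ECMAScript match, the captured text, or "<unmatched>" if the group
// didn't participate. Returns an empty vector when there is no match.
std::vector<std::string> groups(const std::string& pattern,
                                const std::string& target) {
    std::smatch m;
    std::vector<std::string> out;
    const std::regex re(pattern);
    if (std::regex_match(target, m, re))
        for (std::size_t i = 1; i < m.size(); ++i)
            out.push_back(m[i].matched ? m[i].str() : "<unmatched>");
    return out;
}

// Usage:
//   groups("(?=aa)(a*)", "aaaa")     -> {"aaaa"}
//   groups("(?=aa)(a*)|(a)", "a")    -> {"<unmatched>", "a"}
//   groups("(?=aa)(a*)|(a)", "aa")   -> {"aa", "<unmatched>"}
```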

Unicode escape sequence

A unicode escape sequence is a backslash followed by the letter 'u' followed by four hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence that has the value that is specified by the four digits. For example, \u0041 matches the target sequence "A" when ASCII character encoding is used.

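The hexadecimal and unicode escape examples can be checked with std::regex over char, assuming ASCII encoding. `full_match` is an illustrative helper.

```cpp
#include <regex>
#include <string>

// Illustrative helper: whole-sequence ECMAScript match.
bool full_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Usage (ASCII assumed):
//   full_match("\\x41", "A")          -> true (0x41 is 'A')
//   full_match("\\u0041", "A")        -> true
//   full_match("\\x61\\u0062", "ab")  -> true
```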
Wildcard character
A wildcard character matches any character in the target sequence except a newline.

Word boundary
A word boundary occurs in the following situations:

The current character is at the beginning of the target sequence and is one of the
word characters A-Za-z0-9_

The current character position is past the end of the target sequence and the last
character in the target sequence is one of the word characters.

The current character is one of the word characters and the preceding character isn't.

The current character isn't one of the word characters and the preceding character is.

Word boundary assert

A word boundary assert matches when the current position in the target string is
immediately after a word boundary.
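Word boundaries are most useful when searching for whole words. The sketch below uses `found`, an illustrative wrapper around std::regex_search.

```cpp
#include <regex>
#include <string>

// Illustrative helper: true when some subsequence of `target` matches.
bool found(const std::string& pattern, const std::string& target) {
    return std::regex_search(target, std::regex(pattern));
}

// Usage:
//   found("\\bcat\\b", "a cat sat")  -> true  (cat is a whole word)
//   found("\\bcat\\b", "concat")     -> false (no boundary before 'c')
//   found("\\Bcat", "concat")        -> true
```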

Matching and searching

For a regular expression to match a target sequence, the entire regular expression must
match the entire target sequence. For example, the regular expression bcd matches the
target sequence "bcd" but doesn't match the target sequence "abcd" nor the target
sequence "bcde" .

For a regular expression search to succeed, there must be a subsequence somewhere in the target sequence that matches the regular expression. The search typically finds the leftmost matching subsequence.

Examples:

A search for the regular expression bcd in the target sequence "bcd" succeeds and
matches the entire sequence. The same search in the target sequence "abcd" also
succeeds and matches the last three characters. The same search in the target
sequence "bcde" also succeeds and matches the first three characters.

A search for the regular expression bcd in the target sequence "bcdbcd" succeeds
and matches the first three characters.
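In the standard library, matching corresponds to std::regex_match and searching corresponds to std::regex_search. The helpers `whole_match` and `search_position` below are illustrative.

```cpp
#include <regex>
#include <string>

// Illustrative helper: true only when the entire target matches.
bool whole_match(const std::string& pattern, const std::string& target) {
    return std::regex_match(target, std::regex(pattern));
}

// Illustrative helper: offset of the leftmost match found by a search,
// or -1 when no subsequence matches.
long search_position(const std::string& pattern, const std::string& target) {
    std::smatch m;
    const std::regex re(pattern);
    if (!std::regex_search(target, m, re)) return -1;
    return static_cast<long>(m.position(0));
}

// Usage:
//   whole_match("bcd", "abcd")         -> false
//   search_position("bcd", "abcd")     -> 1
//   search_position("bcd", "bcdbcd")   -> 0 (leftmost match)
```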

If there's more than one subsequence that matches at some location in the target
sequence, there are two ways to choose the matching pattern.

First match chooses the subsequence that was found first when the regular expression is
matched.

Longest match chooses the longest subsequence from the ones that match at that
location. If there's more than one subsequence that has the maximal length, longest match
chooses the one that was found first.

For example, when first match is used, a search for the regular expression b|bc in the
target sequence "abcd" matches the subsequence "b" because the left-hand term of the
alternation matches that subsequence; therefore, first match doesn't try the right-hand
term of the alternation. When longest match is used, the same search matches "bc"
because "bc" is longer than "b" .
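The b|bc example can be run under both rules. ECMAScript uses first match, and a POSIX grammar such as extended is expected to use longest match. `found_text` is an illustrative helper.

```cpp
#include <regex>
#include <string>

// Illustrative helper: the text found by a search under the given grammar,
// or an empty string when nothing matches.
std::string found_text(const std::string& pattern, const std::string& target,
                       std::regex_constants::syntax_option_type grammar) {
    std::smatch m;
    const std::regex re(pattern, grammar);
    if (!std::regex_search(target, m, re)) return {};
    return m.str(0);
}

// Usage:
//   found_text("b|bc", "abcd", std::regex::ECMAScript)  -> "b"  (first match)
//   found_text("b|bc", "abcd", std::regex::extended)    -> "bc" (longest match)
```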

A partial match succeeds if the match reaches the end of the target sequence without
failing, even if it hasn't reached the end of the regular expression. Therefore, after a partial
match succeeds, appending characters to the target sequence could cause a later partial
match to fail. However, after a partial match fails, appending characters to the target
sequence can't cause a later partial match to succeed. For example, with a partial match,
ab matches the target sequence "a" but not "ac" .

Format flags
Each entry below gives the format rule (in ECMAScript format, sed format, or both) followed by the replacement text it produces:

$& (ECMAScript), & (sed): The character sequence that matches the entire regular expression: [match[0].first, match[0].second)

$$ (ECMAScript): $

\& (sed): &

$` (ECMAScript; dollar sign followed by back quote): The character sequence that precedes the subsequence that matches the regular expression: [match.prefix().first, match.prefix().second)

$' (ECMAScript; dollar sign followed by forward quote): The character sequence that follows the subsequence that matches the regular expression: [match.suffix().first, match.suffix().second)

$n (ECMAScript), \n (sed): The character sequence that matches the capture group at position n, where n is a number between 0 and 9: [match[n].first, match[n].second)

\\n (sed): \n

$nn (ECMAScript): The character sequence that matches the capture group at position nn, where nn is a number between 10 and 99: [match[nn].first, match[nn].second)
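These rules apply to the format string passed to std::regex_replace. std::regex_constants::format_sed selects the sed rules, and format_default selects the ECMAScript rules. `replace_all` is an illustrative helper.

```cpp
#include <regex>
#include <string>

// Illustrative helper: replaces every match of `pattern` in `target` with
// `fmt`, read with ECMAScript rules unless `flags` selects sed rules.
std::string replace_all(const std::string& target, const std::string& pattern,
                        const std::string& fmt,
                        std::regex_constants::match_flag_type flags =
                            std::regex_constants::format_default) {
    const std::regex re(pattern);
    return std::regex_replace(target, re, fmt, flags);
}

// Usage:
//   replace_all("abc", "b", "[$&]")                                    -> "a[b]c"
//   replace_all("abc", "b", "[&]", std::regex_constants::format_sed)   -> "a[b]c"
//   replace_all("abc", "b", "$`")                                      -> "aac"
```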
