@@ -10,47 +10,101 @@ This module implements regular expression operations. Regular expression
  syntax supported is a subset of CPython ``re`` module (and actually is
  a subset of POSIX extended regular expressions).

- Supported operators are:
+ Supported operators and special sequences are:

- ``'.'``
+ ``.``
     Match any character.

- ``'[...]'``
+ ``[...]``
     Match set of characters. Individual characters and ranges are supported,
     including negated sets (e.g. ``[^a-c]``).

- ``'^'``
+ ``^``
     Match the start of the string.

- ``'$'``
+ ``$``
     Match the end of the string.

- ``'?'``
-    Match zero or one of the previous entity.
+ ``?``
+    Match zero or one of the previous sub-pattern.

- ``'*'``
-    Match zero or more of the previous entity.
+ ``*``
+    Match zero or more of the previous sub-pattern.

- ``'+'``
-    Match one or more of the previous entity.
+ ``+``
+    Match one or more of the previous sub-pattern.

- ``'??'``
+ ``??``
+    Non-greedy version of ``?``, match zero or one, with the preference
+    for zero.

- ``'*?'``
+ ``*?``
+    Non-greedy version of ``*``, match zero or more, with the preference
+    for the shortest match.

- ``'+?'``
+ ``+?``
+    Non-greedy version of ``+``, match one or more, with the preference
+    for the shortest match.

- ``'|'``
-    Match either the LHS or the RHS of this operator.
+ ``|``
+    Match either the left-hand side or the right-hand side sub-pattern of
+    this operator.

- ``'(...)'``
+ ``(...)``
     Grouping. Each group is capturing (a substring it captures can be accessed
     with `match.group()` method).

- **NOT SUPPORTED**: Counted repetitions (``{m,n}``), more advanced assertions
- (``\b``, ``\B``), named groups (``(?P<name>...)``), non-capturing groups
- (``(?:...)``), etc.
+ ``\d``
+    Match a digit. Equivalent to ``[0-9]``.

+ ``\D``
+    Match a non-digit. Equivalent to ``[^0-9]``.
+
+ ``\s``
+    Match a whitespace character. Equivalent to ``[ \t-\r]``.
+
+ ``\S``
+    Match a non-whitespace character. Equivalent to ``[^ \t-\r]``.
+
+ ``\w``
+    Match a "word character" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.
+
+ ``\W``
+    Match a non-"word character" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.
+
+ ``\``
+    Escape character. Any other character following the backslash, except
+    for those listed above, is taken literally. For example, ``\*`` is
+    equivalent to a literal ``*`` (not treated as the ``*`` operator).
+    Note that ``\r``, ``\n``, etc. are not handled specially and are
+    equivalent to the literal letters ``r``, ``n``, etc. Because of this,
+    raw Python strings (``r""``) are not recommended for regular
+    expressions. For example, ``r"\r\n"`` used as a regular expression
+    is equivalent to ``"rn"``. To match a CR character followed by LF,
+    use ``"\r\n"``.
+
+ **NOT SUPPORTED**:
+
+ * counted repetitions (``{m,n}``)
+ * named groups (``(?P<name>...)``)
+ * non-capturing groups (``(?:...)``)
+ * more advanced assertions (``\b``, ``\B``)
+ * special character escapes like ``\r``, ``\n`` - use Python's own escaping
+   instead
+ * etc.
+
+ Example::
+
+     import ure
+
+     # As ure doesn't support escapes itself, use of r"" strings is not
+     # recommended.
+     regex = ure.compile("[\r\n]")
+
+     regex.split("line1\rline2\nline3\r\n")
+
+     # Result:
+     # ['line1', 'line2', 'line3', '', '']

  Functions
  ---------
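Review note: the greedy and non-greedy operators documented in this hunk can be compared side by side. The sketch below is only an illustration and assumes CPython's `re` module as a stand-in for `ure`, on the basis that the documented syntax is a subset of `re`. It has not been run on a MicroPython port, and the sample strings are made up for the example.

```python
import re  # CPython stand-in (assumption); on MicroPython this would be ure

text = "<a><b>"

# Greedy: ".*" consumes as much as possible, so the group spans both tags.
greedy = re.match("<(.*)>", text).group(1)

# Non-greedy: ".*?" stops at the first ">" that lets the match succeed.
lazy = re.match("<(.*?)>", text).group(1)

# "??" prefers zero repetitions; "+?" prefers the single required repetition.
optional_lazy = re.match("a??", "aaa").group(0)
plus_lazy = re.match("a+?", "aaa").group(0)

print(repr(greedy), repr(lazy), repr(optional_lazy), repr(plus_lazy))
```

None of these patterns contains a backslash, so the raw-string caveat described in the patch does not come into play here.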
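Review note: the escaping advice in this patch rests on what Python does to a string literal before the regex engine ever sees it. A minimal sketch of that step follows, again assuming CPython's `re` as a stand-in for `ure`. One caveat: CPython's `re` also understands `\r` and `\n` inside patterns, so this sketch cannot reproduce the `"rn"` behaviour the patch describes for `ure`; it only shows the string-literal side and the split result quoted in the example.

```python
import re  # CPython stand-in (assumption); on MicroPython this would be ure

plain = "\r\n"   # Python converts the escapes: CR followed by LF, 2 characters
raw = r"\r\n"    # raw string keeps the backslashes: 4 characters

print(len(plain), len(raw))

# The pattern below already contains real CR and LF characters inside the
# set, so the regex engine does not need to interpret any escapes itself.
regex = re.compile("[\r\n]")
parts = regex.split("line1\rline2\nline3\r\n")

print(parts)
```

The two trailing empty strings come from the final `\r\n`: each of the two characters is a separate delimiter, so an empty field appears between them and another after the last one.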