ATOMs:-
\ escapes a special (meta) character so that it is matched literally, e.g. \*, \+, \[, \]
\d Any digit (same as [0-9])
\D Any nondigit (same as [^0-9])
\w Any alphanumeric character (upper- or lowercase) or underscore
(same as [a-zA-Z0-9_])
\W Any character that is not alphanumeric or underscore
(same as [^a-zA-Z0-9_])
\s Any whitespace character (same as [ \f\n\r\t\v])
\S Any nonwhitespace character (same as [^ \f\n\r\t\v])
\Q...\E Anything between \Q and \E is matched literally, exactly as it is
written (not supported by all implementations)
[\b] Backspace
\f Form feed
\n Line feed
\r Carriage return
\t Tab
\v Vertical tab
\cA Represents Control+A
\x1A Represents the character with hexadecimal code 1A
\067 Represents the character with octal code 67 (written \o{67} in some implementations)
Note:- Windows uses the \r\n sequence for end-of-line, while Unix uses only \n
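The end-of-line note can be tried out with a short Python sketch. The sample text is hypothetical, and Python's re module is only one of many implementations, so treat this as an illustration rather than a general rule.

```python
import re

# Hypothetical sample text with mixed Windows (\r\n) and Unix (\n) line endings.
text = "first\r\nsecond\nthird\r\n"

# Normalise Windows line endings to Unix ones.
unix_text = re.sub(r"\r\n", "\n", text)

# Split on either style of line ending (after dropping the final one).
lines = re.split(r"\r?\n", text.rstrip("\r\n"))

# \x1A is the character with hexadecimal code 1A (decimal 26).
has_sub = re.search(r"\x1A", "a\x1ab") is not None

print(repr(unix_text))
print(lines)
print(has_sub)
```

Using \r?\n in the split pattern lets one regex handle both conventions without converting the text first.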
. Any single character (in most implementations, except a line feed), e.g. ... matches any three characters
[] Any one character listed inside it, e.g. 1[ab?4-6] matches '1a',
'1b', '1?', '14', '15', '16'
^ Used inside brackets [] to match everything except the listed characters. ^ must be the
first character inside []
e.g. [^0-9a-z] matches any character except digits and lowercase
letters
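A minimal Python sketch of the shorthand classes and bracket sets. It assumes Python's re module; the re.ASCII flag is used because Python otherwise extends \d, \w and \s to Unicode, which goes beyond the ASCII ranges these notes describe. The sample strings are made up for illustration.

```python
import re

# re.ASCII keeps \d and \w limited to ASCII digits, letters and underscore.
digits = re.findall(r"\d+", "room 42, floor 7", flags=re.ASCII)
words = re.findall(r"\w+", "snake_case and CamelCase!", flags=re.ASCII)

# A bracket set matches exactly one character from the set.
candidates = ["1a", "1b", "1?", "14", "16", "ab", "17"]
set_hits = [s for s in candidates if re.fullmatch(r"1[ab?4-6]", s)]

# A leading ^ negates the set: anything except digits and lowercase letters.
negated = re.findall(r"[^0-9a-z]", "ab12-CD")

print(digits)    # ['42', '7']
print(words)     # ['snake_case', 'and', 'CamelCase']
print(set_hits)  # ['1a', '1b', '1?', '14', '16']
print(negated)   # ['-', 'C', 'D']
```

Note that 'ab' is rejected by 1[ab?4-6]: the set stands for a single character, and the leading 1 is still required.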
() Defines a subexpression so that a repetition operator can be applied to it as a
whole.
Subexpressions can be nested.
e.g. zz(ab)+ matches zzab, zzabab, zzabababab
The value matched by a subexpression can be referenced later in the regex using \N,
where N is the number of the subexpression, counted by opening parenthesis from the left.
e.g. ([ab]+)zz\1 will match azza, bbzzbb, aaazzaaa but not aazzbb
\N can also be used in the replacement string to denote the value of the Nth subexpression.
This is called back-referencing and is very useful in situations where
matching the next
characters depends on what has already been matched.
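Grouping and back-referencing can be checked with a small Python sketch. It assumes Python's re module, where \1 works both inside the pattern and in the replacement string; the key=value sample is hypothetical.

```python
import re

# (ab)+ repeats the whole group, not just the last character.
repeated = [s for s in ["zzab", "zzabab", "zza", "zzabb"]
            if re.fullmatch(r"zz(ab)+", s)]

# \1 must repeat exactly the text that group 1 captured.
pattern = re.compile(r"([ab]+)zz\1")
backref = [s for s in ["azza", "bbzzbb", "aaazzaaa", "aazzbb"]
           if pattern.fullmatch(s)]

# In a replacement string, \1 and \2 insert the captured groups.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

print(repeated)  # ['zzab', 'zzabab']
print(backref)   # ['azza', 'bbzzbb', 'aaazzaaa']
print(swapped)   # value=key
```

The string aazzbb fails because \1 requires the same text as the capture ('aa'), not merely another match of [ab]+.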
Branching Atoms:-
| Separates alternative branches; the regex matches if any one branch matches, e.g. ((ab)+)|(a+)|(b+)
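A short Python sketch of branching, using the example pattern from these notes. The classify helper and its labels are hypothetical names introduced only for illustration; the group numbers follow the opening parentheses from the left.

```python
import re

# Groups: 1 = ((ab)+), 2 = (ab), 3 = (a+), 4 = (b+)
pattern = re.compile(r"((ab)+)|(a+)|(b+)")

def classify(text):
    """Return a label for the branch that matched all of text, or None."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    if m.group(1) is not None:
        return "ab-repeats"
    if m.group(3) is not None:
        return "a-run"
    return "b-run"

for sample in ["abab", "aaa", "bb", "aab"]:
    print(sample, classify(sample))
```

Groups that belong to a branch that did not take part in the match are reported as None, which is how the helper tells the branches apart.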
Specifying Repetition of an Atom (used after an atom):-
* 0 or more e.g a*
+ 1 or more e.g a+
? 0 or 1 e.g a?
{6} Exactly 6 occurrences; e.g. [abc]{3} matches any three characters from the set, such as aaa, abc, cba
{2,6} Between 2 and 6 occurrences
{3,} At least 3 occurrences
*, +, ? and {} are greedy operators and look for the longest match.
To make them lazy, so that they look for the shortest match, append the ? symbol, as in *?,
+?, {3,}?
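The difference between greedy and lazy repetition is easiest to see side by side. This is a minimal Python sketch with a made-up sample string, assuming Python's re module.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last possible </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first possible </b>.
lazy = re.findall(r"<b>.*?</b>", html)

# {m,n} bounds the repeat count; the lazy form stops at the minimum.
bounded_greedy = re.match(r"a{2,6}", "aaaaaaaa").group()
bounded_lazy = re.match(r"a{2,6}?", "aaaaaaaa").group()

# [abc]{3} accepts any three characters from the set, not only repeats.
exact = re.fullmatch(r"[abc]{3}", "cab") is not None

print(greedy)          # ['<b>one</b><b>two</b>']
print(lazy)            # ['<b>one</b>', '<b>two</b>']
print(bounded_greedy)  # aaaaaa
print(bounded_lazy)    # aa
```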
Specifying Position:-
Specifying Word Boundaries:-
\b Matches the start or end of a word. It matches only a position,
not any character or whitespace.
e.g. cat\b matches the 'cat' at the end of the word cataaacataacat.
\B Matches any position that is not a word boundary.
e.g. \Bcat\B matches the 'cat' that is at neither the start nor the end of the word
cataaacataacat.
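A small Python sketch that locates the word-boundary matches in the example word by their start offsets. It assumes Python's re module; the second sample sentence is hypothetical.

```python
import re

word = "cataaacataacat"

# cat\b: only the final "cat" is followed by a word boundary.
end_hits = [m.start() for m in re.finditer(r"cat\b", word)]

# \Bcat\B: only the middle "cat" has word characters on both sides.
inner_hits = [m.start() for m in re.finditer(r"\Bcat\B", word)]

# \bcat\b matches "cat" only as a whole word.
whole = re.findall(r"\bcat\b", "cat concat cat.")

print(end_hits)    # [11]
print(inner_hits)  # [6]
print(whole)       # ['cat', 'cat']
```

The offsets confirm that \b and \B consume no characters: each match is still exactly three characters long.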
Specifying String Boundaries:-
^ Matches the start of the string
$ Matches the end of the string, e.g. to list all rar files -- ls -R | grep
"rar$"
(?m) Starting a regular expression with this flag turns on multiline
mode.
In this mode ^ and $ also match at the start and end of each line, not only of the whole string.
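The effect of (?m) can be sketched in Python on a hypothetical multi-line file listing. This assumes Python's re module, which accepts the inline (?m) flag at the start of the pattern.

```python
import re

# Hypothetical file listing, one name per line.
listing = "a.rar\nnotes.txt\nb.rar"

# Without (?m), $ matches only at the end of the whole string.
single = re.findall(r"\w+\.rar$", listing)

# With (?m), ^ and $ also match at the start and end of every line.
multi = re.findall(r"(?m)^\w+\.rar$", listing)

print(single)  # ['b.rar']
print(multi)   # ['a.rar', 'b.rar']
```

Tools such as grep process input one line at a time, which is why "rar$" works there without any multiline flag.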
Look-ahead and look-behind:- (not supported by all
implementations)
A positive look-ahead or look-behind checks that the text after or before the current position
matches the specified pattern.
Whatever is looked ahead or behind is not included in the match.
(?=) Positive look-ahead; a+(?=@) matches 'aaa' and the 'aa' just before the last @ in aaa@nchf@aanbfaa@aa. Note:
@ is not included in the match.
(?<=) Positive look-behind; (?<=@)a+ matches the 'aa' just after the second and third @ in aaa@nchf@aanbfaa@aa.
A negative look-ahead or look-behind checks that the text after or before the current position
does not match the specified pattern.
(?!) Negative look-ahead
(?<!) Negative look-behind
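The four look-around forms can be compared in a Python sketch. The first two patterns and their sample text come from these notes; the percentage and dollar examples for the negative forms are assumptions added for illustration.

```python
import re

text = "aaa@nchf@aanbfaa@aa"

# Positive look-ahead: runs of a that are followed by @.
ahead = re.findall(r"a+(?=@)", text)

# Positive look-behind: runs of a that are preceded by @.
behind = re.findall(r"(?<=@)a+", text)

# Negative look-ahead: whole numbers NOT followed by %.
not_ahead = re.findall(r"\b\d+\b(?!%)", "10% of 25")

# Negative look-behind: whole numbers NOT preceded by $.
not_behind = re.findall(r"(?<!\$)\b\d+", "$5 and 7")

print(ahead)       # ['aaa', 'aa']
print(behind)      # ['aa', 'aa']
print(not_ahead)   # ['25']
print(not_behind)  # ['7']
```

The \b anchors in the negative examples matter: without them the engine could backtrack and match part of a number (for instance the 1 in 10%) to satisfy the assertion.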
Condition-based check:- (not supported by all
implementations)
Syntax:- (?(backreference number)regex_if_backreference_found|regex_if_not_found)
This type of regex selects which sub-regex to use based on whether an
earlier subexpression
took part in the match.
e.g. (\d)?[a-z]*(?(1)@|!) matches 1ashdjha@, gasfdg!
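Python's re module is one implementation that supports this conditional syntax, so the idea can be sketched there. The sketch assumes group 1 is made optional with ?, because a mandatory (\d) would always be set and the "not found" branch could never be taken.

```python
import re

# Group 1 is optional, so the conditional can take either branch:
# if a leading digit was captured the string must end in @, otherwise in !.
pattern = re.compile(r"(\d)?[a-z]*(?(1)@|!)")

samples = ["1ashdjha@", "gasfdg!", "1ashdjha!", "gasfdg@"]
results = {s: pattern.fullmatch(s) is not None for s in samples}

for sample, ok in results.items():
    print(sample, ok)
```

The last two samples are rejected because their ending does not agree with the presence or absence of the leading digit.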
For Replacement:-
The complete matched string is represented by $& in some implementations.
A portion of the matched string is represented by \N, e.g. \1,
where N is the number of the subexpression.
Where $& is not supported, you can put the whole regex inside () and then
reference the whole match as \1.
Some regex implementations support case conversion in the replacement string via the
metacharacters listed below
\l Convert next character to lowercase
\u Convert next character to uppercase
\L Convert all characters up to \E to lowercase
\U Convert all characters up to \E to uppercase
\E Terminate \L or \U conversion
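Replacement syntax varies between implementations, and Python's re module is a case in point: it does not accept $& or the \u, \U, \l, \L operators in replacement strings. The sketch below shows what Python offers instead; the sample text is hypothetical.

```python
import re

text = "error: disk full"

# \g<0> is Python's spelling of the whole match ($& in some other tools).
bracketed = re.sub(r"\w+", r"[\g<0>]", text)

# Where no whole-match token exists, wrap the regex in () and use \1.
wrapped = re.sub(r"(\w+)", r"[\1]", text)

# Python has no \u or \U in replacement strings; a replacement function
# can do the case conversion instead (here: uppercase the first letter).
capitalised = re.sub(r"\b(\w)(\w*)",
                     lambda m: m.group(1).upper() + m.group(2),
                     text)

print(bracketed)    # [error]: [disk] [full]
print(wrapped)      # [error]: [disk] [full]
print(capitalised)  # Error: Disk Full
```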
Examples (purpose, find string, replace string):-
To remove the initial line number from an ACL: find "( *[0-9]+)( +)((permit)|(deny))", replace with
"\3"
To remove a complete line and its line feed (Notepad++, extended mode
only): find "\r\n<exact key>", replace with an empty string
To remove space between lines
To remove leading spaces
To remove trailing spaces
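The ACL example can be run as written in Python. The notes give no patterns for the last three tasks, so the three patterns at the end of this sketch are assumptions (with "space between lines" read as blank lines), not the author's originals; the sample data is hypothetical too.

```python
import re

# Hypothetical ACL listing with leading line numbers.
acl = "   10 permit ip any any\n   20 deny   tcp any any"

# Find "( *[0-9]+)( +)((permit)|(deny))", replace with "\3".
no_numbers = re.sub(r"( *[0-9]+)( +)((permit)|(deny))", r"\3", acl)

messy = "  padded line  \n\n\n  second line\t\n"

# Assumed patterns, not taken from the notes:
no_leading = re.sub(r"(?m)^[ \t]+", "", messy)   # leading spaces/tabs
no_trailing = re.sub(r"(?m)[ \t]+$", "", messy)  # trailing spaces/tabs
no_blank = re.sub(r"\n{2,}", "\n", messy)        # blank lines

print(no_numbers)
print(repr(no_leading))
print(repr(no_trailing))
print(repr(no_blank))
```

[ \t] is used rather than \s in the assumed patterns so that the line feeds themselves are left alone.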