

Regular Expression 4

The document discusses regular expressions in Python. It covers matching patterns, functions like search(), findall(), split(), and sub(). It discusses special characters like [], ., ^, $, *, etc. and how they are used to match patterns. It provides examples of using regular expressions to match words starting with a particular letter, replacing substrings, and verifying phone numbers.

Uploaded by

patilpatil

Regular Expression

• Regular Expression
String parsing in most programming languages is commonly handled with
regular expressions. In Python, a regular expression is a way of describing
a text pattern to match.
The "re" module, which ships with every Python installation, provides regular
expression support.
In Python, a regular expression search is typically written as:
match = re.search(pattern, string)
The re.search() function takes two arguments, a regular expression pattern
and a string, and searches for that pattern within the string. If the pattern is
found, search() returns a Match object; otherwise it returns None.
In short, a regular expression lets you determine whether a given string
matches a given pattern and, optionally, collect the substrings that contain
relevant information.
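The search-then-collect workflow described above can be sketched as follows. The sample text and the date pattern are illustrative assumptions, not taken from the slides.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on 2024-05-17"

# Parentheses capture the year, month, and day as separate groups.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)

if match:
    # group(0) is the whole match; groups() returns the captured substrings.
    whole = match.group(0)
    year, month, day = match.groups()
    print("matched:", whole)
    print("year:", year, "month:", month, "day:", day)
else:
    print("pattern not found")
```

Testing the result with an if-statement before calling group() avoids an AttributeError when search() returns None.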
• Matching patterns
• Regular expressions are a complicated mini-language. They rely on special
characters to match unknown strings, but let's start with literal characters,
such as letters, numbers, and the space character, which always match
themselves. Let's see a basic example:
import re
search_string = "TutorialsPoint"
pattern = "Tutorials"
# re.match() only looks for the pattern at the start of the string
match = re.match(pattern, search_string)
# The if-statement after match() tests whether it succeeded
if match:
    print("regex matches:", match.group())
else:
    print('pattern not found')

Output-- regex matches: Tutorials
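The example uses re.match(), while the introduction describes re.search(). A minimal sketch of the difference, reusing the same string (the pattern "Point" is chosen here only for illustration):

```python
import re

search_string = "TutorialsPoint"

# re.match() tries the pattern only at the start of the string.
at_start = re.match("Point", search_string)

# re.search() scans the whole string for the first occurrence.
anywhere = re.search("Point", search_string)

print(at_start)           # None, because the string does not start with "Point"
print(anywhere.group())   # Point
```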


• RegEx Functions
• The re module offers a set of functions that allows us to search a string for
a match:

Function   Description
findall    Returns a list containing all matches
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or many matches with a string


• Metacharacters
• Metacharacters are characters with a special meaning:

Character  Example        Description
[]         "[a-m]"        A set of characters
\          "\d"           Signals a special sequence (can also be used to escape special characters)
.          "he..o"        Any character (except newline character)
^          "^hello"       Starts with
$          "planet$"      Ends with
*          "he.*o"        Zero or more occurrences
+          "he.+o"        One or more occurrences
?          "he.?o"        Zero or one occurrence
{}         "he.{2}o"      Exactly the specified number of occurrences
|          "falls|stays"  Either or
()                        Capture and group


import re
txt = "The rain in Spain"
# Find all lower case characters alphabetically between "a" and "m":
x = re.findall("[a-m]", txt)
print(x)

import re
txt = "That will be 59 dollars"
# Find all digit characters (raw string, so the backslash reaches re unchanged):
x = re.findall(r"\d", txt)
print(x)

import re
txt = "hello planet"
# Check if the string ends with 'planet'
x = re.findall("planet$", txt)
if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")
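The examples above cover [], \d, and $. A short sketch of the remaining metacharacters from the table, run against the same "hello planet" string; the "hello|goodbye" alternation and the variable names are illustrative choices:

```python
import re

txt = "hello planet"

# * : "he", zero or more of any character, then "o"
star = re.findall("he.*o", txt)
# + : "he", one or more of any character, then "o"
plus = re.findall("he.+o", txt)
# ? : "he", at most one character, then "o" (too short to reach the "o" here)
optional = re.findall("he.?o", txt)
# {} : "he", exactly two characters, then "o"
exact = re.findall("he.{2}o", txt)
# | : either alternative
either = re.findall("hello|goodbye", txt)
# () : capture two words separated by a space
groups = re.search(r"(\w+) (\w+)", txt).groups()

print(star, plus, optional, exact, either, groups)
```

Only the "?" pattern returns an empty list, because two characters sit between "he" and "o" in "hello".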
• Special Sequences
• A special sequence is a \ followed by one of the characters in the list
below, and has a special meaning:

Character  Example               Description
\A         r"\AThe"              Returns a match if the specified characters are at the beginning of the string
\b         r"\bain", r"ain\b"    Returns a match where the specified characters are at the beginning or at the end of a word
\B         r"\Bain", r"ain\B"    Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word
\d         r"\d"                 Returns a match where the string contains digits (numbers from 0-9)
\D         r"\D"                 Returns a match where the string contains a character that is NOT a digit
\s         r"\s"                 Returns a match where the string contains a white space character
\S         r"\S"                 Returns a match where the string contains a character that is NOT white space
\w         r"\w"                 Returns a match where the string contains any word characters (a to z, A to Z, digits 0-9, and the underscore _)
\W         r"\W"                 Returns a match where the string contains a character that is NOT a word character
\Z         r"Spain\Z"            Returns a match if the specified characters are at the end of the string

(The "r" in front of each pattern makes sure that the string is treated as a "raw string", so the backslash is passed to the re module unchanged.)
import re
txt = "The rain in Spain"
# Check if the string starts with "The":
x = re.findall(r"\AThe", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")

import re
txt = "The rain in Spain"
# Check if "ain" is present, but NOT at the beginning of a word:
x = re.findall(r"\Bain", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
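A few more special sequences from the table, sketched against the same sample strings used in these slides. The variable names are illustrative.

```python
import re

txt = "The rain in Spain"

# \b at the front: "ain" must start a word (no word here does)
starts_word = re.findall(r"\bain", txt)
# \b at the back: "ain" must end a word ("rain" and "Spain")
ends_word = re.findall(r"ain\b", txt)
# \Z: "Spain" must sit at the very end of the string
at_end = re.findall(r"Spain\Z", txt)
# \S+: runs of non-white-space characters, i.e. the words
words = re.findall(r"\S+", txt)
# \d+: runs of digits
digits = re.findall(r"\d+", "That will be 59 dollars")

print(starts_word, ends_word, at_end, words, digits)
```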
• Sets
• A set is a set of characters inside a pair of square brackets [] with a special
meaning:

Set         Description
[arn]       Returns a match where one of the specified characters (a, r, or n) is present
[a-n]       Returns a match for any lower case character, alphabetically between a and n
[^arn]      Returns a match for any character EXCEPT a, r, and n
[0123]      Returns a match where any of the specified digits (0, 1, 2, or 3) is present
[0-9]       Returns a match for any digit between 0 and 9
[0-5][0-9]  Returns a match for any two-digit number from 00 to 59
[a-zA-Z]    Returns a match for any character alphabetically between a and z, lower case OR upper case
[+]         In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string

import re
txt = "The rain in Spain"
# Check if the string has any characters between a and n:
x = re.findall("[a-n]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
-----------------------------------------------------------------------------------------------------------
import re
txt = "8 times before 11:45 AM"
# Check if the string has any digits:
x = re.findall("[0-9]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
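The other rows of the set table can be tried the same way. This is a small sketch; the short strings "rain" and "1+1=2" are made up for illustration.

```python
import re

# [arn]: any single a, r, or n
any_of = re.findall("[arn]", "The rain in Spain")
# [^arn]: any single character except a, r, and n
none_of = re.findall("[^arn]", "rain")
# [0-5][0-9]: two-digit numbers from 00 to 59
two_digit = re.findall("[0-5][0-9]", "8 times before 11:45 AM")
# [+]: inside a set, + is a literal plus sign
plus_signs = re.findall("[+]", "1+1=2")

print(any_of, none_of, two_digit, plus_signs)
```

The lone "8" is skipped by [0-5][0-9] because the pattern needs two digits in a row.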
• The findall() Function
• The findall() function returns a list containing all matches.
import re
# Return a list containing every occurrence of "ai":
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)
----------------------------------------------------------------------------------------------
• The list contains the matches in the order they are found.
• If no matches are found, an empty list is returned:
import re
txt = "The rain in Spain"
# Check if "Portugal" is in the string:
x = re.findall("Portugal", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
• The search() Function
• The search() function searches the string for a match, and returns a Match
object if there is a match.
• If there is more than one match, only the first occurrence of the match will
be returned:
import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
---------------------------------------------------------------------------------------------------
• If no matches are found, the value None is returned:
import re
txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
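Besides start(), a Match object exposes a few other standard members. A minimal sketch, assuming a search for a word that starts with an upper-case "S" (the pattern is an illustrative choice):

```python
import re

txt = "The rain in Spain"

# Look for a word that starts with an upper-case "S".
x = re.search(r"\bS\w+", txt)

if x is not None:
    print(x.span())    # (start, end) positions of the match
    print(x.group())   # the part of the string that matched
    print(x.string)    # the string that was passed to search()
```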
• The split() Function
• The split() function returns a list where the string has been split at each
match:
import re
# Split the string at every white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
--------------------------------------------------------------------------------------------------
• You can control the number of splits by specifying
the maxsplit parameter:
• Split the string only at the first occurrence:
import re
# Split the string at the first white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
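split() also accepts any pattern as the separator, including a set. A small sketch, assuming a made-up string that mixes commas and semicolons as delimiters:

```python
import re

# Hypothetical input with mixed delimiters, for illustration only.
words = "sat, hat;mat,pat"

# Split at a comma or semicolon, swallowing any white space that follows.
parts = re.split(r"[,;]\s*", words)

# Same pattern, but stop after the first split.
first_only = re.split(r"[,;]\s*", words, maxsplit=1)

print(parts)
print(first_only)
```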
Match a word with a particular pattern
import re

Str = "sat,hat,mat,pat"

# Specifically, words that start with s, h, m, or p and end with "at"
allStr = re.findall("[shmp]at", Str)

for i in allStr:
    print(i)

Output (one word per line)--- sat hat mat pat
----------------------------------------
Match a range of characters

import re
Str = "Sat, hat,mat,pat"

someStr = re.findall("[h-m]at", Str)
for i in someStr:
    print(i)

Output (one word per line)--- hat mat
----------------------------------
someStr = re.findall("[^h-m]at", Str)  # everything apart from h-m
for i in someStr:
    print(i)

Output (one word per line)--- Sat pat
Replace a string
import re
item = 'hat,rat,mat,pat'
regex = re.compile("[r]at")
item = regex.sub("item", item)  # replace every match with "item"
print(item)

Output-- hat,item,mat,pat
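The function table says sub() replaces "one or many" matches. The number of replacements can be limited with the count parameter of re.sub(). A minimal sketch, with the replacement character "9" chosen arbitrarily:

```python
import re

txt = "The rain in Spain"

# Replace every white-space character with "9".
all_replaced = re.sub(r"\s", "9", txt)

# Replace only the first two white-space characters.
first_two = re.sub(r"\s", "9", txt, count=2)

print(all_replaced)
print(first_two)
```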
--------------------------------------------
Verify Phone Number
import re
# \w is roughly [a-zA-Z0-9_]
# \W is roughly [^a-zA-Z0-9_]
phn = "412-555-1212"
if re.search(r"\w{3}-\w{3}-\w{4}", phn):
    print("It is a phone number")
-----------------------------------------------------------
# \d allows digits only, so it is stricter than \w
if re.search(r"\d{3}-\d{3}-\d{4}", phn):
    print("It is a phone number")
-----------------------------------------
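Note that re.search() succeeds if the pattern appears anywhere in the string, and \w also accepts letters. A hedged sketch of a stricter check using re.fullmatch(), which requires the whole string to match; the helper name is_phone_number and the test strings are hypothetical, not from the slides:

```python
import re

# Compiled once; \d restricts every position to a digit.
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")

def is_phone_number(text):
    """Return True only if the entire string has the form ddd-ddd-dddd."""
    return PHONE_RE.fullmatch(text) is not None

print(is_phone_number("412-555-1212"))            # True
print(is_phone_number("call 412-555-1212 now"))   # False: extra text around it
print(is_phone_number("abc-def-ghij"))            # False: letters, not digits

# For comparison, the \w pattern with search() accepts the letters-only string.
loose = re.search(r"\w{3}-\w{3}-\w{4}", "abc-def-ghij") is not None
print(loose)                                      # True
```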
Verify Name
import re
# {first name length range} \s (space) {last name length range}
if re.search(r"\w{2,20}\s\w{2,20}", "Sachin Tendulkar"):
    print("Name is valid")
-----------------------------------------------
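Verify email address
import re
email = "sk@aol.com md@.com @seo.com dc@.com"
# The dot before the top-level domain is escaped (\.) so it matches a literal "."
matches = re.findall(r"[\w._%+-]{1,20}@[\w.-]{2,20}\.[A-Za-z]{2,3}", email)
print("Email Matches :", len(matches))
print(matches)

Output-- Email Matches : 1
         ['sk@aol.com']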
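Whether the dot in front of the top-level domain is escaped changes what gets matched. The sketch below runs both variants on the sample string; the variable names are illustrative.

```python
import re

email = "sk@aol.com md@.com @seo.com dc@.com"

# Unescaped ".": matches ANY character, so the match can run past the address.
unescaped = re.findall(r"[\w._%+-]{1,20}@[\w.-]{2,20}.[A-Za-z]{2,3}", email)

# Escaped "\.": matches only a literal dot before the top-level domain.
escaped = re.findall(r"[\w._%+-]{1,20}@[\w.-]{2,20}\.[A-Za-z]{2,3}", email)

print(unescaped)   # ['sk@aol.com md']
print(escaped)     # ['sk@aol.com']
```

Both variants report one match, but only the escaped version returns the address on its own; with the unescaped dot, the space after "com" is consumed as "any character" and "md" is taken as the top-level domain.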
