

Ge Rex

Regular expressions (RegEx) allow complex pattern matching in strings. The re module provides functions like search(), findall(), split() to work with RegEx in Python. Patterns use special characters like ., *, ?, \s to match characters, words, spaces etc. Match objects return details of matches like span position. RegEx helps check, extract, replace substrings in a string through simple and powerful patterns.

Uploaded by

Niladri Editz
Copyright
© All Rights Reserved

Python RegEx

❑ A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

❑ RegEx can be used to check if a string contains the specified search pattern.

❑ RegEx Module

✓ Python has a built-in package called re, which can be used to work with Regular Expressions.

✓ Import the re module: import re

➢ Check if the string starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
    print("YES! We have a match!")
else:
    print("No match")

Output: YES! We have a match!

RegEx Functions

❑ The re module offers a set of functions that allow us to search a string for a match:

❑ Function    Description

✓ findall     Returns a list containing all matches
✓ search      Returns a Match object if there is a match anywhere in the string
✓ split       Returns a list where the string has been split at each match
✓ sub         Replaces one or many matches with a string
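
As a quick illustration of the four functions in the table, the sketch below applies each of them to one sample string. The variable names and the "-" replacement are chosen for this illustration only.

```python
import re

txt = "The rain in Spain"  # sample text, chosen for illustration

all_matches = re.findall("ai", txt)   # every non-overlapping match, as a list
first_match = re.search("ai", txt)    # Match object for the first match, or None
pieces = re.split(r"\s", txt)         # the text cut at each white-space character
replaced = re.sub(r"\s", "-", txt)    # each white-space character replaced by "-"

print(all_matches)         # ['ai', 'ai']
print(first_match.span())  # (5, 7)
print(pieces)              # ['The', 'rain', 'in', 'Spain']
print(replaced)            # The-rain-in-Spain
```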

import re

#Return a list containing every occurrence of "ai":
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

#Output: ['ai', 'ai']


Note: The list contains the matches in the order they are found. If no matches are found, an empty list is returned.
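
Because an empty list counts as false in Python, the result of findall() can be tested directly in an if statement. A minimal sketch, with a helper name (has_match) invented for this illustration:

```python
import re

def has_match(pattern, text):
    """Return True if findall() finds at least one match (hypothetical helper)."""
    return bool(re.findall(pattern, text))

txt = "The rain in Spain"
print(re.findall("Portugal", txt))  # []
print(has_match("Portugal", txt))   # False
print(has_match("ai", txt))         # True
```
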
import re

#The search() function returns a Match object:
txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #<re.Match object; span=(5, 7), match='ai'>

import re

#Search for the first white-space character in the string:
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())

#Output: The first white-space character is located in position: 3

import re

#Split at each white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

#Output: ['The', 'rain', 'in', 'Spain']
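
When the pattern does not occur in the string, re.search() returns None instead of a Match object, so calling .start() on the result would raise an AttributeError. A minimal sketch of guarding against that case (the helper name first_position and the -1 return value are assumptions made for this example):

```python
import re

def first_position(pattern, text):
    """Return the start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        return -1
    return match.start()

txt = "The rain in Spain"
print(first_position(r"\s", txt))  # 3
print(first_position(r"\d", txt))  # -1
```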


import re

#Split the string only at the first occurrence:
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

#Output: ['The', 'rain in Spain']

import re

#Replace every white-space character with the number 9:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

#Output: The9rain9in9Spain

import re

#Replace the first 2 occurrences:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

#Output: The9rain9in Spain
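
One detail worth seeing in action: splitting at every single white-space character produces empty strings when words are separated by more than one space. A sketch of the effect, using a sample string with extra spaces made up for this example and the + quantifier (one or more occurrences) as one possible remedy:

```python
import re

txt = "The  rain   in Spain"  # extra spaces added for illustration

print(re.split(r"\s", txt))   # ['The', '', 'rain', '', '', 'in', 'Spain']
print(re.split(r"\s+", txt))  # ['The', 'rain', 'in', 'Spain']
```
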
❑ The Match object has properties and methods used to retrieve information about the search and the result:

✓ .span() returns a tuple containing the start and end positions of the match.
✓ .string returns the string passed into the function.
✓ .group() returns the part of the string where there was a match.

Print the position (start and end position) of the first match occurrence.
The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span()) #(12, 17)

Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string) #The rain in Spain

Print the part of the string where there was a match.
The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group()) #Spain
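
The three members of the Match object fit together: slicing .string with the positions from .span() gives the same text as .group(). A small sketch that checks this, also using .start() and .end(), which return the two halves of the span:

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

start, end = x.span()
print(start, end)                        # 12 17
print(x.start(), x.end())                # 12 17
print(x.string[start:end])               # Spain
print(x.string[start:end] == x.group())  # True
```
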
❑ Metacharacters

✓ Metacharacters are characters with a special meaning:

import re

txt = "The rain in Spain"

#Find all lower case characters alphabetically between "a" and "m":

x = re.findall("[a-m]", txt)

print(x) #['h', 'e', 'a', 'i', 'i', 'a', 'i']


import re

txt = "That will be 59 dollars"

#Find all digit characters:

x = re.findall(r"\d", txt)

print(x) #['5', '9']


import re

txt = "hello world"

#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":

x = re.findall("he..o", txt)

print(x) #['hello']
import re

txt = "hello world"

#Check if the string starts with 'hello':

x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")

Output: Yes, the string starts with 'hello'


import re

txt = "hello world"

#Check if the string ends with 'world':

x = re.findall("world$", txt)
if x:
    print("Yes, the string ends with 'world'")
else:
    print("No match")

Output: Yes, the string ends with 'world'


import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "ai" followed by 0 or more "x" characters:

x = re.findall("aix*", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['ai', 'ai', 'ai', 'ai']
Yes, there is at least one match!

import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "ai" followed by 1 or more "x" characters:

x = re.findall("aix+", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[]
No match

import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains "a" followed by exactly two "l" characters:

x = re.findall("al{2}", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['all']
Yes, there is at least one match!


import re

txt = "The rain in Spain falls mainly in the plain!"

#Check if the string contains either "falls" or "stays":

x = re.findall("falls|stays", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['falls']
Yes, there is at least one match!
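
To compare the quantifiers from the last few examples side by side, the sketch below runs several patterns against the same sentence. The word "aix" at the end of the sample text is added here purely so that x* and x+ give different results.

```python
import re

txt = "The rain in Spain falls mainly in aix"  # "aix" added for illustration

patterns = ["aix*", "aix+", "al{2}", "falls|stays"]
for pattern in patterns:
    print(pattern, re.findall(pattern, txt))

# aix* ['ai', 'ai', 'ai', 'aix']
# aix+ ['aix']
# al{2} ['all']
# falls|stays ['falls']
```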


❑ Special Sequences

✓ A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

import re

txt = "The rain in Spain"

#Check if the string starts with "The":

x = re.findall(r"\AThe", txt)

print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")

Output:
['The']
Yes, there is a match!

import re

txt = "The rain in Spain"

#Check if "ain" is present at the beginning of a WORD:

x = re.findall(r"\bain", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[]
No match

import re

txt = "The rain in Spain"

#Check if "ain" is present at the end of a WORD:

x = re.findall(r"ain\b", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['ain', 'ain']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the beginning of a word:

x = re.findall(r"\Bain", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['ain', 'ain']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Check if "ain" is present, but NOT at the end of a word:

x = re.findall(r"ain\B", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[]
No match
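
The r prefix in the word-boundary patterns matters. In an ordinary Python string literal, "\b" is the backspace character, so the regular expression engine never sees a word-boundary sequence; the raw string r"\b" keeps the backslash. A minimal sketch of the difference:

```python
import re

txt = "The rain in Spain"

print(len("\b"), len(r"\b"))      # 1 2
print(re.findall("ain\b", txt))   # [] : looks for "ain" followed by a backspace
print(re.findall(r"ain\b", txt))  # ['ain', 'ain']
```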


import re

txt = "The rain in Spain"

#Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[]
No match

import re

txt = "The rain in Spain"

#Return a match at every non-digit character:

x = re.findall(r"\D", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['T', 'h', 'e', ' ', 'r', 'a', 'i', 'n', ' ', 'i', 'n', ' ', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Return a match at every white-space character:

x = re.findall(r"\s", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[' ', ' ', ' ']
Yes, there is at least one match!


import re

txt = "The rain in Spain"

#Return a match at every NON white-space character:

x = re.findall(r"\S", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Return a match at every word character (letters, digits from 0-9, and the underscore _ character):

x = re.findall(r"\w", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Return a match at every NON word character (anything that is not a letter, digit, or underscore, such as "!", "?", or white-space):

x = re.findall(r"\W", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[' ', ' ', ' ']
Yes, there is at least one match!


import re

txt = "The rain in Spain"

#Check if the string ends with "Spain":

x = re.findall(r"Spain\Z", txt)

print(x)

if x:
    print("Yes, there is a match!")
else:
    print("No match")

Output:
['Spain']
Yes, there is a match!
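
On a single-line string such as the one used in these examples, \A and \Z behave much like the ^ and $ metacharacters. They differ once the re.MULTILINE flag is used: ^ and $ then also match at the start and end of each line, while \A and \Z still match only at the very start and end of the string. A sketch with a two-line sample text made up for this example:

```python
import re

txt = "The rain\nThe plain"  # two-line sample text, made up for illustration

print(re.findall(r"^The", txt, flags=re.MULTILINE))   # ['The', 'The']
print(re.findall(r"\AThe", txt, flags=re.MULTILINE))  # ['The']
print(re.findall(r"ain$", txt, flags=re.MULTILINE))   # ['ain', 'ain']
print(re.findall(r"ain\Z", txt, flags=re.MULTILINE))  # ['ain']
```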


❑ Sets

✓ A set is a set of characters inside a pair of square brackets [] with a special meaning:

import re

txt = "The rain in Spain"

#Check if the string has any a, r, or n characters:


x = re.findall("[arn]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['r', 'a', 'n', 'n', 'a', 'n']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Check if the string has any characters between a and n:

x = re.findall("[a-n]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['h', 'e', 'a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Check if the string has other characters than a, r, or n:

x = re.findall("[^arn]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['T', 'h', 'e', ' ', 'i', ' ', 'i', ' ', 'S', 'p', 'i']
Yes, there is at least one match!

import re

txt = "The rain in Spain"

#Check if the string has any 0, 1, 2, or 3 digits:

x = re.findall("[0123]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[]
No match

import re

txt = "8 times before 11:45 AM"

#Check if the string has any digits:

x = re.findall("[0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['8', '1', '1', '4', '5']
Yes, there is at least one match!

import re

txt = "8 times before 11:45 AM"

#Check if the string has any two-digit numbers, from 00 to 59:

x = re.findall("[0-5][0-9]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['11', '45']
Yes, there is at least one match!
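
The two-digit set pattern can be combined with literal characters to pick out something more specific. As a sketch, the pattern below (an illustrative choice, not taken from the slides) looks for a time written as two digits, a colon, and two digits from 00 to 59. It only checks the shape of the text: the hour part accepts any two digits, so a value such as 99:00 would also match.

```python
import re

txt = "8 times before 11:45 AM"

x = re.findall("[0-9][0-9]:[0-5][0-9]", txt)
print(x)  # ['11:45']
```
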
import re

txt = "8 times before 11:45 AM"

#Check if the string has any characters from a to z lower case, and A to Z upper case:

x = re.findall("[a-zA-Z]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['t', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', 'A', 'M']
Yes, there is at least one match!

import re

txt = "8 times before 11:45 AM"

#Check if the string has any + characters:


x = re.findall("[+]", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
[]
No match
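
Inside a set, characters such as +, *, . and | lose their special meaning and are matched literally, which is why [+] looks for a plus sign. A short sketch on a sample string invented for this example:

```python
import re

txt = "1+1=2 and 2*3=6"  # sample text, invented for illustration

print(re.findall("[+]", txt))    # ['+']
print(re.findall("[+*.]", txt))  # ['+', '*']
print(re.findall(r"\+", txt))    # ['+'] : outside a set, + must be escaped
```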
