Unit - 4 Regex

The document provides an overview of regular expressions (regex) in Python, detailing their definition, applications, and how to create them using the re module. It covers various functions such as re.match(), re.search(), re.findall(), and re.finditer(), explaining their purposes and differences. Additionally, it discusses character classes, predefined character classes, and regex metacharacters, along with practical examples of usage.

REGULAR EXPRESSIONS AND OOP CONCEPTS

UNIT – 4 | I SEM | MCA |2024-26 BATCH | RIT

REGEX IN PYTHON

• What is a regular expression?
• Applications of regular expressions
• How to create regular expressions in Python

REGULAR EXPRESSION

• A regular expression (regex) is a sequence of characters that defines a search pattern.

• It is commonly used for matching, searching, and replacing text in strings.

• In other words, it is a compact way of describing the kind of text to look for inside a larger body of text.

• Python provides the re module to work with regular expressions.

APPLICATIONS OF REGULAR EXPRESSIONS

• Data validation
  • Ex: mobile number validation, email validation, etc.
• Data extraction
  • Specific information can be extracted from data
• Data cleaning and web scraping
• Search and replace functionality, as in
  • Ctrl+F and replace, grep commands (UNIX), the LIKE operator in SQL
• Building translators: compilers, interpreters, assemblers
  • For syntax analysis and lexical analysis
• Password policies
• NLP: identifying specific patterns in data

BASIC SEARCH FUNCTIONS

• search()
• match()
• finditer()
• findall()

re.match()

• Purpose: searches for a pattern at the beginning of a string.

• Syntax: re.match(pattern, string, flags=0)
  • pattern: the regular expression pattern to search for
  • string: the input string to search in
  • flags: optional modifiers such as re.IGNORECASE (default 0, meaning none)
• Returns: a match object if the pattern matches at the beginning of the string; otherwise None.

Using the re Module in Python

Python’s re module provides powerful tools for regex operations.

re.match() – Matches the Beginning of a String. It only checks the start of the string.

import re

pattern = r"Hello"
text = "Hello, world!"

match = re.match(pattern, text)

if match:
    print("Match found!")
else:
    print("No match")
# Output: Match found!

.group() → Returns the actual matched text.
.span() → Returns the start and end positions of the match.

if match:
    print("Matched text:", match.group())  # Returns matched text ("Hello")
    print("Start and End positions:", match.span())  # Returns (0, 5)

What is a Raw String (r"")?

• In Python, the r before a string (like r"^\d$") makes it a raw string literal.

• In a normal string, a backslash (\) introduces an escape sequence (e.g., "\n" for a newline, "\t" for a tab).

• A raw string (r"") tells Python not to interpret backslashes as escape sequences.

• In regex, we often use \d, \s, \b, etc., where \ has a special meaning. Using r"" passes the backslash through to the regex engine unchanged.

• Always use r"" for regex patterns to avoid unexpected errors.
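A small check makes the difference concrete. This is a minimal sketch; the sample text and variable names are assumptions chosen for illustration.

```python
import re

# In a normal string "\b" is the single backspace character,
# while in a raw string r"\b" is a backslash followed by "b".
normal = "\bcat\b"
raw = r"\bcat\b"

print(len(normal), len(raw))  # 5 7

text = "The cat is here"
normal_match = re.search(normal, text)  # looks for backspace + "cat" + backspace
raw_match = re.search(raw, text)        # looks for the whole word "cat"

print(normal_match)       # None
print(raw_match.group())  # cat
```

The normal string silently changes what the pattern means, which is why the raw form is the safer habit.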

re.search()

• Purpose: re.search() scans a string for the first occurrence of a pattern.
• Syntax: re.search(pattern, data, flags=0)
  • pattern: the regular expression pattern to search for
  • data: the input string to search in
• Returns: a match object if a match is found, or None otherwise.

re.search() – Finds the First Match Anywhere

Unlike match(), search() checks the entire string.

import re

pattern = r"world"
text = "Hello, world!"

match = re.search(pattern, text)

if match:
    print("Match found!")

# Output: Match found!
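To see the two functions side by side, the sketch below runs the same pattern through both. The variable names are illustrative, not part of the re module.

```python
import re

text = "Hello, world!"

at_start = re.match(r"world", text)   # "world" does not begin at position 0
anywhere = re.search(r"world", text)  # scans forward until a match is found

print(at_start)         # None
print(anywhere.span())  # (7, 12)
```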

In Python, you can use regular expressions in two ways:
1. Directly as a string pattern

You pass a raw string directly to functions like re.search(), re.match(), etc.

2. Using Regular Expression Objects

You first compile the pattern using re.compile(), creating a reusable regex
object. This is useful for repeated searches.
Using Regular Expression Objects

import re

pattern = re.compile(r"World")  # Compile the regex pattern

text = "Hello, World!"

match = pattern.search(text)  # Using the compiled object

if match:
    print("Found:", match.group())

# Output: Found: World

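The benefit of a compiled object shows up when the same pattern is applied to several inputs. A minimal sketch, assuming a hypothetical list of lines to scan:

```python
import re

# Compile once, then reuse the same object for every input.
greeting = re.compile(r"hello", re.IGNORECASE)

lines = ["Hello, World!", "Say HELLO again", "Goodbye"]  # hypothetical sample data
found = [bool(greeting.search(line)) for line in lines]

print(found)  # [True, True, False]
```
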
re.finditer()

• Purpose: re.finditer() returns an iterator yielding match objects for all non-overlapping occurrences of a pattern in a string.
• Syntax: re.finditer(pattern, data, flags=0)
  • pattern: the regular expression pattern to search for
  • data: the input string to search in
• Returns: an iterator of match objects, one per match.

re.finditer() – Returns Matches as an Iterator

import re

pattern = r"Hello"
text = "Hello, world!"

matches = re.finditer(pattern, text)
for match in matches:
    print(match.group())

# Output: Hello

import re

pattern = re.compile('ab', re.IGNORECASE)
data = 'abaababa'

match_iter = re.finditer(pattern, data)
count = 0
for match in match_iter:
    count += 1
    print(f"start:{match.start()}, end:{match.end()}, element:{match.group()}")
print("total:", count)

# Output ends with: total: 3

Useful when handling large data, as it yields results lazily.
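Laziness can be observed by pulling matches one at a time with next(). This is a rough sketch; the repeated string stands in for a hypothetical large input.

```python
import re

data = "ab" * 1000  # stand-in for a large input

matches = re.finditer(r"ab", data)

first = next(matches)   # only the first match has been produced so far
second = next(matches)  # the second is produced on demand

print(first.span(), second.span())  # (0, 2) (2, 4)

remaining = sum(1 for _ in matches)  # consume the rest without storing them
print(remaining)  # 998
```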

re.findall()

• Purpose: re.findall() returns a list of all non-overlapping matches of a pattern in a string.
• Syntax: re.findall(pattern, data, flags=0)
  • pattern: the regular expression pattern to search for
  • data: the input string to search in
• Returns: a list containing all matching substrings (if the pattern contains capture groups, the list holds the group contents instead).

re.findall() – Returns All Matches in a List

import re

pattern = r"[0-9]"  # Find every individual digit

text = "My number is 123 and my friend's is 456"

matches = re.findall(pattern, text)

print(matches)  # Output: ['1', '2', '3', '4', '5', '6']

import re

pattern = re.compile('ab', re.IGNORECASE)

data = 'abaababa'

match_list = re.findall(pattern, data)

print(match_list)  # Output: ['ab', 'ab', 'ab']

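The shape of the returned list depends on whether the pattern contains capture groups. The sketch below reuses the sample sentence from above; the patterns with parentheses are added here for illustration.

```python
import re

text = "My number is 123 and my friend's is 456"

# No groups: each item is the whole match.
whole_numbers = re.findall(r"[0-9]+", text)
print(whole_numbers)  # ['123', '456']

# One group: each item is that group's text, not the whole match.
first_digits = re.findall(r"([0-9])[0-9]*", text)
print(first_digits)  # ['1', '4']

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r"([0-9])([0-9]+)", text)
print(pairs)  # [('1', '23'), ('4', '56')]
```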
DIFFERENCE BETWEEN findall() AND finditer()

Both re.findall() and re.finditer() are used to search for all occurrences of a
pattern in a string, but they differ in how they return results.

| Feature | re.findall() | re.finditer() |
|---|---|---|
| Return type | Returns a list of matching substrings. | Returns an iterator yielding match objects. |
| Memory usage | Stores all matches in a list (higher memory usage for large data). | Uses an iterator (more memory-efficient). |
| Accessing match info | Only returns matched substrings, no details like position. | Provides full match details (start, end, groups). |
| Use case | When only matched strings are needed. | When additional match details (index, groups) are needed. |

Understanding Non-Overlapping Matches in re.findall() and finditer()

In re.findall() and re.finditer(), matches are non-overlapping: once a match is found, the search continues after the end of that match rather than inside it.

import re

data = "ababab"
matches = re.findall(r"aba", data)
print(matches)

#Output: ['aba']
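If overlapping occurrences are needed, one common workaround (not covered in these slides) is to wrap the pattern in a lookahead with a capture group, so that each match consumes no characters. A minimal sketch:

```python
import re

data = "ababab"

non_overlapping = re.findall(r"aba", data)
# (?=...) is a lookahead: it checks for "aba" without consuming it,
# so the scan can find the next occurrence starting inside the previous one.
overlapping = re.findall(r"(?=(aba))", data)

print(non_overlapping)  # ['aba']
print(overlapping)      # ['aba', 'aba']
```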
CHARACTER CLASS IN PYTHON REGEX

• A character class is a set of characters defined inside a regular expression.
• Character classes specify the range or group of characters to search for in the data.
• These classes help in defining flexible patterns for text searching and validation.

Character Classes in Python Regex

Square Brackets [ ]
• Used to define a set of characters.
• Example: [abc] matches 'a', 'b', or 'c'.
Range of Characters
• [a-z] → Matches any lowercase letter (a to z).
• [A-Z] → Matches any uppercase letter (A to Z).
• [0-9] → Matches any digit (0 to 9).
Negation [^ ] (Caret Inside Brackets)
• Matches any single character except the characters inside the brackets.
• Example: [^0-9] matches any character that is not a digit.
Predefined Character Classes
• \d → Matches any digit (equivalent to [0-9] for ASCII text; with str patterns it also matches other Unicode digits unless re.ASCII is set).
• \D → Matches any non-digit character (opposite of \d).
• \w → Matches any word character (letters, digits, underscore); [a-zA-Z0-9_] for ASCII text.
• \W → Matches any non-word character (opposite of \w).
• \s → Matches any whitespace character (space, tab, newline).
• \S → Matches any non-whitespace character.
Special Character Classes
• [aeiou] → Matches any lowercase vowel.
• [13579] → Matches any odd digit.
• [02468] → Matches any even digit.

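The classes above can be tried out with re.findall(). The sample sentence below is made up for illustration.

```python
import re

text = "Call me at 9 AM, gate_3!"  # made-up sample text

digits = re.findall(r"\d", text)         # every digit
uppercase = re.findall(r"[A-Z]", text)   # every uppercase letter
vowels = re.findall(r"[aeiou]", text)    # every lowercase vowel
words = re.findall(r"\w+", text)         # runs of word characters
spaces = re.findall(r"\s", text)         # every whitespace character

print(digits)       # ['9', '3']
print(uppercase)    # ['C', 'A', 'M']
print(vowels)       # ['a', 'e', 'a', 'a', 'e']
print(words)        # ['Call', 'me', 'at', '9', 'AM', 'gate_3']
print(len(spaces))  # 5
```

Note that the underscore counts as a word character, so "gate_3" stays in one piece.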
| Regex pattern | Meaning | Example | Matches | Does not match |
|---|---|---|---|---|
| \b | Word boundary (start or end of a word) | \bcat\b | "The cat is here" → "cat" | "caterpillar", "wildcat" |
| \A | Matches only at the start of a string | \AHello | "Hello world" | "world Hello" |
| \Z | Matches only at the end of a string | tutorial\Z | "This is a tutorial" | "tutorial on regex" |
| . | Matches any character except a newline | | | |
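The rows of this table can be checked directly with re.search(). A small sketch using the same patterns and sample strings; the list and variable names are illustrative.

```python
import re

# (pattern, text) pairs taken from the table above
cases = [
    (r"\bcat\b", "The cat is here"),
    (r"\bcat\b", "caterpillar"),
    (r"\bcat\b", "wildcat"),
    (r"\AHello", "Hello world"),
    (r"\AHello", "world Hello"),
    (r"tutorial\Z", "This is a tutorial"),
    (r"tutorial\Z", "tutorial on regex"),
]

results = [bool(re.search(pattern, text)) for pattern, text in cases]

for (pattern, text), matched in zip(cases, results):
    print(f"{pattern!r:16} {text!r:22} {matched}")

# results == [True, False, False, True, False, True, False]
```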

Find digits in given data

import re

pattern = r'[0-9]'
data = "The price is $."

match_list = re.findall(pattern, data)

if match_list:
    print("digits present")
else:
    print("not present")

import re

pattern = r'[0-9]'
data = "The price is $100."

match_iter = re.finditer(pattern, data)

for match in match_iter:
    print(match)

Table 1: Basic Regex Metacharacters

| Symbol | Description |
|---|---|
| . | Matches any character except a newline |
| ^ | Matches the start of a string |
| $ | Matches the end of a string |
| * | Matches 0 or more occurrences of the preceding character |
| + | Matches 1 or more occurrences of the preceding character |
| ? | Matches 0 or 1 occurrence of the preceding character |
| {n} | Matches exactly n occurrences |
| {n,} | Matches n or more occurrences |
| {n,m} | Matches between n and m occurrences |
| \ | Escape character (e.g., \. matches a literal dot .) |
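The quantifiers are easiest to compare on one input. The sketch below also shows a simple validation check; the rule that a mobile number is exactly 10 digits is an assumption made for this example, and is_valid_mobile is a hypothetical helper, not part of the re module.

```python
import re

sample = "a ab abbb"

print(re.findall(r"ab*", sample))      # ['a', 'ab', 'abbb']  (0 or more b)
print(re.findall(r"ab+", sample))      # ['ab', 'abbb']       (1 or more b)
print(re.findall(r"ab?", sample))      # ['a', 'ab', 'ab']    (0 or 1 b)
print(re.findall(r"ab{2,3}", sample))  # ['abbb']             (2 to 3 b)

# Assumption: a valid mobile number is exactly 10 digits.
MOBILE = re.compile(r"[0-9]{10}")

def is_valid_mobile(number: str) -> bool:
    """Hypothetical helper: True only if the whole string is 10 digits."""
    return MOBILE.fullmatch(number) is not None

print(is_valid_mobile("9876543210"))   # True
print(is_valid_mobile("12345"))        # False
print(is_valid_mobile("98765432101"))  # False
```

fullmatch() is used so that extra characters before or after the digits cause the check to fail.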
Table 2: Character Classes and Groups

| Pattern | Description |
|---|---|
| \d | Matches any digit (0-9) |
| \D | Matches any non-digit character |
| \w | Matches any word character (a-z, A-Z, 0-9, _) |
| \W | Matches any non-word character |
| \s | Matches any whitespace (space, tab, newline) |
| \S | Matches any non-whitespace character |
| [abc] | Matches any one of a, b, or c |
| [^abc] | Matches anything except a, b, or c |
| \b | Matches word boundaries (e.g., \bword\b matches the word "word" exactly) |

re.sub() – Replaces Text in a String

import re

text = "Python is fun!"

new_text = re.sub(r"Python", "Java", text)

print(new_text)

# Output: Java is fun!
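re.sub() also accepts a regex pattern rather than a fixed word, plus an optional count argument. A short sketch for the data cleaning use case mentioned earlier; the sample strings are made up.

```python
import re

messy = "Python   is \t fun!\n"  # made-up input with irregular whitespace

# Collapse every run of whitespace into a single space, then trim the ends.
clean = re.sub(r"\s+", " ", messy).strip()
print(clean)  # Python is fun!

# Replace every digit with "#".
masked = re.sub(r"\d", "#", "PIN 1234")
print(masked)  # PIN ####

# count limits the number of replacements.
once = re.sub(r"a", "*", "banana", count=1)
print(once)  # b*nana
```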

https://regexr.com/

https://www.kaggle.com/code/albeffe/regex-exercises-solutions/notebook

Character classes
. any character except newline
\w\d\s word, digit, whitespace
\W\D\S not word, digit, whitespace
[abc] any of a, b, or c
[^abc] not a, b, or c
[a-g] character between a & g
Anchors
^abc$ start / end of the string
\b word boundary
Escaped characters
\. \* \\ escaped special characters
\t \n \r tab, linefeed, carriage return
Groups
(abc) capture group
Quantifiers & Alternation
a* a+ a? 0 or more, 1 or more, 0 or 1
a{5} a{2,} exactly five, two or more
a{1,3} between one & three
a+? a{2,}? match as few as possible
ab|cd match ab or cd
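Three entries from this cheat sheet (lazy quantifiers, capture groups, and alternation) are sketched below. The sample strings are invented for illustration.

```python
import re

markup = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.+>", markup)
# Lazy: .+? grabs as little as possible, so each tag is matched separately.
lazy = re.findall(r"<.+?>", markup)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Capture groups: parentheses make parts of the match available separately.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-11-05")
print(date.group(0))  # 2024-11-05
print(date.group(1))  # 2024
print(date.groups())  # ('2024', '11', '05')

# Alternation: match either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird, dog")
print(pets)  # ['cat', 'dog', 'dog']
```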
