String Functions and Regular Expressions

Anastasis Oulas
Evangelos Pafilis
Jacques Lagnel
Strings - Revision


Declaration and value assignment eg.
courseName = 'Introduction to Python'

Concatenation
field = 'computational' + ' ' + 'Biology'

Equality check
stringA == stringB , stringA != stringC

Containment check
stringA in stringB, stringA not in stringB
Relevance to Bioinformatics


In Bioinformatics many of the tasks have to do with sequences

Sequences can be represented as Strings

Elements on sequences are also Strings

Take your pick: codons, transcription factor binding sites,
TATA boxes, restriction enzyme cutting sites, primer sequences,
intron/exon boundary sequences

Data/Result file handling is String manipulation
Strings - Revision


Declaration and value assignment eg.
seqA = 'ACGTC'

Concatenation
seqB = seqA + 'AAAA'

Equality check
seqA == seqB , seqA != seqB

Containment check
seqA in seqB, seqB not in seqA
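
The four operations above can be tried together in one short script. This is only a minimal sketch: the variable names is_equal and contains are introduced here for illustration and are not part of the slides.

```python
# Basic string operations applied to DNA sequences.
seqA = 'ACGTC'
seqB = seqA + 'AAAA'            # concatenation

is_equal = seqA == seqB         # equality check
contains = seqA in seqB         # containment check

print(seqB)                     # ACGTCAAAA
print(is_equal)                 # False: the two strings differ
print(seqA != seqB)             # True
print(contains)                 # True: seqB begins with seqA
print(seqB not in seqA)         # True: the longer string cannot be inside the shorter one
```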
Example

seq = 'ACGTCATAATTAGCTGACGAG'
site = 'AATT' #EcoRI cutting site
print('seq contains the site: ', site in seq)
Example

Sometimes you want the position in the sequence:

seq = 'ACGTCATAATTAGCTGACGAG'
site = 'AATT' #EcoRI cutting site
startingPosition = seq.find(site)
print(startingPosition)
find() returns an integer
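
A small sketch of what find() gives back, using the same sequence. The second query ('GGGG') is an assumed example of a site that is absent from the sequence.

```python
seq = 'ACGTCATAATTAGCTGACGAG'
site = 'AATT'

start = seq.find(site)                 # index of the first character of the match
print(start)                           # 7 (counting starts at 0)
print(seq[start:start + len(site)])    # AATT, the matched part of seq

missing = seq.find('GGGG')             # a site that does not occur in seq
print(missing)                         # -1 signals "not found"
```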
Example

Sometimes you want the position in the sequence:

index: 0 1 2 3 4 5 6 7 8 9 10 ... 20
seq:   A C G T C A T A A T T  ... G
                     ^
                     startingPosition = 7
Example

seqA = 'ACGTCAUUUUUUUU'
seqB = 'ACGT'
if seqA.startswith(seqB):
    print('Seq A starts with seq B')

startswith() returns a Boolean (True/False)

Example

seqA = 'ACGTCAUUUUUUUU'
seqB = 'ACGT'
print ('SeqB starts with seqA (t/f):')
print (seqB.startswith(seqA))

startswith() returns a Boolean (True/False)

Example: substring

General view:
substring = mainString[start position:end position]

The character at the 'end position' is NEVER included

Example: substring

Sometimes you want to extract a part of the string:

index: 0123456789
seq = 'ACGTCATAAT'

substr=seq[3:6]
print (substr)
Gives: TCA
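
Slicing can also be used in a loop. The sketch below, which goes one step beyond the slide, cuts the same sequence into codons (groups of three characters); the name codons is introduced here for illustration.

```python
seq = 'ACGTCATAAT'

substr = seq[3:6]       # characters at index 3, 4 and 5; index 6 is excluded
print(substr)           # TCA

first_three = seq[:3]   # omitting the start position means "from the beginning"
last_three = seq[-3:]   # negative positions count from the end
print(first_three, last_three)   # ACG AAT

# Step through the sequence three characters at a time;
# the trailing incomplete codon ('T') is dropped.
codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
print(codons)           # ['ACG', 'TCA', 'TAA']
```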
Example: split string

Sometimes you want to build a list of words from a string

string1 = 'hello my world'
list1 = string1.split(' ')    # space as separator

Gives: list1 = ['hello', 'my', 'world']
Example: strings join

Sometimes you want the reverse, e.g. you have the list:

list1=['hello', 'my', 'world']
and you want to join the words with a space.
This can be done using join()

listA=['hello', 'my', 'world']

space = ' '
stringA = space.join(listA)
print( stringA )
=> Prints hello my world
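
split() and join() are typically used together when handling data or result files. The sketch below assumes a hypothetical tab-separated record (name, sequence, length); the record and the variable names are made up for illustration.

```python
# A hypothetical tab-separated line, as it might appear in a result file.
record = 'geneA\tACGTCATAAT\t10'

fields = record.split('\t')      # tab as separator
print(fields)                    # ['geneA', 'ACGTCATAAT', '10']

csv_line = ','.join(fields)      # rebuild the line with commas instead of tabs
print(csv_line)                  # geneA,ACGTCATAAT,10

# split(' ') keeps empty strings between repeated spaces,
# while split() with no argument treats any run of whitespace as one separator.
spaced = 'hello   my world'
by_space = spaced.split(' ')
by_whitespace = spaced.split()
print(by_space)                  # ['hello', '', '', 'my', 'world']
print(by_whitespace)             # ['hello', 'my', 'world']
```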
String functions
Searching
● str1.startswith(str2[, startpos[, endpos]])
– Returns True if str1 starts with str2
● str1.endswith(str2[, startpos[, endpos]])
– Returns True if str1 ends with str2
● str1.find(str2[, startpos[, endpos]])
– Returns the lowest index in str1 at which str2 is found, or −1
if it is not found
● str1.index(str2[, startpos[, endpos]])
– Returns the lowest index in str1 at which str2 is found, or
raises ValueError if it is not found
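
The sketch below illustrates the optional start position and the difference between find() and index() when nothing is found. The sequence is the one used earlier in the slides; the variable names are chosen here for illustration.

```python
seq = 'ACGTCATAATTAGCTGACGAG'

first = seq.find('ACG')               # search from the beginning
second = seq.find('ACG', first + 1)   # search again, starting after the first hit
print(first, second)                  # 0 16

print(seq.startswith('GTC', 2))       # True: 'GTC' begins at index 2
print(seq.endswith('GAG'))            # True

# find() reports a missing substring with -1, index() raises an exception.
print(seq.find('GGGG'))               # -1
try:
    seq.index('GGGG')
    index_failed = False
except ValueError:
    index_failed = True
    print('index() raised ValueError')
```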
String functions - Table
Replacing and changing case

str1.lower()

Returns a copy of the string with all of its
characters converted to lowercase

str1.upper()

Returns a copy of the string with all of its
characters converted to uppercase

str1.replace(oldstr, newstr[, count])

Returns a copy of str1 with all occurrences of the
substring oldstr replaced by the string newstr; if
count is specified, only the first count occurrences
are replaced
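
As an illustration of these three functions, the sketch below normalises a mixed-case DNA string and rewrites it in the RNA alphabet (T replaced by U). The input string is an assumed example.

```python
dna = 'acgtTTacg'                         # assumed mixed-case input

upper = dna.upper()
print(upper)                              # ACGTTTACG
print(dna.lower())                        # acgtttacg

rna = upper.replace('T', 'U')             # replace every T
print(rna)                                # ACGUUUACG

first_only = upper.replace('T', 'U', 1)   # count=1: only the first T is replaced
print(first_only)                         # ACGUTTACG

# Strings are never changed in place; the original is untouched.
print(dna)                                # acgtTTacg
```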
String functions
● str1.join(list1)
– Returns a string containing the elements of list1 separated
by the str1 string

Testing
● str1.islower()
– Returns True if str1 contains at least one “cased” character
and all of its cased characters are lowercase
● str1.isupper()
– Returns True if str1 contains at least one “cased” character
and all of its cased characters are uppercase
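
A few quick checks make the “at least one cased character” rule concrete. The strings are assumed examples.

```python
checks = {
    'acgt':   'acgt'.islower(),     # all cased characters are lowercase
    'ACgt':   'ACgt'.islower(),     # mixed case
    'ACGT':   'ACGT'.isupper(),     # all cased characters are uppercase
    '1234':   '1234'.isupper(),     # no cased character at all
    'acgt-1': 'acgt-1'.islower(),   # digits and '-' are ignored by the test
}
for text, result in checks.items():
    print(text, result)
# acgt True
# ACgt False
# ACGT True
# 1234 False
# acgt-1 True
```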
Regular Expressions

● However, the requirements of Bioinformatics / Computational
Biology exceed what can be achieved with the available String
functions
● This has given rise to wide usage of Regular Expressions
● What is a Regular Expression and why is it so useful?
Why a regular expression

● 'AATT' #EcoRI cutting site
– 'AATT' in sequence

● DsaI possible cutting sites: CC - G or A - T or C - GG
– 'CCGTGG' in sequence
– 'CCGCGG' in sequence
– 'CCATGG' in sequence
– 'CCACGG' in sequence
Why a regular expression

● 'AATT' #EcoRI cutting site
– 'AATT' in sequence

● Regular Expressions provide the tool to manage this
“combinatorial explosion”
● A regular expression for DsaI's site would be:
– 'CC[GA][TC]GG'
[ ] → a set of possible characters at a single position
[GA]: this position will contain either G or A
(i.e. possible characters)
[TC]: this position will contain either T or C
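
To see that the single pattern does the work of four separate containment checks, the sketch below compares both approaches. The test sequence is made up for illustration, and re.findall() is used here ahead of its introduction later in the slides.

```python
import re

seq = 'TTCCATGGAACCGCGGTT'   # made-up sequence containing two DsaI sites

# Approach 1: spell out every possible site and test each one.
all_sites = ['CCGTGG', 'CCGCGG', 'CCATGG', 'CCACGG']
found_by_list = [site for site in all_sites if site in seq]

# Approach 2: one regular expression covers all four combinations.
found_by_regex = re.findall('CC[GA][TC]GG', seq)

print(found_by_list)    # ['CCGCGG', 'CCATGG'] (order of all_sites)
print(found_by_regex)   # ['CCATGG', 'CCGCGG'] (order of appearance in seq)
```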
Regular expressions: Another example

● Find the pattern enzym followed by any character (.)
any number of times incl. zero (*)
– E.g. Reg Expr: enzym.*
• enzyme
• enzymes
• enzymatic
• enzym
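
The sketch below checks the four words against enzym.* and then shows a side effect worth knowing: because . also matches spaces and * takes as much as it can, the pattern runs to the end of the line when applied to a sentence. The word list, the sentence and the narrower pattern enzym[a-z]* are assumptions added for illustration.

```python
import re

words = ['enzyme', 'enzymes', 'enzymatic', 'enzym', 'protein']

# re.fullmatch() succeeds only if the whole word fits the pattern.
matching = [w for w in words if re.fullmatch('enzym.*', w)]
print(matching)          # ['enzyme', 'enzymes', 'enzymatic', 'enzym']

sentence = 'the enzyme cuts DNA'
greedy = re.findall('enzym.*', sentence)
print(greedy)            # ['enzyme cuts DNA']: .* also swallows the spaces

# Restricting the repeated character to lowercase letters stops at the word end.
word_only = re.findall('enzym[a-z]*', sentence)
print(word_only)         # ['enzyme']
```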
Regular Expression Syntax

● .          Any character (except a newline)
● [ ]        A character set
● [ACTG]     One DNA base character
● [A-Za-z_]  One underscore or letter
● [0-9]      A digit
Regular Expression Syntax

● \n A newline character
● \d Any digit
● \D Any nondigit
● \s Any whitespace character
– space ' ', tab \t, newline \n, carriage return \r, form feed \f, vertical tab \v
– i.e. for ASCII text, shorthand for [ \t\n\r\f\v]
● \S Any non-whitespace character
– i.e. all characters excluding [ \t\n\r\f\v]
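
A short sketch of these shorthands in use. The input line is a made-up example; the patterns are written as raw strings (r'...') so that Python passes the backslashes to the re module unchanged.

```python
import re

line = 'chr1\t12345  geneA_1'       # made-up line with a tab and two spaces

numbers = re.findall(r'\d+', line)  # runs of one or more digits
print(numbers)                      # ['1', '12345', '1']

columns = re.split(r'\s+', line)    # split on any run of whitespace
print(columns)                      # ['chr1', '12345', 'geneA_1']

names = re.findall(r'[A-Za-z_]+', line)   # runs of letters or underscores
print(names)                        # ['chr', 'geneA_']
```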
Regular Expression Syntax

● * Zero or more repetitions of the preceding regular
expression
● ? Zero or one repetitions of the preceding regular
expression
● + One or more repetitions of the preceding regular
expression
● {n} Exactly n repetitions of the preceding regular
expression
● {m,n} Between m and n (inclusive) repetitions of the
preceding regular expression
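
The effect of each quantifier can be compared side by side. In this sketch the candidate strings and the helper function full_matches() are made up for illustration; each pattern repeats the single character C.

```python
import re

candidates = ['AG', 'ACG', 'ACCG', 'ACCCG']

def full_matches(pattern):
    """Return the candidates that fit the pattern from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(full_matches('AC*G'))      # ['AG', 'ACG', 'ACCG', 'ACCCG']  zero or more C
print(full_matches('AC?G'))      # ['AG', 'ACG']                   zero or one C
print(full_matches('AC+G'))      # ['ACG', 'ACCG', 'ACCCG']        one or more C
print(full_matches('AC{2}G'))    # ['ACCG']                        exactly two C
print(full_matches('AC{2,3}G'))  # ['ACCG', 'ACCCG']               two or three C
```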
Regular Expressions

● ( ) : captures a group of characters
e.g. (TA) : matches TA in ACGATAGACC

● Can be combined with the repetition quantifiers
e.g. (TA){3} : matches TATATA in ACGATATATACC
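
The sketch below runs the (TA){3} example with re.search(), which returns a match object, and points out a detail of groups: the whole match and the captured group are not the same thing. The non-capturing form (?:...) is standard re syntax but goes beyond what the slides cover.

```python
import re

seq = 'ACGATATATACC'

match = re.search('(TA){3}', seq)
whole = match.group(0)       # the complete matched text
group = match.group(1)       # only what the ( ) captured on its last repetition
print(whole, match.start())  # TATATA 4
print(group)                 # TA

# findall() returns the captured group when the pattern contains one ...
with_group = re.findall('(TA){3}', seq)
print(with_group)            # ['TA']

# ... so use a non-capturing group (?:...) to get the whole match back.
without_group = re.findall('(?:TA){3}', seq)
print(without_group)         # ['TATATA']
```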
The re Module

● import re
● Writing the above statement in a Python script imports the
re (regular expression) module and makes it
ready to use.
● You are now able to use the functions of the
regular expression library in your program
Example code

import re
seq = 'ACCGTGGCAAATTTCCACGGACGAG'
regEx = 'CC[GA][TC]GG'
aList = re.findall(regEx, seq)
for site in aList:
    print('Found', site)
● Finds any DsaI cutting sites in the given sequence
● The result is : Found CCGTGG
                  Found CCACGG
Example code

import re
text = 'this is a test paragraph'
regEx = r'A\stest'   # raw string: \s is passed to the re module unchanged
aList = re.findall(regEx, text)
if len(aList) == 0:
    print('Not Found')
● Checks whether the sentence contains “A test” (the match is case-sensitive)
The result is : Not Found
Example code

import re
seq = 'ACGATATACC'
regEx = '(TA){2}'
aList = re.findall(regEx,seq)
if len(aList) > 0:
    print('Found TATA')
else:
    print('Not Found')

The result is : ?
Example code

import re
seq = 'ACGATATACC'
regEx = '(TA){3}'
aList = re.findall(regEx,seq)
if len(aList) > 0:
    print('Found TATATA')
else:
    print('Not Found')

The result is : ?
Substitution example: re.sub()

Regular expressions can be used to perform substitutions

e.g. replace all T’s or C’s with a “-” in a sequence

import re
seq = 'AAACGCTGTCAATACAATCTTCTTTCGGATTTGAATTTTGCAAAGCTGCC'
regEx = '[TC]'
replacement = '-'
new_seq = re.sub(regEx , replacement , seq )
print (new_seq )

The result is :

AAA-G--G--AA-A-AA---------GGA---GAA----G-AAAG--G--
sub() and findall() functions of the re module

re.sub(regEx , replacement , targetString )

Returns a string with all the matches of the regEx in the targetString
substituted with the replacement string

re.findall(pattern, target[, flags])

Returns a list of all nonoverlapping matches in target as a list of
strings. If the pattern contains one group, the list holds the text
captured by that group; if it contains several groups, it is a list
of tuples of strings

([, flags] is optional and beyond the scope of this tutorial;
if required, we would be happy to explain more)

More functions are available at http://docs.python.org/library/re.html
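
The return values described above can be checked with a short sketch. The sequence is made up for illustration; count is an optional argument of re.sub() that limits the number of replacements.

```python
import re

seq = 'CCGTGGAACCACGG'   # made-up sequence with two DsaI sites

no_group = re.findall('CC[GA][TC]GG', seq)
print(no_group)          # ['CCGTGG', 'CCACGG']   whole matches

one_group = re.findall('CC([GA])[TC]GG', seq)
print(one_group)         # ['G', 'A']             only the captured position

two_groups = re.findall('CC([GA])([TC])GG', seq)
print(two_groups)        # [('G', 'T'), ('A', 'C')]   one tuple per match

masked_all = re.sub('CC[GA][TC]GG', '*', seq)
masked_first = re.sub('CC[GA][TC]GG', '*', seq, count=1)
print(masked_all)        # *AA*
print(masked_first)      # *AACCACGG
```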

File I/O – reading from a file

# The r prefix keeps the backslashes of the Windows path as they are
F = open(r'C:\Documents and Settings\Administrator\Desktop\User\Python course\Seq.txt', 'r')
# F is the file handle: it gives you a direct link to the contents
# of the file Seq.txt
lines = F.readlines()   # reads all the lines of the file into a list called lines
F.close()
File I/O – writing to a file

# The r prefix keeps the backslashes of the Windows path as they are
F = open(r'C:\Documents and Settings\Administrator\Desktop\User\Python course\Out.txt', 'w')
# F is the file handle: it gives you a direct link to the contents
# of the file Out.txt
F.write('Hello')   # writes the word “Hello” to the file Out.txt
F.close()
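
A common alternative to calling close() yourself is the with statement, which closes the file automatically when the block ends. The sketch below writes two made-up sequences to a temporary file and reads them back; the temporary directory and the file contents are assumptions chosen so that the example runs on any machine.

```python
import os
import tempfile

# Hypothetical location: a Seq.txt file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'Seq.txt')

# Writing: write() does not add line breaks, so '\n' is included explicitly.
with open(path, 'w') as out_file:
    out_file.write('ACGTCATAATTAGC\n')
    out_file.write('CCGTGGAACC\n')

# Reading: readlines() keeps the trailing '\n' of every line.
with open(path, 'r') as in_file:
    lines = in_file.readlines()

sequences = [line.strip() for line in lines]   # strip() removes the line breaks
print(lines)       # ['ACGTCATAATTAGC\n', 'CCGTGGAACC\n']
print(sequences)   # ['ACGTCATAATTAGC', 'CCGTGGAACC']
```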
