Introduction to Perl and BioPerl

Institut Pasteur Tunis

22 March 2007

Heikki Lehväslaiho, SANBI

This work is licensed under the Creative Commons Attribution-ShareAlike 2.0
South Africa License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/2.0/za/
or send a letter to
Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

What is Perl
Perl is a programming language
Born from a combination of C & shell scripting for system administration
Larry Wall’s background in linguistics led to Perl borrowing ideas from
natural language.

“There is more than one way to do it”

The glue that holds the internet together.
One of the oldest scripting languages still in wide use
No separate compilation step needed
The line noise of the programming languages.
/^[^#]+\s*(?:\d+\w+\s*){2,3}$/;

Why use Perl
Easy to learn
Cross platform
Very strong community support
CPAN, perlmonks, Perl User Groups
Provides API to things that do not have API
Excellent documentation
see man perl

The Camel Book

Beginning Perl
open source book
by Simon Cozens

Downloadable at
http://www.perl.org/books/beginning-perl/
and locally

Perl Documentation
perldoc perltoc
perldoc CGI
perldoc Bio::PrimarySeq
perldoc -f open
http://perldoc.perl.org/
http://www.cpan.org/
http://qa.perl.org/phalanx/100/
http://perlmonks.org/

Programming Perl
Best Practices
Aimed at Perl 5.8.x
Shortcuts
Code Re-Use
Maintainable Development
Shortest Path between two points

Perl program structure
shebang #!
directives (use)
keywords
functions
statements ;
escape sequences: "\t\n"
white space
comments

#!/usr/bin/perl
# hello.pl
use warnings;

# print a message
print "Hello world!\n";

> chmod 755 hello.pl
> ./hello.pl
Hello world!
>

Variable types
Scalars - Start with a $
Strings, Integers, Floating Point Numbers, References to other variables
Arrays - Start with a @
Zero based index
Contain an ordered list of Scalars
Hashes - Start with %
Associative arrays without order
Key => Value

Scalars
Any single value
Automatic type casting
string interpolation only in double quoted strings
In Perl, context is everything!

#!/usr/bin/perl
# print_sum.pl
use warnings;
use strict;

print "Give a number ";
my $num = <STDIN>;
my $num2 = '0.5';
my $float = $num + 0.5;
my $res = 'Sum';

# print the sum
print "$res = $float\n";

Pragmas
‘use strict;’
Forces variable declaration
Needed for maintainable code
Scoping
Garbage collection
‘use warnings;’
Warns when an uninitialized variable is used
Warns on deprecated syntax
Useful for sanity checking
in desperate situations: 'no warnings;'

undef
Q: What is the value of a variable if the value has not been assigned?
A: undef
not defined, void
use warnings will warn if you try to access undefined variables

#!/usr/bin/perl
# print_sum.pl
use warnings;
use strict;
my $num;
# print
print "$num\n";

Operators
Function            String            Numeric
Assignment          =                 =
Equality            eq, ne            ==, !=
Comparison          lt, le, gt, ge    <, <=, >, >=
Concatenation       .                 N/A
Repetition          x                 N/A
Basic Math          N/A               +, -, *, /
Modulus, Exponent   N/A               %, **
Special Sorting     cmp               <=>

Operators
normal mathematical precedence
operators force the context on variables!
More:
boolean operators ( and, &&, or, || )
operating and assigning at once ($a += $b;)
autoincrement and autodecrement ($count++, ++$c;)
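
A minimal sketch of how the operator, rather than the variable, decides between string and numeric context. The variable names and values are made up for illustration.

```perl
use strict;
use warnings;

# The operator, not the variable, decides whether a value is treated
# as a string or as a number.
my $x = '10';
my $y = '10.0';

my $num_equal = ($x == $y) ? 1 : 0;   # numeric comparison: 10 == 10
my $str_equal = ($x eq $y) ? 1 : 0;   # string comparison: '10' vs '10.0'

my $sum    = $x + $y;    # numeric addition
my $joined = $x . $y;    # string concatenation
my $line   = '-' x 5;    # string repetition
my $cube   = 2 ** 3;     # exponentiation (note: ^ is bitwise XOR)

print "numeric equal: $num_equal, string equal: $str_equal\n";
print "sum: $sum, joined: $joined, line: $line, cube: $cube\n";
```

The same two scalars compare equal with == and unequal with eq, which is why the string and numeric operator families are kept apart.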

Arrays
Implement stacks, lists, queues
Creation
@a = (); # literal empty list
@b = qw(a t c g); # whitespace-delimited list
functions: e.g. push @b, 'u'; $first = shift @b;

[Figure: an array with indices 0 1 2 3 4; shift() and unshift() work on the front (index 0), push() and pop() work on the end]
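
The four functions can be tried on the same small array; this is only a sketch, and the extra elements are arbitrary.

```perl
use strict;
use warnings;

my @b = qw(a t c g);

push @b, 'u';            # add to the end:      a t c g u
my $first = shift @b;    # remove from front:   t c g u
my $last  = pop @b;      # remove from the end: t c g
unshift @b, 'n';         # add to the front:    n t c g

print "first=$first last=$last rest=@b\n";
```

push with shift gives a queue, push with pop gives a stack.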

Working with arrays
Special variable $#alph
index of last element
Special variable $_
split() and join(), foreach()
Enclosure
Scalar context gives array length
Access array elements as scalars
note: @ -> $

#!/usr/bin/perl
# counting.pl
use warnings;
use strict;

my $alph = 'atgc';
print length($alph), "\n";
my @alph = split '', $alph;
print "$#alph\n";
print scalar(@alph), "\n";
my $c = 0;
foreach (@alph) {
    print "$c: ", $alph[$c], "$_\n";
    $c++;
    my $alph = 'augc';
}
print "$alph: $c\n";

Variable Scope
Lexical Scope
Declared with my()
Limits scope to containing block
Widest scope: the file in which it is declared
Package Scope
Default scope
Declared with our()
Permanent scope

Working with arrays
Ranges, an easy way to generate lists:
(1 .. 6), ( -2 .. 8 ), ('a' .. 'z')
Can be used as slices
@three = reverse sort @months[ -1..1 ];
Months with 31 days:
@months[0, 2, 4, 6..7, 9, 11]
Swapping values without intermediate variables:
($a, $b) = ($b, $a);
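
A short sketch of ranges, slices and swapping, assuming an @months array that holds the month names in calendar order (the slide does not show its contents).

```perl
use strict;
use warnings;

my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

my @first_half      = @months[0 .. 5];    # slice with a range
my @around_new_year = @months[-1 .. 1];   # Dec Jan Feb: negative indices count from the end
my @countdown       = reverse(1 .. 3);    # ranges only count upwards, so reverse them

my ($x, $y) = ('left', 'right');
($x, $y) = ($y, $x);                      # swap via list assignment

print "@first_half\n@around_new_year\n@countdown\n$x $y\n";
```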

Hashes
Special Initialization
my %hash = ( 'key1' => 'value1' );
could be written ( 'key1', 'value1', 'key2', 'value2' )
Hash keys are unique!
Access scalar elements inside Hashes like this:
my $value = $hash{key};
Hashes auto-vivify!
$hash{test1} = 'value'; # creates an entry with key test1;
When you use hashes all the time, you have mastered perl!
hash references are even better, but we'll talk about them later

Hash functions
my $is_there = exists $hash{key};
returns true if the key exists, false (an empty string) if not.
does not auto-vivify.
my $has_value = defined $hash{key};
returns true if the key has a defined value, false if not
my @list = keys %hash;
returns a list of the keys in the hash
my @list = values %hash;
returns a list of the values in the hash
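
The difference between exists and defined shows up when a key is stored with an undefined value. A small sketch with a made-up codon table:

```perl
use strict;
use warnings;

my %codon = (ATG => 'M', TGG => 'W', TAA => undef);

my $has_key   = exists  $codon{TAA} ? 1 : 0;   # the key is present ...
my $has_value = defined $codon{TAA} ? 1 : 0;   # ... but its value is undef
my $missing   = exists  $codon{CCC} ? 1 : 0;   # never stored, and not created by the test

my @keys  = sort keys %codon;   # keys come back unordered, so sort them
my $count = keys %codon;        # keys in scalar context gives the number of entries

print "has_key=$has_key has_value=$has_value missing=$missing\n";
print "keys: @keys (count $count)\n";
```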

Default variables
$_ - the “default scalar”;
for example, chomp() and print() work on default scalar if no argument is
given
@_ & @ARGV - the “default arrays”;
Subroutines use @_ as default
Outside of a subroutine, @ARGV is the default array, only used for
command line input
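
A sketch of the default variables at work. The subroutine name shout and the sample strings are invented for the example.

```perl
use strict;
use warnings;

# @_ holds the arguments inside a subroutine; shift with no argument uses it.
sub shout {
    my $word = shift;    # same as: shift @_
    return uc $word;
}

my @seen;
for (qw(acgt ttga)) {    # no loop variable given, so each item lands in $_
    push @seen, shout($_);
}

# Many built-ins fall back to $_ when called without an argument.
my @lengths = map { length } qw(a cc ggg);

print "@seen\n@lengths\n";
```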

Control structures
Loops and decisions
for, foreach
if, elsif, else
while
"if not" equals "unless"
transposition helps readability

if (<some test>) {
    # do
} elsif (<other test>) {
    # do
} else {
    # do
}

$a = 5;
while ($a > 0) {
    # do
    $a--;
}

unless ($valid) {
    check($value)
}
check($value) unless $valid;

Loop modifiers
next
last
redo
continue
LABEL:
name a loop to know which one is being jumped out of

while (<EXPR>) {
    # redo always comes here
    do_something;
} continue {
    # next always comes here
}
# last always comes here

OUTER: foreach (<EXPR>) {
    INNER: foreach (<EXPR>) {
        last OUTER;
    }
}
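
A runnable sketch of labelled next and last in nested loops. The labels ROW and COL and the conditions are arbitrary.

```perl
use strict;
use warnings;

my @found;
ROW: foreach my $row (1 .. 3) {
    COL: foreach my $col (1 .. 3) {
        next COL if $col == 1;      # skip this column, continue the inner loop
        next ROW if $col > $row;    # give up on this row, continue the outer loop
        last ROW if $row == 3;      # leave both loops at once
        push @found, "$row-$col";
    }
}
print "@found\n";
```

Without the labels, next and last would only act on the innermost loop.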

What is boolean in Perl
Anything can be tested.
An empty string is false
Number 0 and string “0” are false
An empty list () is false
Undefined value, undef, is false
everything else is true
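
These rules can be checked directly. The helper truth() below is a made-up name, and the list of test values is only a sample.

```perl
use strict;
use warnings;

# Report how Perl judges a single scalar value in boolean context.
sub truth { my ($value) = @_; return $value ? 'true' : 'false' }

my @empty = ();
my $unset;

my %result = (
    'empty string' => truth(''),
    'number 0'     => truth(0),
    'string 0'     => truth('0'),
    'string 0.0'   => truth('0.0'),           # true: only '' and '0' are false strings
    'string 00'    => truth('00'),            # true for the same reason
    'undef'        => truth($unset),
    'empty array'  => truth(scalar @empty),   # array length 0
    'single space' => truth(' '),
);

printf "%-13s %s\n", $_, $result{$_} for sort keys %result;
```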

Pseudocode
Near-English (or any natural language) explanation of what the code does, written before writing the code
Keep elaborating and adding programme-code-like elements until it is easy to implement.
e.g. how to count from 10 to zero in even numbers:

First draft:
start from 10,
remove 2,
keep repeating until 0

Refined:
start from 10,
keep repeating until 0
    print value
    remove 2,

Code:
$x = 10;
until ($x < 0) {
    print $x;
    $x -= 2;
}

Subroutines
create your own verbs
prototypes and predeclarations of subroutines can be used
lexical scoping
shift works on @_
last statement is returned
Note: you can not pass two arrays, they are flattened into one!

sub version;
print version, "\n";

sub add1 {
    my $one = shift;
    my $two = shift;
    my $sum = $one + $two;
    return $sum;
}
sub add ($$) {
    shift() + shift();
}

my $sum = add1(2,3);
$sum = add 2, 3;
sub version {'1.0'};

Long arguments for subroutines
if you often have more than two arguments, you might want to use hashes to pass arguments to subroutines

sub add2 {
    my %args = @_;
    my $one = $args{one} || 0;
    my $two = $args{two} || 0;
    my $sum = $one + $two;
    return $sum;
}

sub add ($$) {
    shift() + shift();
}

my $sum2 = add2(one => 2,
                two => 3);
my $sum = add(2,3);

References
Reference is a scalar variable pointer to some other, often more complex, structure.
It does not have to be a named structure
references make it possible to create complex structures:
hashes of hashes,
hashes of arrays, ...
ref() tells what is the referenced structure

@lower = ('a' .. 'z');
$myletters = \@lower;
push @$myletters, '-';
$upper = ['A' .. 'Z'];    # reference to an anonymous array
${$all}{'upper'} = $upper;
$all->{'lower'} = \@lower;
$matrix[0][5] = 3;

# using ref()
ref \$a; # returns SCALAR
ref \@a; # returns ARRAY
ref \%a; # returns HASH

Subroutines revisited
passing more complex arguments as references
? : operator

sub first_is_longer {
    my ($lref1, $lref2) = @_;
    my $first = @$lref1; # length
    my $sec = @$lref2;   # length
    ($first > $sec) ? 1 : 0;
}
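
A sketch of why references are needed here: passed directly, two arrays arrive as one flat list. The helper count_args and the sample arrays are made up for the demonstration.

```perl
use strict;
use warnings;

# Returns 1 when the first array holds more elements than the second.
sub first_is_longer {
    my ($lref1, $lref2) = @_;
    return (@$lref1 > @$lref2) ? 1 : 0;   # arrays in numeric context give their length
}

# Counts how many scalars actually arrive in @_.
sub count_args { return scalar @_ }

my @short = qw(a c);
my @long  = qw(a c g t);

my $flattened = count_args(@short, @long);      # one list of six scalars
my $as_refs   = count_args(\@short, \@long);    # two references
my $longer    = first_is_longer(\@long, \@short);

print "flattened=$flattened as_refs=$as_refs longer=$longer\n";
```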

Reading and Writing a file
The easy way:
use while (<>){} construct
redirect the output at command line into a file

# the most useful perl construct
while (<>) {
    # do something
}

# same as:
> perl -ne '#do something'

# redirection
> perl -ne '#do something' > file

Filehandles
Default filehandle is STDOUT
$! special variable holds error messages
perldoc -f -x
perldoc -f open
$/ 'input record separator' defaults to "\n"
The three argument form is preferred
lexical scope to filehandles

print "Hello\n";
print STDOUT "Hello\n"; # identical

my $file = 'seq.embl';
die "Not exist"
    unless -e $file;
die "Not readable"
    unless -r $file;

open FH, $file or die $!;
while (<FH>) { chomp; print;}
close FH;

{
    open my $F, '<', $file
        or die $!;
    while (<$F>) { chomp; ... }
}
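
A self-contained sketch of writing and then reading a file with lexical filehandles and three-argument open. It uses a temporary file from the core File::Temp module instead of seq.embl, and the two data lines are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file stands in for real data (its name is generated for us).
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write: '>' truncates, '>>' would append.
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "ID   first\n", "ID   second\n";
close $out or die "Cannot close $file: $!";

# Read it back line by line.
my @ids;
open my $in, '<', $file or die "Cannot read $file: $!";
while (my $line = <$in>) {
    chomp $line;
    push @ids, (split ' ', $line)[1];   # second whitespace-separated field
}
close $in;

print "@ids\n";
```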

Reading and Writing a file
Permanent record of program execution
read file one EMBL seq entry at a time
modify $/ in a closure or subroutine
the only use for local you'll see!

die "Not writable"
    unless -w $file;
open my $LOG, '>>', $file
    or die $!;
print STDERR "log: $params\n";
print $LOG "$params\n";

{
    local $/ = "\/\/\n";
    open my $SEQ, '<', shift
        or die $!;
    while (<$SEQ>) {
        my $seq = $_;
        my ($ac) =
            $seq =~ /AC +(\w+)/;
        print "$ac\n"
            if $seq =~ /FT +CDS/;
    }
}
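
The same record-at-a-time idea as a runnable sketch. The three entries are made-up, heavily shortened EMBL-like records, and the data is read from a string through an in-memory filehandle so that no file is needed.

```perl
use strict;
use warnings;

# Made-up, heavily shortened EMBL-like entries, each terminated by "//".
my $data = <<'END';
ID   X1
AC   AB000001;
FT   CDS   1..90
//
ID   X2
AC   AB000002;
//
ID   X3
AC   AB000003;
FT   CDS   5..50
//
END

my @with_cds;
{
    local $/ = "//\n";                      # one record per read, only inside this block
    open my $SEQ, '<', \$data or die $!;    # read from the string as if it were a file
    while (my $entry = <$SEQ>) {
        my ($ac) = $entry =~ /^AC +(\w+)/m;
        push @with_cds, $ac if $entry =~ /^FT +CDS/m;
    }
    close $SEQ;
}
# Here $/ is back to "\n".
print "@with_cds\n";
```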

Regular expressions
used for finding patterns in free text, semi-structured text (database parsing), sequences (e.g. prosite)
consists of
literals
metacharacters

/even/;                 # literal
/eve+n/;                # + means one or more
/eve*n/;                # * means zero or more
/eve?n/;                # ? means zero or one
/e(ve)+n/;              # group
/0|1|2|3|4|5|6|7|8|9/;  # alternation
/[0123456789]/;         # character class
/[0-9]/;                # range, in ASCII
/\d/;                   # character class

Regex shorthands
Always use the shortest form for clarity
what does /p*/ match?
it always matches
Exact number of repetitions

/[a-zA-Z0-9_]/;   # word character
/\w/;             # word character
/[^a-zA-Z0-9_]/;  # non-word char
/\W/;             # non-word char
/\D/;             # not a number
/[ \t\n\r\f]/;    # white space
/\s/;             # white space
/\S/;             # non-white space
/./;              # any character (except newline)
/\w{4}/;          # four letter word
/\w{4,6}/;        # 4-6 letters
/\w{4,}/;         # at least four letters

Regex anchors and operators
Anchoring the match to a border
regex works on $_
regexp operators tell regexps to bind to other strings
=~
!~

/^\w+.+/     # ^ forces line start
/\d$/        # $ forces line end
/\bword\b/   # word boundary

if (/\w/) { # word char
    my $line = $_;
    # found the first digit
    print "digit\n"
        if $line =~ /\d/;
    # should have ID
    print "error: $line"
        if $line !~ /ID/;
}

String manipulations with regexs
contents of parentheses are remembered
fancier version of split()
any delimiter can be used when declaring a regexp with 'm'
regexp operators
match m//
substitution s///
transliterate tr///
returns number of translations
useful for counting

/^(\w+)(.+)/;
my $first_word = $1;
my $rest = $2;
# or
my ($first_word, $rest) =
    /^(\w+)(.+)/;

# two words limited by '\'
/\w+\\\w+/;
m|\w+\\\w+|;

s/[Uu]/t/;
s/(\w+)/"$1"/;  # add quotes around
                # the first word
$count = tr/AT/N/;
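
A sketch that combines substitution, counting with tr and captures on one made-up sequence string. With an empty replacement list, tr counts matching characters without changing them.

```perl
use strict;
use warnings;

my $dna = 'ATGGCUUAGC';

(my $fixed = $dna) =~ s/U/T/g;    # substitution on a copy
my $gc = ($fixed =~ tr/GC//);     # empty replacement list: count, change nothing
my $at = ($fixed =~ tr/AT//);

my ($start, $rest) = $fixed =~ /^(ATG)(.+)/;   # captures returned in list context

print "$fixed GC=$gc AT=$at start=$start rest=$rest\n";
```

Note that tr takes plain character lists, not regular expression character classes, so square brackets would be counted as literal characters.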

Regex modifiers and greediness
modifiers
g - global
Greedy by default
"always match all you can"
lazy (non-greedy) matching by adding ? to repetition

s/(\w+)/"$1"/g;  # quotes around
                 # every word
my $count = tr/AT/N/;

/.+(\w+)/;   # last word character
/.+?(\w+)/;  # first whole word
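
Greedy and lazy matching compared on the same made-up string, plus the effect of the g modifier:

```perl
use strict;
use warnings;

my $text = 'id=42; id=7;';

my ($greedy) = $text =~ /id=(.+);/;     # .+ grabs as much as it can
my ($lazy)   = $text =~ /id=(.+?);/;    # .+? stops at the first ';'

(my $quoted = 'one two three') =~ s/(\w+)/"$1"/g;   # g: every match, not just the first

print "greedy: $greedy\nlazy: $lazy\n$quoted\n";
```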

Catching errors
eval
traps run time errors
error message stored in special variable $@
semicolon at the end of the eval block is required

$a = 0;
eval {
    $b = 5/$a;
};
print $@ if $@;
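
The eval pattern wrapped in a subroutine, as a sketch. The name safe_divide is made up; it returns undef when the division dies.

```perl
use strict;
use warnings;

sub safe_divide {
    my ($num, $den) = @_;
    my $result = eval { $num / $den };   # the block's value is returned on success
    if ($@) {                            # on failure eval returns undef and sets $@
        warn "division failed: $@";
        return;
    }
    return $result;
}

my $ok  = safe_divide(10, 4);
my $bad = safe_divide(5, 0);

print "ok=$ok bad=", (defined $bad ? $bad : 'undef'), "\n";
```

The program keeps running after the failed division; only a warning is written to STDERR.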

Calling external programs
system("ls");

# to catch the output use backticks

$files = `ls -1`;

Running perl
man perlrun
man perldebug
Chapter 9 in Beginning Perl
command line perl
you should have learned it by now by example!

Modules
logical organisation of code
code reuse
@INC – paths where Perl looks for modules
(do) - call subroutines from another file
require – runtime include of a file or module
allows testing and graceful failure
use
compile time include
'use'ing a perl module makes object oriented interface available and
usually exports common functions

Getopt::Long
a standard library
used to set short or long options from command line
$0, name of the calling programme

use Getopt::Long;
use constant PROGRAMME_NAME => 'testing.pl';
use constant VERSION => '0.1';
our $DEBUG = '';
our $DIR = '.';
our $WINDOW = 7;
GetOptions
    ('v|version' =>
         sub { print PROGRAMME_NAME, ", version ", VERSION, "\n";
               exit 1; },
     'd|directory:s' => \$DIR,
     'g|debug' => \$DEBUG,
     'h|help|?' =>
         sub { exec('perldoc', $0); exit 0 }
    );
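
A sketch that can be run without typing any command line: Getopt::Long's GetOptionsFromArray parses an explicit array instead of @ARGV. The argument values are made up.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my $DIR   = '.';
my $DEBUG = 0;

# Parse a made-up argument list instead of @ARGV so the sketch runs on its own.
my @args = ('--directory', '/tmp/seqs', '-g', 'leftover.fa');

GetOptionsFromArray(
    \@args,
    'd|directory:s' => \$DIR,      # optional string value
    'g|debug'       => \$DEBUG,    # simple flag
) or die "bad options\n";

print "dir=$DIR debug=$DEBUG remaining=@args\n";
```

Recognised options are removed from the array; anything else, such as file names, is left behind for the program to handle.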

Plain Old Documentation
POD: embedded structured comments in code
Empty lines separate commands
Three types of text:
1. ordinary paragraphs
formatting codes
2. verbatim paragraphs
indented
3. command paragraphs
see code

=pod
=head1 Heading Text
Text in B<bold> I<italic>
=head2 Heading Text
=head3 Heading Text
=head4 Heading Text
=over indentlevel
=item stuff
=back
=begin format
=end format
=for format text...
=encoding type
=cut

POD tools
pod2html pod2latex pod2man pod2text pod2usage,
podchecker
use POD to create self-documenting scripts
exec('perldoc',$0); exit;
Headers for a program:
NAME, SYNOPSIS, DESCRIPTION (INSTALLING, RUNNING,
OPTIONS), VERSION, TODO, BUGS, AUTHOR, CONTRIBUTORS,
LICENSE, (SUBROUTINES)
Use inline documentation when you can

Code reuse
Try not to reinvent wheels
CPAN Authors usually QA their code
The community reviews CPAN Modules
Always look for a module FIRST
Chances are, it’s been done faster and more securely than you
could do it by yourself
It saves time
You might be able to do it better, but is it worth it?

Some Modules (I)
Getopt::Long for command line parsing
Carp provides more intelligent designs for error/warning
messages
Data::Dumper for debugging
CGI & CGI::Pretty provide an interface to the CGI Environment
DBI provides a unified interface to relational databases
DateTime for date interfaces, also
DateTime::Format::DateManip

Some Modules (II)
WWW::Mechanize for web screen scraping
HTML::TreeBuilder for HTML parsing
MIME::Lite for constructing email message with or without
attachments
Spreadsheet::ParseExcel to read in Excel Spreadsheets
Spreadsheet::WriteExcel to create spreadsheets in perl
XML::Twig for XML data
PDL, Perl Data Language, to work with matrices and math

Perl Resources
Perl Phalanx
http://qa.perl.org/phalanx/100/
Comprehensive Perl Archive Network
http://www.cpan.org/
http://search.cpan.org/

Installing from CPAN
use your distro's package manager to install most – and
especially complex modules.
e.g. sudo apt-get install GD – graphics library
first run configures cpan
o conf init at cpan prompt reconfigures
sets closest mirrors and finds helper programs

$ sudo cpan
cpan> install YAML
...

BioPerl
BioPerl is in CPAN
... but you will not want to use it from there!
sequence databases change so often that official releases are often
outdated
http://bioperl.org/

Installing BioPerl via CVS (I)
http://www.bioperl.org/wiki/Using_CVS
You need cvs client on your local machine
Create a directory for BioPerl
$ mkdir ~/src;
$ mkdir ~/src/bioperl
$ cd ~/src/bioperl

Login to CVS (password is "cvs"):

$ cvs -d :pserver:cvs@code.open-bio.org:\
/home/repository/bioperl login

Installing BioPerl via CVS (II)
Checkout the BioPerl core module, only
$ cvs -d :pserver:cvs@code.open-bio.org:\
/home/repository/bioperl checkout bioperl-live

Tell perl where to find BioPerl (set this in your .bash_profile,

.profile, or .cshrc):
bash: $ export PERL5LIB="$HOME/src/bioperl"
tcsh: $ setenv PERL5LIB "$HOME/src/bioperl"

Test
perl -MBio::Perl -le 'print Bio::Perl->VERSION;'

What is Bioperl
A collection of Perl modules for processing data for the life
sciences
A project made up of biologists, bioinformaticians, computer
scientists
An open source toolkit of building blocks for life sciences
applications
Supported by Open Bioinformatics Foundation (O|B|F),
http://www.open-bio.org/
Collaborative online community

Simple example
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
my $in = new Bio::SeqIO(-format => 'genbank',
                        -file => 'AB077698.gb');
while ( my $seq = $in->next_seq ) {
    print "Sequence length is ", $seq->length(), "\n";
    my $sequence = $seq->seq();
    print "1st ATG is at ", index($sequence,'ATG')+1, "\n";
    print "features are: \n";
    foreach my $f ( $seq->top_SeqFeatures ) {
        printf(" %s %s(%s..%s)\n",
               $f->primary_tag,
               $f->strand < 0 ? 'complement' : '',
               $f->start,
               $f->end);
    }
}

Simple example, output
% perl ex1.pl
Sequence length is 2701
1st ATG is at 80
features are:
source (1..2701)
gene (1..2701)
5'UTR (1..79)
CDS (80..1144)
misc_feature (137..196)
misc_feature (239..292)
misc_feature (617..676)
misc_feature (725..778)
3'UTR (1145..2659)
polyA_site (1606..1606)
polyA_site (2660..2660)

Gotchas
Sequences start with 1 in Bioperl (historical reasons). In perl
strings, arrays, etc start with 0.
When using a module, CaseMatTers.
methods are usually lower case with underscores (_).
Make sure you know what you're getting back - if you get back an
array, don't assign it to a scalar in haste.
my ($val) = $obj->get_array(); # 1st item
my @vals = $obj->get_array(); # whole list
my $val = $obj->get_array(); # array length
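
Both gotchas can be reproduced in plain Perl without BioPerl installed. In this sketch get_array is a hypothetical plain subroutine standing in for the method call, and the sequence string is made up.

```perl
use strict;
use warnings;

my $sequence = 'CCATGAAA';

my $zero_based = index($sequence, 'ATG');   # Perl string offset
my $one_based  = $zero_based + 1;           # position in sequence coordinates

# A hypothetical accessor that returns an array.
my @store = qw(x y z);
sub get_array { return @store }

my ($first) = get_array();    # list context: first item
my @all     = get_array();    # list context: whole list
my $count   = get_array();    # scalar context: number of items

print "offset=$zero_based position=$one_based\n";
print "first=$first all=@all count=$count\n";
```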

Where to go for help
http://docs.bioperl.org/
http://bioperl.org/
FAQ, HOWTOs, Tutorial
modules/ directory (for class diagrams)
perldoc Module::Name::Here
Publication - Stajich et al. Genome Res 2002
Bioperl mailing list: bioperl-l@bioperl.org
Bug reports: http://bugzilla.bioperl.org/

Brief Object Oriented overview
Break problem into
components
Each component has
data (state) and methods
Only interact with
component through methods
Interface versus implementations

Objects in Perl
An object is simply a reference that happens to know which class
it belongs to.
A class is simply a package that happens to provide methods to
deal with object references.
A method is simply a subroutine that expects an object reference
(or a package name, for class methods) as the first argument.
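
The three definitions fit in a few lines. Toy::Seq below is a hypothetical class written for this sketch, not a BioPerl module; it only borrows the BioPerl habit of dash-prefixed argument names.

```perl
use strict;
use warnings;

package Toy::Seq;    # hypothetical class, not part of BioPerl

sub new {
    my ($class, %args) = @_;            # class method: the package name comes first
    my $self = { seq => $args{-seq} };
    return bless $self, $class;         # the reference now knows its class
}

sub length {
    my ($self) = @_;                    # object method: the object comes first
    return CORE::length($self->{seq});  # the built-in length, not this method
}

package main;

my $obj = Toy::Seq->new(-seq => 'ATGGGA');
print ref($obj), " has length ", $obj->length, "\n";
```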

Inheritance
Objects inherit methods
from their parent
They inherit state
(data members);
not explicitly in Perl.
Methods can be
overridden by children
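
A sketch of method inheritance and overriding through @ISA. Both classes are hypothetical. The state lives in the blessed hash, so the child shares it only because it reuses the parent's constructor.

```perl
use strict;
use warnings;

package Toy::Base;    # hypothetical classes for illustration
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub name     { return $_[0]{name} }
sub describe { my ($self) = @_; return 'sequence ' . $self->name }

package Toy::Protein;
our @ISA = ('Toy::Base');    # inherit new() and name()
sub describe {               # override the parent's method
    my ($self) = @_;
    return 'protein ' . $self->SUPER::describe();
}

package main;

my $p    = Toy::Protein->new(name => 'BOSS');
my $text = $p->describe;
print "$text\n";
```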

Interfaces
Interfaces can be thought of
as an agreement
Object will at least look
a certain way
It is independent of what
goes on under the hood

Interfaces and Inheritance in Bioperl
What you need to know:
Interfaces are declared with trailing 'I' (Bio::PrimarySeqI)
Can be assured that at least these methods will be implemented by
subclasses
Can treat all inheriting objects as if they were the same, i.e.
Bio::PrimarySeq, Bio::Seq, Bio::Seq::RichSeq all have basic
Bio::PrimarySeqI methods.
In Perl, good OO requires good manners.
Methods which start with an underscore are considered 'private'
Watch out. Perl programmers can cheat.

Modular programming (I)

From Stein et al. Genome Research 2002

Modular programming (II)

Bioperl components

Sequence components I
Sequences
Bio::PrimarySeq - Basic sequence operations (aa and nt)
Bio::Seq - Supports attached features
Bio::Seq::RichSeq - GenBank,EMBL,SwissProt fields
Bio::LocatableSeq - subsequences
Bio::Seq::Meta - residue annotation

Sequence components II
Features
Bio::SeqFeature::Generic - Basic Sequence features
Bio::SeqFeature::Similarity - Represent similarity info
Bio::SeqFeature::FeaturePair - Paired features (HSPs)
Sequence Input: Bio::SeqIO
Annotation: Bio::Annotation::XX objects

Class diagram (subset)

From Stajich et al. Genome Research 2002

Build a sequence and translate it
#!/usr/bin/perl -w
use strict;
use Bio::PrimarySeq;
my $seq = new Bio::PrimarySeq(-seq => 'ATGGGACCAAGTA',
-display_id => 'example1');
print "seq length is ", $seq->length, "\n";
print "translation is ", $seq->translate()->seq(), "\n";

% perl ex2.pl
seq length is 13
translation is MGPS

Bio::PrimarySeq I
Initialization
-seq - sequence string
-display_id - sequence ID (i.e. >ID DESCRIPTION)
-desc - description
-accession_number - accession number
-alphabet - alphabet (dna,rna,protein)
-is_circular - is a circular sequence (boolean)
-primary_id - primary ID (like GI number)

Bio::PrimarySeq II
Essential methods
length - return the length of the sequence
seq - get/set the sequence string
desc - get/set the description string
display_id - get/set the display id string
alphabet - get/set the sequence alphabet
subseq - get a sub-sequence as a string
trunc - get a sub-sequence as an object

Bio::PrimarySeq III
Methods only for nucleotide sequences
translate - get the protein translation
revcom - get the reverse complement

Bio::Seq
Initialization
annotation - Bio::AnnotationCollectionI object
features - array ref of Bio::SeqFeatureI objects
species - Bio::Species object

Bio::Seq
Essential methods
species - get/set the Bio::Species object
annotation - get/set the Bio::AnnotationCollectionI object
add_SeqFeature - attach a Bio::SeqFeatureI object to Seq
flush_SeqFeatures - remove all features
top_SeqFeatures - Get all the toplevel features
all_SeqFeatures - Get all features flattening those which contain sub-
features (rare now).
feature_count - Get the number of features attached

Parse a sequence from file
# ex3.pl
use Bio::SeqIO;
my $in = new Bio::SeqIO(-format => 'swiss',
-file => 'BOSS_DROME.sp');
my $seq = $in->next_seq();
my $species = $seq->species;
print "Organism name: ", $species->common_name, " ",
"(", $species->genus, " ", $species->species, ")\n";
my ($ref1) = $seq->annotation->get_Annotations('reference');
print $ref1->authors,"\n";
foreach my $feature ( $seq->top_SeqFeatures ) {
print $feature->start, " ",$feature->end, " ",
$feature->primary_tag, "\n";
}

Parse a sequence from file, output
% perl ex3.pl
Organism name: Fruit fly (Drosophila melanogaster)
Hart A.C., Kraemer H., van Vactor D.L. Jr., Paidhungat M., Zipursky
1 31 SIGNAL
32 896 CHAIN
32 530 DOMAIN
531 554 TRANSMEM
570 588 TRANSMEM
615 637 TRANSMEM
655 676 TRANSMEM
693 712 TRANSMEM
728 748 TRANSMEM
759 781 TRANSMEM
782 896 DOMAIN
...

Bio::SeqIO
Can read sequence from a file or a filehandle
special trick to read from a string: use IO::String
Initialize
-file - filename for input (prepend > for output files)
-fh - filehandle for reading or writing
-format - format for reading or writing
Some supported formats:
genbank, embl, swiss, fasta, raw, gcg, scf, bsml, game, tab

Read in sequence and write out in
different format
# ex4.pl
use Bio::SeqIO;
my $in = new Bio::SeqIO(-format => 'genbank',
-file => 'in.gb');
my $out = new Bio::SeqIO(-format => 'fasta',
-file =>'>out.fa');
while ( my $seq = $in->next_seq ) {
next unless $seq->desc =~ /hypothetical/i;
$out->write_seq($seq);
}

Sequence Features:
Bio::SeqFeatureI
Basic sequence features - have a location in sequence
primary_tag, source_tag, score, frame
additional tag/value pairs
Subclassed by numerous objects - power of the interface!

Sequence Features:
Bio::SeqFeature::Generic
Initialize
-start, -end, -strand
-frame - frame
-score - score
-tag - hash reference of tag/values
-primary - primary tag name
-source - source of the feature (e.g. program)
Essential methods
primary_tag, source_tag, start,end,strand, frame
add_tag_value, get_tag_values, remove_tag, has_tag

Locations quandary
How to manage features that span more than just start/end
Solution: An interface Bio::LocationI, and implementations in
Bio::Location
Bio::Location::Simple - default: 234, 39^40
Bio::Location::Split - multiple locations (join,order)
Bio::Location::Fuzzy - (<1..30, 80..>900)
Each sequence feature has a location() method to get access to
this object.

Create a sequence and a feature
#ex5.pl
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;
my $seq = Bio::Seq->new
(-seq => 'STTDDEVVATGLTAAILGLIATLAILVFIVV',
-display_id => 'BOSSfragment',
-desc => 'pep frag');
my $f = Bio::SeqFeature::Generic->new
(-seq_id => 'BOSSfragment',
-start => 7, -end => 22,
-primary => 'TRANSMEMBRANE',
-source => 'hand_curated',
-tag => {'note' => 'putative transmembrane'});
$seq->add_SeqFeature($f);
my $out = new Bio::SeqIO(-format => 'genbank');
$out->write_seq($seq);

Create a sequence and a feature,
output
% perl ex5.pl
LOCUS BOSSfragment 34 aa linear UNK
DEFINITION pep frag
ACCESSION unknown
FEATURES Location/Qualifiers
TRANSMEMBRANE 10..25
/note="putative transmembrane"
ORIGIN
1 tvasttddev vatgltaail gliatlailv fivv
//

Sequence Databases
Remote databases
GenBank, GenPept, EMBL, SwissProt - Bio::DB::XX
Local databases
local Fasta - Bio::Index::Fasta, Bio::DB::Fasta
local Genbank,EMBL,SwissProt - Bio::Index::XX
local alignments - Bio::Index::Blast, Bio::Index::SwissPfam
SQL dbs
Bio::DB::GFF
Bio::DB::BioSeqDatabases (through bioperl-db pkg)

Retrieve sequences from a
database
# ex6.pl
use Bio::DB::GenBank;
use Bio::DB::SwissProt;
use Bio::DB::GenPept;
use Bio::DB::EMBL;
use Bio::SeqIO;
my $out = new Bio::SeqIO(-file => ">remote_seqs.embl",
-format => 'embl');
my $db = new Bio::DB::SwissProt();
my $seq = $db->get_Seq_by_acc('7LES_DROME');
$out->write_seq($seq);
$db = new Bio::DB::GenBank();
$seq = $db->get_Seq_by_acc('AF012924');
$out->write_seq($seq);
$db = new Bio::DB::GenPept();
$seq = $db->get_Seq_by_acc('CAD35755');
$out->write_seq($seq);

The Open Biological Database
Access (OBDA) System
cross-platform, database independent
implemented in Bioperl, Biopython, Biojava, Bioruby
database access controlled by registry file(s)
global or user's own
the default registry retrieved over the web
Database types implemented:
flat - Bio::Index
biosql
biofetch - Bio::DB
more: http://www.bioperl.org/HOWTOs/html/OBDA_Access.html

Retrieve sequences using OBDA
# ex7.pl
use Bio::DB::Registry 1.2;# needs bioperl release 1.2.2 or later
my $registry = Bio::DB::Registry->new;
# $registry->services
my $db = $registry->get_database('embl');
# get_Seq_by_{id|acc|version}
my $seq = $db->get_Seq_by_acc("J02231");
print $seq->seq,"\n";

Alignments

Alignment Components
Pairwise Alignments
Bio::SearchIO - Parser
Bio::Search::XX - Data Objects
Bio::SeqFeature::SimilarityPair
Multiple Seq Alignments
Bio::AlignIO - Parser
Bio::SimpleAlign - Data Object

Multiple Sequence Alignments
# ex.pl
# usage: convert_aln.pl < in.aln > out.phy
use Bio::AlignIO;
my $in = new Bio::AlignIO(-format => 'clustalw');
my $out = new Bio::AlignIO(-format => 'phylip');
while( my $aln = $in->next_aln ) {
$out->write_aln($aln);
}

BLAST/FASTA/HMMER Parsing
Can be split into 3 components
Result - one per query, associated db stats and run parameters
Hit - Sequence which matches query
HSP - High Scoring Segment Pairs. Components of the Hit which match
the query.
Corresponding object types in the Bio::Search namespace
Implemented for BLAST, FASTA, HMMER

Parse a BLAST & FASTA report
# ex8.pl
use Bio::SearchIO;
use Math::BigFloat;
my $cutoff = Math::BigFloat->new('0.001');
my %files = ( 'blast' => 'BOSS_Ce.BLASTP',
              'fasta' => 'BOSS_Ce.FASTA');
while( my ($format,$file) = each %files ) {
    my $in = new Bio::SearchIO(-format => $format,
                               -file => $file);
    while( my $r = $in->next_result ) {
        print "Query is: ", $r->query_name, " ",
            $r->query_description," ",$r->query_length," aa\n";
        print " Matrix was ", $r->get_parameter('matrix'), "\n";
        while( my $h = $r->next_hit ) {
            last unless Math::BigFloat->new($h->significance) < $cutoff;
            print "Hit is ", $h->name, "\n";
            while( my $hsp = $h->next_hsp ) {
                print " HSP Len is ", $hsp->length('total'), " ",
                    " E-value is ", $hsp->evalue, " Bit score ", $hsp->score, " \n",
                    " Query loc: ",$hsp->query->start, " ", $hsp->query->end," ",
                    " Sbject loc: ",$hsp->hit->start, " ", $hsp->hit->end,"\n";
            }
        }
        print "--\n";
    }
}

Parse a BLAST & FASTA report,
output
% perl ex7.pl
Query is: BOSS_DROME Bride of sevenless protein precursor. 896 aa
Matrix was BL50
Hit is F35H10.10
HSP Len is 728 E-value is 6.8e-05 Bit score 197.9
Query loc: 207 847 Sbject loc: 640 1330
--
Query is: BOSS_DROME Bride of sevenless protein precursor. 896 aa
Matrix was BLOSUM62
Hit is F35H10.10
HSP Len is 315 E-value is 4.9e-11 Bit score 182
Query loc: 511 813 Sbject loc: 1006 1298
HSP Len is 28 E-value is 1.4e-09 Bit score 39
Query loc: 508 535 Sbject loc: 427 454
--

Create an HTML version of a report
#!/usr/bin/perl -w
# ex9.pl
use strict;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
use Bio::Search::Result::GenericResult;
use Math::BigFloat;

my $cutoff = Math::BigFloat->new('0.2');
my $in     = Bio::SearchIO->new(-format => 'blast',
                                -file   => 'BOSS_Ce.BLASTP');
# Send the output stream through the HTML writer
my $writer = Bio::SearchIO::Writer::HTMLResultWriter->new;
my $out    = Bio::SearchIO->new(-writer => $writer,
                                -file   => '>BOSS_Ce.BLASTP.html');

Create an HTML version of a report
while ( my $result = $in->next_result ) {
    # Copy the query, database and algorithm details into a fresh result
    my $newresult = Bio::Search::Result::GenericResult->new(
        -query_name        => $result->query_name,
        -query_accession   => $result->query_accession,
        -query_description => $result->query_description,
        -query_length      => $result->query_length,
        -database_name     => $result->database_name,
        -database_letters  => $result->database_letters,
        -database_entries  => $result->database_entries,
        -algorithm         => $result->algorithm,
        -algorithm_version => $result->algorithm_version,
    );
    foreach my $param ( $result->available_parameters ) {
        $newresult->add_parameter($param,
                                  $result->get_parameter($param));
    }
    foreach my $stat ( $result->available_statistics ) {
        $newresult->add_statistic($stat,
                                  $result->get_statistic($stat));
    }
    # Keep only the hits that do not exceed the cutoff
    while ( my $hit = $result->next_hit ) {
        last if Math::BigFloat->new($hit->significance) > $cutoff;
        $newresult->add_hit($hit);
    }
    $out->write_result($newresult);
}

Other things covered by Bioperl

Parse outputs from various programs
Bio::Tools::Results::Sim4
Bio::Tools::GFF
Bio::Tools::Genscan, MZEF, GRAIL
Bio::Tools::Phylo::PAML, Bio::Tools::Phylo::Molphy
Bio::Tools::EPCR
(recent) Genewise, Genscan, Est2Genome, RepeatMasker

Things I'm skipping (here)
In detail: Bio::Annotation objects
Bio::Biblio - Bibliographic objects
Bio::Tools::CodonTable - represent codon tables
Bio::Tools::SeqStats - base-pair freq, dicodon freq, etc
Bio::Tools::SeqWords - count n-mer words in a sequence
Bio::SeqUtils - mixed helper functions
Bio::Restriction - find restriction enzyme sites and cut sequence
Bio::Variation - represent mutations, SNPs, any small variations
of sequence

More useful things
Bio::Structure - parse/represent protein structure (PDB) data
Bio::Tools::Alignment::Consed - process Consed data
Bio::TreeIO, Bio::Tree - Phylogenetic Trees
Bio::MapIO, Bio::Map - genetic, linkage maps (rudiments)
Bio::Coordinate - transformations between coordinate systems
Bio::Tools::Analysis - web scraping

Bioperl can help you run things too
Namespace is Bio::Tools::Run
In separate CVS module bioperl-run since v1.2
EMBOSS, BLAST, TCoffee, Clustalw
SoapLab, PISE
Remote Blast searches at NCBI (Bio::Tools::Run::RemoteBlast)
Phylogenetic tools (PAML, Molphy, PHYLIP)
More utilities added on a regular basis for the BioPipe pipeline
project, http://www.biopipe.org/

Other project off-shoots and integrations
Microarray data and objects (Allen Day)
BioSQL - relational db for sequence data (Hilmar Lapp, Chris
Mungall, GNF)
Biopipe - generic pipeline setup (Elia Stupka, Shawn Hoon,
Fugu-Sg)
GBrowse - genome browser (Lincoln Stein)

Acknowledgements
LOTS of people have made the toolkit what it is today.
The Bioperl AUTHORS list in the distro is a starting point.
Some people who really got the project started and kept it going:
Jason Stajich, Sendu Bala, Chris Fields, Brian Osborne, Steven
Brenner, Ewan Birney, Lincoln Stein, Steve Chervitz, Ian Korf,
Chris Dagdigian, Hilmar Lapp, Heikki Lehväslaiho, Georg Fuellen
& Elia Stupka
