Lecture 03 Protein Sequence Analysis


Protein Sequence Analysis

Safee Ullah Chaudhary

Department of Biology, SBASSE, LUMS


The Great Pyramid of Proteomics!

[Figure: pyramid of proteomics with the levels Systems Biology, Quantitative Proteomics, Structural Proteomics and Protein Sequencing]

• Each amino acid can be represented by a single-letter amino acid tag
Amino Acids & Peptide bonds

https://www2.chemistry.msu.edu/faculty/reusch/VirtTxtJml/protein2.htm

But proteins are 3D, right?

[Figure panels: Peptide bond (planar); Pile of peptide bonds; Peptide bonds supported by H-bonds; "Unreal-Ray Crystallography" of a paper-clip protein]


Angles to them all!
• Peptide bonds are planar (the C=O and N-H groups lie in one plane)
• The dihedral angles:
1. Phi φ (C'-N-Cα-C')
• Controls the C'-C' distance
2. Psi ψ (N-Cα-C'-N)
• Controls the N-N distance
3. Omega ω (Cα-C'-N-Cα)
• Controls the Cα-Cα distance
• Typically 180° (trans), since the peptide bond is planar
The Dihedral Angles
To visualize the dihedral angle of four atoms:
1. Look down the second bond vector.
2. The first atom is at 6 o'clock, the fourth atom is at roughly 2 o'clock, and the second and third atoms are located in the center.
3. The second bond vector is coming out of the page. The dihedral angle is the counterclockwise angle made by the red and blue vectors.
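A minimal sketch of how a dihedral angle can be computed from Cartesian atom coordinates, using the standard projection-and-atan2 construction (a general technique, not code from the lecture). The helper names are hypothetical; the sign follows the common IUPAC convention (positive when, looking along the central bond, the front bond must turn clockwise to eclipse the back bond).

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in [-180, 180], for atoms p0-p1-p2-p3.

    phi:   (C'_prev, N, CA, C')
    psi:   (N, CA, C', N_next)
    omega: (CA, C', N_next, CA_next)
    """
    b0 = _sub(p0, p1)                  # first bond, pointing back to atom 1
    b1 = _sub(p2, p1)                  # second (central) bond
    b2 = _sub(p3, p2)                  # third bond
    norm = math.sqrt(_dot(b1, b1))
    u = tuple(x / norm for x in b1)    # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, u) * x for x in u))
    w = _sub(b2, tuple(_dot(b2, u) * x for x in u))
    x = _dot(v, w)                     # cosine component
    y = _dot(_cross(u, v), w)          # sine component (sets the sign)
    return math.degrees(math.atan2(y, x))
```

A planar trans arrangement such as an ideal peptide bond gives ±180°, a cis arrangement 0°.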
Certain side-chain configurations are energetically favored (rotamers)
Ramachandran plot: "allowable" ψ and φ dihedral angles
[Figure: Ramachandran plot of the allowed φ/ψ combinations, with the α region marked]
Right-handed alpha helix: if you hold the helix pointing away from you and it twists clockwise as it moves away, it is right-handed; otherwise it is left-handed. The two forms are mirror images and cannot be converted into each other by rotation. The helix of normal (B-form) DNA is right-handed.
Alpha-helix and beta-strand regions
[Figure: Ramachandran plot of about 100,000 data points for several proteins/amino acids (data as in Lovell et al. 2003), marking the right-handed and left-handed alpha-helix regions]
http://www.ocf.berkeley.edu/~asiegel/posts/?p=24
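To make the idea of φ/ψ regions concrete, the sketch below assigns a (φ, ψ) pair to the nearest of three canonical centres, taking the ±180° wrap-around into account. The centre values are approximate textbook numbers assumed for illustration; real Ramachandran analyses such as Lovell et al. (2003) use density contours rather than a nearest-centre rule.

```python
import math

# Approximate canonical (phi, psi) centres in degrees (illustrative assumptions).
CANONICAL_REGIONS = {
    "right-handed alpha helix": (-57.0, -47.0),
    "beta strand": (-120.0, 130.0),
    "left-handed alpha helix": (57.0, 47.0),
}

def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_region(phi, psi):
    """Name of the canonical region whose centre is closest to (phi, psi)."""
    def distance(name):
        c_phi, c_psi = CANONICAL_REGIONS[name]
        return math.hypot(angle_difference(phi, c_phi), angle_difference(psi, c_psi))
    return min(CANONICAL_REGIONS, key=distance)
```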
Protein Sequence?
• Each amino acid can be represented by a single-letter amino acid tag
• Amino acids join together through the formation of peptide bonds
• Repeating this for every codon in an mRNA molecule produces a protein
• A protein sequence is therefore essentially a string of concatenated amino acid tags
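Because a protein sequence is a string of one-letter tags added codon by codon, translation can be sketched as a lookup over the standard genetic code. The 64-letter string below is the usual compact encoding of the standard code with codons ordered UUU, UUC, UUA, UUG, UCU, ...; the function name is hypothetical and only one reading frame is handled.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Translate an mRNA reading frame into a one-letter protein sequence,
    stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)   # each codon contributes one amino acid tag
    return "".join(protein)
```

For example, translate("AUGGCCUAA") gives the two-residue sequence "MA".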
Protein Sequencing – Edman Degradation
• Edman degradation starts from the N-terminus and removes one amino acid at a time (details next).
• Drawbacks:
• Restricted to about 60 residues
• Laborious: ~50 aa/day
• Modern technique: tandem mass spectrometry

P. Edman, Acta Chem. Scand. 4, 283 (1950)


Proteins: Finding the Primary Structure
[Figure: Edman degradation chemistry, with the reagents phenyl isothiocyanate (PITC) and trifluoroacetic acid (TFA)]
(http://en.wikibooks.org/wiki/Structural_Biochemistry/Proteins/Protein_sequence_determination_techniques)

Edman Cycle
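The length limit follows from cumulative yield: if each cycle succeeds on only a fraction of the chains, the in-phase signal decays geometrically. The toy model below assumes a 98% per-cycle efficiency and a 30% detection threshold (both illustrative assumptions, not values from the lecture) and stops reading once the signal drops below the threshold.

```python
def edman_read(sequence, efficiency=0.98, min_signal=0.3):
    """Toy Edman run: each cycle removes and identifies the current N-terminal residue,
    but only `efficiency` of the still in-phase chains react completely. Reading stops
    once the in-phase fraction falls below `min_signal`."""
    read = []
    in_phase = 1.0
    for residue in sequence:
        in_phase *= efficiency        # chains that have completed every cycle so far
        if in_phase < min_signal:     # signal no longer distinguishable from lagging chains
            break
        read.append(residue)          # residue released as its PTH derivative and identified
    return "".join(read)
```

With these assumptions a 100-residue chain yields 59 residues, in line with the ~60-residue limit; real runs depend on the chemistry and the instrument.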
MS-based Proteomics
• Objective: large-scale determination of gene and cellular function directly at the protein level
• Challenge: the complexity of cellular proteomes and the low abundance of many proteins necessitate highly sensitive analytical techniques
• Hence, mass spectrometry (MS) based proteomics has become the method of choice for analyzing complex protein samples
• High-throughput MS-based proteomics is now an indispensable technology for interpreting genome-wide information
Mass Spectrometry based Proteomics
What is a Mass Spectrometer?
A mass spectrometer is an analytical device that measures the masses of molecules in a sample.

How?
It ionizes the molecules, sorts the ions by their mass-to-charge (m/z) ratio, and records their relative abundance at each m/z.

Figure - Mass Spectrometer Workflow
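Physics behind Mass Spectrometry

www.mhhe.com
m/z Ratio
• Charged particles moving in electric and magnetic fields experience a force given by

$$F = ma = Q\,(E + v \times B)$$

where F is the force applied to the ion, m is the mass of the particle, a is the acceleration, Q is the electric charge, E is the electric field, and v × B is the cross product of the ion's velocity and the magnetic flux density.
• Rearranging gives $a = \frac{Q}{m}(E + v \times B)$: the ion's motion depends on its mass and charge only through their ratio, which is why ions are separated by m/z.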
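As a worked example of the force law, consider a simple magnetic-sector geometry (an assumption; the slide does not specify an analyzer). An ion accelerated through a potential V gains kinetic energy QV = mv²/2, and the magnetic force QvB supplies the centripetal force mv²/r, so r = √(2mV/Q)/B depends on m and Q only through m/Q. The function name is hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg per Da (1 kg is about 6.022e26 Da)

def sector_radius(mass_da, charge, accel_voltage, field_tesla):
    """Radius (m) of the circular path of an ion accelerated through accel_voltage (V)
    and then moving perpendicular to a uniform magnetic field (T)."""
    m = mass_da * DALTON
    q = charge * ELEMENTARY_CHARGE
    v = math.sqrt(2 * q * accel_voltage / m)  # from qV = m v^2 / 2
    return m * v / (q * field_tesla)           # from q v B = m v^2 / r
```

A 1000 Da ion with charge 1 and a 2000 Da ion with charge 2 follow the same path, while doubling the mass at fixed charge increases the radius by a factor of √2.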

Lahore University of Management Sciences (LUMS), Pakistan
Components in a Mass Spec.
• Ionization
• In proteomics, ionization typically adds one or more protons to the protein or peptide
• Each added proton increases the mass by about 1 Da (1.00728 Da) and the charge by +1
• The charged molecules are then transferred to the mass analyzer

• Mass Analyzer
• Separates the ions according to their m/z

• Detector
• The separated ions then hit the detector

• Spectrum Assembly
• Proteomics software interfaced to the MS assembles the spectra
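The protonation rule can be written as m/z = (M + z·m_p)/z, where m_p ≈ 1.007276 Da is the proton mass. A minimal sketch for positive-mode [M + zH]^z+ ions; the function names are hypothetical.

```python
PROTON = 1.007276  # Da, mass of a proton

def mz_from_mass(neutral_mass, z):
    """m/z of the [M + zH]^z+ ion: z protons add z * 1.007276 Da and z positive charges."""
    return (neutral_mass + z * PROTON) / z

def mass_from_mz(mz, z):
    """Inverse: neutral mass M from an observed m/z once the charge z is known."""
    return z * (mz - PROTON)
```

The inverse only works once z is known, which is why charge estimation comes first in spectral processing.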

Big Names in Proteomics
• Top Down Proteomics: www.kelleher.northwestern.edu
• Swiss-Prot: Rolf Apweiler at the European Molecular Biology Laboratory
• GC-MS: Fred McLafferty in his lab at Cornell
• Quantitative Proteomics: Ruedi Aebersold, ETH Zurich
• PST approach (de novo) and nano-electrospray: Matthias Mann at Max Planck
• FT-ICR development: Richard Smith at PNNL
Protein Sequence Databases
• Searching for a protein by ID: http://www.uniprot.org/
Peptide Databases
• Peptide Atlas
• PepBank (Harvard)
• Cancer Peptide and Protein Database (CPPD)
• Antibacterial Peptides (Antibiofilms)
• Antiviral Peptides (Anti-HIV)
• Antifungal Peptides
• Antiparasitic Peptides (Antimalaria)
• Anticancer Peptides
• Anti-protist Peptides
• Insecticidal Peptides
• Spermicidal Peptides
• Chemotactic peptides
• Wound healing
• Antioxidant peptides
• Protease inhibitors
MS Spectral Data Processing – Charge State Deconvolution
• The charge state must be estimated before the mass can be calculated from the m/z ratio (1 kg = 6.022 × 10^26 amu)
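One common way to estimate the charge is from two adjacent peaks of the same protein in a charge-state series: if m₁ carries charge z and the next peak at lower m/z, m₂, carries z + 1, then M = z(m₁ - m_p) = (z + 1)(m₂ - m_p), which gives z = (m₂ - m_p)/(m₁ - m₂). The sketch assumes protonated ions and that the two peaks really are adjacent charge states; the function name is hypothetical.

```python
PROTON = 1.007276  # Da

def deconvolve_adjacent(mz_z, mz_z_plus_1):
    """Charge z and neutral mass M from two adjacent peaks of one charge-state series.

    mz_z carries charge z; mz_z_plus_1 (the next peak at lower m/z) carries z + 1.
    """
    z = round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))
    mass = z * (mz_z - PROTON)
    return z, mass
```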

Activity

Calculate the MW of A, B and C

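MS Spectral Data Processing – Isotopic Envelope Deconvolution

• The monoisotopic peak is given by the molecule made up entirely of the most abundant isotope of each element (e.g. ¹²C, ¹H, ¹⁴N, ¹⁶O); for large proteins this is not necessarily the most intense peak of the envelope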
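Within one isotopic envelope, consecutive peaks are separated by roughly 1.00335/z in m/z (the ¹³C-¹²C mass difference divided by the charge), so the spacing also reveals the charge. The sketch assumes a clean envelope whose lightest observed peak is the monoisotopic one; for large proteins that peak may be too weak to observe. Names are hypothetical.

```python
PROTON = 1.007276          # Da
ISOTOPE_SPACING = 1.00335  # Da, approx. 13C - 12C mass difference

def charge_from_envelope(mz_peaks):
    """Charge estimated from the mean m/z spacing of consecutive isotope peaks (~1.00335/z)."""
    peaks = sorted(mz_peaks)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    return round(ISOTOPE_SPACING / (sum(spacings) / len(spacings)))

def monoisotopic_mass(mz_peaks):
    """Neutral monoisotopic mass, taking the lightest peak of the envelope as monoisotopic."""
    z = charge_from_envelope(mz_peaks)
    return z * (min(mz_peaks) - PROTON), z
```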

Mass Isotopic Distributions
Calculate the isotopic mass distributions of:
• NH3
• CH4
• C2H5NO2

Calculate the peaks and their intensities in each case and plot them!
Cookie Point (0.25)
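The exercise can be approached by convolving the isotope distributions of the individual atoms. The sketch below uses standard isotope masses and abundances for H, C, N and O and groups isotopologues by nominal shift (M, M+1, M+2, ...), so it does not resolve isotopic fine structure; plotting is left out and the function name is hypothetical.

```python
from collections import defaultdict

# (mass in Da, natural abundance) for each stable isotope, lightest first;
# the list index equals the number of extra nucleons.
ISOTOPES = {
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "C": [(12.0, 0.9893), (13.00335484, 0.0107)],
    "N": [(14.00307401, 0.99636), (15.00010890, 0.00364)],
    "O": [(15.99491462, 0.99757), (16.99913176, 0.00038), (17.99915961, 0.00205)],
}

def isotope_distribution(formula, min_prob=1e-12):
    """Isotopic peaks of a molecule given as {element: atom count}.

    Each peak's mass is the probability-weighted mean of the isotopologues
    sharing the same nominal shift. Returns [(mass, probability)] sorted by mass.
    """
    peaks = {0: (1.0, 0.0)}  # shift -> (probability, probability * mass)
    for element, count in formula.items():
        for _ in range(count):  # add one atom at a time (fine for small molecules)
            new = defaultdict(lambda: [0.0, 0.0])
            for shift, (p, pm) in peaks.items():
                mass = pm / p
                for extra, (iso_mass, abundance) in enumerate(ISOTOPES[element]):
                    q = p * abundance
                    if q < min_prob:  # prune vanishingly rare combinations
                        continue
                    new[shift + extra][0] += q
                    new[shift + extra][1] += q * (mass + iso_mass)
            peaks = {s: (v[0], v[1]) for s, v in new.items()}
    return [(pm / p, p) for _, (p, pm) in sorted(peaks.items())]
```

For example, isotope_distribution({"N": 1, "H": 3}) gives the NH3 peaks and isotope_distribution({"C": 2, "H": 5, "N": 1, "O": 2}) the C2H5NO2 peaks.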
Excerpt from a Mass Spectrum
Hurdles in Application of MS
1. Hard ionization techniques

2. Low resolution of mass analyzers

3. Search algorithms
• Isotopic envelope deconvolution
• Detection of post-translational modifications
Soft Ionization to the Rescue!

The Nobel Prize in Chemistry 2002 was awarded "for the development of methods for identification and structure analyses of biological macromolecules", with one half jointly to John B. Fenn and Koichi Tanaka "for their development of soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules" and the other half to Kurt Wüthrich "for his development of nuclear magnetic resonance spectroscopy for determining the three-dimensional structure of biological macromolecules in solution".
High Resolution Mass Spec!

Earth's field ranges between approximately 25,000 and 65,000 nT


SPECTRUM: A MATLAB Toolbox for Identifying Proteins from Top-Down Proteomics Data

Biomedical Informatics Research Laboratory, LUMS
Fig. SPECTRUM GUIs. The set of graphical user interfaces in SPECTRUM, created using MATLAB GUIDE, to undertake the search process and visualize results. (A) Main SPECTRUM GUI to provide general search parameters, (B) GUI to tune intact protein mass, (C) GUI to provide PST search parameters, (D) GUI to include special fragmentation ions in the search process, (E) GUI to specify instrument-based chemical modifications, (F) GUI to tailor the final scoring scheme, and (G-H) GUIs to provide the user with brief as well as detailed results.
Overview
• A toolbox, built using MATLAB, for protein identification from top-down proteomics data
• An open-source and open-architecture system for developing, testing and benchmarking top-down proteomics algorithms
• A search pipeline that brings together key proteomics algorithms, resulting in lower false discovery rates (FDRs) than industry-standard tools
• An intuitive yet comprehensive graphical user interface for convenient use as well as customization by users
Features Comparison - SPECTRUM vs other TDP tools
Supported features: Intact Mass Tuning/Filter | De novo Sequencing | Fragmentation Techniques | Variable PTM Search | Multiple PTM Search | Blind PTM Search | Truncated Protein Search | In silico Spectral Comparison | Graphical User Interface | Protein Quantitation

SPECTRUM: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ (9 types)
pTop: ✗ ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✗
MSPathFinder: ✗ ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✓
MASH Suite Pro: ✓ ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ (2 types)
ProSight PC: ✗ ✓ ✗ ✓ ✓ ✗ ✓ ✓ ✗ (7 types)
TopPIC: ✗ ✗ ✗ ✗ ✓ ✗ ✓ ✓ ✗ (4 types)
Results – Case Study on HeLa H4 Histone
• Case Study I – Evaluation of SPECTRUM Search with Known Target Protein

Search parameters (scoring component: In silico in all runs):
(1) PST search disabled, blind PTM search disabled
(2) PST search disabled, blind PTM search enabled
(3) PST search enabled, blind PTM search disabled

Results Comparison for HeLa Dataset
| | SPECTRUM (1) | ProSight PC* (1) | SPECTRUM (2) | TopPIC* (2) | SPECTRUM (3) | pTop* (3) |
|---|---|---|---|---|---|---|
| Protein Spectral Matches | 10 | 10 | 10 | 8 | 8 | 0 |
| Proteins Identified | 3 | 3 | 3 | 2 | 1 | 0 |
| True Positives (out of 10) | 8 | 8 | 8 | 7 | 8 | 0 |
| Not Reported | 0 | 0 | 0 | 2 | 2 | 10 |
| Search Time (in seconds) | 15 | 24 | 16 | 2350 | 15 | 13 |

* ProSight PC v4.0, TopPIC v1.1, pTop v1.2; PST: Peptide Sequence Tag
Results – Case Study E. coli Dataset
• Case Study II – Evaluation of SPECTRUM Search with Unknown Target Protein

Results Comparison for E. coli Dataset (columns 1-3: with PSTs/TagSearch; columns 4-6: without PSTs/TagSearch)
| | SPECTRUM | MSPathFinder | pTop | SPECTRUM | MSPathFinder | TopPIC |
|---|---|---|---|---|---|---|
| No. of PrSMs Identified | 1739 | 1458 | 1181 | 1911 | 1319 | 1262 |
| No. of Proteins Identified | 245 | 128 | 128 | 305 | 110 | 128 |
| Total Search Time (in seconds) | 2228† (4456*) | 344† | 619† | 1678† (3356*) | 304† | 751† |
| Average No. of PrSMs per Protein | 7 | 11 | 9 | 6 | 12 | 10 |
| Average No. of Matched Fragments for each PrSM | 39.4 | 55 | 41 | 41.7 | 56 | 36 |

† Average compute time for 1 target and 1 decoy search
* Average compute time for 1 target and 3 decoy searches

Note: Benchmarking performed using a desktop machine with an Intel® Core i7-7700 @ 4.2 GHz and 32 GB RAM
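The † and * footnotes refer to target and decoy searches. A common way to turn such searches into an FDR estimate is the target-decoy approach, sketched below under the assumption that each decoy database has the same size as the target database. The functions are illustrative and are not SPECTRUM's actual FDR code.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold, n_decoy_dbs=1):
    """Estimated FDR at a score threshold: decoy hits (per decoy database) / target hits."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold) / n_decoy_dbs
    return decoys / targets if targets else 0.0

def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01, n_decoy_dbs=1):
    """Lowest observed target score whose estimated FDR is within max_fdr."""
    for t in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, t, n_decoy_dbs) <= max_fdr:
            return t
    return None
```

Using three decoy searches instead of one (the * timings) mainly makes the decoy count, and hence the FDR estimate, less noisy at the cost of extra search time.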
Results – Case Study E. coli Dataset
• Validation/benchmarking of the platform was performed against:
1. Published datasets
2. TDP tools (ProSight PC, pTop, TopPIC, MSPathFinder)

• SPECTRUM identified 91% to over 177% more proteins than the compared tools on the E. coli dataset (245 vs 128 with PSTs; 305 vs 110 without PSTs)


Basic Algorithms & Scoring Schemes for Searching Protein Spectra

Department of Biology, SBASSE, LUMS


Out of Experiment and into Algorithms
[Figure: search workflow. Input data preparation: protein sample → protein separation → ionization → mass spectrometer → MS spectra. Search stages: intact protein mass tuner; filter protein DB on MW (filtered DB, e.g. P62837, P62838, P62839, Q6ZPJ3, P21734, Q02159); PST extraction; filter candidate proteins using PSTs; post-translational modifications; in silico fragmentation; matching of experimental and in silico peaks; calculating final score; protein list with scores (e.g. P62837 | 0.87, P62839 | 0.66)]
Format of Tandem MS Data
8559 1 <- MS1 data
635.39084 <- MS2 data starts here
654.44981
763.490304
864.543503
866.611999
1135.705343
1217.828113
1248.798419
1263.796108
1304.861342
1433.914879
1477.96237
1535.004263
1562.021644
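A small parser for the layout above, assuming the first line holds the MS1 fields and every following line one MS2 peak (m/z, optionally followed by an intensity column). The meaning of the individual MS1 fields is not spelled out on the slide, so they are returned as raw numbers; the function name is hypothetical.

```python
def parse_tandem_ms(text):
    """Split the layout into MS1 fields (first line) and MS2 peaks (remaining lines).
    Each MS2 line is read as m/z with an optional second column for intensity."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    ms1 = [float(field) for field in rows[0]]
    ms2 = [(float(r[0]), float(r[1]) if len(r) > 1 else None) for r in rows[1:]]
    return ms1, ms2
```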
Things to watch out for in MS data
1. Intact Protein/Peptide Mass

2. Charge States

3. Relative Abundances

4. Technique-specific fragmentation patterns

5. Mass Shifts (PTMs, neutral losses)

1. MS1 Mass Tuning and Scoring

Estimating Intact Mass
1. Start: get the monoisotopic MS spectra (MS1, MS2, Int).
2. Generate 2-tuples from the MS2 peaks: {(m/z_i, Int_i); (m/z_j, Int_j)}, i = 1:n, j = i:n, n = 1:size(MS2).
3. For each 2-tuple (i, j), compute the tuple sum TS_k = m/z_i + m/z_j and Int_avg_k = (Int_i + Int_j)/2, for k = 1 to the number of 2-tuple sums.
4. Filter TS and obtain the filtered TS.
5. Create a scanning window using the user-defined tolerance (Tol_frag; width = 1 Dalton) and initialize its position at the smallest TS_k.
6. Count the TS_k falling within the scanning window and store the respective TS_Count value.
7. If the end of TS has not been reached, shift the scanning window by a user-defined step size and return to step 6.
8. Otherwise, obtain the maximum value from TS_Count and select the corresponding scanning window.
9. From the selected window, obtain TS_k and Int_avg_k, and compute the intensity-weighted average of the elements in the selected window to obtain the tuned intact protein mass (End):

$$TunedMass = \frac{\sum_{i=1}^{m} TS_{k_i} \, Int_{avg,k_i}}{\sum_{i=1}^{m} Int_{avg,k_i}}$$

where m is the number of tuple sums in the selected window.
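The flowchart can be sketched in a few lines. Two details are assumptions rather than part of the slide: the MS2 peaks are taken to be already deconvolved to singly charged masses, and "Filter TS" is implemented as keeping tuple sums within ±tol Da of the reported MS1 mass. The idea behind tuple sums is that complementary fragment pairs add up to (roughly) the intact mass, so the densest window of sums marks it. Names and defaults are hypothetical.

```python
def tune_intact_mass(ms1_mass, peaks, tol=10.0, window=1.0, step=0.1):
    """Sketch of tuple-sum intact-mass tuning.

    peaks:  MS2 peaks as (m/z, intensity), assumed deconvolved to singly charged.
    tol:    keep only tuple sums within +/- tol Da of ms1_mass (assumed filter).
    window: scanning-window width in Da; step: user-defined step size in Da.
    """
    # Tuple sums TS_k and average intensities for every 2-tuple (i, j), j >= i.
    sums = []
    for i in range(len(peaks)):
        for j in range(i, len(peaks)):
            (mz_i, int_i), (mz_j, int_j) = peaks[i], peaks[j]
            sums.append((mz_i + mz_j, (int_i + int_j) / 2))

    # Filter TS (hypothetical criterion) and sort so scanning starts at the smallest TS_k.
    ts = sorted(s for s in sums if abs(s[0] - ms1_mass) <= tol)
    if not ts:
        return None

    # Slide the window across TS and keep the window containing the most sums.
    best = []
    start = ts[0][0]
    while start <= ts[-1][0]:
        inside = [s for s in ts if start <= s[0] < start + window]
        if len(inside) > len(best):
            best = inside
        start += step

    # Intensity-weighted average of the TS values in the selected window.
    return sum(m * w for m, w in best) / sum(w for _, w in best)
```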
Intuitively Scoring Tuned Masses
• As a first step in the protein search, the protein database is filtered for proteins matching the MW reported in the experimental data
• What to do in case multiple proteins fall in the mass range?
• Scoring philosophy: the closer, the better!

$$MScore = \frac{1}{\sqrt{(M_{Exp} - M_{Thr})^2}} = \frac{1}{|M_{Exp} - M_{Thr}|}$$
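A minimal sketch of this first filtering step: compute each protein's theoretical monoisotopic mass from its sequence (sum of residue masses plus one water) and rank the proteins within a mass window by MScore. The database contents and protein IDs are hypothetical, and PTMs are ignored.

```python
import math

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once for the free N- and C-termini

def protein_mass(sequence):
    """Theoretical monoisotopic (neutral) mass of an unmodified sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def filter_and_rank(database, exp_mass, window):
    """Keep proteins within +/- window Da of exp_mass and rank them by MScore = 1/|dM|."""
    hits = []
    for protein_id, sequence in database.items():
        diff = abs(exp_mass - protein_mass(sequence))
        if diff <= window:
            mscore = math.inf if diff == 0 else 1.0 / diff  # MScore is undefined at dM = 0
            hits.append((protein_id, mscore))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The infinite score at an exact match shows why a bounded scheme is needed in practice.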
What do we do in SPECTRUM?

$$Mass_{diff} = |Mass_{experimental} - Mass_{theoretical}|$$

$$Score_{WPMW} = \begin{cases} 1, & Mass_{diff} = 0 \\ \dfrac{1}{2^{Mass_{diff}}}, & 0 < Mass_{diff} \le Thr \\ 0, & Mass_{diff} > Thr \end{cases}$$
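The piecewise score can be sketched as follows, reading the middle case as 1/2^Mass_diff, so the score equals 1 at a perfect match and halves for every 1 Da of difference; that reading of the slide's exponent is an assumption, and the function name is hypothetical.

```python
def score_mw(exp_mass, theo_mass, thr):
    """Bounded mass score: 1 for an exact match, 2^-diff within thr Da, 0 beyond thr."""
    diff = abs(exp_mass - theo_mass)
    if diff == 0:
        return 1.0
    if diff <= thr:
        return 1.0 / 2 ** diff   # halves for every 1 Da of mass difference
    return 0.0
```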
2. Peptide Sequence Tags
• After scoring all proteins in the protein database, we filter the database for "candidate proteins"
• Sequence tags are extracted from the spectral data
• These sequence tags are then searched for in the candidate proteins, and the candidates are re-scored
• Using the new scores, the "candidate proteins" are further shortlisted and sorted
Extracting Peptide Sequence Tags
1. Start: get the monoisotopic MS spectra and the user-defined parameters (Tol, length_min, length_max).
2. Extract the MS2 values and order the data.
3. Generate 2-tuples from the MS2 peaks: {(m/z_i, Int_i); (m/z_j, Int_j)}, i = 1:n, j = i:n, n = number of MS2 peaks.
4. Get the monoisotopic masses of the standard amino acids (L_Mass_AA). If a PTM is selected, add the modified masses of the amino acids for the selected PTM(s) to the list L_Mass_AA.
5. For each 2-tuple (i, j), compute the difference Mass_diff = m/z_i - m/z_j.
6. If Mass_AA - Tol ≤ Mass_diff ≤ Mass_AA + Tol, store Tag_AA, Error, Int_avg_k and Tag_Order, where Error = |Mass_diff - Mass_AA| and Int_avg_k = Int_i + Int_j.
7. Join the Tag_AA entries whose Tag_Order shows that they are consecutive.
8. Create PST(s) within Length_range(length_min, length_max).
9. Compute a score for each PST (End).
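A compact sketch of tag extraction: every pair of sorted peaks whose mass difference matches an amino acid residue within Tol becomes a hop, and consecutive hops are chained into tags of length_min to length_max residues, each reported with its RMSE. It uses unmodified residues only (no PTM masses), treats I and L as separate letters with identical mass, and all names are hypothetical.

```python
import math

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def extract_psts(mz_values, tol=0.02, min_len=2, max_len=5):
    """Return (tag, rmse) for every chain of consecutive residue-mass hops."""
    mzs = sorted(mz_values)  # "order data"
    # hops[i]: peaks j > i whose mass difference matches an amino acid within tol
    hops = {i: [] for i in range(len(mzs))}
    for i in range(len(mzs)):
        for j in range(i + 1, len(mzs)):
            diff = mzs[j] - mzs[i]
            for aa, mass in RESIDUE_MASS.items():
                error = abs(diff - mass)
                if error <= tol:
                    hops[i].append((j, aa, error))

    tags = []

    def walk(i, letters, errors):
        # record the current chain if its length is in range, then try to extend it
        if min_len <= len(letters) <= max_len:
            rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
            tags.append(("".join(letters), rmse))
        if len(letters) < max_len:
            for j, aa, error in hops[i]:
                walk(j, letters + [aa], errors + [error])

    for start in range(len(mzs)):
        walk(start, [], [])
    return tags
```

With peaks at 100.0, 157.02146, 228.05857 and 315.0906 this finds "GAS" but also "QS", because G + A (128.05857 Da) is within tolerance of Q; such ambiguities are one reason tags are scored rather than trusted outright.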
Scoring Sequence Tags - I
• Sequence tag examples: 'M', 'MQ', 'QV', etc.
• What can we consider to be "scorable" attributes of these tags?
1. Length
2. RMSE
3. Abundance
• Scoring philosophy:
• The longer the tag, the better
• The smaller the RMSE, the better
• The more abundant, the better!
Scoring Sequence Tags - II
• If a candidate protein matches n PSTs, then its score can be given by:

$$PSTScore = \sum_{i=1}^{n} Length(PST_i)^2$$

• Adding RMSE to the scoring system additionally rewards PST matches with smaller mass errors.
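Scoring Sequence Tags - III
• So, what is the RMSE for a specific sequence tag i of length N?

$$RMSE_i = \sqrt{\frac{\sum_{j=1}^{N} (M_{Hop,j} - M_{AA,j})^2}{N}}$$

where M_Hop,j is the observed mass difference of the j-th hop (pair of consecutive peaks) in the tag and M_AA,j is the mass of the amino acid matched to it.
• So, the updated relationship is:

$$PSTScore = \sum_{i=1}^{n} \left( \frac{Length(PST_i)}{RMSE_i} \right)^2$$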
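Both PST scores can be computed directly. In this sketch each tag is summarised by its length and RMSE; the small eps that avoids dividing by a zero RMSE is an assumption, since the slide does not say how a perfect match is handled. Function names are hypothetical.

```python
import math

def tag_rmse(hop_masses, aa_masses):
    """RMSE between observed hop masses and the matched amino acid masses of one tag."""
    n = len(hop_masses)
    return math.sqrt(sum((h - a) ** 2 for h, a in zip(hop_masses, aa_masses)) / n)

def pst_score_length(tag_lengths):
    """PSTScore = sum of squared tag lengths."""
    return sum(length ** 2 for length in tag_lengths)

def pst_score_rmse(tags, eps=1e-6):
    """PSTScore = sum of (length / RMSE)^2 over (length, rmse) pairs; eps guards RMSE = 0."""
    return sum((length / max(rmse, eps)) ** 2 for length, rmse in tags)
```

Two tags of lengths 3 and 2 score 3² + 2² = 13 under the length-only scheme; with RMSE included, equally long tags with smaller errors score higher.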
Cookie Point: How to cater for abundance? (0.25)
3. Post-translational Modifications

Predict Phosphorylation
Post-translational Modifications
1. Start: retrieve the sequences from the candidate protein list and get the user-selected modification(s) (PTM_user).
2. Select a PTM from PTM_user.
3. Get the binding site for the user-selected PTM (PTM_seq) and the PTM sites for each user-selected PTM (PTM_site).
4. Get the modification propensities for each PTM_site, compute its score and get the PTM tolerance (PTM_Tol).
5. If Score > PTM_Tol, shortlist the site for modification.
6. Repeat steps 2-5 until all elements in PTM_user have been processed.
7. Fixed modification? If yes, modify all shortlisted sites in the protein sequence. If no, make combinations of all shortlisted modification sites and get the protein sequences with these combinations.
8. End: the modified proteins are obtained.
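A minimal sketch of the final steps of this flowchart for one PTM type, using phosphorylation on S/T/Y (+79.96633 Da) as the example. The site scoring against PTM_Tol is not implemented: every candidate residue is treated as shortlisted, and the cap on simultaneous modifications (max_mods) and the lowercase notation for modified residues are hypothetical choices made for this sketch.

```python
from itertools import combinations

PHOSPHO_SHIFT = 79.96633  # monoisotopic mass of HPO3 added by phosphorylation (Da)

def candidate_sites(sequence, residues="STY"):
    """0-based positions of residues that can carry the PTM (here: phosphorylation)."""
    return [i for i, aa in enumerate(sequence) if aa in residues]

def annotate(sequence, sites):
    """Mark modified residues in lowercase."""
    return "".join(aa.lower() if i in sites else aa for i, aa in enumerate(sequence))

def modified_proteins(sequence, shortlisted, shift, fixed=False, max_mods=2):
    """Fixed PTM: modify every shortlisted site. Variable PTM: every combination of
    up to max_mods shortlisted sites. Returns (annotated sequence, mass shift) pairs."""
    if fixed:
        return [(annotate(sequence, set(shortlisted)), shift * len(shortlisted))]
    variants = []
    for k in range(1, min(max_mods, len(shortlisted)) + 1):
        for combo in combinations(shortlisted, k):
            variants.append((annotate(sequence, set(combo)), shift * k))
    return variants
```

The number of variable-PTM variants grows combinatorially with the number of shortlisted sites, which is why site scoring and shortlisting come first.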
4. In silico Fragment Generation & Comparison

Spectral Comparison - Flowchart
1. Start: get the candidate protein list and the user-specified fragmentation technique.
2. Retrieve the sequence of each candidate protein from the database.
3. Get the monoisotopic MS2 data (Frag_exp).
4. End of protein list? If yes, End. If no, get the next protein sequence.
5. Generate fragments using the user-specified fragmentation technique.
6. Compute the mass of each fragment and assemble the in silico spectrum (Frag_thr).
7. Compare Frag_thr with Frag_exp to get the no. of matches.
8. Score the protein using the no. of matches, then return to step 4.
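For collision-based fragmentation, the in silico spectrum is commonly built from b and y ions. The sketch below generates singly charged b and y ions only (b = residue sum + proton, y = residue sum + water + proton); SPECTRUM supports more fragmentation types, which are not modelled here, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def fragment_ions(sequence):
    """Singly charged b ions (b1..b(n-1)) and y ions (y1..y(n-1)) of a peptide."""
    b_ions, y_ions = [], []
    running = 0.0
    for aa in sequence[:-1]:              # N-terminal fragments
        running += RESIDUE_MASS[aa]
        b_ions.append(running + PROTON)
    running = 0.0
    for aa in reversed(sequence[1:]):     # C-terminal fragments
        running += RESIDUE_MASS[aa]
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions
```

For any cleavage site, b_i + y_(n-i) equals the neutral intact mass plus two protons, the same complementarity exploited by the tuple-sum intact-mass tuning.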
Scoring Exp. & Thr. Peaks - I
• Once the PST scores are computed, the candidate list is further filtered for the highest-scoring proteins.
• Finally, for each protein in this refined candidate list, we compute the theoretical fragments.
• Each protein's theoretical fragments are compared with the experimental fragments.
• Now, the question is: how do we score?
Scoring Exp. & Thr. Peaks - II
1. Count the matches between the theoretical and experimental peaks and give the candidate protein an equivalent score:

$$Score_{insilico} = \frac{Matches_{num}}{Frag_{exp}}$$

where Frag_exp is the number of experimental fragments.
2. Weight each of these matches by its mass error and abundance, and then accumulate the score
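Both scoring options can be sketched as follows. count_score follows the formula above (matched experimental fragments divided by Frag_exp); the weighting in weighted_score (intensity times a linear mass-error penalty, normalised by the total experimental intensity) is one hypothetical choice among many, not the scheme used by SPECTRUM.

```python
def count_score(theoretical, experimental_mz, tol=0.02):
    """Score_insilico = experimental fragments matched by a theoretical fragment / Frag_exp."""
    matches = sum(1 for e in experimental_mz
                  if any(abs(e - t) <= tol for t in theoretical))
    return matches / len(experimental_mz)

def weighted_score(theoretical, experimental, tol=0.02):
    """experimental: (m/z, intensity) pairs. Each match contributes its intensity scaled
    by (1 - error/tol); the total is normalised by the summed experimental intensity."""
    total = sum(intensity for _, intensity in experimental)
    score = 0.0
    for mz, intensity in experimental:
        error = min(abs(mz - t) for t in theoretical)
        if error <= tol:
            score += intensity * (1.0 - error / tol)
    return score / total
```

The weighted version lets an intense, precisely matched fragment count for more than a weak match at the edge of the tolerance, whereas the count-based score treats them equally.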
Computing Cumulative Scores - I
• So now we have obtained three individual scores
1. Scores from MW Matches
2. Scores from PST Matches
3. Scores from Exp<>Thr Peak Matches

• It is necessary to compute an overall cumulative score (Why?)

• What are the options that we have? (Discussion!)
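Scoring Scheme in SPECTRUM

$$MWP_{Diff} = |Tuned\ mass - MWP|$$

$$Score_{MW} = \begin{cases} 1, & MWP_{Diff} = 0 \\ \dfrac{1}{2^{MWP_{Diff}}}, & 0 < MWP_{Diff} \le Thr \\ 0, & MWP_{Diff} > Thr \end{cases}$$

$$Score_{PST} = \sum_{i=1}^{M} \left( PSTMatches_i \times ErrorScore_i + FrequencyScore_i \right)$$

$$FrequencyScore = Intensity \times LengthScore, \qquad LengthScore = N^2, \qquad Intensity = \frac{\sum_{i=1}^{N} Int_{AA,i}}{N}$$

$$ErrorScore = e^{-RMSE \times 2}, \qquad RMSE = \sqrt{\frac{\sum_{i=1}^{N} Error_i^2}{N}}$$

$$Score_{Insilico} = \frac{No.\ of\ matches}{Number\ of\ Experimental\ Fragments}$$

$$Score_{Final} = (Score_{MW} \times W_1) + (Score_{PST} \times W_2) + (Score_{Insilico} \times W_3)$$

where M is the number of matched PSTs and N is the length of a PST.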
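The PST and final scores of this scheme can be sketched as below. Two readings are assumptions: ErrorScore is taken as e^(-2·RMSE), and PSTMatches_i is treated as a count supplied by the caller. The weights W1-W3 are left as parameters because their values are not given, and Score_MW and Score_Insilico are passed in as already-computed numbers. Function names are hypothetical.

```python
import math

def pst_tag_terms(errors, intensities):
    """ErrorScore and FrequencyScore for one PST of length N = len(errors)."""
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    error_score = math.exp(-rmse * 2)   # read as e^(-2 * RMSE)
    length_score = n ** 2
    intensity = sum(intensities) / n    # mean intensity of the tag's residues
    return error_score, intensity * length_score

def score_pst(tags):
    """tags: (pst_matches, per-hop errors, per-residue intensities) for each PST."""
    total = 0.0
    for pst_matches, errors, intensities in tags:
        error_score, frequency_score = pst_tag_terms(errors, intensities)
        total += pst_matches * error_score + frequency_score
    return total

def score_final(score_mw, score_pst_value, score_insilico, weights=(1.0, 1.0, 1.0)):
    """Weighted linear combination of the three component scores."""
    w1, w2, w3 = weights
    return score_mw * w1 + score_pst_value * w2 + score_insilico * w3
```

Because FrequencyScore is not bounded while Score_MW and Score_Insilico lie between 0 and 1, the weights also act as scale factors between the components.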
Computing Cumulative Scores - II
• Simply sum the scores (a linear function):

$$Score = Score_{MW} + Score_{PST} + Score_{Exp<>Thr}$$

• Weight each scoring component by its respective RMSE before summing:

$$Score_{final} = Score_{mass} \times W_1 + Score_{PST} \times W_2 + Score_{insilico} \times W_3$$

• Develop a non-linear function to integrate the scoring components (e.g. Mascot)
• Highly proprietary in commercial proteomics software