EP2269154A1 - Methods for identifying biologically active peptides and predicting their function - Google Patents
- Publication number
- EP2269154A1
- Application number
- EP09731158A
- Authority
- EP
- European Patent Office
- Prior art keywords
- hla
- peptide
- motif
- drb1
- polypeptide
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
- WFBHRSAKANVBKH-UHFFFAOYSA-N N-hydroxyguanidine Chemical compound NC(=N)NO WFBHRSAKANVBKH-UHFFFAOYSA-N 0.000 description 1
- 102100032194 NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 2, mitochondrial Human genes 0.000 description 1
- 108010057466 NF-kappa B Proteins 0.000 description 1
- 102000003945 NF-kappa B Human genes 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 206010029155 Nephropathy toxic Diseases 0.000 description 1
- 208000012902 Nervous system disease Diseases 0.000 description 1
- 208000025966 Neurological disease Diseases 0.000 description 1
- 206010029350 Neurotoxicity Diseases 0.000 description 1
- GRYLNZFGIOXLOG-UHFFFAOYSA-N Nitric acid Chemical compound O[N+]([O-])=O GRYLNZFGIOXLOG-UHFFFAOYSA-N 0.000 description 1
- 101150096185 PAAS gene Proteins 0.000 description 1
- 101150062722 PDXP gene Proteins 0.000 description 1
- 101150004726 PFKP gene Proteins 0.000 description 1
- 101000655256 Paraclostridium bifermentans Small, acid-soluble spore protein alpha Proteins 0.000 description 1
- 101000655264 Paraclostridium bifermentans Small, acid-soluble spore protein beta Proteins 0.000 description 1
- 208000034038 Pathologic Neovascularization Diseases 0.000 description 1
- LRBQNJMCXXYXIU-PPKXGCFTSA-N Penta-digallate-beta-D-glucose Natural products OC1=C(O)C(O)=CC(C(=O)OC=2C(=C(O)C=C(C=2)C(=O)OC[C@@H]2[C@H]([C@H](OC(=O)C=3C=C(OC(=O)C=4C=C(O)C(O)=C(O)C=4)C(O)=C(O)C=3)[C@@H](OC(=O)C=3C=C(OC(=O)C=4C=C(O)C(O)=C(O)C=4)C(O)=C(O)C=3)[C@H](OC(=O)C=3C=C(OC(=O)C=4C=C(O)C(O)=C(O)C=4)C(O)=C(O)C=3)O2)OC(=O)C=2C=C(OC(=O)C=3C=C(O)C(O)=C(O)C=3)C(O)=C(O)C=2)O)=C1 LRBQNJMCXXYXIU-PPKXGCFTSA-N 0.000 description 1
- 108010033276 Peptide Fragments Proteins 0.000 description 1
- 102000007079 Peptide Fragments Human genes 0.000 description 1
- 241001442654 Percnon planissimum Species 0.000 description 1
- 102100024111 Phospholysine phosphohistidine inorganic pyrophosphate phosphatase Human genes 0.000 description 1
- OAICVXFJPJFONN-UHFFFAOYSA-N Phosphorus Chemical compound [P] OAICVXFJPJFONN-UHFFFAOYSA-N 0.000 description 1
- 102100040153 Poly(A) polymerase gamma Human genes 0.000 description 1
- 239000004952 Polyamide Substances 0.000 description 1
- 108010020346 Polyglutamic Acid Proteins 0.000 description 1
- UCTIUWKCVNGEFH-OBJOEFQTSA-N Pro-Val-Gly-Pro Chemical compound N([C@@H](C(C)C)C(=O)NCC(=O)N1[C@@H](CCC1)C(O)=O)C(=O)[C@@H]1CCCN1 UCTIUWKCVNGEFH-OBJOEFQTSA-N 0.000 description 1
- 102100035724 Probable ATP-dependent RNA helicase DDX43 Human genes 0.000 description 1
- 101710196742 Probable phosphatidylglycerophosphate synthase Proteins 0.000 description 1
- OFOBLEOULBTSOW-UHFFFAOYSA-N Propanedioic acid Natural products OC(=O)CC(O)=O OFOBLEOULBTSOW-UHFFFAOYSA-N 0.000 description 1
- 102100031481 Protein transport protein Sec16B Human genes 0.000 description 1
- 102100027773 Pulmonary surfactant-associated protein A2 Human genes 0.000 description 1
- 101100083082 Pyrus communis PGIP gene Proteins 0.000 description 1
- 238000004617 QSAR study Methods 0.000 description 1
- 108020004511 Recombinant DNA Proteins 0.000 description 1
- 102100038400 Rho-related GTP-binding protein RhoV Human genes 0.000 description 1
- 102100031398 Serine/threonine-protein kinase Nek9 Human genes 0.000 description 1
- KDYFGRWQOYBRFD-UHFFFAOYSA-N Succinic acid Natural products OC(=O)CCC(O)=O KDYFGRWQOYBRFD-UHFFFAOYSA-N 0.000 description 1
- 241000282887 Suidae Species 0.000 description 1
- 108091008874 T cell receptors Proteins 0.000 description 1
- 102000016266 T-Cell Antigen Receptors Human genes 0.000 description 1
- 210000001744 T-lymphocyte Anatomy 0.000 description 1
- FEWJPZIEWOKRBE-UHFFFAOYSA-N Tartaric acid Natural products [H+].[H+].[O-]C(=O)C(O)C(O)C([O-])=O FEWJPZIEWOKRBE-UHFFFAOYSA-N 0.000 description 1
- 206010044221 Toxic encephalopathy Diseases 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- QNMIVTOQXUSGLN-SZMVWBNQSA-N Trp-Arg-Arg Chemical compound C1=CC=C2C(C[C@H](N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCN=C(N)N)C(O)=O)=CNC2=C1 QNMIVTOQXUSGLN-SZMVWBNQSA-N 0.000 description 1
- 108010040002 Tumor Suppressor Proteins Proteins 0.000 description 1
- 102000001742 Tumor Suppressor Proteins Human genes 0.000 description 1
- 208000034953 Twin anemia-polycythemia sequence Diseases 0.000 description 1
- 102100029823 Tyrosine-protein kinase BTK Human genes 0.000 description 1
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 1
- 108010067390 Viral Proteins Proteins 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 208000027418 Wounds and injury Diseases 0.000 description 1
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 1
- 230000005856 abnormality Effects 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 235000011054 acetic acid Nutrition 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 230000006838 adverse reaction Effects 0.000 description 1
- 239000000783 alginic acid Substances 0.000 description 1
- 229960001126 alginic acid Drugs 0.000 description 1
- 235000010443 alginic acid Nutrition 0.000 description 1
- 229920000615 alginic acid Polymers 0.000 description 1
- 150000004781 alginic acids Chemical class 0.000 description 1
- BJEPYKJPYRNKOW-UHFFFAOYSA-N alpha-hydroxysuccinic acid Natural products OC(=O)C(O)CC(O)=O BJEPYKJPYRNKOW-UHFFFAOYSA-N 0.000 description 1
- 229910052782 aluminium Inorganic materials 0.000 description 1
- XAGFODPZIPBFFR-UHFFFAOYSA-N aluminium Chemical compound [Al] XAGFODPZIPBFFR-UHFFFAOYSA-N 0.000 description 1
- 229910000147 aluminium phosphate Inorganic materials 0.000 description 1
- 150000001412 amines Chemical class 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 230000003110 anti-inflammatory effect Effects 0.000 description 1
- 239000011668 ascorbic acid Substances 0.000 description 1
- 229960005070 ascorbic acid Drugs 0.000 description 1
- 235000010323 ascorbic acid Nutrition 0.000 description 1
- 235000003704 aspartic acid Nutrition 0.000 description 1
- 208000029618 autoimmune pulmonary alveolar proteinosis Diseases 0.000 description 1
- 230000005784 autoimmunity Effects 0.000 description 1
- 150000001540 azides Chemical class 0.000 description 1
- 229910052788 barium Inorganic materials 0.000 description 1
- DSAJWYNOEDNPEQ-UHFFFAOYSA-N barium atom Chemical compound [Ba] DSAJWYNOEDNPEQ-UHFFFAOYSA-N 0.000 description 1
- JUHORIMYRDESRB-UHFFFAOYSA-N benzathine Chemical compound C=1C=CC=CC=1CNCCNCC1=CC=CC=C1 JUHORIMYRDESRB-UHFFFAOYSA-N 0.000 description 1
- 235000010233 benzoic acid Nutrition 0.000 description 1
- 125000001797 benzyl group Chemical group [H]C1=C([H])C([H])=C(C([H])=C1[H])C([H])([H])* 0.000 description 1
- 125000001584 benzyloxycarbonyl group Chemical group C(=O)(OCC1=CC=CC=C1)* 0.000 description 1
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 1
- 230000007321 biological mechanism Effects 0.000 description 1
- 230000008512 biological response Effects 0.000 description 1
- JLHBAYXOERKFGV-UHFFFAOYSA-N bis(4-nitrophenyl) phenyl phosphate Chemical compound C1=CC([N+](=O)[O-])=CC=C1OP(=O)(OC=1C=CC(=CC=1)[N+]([O-])=O)OC1=CC=CC=C1 JLHBAYXOERKFGV-UHFFFAOYSA-N 0.000 description 1
- 229910052797 bismuth Inorganic materials 0.000 description 1
- JCXGWMGPZLAOME-UHFFFAOYSA-N bismuth atom Chemical compound [Bi] JCXGWMGPZLAOME-UHFFFAOYSA-N 0.000 description 1
- 125000006367 bivalent amino carbonyl group Chemical group [H]N([*:1])C([*:2])=O 0.000 description 1
- 230000008499 blood brain barrier function Effects 0.000 description 1
- 210000001218 blood-brain barrier Anatomy 0.000 description 1
- 238000009933 burial Methods 0.000 description 1
- KDYFGRWQOYBRFD-NUQCWPJISA-N butanedioic acid Chemical compound O[14C](=O)CC[14C](O)=O KDYFGRWQOYBRFD-NUQCWPJISA-N 0.000 description 1
- 125000000484 butyl group Chemical group [H]C([*])([H])C([H])([H])C([H])([H])C([H])([H])[H] 0.000 description 1
- 210000004900 c-terminal fragment Anatomy 0.000 description 1
- 229910052791 calcium Inorganic materials 0.000 description 1
- 239000011575 calcium Substances 0.000 description 1
- 201000011510 cancer Diseases 0.000 description 1
- 230000036952 cancer formation Effects 0.000 description 1
- 150000001718 carbodiimides Chemical class 0.000 description 1
- 150000001720 carbohydrates Chemical class 0.000 description 1
- 235000014633 carbohydrates Nutrition 0.000 description 1
- 125000006355 carbonyl methylene group Chemical group [H]C([H])([*:2])C([*:1])=O 0.000 description 1
- 150000001732 carboxylic acid derivatives Chemical class 0.000 description 1
- 231100000504 carcinogenesis Toxicity 0.000 description 1
- 231100000259 cardiotoxicity Toxicity 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 238000009903 catalytic hydrogenation reaction Methods 0.000 description 1
- 150000001768 cations Chemical class 0.000 description 1
- 210000000170 cell membrane Anatomy 0.000 description 1
- 210000003855 cell nucleus Anatomy 0.000 description 1
- 230000005754 cellular signaling Effects 0.000 description 1
- 239000013043 chemical agent Substances 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 235000015165 citric acid Nutrition 0.000 description 1
- 229910017052 cobalt Inorganic materials 0.000 description 1
- 239000010941 cobalt Substances 0.000 description 1
- GUTLYIVDDKVIGB-UHFFFAOYSA-N cobalt atom Chemical compound [Co] GUTLYIVDDKVIGB-UHFFFAOYSA-N 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 229910052802 copper Inorganic materials 0.000 description 1
- 239000010949 copper Substances 0.000 description 1
- 239000013078 crystal Substances 0.000 description 1
- 238000012258 culturing Methods 0.000 description 1
- ATDGTVJJHBUTRL-UHFFFAOYSA-N cyanogen bromide Chemical compound BrC#N ATDGTVJJHBUTRL-UHFFFAOYSA-N 0.000 description 1
- 230000006378 damage Effects 0.000 description 1
- 238000013075 data extraction Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 239000007857 degradation product Substances 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000002405 diagnostic procedure Methods 0.000 description 1
- 235000014113 dietary fatty acids Nutrition 0.000 description 1
- 239000000539 dimer Substances 0.000 description 1
- 230000003292 diminished effect Effects 0.000 description 1
- 238000010494 dissociation reaction Methods 0.000 description 1
- 230000005593 dissociations Effects 0.000 description 1
- 230000013020 embryo development Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 108010041998 erythrocyte membrane protein band 4.1-like 1 Proteins 0.000 description 1
- LUJQXGBDWAGQHS-UHFFFAOYSA-N ethenyl acetate;phthalic acid Chemical compound CC(=O)OC=C.OC(=O)C1=CC=CC=C1C(O)=O LUJQXGBDWAGQHS-UHFFFAOYSA-N 0.000 description 1
- 125000001495 ethyl group Chemical group [H]C([H])([H])C([H])([H])* 0.000 description 1
- 125000000816 ethylene group Chemical group [H]C([H])([*:1])C([H])([H])[*:2] 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000013401 experimental design Methods 0.000 description 1
- 230000006126 farnesylation Effects 0.000 description 1
- 229930195729 fatty acid Natural products 0.000 description 1
- 239000000194 fatty acid Substances 0.000 description 1
- 150000004665 fatty acids Chemical class 0.000 description 1
- 210000003754 fetus Anatomy 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 229940014144 folate Drugs 0.000 description 1
- OVBPIULPVIDEAO-LBPRGKRZSA-N folic acid Chemical compound C=1N=C2NC(N)=NC(=O)C2=NC=1CNC1=CC=C(C(=O)N[C@@H](CCC(O)=O)C(O)=O)C=C1 OVBPIULPVIDEAO-LBPRGKRZSA-N 0.000 description 1
- 235000019152 folic acid Nutrition 0.000 description 1
- 239000011724 folic acid Substances 0.000 description 1
- 235000019253 formic acid Nutrition 0.000 description 1
- 238000013467 fragmentation Methods 0.000 description 1
- 238000006062 fragmentation reaction Methods 0.000 description 1
- 239000001530 fumaric acid Substances 0.000 description 1
- 235000011087 fumaric acid Nutrition 0.000 description 1
- LRBQNJMCXXYXIU-QWKBTXIPSA-N gallotannic acid Chemical compound OC1=C(O)C(O)=CC(C(=O)OC=2C(=C(O)C=C(C=2)C(=O)OC[C@H]2[C@@H]([C@@H](OC(=O)C=3C=C(OC(=O)C=4C=C(O)C(O)=C(O)C=4)C(O)=C(O)C=3)[C@H](OC(=O)C=3C=C(OC(=O)C=4C=C(O)C(O)=C(O)C=4)C(O)=C(O)C=3)[C@@H](OC(=O)C=3C=C(OC(=O)C=4C=C(O)C(O)=C(O)C=4)C(O)=C(O)C=3)O2)OC(=O)C=2C=C(OC(=O)C=3C=C(O)C(O)=C(O)C=3)C(O)=C(O)C=2)O)=C1 LRBQNJMCXXYXIU-QWKBTXIPSA-N 0.000 description 1
- 238000001415 gene therapy Methods 0.000 description 1
- 102000034238 globular proteins Human genes 0.000 description 1
- 108091005896 globular proteins Proteins 0.000 description 1
- 239000000174 gluconic acid Substances 0.000 description 1
- 235000012208 gluconic acid Nutrition 0.000 description 1
- 235000013922 glutamic acid Nutrition 0.000 description 1
- 150000004676 glycans Chemical class 0.000 description 1
- UKWAMSAUNRIURK-BWYYQZIESA-N gmgg Chemical compound C1=CC(OC)=CC=C1OC[C@@H]1[C@@H](O[C@H]2[C@@H]([C@@H](OCC=3C=CC=CC=3)[C@H](O[C@H]3[C@H]([C@@H](OCC=C)[C@H](O[C@H]4[C@@H]([C@@H](OCC=5C=CC=CC=5)[C@H](OCC=5C=CC=CC=5)[C@@H](COCC=5C=CC=CC=5)O4)N4C(C5=CC=CC=C5C4=O)=O)[C@@H](COCC=C)O3)OCC=3C=CC=CC=3)[C@@H](COCC=3C=CC=CC=3)O2)N2C(C3=CC=CC=C3C2=O)=O)[C@H](OCC=2C=CC=CC=2)[C@@H](N2C(C3=CC=CC=C3C2=O)=O)[C@H](OCC=C)O1 UKWAMSAUNRIURK-BWYYQZIESA-N 0.000 description 1
- 239000002622 gonadotropin Substances 0.000 description 1
- 229940094892 gonadotropins Drugs 0.000 description 1
- 229940093915 gynecological organic acid Drugs 0.000 description 1
- 230000003394 haemopoietic effect Effects 0.000 description 1
- 230000007686 hepatotoxicity Effects 0.000 description 1
- 231100000304 hepatotoxicity Toxicity 0.000 description 1
- 229920006130 high-performance polyamide Polymers 0.000 description 1
- 230000013632 homeostatic process Effects 0.000 description 1
- 102000053446 human BTK Human genes 0.000 description 1
- 102000051143 human CRP Human genes 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- NPZTUJOABDZTLV-UHFFFAOYSA-N hydroxybenzotriazole Substances O=C1C=CC=C2NNN=C12 NPZTUJOABDZTLV-UHFFFAOYSA-N 0.000 description 1
- 230000033444 hydroxylation Effects 0.000 description 1
- 238000005805 hydroxylation reaction Methods 0.000 description 1
- 125000002883 imidazolyl group Chemical group 0.000 description 1
- 230000008105 immune reaction Effects 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 230000005847 immunogenicity Effects 0.000 description 1
- 238000000338 in vitro Methods 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 230000004968 inflammatory condition Effects 0.000 description 1
- 230000002757 inflammatory effect Effects 0.000 description 1
- 208000014674 injury Diseases 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 229940079322 interferon Drugs 0.000 description 1
- 125000000468 ketone group Chemical group 0.000 description 1
- 229920000267 ladder-type polyparaphenylene Polymers 0.000 description 1
- 239000003446 ligand Substances 0.000 description 1
- 235000018977 lysine Nutrition 0.000 description 1
- 125000003588 lysine group Chemical group [H]N([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])(N([H])[H])C(*)=O 0.000 description 1
- 229910052749 magnesium Inorganic materials 0.000 description 1
- 239000011777 magnesium Substances 0.000 description 1
- VZCYOOQTPOCHFL-UPHRSURJSA-N maleic acid Chemical compound OC(=O)\C=C/C(O)=O VZCYOOQTPOCHFL-UPHRSURJSA-N 0.000 description 1
- 239000011976 maleic acid Substances 0.000 description 1
- 239000001630 malic acid Substances 0.000 description 1
- 235000011090 malic acid Nutrition 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 230000002503 metabolic effect Effects 0.000 description 1
- 239000002207 metabolite Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 1
- 238000002493 microarray Methods 0.000 description 1
- 150000007522 mineralic acids Chemical class 0.000 description 1
- 230000007886 mutagenicity Effects 0.000 description 1
- 231100000299 mutagenicity Toxicity 0.000 description 1
- 210000004898 n-terminal fragment Anatomy 0.000 description 1
- 230000007694 nephrotoxicity Effects 0.000 description 1
- 231100000417 nephrotoxicity Toxicity 0.000 description 1
- 230000007135 neurotoxicity Effects 0.000 description 1
- 231100000228 neurotoxicity Toxicity 0.000 description 1
- 229910052759 nickel Inorganic materials 0.000 description 1
- 229910017604 nitric acid Inorganic materials 0.000 description 1
- 229910052757 nitrogen Inorganic materials 0.000 description 1
- 108020004707 nucleic acids Proteins 0.000 description 1
- 102000039446 nucleic acids Human genes 0.000 description 1
- 150000007523 nucleic acids Chemical class 0.000 description 1
- 235000015097 nutrients Nutrition 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 150000007524 organic acids Chemical class 0.000 description 1
- 235000005985 organic acids Nutrition 0.000 description 1
- 150000002892 organic cations Chemical class 0.000 description 1
- IFPHDUVGLXEIOQ-UHFFFAOYSA-N ortho-iodosylbenzoic acid Chemical compound OC(=O)C1=CC=CC=C1I=O IFPHDUVGLXEIOQ-UHFFFAOYSA-N 0.000 description 1
- 235000006408 oxalic acid Nutrition 0.000 description 1
- GTUJJVSZIHQLHA-XPWFQUROSA-N pApA Chemical compound C1=NC2=C(N)N=CN=C2N1[C@@H]([C@@H]1O)O[C@H](COP(O)(O)=O)[C@H]1OP(O)(=O)OC[C@H]([C@@H](O)[C@H]1O)O[C@H]1N1C(N=CN=C2N)=C2N=C1 GTUJJVSZIHQLHA-XPWFQUROSA-N 0.000 description 1
- WLJNZVDCPSBLRP-UHFFFAOYSA-N pamoic acid Chemical compound C1=CC=C2C(CC=3C4=CC=CC=C4C=C(C=3O)C(=O)O)=C(O)C(C(O)=O)=CC2=C1 WLJNZVDCPSBLRP-UHFFFAOYSA-N 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- 230000009745 pathological pathway Effects 0.000 description 1
- 230000006320 pegylation Effects 0.000 description 1
- WXZMFSXDPGVJKK-UHFFFAOYSA-N pentaerythritol Chemical compound OCC(CO)(CO)CO WXZMFSXDPGVJKK-UHFFFAOYSA-N 0.000 description 1
- 230000035699 permeability Effects 0.000 description 1
- 230000002085 persistent effect Effects 0.000 description 1
- 239000008177 pharmaceutical agent Substances 0.000 description 1
- UYWQUFXKFGHYNT-UHFFFAOYSA-N phenylmethyl ester of formic acid Natural products O=COCC1=CC=CC=C1 UYWQUFXKFGHYNT-UHFFFAOYSA-N 0.000 description 1
- 239000011574 phosphorus Substances 0.000 description 1
- 230000010399 physical interaction Effects 0.000 description 1
- 230000001766 physiological effect Effects 0.000 description 1
- 230000003169 placental effect Effects 0.000 description 1
- 239000013612 plasmid Substances 0.000 description 1
- SJWPTBFNZAZFSH-UHFFFAOYSA-N pmpp Chemical compound C1CCSC2=NC=NC3=C2N=CN3CCCN2C(=O)N(C)C(=O)C1=C2 SJWPTBFNZAZFSH-UHFFFAOYSA-N 0.000 description 1
- 239000002798 polar solvent Substances 0.000 description 1
- 229920002647 polyamide Polymers 0.000 description 1
- 239000005020 polyethylene terephthalate Substances 0.000 description 1
- 229920000139 polyethylene terephthalate Polymers 0.000 description 1
- 229920002643 polyglutamic acid Polymers 0.000 description 1
- 229930001119 polyketide Natural products 0.000 description 1
- 125000000830 polyketide group Chemical group 0.000 description 1
- 229930001118 polyketide hybrid Natural products 0.000 description 1
- 125000003308 polyketide hybrid group Chemical group 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 102000054765 polymorphisms of proteins Human genes 0.000 description 1
- 229920002744 polyvinyl acetate phthalate Polymers 0.000 description 1
- 230000035935 pregnancy Effects 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 230000000770 proinflammatory effect Effects 0.000 description 1
- 238000011321 prophylaxis Methods 0.000 description 1
- 230000004844 protein turnover Effects 0.000 description 1
- 238000004445 quantitative analysis Methods 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000009712 regulation of translation Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 229930000044 secondary metabolite Natural products 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- BJLPWUCPFAJINB-UAQSTNRTSA-N sn-3-O-(geranylgeranyl)glycerol 1-phosphate Chemical compound CC(C)=CCC\C(C)=C\CC\C(C)=C\CC\C(C)=C\COC[C@H](O)COP(O)(O)=O BJLPWUCPFAJINB-UAQSTNRTSA-N 0.000 description 1
- CURNJKLCYZZBNJ-UHFFFAOYSA-M sodium;4-nitrophenolate Chemical compound [Na+].[O-]C1=CC=C([N+]([O-])=O)C=C1 CURNJKLCYZZBNJ-UHFFFAOYSA-M 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 238000010532 solid phase synthesis reaction Methods 0.000 description 1
- 239000002904 solvent Substances 0.000 description 1
- 230000019635 sulfation Effects 0.000 description 1
- 238000005670 sulfation reaction Methods 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 235000015523 tannic acid Nutrition 0.000 description 1
- 229920002258 tannic acid Polymers 0.000 description 1
- 229940033123 tannic acid Drugs 0.000 description 1
- 239000011975 tartaric acid Substances 0.000 description 1
- 235000002906 tartaric acid Nutrition 0.000 description 1
- 229940124597 therapeutic agent Drugs 0.000 description 1
- 150000003573 thiols Chemical class 0.000 description 1
- 231100000041 toxicology testing Toxicity 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000002103 transcriptional effect Effects 0.000 description 1
- 239000013638 trimer Substances 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
- 239000004474 valine Substances 0.000 description 1
- 125000002987 valine group Chemical group [H]N([H])C([H])(C(*)=O)C([H])(C([H])([H])[H])C([H])([H])[H] 0.000 description 1
- 229910052720 vanadium Inorganic materials 0.000 description 1
- 239000011782 vitamin Substances 0.000 description 1
- 235000013343 vitamin Nutrition 0.000 description 1
- 229940088594 vitamin Drugs 0.000 description 1
- 229930003231 vitamin Natural products 0.000 description 1
- 150000003722 vitamin derivatives Chemical class 0.000 description 1
- 239000002676 xenobiotic agent Substances 0.000 description 1
- 229910052725 zinc Inorganic materials 0.000 description 1
- 239000011701 zinc Substances 0.000 description 1
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
- G16B20/50—Mutagenesis
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
Definitions
- the invention relates generally to biotechnology and, more specifically, to methods of identifying lead molecules that have an increased probability of becoming an approved medicament, and business methods of identifying molecules such that they have an increased probability of becoming an approved medicament.
- the invention can be used, inter alia, for research, diagnostic and/or therapeutic products, methods and devices.
- Drug discovery is typically a process that takes significant time and money.
- ADME absorption, distribution, metabolism, excretion
- T toxicity
- oligopeptides known in the art have the antigenic binding activity of human chorionic gonadotropin (hCG). See, e.g., U.S. Patent 5,380,668 to Herron (Jan. 10, 1995), the contents of the entirety of which are incorporated by this reference.
- the oligopeptides disclosed therein are disclosed generally for use in diagnostic methods.
- Patent 6,596,688, WO 01/11048 A2, WO 01/10907 A2, and U.S. Patent 6,583,109 relate to various oligopeptides and their use in, among other things, “inhibiting HIV infection,” “treating or preventing HIV infection,” “treating or preventing cancer,” “treating or preventing a condition characterized by loss of body cell mass,” “treating or preventing a condition associated with pathological angiogenesis,” “treating or preventing hematopoietic deficiency,” “ex vivo gene therapy,” “expanding blood cells in vitro” and/or “providing blood cells to a subject.”
- selection criteria that can be used to identify biologically active peptides derivable from among a library of proteins, for example from human proteins, bacterial proteins, viral proteins and the like.
- the invention provides the insight that certain peptide motifs have an increased chance of being biologically active, in particular if they occur relatively frequently among naturally occurring proteins. It is the inventor's hypothesis that there is a pool of endogenous proteins which serves as a supplier of biologically active peptides.
- a method of the invention identifies a biologically active peptide derivable from proteins other than hCG or from a fragment thereof, in particular, other than MTRVLQGVLPALPQWC.
- the invention provides a method for identifying one or more biologically active peptides consisting of two to seven amino acids, comprising the steps of providing a database comprising a plurality of naturally occurring polypeptide sequences, defining at least one peptide motif that satisfies one or more specific criteria as disclosed herein below, and determining for the defined peptide motif the frequency of occurrence among the polypeptides in the database, wherein the frequency of occurrence is indicative of the peptide motif having biological activity.
- the results and/or physical characteristics are stored in a database as descriptors in a record associated with a particular peptide, wherein the database, or a subset of the database, may be searched, sorted, or analyzed according to one or more descriptors, such as the effect of the peptide on production of nitric oxide (NO) or the levels of a tumor suppressor, to generate at least one hit and/or lead compound.
- NO nitric oxide
- a biologically active peptide satisfies at least one of the following criteria: AP, PA, A(P)nA or P(A)nP, wherein (P)n and (A)n denote a run of n consecutive P or A residues, n is 0, 1, 2, 3, 4 or 5, A stands for an amino acid residue selected from a first defined subset of "A" amino acids, and P stands for an amino acid residue selected from a second defined subset of "P" amino acids, the first and second subsets being different from each other.
- Exemplary A and P amino acids are provided herein below.
- a method for predicting the biological function of a peptide that is or can be identified as being biologically active comprises the steps of analyzing the frequency of occurrence of the amino acid sequence of the biologically active peptide among a database comprising multiple polypeptides, each having at least one known biological function; selecting from the database at least one polypeptide comprising at least one copy of the biologically active peptide sequence; and identifying at least one biological network or pathway in which the selected polypeptide is involved.
- the defined peptide is predicted to modulate at least one identified biological pathway or network.
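The selection and pathway-identification steps can be sketched as a lookup over annotated records. The records below are hypothetical toy data: the accessions, sequences and pathway labels are invented. A real analysis would draw annotations from a curated source such as UniProtKB/Swiss-Prot.

```python
from collections import Counter

# Hypothetical toy records: (accession, sequence, annotated pathways).
TOY_RECORDS = [
    ("P1", "MKWRRVAQ", {"pathway_x"}),
    ("P2", "GGWRRVLS", {"pathway_x", "pathway_y"}),
    ("P3", "MSSHHHHL", {"pathway_z"}),
]


def pathways_for_peptide(peptide, records):
    """Select polypeptides containing at least one copy of the peptide and
    tally the pathways they are annotated with."""
    hits, counts = [], Counter()
    for accession, sequence, pathways in records:
        if peptide in sequence:
            hits.append(accession)
            counts.update(pathways)
    return hits, counts


hits, counts = pathways_for_peptide("WRRV", TOY_RECORDS)
print(hits, counts.most_common())
```

Pathways that are shared by many selected polypeptides are natural candidates for the pathway the peptide is predicted to modulate. The ranking rule is an assumption, not part of the source.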
- a method for predicting the biological function of a peptide that is or can be identified as being biologically active comprises analyzing the frequency of occurrence of the amino acid sequence of the biologically active peptide in a database of polypeptides. These polypeptides have, as a specific genetic trait, one or more non-synonymous single nucleotide polymorphisms (SNPs) in the DNA coding for their amino acid sequence.
- SNPs: single nucleotide polymorphisms
- a set or sets of peptides are generated that correspond to polypeptide sequences encoded by DNA in which said non-synonymous single nucleotide polymorphisms are detected.
- Such sets of peptides generally comprise two (occasionally three or more) peptides that differ in one amino acid, very occasionally they may differ in two or more amino acids.
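For the common case of a single substitution, such a set can be built by pairing every reference window that covers the variant site with the corresponding variant window. This is a minimal sketch. The function name, the 0-based position convention and the fixed window length are assumptions.

```python
def snp_peptide_pairs(sequence: str, position: int, alt: str, length: int) -> list[tuple[str, str]]:
    """Reference/variant peptide pairs of `length` residues spanning a single
    amino acid substitution at 0-based `position` (hypothetical helper)."""
    if sequence[position] == alt:
        raise ValueError("substitution does not change the residue")
    variant = sequence[:position] + alt + sequence[position + 1:]
    first = max(0, position - length + 1)         # leftmost window covering the site
    last = min(position, len(sequence) - length)  # rightmost window covering the site
    return [(sequence[s:s + length], variant[s:s + length]) for s in range(first, last + 1)]


for ref, var in snp_peptide_pairs("MKWRRVAQ", 3, "E", 4):
    print(ref, var)
```

Each pair differs in exactly one residue, which matches the typical set described above.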
- the invention provides a pharmaceutical composition comprising or consisting of a peptide selected from any of the peptides listed in Table 1, Table 2 or Table 3.
- the invention provides a peptide or a pharmaceutical composition comprising or consisting of a peptide selected from the group of peptides TPVE, SAVT, PVE, LEDSSGNLMNRRPI, RAMAIY, LAKTCPV, SVWPY, SGAT, LSPGL, GATATAAL, ATATAAL, ATAAL, AMAIY, ELAK, LSCRL, LQKSL, QKSL, TKPR, LKAP, LKTP, FTKP, KQGV, MCNSSCM, AKTCPV, DEIPVEVFKDLFEL, IQTPPSSPPTAFGSP, ARQLLSGMVNQPNNL, FSDLLQRLLNGIGGC, AAAAPPDPLSQLPAP, AKQILSGIVNQPNNL, ARQLLSGIVKQPNNL, FKTCIPGFPGAPSAV, VGQL, VGQA, ELAE, SAQGV, GSAQGV, AQGVI, AQGVIA, GR
- a computer program capable of performing at least part of the steps comprised in the method for identifying a biologically active peptide and/or at least part of the steps comprised in the method of predicting the function of a biologically active peptide.
- data and databases, computer readable media, computer systems, and/or apparatus that use, compare or generate data relating to a method according to the invention.
- the invention provides at least one algorithm for generating at least one set of peptide motifs. For example, a Motif Search algorithm is provided that collects, classifies, analyzes and arranges protein sequence data in a particular output format that is based on one or more input criteria provided by the user.
- the invention also provides a method of conducting a drug discovery business comprising i) identifying one or more biologically active peptides using a peptide identification method according to the invention, e.g., in the form of software stored on a computer system, ii) screening the peptide for the presence of descriptors indicative of a desirable therapeutic profile (e.g., ADME/T profile), iii) optionally modifying the peptide to improve its therapeutic profile; and iv) licensing, to a third party, the rights for further drug development of the peptide.
- the method of conducting a drug discovery business may furthermore comprise predicting the biological function of the one or more biologically active peptides using a method of the invention and, optionally, correlating the predicted function with a disease or pathological condition.
- identifying lead peptide compounds includes screening hits for traits indicative of a desirable ADME/T profile and selecting compounds having a higher probability of exhibiting a pharmaceutically desirable ADME/T profile.
- FIG. 1 is a graph comparing the number of motifs to the number of occurrences in a single protein.
- FIG. 2A is a graph illustrating some of the results obtained for peptide motif ATFV.
- FIG. 2B is another graph illustrating results obtained for peptide motif ATFV.
- FIG. 2C is another graph illustrating results obtained for peptide motif ATFV.
- FIG. 3A is a bar graph showing diseases and disorders.
- FIG. 3B is a bar graph illustrating molecular and cellular functions.
- FIG. 3C is a bar graph showing physiological systems development and functions.
- FIG. 3D is a bar graph showing metabolic pathways.
- FIG. 3E is a graph showing signalling pathways.
- FIG. 4 is a flow chart for determining a Peptide-I database.
- FIG. 5 is another flow chart for determining a Peptide-I database.
- record means a database entry for a particular compound, operably linking descriptors associated with the particular compound.
- descriptor means a metric used to describe a structure or certain molecular or functional attribute of a compound.
- descriptors and descriptor coding include, but are not limited to: Hamming, Euclidean, Tversky, Tanimoto, Ghose and Crippen, and/or BCUT indices, molecular mass, calculated lipophilicity (e.g., LogP and LogD), rotatable bonds, polar surface area, molecular flexibility, SMILES, Bit Strings, pKa, hydrogen bond donors, hydrogen bond acceptors, total hydrogen bond count, individual amino acids by position in a polypeptide, biomarker expression or activity, geometry-based descriptors, proteolytic cleavage sites, acid/base properties, conformational constraint, molecular topology, AUC (area under the drug plasma concentration-time curve), Cmax (maximum drug concentration in plasma), absorption properties, distribution properties, metabolism rates, excretion rates, toxicity, cardiotoxicity, nephrotoxicity, neurotoxicity, hepatotoxicity, electronegativity, polarity, solubility, membrane permeability, ability
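Two of the listed indices, Tanimoto and Hamming, compare Bit String descriptors. Here is a minimal sketch of both, with bit strings stored as Python integers. The source does not say how the fingerprints themselves are built, so the bit patterns in the usage line are arbitrary.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit strings stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints treated as identical (a convention)
    return bin(a & b).count("1") / union


def hamming(a: int, b: int) -> int:
    """Hamming distance: number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


print(tanimoto(0b1110, 0b0111), hamming(0b1110, 0b0111))  # 0.5 2
```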
- a "purified or isolated" peptide is a peptide that has been purified from a natural or biotechnological source.
- “Polypeptide” and “protein” include polymers of two, preferably three, or more amino acids; preferably the peptides are “virtual” (or in silico) breakdown products of a larger native protein. No distinction, based on length, is intended between a peptide, a polypeptide or a protein.
- a “functional analogue” or “derivative” of a peptide includes variations made with regard to a reference peptide, wherein the functional analogue or derivative retains an identifiable relationship to the reference peptide and retains the desired function of the reference peptide.
- functional analogues and derivatives include, but are not limited to, compounds having the same or equivalent sidechains as the reference peptide, arranged sequentially in the same order as in the reference peptide but, for example, joined together by non-peptide bonds (e.g., by isosteric linkages such as the keto isostere, hydroxy isostere, diketo isostere, or the keto-difluoromethylene isostere), non-naturally occurring amino acids or polyamides, surrogate peptide bonds (see, e.g., U.S.
- a functional analogue or derivative may be considered a peptide for the purposes of screening, identification of activity, inclusion in a database, production of a pharmaceutical, and use in the method of identifying hit and lead compounds.
- Conservative amino acid substitutions are known in the art and generally constitute substitution of one amino acid residue with another residue having generally similar properties (size, hydrophobicity, or charge).
- modifications including, but not limited to, glycosylation, PEGylation, PEG alkylation, alkylation, acetylation, amidation, glycosylphosphatidylinositolization, farnesylation, ADP-ribosylation, sulfation, lipid attachment, hydroxylation, and phosphorylation.
- “compound” and/or “peptide” includes an acceptable salt or ester of the compound or peptide.
- an “acceptable salt or ester” refers to a salt or ester that retains the desired activity of the peptide or compound, and preferably does not detrimentally affect a subject, for example, a human subject.
- examples of such salts are acid addition salts formed with inorganic acids, for example, hydrochloric acid, hydrobromic acid, sulfuric acid, phosphoric acid, nitric acid, and the like.
- Salts may also be formed with organic acids such as, for example, acetic acid, oxalic acid, tartaric acid, succinic acid, maleic acid, fumaric acid, gluconic acid, citric acid, malic acid, ascorbic acid, benzoic acid, tannic acid, pamoic acid, alginic acid, polyglutamic acid, and the like.
- Salts may be formed with polyvalent metal cations such as zinc, calcium, bismuth, barium, magnesium, aluminum, copper, cobalt, nickel and the like, or with an organic cation formed from N,N'-dibenzylethylenediamine or ethylenediamine, or combinations thereof (e.g., a zinc tannate salt).
- a “hit” means a compound having a descriptor value meeting or exceeding a threshold value.
- a “lead” means a compound or pharmacophore structure having at least one descriptor value meeting or exceeding a threshold value and having at least one descriptor value related to an element of its therapeutic profile meeting or exceeding a threshold value.
- descriptors include, but are not limited to, molecular mass, calculated lipophilicity (e.g., LogP and LogD), rotatable bonds, polar surface area, molecular flexibility, SMILES, Bit Strings, pKa, hydrogen bond donors, hydrogen bond acceptors, total hydrogen bond count, biomarker expression and/or activity, proteolytic cleavage sites, acid/base properties, conformational constraint, absorption, distribution, metabolism, excretion, toxicity etc.
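The hit and lead definitions map directly onto threshold checks over a record's descriptors. In this minimal sketch the descriptor names (`no_modulation`, `permeability`) and the threshold values are hypothetical. The sketch also assumes that larger values are better. A descriptor such as toxicity, where smaller is better, would need its comparison inverted.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Database record operably linking a compound to its descriptor values."""
    compound: str
    descriptors: dict[str, float] = field(default_factory=dict)


def _meets_any(record: Record, thresholds: dict[str, float]) -> bool:
    # "meeting or exceeding"; a missing descriptor never qualifies
    return any(record.descriptors.get(name, float("-inf")) >= t
               for name, t in thresholds.items())


def is_hit(record: Record, activity: dict[str, float]) -> bool:
    """Hit: at least one activity descriptor meets or exceeds its threshold."""
    return _meets_any(record, activity)


def is_lead(record: Record, activity: dict[str, float], profile: dict[str, float]) -> bool:
    """Lead: a hit that also meets a threshold on a therapeutic-profile descriptor."""
    return is_hit(record, activity) and _meets_any(record, profile)


pep = Record("PEP1", {"no_modulation": 0.8, "permeability": 0.2})
print(is_hit(pep, {"no_modulation": 0.5}))                          # True
print(is_lead(pep, {"no_modulation": 0.5}, {"permeability": 0.5}))  # False
```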
- the invention provides a method for identifying (in silico) a biologically active peptide consisting of two to seven amino acid residues, comprising the steps of: providing a database comprising a plurality of naturally occurring polypeptide sequences;
- defining at least one peptide motif that satisfies one or more of the following criteria: AP, PA, A(P)nA or P(A)nP, wherein n is 0 to 5 and wherein A and P stand for amino acid residues selected from, respectively, a first group of defined A amino acids and a second group of defined P amino acids;
- the database comprising a plurality of naturally occurring polypeptide sequences may comprise wildtype and/or mutated polypeptides.
- the database can be generated manually or automatically.
- the database can be a commercial or a publicly available polypeptide database.
- the database is a database with polypeptides of one or more defined organism(s) or group(s) of organism(s).
- a database comprises human, viral, plant and/or bacterial polypeptide sequences.
- Of particular interest for the invention is a database which also contains information about the (predicted) biological function of the polypeptides.
- An exemplary database of use when practicing the invention is the UniProtKB/Swiss-Prot Protein Knowledgebase. This is an annotated protein sequence database established in 1986.
- the UniProtKB/Swiss-Prot Protein Knowledgebase is a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Together with UniProtKB/TrEMBL, it constitutes the UniProt Knowledgebase, one component of the Universal Protein Resource (UniProt), a one-stop shop allowing easy access to all publicly available information about protein sequences. It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). As of 30 May 2006, the current Swiss-Prot release was version 50.0, containing 222,289 entries.
- a biologically active peptide is defined by one of the following motifs: AP,
- peptides of the invention include peptide dimers AA and PP, peptide trimers APA and PAP, tetramers APPA and PAAP, pentamers APPPA and PAAAP, hexamers APPPPA and PAAAAP and heptamers
- A and P residues are selected from distinct, predetermined subsets of amino acids.
- the subsets may partially overlap with respect to the amino acids, i.e., a certain amino acid residue may be present in the subset of A amino acids as well as in the subset of P amino acids.
- A and P may stand for amino acid residues with different physicochemical parameters.
- Physicochemical parameters on which the definition of A and P can be based comprise molecular mass (Dalton), surface area (Å²), volume (Å³), pKa of the carboxylic acid, pKb of the amine, pKa of the side chain, isoelectric point, solubility, density (crystal density, g/ml), non-polar surface area (Å²), estimated hydrophobic effect for residue or side-chain burial (kcal/mol) and any combination thereof.
- A and P differ with respect to their ratio between the total surface area and the non-polar surface area, which is indicative of their relative polarity.
- A is selected from amino acids Leu (L), Trp (W), Phe (F), Ile (I), Val (V), Pro (P), Ala (A), whereas P is selected from amino acids Tyr (Y), His (H), Thr (T), Lys
- A is selected independently from the subset consisting of amino acid residues A, V, L, I, P, M, F and W, whereas P is selected from the subset G, S, T, C, N, Q, Y, D, E, R, K and H.
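With the two subsets in this embodiment, every standard residue falls into exactly one class: 8 A residues and 12 P residues. A sequence can therefore be translated into an A/P string and scanned for a pattern. This is a minimal sketch. The regex lookahead is used so that overlapping matches are all reported, and marking non-standard residues as X is an assumption.

```python
import re

A_SET = set("AVLIPMFW")       # "A" subset of this embodiment
P_SET = set("GSTCNQYDERKH")   # "P" subset of this embodiment


def to_ap(sequence: str) -> str:
    """Translate a protein sequence into its A/P class string (X = other)."""
    return "".join("A" if aa in A_SET else "P" if aa in P_SET else "X" for aa in sequence)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (start, peptide) for every, possibly overlapping, match of an A/P pattern."""
    classes = to_ap(sequence)
    return [(m.start(), sequence[m.start():m.start() + len(pattern)])
            for m in re.finditer(f"(?={pattern})", classes)]


print(to_ap("MKWRRVAQ"))               # APAPPAAP
print(find_motif("MKWRRVAQ", "APPA"))  # [(2, 'WRRV')]
```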
- A is selected from the group consisting of amino acids Leu (L), Trp (W), Phe (F), Ile (I), Val (V), Pro (P), Ala (A), Met (M), Gly (G) and
- Val (V), Gly (G), Phe (F) and Trp (W).
- A is selected from the group consisting of amino acids Leu (L), Trp (W), Phe (F), Ile (I), Val (V), Pro (P), Ala (A), Met (M), Gly (G), Gln (Q) and Cys (C); whereas P is selected from the group consisting of amino acids Tyr (Y), His (H), Thr (T), Lys (K), Ser (S), Arg (R), Glu (E), Gln (Q), Cys (C), Asp (D), Asn (N), Pro (P),
- the residues occupying the A positions (or the P positions) of a motif can be the same or they can be different.
- for example, motif APPA can have the sequence Trp-Arg-Arg-Val, Trp-Glu-Arg-Phe or Phe-Glu-Arg-Phe.
- the peptide motif is smaller than seven amino acids (aa) because such a peptide generally does not bind to MHC molecules. This decreases the risk that an immune response against the biologically active peptide, when administered as a therapeutic agent, initiates autoimmunity.
- A size smaller than seven amino acids is also particularly preferred for a second reason. Burroughs et al. (Immunogenetics, 2004, 56:311-320) compared peptides derived from the human proteome with peptides derived from pathogen proteomes, in particular those of viruses and bacteria. They found that at a peptide size of seven aa the overlap between self and non-self is only 3%.
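The self/non-self overlap can be sketched as the fraction of distinct k-mers in one set of sequences that also occur in another. This definition of overlap is an assumption: the source does not give the exact measure used by Burroughs et al. The two sequences below are toy data, not proteome results.

```python
def kmers(sequences: list[str], k: int) -> set[str]:
    """All distinct length-k peptides occurring in a collection of sequences."""
    return {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}


def overlap_fraction(self_seqs: list[str], other_seqs: list[str], k: int) -> float:
    """Fraction of distinct k-mers in other_seqs that also occur in self_seqs."""
    other = kmers(other_seqs, k)
    if not other:
        return 0.0
    return len(other & kmers(self_seqs, k)) / len(other)


for k in (2, 4, 7):
    print(k, overlap_fraction(["MKWRRVAQ"], ["GKWRRVLS"], k))
```

Even on toy sequences, the shared fraction falls as k grows, which is the trend the comparison above relies on.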
- the peptide consists of two to six amino acids, more preferably consists of three to five amino acids, and most preferably consists of three or four amino acids.
- the peptide consists of four amino acids with the motif APPA or PAAP.
- a method of the invention comprises the step of determining the frequency of occurrence of the at least one peptide motif, e.g., APA, among a database comprising multiple sequences of known proteins.
- the frequency is an indicator of the likelihood that the peptide is biologically active. The higher the frequency of a given peptide motif among the naturally occurring proteins, the higher the chance that the motif will be of biological significance.
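A database-wide frequency count for one motif pattern might look like the following minimal sketch. It uses the A/P subsets of one embodiment (A = AVLIPMFW, P = GSTCNQYDERKH) and takes "frequency" to be the raw number of overlapping occurrences summed over all sequences. The source does not specify any normalization, and the three sequences are hypothetical.

```python
from collections import Counter

A_SET = set("AVLIPMFW")
P_SET = set("GSTCNQYDERKH")


def motif_frequencies(database: list[str], pattern: str) -> Counter:
    """Count overlapping occurrences of every concrete peptide matching an A/P
    pattern, summed over all sequences in the database."""
    k = len(pattern)
    counts: Counter = Counter()
    for seq in database:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if all(aa in (A_SET if c == "A" else P_SET) for aa, c in zip(window, pattern)):
                counts[window] += 1
    return counts


toy_db = ["MKWRRVAQ", "LSTVWRRV", "WRRVLSTV"]  # hypothetical sequences
print(motif_frequencies(toy_db, "APPA").most_common())  # [('WRRV', 3), ('LSTV', 2)]
```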
- a method of the invention comprises the step of determining the frequency of occurrence of the at least one peptide motif among at least one single polypeptide sequence.
- the higher the number of occurrences of a specific motif in a single protein, the more likely it is that the motif will be generated in vivo to exert a biological effect.
- not only the amino acid sequence of a peptide motif but also its relative frequency of occurrence in the proteome is an indicator of its biological activity.
- FIG. 1 shows, as an example, the results of determining the frequency of occurrence of tetrapeptide motif APPA in a database of human, viral or bacterial polypeptides. A was selected independently from amino acids L, W, F, I, V, P and A, while P was selected independently from amino acids Tyr (Y), His (H), Thr (T), Lys (K), Ser (S), Arg (R), Glu (E), Gln (Q), Cys (C), Asp (D), Asn (N) and Gly (G).
- a method of the invention for identifying a biologically active peptide comprises selecting a peptide motif, for instance a tetrapeptide motif, having a frequency of occurrence among a single polypeptide sequence of at least five, preferably at least ten.
- fewer than 50 of all APPA motifs occur more than 30 times within a single human polypeptide, and fewer than 16 tetrapeptides are represented more than 40-fold in one protein.
- For example, peptide motifs LTSL, FVLS, NMWD, LCFL, MWDF, FSYA, FWVD and AFTV can be identified as present in the naturally occurring polypeptide C-Reactive Protein (CRP) (e.g., human CRP); peptide motifs GLLG, TAPS, VCQV, CLWT, VHQL, GALH, LGTL, TLVQ, QLLG, YAIT, LCEL, GLIR, APSL, ITTL, QALG, HPPS, GVLC, LCPA, LFYA, NIMR, NLIN, LHPP, LTEL, SPIE, VGGI, QLLY, LNTI, LWTL, LYSP, YAMT, LHNL, TVLR, and LFYA are present in Beta-catenin (e.g., human CTNB); peptide motifs LSNI, YVFS, LYGV, YVVC, FIVR, NILD, TIMY, LESI, FLLT, VFSP, FILE, TFLK, FWID, MWEI, QLLE,
- a relatively large peptide motif of the invention, such as a peptide consisting of six or seven amino acids, has a smaller statistical chance of being present in a polypeptide than a relatively small peptide motif, for example a di- or tripeptide motif.
- the correlation between a given number of occurrences of a peptide motif among a single polypeptide and the biological activity of the peptide motif may therefore depend on the peptide size. In other words, a frequency of occurrence of five or more may not be a discriminant for biological activity of a tripeptide whereas it is a strong indicator for a hexapeptide.
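The size dependence can be made concrete with a simple null model. It assumes that all 20 residues are equally frequent and occur independently, which real proteomes do not satisfy. Under this model, the expected number of chance occurrences of one particular k-residue peptide in a polypeptide of length L is

$$E[N_k] = \frac{L - k + 1}{20^{k}}$$

For a hypothetical polypeptide of length L = 500, a given tripeptide is expected 498/8000 ≈ 0.062 times, while a given hexapeptide is expected only 495/64,000,000 ≈ 7.7 × 10⁻⁶ times. Five copies of a hexapeptide are therefore far less likely to arise by chance than five copies of a tripeptide.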
- a biologically active dipeptide is identified by determining which of the dipeptide sequences according to the AP, PA, AA or PP motif is present at least 30 times, preferably at least 50 times, more preferably at least 70 times among a single naturally occurring polypeptide sequence.
- a biologically active tripeptide is identified by determining which of the tripeptide sequences according to the APA or PAP motif is present at least 20 times, preferably at least 30 times, more preferably at least 50 times among a single naturally occurring polypeptide sequence.
- a biologically active tetrapeptide is identified by determining which of the tetrapeptide sequences according to the APPA or PAAP motif is present at least ten times, preferably at least 20 times, more preferably at least 30 times among a single naturally occurring polypeptide sequence.
- a biologically active pentapeptide is identified by determining which of the pentapeptide sequences according to the APPPA or PAAAP motif is present at least five times, preferably at least ten times, more preferably at least 20 times among a single naturally occurring polypeptide sequence.
- a biologically active hexapeptide is identified by determining which of the hexapeptide sequences according to the APPPPA or PAAAAP motif is present at least three times, preferably at least seven times, more preferably at least ten times among a single naturally occurring polypeptide sequence.
- a biologically active heptapeptide is identified by determining which of the heptapeptide sequences according to the APPPPPA or PAAAAAP motif is present at least two times, preferably at least five times, more preferably at least seven times among a single naturally occurring polypeptide sequence.
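The length-dependent thresholds can be applied as a lookup table, using the minimum (first-listed) values above. This is a minimal sketch. Counting overlapping copies is an assumption, since the text does not say whether overlapping occurrences count, and the repetitive test protein is hypothetical.

```python
# Minimum (first-listed) occurrence thresholds per motif length, from the text above.
MIN_OCCURRENCES = {2: 30, 3: 20, 4: 10, 5: 5, 6: 3, 7: 2}


def count_overlapping(sequence: str, peptide: str) -> int:
    """Occurrences of peptide in sequence, counting overlapping copies."""
    return sum(sequence.startswith(peptide, i) for i in range(len(sequence) - len(peptide) + 1))


def passes_threshold(sequence: str, peptide: str) -> bool:
    """True if the peptide occurs often enough in a single polypeptide for its length."""
    return count_overlapping(sequence, peptide) >= MIN_OCCURRENCES[len(peptide)]


protein = "WRRV" * 10  # hypothetical repetitive polypeptide
print(count_overlapping(protein, "WRRV"), passes_threshold(protein, "WRRV"))    # 10 True
print(count_overlapping(protein, "WRRVW"), passes_threshold(protein, "WRRVW"))  # 9 True
```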
- biological activity refers to any activity which regulates or influences, either in a transient or sustained fashion, at least one cellular process or pathway.
- the regulation or influence can be inhibitory (suppressive) or stimulatory.
- the cellular process or pathway can be a normal or a pathogenic pathway, for example a signal transduction pathway involved in inflammatory conditions or oncogenesis.
- the invention provides a method for identifying a peptide which exerts at least part of its biological activity in the nucleus of a cell.
- the peptide is involved in the regulation of the nuclear transcription machinery.
- the invention provides a method for identifying a peptide which exerts at least part of its biological activity in the cytoplasm (also known as cytosol) of a cell. For example, it is involved in the regulation of translation machinery and/or in one or more cytosolic signal transduction pathway(s).
- a peptide may also exert its biological activity in multiple different cellular compartments. In one embodiment, it shuttles between the cell nucleus and the cytoplasm.
- Table 1 shows exemplary tetrapeptides having nuclear activity. “A” amino acid residues were selected from LMAPVWIFCGQ, whereas “P” amino acid residues were selected from QTAPGRVCWFDHNSKYE. The Swiss-Prot database was used to identify the relative occurrence in human polypeptides. The table shows those peptide motifs that occur with a frequency of at least one.
- Table 2 shows exemplary tetrapeptides predicted to have an effect on the translation machinery. “A” amino acid residues were selected from LMAPVWIFCGQ, whereas “P” amino acid residues were selected from QTAPGRVCWFDHNSKYE. The Swiss-Prot database was used to identify the relative occurrence in human polypeptides. Only those motifs having a frequency of occurrence of two or more are shown.
- Table 3 shows exemplary tetrapeptides predicted to have an effect on the transcription machinery. “A” amino acid residues were selected from LMAPVWIFCGQ, whereas “P” amino acid residues were selected from QTAPGRVCWFDHNSKYE. The Swiss-Prot database was used to identify the relative occurrence in human polypeptides.
- Table 3: Tetrapeptides predicted to have transcriptional activity (columns: Motif, Frequency). Entries include PAAE (frequency 1) and VGGI (frequency 1).
- peptide motifs can be identified as being biologically active based on their relative frequency of occurrence in natural proteins.
- the present inventors identified, as a further criterion for a peptide motif to be biologically active, the likelihood that the peptide is generated in vivo, for example during intracellular protein turnover.
- a peptide motif has an increased chance of being biologically active if its complete sequence is present at least once, preferably at least twice, more preferably at least five times, within a fragment of a polypeptide that is or can be generated in vivo.
- peptide motifs that are likely to be degraded, for instance because they contain an enzymatic cleavage site, are less likely to be biologically active.
- the invention therefore relates to an identification method as described herein above with the additional step or process of determining the likelihood that the peptide is generated in vivo, or, in other words, the likelihood that the peptide motif remains intact upon degradation of the polypeptide comprising the motif sequence.
- the process comprises the steps of: a) selecting from the database at least one polypeptide sequence comprising at least one copy of at least one defined peptide motif; b) determining in at least one selected polypeptide the presence of one or more polypeptide fragments having an increased likelihood of being generated in vivo; and c) selecting at least one
- an increased likelihood is a further positive indicator of the peptide having a biological activity.
- Step b), determining which fragments of a polypeptide have an increased chance of being formed within a cell, may be based on one or more known biological phenomena related to selective or non-selective protein breakdown. For instance, it comprises determining in silico the presence of a polypeptide fragment that is flanked by one (e.g., a C- or N-terminal fragment) or more predicted cleavage sites.
- the cleavage sites can be enzymatic cleavage sites, preferably sites recognized by an enzyme selected from the group consisting of the proteasome, immunoproteasome, Arg-C proteinase, Asp-N endopeptidase, caspase, Chymotrypsin (high specificity: C-terminal to [FYW], not before P; or low specificity: C-terminal to [FYWML], not before P), Clostripain (Clostridiopeptidase B), Enterokinase, Factor Xa, Glutamyl endopeptidase, Granzyme B, LysC, Pepsin (pH 1.3), Pepsin (pH > 2), Proline-endopeptidase, Proteinase K, Staphylococcal peptidase I, Thermolysin, Thrombin and Trypsin.
- Enzymatic cleavage sites within a polypeptide can be readily identified in silico using publicly available software.
- “Cutter” is a program that generates peptide fragments by enzymatic or chemical cleavage of a protein sequence entered by the user and computes the theoretical masses of the generated peptides. It can be accessed at http://delphi.phys.univ-tours.fr/Prolysis/cutter.html.
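The following plain-Python sketch applies one rule from the enzyme list, high-specificity chymotrypsin (cleave C-terminal to F, Y or W, but not before P). It then checks which fragments keep a motif intact. It is not the Cutter program and does not compute masses. The function names are hypothetical, and other single-residue rules can be passed in through the `cut_after` and `blocked_by` parameters.

```python
def digest(sequence: str, cut_after: str = "FYW", blocked_by: str = "P") -> list[str]:
    """Cleave after every residue in cut_after unless the next residue is in
    blocked_by (default: high-specificity chymotrypsin rule)."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence[:-1]):
        if aa in cut_after and sequence[i + 1] not in blocked_by:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def intact_in_fragments(sequence: str, motif: str, **rule) -> list[str]:
    """Fragments that still contain at least one complete copy of the motif."""
    return [f for f in digest(sequence, **rule) if motif in f]


seq = "MKWRRVAQFGLSTVK"  # hypothetical sequence
print(digest(seq))                      # ['MKW', 'RRVAQF', 'GLSTVK']
print(intact_in_fragments(seq, "LSTV"))  # ['GLSTVK']
print(intact_in_fragments(seq, "WRRV"))  # [] (cut after W)
```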
- the cleavage site is a cleavage site for a chemical agent, for example BNPS-Skatole, CNBr, Formic acid, Hydroxylamine, Iodosobenzoic acid and/or NTCB (2-nitro-5-thiocyanobenzoic acid).
- the fragments are preferably subjected to an analysis to determine whether they are likely to be generated in the antigen-processing pathway. If so, this is another positive indicator of the peptide motif having biological activity.
- Antigen processing and presentation are processes that occur within a cell and result in fragmentation (proteolysis) of proteins, association of the fragments with MHC molecules and expression of the peptide-MHC molecules at the cell surface, where they can be recognized by the T cell receptor on a T cell.
- MHC class I molecules present degradation products derived from intracellular (endogenous) proteins in the cytosol.
- MHC class II molecules present fragments derived from extracellular (exogenous) proteins that are located in an intracellular compartment.
- a method of the invention comprises the use of a computational prediction method for MHC class I and/or class II binding peptides.
- the presence of one or more copies of a specific peptide motif in a polypeptide fragment that is predicted to bind to MHC class I and/or class II is a (further) positive indicator of the peptide motif having biological activity.
- a method for identifying a peptide of two to seven amino acids having biological activity involves determining in silico the presence of the motif in a polypeptide region that is a predicted binder to at least one class I MHC allele. For example, the regions of a polypeptide comprising one or more copies of the defined peptide sequence (e.g., APPA or PAAAP) that bind to at least one HLA allele are determined.
- the polypeptide region binding to at least one HLA allele selected from the group consisting of HLA alleles HLA-A*1101, HLA-A2.1, HLA-A*3302, HLA-B14, HLA-B*3701, HLA-B40, HLA-B*5103, HLA-B*51, HLA-B62, HLA-Cw*0301, H2-Db, H2-Kd, HLA-A2, HLA-A24, HLA-A68.1, HLA-B*2702, HLA-B*3801, HLA-B*4403, HLA-B*5201, HLA-B*5801, HLA-B7, HLA-Cw*0401, H2-Db, H2-Kk, HLA-A*0201, HLA-A3, HLA-A20 cattle, HLA-B*2705, HLA-B*3901, HLA-B*5101, HLA-B*5301, HLA-B60, HLA-B*0702,
- a method for identifying a biologically active peptide involves selecting from a database at least one polypeptide sequence comprising at least one copy of at least one defined peptide motif; determining in at least one selected polypeptide the presence of one or more polypeptide fragments having an increased likelihood of binding to at least one class II MHC allele, and selecting at least one defined peptide motif whose amino acid sequence is present in at least one of the polypeptide fragments.
- the polypeptide fragment(s) are for example predicted binders to at least one class II MHC allele selected from the group consisting of HLA alleles HLA-DR1, HLA-DRB1*0101, HLA-DRB1*0102, HLA-DR3, HLA-DRB1*0301, HLA-DRB1*0305, HLA-DRB1*0306, HLA-DRB1*0307, HLA-DRB1*0308, HLA-DRB1*0309, HLA-DRB1*0311, HLA-DR4, HLA-DRB1*0401, HLA-DRB1*0402, HLA-DRB1*0404, HLA-DRB1*0405, HLA-DRB1*0408, HLA-DRB1*0410, HLA-DRB1*0423,
HLA-DRB1*1120, HLA-DRB1*1121, HLA-DRB1*1128, HLA-DR13, HLA-DRB1*1301, HLA-DRB1*1302, HLA-DRB1*1304, HLA-DRB1*1305, HLA-DRB1*1307,
- Methods for predicting MHC-binding peptides can be divided into two groups: sequence-based and structure-based methods. Allele-specific sequence motifs can be identified by studying the frequencies of amino acids at different positions of identified MHC-binding peptides.
- the peptides that bind to HLA-A*0201 are often nine amino acids long (nonamers) and frequently have two anchor residues, a leucine at position 2 and a valine at position 9 (Rotzschke et al., European Journal of Immunology 1992, 22:2453-2456).
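An anchor-residue filter is a very crude stand-in for the predictors named below, such as SYFPEITHI, HLA_BIND and SVMHC. It keeps 9-mer windows that carry the commonly cited HLA-A*0201 anchors, leucine at position 2 and valine at position 9. It then checks whether a motif lies inside one of those windows. This is a hypothetical sketch with no scoring matrix, and it should not be read as a real binding prediction.

```python
def anchor_nonamers(sequence: str, p2: str = "L", p9: str = "V") -> list[tuple[int, str]]:
    """(start, 9-mer) windows carrying the given anchors at 1-based positions 2 and 9."""
    return [(i, sequence[i:i + 9]) for i in range(len(sequence) - 8)
            if sequence[i + 1] in p2 and sequence[i + 8] in p9]


def motif_in_binders(sequence: str, motif: str, **anchors) -> list[tuple[int, str]]:
    """Anchor-matching windows that contain a complete copy of the motif."""
    return [(i, w) for i, w in anchor_nonamers(sequence, **anchors) if motif in w]


seq = "GLSTVAKQVRR"  # hypothetical sequence
print(anchor_nonamers(seq))           # [(0, 'GLSTVAKQV')]
print(motif_in_binders(seq, "LSTV"))  # [(0, 'GLSTVAKQV')]
```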
- SYFPEITHI is freely available as a web service at http://www.syfpeithi.de/. SYFPEITHI prediction can be done for different MHC class I and II types.
- HLA_BIND is another profile-based MHC-peptide predictor, available at http://bimas.dcrt.nih.gov/molbio/hla_bind/. This method estimates the half-time of dissociation of a given MHC-peptide complex. HLA_BIND provides predictions for more than 40 different MHC class I types. Profile-based methods have been shown to be correct about 30% of the time, in the sense that about one third of the predicted binders actually bind.
- another MHC-peptide prediction method is SVMHC, which uses support vector machines to predict the binding of peptides to MHC class I molecules (Donnes and Elofsson, BMC Bioinformatics 2002, 3:25).
- the prediction models for these MHC types are implemented in a public web service available at http://www.sbc.su.se/svmhc/.
- a method of the invention comprises subjecting one or more polypeptide fragments predicted to bind to MHC and comprising one or more copies of a defined peptide motif sequence to a proteasome/immunoproteasome filter.
- Proteasomes are the main proteases responsible for cytosolic protein degradation and the production of major histocompatibility complex class I ligands.
- Incorporation of the interferon gamma-inducible subunits low molecular weight protein (LMP)-2, LMP-7, and multicatalytic endopeptidase complex-like (MECL)-1 leads to the formation of immunoproteasomes, which have been associated with more efficient class I antigen processing.
- LMP: low molecular weight protein
- MECL: multicatalytic endopeptidase complex-like
- a biologically active peptide is identified by its presence in a polypeptide fragment that is not only a predicted MHC binder but also predicted to be generated after proteasomal cleavage.
- a further criterion that the present inventors identified as an indicator or predictor of a peptide motif being biologically active is the likelihood that the motif is exposed at the outer surface of a naturally occurring polypeptide.
- a peptide motif located at an exposed surface has an increased chance of being released from the polypeptide, e.g., upon proteolytic cleavage. Surface exposure is therefore regarded as a positive indicator of the peptide motif being biologically active.
- step (iv) comprising the steps of a) selecting at least one polypeptide sequence comprising at least one copy of at least one defined peptide motif; b) determining in at least one selected polypeptide the presence of one or more polypeptide regions having an increased likelihood of being exposed at the outer surface of the selected polypeptide; and c) selecting at least one defined peptide motif whose amino acid sequence is present in at least one of the exposed polypeptide regions, wherein exposure at the outer surface is a positive indicator of the peptide motif having biological activity.
- the chance or likelihood that a polypeptide region is exposed (in an aqueous environment) at the outer surface of the polypeptide can be determined using hydrophilicity and/or hydrophobicity plots. These plots are designed to display or predict, on the one hand, hydrophobic membrane-spanning segments and, on the other hand, regions that are likely exposed on the surface of proteins (hydrophilic domains).
- a hydrophilicity plot is a quantitative analysis of the degree of hydrophobicity or hydrophilicity of amino acids of a protein. It is used to characterize or identify possible structure or domains of a protein.
- the plot has the amino acid sequence of a protein on its x-axis and the degree of hydrophobicity or hydrophilicity on its y-axis.
- Analyzing the shape of the plot gives information about the partial structure of the protein.
- if a stretch of about 20 amino acids is positive for hydrophobicity, this indicates that these amino acids may be part of an alpha-helix spanning a lipid bilayer, which is composed of hydrophobic fatty acids.
- amino acids with high hydrophilicity indicate that these residues are in contact with solvent, or water, and that they are therefore likely to reside on the outer surface of the protein.
- a region of high hydrophilicity and/or low hydrophobicity is indicative of an increased likelihood of being outer surface exposed and thus of being (e.g., proteolytically) released from the polypeptide.
- Hopp-Woods scale: this scale was developed for predicting potential antigenic sites of globular proteins, which are likely to be rich in charged and polar residues. It is essentially a hydrophilicity index, with apolar residues assigned negative values. The authors suggest that, using a window size of six, the region of maximal hydrophilicity is likely to be an antigenic site.
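The sliding-window idea behind the Hopp-Woods scale can be sketched in a few lines of Python. This is a minimal illustration, assuming the published Hopp-Woods residue values and the window size of six suggested by the authors; the function names are hypothetical and do not come from any particular tool.

```python
# Hopp-Woods hydrophilicity values (Hopp & Woods, 1981); apolar residues are negative.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def window_profile(sequence: str, scale: dict[str, float], window: int = 6) -> list[float]:
    """Return the mean scale value of every full window along the sequence."""
    values = [scale[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophilic_region(sequence: str, window: int = 6) -> tuple[int, str]:
    """Return (0-based start, segment) of the window with maximal mean hydrophilicity."""
    profile = window_profile(sequence, HOPP_WOODS, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window]
```

For example, `most_hydrophilic_region("LLVILADEKRDEGGAVLI")` picks the charged stretch `"DEKRDE"` starting at position 6, which under the authors' suggestion would be the candidate antigenic (surface-exposed) site.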
- the step of determining the probability that a peptide motif is outer surface exposed may be performed in any combination and in any order with one or more of the other selection criteria.
- a method of identifying a biologically active peptide consisting of two to seven amino acid residues complying with one of the defined motifs comprises, as a first step, determining the frequency of occurrence of the motif within a single polypeptide.
- the polypeptide is subjected to an enzyme cutter program to determine which fragments are likely to be generated in vivo.
- predicted fragments containing at least one intact copy of the peptide motif are selected.
- the surface-exposed fragments are presented to an MHC class I and/or class II prediction method, optionally in combination with an (immuno)proteasome filter, to identify those fragments that are likely to be generated in the antigen-processing pathway.
- Peptide motifs which are present in the polypeptide fragments that satisfy all the criteria have a very high chance of being biologically active.
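The order of the selection steps above can be sketched as a chain of filters. In this hypothetical Python sketch, `Fragment`, `select_fragments` and the predicate arguments are illustrative names; the predicted cleavage fragments, the surface-exposure predictor and the MHC-binding predictor are assumed to be supplied by whatever tools are used, rather than implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Fragment:
    start: int      # 0-based start in the parent polypeptide
    sequence: str   # fragment predicted by an enzyme cutter program


def select_fragments(
    polypeptide: str,
    motif: str,
    fragments: Iterable[Fragment],
    is_exposed: Callable[[Fragment], bool],
    is_mhc_binder: Callable[[Fragment], bool],
    min_copies: int = 1,
) -> list[Fragment]:
    """Apply the selection criteria in the order described above."""
    # Step 1: frequency of occurrence of the motif within the single polypeptide
    # (overlapping occurrences are counted).
    copies = sum(polypeptide.startswith(motif, i) for i in range(len(polypeptide)))
    if copies < min_copies:
        return []
    # Remaining steps: keep fragments that contain an intact copy of the motif,
    # are predicted to be surface exposed, and are predicted MHC binders.
    return [
        f for f in fragments
        if motif in f.sequence and is_exposed(f) and is_mhc_binder(f)
    ]
```

Passing the predictors in as functions keeps the ordering of the criteria explicit while leaving the choice of cleavage, exposure and MHC tools open.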
- All of the above criteria were identified as being predictors of a given peptide sequence being biologically active or not. By using one or more predictors, the invention thus provides a method of identifying a biologically active peptide.
- biological activity is to be interpreted very broadly and therefore gives no indication of the purpose for which the peptide motif could be used, e.g., in a particular therapeutic or prophylactic treatment.
- a further aspect of the invention therefore relates to predicting the biological function of a peptide that has been identified (using the selection criteria set out herein above) as being biologically active.
- a method for predicting the biological function of a defined peptide sequence of two to seven amino acid residues comprising the steps of (i) identifying a biologically active peptide according to a method disclosed herein above; (ii) providing a database comprising multiple polypeptides, each having at least one known biological function; (iii) analyzing the frequency of occurrence of the amino acid sequence of the biologically active peptide among the polypeptides in the database; (iv) selecting from the database at least one polypeptide, but preferably multiple polypeptides, comprising at least one copy of the biologically active peptide sequence; and (v) identifying at least one biological pathway in which the selected polypeptide is involved, wherein the predicted function of the defined peptide is to modulate at least one identified biological pathway.
- the database comprising multiple polypeptides, each having at least one known biological function will be the same database which is used to identify the biologically active peptide motif.
- the identification method merely involves determining whether or not a defined peptide motif is present in any naturally occurring polypeptide.
- the prediction method uses the known biological function of the polypeptide comprising the peptide motif to predict in which biological pathway the peptide motif will be active upon its production as a "virtual" breakdown product of the polypeptide.
- the predicted function of the peptide motif is to modulate the pathway wherein the one or more than one polypeptide is involved.
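Steps (iii) to (v) of the prediction method can be illustrated with a small annotated database. This Python sketch is only an illustration under stated assumptions: the dictionary layout, the protein identifiers and the pathway labels are hypothetical placeholders, not records from KEGG, IPA or UniProtKB.

```python
from collections import Counter


def predict_pathways(
    motif: str,
    database: dict[str, tuple[str, set[str]]],
) -> tuple[list[str], Counter]:
    """Select polypeptides carrying the motif and count the pathways they belong to.

    `database` maps a protein identifier to (sequence, set of known pathways).
    """
    # Steps (iii)/(iv): which polypeptides contain at least one copy of the motif.
    selected = [pid for pid, (seq, _) in database.items() if motif in seq]
    # Step (v): pathways in which the selected polypeptides are involved.
    pathway_counts = Counter(
        pathway for pid in selected for pathway in database[pid][1]
    )
    return selected, pathway_counts
```

The most frequently hit pathways (for instance via `pathway_counts.most_common()`) are then the candidates the motif is predicted to modulate.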
- Modulation of a biological pathway by the peptide motif can be transient or sustained. It can be stimulatory or suppressive. Modulation can be exerted at the level of the identified polypeptide itself, and/or upstream and/or downstream thereof. Modulation can be achieved by a direct, e.g., protein-protein or DNA-protein, interaction between the peptide motif and at least one member of the pathway. In addition or as an alternative, modulation can be indirect, e.g., through the modulation of cellular levels of proteins, metabolites, signaling molecules and the like.
- biological pathway refers to any molecular interaction network involved in a biological process, for example any process responsible for a cell's activity, either chemical activity or physiological activity. Such pathways can involve gene expression levels, protein activation levels, the concentration of small molecules, external conditions (e.g., available nutrients) and other relevant biological processes and states. For instance, the peptide motif is predicted to modulate a metabolic pathway or a regulatory pathway.
- the prediction method of the invention comprises the step of identifying at least one biological pathway in which the polypeptide comprising one or multiple copies of the peptide motif is involved.
- the KEGG PATHWAY database (http://www.genome.jp/kegg/pathway.html) is a collection of manually drawn pathway maps representing knowledge of a wide variety of molecular interaction and reaction networks, covering, for example, Metabolism (carbohydrate, energy, lipid, nucleotide, amino acid, glycan, polyketides and non-ribosomal peptides, cofactor/vitamin, secondary metabolite, xenobiotics), Genetic Information Processing, Environmental Information Processing, Cellular Processes and Human Diseases. Major publicly available biological pathway resources, including the Kyoto
- the prediction method of the invention comprises performing an Ingenuity Pathway Analysis (IPA) to identify a biological pathway comprising at least one polypeptide comprising one or more copies of a peptide motif.
- Ingenuity Pathways Analysis is a web-based software application that enables biologists and bioinformaticians to identify the biological mechanisms, pathways and functions most relevant to their experimental datasets or genes of interest.
- An IPA search can be based on genes, proteins, diseases, processes, functions, and identifiers.
- a method of the invention for predicting the function of a biologically active peptide comprises performing an IPA search using the peptide motif as input to elucidate one or more biological networks in which a polypeptide or polypeptides comprising the motif is/are involved.
- FIG. 2 illustrates some of the results that were obtained when peptide motif ATFV, present in the polypeptide C-reactive protein (CRP) and identified as disclosed herein to be biologically active, was subjected to an Ingenuity Pathway Analysis.
- the ATFV sequence was found to be present in a number of naturally occurring human polypeptides (indicated in the figures in bold) playing a role in different biological pathways presented as different "networks" in FIGS. 2A, 2B and 2C.
- polypeptides in which the motif is represented act at various (sub)cellular sites, either extracellularly (e.g., LAMA1, NPTX1, COL10A1), at or close to the plasma membrane (e.g., FZD4, FZD9, FZD10, CD4, EPB41L1), in the cytoplasm (e.g., ABCB7, METAP2, IARS) or in the nucleus (e.g., MYB, NEK8, WT1, EPAS1).
- peptide motif ATFV is predicted to function as a modulator of the networks shown in FIG. 2.
- one peptide can serve a modulatory role in more than one biological pathway or network.
- this motif was found to be significantly overrepresented in cell signaling pathways compared with other pathways involved in molecular and cellular functions (see FIG. 3B). Further analysis of the different signaling pathways indicates that the ATFV peptide is in particular predicted to be a modulator of the Wnt/beta-catenin signaling pathway (results not shown). In addition, among pathways involved in diseases and disorders, the ATFV motif appears "enriched" in those underlying or related to dermatological diseases, neurological diseases, ophthalmic diseases, and organismal injuries and abnormalities (FIG. 3A). With respect to physiological system development and function, a modulatory role in embryonic development is predicted (FIG. 3C). Regarding metabolic pathways, ATFV is predicted to modulate folate biosynthesis (FIG. 3D).
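The text does not say which statistic underlies "significantly overrepresented". A common choice for this kind of analysis is a one-sided hypergeometric test, which is equivalent to a right-tailed Fisher's exact test. The Python sketch below assumes that choice; the function name and the argument names are hypothetical:

$$P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric probability P(X >= k).

    N: proteins in the background database
    K: proteins annotated to the pathway
    n: proteins containing the motif
    k: motif-containing proteins annotated to the pathway
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

For instance, if all 5 motif-carrying proteins in a 10-protein background fall into a 5-protein pathway, the probability of that happening by chance is 1/252.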
- the invention not only provides a method for identifying a biologically active peptide but also encompasses predicting its biological function, both in a normal and diseased state of an organism. Given the need in the art for improved methods of identifying hits and generating lead compounds for therapeutic purposes, the identification and prediction methods are clearly of great use in the field of drug discovery.
- the invention also provides a method of conducting a drug discovery business comprising i) identifying one or more biologically active peptides using a peptide identification method according to the invention, ii) screening the peptide for the presence of descriptors indicative of a desirable therapeutic profile (e.g., ADME/T profile), iii) optionally modifying the peptide to improve its therapeutic profile; and iv) licensing, to a third party, the rights for further drug development of the peptide.
- the method of conducting a drug discovery business may furthermore comprise predicting the biological function of the one or more biologically active peptides using a method of the invention and, optionally, correlating the predicted function with a disease or pathological condition.
- identifying lead peptide compounds includes screening hits for traits indicative of a desirable ADME/T profile and selecting peptide compounds having a higher probability of exhibiting a pharmaceutically desirable ADME/T profile.
- the peptide motifs identified according to the invention may be prepared by methods known in the art (see, for example, U.S. Patent Application number 10/456,375), for example by peptide synthesis methods including suitable Nα-protection (and side-chain protection if reactive side chains are present). Protection of the α-amino group may utilize an acid-labile tertiary-butyloxycarbonyl ("Boc") group, a benzyloxycarbonyl ("Z") group or substituted analogs, or the base-labile 9-fluorenylmethyloxycarbonyl ("Fmoc") group.
- the Z group can also be removed by catalytic hydrogenation. Other suitable protecting groups include Nps, Bmv, Bpoc, Aloc, MSC, etc.
- a good overview of amino protecting groups is given in The Peptides: Analysis, Synthesis, Biology, Vol. 3, E. Gross and J. Meienhofer, eds. (Academic Press, New York, 1981). Protection of carboxyl groups can take place by ester formation, for example base-labile esters like methyl or ethyl esters, acid-labile esters like tertiary-butyl esters, or (substituted) benzyl esters, which can be removed hydrogenolytically. Side-chain functions like those of lysine and glutamic or aspartic acid can be protected using the aforementioned groups.
- Activation of the carboxyl group of the suitably protected amino acids or peptides can take place by the azide, mixed anhydride, active ester, or carbodiimide method, especially with the addition of catalytic and racemization-suppressing compounds like 1-hydroxybenzotriazole, N-hydroxysuccinimide, 3-hydroxy-4-oxo-3,4-dihydro-1,2,3-benzotriazine, or N-hydroxy-5-norbornene-2,3-dicarboximide.
- anhydrides of phosphorus-based acids can also be used. See, e.g., The Peptides: Analysis, Synthesis, Biology, supra, and Pure and Applied Chemistry, 59(3), 331-344 (1987).
- Removal of the protecting groups and, in the case of solid-phase peptide synthesis, cleavage from the solid support may be performed by means known in the art (see, e.g., volumes 3, 5 and 9 of the series The Peptides: Analysis, Synthesis, Biology, supra).
- Another possibility is the application of enzymes in the synthesis of such compounds; for reviews see, e.g., H. D. Jakubke in The Peptides: Analysis, Synthesis, Biology, Vol. 9, S. Udenfriend and J. Meienhofer, eds. (Academic Press, New York, 1987).
- peptides may also comprise modifications such as glycosylation, phosphorylation and other modifications known in the art.
- Peptides according to the invention may also be made by recombinant DNA methods. Such methods involve preparing the desired peptide by expressing a recombinant polynucleotide sequence that codes for one or more of the oligopeptides in a suitable host cell. Generally, the process involves introducing into a cloning and/or expression vehicle (e.g., a plasmid, phage DNA, or other DNA sequence able to replicate in a host cell) a DNA sequence coding for the particular oligopeptide or oligopeptides, introducing the cloning and/or expression vehicle into a suitable eukaryotic or prokaryotic host cell, and culturing the host cell thus transformed. When a eukaryotic host cell is used, the compound may include a glycoprotein portion.
- the information generated is stored (or compiled) in electronic form, using a computerized database that allows information to be efficiently catalogued and retrieved.
- databases are made up of records, typically one record per compound, each including information about the compound stored as descriptors (see J.M. Berger, "A Note on Error Detection Codes for Asymmetric Channels," Information and Control, Vol. 4, pp. 68-73, 1961; B. Bose and T.R.N. Rao, "Theory of Unidirectional Error Correcting/Detecting Codes," IEEE Transactions on Computers, Vol. C-31, No. 6, pp.
- the invention therefore also provides a computer program (software) capable of performing at least part of the steps comprised in the method for identifying a biologically active peptide and/or at least part of the steps comprised in the method of predicting the function of a biologically active peptide.
- the software can be referred to as "Peptide identification" or "Peptide-I" software.
- the computer program for example comprises a motif search algorithm to compare a peptide motif with at least one protein from a protein sequence database and collect, classify, analyze and/or arrange protein sequence data (see Example 1).
- the software may comprise a protein query algorithm to scan submitted protein sequences for motif patterns and filter the results based on user-defined selection criteria, and to provide as output a list of motifs that meet the selection criteria.
- Suitable selection criteria are one or more selected from a) the presence of one or more predicted cleavage sites flanking the motif, and preferably the absence of a predicted cleavage site within the motif; b) the presence of a motif in polypeptide fragment(s) which are predicted to bind to class I and/or class II MHC molecules; and c) the exposure of a motif at the outer surface of a naturally occurring polypeptide.
- a computer program according to the invention may comprise an algorithm to build a "virtual" protein interaction network using functional information from a protein sequence database.
- Example 2 describes a representative protein query algorithm.
- the motif search algorithm and the protein query algorithm are preferably integrated into a single computer program.
- the manner in which they are integrated can vary.
- FIGS. 4 and 5 show block diagrams illustrating exemplary computer programs suitable for use with the invention.
- the software may be written for Microsoft .NET platform in C#.
- Microsoft SQL server can be used as database.
- Descriptors may be generated and/or entered into the database manually, that is by a user entering data through a user interface (e.g., keyboard, touchpad, etc.), or may be generated and/or entered electronically, for example, when a robotic system is used to generate results that are then converted to a descriptor and transferred to the database (often referred to as uploading).
- Such information may be stored in a discrete area of the record, e.g., the descriptor may refer to a single property or may describe multiple properties.
- the information or descriptor, or a database of such information or descriptors may be stored permanently or temporarily on various forms of storage media, including, but not limited to, compact disks, floppy disks, magnetic tapes, optical tapes, hard drives, computer system memory units, and the like.
- the database may be stand-alone, or the records therein may be related to other databases (a relational database).
- databases include publicly available databases, such as GenBank for peptides and nucleic acids (and associated databases maintained by the National Center for Biotechnology Information or NCBI), and the databases available through www.chemfinder.com or The Dialog Corporation (Cary, N.C.) for chemical compounds.
- the database may comprise wild-type ("normal") and/or mutated protein sequences. For example, polypeptides that are encoded by genes comprising one or more single nucleotide polymorphisms (SNPs) may be screened for the presence of a certain peptide motif.
- a user will be able to search the database according to the information recorded (selecting records that have a particular value in a selected descriptor, for example, searching for all compounds that show the ability to upregulate NF2); accordingly, another aspect of the invention is a method of using a computer system to catalog and store information about various peptide motifs, their representation in naturally occurring polypeptides and their predicted biological functions.
- the ability to store, retrieve and search such information in computerized form allows those of ordinary skill in the art to select compounds for additional testing, including additional analysis of protein-protein interactions, physical characteristics of the compounds, toxicology testing in animal models, and/or clinical trials of pharmaceutical agents in humans.
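A descriptor search of this kind can be sketched with Python's built-in `sqlite3` module, standing in for the SQL Server database mentioned elsewhere. The table layout, the column names, the descriptor `NF2_effect` and the example rows are hypothetical and serve only to show the "particular value in a selected descriptor" query.

```python
import sqlite3


def build_catalog(records: list[tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Store one row per (motif, protein, descriptor, value) in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE descriptors (motif TEXT, protein TEXT, descriptor TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO descriptors VALUES (?, ?, ?, ?)", records)
    return conn


def find_motifs(conn: sqlite3.Connection, descriptor: str, value: str) -> list[str]:
    """Return the distinct motifs whose records have the given descriptor value."""
    rows = conn.execute(
        "SELECT DISTINCT motif FROM descriptors WHERE descriptor = ? AND value = ? "
        "ORDER BY motif",
        (descriptor, value),
    )
    return [motif for (motif,) in rows]
```

Parameterized queries (`?` placeholders) keep user-entered search values separate from the SQL text.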
- the database may be screened or searched using multiple individual protocols, the output thereof being searched to provide a report, either in electronic form or in the form of a printout, which will facilitate further analysis of selected compounds.
- a computer device comprising: processor means; memory means adapted for storing data relating to a plurality of protein sequences; means for inputting data relating to peptide motifs; and a computer program, stored in the computer memory, adapted to screen the protein sequences for data relating to peptide motifs and to output the screening results.
- the computer device for example comprises a computer program described above.
- One embodiment of the invention comprises a computing environment; an input device, connected to the computing environment, to receive information from the user; an output device, connected to the computing environment, to provide information to the user; and a plurality of algorithms selectively executed based on at least a portion of the received information, wherein any one of these algorithms analyzes at least a portion of the received information and generates output information, and preferably wherein the output information is communicated via the output device.
- the computing environment preferably further comprises a communications network; a server connected to the network; and a client connected to the network, wherein the client is part of a client-server architecture and typically is an application that runs on a personal computer or workstation and relies on a server to perform some operations (see Nath, 1995, The Guide To SQL Server, 2nd ed., Addison- Wesley Publishing Co.).
- the computing environment of the invention is advantageously implemented using any multipurpose computer system including those generally referred to as personal computers and mini-computers.
- a computer system will include means for processing input information, such as at least one central processor, for example an Intel® processor (including Pentium® III, Pentium® 4 or the like) or a Motorola processor (for example, a PowerPC G4 microprocessor); a storage device, such as a hard disk, for storing information related to polypeptides and/or compounds; and means for receiving input information.
- the processor which comprises and/or accesses memory units of the computer system, is programmed to perform analyses of information related to the polypeptides and/or compounds.
- This programming may be permanent, as in the case where the processor is a dedicated PROM (programmable read-only memory) or EEPROM (electrically erasable programmable read-only memory), or it may be transient, in which case the programming instructions are loaded from the storage device or from a floppy diskette or other transportable computer-readable media.
- the computing environment further preferably comprises a user interface such as a Unix/X-Window interface, a Microsoft Windows interface, or a Macintosh operating system interface.
- the computing environment further includes an optical disk for storing data, a printer for providing a hard copy of the data, and a monitor or video display unit to facilitate user input of information and to display both input and output information.
- the output information may be output from the processor within the computer system in print form using a printer; on a video display unit; or via a communications link or network to another processor or client application.
- Example 1: Algorithms implemented in Peptide-I software. Motif Search algorithm: the algorithm collects, classifies, analyzes and arranges protein sequence data in a particular output format based on a number of input criteria provided by the user.
- the algorithm utilizes publicly available protein sequence data stored for quick access in a local relational database, such as Microsoft SQL Server or other database.
- the primary source of protein sequence data used by the algorithm is available from UniProtKB/Swiss-Prot (http://us.expasy.org/sprot/) - a curated protein sequence database.
- the algorithm fetches protein sequences one by one from the local database and verifies the match of the protein information fields with the user-defined criteria. Such criteria could be: organism taxonomy, gene ontology, or any other description field related to the properties of the analyzed protein.
- the algorithm applies motif patterns defined by the user to identify matches in the sequence.
- Such motif patterns are denoted as regular expressions consisting of one or more amino acids, specified by their one-letter codes, followed by typical regular-expression characters, such as "*" to match zero or more occurrences of the amino acid, "+" to match one or more occurrences, "?" to match zero or one occurrence, "|" to match either of two alternatives, "[..]" to match any of the specified amino acids, and "X" to denote any amino acid, which simplifies the structure of the motif pattern.
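One way to turn this motif notation into Python regular expressions is to expand the wildcard X into a character class of the 20 standard amino acids and pass the other operators through unchanged. This is a sketch under that assumption; the helper name `motif_to_regex` is hypothetical.

```python
import re

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif pattern, expanding the wildcard X to any standard amino acid."""
    parts = []
    in_class = False
    for ch in motif.upper():
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == "X" and not in_class:
            parts.append(f"[{STANDARD_AA}]")
        elif ch == "X":
            parts.append(STANDARD_AA)  # inside [..], X simply adds all residues
        else:
            parts.append(ch)  # *, +, ?, |, [..] already mean the same in regex
    return re.compile("".join(parts))
```

Standard regex precedence applies, so "|" separates whole alternatives (`AT|FV` means "AT" or "FV"). A choice between single residues is best written as `[AT]` or `(A|T)`.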
- the algorithm utilizes regular expression matching, walks through all positions in the sequence, starting from the first position, and builds up a table of found matches together with the corresponding protein ids.
- the table is kept in memory for fast access in subsequent processing steps.
- the identified motif in the sequence is added to the table of found matches depending on the initial user choice for the number of occurrences of a motif in a sequence. For example, the user can specify that the motif should be stored if it occurs at least twice in the sequence.
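The position-by-position scan and the minimum-occurrence rule can be sketched as follows. Trying the compiled pattern at every start position with `Pattern.match(string, pos)` records overlapping occurrences, which a plain `re.finditer` would skip. The function names and the table layout `{motif: {protein id: [positions]}}` are assumptions made for illustration.

```python
import re


def find_starts(pattern: re.Pattern, sequence: str) -> list[int]:
    """Try the pattern at every position so overlapping occurrences are all found."""
    starts = []
    for pos in range(len(sequence)):
        m = pattern.match(sequence, pos)
        if m and m.end() > pos:  # ignore zero-length matches such as "A*"
            starts.append(pos)
    return starts


def scan_proteins(
    proteins: dict[str, str],
    motifs: dict[str, re.Pattern],
    min_occurrences: int = 1,
) -> dict[str, dict[str, list[int]]]:
    """Build {motif name: {protein id: [0-based start positions]}}.

    A protein is stored for a motif only if the motif occurs at least
    `min_occurrences` times in its sequence (the user-defined threshold).
    """
    table: dict[str, dict[str, list[int]]] = {}
    for name, pattern in motifs.items():
        for pid, seq in proteins.items():
            starts = find_starts(pattern, seq)
            if len(starts) >= min_occurrences:
                table.setdefault(name, {})[pid] = starts
    return table
```

Setting `min_occurrences=2` reproduces the example in which a motif is stored only if it occurs at least twice in a sequence.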
- the algorithm analyzes protein binding data for all motifs stored in the table.
- the algorithm attempts to build a network of proteins using the protein-protein binding data from the UniProtKB/Swiss-Prot database or a similar protein database containing information on protein bindings.
- the network is constructed depending on the user's choice of only the first level, or the first and second levels, of binding. With the first level of binding, a network is built up of proteins in which motifs have been found. For example, if motif M1 is found in protein P1, and motif M2 is found in protein P2 that binds with protein P1, then both motifs are selected and belong to the P1-P2 network.
- with the second level of binding, the motifs are also selected if the proteins they are found in do not bind directly with each other but do bind through other proteins. For example, if motif M1 is found in protein P1 that binds with protein P2, and motif M2 is found in protein P3 that binds with protein P2, then both motifs are selected and belong to the P1-P2-P3 network.
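The first- and second-level rules can be sketched with a plain adjacency map. In this Python sketch, the binding pairs, the protein identifiers and the grouping rule are assumptions made for illustration. The rule used is that two motif-carrying proteins are linked either directly or through one shared intermediate protein.

```python
from itertools import combinations


def motif_networks(
    motif_hits: dict[str, set[str]],
    bindings: list[tuple[str, str]],
    second_level: bool = False,
) -> dict[tuple[str, ...], set[str]]:
    """Return networks (tuples of protein ids) mapped to the motifs they contain.

    motif_hits: {protein id: set of motifs found in it}
    bindings: (protein a, protein b) binding pairs, treated as undirected
    """
    partners: dict[str, set[str]] = {}
    for a, b in bindings:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    networks: dict[tuple[str, ...], set[str]] = {}
    for p, q in combinations(sorted(motif_hits), 2):
        if q in partners.get(p, set()):
            # First level: the two motif-carrying proteins bind directly.
            networks[(p, q)] = motif_hits[p] | motif_hits[q]
        elif second_level:
            # Second level: they bind through a shared intermediate protein.
            for mid in sorted(partners.get(p, set()) & partners.get(q, set())):
                networks[(p, mid, q)] = motif_hits[p] | motif_hits[q]
    return networks
```

The two tests below mirror the P1-P2 and P1-P2-P3 examples given in the text.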
- the algorithm arranges the data to present the output in a convenient table format for easier analysis. Such a format depends on the user-defined search criteria, e.g., taxonomy, gene ontology, networks, etc., and contains input criteria vs. motif patterns, where each cell in the table contains the number of proteins in which the motif is found.
- the algorithm stores the results in a quickly searchable data structure, such as hash table, so the user is provided with fast access to any information on a protein or a group of proteins where a motif has been found.
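The criteria-versus-motif output table and its hash-table storage can be sketched with nested Python dictionaries, which are hash tables. The input layout and the category labels (for example a taxonomy or gene ontology term per protein) are hypothetical. Each cell holds the number of proteins in which the motif was found.

```python
from collections import defaultdict


def summary_table(
    match_table: dict[str, dict[str, list[int]]],
    protein_category: dict[str, str],
) -> dict[str, dict[str, int]]:
    """Count, per category and motif, the proteins in which the motif was found.

    match_table: {motif: {protein id: [positions]}}
    protein_category: {protein id: category label}
    """
    table: defaultdict = defaultdict(lambda: defaultdict(int))
    for motif, hits in match_table.items():
        for pid in hits:
            table[protein_category.get(pid, "unclassified")][motif] += 1
    # Convert to plain dicts for constant-time lookup of any (category, motif) cell.
    return {cat: dict(row) for cat, row in table.items()}
```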
- the protein query algorithm scans submitted protein sequences for motif patterns and filters the results according to specified criteria. The output is a list of motifs that meet the selection criteria and thus are likely to exist.
- the algorithm can process amino acid sequences directly or utilize publicly available protein data stored for quick access in a local relational database, such as Microsoft SQL Server or other database. The primary source of protein data used by the algorithm is available from UniProtKB/Swiss-Prot (http://us.expasy.org/sprot/) - a curated protein sequence database.
- the algorithm takes the amino acid sequences and applies the motif patterns to identify motifs in the sequences as described in 1.0004. For each motif, the positions at which it is found in the sequence are kept. The same motif can occur multiple times in a sequence.
- short peptides are generated from longer protein sequences by the use of either chemical or enzymatic cleavage.
- the algorithm calculates the positions where the sequence is cut by the selected enzymes and/or chemicals.
- the positions of all motifs in the list are then compared with the identified cut positions. A motif is considered valid only if it is not cut by any of the enzymes and/or chemicals.
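As an example of calculating cut positions, the Python sketch below uses the simplified trypsin rule: cleave after K or R unless the next residue is P. The rule, the helper names and the example sequence are assumptions. A real implementation would apply whichever enzymes and chemicals the user selects.

```python
def trypsin_cut_positions(sequence: str) -> list[int]:
    """Return cut positions as the index of the first residue after each cleavage."""
    return [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    ]


def motif_is_intact(start: int, length: int, cuts: list[int]) -> bool:
    """A motif at [start, start + length) is valid if no cut falls strictly inside it.

    A cut exactly at `start` or at `start + length` flanks the motif and is allowed.
    """
    return not any(start < cut < start + length for cut in cuts)
```

For `"AKATFVRPGKATFKV"` the sketch cuts after K1, K9 and K13 but not after R6, because R6 is followed by P. The motif ATFV at position 2 is then flanked by a cut but not cut itself.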
- the algorithm uses a prediction method of MHC binding sites for Class-I molecules.
- Statistical/mathematical expression-based methods, including quantitative matrix and neural network-based methods, are preferred.
- the method might be extended by allowing prediction of MHC binders for various alleles and proteasome cleavage site.
- the predicted MHC binders are then filtered based on prediction of proteasome cleavage sites in the antigenic sequence. For example, if a linear prediction model is used, the score of a peptide is calculated from the matrix data by summing up or multiplying the contributions of its individual amino acids.
- the peptides having scores more than a defined threshold score are assigned as binders.
- the algorithm uses a prediction method of MHC binding sites for Class-II molecules. It works in a similar fashion as the algorithm in the preceding paragraph, but uses different alleles and different threshold scores to assign peptides as binders.
- the positions of all motifs in the list are then compared with the positions of the identified binders. A motif is considered valid only if it falls entirely within the boundaries of a binder.
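The additive (summing) variant of the linear quantitative-matrix model, together with the containment check, might look like the Python sketch below. The window length equals the number of matrix positions (nine is common for class I binders). The toy five-position matrix and the threshold are hypothetical placeholders, not a published allele matrix.

```python
def score_peptide(peptide: str, matrix: list[dict[str, float]]) -> float:
    """Additive matrix score: sum the per-position contribution of each residue."""
    return sum(pos_scores.get(aa, 0.0) for aa, pos_scores in zip(peptide, matrix))


def predict_binders(
    sequence: str, matrix: list[dict[str, float]], threshold: float
) -> list[tuple[int, int]]:
    """Return (start, end) of every window whose score exceeds the threshold."""
    length = len(matrix)
    return [
        (i, i + length)
        for i in range(len(sequence) - length + 1)
        if score_peptide(sequence[i:i + length], matrix) > threshold
    ]


def motif_in_binder(start: int, length: int, binders: list[tuple[int, int]]) -> bool:
    """The motif is valid only if it lies entirely within the boundaries of a binder."""
    return any(b_start <= start and start + length <= b_end for b_start, b_end in binders)
```

A class II variant would reuse the same functions with a different matrix and threshold, as the text describes.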
- the algorithm considers the distribution of polar and apolar residues along a protein sequence using a hydrophobicity plot.
- the algorithm might incorporate any of the commonly used hydrophobicity scales, for example Kyte-Doolittle, or Hopp-Woods scale.
- a window of defined size is moved along the sequence; the mean hydrophobic index of the amino acids within the window is calculated, and that value is assigned to the midpoint of the window.
- the entire list of motifs is then screened and compared to the hydrophobicity threshold.
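The window-midpoint calculation and the threshold screen can be sketched in Python as follows. This sketch makes several assumptions:

- It uses the Kyte-Doolittle scale, where, unlike Hopp-Woods, positive values mean hydrophobic.
- It uses an odd window so that the midpoint falls on a residue.
- The cut-off of 0.0 is hypothetical.
- A motif passes if the mean windowed value over its residues lies below the cut-off, that is, if it is predicted to be hydrophilic and therefore exposed.

This screening rule is one plausible reading of the text, not a documented one.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def midpoint_profile(sequence: str, window: int = 9) -> dict[int, float]:
    """Map each window midpoint (0-based) to the mean Kyte-Doolittle index of that window."""
    if window % 2 == 0:
        raise ValueError("use an odd window so the midpoint is a residue")
    half = window // 2
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return {
        i + half: sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    }


def motif_passes(
    start: int, length: int, profile: dict[int, float], threshold: float = 0.0
) -> bool:
    """Keep a motif whose residues have a mean windowed index below the threshold.

    Residues near the termini have no midpoint value and are skipped; a motif
    with no scored residue at all is rejected.
    """
    scores = [profile[p] for p in range(start, start + length) if p in profile]
    return bool(scores) and sum(scores) / len(scores) < threshold
```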
- the algorithm analyzes protein binding data, for example for all motifs that passed the previous filtering steps.
- the algorithm attempts to build a network of proteins using the protein-protein binding data from the UniProtKB/Swiss-Prot database or a similar protein database containing information on protein bindings.
- the network is constructed depending on the user's choice of only the first level, or the first and second levels, of binding. With the first level of binding, a network is built up of proteins in which motifs have been found. For example, if motif M1 is found in protein P1, and motif M2 is found in protein P2 that binds with protein P1, then both motifs are selected and belong to the P1-P2 network.
- with the second level of binding, the motifs are also selected if the proteins they are found in do not bind directly with each other but do bind through other proteins. For example, if motif M1 is found in protein P1 that binds with protein P2, and motif M2 is found in protein P3 that binds with protein P2, then both motifs are selected and belong to the P1-P2-P3 network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP09731158A EP2269154A1 (en) | 2008-04-09 | 2009-04-09 | Methods for identifying biologically active peptides and predicting their function |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08154264A EP2109054A1 (en) | 2008-04-09 | 2008-04-09 | Methods for identifying biologically active peptides and predicting their function |
US12363209A | 2009-04-09 | 2009-04-09 | |
PCT/NL2009/050189 WO2009126037A1 (en) | 2008-04-09 | 2009-04-09 | Methods for identifying biologically active peptides and predicting their function |
EP09731158A EP2269154A1 (en) | 2008-04-09 | 2009-04-09 | Methods for identifying biologically active peptides and predicting their function |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2269154A1 (en) | 2011-01-05 |
Family ID: 43243719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP09731158A Withdrawn EP2269154A1 (en) | 2008-04-09 | 2009-04-09 | Methods for identifying biologically active peptides and predicting their function |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP2269154A1 (en) |
2009
- 2009-04-09: EP application EP09731158A (EP2269154A1), not active (withdrawn)
Non-Patent Citations (1)
Title |
---|
See references of WO2009126037A1 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20110113053A1 (en) | Methods for identifying biologically active peptides and predicting their function | |
Cohen et al. | Origins of structural diversity within sequentially identical hexapeptides | |
De Bakker et al. | Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all‐atom statistical potential and the AMBER force field with the Generalized Born solvation model | |
Wang et al. | Peptide binding predictions for HLA DR, DP and DQ molecules | |
Lundegaard et al. | Modeling the adaptive immune system: predictions and simulations | |
Tripathi et al. | High throughput virtual screening (HTVS) of peptide library: Technological advancement in ligand discovery | |
Koehl et al. | Polar and nonpolar atomic environments in the protein core: implications for folding and binding | |
US20030158672A1 (en) | Use of computationally derived protein structures of genetic polymorphisms in pharmacogenomics for drug design and clinical applications | |
JP4633930B2 (en) | Protein engineering | |
CN110706742B (en) | Pan-cancer tumor neoantigen high-throughput prediction method and application thereof | |
BRPI0808855A2 (en) | METHOD FOR PRODUCING PEPTIDE LIBRARIES AND USING THEREOF | |
Florea et al. | Epitope prediction algorithms for peptide-based vaccine design | |
KR20190065263A (en) | Evaluation method, evaluation apparatus, evaluation program, evaluation system, and terminal apparatus for colorectal cancer | |
Jusot et al. | Exhaustive exploration of the conformational landscape of small cyclic peptides using a robotics approach | |
Salehi et al. | Dynamics and infrared spectrocopy of monomeric and dimeric wild type and mutant insulin | |
Vincenzi et al. | Virtual screening of peptide libraries: The search for peptide-based therapeutics using computational tools | |
Durojaye et al. | Identification of a potential mRNA‐based vaccine candidate against the SARS‐CoV‐2 spike glycoprotein: A reverse vaccinology approach | |
US20030078374A1 (en) | Complementary peptide ligands generated from the human genome | |
Papasaikas et al. | A novel method for GPCR recognition and family classification from sequence alone using signatures derived from profile hidden Markov models | |
US20210391031A1 (en) | Method and system of targeting epitopes for neoantigen-based immunotherapy | |
HUP0400698A2 (en) | Modified interleukin-1 receptor antagonist (il-1ra) with reduced immunogenicity | |
Zhu et al. | Comparison of protein expression lists from mass spectrometry of human blood fluids using exact peptide sequences versus BLAST | |
EP2269154A1 (en) | Methods for identifying biologically active peptides and predicting their function | |
US20060141480A1 (en) | Use of computationally derived protein structures of genetic polymorphisms in pharmacogenomics and clinical applications | |
Evensen et al. | Ligand design by a combinatorial approach based on modeling and experiment: application to HLA-DR4 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Original code: 0009012 |
17P | Request for examination filed | Effective date: 2010-11-01 |
AK | Designated contracting states | Kind code of ref document: A1; designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK TR |
AX | Request for extension of the European patent | Extension state: AL BA RS |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: BIOTEMPT B.V. |
17Q | First examination report despatched | Effective date: 2014-05-19 |
STAA | Information on the status of an EP patent application or granted EP patent | Status: the application is deemed to be withdrawn |
18D | Application deemed to be withdrawn | Effective date: 2014-12-02 |