CN117292741A

CN117292741A - Method for developing lipidated analogues that bind cell membranes and/or serum albumin using computer-aided design, and application thereof

Info

Publication number: CN117292741A
Application number: CN202310095754.9A
Authority: CN
Inventors: 林世贤; 丁文龙; 刘超; 陈宇霖
Original assignee: Shaoxing Research Institute Of Zhejiang University
Current assignee: Shaoxing Research Institute Of Zhejiang University
Priority date: 2023-01-18
Filing date: 2023-01-18
Publication date: 2023-12-26

Abstract

The invention discloses a method for developing lipidated analogues that bind cell membranes and/or serum albumin by means of computer-aided design, and applications thereof.

Description

Method for developing lipidated analogues that bind cell membranes and/or serum albumin using computer-aided design, and application thereof

Technical Field

The invention relates to the technical field of genetic engineering, and in particular to a method for developing lipidated analogues that bind cell membranes and/or serum albumin by means of computer-aided design, and to applications thereof.

Background

Protein lipidation covalently attaches strongly hydrophobic lipid molecules to specific sites on a protein, remodelling protein-membrane and protein-protein interactions and thereby strongly influencing the structure, localization and trafficking of the protein. Protein lipidation includes prenylation of cysteine, palmitoylation of cysteine, myristoylation of N-terminal glycine, and fatty acylation of serine and lysine. Organisms have evolved hundreds of enzymes and regulatory factors that ensure the correct addition and removal of these modifications, which play central roles in membrane-associated biological processes including cell signalling, apoptosis, secretion and cellular immunity. However, lipidation is highly variable and reversible and engages in cross-talk with other post-translational modifications, which makes its biological and physiological functions difficult to resolve. Inactivating mutation of the modification site is the method commonly used to study lipidation and has demonstrated its importance, but this method cannot readily resolve the function of a specific lipid modification or of the dynamic interconversion between lipid modifications.

With the introduction of chemical-biology methods and ideas, site-specific gain-of-function mutation has emerged as a new strategy for analysing the function and importance of lipidation. In this strategy, lipidated proteins carrying the gain-of-function mutation are prepared by one of the following three methods, so that the function of lipidation can be elucidated at the biochemical and biophysical level: 1) Direct chemical modification, in which lipid molecules bearing a reactive group such as maleimide are coupled specifically to cysteine residues of the protein; 2) Semi-synthesis of lipidated proteins, in which a lipidated peptide segment is prepared by solid-phase peptide synthesis and joined to the target protein by native chemical ligation or similar methods; 3) Genetic code expansion coupled with bioorthogonal reaction, in which a bioorthogonal reactive group is introduced at a specific site by genetic code expansion and the lipid group is then attached to the protein through a bioorthogonal reaction.

Lipidation is also a clinically approved modification of drugs and is used in several of the best-selling clinical drugs, including liraglutide, semaglutide, insulin degludec and tirzepatide. Lipidation of peptide or protein drugs can extend their half-life, reduce their immunogenicity and improve their absorption. The half-life is extended mainly because the lipid binds serum albumin, which protects the drug from protease degradation and reduces its renal clearance through the albumin reabsorption mechanism. Lipidated peptide and protein drugs are produced by the same three methods used to prepare gain-of-function lipidated proteins for research. However, site specificity and product homogeneity are bottlenecks in the production of lipidated protein drugs. Method 1 requires multiple mutations so that a reactive group remains only at the target site, which may affect the function of the protein. Moreover, because lipid molecules are highly hydrophobic, both method 1 and method 2 require organic solvents, which makes the modified protein difficult to obtain. Although method 3 overcomes these problems to some extent, homogeneous lipidated proteins remain difficult to obtain because of the limited efficiency of the bioorthogonal reaction, and the additional groups introduced by that reaction may also affect the function and efficacy of the protein.

To solve the above problems, lipidated analogues are urgently needed. Such analogues must meet the following requirements: 1) they are strongly hydrophobic and can bind cell membranes; 2) they can bind serum albumin, so that after introduction into a protein drug they extend its half-life and improve its pharmacokinetics; 3) they can be introduced site-specifically into biological macromolecules by genetic encoding or in vitro coupling. However, there has been no report on how to design such molecules, how to evaluate them, or how to design aminoacyl-tRNA synthetase libraries that recognize them so that they can be introduced by genetic encoding. In view of these problems, a solution is proposed below.

Disclosure of Invention

In view of the problems in the prior art, the invention aims to develop a computer-aided screening method for designing and evaluating lipidated analogues that mimic protein lipidation, and to introduce the genetically encoded lipidated analogues obtained by virtual screening site-specifically into biological macromolecules by genetic code expansion. The system is further used to extend the half-life of biological macromolecule drugs, to study the function of protein lipidation through site-specific gain-of-function mutation, and to improve the cellular delivery efficiency of biological macromolecules.

The technical aim of the invention is realized by the following technical scheme:

I: Establishment of a method for the computer-aided design of lipidated analogues

A computer-aided virtual screening method was developed to design and screen lipidated analogues capable of mimicking the lipidation modification of proteins, which method comprises three aspects:

1) Evaluating the hydrophobicity of the designed lipidated analog;

2) Evaluating the affinity of the designed lipidated analog for serum albumin;

3) Evaluating the likelihood that the designed lipidated analog is recognized by an orthogonal aminoacyl-tRNA synthetase.

The hydrophobicity of a lipidated analogue is evaluated by its predicted cLogP: the higher the cLogP, the more hydrophobic the analogue and the stronger its expected membrane binding. The affinity of a lipidated analogue for serum albumin is measured by the Gibbs free energy of binding (ΔG): the smaller (more negative) ΔG, the higher the affinity. Each designed lipidated analogue is docked to the 7 main fatty-acid binding sites of serum albumin using AutoDock Vina, and a suitable docked conformation is analysed to obtain its Gibbs free energy. The likelihood that a lipidated analogue is recognized by an orthogonal aminoacyl-tRNA synthetase is measured by the root-mean-square deviation (RMSD) between the analogue and a reference amino acid: the smaller the RMSD, the higher the likelihood of recognition. A threshold is set by calculating the RMSD between lipidated analogues that are already known to be recognized and the reference amino acid, which allows the likelihood of recognition to be assessed further. The RMSD is calculated as follows:

1) Docking the designed lipidated analogue into the substrate binding pocket of the corresponding orthogonal aminoacyl-tRNA synthetase to obtain the energy-optimal conformation (pose);

2) Extracting the conformation (pose) of the reference amino acid from the structure of the reference amino acid in complex with the aminoacyl-tRNA synthetase;

3) Calculating the RMSD between the designed unnatural amino acid and the reference amino acid using the LigRMSD web server.
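The RMSD comparison in step 3 can be illustrated with a minimal sketch. It is not the LigRMSD implementation: it assumes that the atoms shared by the two poses have already been matched one-to-one (LigRMSD performs that matching itself) and that both poses sit in the same binding-pocket frame. The function name `pose_rmsd` and the coordinates are hypothetical placeholders.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    Assumption: atom i in coords_a corresponds to atom i in coords_b, and both
    poses share one reference frame, so no superposition is applied.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of matched atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical matched coordinates; real input would come from the docked pose
# of the analogue and the pose of the reference amino acid in the synthetase.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

rmsd = pose_rmsd(reference_pose, docked_pose)
print(f"RMSD = {rmsd:.2f}")
```

A candidate whose RMSD falls below the threshold derived from already recognized analogues would be kept for further evaluation.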

Hundreds of lipidated analogues were designed in the present invention, including phenylalanine analogues, tryptophan analogues, lysine analogues and unnatural amino acids with aliphatic side chains; their structures are shown in Table 1. The designed amino acids were evaluated with the virtual screening method. The analysis showed that the hydrophobicity of the lipidated analogues and their ability to bind human serum albumin (HSA) increase as the carbon chain is extended and decrease when hydrophilic atoms are introduced. Most of the designed lipidated analogues were more hydrophobic and had higher affinity for HSA than the previously reported HepoK, indicating that lipidated analogues mimicking protein lipidation can be obtained by this virtual screening method. Further analysis showed that amino acids containing aromatic rings have stronger HSA affinity, in particular lipidated analogues with a benzene ring and a linear aliphatic side chain and lipidated analogues containing two aromatic rings.

Specifically, the chimeric phenylalanyl-tRNA synthetases of the previous invention recognize a range of phenylalanine derivatives, and all of these recognized amino acids have an RMSD below 2.2. Lipidated analogues with RMSD < 2.2, -ΔG > 8.2 kcal/mol (Kd < 1 μM) and cLogP > 4.0 were therefore analysed further. This further screening indicated that para-substituted phenylalanine derivatives carrying linear aliphatic side chains of 6, 7, 8 or 9 carbons are ideal lipidated analogues. Whether the linear aliphatic side chain is joined to the benzene ring by a single, double or triple bond, the RMSD increases sharply once the chain exceeds 9 carbons and the probability of recognition by the aminoacyl-tRNA synthetase becomes very small, indicating that the limit of recognition lies at a linear aliphatic chain of 9 to 10 carbons. Further analysis found that unnatural amino acids bearing two benzene rings have markedly higher affinity for serum albumin.
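The three cut-offs above, and the correspondence between -ΔG > 8.2 and Kd < 1 μM, can be sketched as follows. This is an illustration under stated assumptions: ΔG is taken to be in kcal/mol, the temperature is assumed to be 298.15 K, the standard relation ΔG = RT ln(Kd) is used, and the candidate names and scores are invented placeholders rather than values from Table 1.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant in mol/L from a binding free energy, dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


@dataclass
class Candidate:
    name: str
    rmsd: float      # RMSD against the reference amino acid
    delta_g: float   # docking free energy, kcal/mol (negative = favourable)
    clogp: float     # predicted cLogP


def passes_screen(c, rmsd_max=2.2, neg_delta_g_min=8.2, clogp_min=4.0):
    """Apply the three thresholds stated in the text."""
    return c.rmsd < rmsd_max and -c.delta_g > neg_delta_g_min and c.clogp > clogp_min


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("analogue_A", rmsd=1.8, delta_g=-8.9, clogp=5.1),
    Candidate("analogue_B", rmsd=3.0, delta_g=-9.4, clogp=6.0),  # fails the RMSD cut-off
    Candidate("analogue_C", rmsd=1.5, delta_g=-7.0, clogp=4.5),  # fails the dG cut-off
]

selected = [c.name for c in candidates if passes_screen(c)]
print(selected)
print(f"Kd at dG = -8.2 kcal/mol: {kd_from_delta_g(-8.2) * 1e6:.2f} uM")
```

Under the assumed temperature, a ΔG of -8.2 kcal/mol corresponds to a Kd just under 1 μM, which is consistent with the pairing of the two thresholds in the text.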

Preferably, the present invention further takes phenylalanine derivatives with triple-bond-linked linear aliphatic side chains as examples and compares their properties with those of the lipids used in lipidation and of other analogues reported in the literature. These phenylalanine derivatives include 4HexyF, 4HepyF, 4OctyF, 4NonyF and 4DecyF. The cLogP of these 5 genetically encoded lipidated analogues increases with the length of the carbon chain and is well above that of the reported HepoK. They bind serum albumin more strongly than fatty acids such as myristic acid and palmitic acid. Analysis of the molecular docking results indicated that the high serum albumin affinity of these 5 genetically encoded lipidated analogues is conferred by π-π interactions mediated by the benzene ring.

II: Synthesis and validation of lipidated analogues

To obtain ideal lipidated analogues and to verify the reliability of the computer-aided virtual screening, unnatural amino acids with the following structural formulae were synthesized:

Preferably, 4HexyF and 4OctyF were selected and their affinity for serum albumin was measured, with HepoK as a control. The Kd values of 4HexyF and 4OctyF for serum albumin, measured by surface plasmon resonance (SPR), were 103 μM and 23.9 μM, respectively, corresponding to affinities 15.3 and 67 times that of HepoK (Kd 1.6 mM). This provides initial evidence that lipidated analogues containing an aromatic ring and an aliphatic side chain can interact with serum albumin through hydrophobic and π-π interactions and are ideal lipidated analogues.

III: Screening for chimeric phenylalanyl-tRNA synthetase mutants that recognize lipidated analogues

Lipidated analogue molecules such as 4HexyF and 4OctyF were docked into the substrate binding pocket of the chimeric phenylalanyl-tRNA synthetase, and the surrounding amino acid residues were selected to construct a saturation mutagenesis library (Q356NNK, L360NNK, E391GAN, V393NNK, M490NNK, L494NNK, T467G and A507G). After positive and negative selection, a positive mutant was obtained only in the 4HexyF group. It was designated LipRS-1 (Q356G, E391D, T467G, M490G and A507G), and its nucleotide sequence is given in SEQ ID No. 1. Further experiments found that this mutant also recognizes 4HepyF, 4OctyF, 4NonyF, 4HexeF, 4OcteF, 4FbutF, 4FproF and 4FpenF but fails to recognize 4DecyF, consistent with the results of the earlier virtual screening. LC-MS analysis showed that these genetically encoded lipidated analogues were incorporated with high fidelity, with the exception of 4NonyF, whose fidelity was only 50%.
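The theoretical size of such a library follows from the degenerate codons used. The sketch below enumerates them with the standard IUPAC codes (N = A/C/G/T, K = G/T) and the standard genetic code; it assumes that the six randomized positions vary independently and that T467G and A507G are fixed substitutions. The helper names `expand` and `encoded_residues` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes needed for NNK and GAN.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """List every concrete codon covered by a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon):
    """Set of one-letter residues (and '*' for stop) a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


# Randomized positions from the text; T467G and A507G are treated as fixed.
randomized_sites = {
    "Q356": "NNK", "L360": "NNK", "E391": "GAN",
    "V393": "NNK", "M490": "NNK", "L494": "NNK",
}

dna_diversity = 1
protein_diversity = 1
for site, codon in randomized_sites.items():
    dna_diversity *= len(expand(codon))
    protein_diversity *= len(encoded_residues(codon) - {"*"})

print(f"DNA-level variants: {dna_diversity:,}")
print(f"Protein-level variants (no stop codons): {protein_diversity:,}")
```

Each NNK position covers 32 codons encoding all 20 amino acids plus the amber stop codon, and GAN covers only Asp and Glu, so the restriction at E391 keeps the library from growing by a further factor of about ten at the protein level.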

To further increase the efficiency with which the genetically encoded lipidated analogues are recognized, the 4HexyF molecule was docked into LipRS-1. The results show that the Q356G and M490G mutations provide enough space to accommodate these analogues but interact only weakly with the aliphatic side chain, so the amino acids at these two positions require further rational mutation to improve recognition efficiency. Experiments show that mutating position 490 to alanine (490A) significantly improves the efficiency of recognizing 4HexyF and 4OctyF; this mutant is named LipRS-2, and its nucleotide sequence is given in SEQ ID No. 2. Mutating position 490 to an amino acid with a side chain larger than that of alanine abolished recognition of the genetically encoded lipidated analogues. The fidelity of 4NonyF recognition was then improved by rational design, finally yielding a mutant (L225V, Q356G, E391D, F464I, T467G, M490G and A507G) that recognizes 4NonyF with high fidelity; it is named LipRS-3, and its nucleotide sequence is given in SEQ ID No. 3.

The genetically encoded lipidated analogues of the invention have high cLogP values and therefore low solubility: 224 μM for 4HexyF and 18.6 μM for 4DecyF. Taking 4OctyF as an example, the effect of the working concentration on recognition activity was investigated. The results showed that LipRS-2 still recognizes 4OctyF and incorporates it into proteins efficiently at working concentrations as low as 4 μM.

IV: Engineering of precisely lipidated therapeutic drug candidates

Lipidation has been applied successfully to several FDA-approved polypeptides, markedly extending their half-life, reducing the cost of medication and easing the burden on patients. However, lipidation strategies have progressed slowly for protein macromolecular drugs, for the following reasons:

1) Site-specific lipidation tools are lacking: the target protein may carry several reactive groups, so direct chemical modification can hardly achieve specific coupling;

2) Because lipids are highly hydrophobic, direct chemical modification must be carried out in organic solvents, and these conditions can disrupt the structure and function of the protein.

The genetically encoded lipidated analogues can be introduced into a target protein efficiently and site-specifically by the chimeric phenylalanine translation system, which provides a new platform for novel lipidated protein drugs.

Lipidation extends the half-life of a drug mainly because it gives the drug the ability to bind serum albumin; the stronger the binding, the longer the half-life. The invention therefore first evaluates the feasibility of creating precisely lipidated therapeutic drug candidates by measuring the serum albumin affinity of biomacromolecules that incorporate the lipidated analogues.

Specifically, 4HexyF and 4OctyF were first introduced at the K20 site of GLP-1, and the affinity for human serum albumin (HSA) was determined by microscale thermophoresis (MST). The Kd values of GLP1-20-4HexyF and GLP1-20-4OctyF for HSA were 2.31 μM and 0.58 μM, respectively, corresponding to affinities 6.5 and 25.9 times that of GLP1-20-HepoK. Wild-type GLP-1 does not bind HSA, which further demonstrates that the genetically encoded lipidated analogues of the invention confer strong HSA binding on the protein of interest.

Further, a general approach was developed for introducing lipidated analogues to extend the half-life of a target protein. Specifically, the lipidated analogue is introduced at the N-terminus of the protein, which minimizes the impact on the function of the protein of interest. GFP and Neo-2/15 were chosen as model proteins; Neo-2/15 is a biased IL-2 analogue, reported in 2019, with anti-tumour activity. The affinity of the GFP and Neo-2/15 mutants for HSA was measured by isothermal titration calorimetry (ITC), giving Kd values of 260 nM, 160 nM, 474 nM and 370 nM for GFP-4HexyF, GFP-4OctyF, Neo-2/15-4HexyF and Neo-2/15-4OctyF, respectively. Further analysis showed that one HSA molecule binds 6-7 genetically encoded lipidated proteins. Next, the HSA binding of proteins incorporating 4OcteF, 4FbutF, 4FproF and 4FpenF was determined, and the Kd values of these mutants were all in the nanomolar range.

The precisely lipidated Neo-2/15 mutant Neo-2/15-4OctyF was further applied to the treatment of a mouse colon cancer model. Compared with wild-type Neo-2/15, tumour volume in the Neo-2/15-4OctyF treated group was 32% smaller after 15 days, and the median survival of the mice increased by 2.2 days. These results further demonstrate that site-specific introduction of genetically encoded lipidated analogues can serve as a new strategy and platform for producing novel drug candidates, particularly protein drugs that are difficult to handle by traditional chemical modification. The strategy of this patent is therefore not limited to GFP and Neo-2/15 but extends to all other macromolecular drugs.

Fifth step: lipidation modification of genetically encoded lipidation analog mimetic proteins

Protein lipidation includes myristoylation of N-terminal glycine, palmitoylation of cysteine and prenylation, all of which play extremely important roles in the life of the cell. However, no genetic encoding tool applicable to living cells has been available, which has limited the analysis of lipidation function. Lipids are extremely hydrophobic molecules, and the functions of lipidation are mostly related to membranes. The earlier virtual screening results indicate that the cLogP of 4OctyF developed in the present invention is close to that of myristic acid, so 4OctyF may be able to mimic the membrane binding conferred by lipidation.

Specifically, the chimeric phenylalanine genetic code expansion system that recognizes the lipidated analogues was first introduced into mammalian cells, and flow cytometry showed that the genetically encoded lipidated analogues are incorporated efficiently in mammalian cells. Next, four myristoylated proteins, LCK, xrp1, SVIN and Gαi1, were selected, 4OctyF was introduced at their myristoylation site (G2), and the localization of the mutant proteins was observed in living cells. Live-cell imaging showed that the four proteins carrying 4OctyF were distributed mostly on the plasma membrane, similar to the wild-type proteins, whereas the control group mutated to phenylalanine was not enriched on the plasma membrane. Then two palmitoylated proteins, R7BP and STERX, were selected and 4OctyF was introduced at their palmitoylation sites; imaging showed that their cellular distribution was similar to that of the wild-type proteins and that they were likewise located on the plasma membrane. Finally, the prenylated protein KRAS4B was selected and 4OctyF was introduced at its prenylation site; the mutant protein was distributed on the plasma membrane.

The above results indicate that the genetically encoded lipidated analogue 4OctyF of the present invention confers membrane-binding capacity on proteins and can mimic myristoylation, palmitoylation and prenylation, so that protein lipidation can be studied by gain-of-function mutation. This strategy for mimicking lipidation is not limited to LCK, xrp1, SVIN, Gαi1, R7BP, STERX and KRAS4B and can be applied to functional studies of all lipidated proteins.

Sixth,: ability to modulate protein membrane binding by length-tunable genetically encoded lipidated analogs

Protein lipidation is heterogeneous: the modification site can be linked to unsaturated fatty acids of different lengths, and the function of this heterogeneity remains to be elucidated. The invention provides lipidated analogues of different lengths that can be used to study the function of attaching lipids of different lengths to a protein. Specifically, three genetically encoded lipidated analogues, 4HexyF, 4HepyF and 4OctyF, with chain lengths of 13, 14 and 15 carbons, were introduced separately at the C185 site of Kras4B, with HepoK and the C185F mutant as controls. The Kras4B-4OctyF protein was markedly enriched at the plasma membrane, whereas Kras4B-4HexyF was distributed mainly in the cytoplasm and nucleus, similar to the HepoK control. Quantitative analysis of the ratio of plasma-membrane to cytoplasmic fluorescence intensity (P/C) gave a P/C of 12.4 for Kras4B-4OctyF, 5-fold and 10-fold that of Kras4B-4HepyF and Kras4B-4HexyF, respectively. To further demonstrate the tunability of membrane binding, a GC dipeptide motif was selected: introducing a lipid modification at the G position induces palmitoylation of the motif at the Golgi apparatus, which transfers the corresponding protein to the plasma membrane. With 4HexyF, essentially all of the protein was enriched at the Golgi apparatus, whereas with 4HepyF and 4OctyF the protein was enriched on the plasma membrane as well as at the Golgi apparatus, with P/C values of 1.3 and 3.3, respectively.
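A P/C value of this kind can be computed from segmented image intensities. The sketch below is one plausible way to do it, not the procedure used in the examples: it assumes that plasma-membrane and cytoplasm pixels have already been selected, that a constant background is subtracted, and that the mean intensity of each region is compared. The function name `pc_ratio` and all pixel values are hypothetical.

```python
from statistics import mean


def pc_ratio(membrane_pixels, cytoplasm_pixels, background=0.0):
    """Ratio of mean plasma-membrane to mean cytoplasmic fluorescence intensity.

    Assumption: both pixel lists come from regions of interest drawn on the same
    image, and `background` is a constant offset measured outside the cell.
    """
    membrane = mean(membrane_pixels) - background
    cytoplasm = mean(cytoplasm_pixels) - background
    if cytoplasm <= 0:
        raise ValueError("cytoplasmic signal must exceed the background")
    return membrane / cytoplasm


# Hypothetical intensities in arbitrary units.
membrane_roi = [620, 580, 600]
cytoplasm_roi = [210, 190, 200]

ratio = pc_ratio(membrane_roi, cytoplasm_roi, background=100)
print(f"P/C = {ratio:.1f}")
```

A ratio well above 1 indicates plasma-membrane enrichment, while a ratio near 1 indicates that the protein is distributed similarly in both compartments.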

The above results indicate that membrane binding strength changes as the carbon chain grows. The membrane affinity conferred by the genetically encoded lipidated analogues 4HexyF and 4OctyF was therefore determined by liposome co-sedimentation. The affinity of PolyK-4OctyF-EGFP for the membrane was 105 μM, 18 times that of PolyK-4HexyF-EGFP. Thermodynamic analysis shows that the two-carbon difference in chain length corresponds to a difference in binding Gibbs free energy of 1.6 kcal/mol. These results suggest that there is a cLogP threshold above which a compound tends to bind the cell membrane. The cLogP values of 4HexyF and 4OctyF are 4.4 and 5.3, respectively, and the literature reports that lauric acid, with a cLogP of 5.07, binds cell membranes only weakly; the invention therefore infers that this threshold lies between 5.07 and 5.3.
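The link between a fold difference in affinity and a difference in binding free energy can be illustrated with the standard relation ΔΔG = RT ln(ratio). The sketch assumes a temperature of 298.15 K, which is not stated in the text, and the function name `delta_delta_g` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_delta_g(fold_change, temperature_k=298.15):
    """Free-energy difference (kcal/mol) matching a fold change in Kd."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)


# An 18-fold affinity difference, as reported for the two PolyK constructs.
ddg = delta_delta_g(18)
print(f"ddG = {ddg:.2f} kcal/mol")
```

At the assumed temperature an 18-fold ratio corresponds to roughly 1.7 kcal/mol, close to the 1.6 kcal/mol stated above; the exact figure depends on the experimental temperature and on the unrounded affinities.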

Seventh,: genetically encoded lipidated analogs facilitate delivery of protein cells

Cell-penetrating peptides (CPPs) are a class of short peptides that help macromolecules such as proteins enter cells, and they generally consist of positively charged and hydrophobic amino acids. Cargo delivered by cell-penetrating peptides enters cells mainly by endocytosis or direct transduction, and biomacromolecules such as proteins typically enter mainly by endocytosis. However, cell-penetrating peptides require high protein concentrations for cellular delivery, and the cargo can become trapped in endosomes, so delivery to the cytoplasm or nucleus is inefficient. The literature reports that lipidation increases the share of direct transduction during cell-penetrating peptide delivery and that interaction of the lipid with the endosomal membrane increases the probability that the cargo escapes from the endosome. The genetically encoded lipidated analogues developed by the invention increase the hydrophobicity of the protein and strengthen its interaction with membranes, and therefore have great potential to improve the cellular delivery efficiency of proteins.

Specifically, PolyK-4OctyF-EGFP, PolyK-4HexyF-EGFP and PolyK-EGFP were each added to HeLa cells. After incubation at 37 °C for 2 hours, protein delivery was observed by confocal fluorescence microscopy and the delivery efficiency was analysed by flow cytometry. The fluorescence of the PolyK-4HexyF-EGFP and PolyK-EGFP groups was located mostly in endosomes, whereas the fluorescence of the PolyK-4OctyF-EGFP group was enriched at the plasma membrane, which further demonstrates that PolyK-4OctyF has a strong affinity for the plasma membrane. Flow cytometry showed that the cellular delivery efficiencies of PolyK-4OctyF-EGFP and PolyK-4HexyF-EGFP were 3.1 and 2.3 times that of PolyK-EGFP, respectively. These results indicate that the genetically encoded lipidated analogues developed in the present invention enhance the intracellular delivery efficiency of proteins, and that introducing them is a general strategy applicable to a variety of proteins.

The beneficial effects of the invention are as follows:

(1) The invention develops a virtual screening strategy that rapidly evaluates the hydrophobicity of a designed lipidated analogue, its ability to bind serum albumin, and the probability that it is recognized by an orthogonal aminoacyl-tRNA synthetase. This strategy greatly improves the practicality of designing lipidated analogues and reduces the cost of the work. Through this virtual screening, the invention further identifies a class of lipidated analogues that combine two functional groups, a linear aliphatic side chain and an aromatic ring, and that closely mimic the binding of lipids to serum albumin and membranes. More importantly, the virtual screening strategy developed by the invention can be extended to the design and identification of unnatural amino acids with other properties, such as fluorescent unnatural amino acids, receptor-binding unnatural amino acids and unnatural amino acids with proximity-induced reactivity.

(2) The invention uses a broad-spectrum orthogonal chimeric phenylalanine genetic code expansion system to recognize the designed lipidated analogues and introduce them site-specifically into proteins, which gives the target protein the ability to bind serum albumin (Kd of about 400 nM) and enhances the efficacy of protein drugs. Lipids are strongly hydrophobic, and introducing them into a protein may interfere with its function, so the introduction site must be screened carefully by experiment. However, traditional in vitro chemical lipidation makes it difficult to screen lipid introduction sites, which has hindered the application of lipidation strategies to protein drugs. The site-specific system for introducing genetically encoded lipidated analogues allows both rapid screening of introduction sites and rapid evaluation of the efficiency with which the analogue is introduced into the target protein. The invention further provides a strategy of introducing the genetically encoded lipidated analogue at the N-terminus of the protein, which minimizes interference with protein function and can be applied to the modification of any protein of interest.

(3) The 4OctyF developed in the present invention is the first genetically encoded lipid analogue that mimics the membrane-binding properties of protein lipidation in mammalian cells. It can mimic not only myristoylation and palmitoylation of proteins but also prenylation, and therefore has the potential to be applied to functional studies of all lipidated proteins, decoding the function of lipidation through a gain-of-function strategy. More importantly, the invention provides genetically encoded lipidated analogues with linear aliphatic side chains of different lengths, which can be used not only to decode the function of the heterogeneity of protein lipidation but also in synthetic biology, where tunable membrane binding can be used to construct entirely new signalling pathways.

Drawings

FIG. 1 shows the computer-aided design of genetically encoded lipidated analogues in the examples. Panel A: overview of the development and use of genetically encoded lipidated analogues; panel B: screening plot of ΔG and cLogP of the genetically encoded lipidated analogues; panel C: distribution of genetically encoded lipidated analogues with -ΔG > 8.2 and cLogP > 4, where grey denotes non-ideal unnatural amino acids whose RMSD exceeds the threshold; panel D: chemical structures of candidate genetically encoded lipidated analogues and of myristic acid;

FIG. 2 shows the computer-aided design of functionalized genetically encoded lipidated analogues in the examples; panel A: distribution of genetically encoded lipidated analogues with -ΔG > 8.2 and cLogP > 4, where all unnatural amino acids except those in grey meet the RMSD threshold; panel B: chemical structures of candidate genetically encoded lipidated analogues; panel C: plot of RMSD against the length of the linear aliphatic side chain for the phenylalanine analogues in B;

FIG. 3 shows the computer-aided design and screening of functionalized genetically encoded lipidated analogues in the examples; panel A: procedure for calculating the affinity (ΔG) of a genetically encoded lipidated analogue for serum albumin; panel B: procedure for calculating the RMSD of a genetically encoded lipidated analogue; panel C: chemical structures of candidate genetically encoded lipidated analogues and some control compounds; panel D: cLogP of the compounds in C; panel E: affinity of the compounds in C for human serum albumin; panel F: structures of the 4HexyF molecule docked into the 7 binding pockets of human serum albumin, showing the presence of π-π interactions; panel G: SPR measurement of the affinity of some of the genetically encoded lipidated analogues for human serum albumin;

FIG. 4 is a synthetic overview of an example genetically encoded lipidated analogue;

FIG. 5 shows the screening and identification of functional genetically encoded lipidated analogs in an example. Panel A: 4HexyF docked into the substrate binding pocket of the chimeric phenylalanyl-tRNA synthetase, showing the side chains of the amino acids surrounding 4HexyF; these amino acids were selected to construct a mutant library for screening genetically encoded lipidated analogs; panel B: amber codon suppression efficiency for genetically encoded lipidated analogs measured with the green fluorescent protein assay, wherein the mutations of LipRS-1 and LipRS-2 are Q356G/E391D/T467G/M490G/A507G and Q356G/E391D/T467G/M490A/A507G, respectively; panel C: mass spectra showing the fidelity with which LipRS-2 recognizes 4HexyF, 4OctyF and 4NonyF; panel D: recognition efficiency of LipRS-2 at different concentrations of 4OctyF; panel E: simulated structure of LipRS-1 in complex with 4HexyF; panel F: a bulky side-chain mutation at position 490 prevents recognition of genetically encoded lipidated analogs;

FIG. 6 shows the identification of functional genetically encoded lipidated analogs in an example. Panel A: plot of LipRS-2 recognizing 4HepyF; panel B: mass spectrum identifying 4HepyF; panel C: efficiency of LipRS-2 in recognizing 4HexeF and 4OcteF; panel D: chemical structures of candidate genetically encoded lipidated analogs; panel E: efficiency of LipRS-2 in recognizing 4FproF, 4FbutF and 4FpenF; panel F: mass spectra of 4FproF, 4FbutF and 4FpenF;

FIG. 7 shows that genetically encoded lipidated analogs confer serum albumin binding capacity on proteins in an example. Panel A: Coomassie-stained gel of GLP1 mutants; panel B: mass spectra of GLP1 mutants; panel C: Kd values of GLP1 mutants with serum albumin determined by MST; panel D: Kd of GFP mutants with serum albumin determined by ITC; panel E: Kd of Neo-2/15 mutants with serum albumin determined by ITC;

FIG. 8 shows an engineered, precisely lipidated candidate therapeutic in an example. Panel A: schematic of a precisely lipidated therapeutic; panel B: LC-MS of GFP mutants; panel C: Kd of GFP mutants with serum albumin; panel D: LC-MS of Neo-2/15 mutants; panel E: Kd of Neo-2/15 mutants with serum albumin; panel F: schematic of the construction of the colon cancer model; panel G: colon tumor volume in mice over treatment time; panel H: survival curves of colon cancer mice;

FIG. 9 is a graph showing the Kd of the Neo-2/15 mutant and serum albumin measured by ITC of the example;

FIG. 10 shows the ability of genetically encoded lipidated analogs to mimic the membrane binding of protein lipidation in an example. Panel A: efficiency of LipRS-2 in recognizing genetically encoded lipidated analogs in mammalian cells, determined by FACS; panel B: schematic of genetically encoded lipidated analogs mimicking protein lipidation; panel C: introduction of 4OctyF mimics myristoylation of proteins in mammalian cells; panel D: introduction of 4OctyF mimics palmitoylation of proteins in mammalian cells;

FIG. 11 shows, in an example, how length-tunable genetically encoded lipidated analogs are used to examine the ability of protein lipidation to bind cell membranes. Panel A: fate of protein subcellular localization after introduction of genetically encoded lipidated analogs of different lengths; panel B: introduction of 4OctyF into Kras4B effectively distributes it to the plasma membrane; panel C: subcellular localization after introduction of 4OctyF and 4HexyF into the GC motif; panel D: analysis curves of the fluorescence distribution in panels B and C; panel E: interaction curves of GFP mutants with liposomes;

FIG. 12 shows the ability of genetically encoded lipidated analogs to mimic the membrane binding of protein lipidation in an example. Panel A: subcellular distribution of G2F mutants of different genes; panel B: subcellular distribution of XRP2 with 4OctyF introduced and of the wild type; panel C: fluorescence intensity analysis curves for B; panel D: subcellular distribution of Lck with 4OctyF introduced and of the wild type; panel E: fluorescence intensity analysis curves for D; panel F: subcellular localization of XRP2 and Lck with 4HexyF introduced;

FIG. 13 shows the fluorescence subcellular localization after introduction of the genetically encoded lipidated analogs 4FbutF, 4FproF and 4FpenF in an example;

FIG. 14 shows the ability of length-tunable genetically encoded lipidated analogs to bind cell membranes in an example. Panel A: fluorescence subcellular localization after introduction of 4OctyF and 4HexyF at the corresponding sites; panel B: fluorescence intensity analysis curves for A; panel C: LC-MS profiles of EGFP mutants; panel D: schematic of the liposome co-precipitation experiment; panel E: cLogP threshold at which a compound confers effective membrane binding on the protein;

FIG. 15 shows that genetically encoded lipidated analogs facilitate the cellular delivery of proteins in an example. Panel A: microscopy images showing that genetically encoded lipidated analogs promote protein delivery; panel B: quantification of protein cell delivery efficiency by FACS;

FIG. 16 is a diagram of an embodiment for virtual screening.

Detailed Description

The following describes only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to these examples, which should be construed as falling within that scope.

Example 1: Establishment of a computer-aided virtual screening method for lipidated analogs

The computer-aided design lipid analogue virtual screening method comprises 4 parts, namely:

1) Design of lipid analogs and conversion of formats prior to virtual screening;

2) Prediction of lipidated analogue cLogP;

3) Prediction of affinity of genetically encoded lipid analogs for HSA;

4) Calculation of lipidated analog RMSD.

1. Design and format conversion of lipidated analogs

The two-dimensional structures and SMILES strings of four classes of lipidated analogs (phenylalanine derivatives, tryptophan derivatives, lysine derivatives, and unnatural amino acids with aliphatic side chains) were generated using ChemDraw.

The three-dimensional structures of the lipidated analogs were generated using Chem3D and converted into PDBQT format files using OpenBabel software.

2. Prediction of lipidated analog cLogP

In the present invention, the cLogP of a lipidated analog refers to the cLogP of its side chain. The cLogP of the whole lipidated analog was calculated from its SMILES string using the RDKit software package, and the cLogP of the backbone (-3.56) was then subtracted.
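The subtraction step described above can be sketched as follows. This is a minimal illustration only: the whole-molecule cLogP values in `example_values` are made-up placeholders (in practice they would come from RDKit), and the helper names and the cutoff argument are hypothetical.

```python
# Backbone cLogP stated in the text; subtracted from the whole-molecule value.
BACKBONE_CLOGP = -3.56


def side_chain_clogp(whole_molecule_clogp: float,
                     backbone_clogp: float = BACKBONE_CLOGP) -> float:
    """Return the side-chain cLogP: whole-molecule cLogP minus backbone cLogP."""
    return whole_molecule_clogp - backbone_clogp


def passes_cutoff(whole_molecule_clogp: float, cutoff: float) -> bool:
    """True if the side-chain cLogP exceeds a caller-supplied screening cutoff."""
    return side_chain_clogp(whole_molecule_clogp) > cutoff


# Hypothetical whole-molecule cLogP values, for illustration only.
example_values = {"analog_A": 1.20, "analog_B": -0.50}
for name, value in example_values.items():
    print(name, round(side_chain_clogp(value), 2), passes_cutoff(value, 4.0))
```

Because the backbone value is negative, the side-chain cLogP is always larger than the whole-molecule value by 3.56.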

3. Prediction of affinity of lipidated analogues to HSA

The affinity of a genetically encoded lipidated analog for HSA was predicted by docking the molecule into the 7 main fatty acid binding pockets of HSA with AutoDock Vina software and selecting the highest -ΔG among suitable conformations.

The PDB file of HSA (PDB: 1E7H) was converted to a PDBQT file using AutoDockTools 1.5.6 and set as the receptor molecule for molecular docking.

The docking parameters were defined, with the parameters of the seven binding pockets set as follows:

AutoDock Vina was run, and the highest -ΔG among suitable conformations was selected.

4. Calculation of lipidated analog RMSD

The RMSD of a lipidated analog is the average distance between the conformation obtained by docking the genetically encoded lipidated analog into the corresponding orthogonal aminoacyl-tRNA synthetase and the conformation of the reference molecule in the corresponding crystal structure. The smaller the RMSD, the greater the probability that the genetically encoded lipidated analog will be recognized. To quantify this recognition probability, the RMSDs of unnatural amino acids already known to be recognized were calculated, and the threshold was set to 1.2 times the maximum RMSD among them. Taking phenylalanine derivatives as an example, the RMSD is calculated as follows:

1. A phenylalanyl-tRNA synthetase mutant with an enlarged substrate binding pocket (T467G/M490G/A507G) was designed, and its three-dimensional structure was predicted using SWISS-MODEL.

2. The phenylalanine derivative molecules were docked into the substrate binding pocket using AutoDock Vina software, the 4 conformations with the best energies were selected, and these were converted into MOL files using OpenBabel software.

3. The three-dimensional conformation of phenylalanine was extracted from the crystal structure of human mitochondrial phenylalanyl-tRNA synthetase (PDB: 5MGW), converted to a MOL file, and used as the reference for the RMSD calculation.

4. The RMSD between each docked conformation of the phenylalanine derivative and the reference conformation was calculated with the LigRMSD web server, and the smallest RMSD among the 4 docked conformations was taken as the RMSD of the genetically encoded lipidated analog.
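The selection logic in the steps above (smallest RMSD over the docked conformations, compared against 1.2 times the largest RMSD of known substrates) can be sketched as below. This is not the LigRMSD implementation: it assumes the matched atoms of each pose and of the reference are already paired in the same order, and all coordinates and function names are hypothetical.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must be non-empty and equally long")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(squared / len(pose))


def analog_rmsd(poses, reference):
    """RMSD assigned to an analog: the smallest value over its docked poses."""
    return min(rmsd(pose, reference) for pose in poses)


def rmsd_threshold(recognized_rmsds, factor=1.2):
    """Threshold: factor (1.2 in the text) times the largest RMSD of known substrates."""
    return factor * max(recognized_rmsds)


# Hypothetical coordinates for two matched atoms, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
poses = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],
]
value = analog_rmsd(poses, reference)
print(value, value <= rmsd_threshold([1.0, 2.0]))
```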

Example 2: Synthesis of lipidated analogs

N-Boc-4-iodo-L-phenylalanine (10.0 mmol, 3.92 g) was dissolved in 40 mL of a dichloromethane-methanol mixed solution, and trimethylsilyldiazomethane (12 mmol, 12 mL) was added dropwise to the reaction system at 0 °C, followed by reaction at room temperature for 2 hours. The solvent was removed by rotary evaporation to give A1 as a white solid (yield: 99%).

Product A1 was analyzed and the results were as follows: A1: ¹H NMR (500 MHz, CDCl₃) δ 7.64–7.56 (m, 2H), 6.87 (d, J = 7.9 Hz, 2H), 4.99 (d, J = 8.3 Hz, 1H), 4.62–4.48 (m, 1H), 3.70 (s, 3H), 3.06 (dd, J = 13.8, 5.8 Hz, 1H), 2.97 (dd, J = 13.8, 6.2 Hz, 1H), 1.41 (s, 9H). ¹³C NMR (125 MHz, CDCl₃) δ 172.14, 155.09, 137.68, 135.84, 131.42, 92.61, 80.17, 54.30, 52.43, 38.00, 28.39. HRMS (ESI) m/z calcd for C₁₀H₁₃INO₂⁺ (M-Boc)⁺ 305.9985, found 306.0001.

The solid A1 (4.0 mmol, 1.62 g) obtained in the first step was dissolved in 15 mL of anhydrous tetrahydrofuran. Pd(PPh₃)₂Cl₂ (0.04 mmol, 28 mg), CuI (0.08 mmol, 15 mg) and 2 mL of piperidine were added to the solution at 0 °C and reacted for 10 minutes, and B1 (4.8 mmol, 410 mg) was then added dropwise. The reaction was stirred at 40 °C for 4 h under nitrogen. After completion of the reaction, the mixture was poured into water and extracted with ethyl acetate. The upper organic layer was washed with brine, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by column chromatography (petroleum ether/ethyl acetate) to give compound D1 as a pale yellow oil (yield: 83%).

Product D1 was analyzed and the results were as follows: D1: ¹H NMR (500 MHz, CDCl₃) δ 7.33–7.26 (m, 2H), 7.04 (d, J = 7.9 Hz, 2H), 5.22 (d, J = 8.3 Hz, 1H), 4.59–4.48 (m, 1H), 3.66 (s, 3H), 3.04 (ddd, J = 45.9, 13.8, 6.2 Hz, 2H), 2.38 (t, J = 7.0 Hz, 2H), 1.60–1.52 (m, 2H), 1.46 (ddd, J = 9.9, 7.6, 5.9 Hz, 2H), 1.40 (s, 9H), 0.93 (t, J = 7.3 Hz, 3H). ¹³C NMR (125 MHz, CDCl₃) δ 172.00, 154.90, 135.47, 131.41, 128.96, 122.64, 90.18, 80.17, 79.52, 54.23, 51.89, 37.84, 30.66, 28.06, 21.79, 18.86, 13.43. HRMS (ESI) m/z calcd for C₁₆H₂₂NO₂⁺ (M-Boc)⁺ 260.1645, found 260.1668.

Compound D1 (2.0 mmol, 720 mg) was dissolved in 10 mL of methanol, and NaOH (6.0 mmol, 240 mg) dissolved in 10 mL of H₂O was added. The mixture was stirred at room temperature for 4 h, and methanol was then evaporated under reduced pressure until the volume was about half the reaction volume. The mixture was acidified with ice-cold 2 M dilute hydrochloric acid to pH 3. The aqueous solution was extracted with cold ethyl acetate, and the upper organic layer was washed with saturated brine, dried over anhydrous sodium sulfate, and evaporated in vacuo to give carbamate E1 as a colorless oil, which was used without further purification. E1 was then dissolved in dichloromethane, and trifluoroacetic acid (5.0 mmol, 570 mg) was slowly added to remove the Boc protecting group, giving the target compound 4HexyF as a pale yellow solid (yield: 74%).

Product 4HexyF was analyzed and the results were as follows: 4HexyF: ¹H NMR (500 MHz, D₂O) δ 7.17 (d, J = 7.9 Hz, 2H), 6.97 (d, J = 7.9 Hz, 2H), 3.22 (dd, J = 9.5, 4.1 Hz, 1H), 2.90 (dd, J = 13.6, 4.1 Hz, 1H), 2.38 (dd, J = 13.5, 9.6 Hz, 1H), 2.18 (t, J = 6.9 Hz, 2H), 1.36 (dtd, J = 33.8, 9.6, 8.5, 4.7 Hz, 4H), 0.82 (t, J = 7.2 Hz, 3H). ¹³C NMR (125 MHz, D₂O) δ 181.25, 138.64, 131.52, 129.26, 121.44, 89.51, 80.96, 57.39, 41.51, 30.78, 21.92, 18.79, 13.37. HRMS (ESI) m/z calcd for C₁₅H₂₀NO₂⁺ (M+H)⁺ 246.1489, found 246.1549.

The solid A1 (4.0 mmol, 1.62 g) obtained in the first step was dissolved in 15 mL of anhydrous toluene. Pd(OAc)₂ (0.8 mmol, 180 mg), PPh₃ (1.6 mmol, 319 mg) and 1 mL of triethylamine were added to the solution at 0 °C and reacted for 10 minutes, and C1 (4.8 mmol, 410 mg) was then added dropwise. The reaction was stirred at 100 °C overnight under nitrogen. After completion of the reaction, the mixture was poured into water and extracted with ethyl acetate. The upper organic layer was washed with brine, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by column chromatography (petroleum ether/ethyl acetate) to give compound F1 as a pale yellow oil (yield: 67%).

Product F1 was analyzed and the results were as follows: F1: ¹H NMR (500 MHz, CDCl₃) δ 7.38–7.23 (m, 2H), 7.07 (ddd, J = 27.9, 7.9, 4.6 Hz, 3H), 6.39–6.14 (m, 1H), 4.98 (d, J = 8.5 Hz, 1H), 4.63–4.45 (m, 1H), 3.71 (d, J = 5.6 Hz, 3H), 3.06 (ddd, J = 24.0, 13.5, 7.1 Hz, 2H), 2.19 (ddd, J = 16.2, 8.1, 6.5 Hz, 2H), 1.53–1.27 (m, 14H), 0.98–0.87 (m, 3H). ¹³C NMR (125 MHz, CDCl₃) δ 172.26, 172.23, 155.03, 134.45, 130.85, 129.31, 129.27, 128.98, 125.95, 125.57, 79.63, 54.38, 52.00, 51.96, 32.62, 31.45, 30.75, 28.20, 22.72, 22.16, 13.86, 13.81. HRMS (ESI) m/z calcd for C₁₆H₂₄NO₂⁺ (M-Boc)⁺ 262.1802, found 262.1764.

Compound F1 (2.0 mmol, 720 mg) was dissolved in 10 mL of methanol, and NaOH (6.0 mmol, 240 mg) dissolved in 10 mL of H₂O was added. The mixture was stirred at room temperature for 4 h, and methanol was then evaporated under reduced pressure until the volume was about half the reaction volume. The mixture was acidified with ice-cold 2 M dilute hydrochloric acid to pH 3. The aqueous solution was extracted with cold ethyl acetate, and the upper organic layer was washed with saturated brine, dried over anhydrous sodium sulfate, and evaporated in vacuo to give carbamate G1 as a colorless oil, which was used without further purification. G1 was then dissolved in dichloromethane, and trifluoroacetic acid (5.0 mmol, 570 mg) was slowly added to remove the Boc protecting group, giving the target compound 4HexeF as a pale yellow solid (yield: 65%).

Product 4HexeF was analyzed and the results were as follows: 4HexeF: ¹H NMR (500 MHz, CD₃OD) δ 7.40–7.25 (m, 2H), 7.24–7.14 (m, 2H), 6.42–6.16 (m, 1H), 4.20 (ddd, J = 7.5, 5.5, 3.2 Hz, 1H), 3.31–3.10 (m, 2H), 2.19 (qd, J = 7.0, 1.4 Hz, 1H), 2.05–1.90 (m, 1H), 1.56–1.18 (m, 4H), 1.00–0.78 (m, 3H). ¹³C NMR (125 MHz, methanol-d₄) δ 156.85, 136.67, 136.32, 129.10, 123.13, 119.25, 112.76, 107.80, 70.22, 69.96, 68.68, 54.42, 30.79, 9.95. HRMS (ESI) m/z calcd for C₁₅H₂₂NO₂⁺ (M+H)⁺ 248.1645, found 248.1631.

Example 3: Surface plasmon resonance (SPR) determination of the affinity of genetically encoded lipidated analogs for serum albumin

The affinity between the genetically encoded lipidated analogue and HSA was determined using the Biacore T200 system, the specific steps being as follows:

1. HSA was coupled to a CM5 chip by amine coupling. The CM5 chip was activated with an equal-ratio mixture of 1 M NHS and 1 M EDC, and HSA (20 μg/mL) dissolved in 10 mM sodium acetate buffer was injected over the chip until the response reached 10,000 RU. Finally, the chip was blocked with 1 M ethanolamine.

2. 4OctyF, 4HexyF and HepoK were dissolved in running buffer (20 mM phosphate, 150 mM sodium chloride, 5% DMSO, pH 7.5) to a concentration of 1 mM. After 2 h at room temperature, any precipitate was removed by centrifugation at 12,000 rpm for 20 minutes.

3. These unnatural amino acids were injected sequentially over the HSA-immobilized CM5 chip from low to high concentration, with the association time set to 120 seconds and the dissociation time to 300 seconds. Finally, the Kd of each unnatural amino acid for HSA was obtained using the Biacore S200 analysis software.

Example 4: Screening of chimeric phenylalanyl-tRNA synthetase mutants that recognize lipidated analogs

1. The 4HexyF molecule was docked into the substrate binding pocket of chPheRS, the amino acids surrounding the 4HexyF molecule were selected, and the mutant library was designed as: Q356NNK, L360NNK, E391GAN, V393NNK, M490NNK, L494NNK, T467G, and A507G.

2. Fragments were amplified by PCR and ligated into pBK vector by Gibson assembly to construct chPheRS library.

3. The constructed library was electroporated into electrocompetent DH10B cells containing the negative screening plasmid pNEG-Barnase-Q3TAG-D45TAG-chPheT, and the bacterial suspension was plated onto LB plates containing 50 μg/mL kanamycin, 100 μg/mL ampicillin, and 0.2% L-arabinose, and cultured at 37 °C for 12 hours.

4. Bacteria on the plates were collected and the plasmids were extracted. The extracted plasmids were transformed into electrocompetent DH10B cells containing the positive screening plasmid pNEG-CAT112TAG-chPheT-GFP190TAG, and the cells were spread on positive screening plates (50 μg/mL kanamycin, 100 μg/mL ampicillin, 10 μg/mL chloramphenicol, and 0.2% L-arabinose) containing the corresponding unnatural amino acid, cultured at 37 °C for 12 h, and then at 30 °C for a further 48 h. Clones showing green fluorescence on the plates were picked, and the efficiency of introducing the genetically encoded lipidated analog was determined by the method of Example 5. Clones with strong fluorescence in the presence of the genetically encoded lipidated analog and low fluorescence in its absence were selected for sequencing. Finally, two mutants that efficiently recognize 4HexyF, 4HepyF, 4OctyF, 4HepeF, 4OcteF and 4prhF were obtained: LipRS-1 (Q356G, E391D, T467G, M490G, and A507G) and LipRS-2 (Q356G, E391D, T467G, M490A, and A507G). Further mutation analysis yielded a mutant that recognizes 4NonyF with high fidelity, designated LipRS-3 (L225V, Q356G, E391D, F464I, T467G, M490G, and A507G).

Example 5: Determination of the efficiency with which chimeric phenylalanyl-tRNA synthetase mutants recognize lipidated analogs using a GFP fluorescence reporter system
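The size of the mutant library designed in step 1 can be estimated by expanding the degenerate codons. The sketch below uses the standard IUPAC meanings (N = A/C/G/T, K = G/T); the dictionary layout and function names are illustrative assumptions, and the fixed substitutions T467G and A507G are left out because they contribute a single variant each.

```python
from itertools import product

# IUPAC nucleotide codes needed for the NNK and GAN degenerate codons.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(codon):
    """List every concrete codon encoded by a degenerate codon."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon))]


def dna_diversity(library):
    """Number of distinct DNA sequences: product of codon counts per site."""
    total = 1
    for codon in library.values():
        total *= len(expand(codon))
    return total


# Randomized positions from step 1 of the screening procedure.
library = {
    "Q356": "NNK", "L360": "NNK", "E391": "GAN",
    "V393": "NNK", "M490": "NNK", "L494": "NNK",
}

print(len(expand("NNK")), expand("GAN"), dna_diversity(library))
```

Five NNK sites (32 codons each) and one GAN site (4 codons, encoding Asp or Glu) give 32⁵ × 4, about 1.3 × 10⁸ DNA variants, which is a useful figure when judging whether a transformation covers the library.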

1. The pBK plasmid containing the orthogonal aminoacyl-tRNA synthetase and the pNEG plasmid containing GFP190TAG were co-transformed into E. coli DH10B, plated on plates containing the corresponding antibiotics, and incubated overnight at 37 °C.

2. Three individual clones were picked from the transformation plates and cultured with shaking in LB medium containing the corresponding antibiotics to an OD600 of about 0.8. The corresponding genetically encoded lipidated analog was added, and arabinose was added to a final concentration of 0.2% to induce expression. Controls without genetically encoded lipidated analog were set up. The culture was continued at 30 °C for 16 h with shaking.

3. After expression was complete, 750 μL of the cell culture was centrifuged, the medium was removed, and the cells were lysed with 150 μL of BugBuster protein extraction reagent (Millipore). After lysis, the mixture was centrifuged at 12,000 rpm for 1 min, and 100 μL of the supernatant was transferred to a 96-well plate (COSTAR). The GFP signal of the supernatant was recorded with a BioTek Synergy NEO and normalized to the OD600 of the culture.

Example 6: Construction of a series of plasmids for introducing genetically encoded lipidated analogs
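The normalization in step 3 can be illustrated with a short sketch. It assumes that each clone yields one fluorescence reading and one OD600 reading, and that the comparison of interest is the mean normalized signal with versus without the analog; the readings and function names below are hypothetical, not data from this document.

```python
def normalized_gfp(fluorescence, od600):
    """GFP fluorescence divided by culture OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return fluorescence / od600


def mean_normalized(readings):
    """Mean normalized signal over (fluorescence, OD600) pairs, one per clone."""
    values = [normalized_gfp(f, od) for f, od in readings]
    return sum(values) / len(values)


def fold_over_background(with_analog, without_analog):
    """Ratio of mean normalized signal with the analog to the no-analog control."""
    return mean_normalized(with_analog) / mean_normalized(without_analog)


# Hypothetical plate-reader values for two clones per condition.
with_analog = [(1200.0, 2.0), (900.0, 1.5)]
without_analog = [(60.0, 2.0), (30.0, 1.0)]
print(fold_over_background(with_analog, without_analog))
```

A high ratio indicates analog-dependent amber suppression, while a ratio near 1 suggests the synthetase is charging a natural amino acid.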

All plasmids were constructed by the Gibson assembly system except as specified, the plasmids constructed and the corresponding steps were as follows:

1. Plasmid construction for the Escherichia coli expression system

Construction of pNEG-2×chPheT-SUMO-GLP1 and pNEG-2×chPheT-SUMO-GLP1-K20TAG

The nucleotide sequence of SUMO-GLP1-K20TAG is shown in Seq ID No 4. The SUMO fragment and the GLP1 fragment or GLP1-K20TAG fragment were assembled into the pNEG-2×chPheT vector by Gibson assembly. Other prokaryotic expression plasmids were constructed in the same way. Specifically, when constructing GFP and Neo-2/15 fragments containing an amber codon at the N-terminus, a synthetic sequence was introduced at the N-terminus (Syn: ATGGAGTACGAATAGGAATACGAGGCCGAAGCGGCTGCAAAAGAGGCCGCTGCAAAGGAAGCTGCAGCGAAGGCT), wherein the amber codon is the TAG triplet immediately following ATGGAGTACGAA.

The nucleotide sequences of Syn-GFP and Syn-Neo-2/15 are shown in Seq ID No 5 and 7. The plasmids constructed were pNEG-2×chPheT-Syn-GFP and pNEG-2×chPheT-Syn-Neo-2/15, respectively. Further, to construct plasmids expressing the polyK-EGFP mutants, fragments of polyK-EGFP and polyK-TAG-EGFP were synthesized and ligated into the pNEG-2×chPheT vector by Gibson assembly. The nucleotide sequence of polyK-TAG-EGFP is shown in Seq ID No 7.

2. Plasmid construction for mammalian cell expression system

Construction of eukaryotic plasmid for expressing LipRS-2

The LipRS-2 fragment was amplified using primers F8 and R8 and ligated into the pCMV-chPheT vector to construct plasmid pCMV-LipRS-2-chPheT.

Construction of plasmids for simulating myristoylation

Four myristoylated proteins were selected: XRP (NM_006915.3), SVIP (NM_001391937), Lck (NM_005356.5) and Gαi1 (NM_002069.6). These four genes were synthesized and ligated into the pEGFP vector to construct the wild-type plasmids pEGFP-POIs-EGFP. The Ub gene was further amplified, and the second codon of each of the four POIs was mutated to an amber codon, thereby constructing the plasmids pEGFP-Ub-POIs-G2TAG-EGFP for introducing unnatural amino acids. The nucleotide sequences of Ub-XRP-G2TAG-EGFP, Ub-SVIP-G2TAG-EGFP, Ub-Lck-G2TAG-EGFP and Ub-Gαi1-G2TAG-EGFP are shown in Seq ID No 8-11, respectively.

Construction of plasmids for simulating prenylation

The synthesized Kras4B (165-185) fragment was ligated into the pEGFP-mCherry-T2A-EGFP plasmid to construct pEGFP-mCherry-T2A-Kras4B-EGFP. The C185 site of Kras4B was then mutated to an amber codon to construct the plasmid pEGFP-mCherry-T2A-Kras4B-C185TAG-EGFP for introducing unnatural amino acids.

The nucleotide sequence of mCherry-T2A-Kras4B-C185TAG-EGFP is shown in Seq ID No 12.

Plasmid construction comprising two fatty acid modification sites

A nucleotide sequence encoding XC (wherein X is an amber codon for the introduction of an unnatural amino acid) was synthesized and ligated into the corresponding vector to construct a plasmid pEGFP-mCherry-TAG-C-EGFP. The nucleotide sequence encoding mCherry-TAG-C-EGFP is shown in Seq ID No 13.

Construction of plasmids for simulating palmitoylation

The palmitoylated protein R7BP (NM_001029875.3) and the STREX domain of KCNMA1 were selected, synthesized and inserted into the pEGFP-mCherry-T2A-EGFP plasmid, with amber codons introduced at the corresponding palmitoylation sites. The resulting plasmids were pEGFP-mCherry-T2A-R7BP-C253TAG-EGFP and pEGFP-mCherry-T2A-STREX-C13TAG-EGFP, respectively. The nucleotide sequences encoding mCherry-T2A-R7BP-C253TAG-EGFP and mCherry-T2A-STREX-C13TAG-EGFP are shown in Seq ID No 14-15.

Example 7: Expression, purification and mass spectrometry of proteins with inserted genetically encoded lipidated analogs

In this example, recombinant proteins were expressed in an E. coli expression system and purified by immobilized metal ion affinity chromatography, and insertion of the genetically encoded lipidated analog was confirmed by LC-MS. The specific steps are as follows:

Protein expression and purification

DH10B cells cultured overnight were inoculated into 100 mL of fresh LB medium at a ratio of 1:100 with the required antibiotics, and the culture was shaken until the OD600 reached 0.8. Expression of the protein of interest was induced by adding the corresponding genetically encoded lipidated analog to a final concentration of 100 μM and arabinose to a final concentration of 0.2%. The induced cells were centrifuged at 4000 rpm at 4 °C for 5 minutes, and the resulting cell pellet was resuspended in pre-chilled NTA-0 buffer (25 mM Tris, 250 mM NaCl, pH 8.0) and sonicated. The lysate was centrifuged at 12,000 rpm at 4 °C for 60 minutes, and the resulting supernatant was applied to a nickel-chelating affinity chromatography column equilibrated in advance with NTA-0 buffer, followed by washing with 6 volumes of NTA-0 buffer containing 50 mM imidazole. Finally, the protein was eluted with NTA-0 buffer containing 500 mM imidazole.

LC-MS identification of the target protein

Purified proteins were analyzed on a SCIEX TripleTOF 6600 mass spectrometer using electrospray ionization and SCIEX Analyst TF software. A Phenomenex Aeris WIDEPORE C4 column (2.1 × 50 mm, 3.6 μm) was used. Mobile phase A was 0.1% formic acid in water and mobile phase B was 0.1% formic acid in acetonitrile. The flow rate was held constant at 0.2 mL/min. Mass spectra were deconvoluted and analyzed using SCIEX OS-Q software (version 2.0, SCIEX Corporation). The molecular weight of the protein was predicted using the ExPASy Compute pI/Mw tool.

Example 8: Determination of the affinity of selected proteins for serum albumin by microscale thermophoresis (MST)

Taking the SUMO-GLP1 mutant as an example, a detailed method for determining the affinity of a protein introduced with a genetically encoded lipidated analogue to HSA is as follows:

First, the SUMO-GLP1-K20-4OctyF, SUMO-GLP1-K20-4HexyF, SUMO-GLP1-K20-HepoK and SUMO-GLP1-WT proteins were purified as described in Example 7; LC-MS analysis showed that the corresponding unnatural amino acids had been introduced with high fidelity. The four proteins were dialyzed into MST buffer (PBS buffer containing 0.01% Tween 20) and then concentrated to the required concentrations using ultrafiltration concentrator tubes. The concentrated proteins were centrifuged at 10,000 g for 20 min to remove precipitate, and their concentrations were measured before use.

Second, HSA powder was dissolved in PBS buffer and labeled with the Cy5 fluorescent group using the Monolith RED-NHS second-generation protein labeling kit (NanoTemper Technologies). The labeled protein was exchanged into MST buffer.

Third, the Kd values of the SUMO-GLP1 mutants with HSA were determined using a Monolith NT.115 instrument (NanoTemper Technologies, Munich, Germany) at a constant temperature of 25 °C. See the instrument instructions for specific steps.

Fourth, data processing. Using NT Analysis software, the data were fitted with a model in which the target protein and the fluorescently labeled HSA bind at a 1:1 ratio to obtain the dissociation constant Kd of the target protein, and the fitted curves were plotted with Origin software.
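For readers unfamiliar with the 1:1 binding model used in the fit, the sketch below evaluates the standard quadratic solution for the fraction of labeled target in complex at a given total ligand concentration. It is an illustration of the model only, not the NT Analysis software's fitting routine; the labeled-HSA concentration and the function name are assumptions, while the 0.58 μM Kd is the value reported for SUMO-GLP1-K20-4OctyF in this example.

```python
import math


def fraction_bound(ligand_total, target_total, kd):
    """Fraction of labeled target in complex for 1:1 binding.

    All three arguments must share the same concentration unit (here μM).
    """
    if target_total <= 0 or kd <= 0 or ligand_total < 0:
        raise ValueError("concentrations must be non-negative and target, Kd positive")
    s = ligand_total + target_total + kd
    # Physically meaningful root of the 1:1 mass-action quadratic.
    complex_conc = (s - math.sqrt(s * s - 4.0 * ligand_total * target_total)) / 2.0
    return complex_conc / target_total


KD_UM = 0.58            # Kd reported in the text for SUMO-GLP1-K20-4OctyF
LABELED_HSA_UM = 0.001  # assumed labeled-HSA concentration, illustrative only

for ligand_um in (0.0, 0.058, 0.58, 5.8, 58.0):
    print(ligand_um, round(fraction_bound(ligand_um, LABELED_HSA_UM, KD_UM), 3))
```

When the labeled target is much more dilute than Kd, the curve passes through roughly half saturation at a ligand concentration equal to Kd, which is why the titration series should bracket the expected Kd.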

Experimental results show that the Kd values of SUMO-GLP1-K20-4OctyF and SUMO-GLP1-K20-4HexyF with HSA are 0.58 μM and 2.31 μM, respectively, corresponding to affinities 25.9 and 6.3 times that of SUMO-GLP1-K20-HepoK (15 μM). Meanwhile, the experiments show that SUMO-GLP1-WT has no interaction with HSA. These results demonstrate that the genetically encoded lipidated analogs developed according to the present invention can confer strong HSA binding on proteins, with great potential to extend the half-life of the corresponding protein or polypeptide drugs.

Example 9: Determination of the affinity of proteins for serum albumin by isothermal titration calorimetry (ITC)

Kd values of GFP mutant and Neo-2/15 mutant with HSA were determined by isothermal titration calorimeter. Taking GFP mutants as an example, a detailed method for determining the affinity of a protein introduced with a genetically encoded lipidated analogue to HSA is as follows:

First, the GFP-4OctyF, GFP-4HexyF and GFP-F proteins were purified as described in Example 7; LC-MS analysis showed that the corresponding unnatural amino acids had been introduced with high fidelity. The three proteins were dialyzed into PBS buffer and then concentrated to the required concentrations using ultrafiltration concentrator tubes. The concentrated proteins were centrifuged at 10,000 g for 20 min to remove precipitate, and their concentrations were measured before use.

Second, HSA powder was dissolved in PBS to a final concentration of 80 μM and loaded into the injection syringe of the ITC instrument. Each GFP protein, at a μM-level concentration, was loaded into the sample cell, and titration experiments were performed according to the ITC manual.

Third, after the titration experiment was completed, the data were fitted using the Origin 7.0 software supplied with the ITC instrument, and the Kd of each mutant with HSA was calculated.

The Kd of the Neo-2/15 mutants with HSA was determined in the same way as for the GFP mutants. Experimental results indicate that the introduction of 4OctyF and 4HexyF confers HSA-binding ability on both GFP and Neo-2/15. The Kd values of GFP-4OctyF and Neo-2/15-4OctyF with HSA were 160 nM and 370 nM, respectively. Neo-2/15 is a de novo designed analog of IL-2 and IL-15 that promotes the expansion of CD8⁺ T cells and has excellent antitumor activity. Because Neo-2/15-4OctyF binds HSA at the nM level, it may have a longer serum half-life after injection into the blood and thus a more potent antitumor effect.

Example 10: Construction and treatment of a mouse colon cancer model

To further explore whether introducing genetically encoded lipidated analogs can enhance the efficacy of therapeutic proteins or polypeptides, the effect of introducing a genetically encoded lipidated analog was tested in a mouse colon cancer model, taking Neo-2/15 as an example.

1. Construction of a mouse colon cancer model

Mouse CT26 cells cultured in a 10 cm dish were digested with trypsin, washed twice with PBS to remove serum, and resuspended.

1 × 10⁶ CT26 cells were injected subcutaneously on the left or right side of each mouse, and the mice were housed until the tumors grew to 100 mm³.

2. Treatment of a mouse colon cancer model

The established tumor model mice were randomly divided into 3 groups and injected intraperitoneally with endotoxin-removed Neo-2/15, Neo-2/15-4OctyF or PBS, respectively, at 1 mg/kg once daily, and the tumor volume of the mice was measured daily. Mouse tumor volume (V) was calculated by the formula V = length × width² × 0.5. When the tumor volume of a mouse exceeded 1000 mm³, the mouse was euthanized.
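The tumor volume formula and the euthanasia criterion above translate directly into a small helper. The function names and the caliper measurements in the usage lines are hypothetical; only the formula and the 1000 mm³ limit come from the text.

```python
# Volume limit stated in the text, in cubic millimetres.
EUTHANASIA_LIMIT_MM3 = 1000.0


def tumor_volume_mm3(length_mm, width_mm):
    """Tumor volume from caliper measurements: V = length x width^2 x 0.5."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("measurements must be non-negative")
    return 0.5 * length_mm * width_mm ** 2


def exceeds_limit(length_mm, width_mm, limit_mm3=EUTHANASIA_LIMIT_MM3):
    """True when the calculated volume is above the euthanasia limit."""
    return tumor_volume_mm3(length_mm, width_mm) > limit_mm3


# Hypothetical caliper readings in millimetres.
print(tumor_volume_mm3(10.0, 8.0), exceeds_limit(10.0, 8.0))
print(tumor_volume_mm3(15.0, 12.0), exceeds_limit(15.0, 12.0))
```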

The results show that the introduction of 4OctyF enhances the antitumor effect of Neo-2/15. Compared with Neo-2/15 wild type, the tumor volume of Neo-2/15-4OctyF treated group was 32% smaller after 15 days, and the median survival of mice was increased by 2.2 days.

Example 11: Flow cytometry (FACS) analysis of the efficiency of lipidated analog insertion in mammalian cells

(1) Cell transfection. Cells were transfected according to the standard transient plasmid transfection procedure. The experimental group consisted of cells co-transfected with the plasmid pCMV-LipRS-2-chPheT, which expresses the chimeric phenylalanine translation system, and the fluorescent reporter plasmid pEGFP-mCherry-T2A-EGFP-190TAG. The control groups consisted of cells transfected with pEGFP-mCherry alone or pEGFP-EGFP alone.

(2) After 6h of cell transfection, the medium was changed to fresh medium, while the corresponding genetically encoded lipidated analogue was added to a final concentration of 200. Mu.M, and the culture was continued.

(3) After 48h the medium was aspirated off and the residual medium was washed off by addition of 1 XPBS. The PBS solution was aspirated off, cells were digested with pancreatin, resuspended in 1mL DMEM medium, and transferred to a 1.5mL centrifuge tube.

(4) The flow cytometer was set up with HEK 293T cells for forward and side scatter gates, mCherry-expressing cells for parameters and gates of PE channels, EGFP-expressing cells for parameters and gates of FITC channels.

(5) The experimental group cells were assayed and 50000 cells were set per sample collection. Data was analyzed using software FlowJo.

Experimental results show that LipRS-2 obtained by screening can efficiently introduce genetically encoded lipidated analogs 4OctyF and 4HexyF into proteins in mammalian cells.
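The text does not state how insertion efficiency is derived from the gated data. One possible readout is sketched below, on the assumption that mCherry marks every cell expressing the mCherry-T2A-EGFP-190TAG reporter and that EGFP appears only when the TAG codon is read through. The function name, the event format and the gate values are hypothetical.

```python
def readthrough_efficiency(events, mcherry_gate, egfp_gate):
    """Fraction of mCherry-positive events that are also EGFP-positive.

    events: list of dicts with 'mcherry' (PE channel) and 'egfp'
    (FITC channel) intensities. Gates are thresholds taken from the
    single-colour controls (an assumption about the gating strategy).
    """
    transfected = [e for e in events if e["mcherry"] > mcherry_gate]
    if not transfected:
        return 0.0  # no reporter-expressing cells were collected
    double_positive = sum(1 for e in transfected if e["egfp"] > egfp_gate)
    return double_positive / len(transfected)
```

Comparing this fraction between samples with and without the lipidated analogue would separate analogue-dependent read-through from background suppression.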

Example twelve: Live-cell imaging analysis of the binding of genetically encoded lipidated analogues to cell membrane systems

Lipidation confers on a protein the ability to bind membranes, and the function of the modification in turn depends largely on interactions with the membrane. The genetically encoded lipidated analogues developed in the present invention can potentially be used to study the membrane-interaction function of lipid modifications in a gain-of-function manner. Protein-membrane interactions can be studied by analyzing protein localization with live-cell imaging. Taking Kras4B (65-185) with simulated farnesylation as an example, the procedure for live-cell imaging analysis of how genetically encoded lipidated analogues mimic the membrane-binding function of protein lipid modification is as follows:

1. Cell transfection. Cultured HEK 293T cells were seeded into 20 mm glass-bottomed cell culture dishes and, at 70% confluence, co-transfected with the pCMV-LipRS-2-chPheT and pEGFP-mCherry-T2A-Kras4B-C185TAG-EGFP plasmids using a liposome-based transfection reagent. After 6 hours of transfection, the medium was replaced with fresh medium and the corresponding genetically encoded lipidated analogue (4OctyF, 4HepyF or 4HexyF) was added to a final concentration of 200 μM. Controls were also prepared: one without any genetically encoded lipidated analogue, a C185F control, and a control with Hepok introduced.

2. Confocal live cell imaging. 24 h after transfection, cells were observed on a Zeiss LSM 900 confocal microscope with a 40× oil-immersion objective. Images were acquired in the mCherry and EGFP channels using ZEN software. After acquisition, the subcellular localization of the fluorescent signal and the degree of signal enrichment on the membrane were analyzed using ZEN software.

Imaging experiments simulating other lipidation modifications were also performed according to the above procedure. The experimental results are as follows:

1) The genetically encoded lipidated analogue 4OctyF can mimic post-translational modifications such as protein myristoylation, palmitoylation and prenylation, reproduce the ability of these modifications to interact with cell membranes, and be used to study the functions of protein lipidation in a gain-of-function manner.

2) The number of carbons in the fatty chain of the genetically encoded lipidated analogue determines the strength of its interaction with the membrane. Imaging results showed that the P/C value of Kras4B-4OctyF was 12.4, which was 5-fold and 10-fold that of Kras4B-4HepyF and Kras4B-4HexyF, respectively. Genetically encoded lipidated analogues with different carbon chain lengths can be used to study the function of lipidation heterogeneity, and can also be used to construct tunable signal transducers for applications in synthetic biology.
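The text does not define the P/C value. Assuming it denotes the ratio of mean plasma membrane fluorescence to mean cytoplasmic fluorescence, a membrane enrichment score of this kind could be computed as in the sketch below. The function name and the pixel intensities in the usage note are hypothetical and do not come from the imaging data.

```python
from statistics import mean


def pc_ratio(membrane_pixels, cytoplasm_pixels):
    """Mean membrane intensity divided by mean cytoplasm intensity.

    Both arguments are sequences of background-corrected pixel
    intensities sampled from regions of interest drawn on the
    plasma membrane and in the cytoplasm of the same cell.
    """
    return mean(membrane_pixels) / mean(cytoplasm_pixels)
```

With hypothetical readings, `pc_ratio([120, 130, 110], [10, 12, 8])` gives 12.0, meaning the signal is twelve times brighter at the membrane than in the cytoplasm.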

Example thirteen: Determination of the effect of introducing genetically encoded lipidated analogues on protein binding to lipids using liposome co-precipitation experiments

1. Preparation of liposomes. Liposomes were prepared by the thin-film hydration method. 1-Palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine and 1-palmitoyl-2-oleoyl-sn-glycero-3-(phospho-L-serine) were combined at a ratio of 2:1 in chloroform; the chloroform was removed on a rotary evaporator, and the lipids were dried under vacuum for 2 h. The dried lipid film was resuspended in buffer A (25 mM Tris, 250 mM NaCl, pH 8.2) to prepare a 5 mM liposome stock solution.

2. Liposome co-precipitation experiments. The three proteins polyK-EGFP, polyK-4HexyF-EGFP and polyK-4OctyF-EGFP were purified according to the method of example seven. The purified proteins were centrifuged at 16000 g for 30 min at 4 °C to remove precipitate and diluted to a final concentration of 2 μM. 100 μL of diluted protein was added to 100 μL of liposome solution at different concentrations, incubated at room temperature for 15 min, and centrifuged at 16000 g for 15 min at 22 °C. The supernatant, containing protein not bound to liposomes, was transferred to a 96-well plate. The pellet, containing liposome-bound protein, was resuspended in 200 μL of buffer A and likewise transferred to a 96-well plate, and the corresponding fluorescence intensities were measured.

3. Calculation of the Kd between protein and liposome. The fraction of liposome-bound protein (f_Binding) was calculated using the formula f_Binding = Fpel/(Fpel + Fsp), where Fpel is the fluorescence of the pellet and Fsp is the fluorescence of the supernatant. The calculated f_Binding values and the corresponding liposome concentrations [c] were fitted using the equation f_Binding = Ka[c]/(1 + Ka[c]). Ka represents the binding constant, and Kd is calculated according to the formula Kd = 1/Ka.

The results indicate that genetically encoded lipidated analogues can confer membrane-binding ability on the proteins into which they are incorporated, with membrane affinity positively correlated with carbon chain length. The Kd of polyK-4OctyF-EGFP for liposomes was 105 μM, corresponding to an affinity 18 times that of polyK-4HexyF-EGFP.
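The calculation in step 3 can be illustrated with the sketch below. The text does not name the fitting software, so a plain least-squares fit by golden-section search over log10(Ka) stands in for it here as an assumption. All function names are hypothetical, and the data in the usage note are synthetic values generated from the model, not measured values.

```python
import math


def fraction_bound(f_pel, f_sup):
    """f_Binding = Fpel / (Fpel + Fsp)."""
    return f_pel / (f_pel + f_sup)


def binding_model(ka, conc):
    """f_Binding = Ka[c] / (1 + Ka[c])."""
    return ka * conc / (1.0 + ka * conc)


def fit_ka(concs, fractions, lo=-6.0, hi=6.0, iters=60):
    """Least-squares Ka found by golden-section search over log10(Ka)."""
    def sse(log_ka):
        ka = 10.0 ** log_ka
        return sum((binding_model(ka, c) - f) ** 2
                   for c, f in zip(concs, fractions))

    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - phi * (b - a), a + phi * (b - a)
    f1, f2 = sse(x1), sse(x2)
    for _ in range(iters):
        if f1 < f2:
            # Minimum lies in [a, x2]; reuse x1 as the new upper probe.
            b, x2, f2 = x2, x1, f1
            x1 = b - phi * (b - a)
            f1 = sse(x1)
        else:
            # Minimum lies in [x1, b]; reuse x2 as the new lower probe.
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (b - a)
            f2 = sse(x2)
    return 10.0 ** ((a + b) / 2.0)


def kd_from_ka(ka):
    """Kd = 1 / Ka, in the same concentration unit as the input data."""
    return 1.0 / ka
```

As a usage sketch, synthetic fractions generated with `binding_model(1 / 105, c)` for liposome concentrations in μM should return a Kd close to 105 μM from `kd_from_ka(fit_ka(concs, fractions))`. Real data carry noise, so replicate measurements and an error estimate on Kd would be needed in practice.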

Example fourteen: Determination of the effect of lipidated analogues on the cellular delivery efficiency of proteins

1. HeLa cells were cultured and passaged into 24-well plates and grown to 70% confluency. The medium was removed and washed three times with PBS to remove residual serum.

2. The three proteins polyK-EGFP, polyK-4HexyF-EGFP and polyK-4OctyF-EGFP were diluted to a final concentration of 2 μM in serum-free Opti-MEM medium, added to the cells, and incubated at 37 °C for 2 hours. Membrane-bound protein was removed by washing three times with PBS containing 0.5 mg/mL heparin, and the distribution of the proteins in the cells was observed with a confocal microscope using the settings of example twelve. In addition, using the settings of example eleven, the efficiency with which the different protein variants entered the cells was quantified by flow cytometry.

Microscopic images showed that polyK-4HexyF-EGFP resembled polyK-EGFP, with fluorescent signal distributed in endosomes, indicating that these protein variants were still taken up into the cells by endocytosis. The fluorescence signal of polyK-4OctyF-EGFP was concentrated mainly on the membrane, further demonstrating that 4OctyF combined with polyK is a strong membrane-binding signal. Flow cytometric analysis showed that polyK-4OctyF-EGFP and polyK-4HexyF-EGFP were delivered into cells 3.1 and 2.4 times more efficiently than polyK-EGFP, respectively.

The sequence listing is as follows:

Seq ID 1:LipRS-1

ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATG

TCCCGTACCGGCACGCTGCACAAGATCAAGCACTATGAGATTTCTCGTTCT

AAAATCTACATCGAAATGGCGTGTGGTGACCATCTGGTTGTGAACAACTCT

CGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACCT

GCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTT

CTACCGAAGGCAAAACCTCTGTTAAAGTTAAAGTTGTTTCTGAGCCGAAAG

TGAAAAAAGCGATGCCGAAATCTGTTTCTCGTGCGCCGAAACCGCTGGAA

AATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCTC

CGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGCGCCCCAG

CTCTGACtaaatcccagacggaccgtctggaggtgctgctgaacccaaaggatgaaatctctctgaacagcggcaa

gcctttccgtgagctggaaagcgagctgctgtctcgtcgtaaaaaggatctgcaacagatctacgctgaggaacgcgagg

gtggcggaagcggcggcggaagccaggcctggggatcgaggcctcctgcagcagagtgtgccacccaaagagctcca

ggcagtgtggtggagctgctgggcaaatcctaccctcaggacgaccacagcaacctcacccggaaggtcctcaccagag

ttggcaggaacctgcacaaccagcagcatcaccctctgtggctgatcaaggagagggtgttggagcacttcaacaagcag

tatgtgggcagctctgggaccccgttgttctcggtctatgacaacctttcgccagtggtcacgacctggcagaactttgacag

cctgctcatcccagctgatcacccctgcaggaagaagggggacaactattacctgaatcggactcacatgctgagagcgc

acacgtccgcacacGGTtgggacttgctgcacgcgggactggatgccttcctggtggtgggtgatgtctacaggcgtga

ccagatcgactcccagcactaccctattttccaccagctgGATgccgtgcggctcttcaccaagcatgagttatttgctggt

ataaaggatggggaaagcctgcagctctttgaacaaagttctcgctctgcgcataaacaagagacacacaccatggaggc

cgtgaagcttgttgagtttgatcttaagcaaacgcttaccaggctcatggcacatctttttggagatgagccggagataaggt

gggtagactgctacttcccttttggacatccttcctttgagatggagatcaactttcatggagaatggctggaagttcttggctg

cggggtgGGTgaacaacaactggtcaattcagctggtgctcaagaccgaatcggctggggatttggcctagggttagaa

aggctagccatgatcctctacgacatccctgatatccgtctcttctggtgtgaggacgagcgcttcctgaagcagttctgtgta

tccaacattaatcagaaggtgaagtttcagcctcttagcaaa

Seq ID 2:LipRS-2

ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATG

TCCCGTACCGGCACGCTGCACAAGATCAAGCACTATGAGATTTCTCGTTCT

AAAATCTACATCGAAATGGCGTGTGGTGACCATCTGGTTGTGAACAACTCT

CGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACCT

GCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTT

CTACCGAAGGCAAAACCTCTGTTAAAGTTAAAGTTGTTTCTGAGCCGAAAG

TGAAAAAAGCGATGCCGAAATCTGTTTCTCGTGCGCCGAAACCGCTGGAA

AATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCTC

CGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGCGCCCCAG

CTCTGACtaaatcccagacggaccgtctggaggtgctgctgaacccaaaggatgaaatctctctgaacagcggcaa

gtggcggaagcggcggcggaagccaggcctggggatcgaggcctcctgcagcagagtgtgccacccaaagagctcca

ggcagtgtggtggagctgctgggcaaatcctaccctcaggacgaccacagcaacctcacccggaaggtcctcaccagag

acacgtctgcacacGGTtgggacttgctgcacgcgggactggatgccttcctggtggtgggtgatgtctacaggcgtga

ccagatcgactcccagcactaccctattttccaccagctgGAcgccgtgcggctcttcaccaagcatgagttatttgctggt

cggggtgGCTgaacaacaactggtcaattcagctggtgctcaagaccgaatcggctggggatttggcctagggttagaa

tccaacattaatcagaaggtgaagtttcagcctcttagcaaa

Seq ID 3:LipRS-3

ATGGATAAGAAGCCGCTGGATGTTCTGATCTCTGCGACCGGTCTGTGGATG

TCCCGTACCGGCACGCTGCACAAGATCAAGCACTATGAGATTTCTCGTTCT

AAAATCTACATCGAAATGGCGTGTGGTGACCATCTGGTTGTGAACAACTCT

CGTTCTTGTCGTCCCGCACGTGCATTCCGTTATCATAAATACCGTAAAACCT

GCAAACGTTGTCGTGTTTCTGACGAAGATATCAACAACTTCCTGACCCGTT

CTACCGAAGGCAAAACCTCTGTTAAAGTTAAAGTTGTTTCTGAGCCGAAAG

TGAAAAAAGCGATGCCGAAATCTGTTTCTCGTGCGCCGAAACCGCTGGAA

AATCCGGTTTCTGCGAAAGCGTCTACCGACACCTCTCGTTCTGTTCCGTCTC

CGGCGAAATCTACCCCGAACTCTCCGGTTCCGACCTCTGCAAGCGCCCCAG

CTCTGACtaaatcccagacggaccgtctggaggtgctgctgaacccaaaggatgaaatctctctgaacagcggcaa

gtggcggaagcggcggcggaagccaggcctggggatcgaggcctcctgcagcagagtgtgccacccaaagagctcca

ggcagtgtggtggagctgctgggcaaatcctaccctcaggacgaccacagcaacctcacccggaaggtcctcaccagag

cgtgctcatcccagctgatcacccctgcaggaagaagggggacaactattacctgaatcggactcacatgctgagagcgc

acacgtccgcacacGGTtgggacttgctgcacgcgggactggatgccttcctggtggtgggtgatgtctacaggcgtga

ccagatcgactcccagcactaccctattttccaccagctggacgccgtgcggctcttcaccaagcatgagttatttgctggtat

aaaggatggggaaagcctgcagctctttgaacaaagttctcgctctgcgcataaacaagagacacacaccatggaggccg

tgaagcttgttgagtttgatcttaagcaaacgcttaccaggctcatggcacatctttttggagatgagccggagataaggtgg

gtagactgctacataccttttggacatccttcctttgagatggagatcaactttcatggagaatggctggaagttcttggctgcg

gggtggctgaacaacaactggtcaattcagctggtgctcaagaccgaatcggctggggatttggcctagggttagaaagg

ctagccatgatcctctacgacatccctgatatccgtctcttctggtgtgaggacgagcgcttcctgaagcagttctgtgtatcc

aacattaatcagaaggtgaagtttcagcctcttagcaaa

Seq ID 4:SUMO-GLP1-K20TAG

ATGTCGGACTCAGAAGTCAATCAAGAAGCTAAGCCAGAGGTCAAGCCAGA

AGTCAAGCCTGAGACTCACATCAATTTAAAGGTGTCCGATGGATCTTCAGA

GATCTTCTTCAAGATCAAAAAGACCACTCCTTTAAGAAGGCTGATGGAAGC

GTTCGCTAAAAGACAGGGTAAGGAAATGGACTCCTTAAGATTCTTGTACGA

CGGTATTAGAATTCAAGCTGATCAGACCCCTGAAGATTTGGACATGGAGGA

TAACGATATTATTGAGGCTCACAGAGAACAGATTGGTGGATCCCATGGCGA

AGGCACCTTTACCAGCGATGTGAGCAGCTATCTGGAAGGCCAGGCGGCGta

gGAATTTATTGCGTGGCTGGTGAAA

Seq ID 5:Syn-GFP

atgGAGTACGAAtagGAATACGAGgccgaagcggctgcaaaagaggccgctgcaaaggaagctgcag

cgaaggctggtAAAggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaa

attttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacct

gttccatggccaacacttgtcactactttctcttatggtgttcaatgcttttcccgttatccgGACcacatgaaacggcatgac

tttttcaagagtgccatgcccgaaggttatgtacaggaacgcactatatctttcaaagatgacgggaactacaagacgcgtg

ctgaagtcaagtttgaaggtgatacccttgttaatcgtatcgagttaaaaggtattgattttaaagaagatggaaacattctcgg

acacaaactcgagtacaactataactcacacaacgtatacatcacggcagacaaacaaaagaatggaatcaaagctaactt

caaaattcgccacaacattgaagatggatccgttcaactagcagaccattatcaacaaaatactccaattggcgatggccct

gtccttttaccagacaaccattacctgtcgacacaatctgccctttcgaaagatcccaacgaaaagcgtgaccacatggtcct

tcttgagtttgtaactgctgctgggattacacatggcatggatgaactctacaaa

Seq ID 6:Syn-Neo-2/15

ATGGAGTACGAAtagGAATACGAGgccgaagcggctgcaaaagaggccgctgcaaaggaagctgca

gcgaaggctCCTAAAAAGAAAATCCAGCTGCACGCTGAACATGCACTGTATGAT

GCACTGATGATCCTGAATATCGTCAAAACCAACAGCCCGCCGGCAGAAGA

AAAACTGGAAGATTATGCATTTAACTTTGAACTGATCCTGGAAGAAATTGC

ACGTCTGTTTGAAAGCGGTGATCAGAAAGATGAAGCAGAAAAAGCAAAA

CGTATGAAAGAATGGATGAAACGCATTAAAACCACCGCAAGCGAAGATGA

ACAGGAAGAAATGGCAAATGCAATTATTACCATTCTGCAGAGCTGGATTTTT

AGT

Seq ID 7:polyK-TAG-EGFP

ATGgggaaaaagaagaaaaagaagtcaaagacaaagtagGGCGGAAGCGGCGGCAGCGTGAG

CAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGG

ACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGG

CGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAA

GCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCA

GTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTC

CGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACG

ACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTG

GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACAT

CCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCAT

GGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACA

ACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACC

CCCATCGGCgacGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCC

AGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG

CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTAC

AAG

Seq ID 8:Ub-XRP-G2TAG-EGFP

atgcagatcttcgtgaagactctgactggtaagaccatcaccctcgaggtggagcccagtgacaccatcgagaatgtcaag

gcaaagatccaagataaggaaggcattcctcctgatcagcagaggttgatctttgccggaaaacagctggaagatggtcgt

accctgtctgactacaacatccagaaagagtccaccttgcacctggtgctccgtctcagaggtggctagtgcttcttctccaa

gagacggaaggctgacaaggagtcgcggcccgagaacgaggaggagcggccaaagcagtacagctgggatcagcg

cgagaaggttgatccaaaagactacatgttcagtggactgaaggatgaaacagtaggtcgcttacctgggacggtagcag

gacaacagtttctcattcaagactgtgagaactgtaacatctatatttttgatcactctgctacagttaccattgatgactgtacta

actgcataatttttctgggacccgtgaaaggcagcgtgtttttccggaattgcagagattgcaagtgcacattagcctgccaa

caatttcgtgtgcgagattgtagaaagctggaagtctttttgtgttgtgccactcaacccatcattgagtcttcctcaaatatcaa

atttggatgttttcaatggtactatcctgaattagctttccagttcaaagatgcagggctaagtatcttcaacaatacatggagta

acattcatgactttacacctgtgtcaggagaactcaactggagccttcttccagaagatgctgtggttcaggactatgttcctat

acctactaccgaagagctcaaagctgttcgtgtttccacagaagccaatagaagcattgttccaatatcccggggtcagaga

cagaagagcagcgatgaatcatgcttagtggtattatttgctggtgattacactattgcaaatgccagaaaactaattgatgag

atggttggtaaaggctttttcctagttcagacaaaggaagtgtccatgaaagctgaggatgctcaaagggtttttcgggaaaa

agcacctgacttccttcctcttctgaacaaaggtcctgttattgccttggagtttaatggggatggtgctgtagaagtatgtcaa

cttattgtaaacgagatattcaatgggaccaagatgtttgtatctgaaagcaaggagacggcatctggagatgtagacagctt

ctacaactttgctgatatacagatgggaataGGAAGCGGCGGCAGCGTGAGCAAGGGCGAGG

AGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA

AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTA

CGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGC

CCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCC

GCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCG

AAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTAC

AAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCAT

CGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACA

AGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGC

AGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGAC

GGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCgac

GGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCT

GAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCG

TGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG

Seq ID 9:Ub-SVIP-G2TAG-EGFP

accctgtctgactacaacatccagaaagagtccaccttgcacctggtgctccgtctcagaggtggctagctgtgttttccttgt

cccggggagtccgcgcctcccacgccggacctggaagagaaaagagcaaagcttgcagaggctgcagagagaagac

aaaaagaggctgcatctcggggaattttagatgttcaatctgtgcaagaaaagagaaagaaaaaggaaaaaatagaaaaa

caaattgctacatccgggcccccaccagaaggtggacttaggtggacagtttcaGGAAGCGGCGGCAGCG

TGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAG

CTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGA

GGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCG

GCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGC

GTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTC

AAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAG

GACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACA

CCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGC

AACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTAT

ATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCG

CCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGA

ACACCCCCATCGGCgacGGCCCCGTGCTGCTGCCCGACAACCACTACCTGA

GCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATG

GTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGA

GCTGTACAAG

Seq ID 10:Ub-Lck-G2TAG-EGFP

accctgtctgactacaacatccagaaagagtccaccttgcacctggtgctccgtctcagaggtggctagtgtggctgcagct

cacacccggaagatgactggatggaaaacatcgatgtgtgtgagaactgccattatcccatagtcccactggatggcaagg

gcacgctgctcatccgaaatggctctgaggtgcgggacccactggttacctacgaaggctccaatccgccggcttcccca

ctgcaagacaacctggttatcgctctgcacagctatgagccctctcacgacggagatctgggctttgagaagggggaaca

gctccgcatcctggagcagagcggcgagtggtggaaggcgcagtccctgaccacgggccaggaaggcttcatccccttc

aattttgtggccaaagcgaacagcctggagcccgaaccctggttcttcaagaacctgagccgcaaggacgcggagcggc

agctcctggcgcccgggaacactcacggctccttcctcatccgggagagcgagagcaccgcgggatcgttttcactgtcg

gtccgggacttcgaccagaaccagggagaggtggtgaaacattacaagatccgtaatctggacaacggtggcttctacatc

tcccctcgaatcacttttcccggcctgcatgaactggtccgccattacaccaatgcttcagatgggctgtgcacacggttgag

ccgcccctgccagacccagaagccccagaagccgtggtgggaggacgagtgggaggttcccagggagacgctgaag

ctggtggagcggctgggggctggacagttcggggaggtgtggatggggtactacaacgggcacacgaaggtggcggt

gaagagcctgaagcagggcagcatgtccccggacgccttcctggccgaggccaacctcatgaagcagctgcaacacca

gcggctggttcggctctacgctgtggtcacccaggagcccatctacatcatcactgaatacatggagaatgggagtctagtg

gattttctcaagaccccttcaggcatcaagttgaccatcaacaaactcctggacatggcagcccaaattgcagaaggcatgg

cattcattgaagagcggaattatattcatcgtgaccttcgggctgccaacattctggtgtctgacaccctgagctgcaagattg

cagactttggcctagcacgcctcattgaggacaacgagtacacagccagggagggggccaagtttcccattaagtggaca

gcgccagaagccattaactacgggacattcaccatcaagtcagatgtgtggtcttttgggatcctgctgacggaaattgtcac

ccacggccgcatcccttacccagggatgaccaacccggaggtgattcagaacctggagcgaggctaccgcatggtgcgc

cctgacaactgtccagaggagctgtaccaactcatgaggctgtgctggaaggagcgcccagaggaccggcccacctttg

actacctgcgcagtgtgctggaggacttcttcacggccacagagggccagtaccagcctcagcctGGAAGCGGC

GGCAGCGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCT

GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCG

AGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGC

ACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGAC

CTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACG

ACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT

TCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAG

GGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGA

GGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACA

ACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCA

AGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTAC

CAGCAGAACACCCCCATCGGCgacGGCCCCGTGCTGCTGCCCGACAACCAC

TACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGA

TCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCAT

GGACGAGCTGTACAAG

Seq ID 11:Ub-Gαi1-G2TAG-EGFP

accctgtctgactacaacatccagaaagagtccaccttgcacctggtgctccgtctcagaggtggctagtgcacgctgagc

gccgaggacaaggcggcggtggagcggagtaagatgatcgaccgcaacctccgtgaggacggcgagaaggcggcgc

gcgaggtcaagctgctgctgctcggtgctggtgaatctggtaaaagtacaattgtgaagcagatgaaaattatccatgaagc

tggttattcagaagaggagtgtaaacaatacaaagcagtggtctacagtaacaccatccagtcaattattgctatcattaggg

ctatggggaggttgaagatagactttggtgactcagcccgggcggatgatgcacgccaactctttgtgctagctggagctg

ctgaagaaggctttatgactgcagaacttgctggagttataaagagattgtggaaagatagtggtgtacaagcctgtttcaac

agatcccgagagtaccagcttaatgattctgcagcatactatttgaatgacttggacagaatagctcaaccaaattacatccc

gactcaacaagatgttctcagaactagagtgaaaactacaggaattgttgaaacccattttactttcaaagatcttcattttaaaa

tgtttgatgtgggaggtcagagatctgagcggaagaagtggattcattgcttcgaaggagtgacggcgatcatcttctgtgta

gcactgagtgactacgacctggttctagctgaagatgaagaaatgaaccgaatgcatgaaagcatgaaattgtttgacagca

tatgtaacaacaagtggtttacagatacatccattatactttttctaaacaagaaggatctctttgaagaaaaaatcaaaaagag

ccctctcactatatgctatccagaatatgcaggatcaaacacatatgaagaggcagctgcatatattcaatgtcagtttgaaga

cctcaataaaagaaaggacacaaaggaaatatacacccacttcacatgtgccacagatactaagaatgtgcagtttgtttttg

atgctgtaacagatgtcatcataaaaaataatctaaaagattgtggtctctttGGAAGCGGCGGCAGCGTGA

GCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTG

GACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGG

GCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCA

AGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGC

AGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGT

CCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACG

ACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTG

GTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACAT

CCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCAT

GGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACA

ACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACC

CCCATCGGCgacGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCC

AGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTG

CTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTAC

AAG

Seq ID 12:mCherry-T2A-Kras4B-C185TAG-EGFP

atgGTGAGCAAGGGCGAGGAGGATAACATGATGGCCATCATCAAGGAGTTCA

TGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAG

ATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCA

AGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTG

TCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGAC

ATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGC

GTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTC

CCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACT

TCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCC

TCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAA

GCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAG

ACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGT

CAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGA

ACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGC

TGTACAAGggaagcggaGAGGGGAGAGGAAGTCTGCTAACATGCGGTGACGTC

GAGGAGAATCCTGGCCCAaaacataaagaaaagatgagcaaagatgggaaaaagaagaaaaagaagtc

aaagacaaagTAGGGCGGAAGCGGCGGCAGCGTGAGCAAGGGCGAGGAGCTG

TTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGG

CCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCA

AGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGG

CCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTAC

CCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGG

CTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGA

CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAG

CTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCT

GGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGA

AGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGC

AGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCgacGGC

CCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAG

CAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGA

CCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG

Seq ID 13:mCherry-TAG-C-EGFP

atgGTGAGCAAGGGCGAGGAGGATAACATGATGGCCATCATCAAGGAGTTCA

TGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAG

ATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCA

AGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTG

TCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGAC

ATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGC

GTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTC

CCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACT

TCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCC

TCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAA

GCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAG

ACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGT

CAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGA

ACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGC

TGTACAAGggaagcggaGAGGGGAGAGGAAGTCTGCTAACATGCGGTGACGTC

GAGGAGAATCCTGGCCCAGGCTAGTGCGGAAGCGGCGGCAGCGTGAGCA

AGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC

GGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCG

ATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGC

TGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGT

GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCG

CCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGAC

GGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGT

GAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCC

TGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGG

CCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAAC

ATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCC

CATCGGCgacGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCA

GTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGC

TGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACA

AG

Seq ID 14:mCherry-T2A-R7BP-C253TAG-EGFP

atgGTGAGCAAGGGCGAGGAGGATAACATGATGGCCATCATCAAGGAGTTCA

TGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAG

ATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCA

AGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTG

TCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGAC

ATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGC

GTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTC

CCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACT

TCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGAGGCC

TCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAA

GCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAG

ACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACAACGT

CAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGA

ACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGC

TGTACAAGggaagcggaGAGGGGAGAGGAAGTCTGCTAACATGCGGTGACGTC

GAGGAGAATCCTGGCCCAagttctgcaccgaatgggcgcaaaaagcgccccagccggtccacccgctc

ctcgatcttccagatcagcaagcccccgctgcagagcggagattgggagcgcaggggcagcggctccgagagcgccc

acaaaacccaacgagccctggacgactgcaagatgcttgtccaagagttcaacacacaagtggccctgtaccgagagctg

gtcatttctattggggatgtctcggtcagctgcccctcactccgggcggaaatgcacaagacaagaaccaaaggctgtgaa

atggcccgtcaggcacaccaaaaattggctgccatctcaggcccggaagatggtgagatccatccagaaatctgtcggctt

tacatccagctgcagtgctgcttagaaatgtataccacagagatgctaaaatccatatgtctgctggggtctcttcagtttcatc

gaaaaggaaaggaacctggcgggggaaccaagagtttggattgcaaaattgaggagagtgctgaaacacctgccctaga

agactcctcatcatcccccgtagatagtcagcaacattcctggcaggtttccacagacattgagaacactgaaagagacatg

agagaaatgaaaaaccttttaagcaaactcagggaaactatgcctttaccattgaaaaatcaagatgacagcagccttctga

atctaactccctaccccctggtgagaagacggaagagaaggttctttgggctgtgttagctcatctcaagcGGAAGCG

GCGGCAGCGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATC

CTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGG

CGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCT

GCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTG

ACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCA

CGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCAT

CTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCG

AGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAG

GAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCA

CAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTT

CAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACT

ACCAGCAGAACACCCCCATCGGCgacGGCCCCGTGCTGCTGCCCGACAACC

ACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGC

GATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGC

ATGGACGAGCTGTACAAG

Seq ID 15:mCherry-T2A-STREX-C13TAG-EGFP

atgGTGAGCAAGGGCGAGGAGGATAACATGATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTT

CGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACC

GCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACAT

CCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGC

CGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGA

GCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACT

CCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACC

AACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACCATGGGCTGGGA

GGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGA

TCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGT

CAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTGCCCGGCGCCTACA

ACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCG

TGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGAC

GAGCTGTACAAGggaagcggaGAGGGGAGAGGAAGTCTGCTAACATGCGGTG

ACGTCGAGGAGAATCCTGGCCCAaaggcctgtcatgatgacatcacagatcccaaaagaataaaaaa

atgtggctgcaaacggcccaagatgtccatctacaagagaatgagacgggcatgttagtttgattgcggacgttctgagcgt

gactgctcatgcatgtcaggccgtgtgcgtggtaacgtggacacccttgagagagccttcccactttcttctgtctctgttaat

gattgctccaccagtttccgtgccGGAAGCGGCGGCAGCGTGAGCAAGGGCGAGGAGCT

GTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACG

GCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGC

AAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTG

GCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTA

CCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAG

GCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAG

ACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGA

GCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC

TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGA

AGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGC

AGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCgacGGC

CCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAG

CAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGA

CCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG

The technical problems solved by the present invention, its technical solutions and its advantageous effects have been described in further detail in the above embodiments. It should be understood that the above embodiments are only illustrative of the present invention and are not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall be included in the scope of protection of the present invention.

Claims

1. A class of lipidated analogues capable of binding cell membranes and/or serum albumin, characterized by the general formula:

wherein, when R is H, n ranges from 1 to 8; when R is an aromatic group of arbitrary atomic composition, n ranges from 1 to 4.

2. A class of lipidated analogues capable of binding cell membranes and/or serum albumin according to claim 1, wherein the lipidated analogues are bifunctional structures comprising an aromatic ring and a linear fatty chain, and have the following structural characteristics:

wherein R1 is H, n ranges from 1 to 8, and R2 is any chemical coupling group.

3. A class of lipidated analogues capable of binding cell membranes and/or serum albumin according to claim 2, wherein the lipidated analogue is introduced onto a biological macromolecule via the chemical coupling group R2.

4. A class of lipidated analogues capable of binding cell membranes and/or serum albumin according to claim 1, wherein R is one or more of naphthyl, indolyl, phenyl, furanyl, imidazolyl, thiazolyl, thienyl, indenyl, benzothienyl, benzimidazolyl, benzofuranyl, benzothiazolyl, pyrrolyl, benzopyrrolyl, 2,3-indanyl, oxazolyl, benzoxazolyl, pyridyl and pyrimidinyl.

5. A computation-assisted virtual screening method for lipidated analogues that bind cell membranes and/or serum albumin, characterized in that the virtual screening targets a plurality of phenylalanine analogues, tryptophan analogues, lysine analogues and unnatural amino acids with aliphatic side chains, the virtual screening comprising the following aspects:

(1) Evaluating the hydrophobicity of the designed unnatural amino acid;

(2) Evaluating the affinity of the designed unnatural amino acid for serum albumin;

(3) Evaluating the likelihood of the designed unnatural amino acid being recognized by an orthogonal aminoacyl-tRNA synthetase.

6. Synthetic lipidated analogue molecules obtained by computational virtual screening, characterized in that they comprise the following molecules:

7. Use of the lipidated analogue that binds cell membranes and/or serum albumin according to claim 6, wherein the lipidated analogue is introduced into a biological macromolecule by genetic encoding, comprising the following steps:

S1, designing and constructing a chimeric phenylalanyl-tRNA synthetase (chPheRS) mutant library, and screening the library against the lipidated analogues;

S2, using the screened chimeric phenylalanyl-tRNA synthetase mutant to introduce the lipidated analogue site-specifically into a biological macromolecule in Escherichia coli or mammalian cells, thereby obtaining a biological macromolecule containing the lipidated analogue.
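
Site-specific introduction by genetic encoding is commonly achieved by placing a reassigned codon at the target position of the gene. The following is a minimal sketch, assuming that the amber stop codon (TAG) is the codon decoded by the orthogonal tRNA; the claims do not state which codon is used. The function name and the example coding sequence are hypothetical.

```python
def introduce_amber_codon(cds: str, residue_number: int, codon: str = "TAG") -> str:
    """Replace the codon at a 1-based residue position with a reassigned codon.

    Assumption: the coding sequence starts in frame at its first nucleotide,
    and TAG (amber) is the codon decoded by the orthogonal tRNA.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 1 <= residue_number <= n_codons:
        raise ValueError("residue_number is outside the coding sequence")
    start = (residue_number - 1) * 3
    # Swap exactly one codon; the rest of the sequence is left untouched.
    return cds[:start] + codon + cds[start + 3:]


# Invented toy sequence: Met-Gly-Cys-Val-stop.
example_cds = "ATGGGCTGCGTGTAA"
mutant_cds = introduce_amber_codon(example_cds, residue_number=3)
```

In this toy sequence the third codon (TGC) is replaced by TAG, so the lipidated analogue would be incorporated at residue 3 when a matching synthetase and tRNA pair is co-expressed.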

8. The use of the lipidated analogue that binds cell membranes and/or serum albumin according to claim 7, wherein the chimeric phenylalanyl-tRNA synthetase mutants that recognize the lipidated analogue comprise LipRS-1, LipRS-2 and LipRS-3, wherein the gene sequence of LipRS-1 is shown as Seq ID 1, the gene sequence of LipRS-2 is shown as Seq ID 2, and the gene sequence of LipRS-3 is shown as Seq ID 3.

9. The use of the lipidated analogue that binds cell membranes and/or serum albumin according to claim 8, wherein the use comprises the following aspects:

(1) Introducing the lipidated analogue into a biological macromolecule to confer serum albumin binding capacity, thereby extending the serum half-life of the biological macromolecule and enhancing its therapeutic effect;

(2) Introducing the lipidated analogue at a protein lipidation site, and studying the function of protein lipidation by a gain-of-function approach;

(3) Introducing the lipidated analogue into a biological macromolecule to confer cell membrane binding capacity and alter its subcellular localization;

(4) Introducing the lipidated analogue into a biological macromolecule to improve the efficiency of its delivery into cells.

10. The use of the lipidated analogue that binds cell membranes and/or serum albumin according to claim 9, wherein the biological macromolecule comprises a protein, a polypeptide, a cytokine, a chemokine, a growth factor, an enzyme, a protein hormone, a polypeptide hormone, or an active fragment of an antibody.

11. The use of the lipidated analogue that binds cell membranes and/or serum albumin according to claim 10, wherein the biological macromolecule comprises Neo-2/15, GFP, LCK, xrp1, SVIN, Gαi1, R7BP and STREX.