
REVIEWS

Deciphering protein post-translational modifications using chemical biology tools
Anne C. Conibear
Abstract | Proteins carry out a wide variety of catalytic, regulatory, signalling and structural
functions in living systems. Following their assembly on ribosomes and throughout their lifetimes,
most eukaryotic proteins are modified by post-translational modifications; small functional
groups and complex biomolecules are conjugated to amino acid side chains or termini, and
the protein backbone is cleaved, spliced or cyclized, to name just a few examples. These
modifications modulate protein activity, structure, location and interactions, and, thereby,
control many core biological processes. Aberrant post-translational modifications are markers
of cellular stress or malfunction and are implicated in several diseases. Therefore, gaining an
understanding of which proteins are modified, at which sites and the resulting biological
consequences is an important but complex challenge requiring interdisciplinary approaches.
One of the key challenges is accessing precisely modified proteins to assign functional
consequences to specific modifications. Chemical biologists have developed a versatile set of
tools for accessing specifically modified proteins by applying robust chemistries to biological
molecules and developing strategies for synthesizing and ligating proteins. This Review provides
an overview of these tools, with selected recent examples of how they have been applied to
decipher the roles of a variety of protein post-translational modifications. Relative advantages
and disadvantages of each of the techniques are discussed, highlighting examples where they are
used in combination and have the potential to address new frontiers in understanding complex
biological processes.

School of Biomedical Sciences, The University of Queensland, Brisbane, QLD, Australia.
e-mail: a.conibear@uq.edu.au
https://doi.org/10.1038/s41570-020-00223-8

Proteome
All the proteins present in a cell, tissue or organism at a given time.

Post-translational modifications
(PTMs). Covalent modifications to a protein after its assembly on the ribosome.

Proteoforms
Modified and/or processed forms of a protein arising from a single gene.

Modification and splicing events at the DNA, RNA and protein levels result in a far greater diversity and complexity of the proteome than would be predicted by the number of protein-coding sequences in the genome1,2. Part of this complexity arises from modifications to proteins during or after their assembly on the ribosome, known as post-translational modifications (PTMs). These chemical transformations increase the diversity of the proteome and alter protein structures, destinations, interactions and activities (Fig. 1). The backbone and side-chain functional groups of newly synthesized proteins serve as chemical synthons for a host of enzymatic, non-enzymatic and self-catalysed reactions that range from addition of small chemical functionalities, complex biomolecules or polypeptides to cleavage, side-chain and backbone cyclization and splicing1. Many PTMs are dynamic, enabling cells and organisms to respond rapidly to changes in their environments; PTM addition or removal can alter protein function without the need for degradation and de novo synthesis. Our understanding of how PTMs alter protein functions in core cellular processes is increasing, as is our appreciation of the consequences of misregulation of PTMs in disease3. Deciphering the roles of PTMs, therefore, has significant implications for prevention, diagnosis and treatment of disease, as well as for synthetic biology and protein engineering.

The number and variety of documented protein PTMs are continually expanding due to the development of increasingly sensitive analytical techniques and deeper analysis of proteomic data4,5. In parallel, there is growing appreciation for temporal and spatial variation in PTMs, with proteins being differently modified at points in their life cycle or in subcellular locations, tissues or organs. Effects of combinations of PTMs on a protein (PTM ‘crosstalk’) add to the complexity. However, the number of proteoforms observed appears to be far lower than even a conservative estimate of possible variants2. This discrepancy points to limitations in the analytical tools available but also indicates precise

Nature Reviews | Chemistry



regulation of addition and removal of PTMs that we still understand poorly. Gaining an accurate picture of proteome diversity will, therefore, require a system-wide perspective, as well as atomic and spatio-temporal resolution on individual PTM sites.

[Figure 1 graphic. Recoverable labels: DNA, Transcription, mRNA, Translation, Protein, Post-translational modifications, Structure, Function, Activity, Location, Interactions.]

Fig. 1 | Protein post-translational modifications diversify the proteome. Modifications at the DNA, RNA and protein levels increase the number of proteoforms coded for by a single gene. This Review focuses on chemical biology tools to access proteins bearing site-specific post-translational modifications to decipher their roles in regulating protein location, structure, function, activity and interactions (the protein illustrated is ubiquitin bearing hypothetical post-translational modifications, Protein Data Bank (PDB): 1UBQ).

Classification of PTMs by the type of functional group added or amino acid side chain modified reveals common chemical and enzyme mechanisms, and has been traditionally used to structure reviews on PTMs1. The prevalence and relative ease of identification of PTMs that fall into these classes (phosphorylation, acetylation, glycosylation, methylation and ubiquitylation) have led to these PTMs being relatively well studied. Nevertheless, many questions remain and our ability to identify PTMs far outpaces our ability to decipher their roles. In the last decade, many lesser studied and non-enzymatic PTMs have also been recognized to have key roles in protein regulation and disease6. In this Review, we focus on chemical biology tools to access proteins bearing specific PTMs, mostly those involving addition of functional groups and biomolecules to amino acid side chains. Examples illustrating a wide, but non-exhaustive, variety of PTMs have been selected to demonstrate the applicability of the tools discussed to many PTMs and readers are directed to the cited reviews for an overview of PTM classes1,6. Protein splicing, disulfide-bond formation, cyclization and N-terminal and C-terminal processing are not covered here, but are also important side-chain and backbone modifications1. Indeed, some of these PTMs, such as intein splicing for protein semi-synthesis7, as we see later, have even been harnessed by chemical biologists as tools to access proteins bearing PTMs.

Chemical biology has provided us with tools to identify protein PTMs and to access precisely modified proteins to study their effects in cellular processes and disease8,9. Many of these tools rely on a relatively small set of robust and versatile bioconjugation reactions that can be carried out on unprotected proteins under mild conditions10–13. Although they use many of the same chemistries, studies focused on identifying PTMs and studies focused on accessing site-specifically modified proteins have tended to be separate, and the scope of this Review is limited to the latter, with a very brief overview of PTM identification methods14 given in Box 1. This Review summarizes modern chemical biology tools available to access homogeneous, site-specifically modified proteins bearing PTMs, that is, proteins comprising a single proteoform in which each protein molecule in the sample bears the same modification(s) at the same location(s). Rather than a comprehensive historical review of the development of these tools, recent examples from the last five years demonstrate how application of genetic-code expansion, protein (semi-)synthesis, enzymatic modification and chemoselective chemistries have provided unique insights into the role of PTMs in fundamental biological processes. In particular, we highlight studies in which combinations of these tools have been used because these studies demonstrate how their complementary advantages can be harnessed. Finally, some remaining challenges are discussed, as well as opportunities for collaborating across disciplines to address new frontiers.

PTM sites
Specific amino acid residues bearing one or more post-translational modifications (PTMs).

Canonical amino acids
The 20 standard amino acids encoded in the genetic code and incorporated into proteins by endogenous protein biosynthesis processes.

Aminoacyl-tRNA synthetase
(aa-tRNA synthetase). Enzyme that loads an amino acid onto tRNA bearing the respective anticodon for that amino acid.

Accessing proteins bearing PTMs
For many PTMs identified using the tools mentioned in Box 1 and annotated in protein databases15, their roles in the structures, functions and interactions of proteins are poorly understood. To be able to unravel which PTMs at which sites give rise to observed biological activities, we need to access distinct proteoforms that bear defined PTMs. As PTMs may occur sub-stoichiometrically and the modified proteins themselves have low abundance, isolation of proteins bearing specific PTMs from natural sources on a preparative scale is usually impractical and would result in heterogeneous mixtures. Therefore, tools to make and modify proteins with atomic resolution are key to structural studies and in vitro and in vivo assays to understand the roles of PTMs. Chemical biology tools for accessing these proteins employ the native protein-synthesis machinery, total chemical synthesis and enzymatic modifications, with several strategies combining these tools. Each strategy has its strengths and limitations, as illustrated by selected examples that cover a variety of large and small, simple and complex PTMs.

Genetic-code expansion
To synthesize proteins, nature typically uses a limited set of 20 amino acids, the canonical amino acids, encoded by triplet codons in the genetic code. Information contained in the DNA genetic code is transcribed into mRNA, which is ‘read’ by tRNAs. These tRNAs are aminoacylated with their respective amino acid by aminoacyl-tRNA synthetase (aa-tRNA synthetase) enzymes and read the corresponding codon on the mRNA via base-pairing interactions (Fig. 2a). Central to this translational machinery, ribosomes order the

www.nature.com/natrevchem

components for sequential decoding and catalyse formation of amide bonds to form the translated polypeptide chain. In nature, proteins are typically modified after their translation on the ribosome, but our understanding of the translational machinery allows us to hijack it to genetically encode and install PTMs during, instead of after, translation. This powerful strategy for accessing modified proteins involves either direct incorporation of the post-translationally modified amino acid of interest or incorporation of a non-canonical amino acid bearing a bioorthogonal handle16,17.

Bioorthogonal handle
Functional group that is not found in biological systems, allowing chemical reactions to be carried out in complex mixtures of biomolecules without affecting native processes.

Box 1 | Identifying PTMs in the proteome
Identifying post-translational modification (PTM) sites on proteins is made possible by the sensitivities of mass spectrometry and fluorescence-based techniques4,5. Combining bioinformatic techniques, chemical biology tools and mass-spectrometry-based proteomics has resulted in identification of hundreds of thousands of post-translationally modified protein sites and has far outpaced our understanding of the functions and regulation of these PTMs. A very brief outline of the most commonly used tools is given below.
Mass-spectrometry-based proteomics (see the figure, panels a and b) typically involves sequencing of protein digests (grey wavy lines) using liquid chromatography coupled to mass spectrometry (LC-MS/MS)4,5,172. Deviations in the mass of modified amino acid residues from those of their unmodified counterparts (blue peaks) are matched to databases and used to identify PTMs (blue circles, red squares or green triangles). Prior enrichment of proteins or peptides bearing a PTM of interest from the complex cellular milieu facilitates analysis and leverages the chemical properties of PTMs; functional groups of the PTM react with or bind to an immobilized surface and unmodified fragments are washed away (see the figure, panel c). Other enrichment techniques involve chemical transformation of the PTM to a chemical handle to which an affinity tag or fluorescent label can be conjugated14,173, for example, β-elimination of phosphoserine, yielding dehydroalanine, to which a thiol-functionalized probe can be conjugated by Michael addition. Alternatively, in the metabolic labelling strategy, the cellular machinery installs a PTM analogue bearing a bioorthogonal handle instead of the natural PTM, allowing for subsequent conjugation to an affinity tag or fluorescent label174. Finally, antibody-based detection or enrichment strategies are widely used (see the figure, panel d). A PTM-specific antibody binds to a PTM or PTM-bearing protein sequence, followed by immunoprecipitation or labelling with a fluorescent tag or secondary antibody. Many antibodies are commercially available, but their degree of specificity varies greatly; some are specific for the PTM, independent of flanking residues, while others also recognize sequence context.
[Box 1 figure: panels a–d. Recoverable labels: fragment ions b2–b5 and y3–y7 on an m/z axis.]

Residue-specific incorporation of non-canonical amino acids involves incorporation of a non-canonical amino acid at every position where a particular canonical amino acid is encoded in the genome. Such a proteome-wide exchange requires an auxotrophic bacterial strain whose growth depends on an externally supplied amino acid. Placing these strains under selective pressure and supplying them with an analogue of a canonical amino acid forces them to take up the non-canonical amino acid. In the absence of its native amino acid substrate, the natural translational machinery incorporates the non-canonical amino acid at sites coding for the corresponding native residue18,19. Although use of engineered aa-tRNA synthetases can increase incorporation of non-canonical amino acids, this so-called selective pressure incorporation has limited applications to understanding PTMs, as the non-canonical residue is incorporated indiscriminately throughout the proteome.

Genetic-code expansion allows for site-specific incorporation of a non-canonical amino acid at a selected position in a particular protein of interest, and, therefore, has great value for deciphering PTMs. The strategy makes use of the 64-codon genetic code, in which 61 codons code for canonical amino acids and the remaining three codons are ‘stop’ codons. Typical genetic-code-expansion strategies use one of these stop codons, usually the UAG ‘amber’ stop codon, to code for a desired non-canonical amino acid. An orthogonal aa-tRNA synthetase/tRNA pair charges the tRNA bearing the respective anticodon with the desired non-canonical amino acid, which is then incorporated into the growing peptide chain on the ribosome at the position of the repurposed stop codon16–18,20 (Fig. 2a). A critical step in the process is engineering an orthogonal aa-tRNA synthetase/tRNA pair so that only the cognate tRNA is aminoacylated with the desired non-canonical amino acid and there is no cross-reaction with other amino acids or tRNAs in the given organism. This requirement is often a limiting factor in genetic-code expansion. Strategies to overcome this bottleneck include chemical aminoacylation of the tRNA and the ‘flexizyme’ system for flexible in vitro translation. In the latter, a tRNA is aminoacylated in vitro with a non-canonical amino acid using a catalytic RNA ‘flexizyme’21, circumventing the need to develop orthogonal aa-tRNA synthetase/tRNA pairs for each non-canonical amino acid. Genetic-code expansion can now be applied in bacterial, yeast, insect and mammalian cells, and has even been extended to multicellular organisms: Caenorhabditis elegans, fruit flies, mice, plants and zebrafish22. The technique has been used to incorporate functional groups for protein crosslinking and labelling, as well as PTMs and bioorthogonal handles for attaching PTMs17,18,23.
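
The codon reassignment described above can be illustrated with a short sketch. The following Python snippet is a toy model only: it builds the standard 64-codon table, then translates an mRNA either with normal decoding (UAG terminates) or with the UAG ‘amber’ codon reassigned to a non-canonical residue. The function name `translate`, the parameter `amber_residue`, the label `AcK` (standing in for acetyllysine) and the example mRNA are all hypothetical choices made for illustration, and the model assumes idealized, complete suppression with no competition from release factors.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering
# ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, amber_residue=None):
    """Translate an mRNA from its first codon, returning a list of residues.

    amber_residue=None models standard decoding, where UAG terminates.
    Any other value models amber suppression: UAG inserts that label
    (an idealized assumption of 100% suppression efficiency).
    """
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG" and amber_residue is not None:
            residues.append(amber_residue)  # repurposed stop codon
            continue
        aa = STANDARD_CODE[codon]
        if aa == "*":
            break  # UAA, UGA, or an unsuppressed UAG ends translation
        residues.append(aa)
    return residues


# Hypothetical reading frame: Met-Ala-(UAG)-Lys-(UAA stop)
toy_mrna = "AUGGCUUAGAAAUAA"
truncated = translate(toy_mrna)
full_length = translate(toy_mrna, amber_residue="AcK")
```

In this toy frame, standard decoding stops at the UAG codon and yields a truncated product, whereas the suppressed frame reads through to the UAA codon and carries the non-canonical residue at the chosen site. Real systems fall between these two extremes because suppression is incomplete.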

Direct installation of small PTMs. Genetic-code expansion is best suited to accessing proteins bearing small PTMs, because amino acids bearing small modifications are more likely than those bearing large modifications to be accepted by aa-tRNA synthetase enzymes. The pyrrolysyl-tRNA synthetase enzyme from certain


[Figure 2 graphic, panels a–e. Recoverable labels:
a: amino acid, tRNA, aminoacylation, aa-tRNA synthetase, codon, anticodon, mRNA, base pairing (decoding), ribosome, polypeptide (protein); non-canonical amino acid, orthogonal aa-tRNA synthetase, UAG stop codon, site-specifically modified protein.
b: pyrrolysine, acetyllysine, Nε-Boc lysine, methyllysine, dimethyllysine, methylarginine.
c: phosphoserine, phosphonomethylene alanine, phosphothreonine, phosphotyrosine, sulfotyrosine, 3-nitrotyrosine.
d: azidonorleucine, traceless Staudinger ligation, histone H3.
e: Nε-(propargyloxycarbonyl)-L-lysine, azide–alkyne click conjugation, EPO; glycan key: GlcNAc, Man, Gal, Neu5Ac (sialic acid).]

methanogenic bacteria aminoacylates an amber suppressor tRNA with the rare amino acid pyrrolysine and has proven to be a valuable starting point for engineering orthogonal aa-tRNA synthetase/tRNA pairs. In early work on genetic-code expansion, for example, the pyrrolysyl-tRNA synthetase/tRNA pair was engineered to incorporate acetyllysine into proteins expressed in Escherichia coli24 (Fig. 2b). Methyllysine, despite its small size, proved more challenging, as most synthetases that recognize methyllysine also recognize lysine. Instead, a Nε-tert-butyloxycarbonyl-protected methyllysine precursor was incorporated into proteins and deprotected after translation25 (Fig. 2b). A photocaged Nε-o-nitrobenzyl-oxycarbonyl-Nε-methyllysine was also developed and incorporated in mammalian cells, enabling in vitro deprotection26. Dimethyllysine required


yet another strategy: Nε-tert-butyloxycarbonyl-protected lysine was incorporated into proteins by genetic-code expansion and then all other free amines were protected with Nε-benzyloxycarbonyl groups. The Nε-tert-butyloxycarbonyl-protected lysine was then selectively deprotected, dimethylated by reductive amination and, finally, the other Nε-benzyloxycarbonyl-protected lysines were deprotected27. So far, trimethyllysine has not been incorporated into proteins by genetic-code expansion. Incorporation of methylarginine was achieved with an in vitro translation system in which a yeast arginyl-tRNA synthetase aminoacylated a mutated yeast tRNA bearing a four-base anticodon28.

Fig. 2 | Genetic-code expansion. a | Protein synthesis on the ribosome where mRNA codons are ‘decoded’ by tRNA molecules pre-charged with their cognate amino acid by aminoacyl-tRNA (aa-tRNA) synthetases. Nature typically uses the 20 canonical amino acids (grey circles) encoded by the 64-codon genetic code. In genetic-code expansion, a tRNA (often decoding the UAG stop codon) is charged with a non-canonical amino acid (blue star) by an orthogonal aa-tRNA synthetase. The non-canonical amino acid is then incorporated into the growing peptide chain on the ribosome. b,c | Examples of non-canonical amino acids bearing native post-translational modifications that have been incorporated into proteins via genetic-code expansion, either directly or indirectly (methyllysine and dimethyllysine). d | Incorporation of an azide-bearing, non-canonical amino acid into histone H3 provided a bioorthogonal handle, which was treated with phosphinothioester reagents to access native lysine acylation post-translational modifications40. e | Incorporation of an alkyne-bearing, non-canonical amino acid into erythropoietin (EPO) provided a bioorthogonal handle to which azide-functionalized sialyl oligosaccharides could be conjugated via azide–alkyne click chemistry46.

Directed evolution
Selection of a protein or nucleic acid with a desired trait by iterative cycles of genetic diversification, library screening and replication of functional variants.

Histone
One of several proteins that associate with DNA in eukaryotic nuclei and help to package it into chromatin.

A system for incorporation of phosphoserine was derived from an aa-tRNA synthetase/tRNA pair that incorporates phosphoserine as a precursor to cysteine in certain methanogenic bacteria. In the first study using this system to incorporate phosphoserine, directed evolution of the aa-tRNA synthetase/tRNA pair and the elongation factor EF-Tu binding pocket were necessary for efficient amber-codon suppression29. Subsequently, phosphoserine incorporation without EF-Tu modification was demonstrated, with efficiency being increased by optimizing the sequence surrounding the repurposed stop codon30. A non-hydrolysable analogue of phosphoserine, phosphonomethylene alanine (Fig. 2c), was also incorporated30. As we discuss later, such PTM mimics have the advantage that they are not removed by phosphatases and, so, can be used to study phosphoprotein binders and protein regulation by phosphorylation in complex mixtures. Genetic-code expansion has since been extended to include phosphothreonine31 and phosphotyrosine32–34 (Fig. 2c).

Recent efforts have focused on genetic-code expansion in mammalian cells, as illustrated by two examples of further tyrosine PTMs, sulfotyrosine and 3-nitrotyrosine (Fig. 2c). The Methanocaldococcus jannaschii tyrosyl-tRNA synthetase/tRNA pair was used in earlier work on incorporation of sulfotyrosine into proteins by genetic-code expansion in E. coli35, but is not orthogonal in eukaryotic cells. Using an ‘altered translational machinery’ approach for directed evolution of E. coli-derived aa-tRNA synthetase/tRNA pairs in E. coli, a new orthogonal aa-tRNA synthetase/tRNA pair was developed that incorporated sulfotyrosine in both E. coli and mammalian cells36. This system was used to express human heparin cofactor II, a large secreted glycoprotein, with tyrosine sulfation at either or both Tyr60 and Tyr73 sites in mammalian cells36. Expression in mammalian cells also allowed for endogenous N-glycation, which would not be possible with bacterial expression, as bacteria lack the required trans-Golgi processing machinery. To incorporate nitrotyrosine, a Methanosarcina barkeri aa-tRNA synthetase/tRNA pair was engineered and the system was further optimized to increase the efficiency of nitrotyrosine incorporation at lower nitrotyrosine concentrations, which are compatible with mammalian cell culture37. Nitrotyrosine was then incorporated site specifically at Tyr34 in manganese superoxide dismutase and Tyr130 in 14-3-3 proteins in HEK293T cells37. The ability to site specifically incorporate nitrotyrosine via genetic-code expansion is significant because this PTM is a biomarker of oxidative stress in a number of human diseases. It is not installed by enzymes but results from the reaction of proteins with reactive oxygen and nitrogen species, a reaction that would be difficult to control site selectively using chemical-oxidation methods.

Incorporating large and complex PTMs. Large and complex PTMs are more difficult to incorporate directly into proteins than small PTMs using genetic-code expansion because they are unlikely to be accepted as substrates of aa-tRNA synthetases. Instead of developing an aa-tRNA synthetase/tRNA pair for each PTM, a common strategy is to encode a non-canonical amino acid bearing a bioorthogonal handle that can be used to attach a PTM or mimic18. In later sections, we discuss further examples of PTM mimics and bioorthogonal reactions because bioorthogonal handles can be introduced into proteins using all of the techniques discussed in this Review. Nevertheless, two examples are described here to illustrate how genetic-code expansion has been used to incorporate bioorthogonal handles to attach PTM mimics.

Although acetyllysine and 2-hydroxyisobutyryl lysine can be directly incorporated into proteins by genetic-code expansion as described above24,38,39, a bioorthogonal conjugation strategy was devised that can, in principle, be used to install acyllysine PTMs of any size40. Azidonorleucine was incorporated site specifically into proteins in E. coli using amber-codon suppression and a pyrrolysyl-tRNA synthetase/tRNA pair, providing a bioorthogonal handle that could be reacted with a range of phosphinothioester reagents using the traceless Staudinger ligation (Fig. 2d). This amide-forming ligation, which we will meet again for protein ligation, enables the formation of an amide from an azide and a phosphinothioester41,42. To demonstrate the strategy, azidonorleucine was incorporated into histone H3 and reacted with acetyl-phosphinothioester or succinyl-phosphinothioester, yielding H3 bearing acetyllysine or succinyllysine PTMs40. A drawback, however, was that the Staudinger ligation reaction was not quantitative and the acylated proteins could not be separated from their unmodified counterparts. Whereas use of the Staudinger ligation in this example gave access to proteins bearing native acyllysine PTMs, many bioorthogonal ligations result in non-native linkages. One of the most widely used of these is azide–alkyne click ligation, which produces a triazole linkage43,44.


Both azide-bearing and alkyne-bearing non-canonical amino acids can be incorporated site specifically into proteins using genetic-code expansion, allowing for conjugation of large PTMs such as glycans and ubiquitin45,46. A recent study employed this strategy for site-specific glycosylation of erythropoietin (EPO). EPO is a 166-residue glycoprotein hormone with a central role in the development of red blood cells. Endogenous and recombinant EPO comprise inseparable heterogeneous glycoforms with differing activities and stabilities, and a variety of chemical biology tools has been used to access homogeneous EPO glycoforms in order to study their activities and guide therapeutic applications47. In this example, an alkyne-functionalized non-canonical residue Nε-(propargyloxycarbonyl)-L-lysine was incorporated into recombinant EPO using genetic-code expansion, followed by azide–alkyne click chemistry to ligate azido-functionalized complex branched N-glycans via a triazole linker46 (Fig. 2e).

Advantages, disadvantages and challenges. Genetic-code-expansion strategies have the advantage that site-specifically modified proteins can be generated in a biological context and take advantage of endogenous processing and folding machineries. This can be useful for PTMs that are unstable to the conditions used for chemical protein synthesis and for proteins that cannot be easily refolded from denaturing conditions. Being able to incorporate bioorthogonal handles into proteins in live cells also provides the opportunity to control and image specific proteins in a cellular context. Genetic-code expansion can be used to incorporate a non-canonical amino acid at any site, regardless of protein size, and uses standard molecular biology facilities accessible to many laboratories. There is a growing range of orthogonal aa-tRNA synthetase/tRNA pairs and other aminoacylation techniques available, and efforts to engineer ribosomes show promise for expanding the variety of non-canonical amino acids that can be incorporated48. Other exciting new developments include incorporation of non-canonical amino acids into phage display libraries49 and localizing genetic-code expansion to specific subcellular locations, enabling genetic-code expansion for only selected mRNAs50. In the latter study, membrane-less artificial organelles for protein translation were assembled by tethering an aa-tRNA synthetase/tRNA pair and a specific mRNA-binding domain to proteins that form phase-separated droplets in cells.

Disadvantages of the genetic-code-expansion tool include the need to develop an orthogonal aa-tRNA synthetase/tRNA pair for each new non-canonical amino acid. This can be difficult, particularly for bulky PTMs, and additional optimization might be needed for expression in different organisms. Genetic-code expansion also relies on the ability of the non-canonical amino acid to enter cells in sufficient amounts and to be metabolically stable in cells. The former challenge was encountered for phosphotyrosine but was overcome by using a protected phosphotyrosine analogue, which was able to enter cells, was incorporated into proteins and was then deprotected by a pH shift after incorporation33. An alternative solution was to use dipeptides that are transported into the cell and then cleaved by peptidases34. The ability to produce site-specifically modified proteins in biological systems has several advantages, as noted above, but can also have disadvantages. Uncontrolled activity of endogenous enzymes that remove PTMs, such as phosphatases and deacetylases, might compromise the homogeneity and yield of modified protein, and inhibition of these enzymes might be necessary.

One of the biggest challenges for genetic-code-expansion strategies is incorporation of multiple different PTMs, due to the limited number of available codons and orthogonal aa-tRNA synthetase/tRNA pairs17,23. Current efforts to overcome this challenge focus on several strategies looking beyond triplet codons written in a four-nucleotide genetic code. Quadruplet codons, which result in a +1 frameshift, have been employed51,52, as have unnatural base pairs that pair and are decoded via hydrophobic packing interactions, rather than hydrogen bonds53,54. Additional codons can also be made available by artificial division of codon boxes, in which redundant codons are decoded by tRNAs pre-charged with non-canonical amino acids via a ‘flexizyme’ in a cell-free system55. Recoding of the E. coli genome to create a strain with a 61-codon genome has also been achieved by genome-wide replacement of two serine codons and a stop codon, potentially increasing the number of codons available for incorporation of non-canonical amino acids56. Another factor that limits incorporation of multiple PTMs, even of the same type, is competition for the UAG stop codon in E. coli between the orthogonal tRNA and release factor one (RF1). This leads to premature termination and increases with the number of suppressed codons. In a bold approach to removing this competition for the UAG stop codon, an E. coli strain was genetically reprogrammed by replacing all instances of the UAG stop codon in the genome with the UAA stop codon, thereby enabling deletion of RF1 and removing competition for the UAG stop codon57. An alternative recent strategy involved inhibition of RF1 with antimicrobial peptides to enhance non-canonical amino acid incorporation at multiple sites58. The challenge of incorporating multiple different PTMs via genetic-code expansion is currently being tackled by several groups, resulting in rapid developments in this area. For example, two different non-canonical amino acids (acetyllysine and a p-benzoylphenylalanine photoaffinity probe) were incorporated into histone H3 using two mutually orthogonal aa-tRNA synthetase/tRNA pairs to capture proteins that ‘read’ lysine acetylation PTMs39. Recently, three different non-canonical amino acids have been genetically encoded by engineering three new pyrrolysyl-tRNA synthetase/tRNA pairs59. Reports such as these indicate that we will soon see demonstrations of using genetic-code expansion to understand the crosstalk of multiple PTMs in a cellular context.

Chemical synthesis and semi-synthesis
Combining solid-phase peptide synthesis (SPPS) chemistry with amide-bond-forming ligations (Fig. 3) brings the versatility of organic chemistry into the realm of protein science and enables protein synthesis independent of the translational machinery, unconstrained by the


genetic code, aa-tRNA synthetase/tRNA specificities or ribosome capabilities. SPPS involves iterative steps of coupling N-protected amino acids to a growing peptide chain, anchored to a solid support60 (Fig. 3a). Single or multiple PTMs can be installed site-specifically using the many commercially available protected amino acid building blocks bearing PTMs or chemoselective methods for modifying amino acids61–63. SPPS has conventionally been limited to peptides of approximately 50 residues, although new and more efficient synthesis technologies, such as automated flow chemistry64, challenge this traditional boundary and have been used to synthesize peptide chains of up to 164 residues within hours64.

A breakthrough in protein chemistry came with the use of amide-bond-forming reactions that ligate unprotected protein segments (Fig. 3b). Native chemical ligation (NCL)65 has been most broadly used and, along with α-ketoacid-hydroxylamine (KAHA) ligation66 and serine/threonine (Ser/Thr) ligation67, has provided access to synthetic proteins bearing site-specific PTMs, as well as numerous other modifications and tags12. A PTM in itself, the natural protein-splicing mechanism of intein splicing interfaces neatly with synthetic protein-ligation strategies and has been harnessed to further extend the scope of protein chemistry7,68,69. Illustrated by selected recent examples, we see how these tools have developed to access a plethora of site-specifically modified proteins with atomic precision.

NCL and derivatives. NCL takes place between a protein segment bearing a C-terminal thioester and a protein segment bearing an N-terminal cysteine. As shown in Fig. 3b, transthioesterification forms a branched thioester intermediate, which rearranges irreversibly by S→N acyl shift to a peptide bond65. Several extensions to NCL that allow for ligation at non-cysteine junctions, faster ligations using selenium chemistry, and N-terminal and C-terminal protection strategies that enable sequential ligation of multiple protein segments from either direction have been developed and reviewed recently68,70,71. As the protein length and number of ligation steps increase, difficulties such as decreasing ligation rates and poor solubility can accumulate. These challenges have motivated development of faster ligation methods, auxiliaries, solubility and purification tags, and one-pot sequential ligations, expanding the scope of NCL and providing access to site-specifically modified proteins that are inaccessible by other means68,71. Complex glycoproteins with native linkages are a classic target, exemplified by impressive syntheses of homogeneously glycosylated EPO47. Such ambitious targets required considerable synthetic effort over many years but led to development of valuable new synthesis and ligation tools72–76.

Glycoproteins
Proteins that have one or more oligosaccharide chains covalently attached to an amino acid side chain.

The extent to which chemical protein-synthesis tools have developed is clear when comparing a 1994 report of the chemical synthesis of 76-residue (8.5-kDa) protein PTM ubiquitin77 with a 2019 report describing the chemical synthesis of 472-residue (53-kDa) tetra-ubiquitylated α-globin bearing three different labels, the largest chemically synthesized protein to date78,79. In the latter synthesis shown in Fig. 4a, a convergent strategy was chosen, making use of several new tools: (1) isopeptide ligation using δ-mercaptolysine, which behaves similarly to an N-terminal cysteine in NCL and can be desulfurized to yield a native lysine side chain; (2) palladium-mediated deprotection of δ-mercaptolysine; (3) efficient SPPS of ubiquitin monomers employing several dipeptide building blocks; (4) one-pot ligation and desulfurization. The proximal and distal ubiquitin chains were labelled with Myc and Flag tags, respectively, and α-globin was labelled with an N-terminal HA tag so that the fate of the different components could be monitored during proteasomal degradation. The distal Flag-Ub bearing a C-terminal thioester was ligated via an isopeptide bond to a Ub moiety bearing δ-mercaptolysine at position 48 and a C-terminal hydrazide that acts as a thioester precursor80. Oxidation and thiolysis of this hydrazide to generate a thioester then allowed for ligation to the third Ub moiety, also furnished with δ-mercaptolysine at position 48 and a C-terminal hydrazide. One-pot conversion of the hydrazide yielded the Flag-tri-Ub-thioester building block. The second building block, Myc-Ub-HA-α-globin, was assembled by ligating HA-α-globin(2–82)-thioester to α-globin(83–142) bearing protected δ-mercaptolysine at position 105, which was deprotected after ligation in one pot using [Pd(allyl)Cl]2. The resulting HA-α-globin(2–142) was then ligated via an isopeptide bond to the proximal Myc-Ub-thioester, followed by deprotection of the δ-mercaptolysine at position 48. Finally, the two building blocks Myc-Ub-HA-α-globin and Flag-tri-Ub-thioester were ligated and desulfurized to yield the desired Flag-Ub-Ub-Ub-Myc-Ub-HA-α-globin, which was folded and purified. This tetra-Ub-α-globin construct was used to determine the fate of Ub moieties in proteasomal degradation studies, helping to decipher the signalling mechanisms of this important PTM78,79,81.

Once a ligation strategy has been designed and optimized, the modular nature of chemical protein synthesis means that substitution of protein segments in a mix-and-match fashion allows rapid generation of several protein variants bearing different PTMs. One of many studies illustrating this concept describes the generation of a library of tick-derived thrombin inhibitors bearing different tyrosine sulfation patterns82,83. A library of 34 proteins bearing sulfotyrosine PTMs was generated using one-pot diselenide–selenoester ligation. This derivative of NCL, in which selenocystine (oxidized selenocysteine) or a selenol-derived amino acid and an aryl selenoester replace the cysteine and thioester functionalities of NCL, has the advantage of much faster ligation kinetics and can be followed by rapid deselenization to yield native sequences70. As shown in Fig. 4b, using madanin-like protein 2 (MadL2) as an example from the library, the thrombin inhibitor peptides were divided into two segments with most ligation junctions at aspartic acid, which was incorporated as β-selenoaspartate. The two segments were synthesized using SPPS, and sulfated tyrosine was incorporated as a neopentyl sulfate ester building block, due to its sensitivity to the acidic conditions used for peptide cleavage and purification. The β-selenoaspartate (diselenide)-containing segments were ligated to their respective peptide


[Figure 3 schematic. a: repeat deprotection and coupling cycles on a resin bead (N-deprotection, coupling), followed by cleavage from resin and side-chain deprotection to give a polypeptide (protein) or a site-specifically modified protein. b: amide-bond-forming ligations: i) native chemical ligation (transthioesterification, then S→N acyl shift); ii) KAHA ligation (ligation, then O→N acyl shift); iii) KAT ligation; iv) Ser/Thr ligation (ligation, then acidolysis); v) traceless Staudinger ligation (loss of N2, then H2O). c: split or contiguous intein splicing: N→S acyl shift to a thioester intermediate, transthioesterification to a branched thioester intermediate, succinimide formation releasing the cleaved intein, and S→N acyl shift to the spliced protein; N-terminal cleavage (e.g. with R-SH) and C-terminal cleavage are also shown.]
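The iterative cycles of SPPS (Fig. 3a) compound: each deprotection and coupling cycle must succeed on a given chain for that chain to reach full length. As a rough illustration of why SPPS has conventionally been limited to about 50 residues, the Python sketch below multiplies an assumed per-cycle efficiency over the chain length. The function name `full_length_fraction`, the assumption that the first residue is pre-loaded on the resin, and the per-cycle efficiencies (99% and 99.5%) are illustrative choices, not values taken from this Review.

```python
def full_length_fraction(n_residues: int, cycle_efficiency: float) -> float:
    """Fraction of resin-bound chains that carry the complete sequence.

    Assumes the first residue is pre-loaded on the resin, so n_residues - 1
    deprotection/coupling cycles are needed, each succeeding independently
    with probability cycle_efficiency (an illustrative assumption).
    """
    if n_residues < 1:
        raise ValueError("a peptide needs at least one residue")
    if not 0.0 <= cycle_efficiency <= 1.0:
        raise ValueError("cycle_efficiency must lie between 0 and 1")
    return cycle_efficiency ** (n_residues - 1)


if __name__ == "__main__":
    # 50 residues: the conventional SPPS limit quoted in the text.
    # 164 residues: the flow-chemistry chain length quoted in the text.
    for n in (50, 164):
        for eff in (0.99, 0.995):  # hypothetical per-cycle efficiencies
            frac = full_length_fraction(n, eff)
            print(f"{n:>3} residues at {eff:.1%} per cycle: {frac:.1%} full length")
```

Under these assumptions, roughly 61% of chains are full length for a 50-residue peptide at 99% per cycle, against roughly 19% for 164 residues. This simple model ignores aggregation and sequence-dependent coupling problems, so it should be read as a lower bound on the difficulty rather than a prediction.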


Fig. 3 | SPPS, protein ligation and intein splicing. a | Solid-phase peptide synthesis (SPPS) involves iterative cycles of coupling and deprotection to build up peptide chains anchored at their C-termini to resin beads. The N-terminal protecting group (red square) is typically a fluorenylmethyloxycarbonyl or tert-butyloxycarbonyl protecting group (top). Suitably protected SPPS building blocks of amino acids bearing post-translational modifications (blue star) can be incorporated by SPPS at any position in the peptide chain (bottom). b | Chemoselective ligations that form amide bonds between unprotected peptide or protein segments under mild conditions have allowed for the synthesis of large proteins from synthetic and/or recombinant segments. c | Technologies harnessing the natural process of intein splicing have been key to protein semi-synthesis, expressed protein ligation and protein trans-splicing techniques. KAHA, α-ketoacid-hydroxylamine; KAT, potassium acyltrifluoroborate; Ser/Thr, serine/threonine.

selenoester segments, followed by deselenization to yield the full-length proteins with native aspartic acid at the ligation junction and the sulfotyrosine residues were then deprotected82. The modular synthesis allowed all four variants (sulfoforms) of each protein to be generated, enabling the effects of amino acid sequence, as well as position and valency of tyrosine sulfation, to be explored for the inhibition of thrombin and potential anticoagulant activity.

Other amide-forming ligations and combinations. The different mechanisms and functional groups of alternative amide-bond-forming ligations expand the choice of ligation junctions, for use alone or in combination with NCL. KAHA ligation has developed from the original ligation between an α-ketoacid and a hydroxylamine to include peptides with free or protected C-terminal α-ketoacids and N-terminal hydroxylamines in the form of (S)-5-oxaproline84 (Fig. 3b). The latter give rise to depsipeptides. Depsipeptides with an ester linkage to an amino acid side chain such as serine or threonine have an altered backbone conformation, which can disrupt peptide aggregation and increase solubility. Rearrangement by O–N acyl migration at basic pH restores the native peptide backbone, leaving homoserine (Hse) at the ligation site84. A derivative that leaves a native aspartic acid residue at the ligation site has also recently been described; a 4,4-difluoro version of the (S)-5-oxaproline hydroxylamine moiety is used as the N-terminal residue and the difluoro alcohol formed on ligation is hydrolysed to aspartic acid85. Although a lack of commercial access to the requisite building blocks has limited its adoption, the potential of KAHA ligation for accessing synthetic proteins bearing PTMs has been demonstrated by proteins such as phosphorylated interferon-induced transmembrane protein 3 (ref. 86). Successful synthesis of this antiviral transmembrane protein highlighted the advantages of increased solubility of depsipeptides for such hydrophobic protein targets, suggesting the potential application of KAHA ligations in the synthesis of other membrane proteins. Developed by the same group, the potassium acyltrifluoroborate (KAT) ligation involves ligation between stable potassium acyltrifluoroborates and hydroxylamines87 (Fig. 3b). The KAT ligation proceeds in aqueous solution in the presence of unprotected functional groups and with kinetics that show promise for rapid ligation of peptides and proteins at low concentration87,88. Ligations of polyethylene glycol (PEG), a lipid, biotin and a dye to a 31-residue analogue of GLP1 have been demonstrated, and recombinant proteins functionalized with hydroxylamine moieties have been conjugated to PEG88,89. These studies demonstrate the potential of KAT ligation to access post-translationally modified proteins to decipher the role of the PTMs.

Depsipeptides
Peptides that contain an ester linkage in place of one of the backbone amide bonds.

Ser/Thr ligation provides another option for ligation junctions and involves a C-terminal O-salicylaldehyde-bearing protein segment and an N-terminal Ser/Thr-bearing segment (Fig. 3b). As well as being used alone, this ligation has been combined with NCL to assemble interleukin-25 from five segments, incorporating Asn(GlcNAc) as a model N-linked glycan90. Despite the synthetic achievement of this study, it also illustrates a potential weakness of all of the ligation methods mentioned above: correct folding of the final synthetic protein can be difficult. Another example illustrates how a toolbox of several ligations with different mechanisms can be beneficial in reducing the functional-group manipulations that are necessary if the same ligation is used iteratively91. A combination of NCL, KAHA and Ser/Thr ligations was employed to access small ubiquitin-like modifier (SUMO)-conjugated Ubc9 (ref. 91). SUMO is a small protein attached as a PTM to proteins in a similar manner to ubiquitin, regulating their cellular location and degradation. The E2 SUMO conjugating enzyme Ubc9 is loaded with SUMO via a labile thioester and the SUMO moiety is then transferred to a lysine residue in a target protein substrate by an E3 ligase enzyme. Using three orthogonal ligation tools, a stable Ubc9–SUMO conjugate bearing a photocrosslinking moiety and a biotin tag was designed as a probe to trap E3 ligases that interact with Ubc9-SUMO91. The 158-residue Ubc9 enzyme was synthesized in four segments and ligated in a convergent manner, as shown in Fig. 4c. Ubc9(1–34) bearing a C-terminal salicylaldehyde ester was ligated via Ser/Thr ligation to Ubc9(35–74), which bore an N-terminal threonine and C-terminal thioester. In parallel, Ubc9(75–107) bearing an N-terminal cysteine and C-terminal isoleucine α-ketoacid was ligated via KAHA ligation to Ubc9(108–158), furnished with (S)-5-oxaproline at the N-terminus. The two purified ligation products were then ligated together via NCL to yield the full-length synthetic Ubc9(Thr108Hse) protein, which was then folded. Variants bearing diaminobutyric acid (Dap) in place of the catalytic residue Cys93 were synthesized to form a stable conjugate with SUMO upon enzymatic conjugation, and biotin and diazirine-containing photocrosslinking residues were also incorporated using SPPS. In contrast to genetic-code expansion, this study, amongst many others, demonstrates the relative ease with which multiple different PTMs and non-canonical amino acids can be installed via chemical synthesis.

Protein semi-synthesis. Protein semi-synthesis combines the flexibility of SPPS for installing PTMs with the capabilities of recombinant protein-expression systems, thereby providing access to large proteins with site-specific PTMs. Native chemical ligation has been most widely used for ligating synthetic and recombinant protein segments, and has the advantage that either the N-terminal or C-terminal segment for ligation can be


[Figure 4 reaction schemes. a: assembly of Flag-, Myc- and HA-tagged tetra-ubiquitylated α-globin from Ub(1–76) units and HA-α-globin segments by repeated i) ligation and ii) oxidation and thiolysis of C-terminal hydrazides, i) ligation and ii) δ-mercaptolysine deprotection, and finally i) ligation, ii) desulfurization and iii) folding and purification; inset: acyl hydrazide oxidation with NaNO2 followed by thiolysis with R-SH. b: MadL2(1–30) phenyl selenoester and the diselenide of MadL2(31–61) undergo i) ligation, ii) deselenization and iii) sulfate ester deprotection (labelled positions 32 and 35). c: Ubc9(1–34) and Ubc9(35–74) are joined by Ser/Thr ligation, Ubc9(75–107) and Ubc9(108–158) by KAHA ligation, followed by i) disulfide reduction and native chemical ligation, ii) Acm removal, iii) purification and folding, and conjugation to SUMO using SUMO E1 activating enzyme and ATP (labelled positions 22, 43, 93 and 138).]

Fig. 4 | Site-specifically modified proteins accessed via chemical synthesis and amide-forming ligations. a | Tetra-ubiquitylated
(Ub) α-globin bearing three different tags (Flag, Myc and HA)79. Activation of acyl hydrazides via oxidation
and thiolysis to yield thioesters80 (inset). b | Sulfated madanin-like protein 2 (MadL2) assembled using diselenide–
selenoester ligation–deselenization chemistry82. c | Small ubiquitin-like modifier (SUMO)-conjugating enzyme Ubc9
conjugated to SUMO via a stable amide bond and bearing a photocrosslinking moiety and biotin tag (yellow hexagon),
assembled using a combination of native chemical ligation, α-ketoacid-hydroxylamine (KAHA) and serine/threonine
(Ser/Thr) ligations91. Acm, acetamidomethyl; MMP, methyl 3-mercaptopropionate; MPA, mercaptopropionic acid; MPAA,
4-mercaptophenylacetic acid.
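As a quick consistency check on the construct in panel a, the residue count and approximate mass quoted in the text (472 residues, 53 kDa) can be rebuilt from its parts. The Python sketch below is only a back-of-the-envelope calculation: the tag sequences are the commonly used Flag, Myc and HA epitopes and are an assumption here, as is the absence of linker residues; the mass estimate simply scales the 8.5 kDa per 76 residues quoted for ubiquitin. The names `ASSUMED_TAGS`, `construct_residues` and `estimated_mass_kda` are hypothetical.

```python
UBIQUITIN_RESIDUES = 76        # from the text (76-residue ubiquitin)
GLOBIN_RESIDUES = 142 - 2 + 1  # alpha-globin(2-142), as labelled in Fig. 4a
N_UBIQUITINS = 4               # tetra-ubiquitylated construct

# Assumption: standard epitope tag sequences, with no extra linker residues.
ASSUMED_TAGS = {
    "Flag": "DYKDDDDK",
    "Myc": "EQKLISEEDL",
    "HA": "YPYDVPDYA",
}


def construct_residues() -> int:
    """Total residues: four ubiquitins, alpha-globin(2-142) and three tags."""
    tag_residues = sum(len(seq) for seq in ASSUMED_TAGS.values())
    return N_UBIQUITINS * UBIQUITIN_RESIDUES + GLOBIN_RESIDUES + tag_residues


def estimated_mass_kda(n_residues: int) -> float:
    """Scale the 8.5 kDa / 76 residues quoted for ubiquitin to n_residues."""
    return n_residues * 8.5 / UBIQUITIN_RESIDUES


if __name__ == "__main__":
    total = construct_residues()
    print(f"{total} residues, about {estimated_mass_kda(total):.0f} kDa")
```

With these assumed tags the count comes to 472 residues and about 53 kDa, in line with the figures quoted in the text, although the agreement does not by itself confirm the exact tag sequences used.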

generated from recombinant proteins68. Enzymatic methods are also increasingly applied69, as we see later. Protein semi-synthesis tools employ intein technologies, making use of splicing mechanisms that involve transformations similar to those involved in NCL. Inteins are protein domains embedded in protein sequences that splice themselves out in an autocatalytic PTM process, ligating the flanking protein segments (exteins) with a native


amide bond (Fig. 3c). Inteins are widespread in nature and exist as either a continuous sequence (contiguous inteins) or as two segments on separate polypeptides (split inteins). Protein semi-synthesis tools make use of both native splicing mechanisms and ‘off-pathway’ N-terminal intein cleavages to generate C-terminal thioester-bearing protein segments for expressed protein ligation7; after the initial N→O/S acyl shift, the N-extein can be liberated by addition of a thiol capture reagent as an α-thioester for ligation to synthetic or recombinant protein segments7. Recombinant protein segments bearing N-terminal cysteine residues can also be generated by intein cleavage or by selective proteases or cleavage of the initial methionine residue68,69. Combining intein technology with SPPS and N-terminal or C-terminal protecting groups allows for multi-segment protein ligations that enable installation of a desired PTM, in principle, in any part of a protein. The versatility of protein semi-synthesis and ability to install multiple PTMs has led to its broad application to studying PTMs found on histones92, neurodegenerative proteins and a host of other protein examples, as reviewed recently68,69. Three examples are highlighted below to illustrate installation of PTMs in N-terminal, middle and C-terminal protein regions.

Protein semi-synthesis has facilitated the study of many histone PTMs and their roles in epigenetic regulation by providing access to precisely modified histones92. New PTMs on histones are still being discovered, such as attachment of the neurotransmitter serotonin to glutamine 5 of histone H3 (ref. 93). To generate semi-synthetic serotonylated H3, the N-terminal segment H3(1–13) bearing a C-terminal thioester was synthesized by SPPS and serotonylated at glutamine 5 on resin (Fig. 5a). The C-terminal segment H3(14–135) with a K14C mutation was produced recombinantly in E. coli, fused at its N-terminus to a His6-SUMO tag that was cleaved by ULP1 protease to yield an N-terminal cysteine. NCL of these two segments, followed by alkylation of the cysteine at the ligation site to mimic the native lysine, yielded site-specifically modified H3 that was incorporated into nucleosomes and used to determine the effects of Gln5 serotonylation on methylation of neighbouring Lys4 (ref. 93).

Epigenetic regulation
Control of gene expression and activity that is heritable and does not involve changes in the DNA sequence.

The amyloidogenic protein α-synuclein implicated in Parkinson disease is heavily post-translationally modified and many phosphorylated, ubiquitylated and glycosylated variants have been accessed by protein semi-synthesis and chemoenzymatic synthesis94–100. In a recent example, semi-synthetic α-synuclein variants bearing O-glycosylation PTMs in the central protein regions were generated to study the effects of O-GlcNAcylation on aggregation, cleavage and membrane binding101–103. The 140-residue protein α-synuclein was divided into three segments: an N-terminal recombinant segment fused to an intein; a synthetic central segment bearing site-specific threonine O-GlcNAc PTMs; and a C-terminal recombinant segment, from which the initial methionine was fortuitously removed during expression103. Using these segments with ligation-site residues mutated to cysteines for NCL, five glycosylated α-synuclein variants were generated, as shown for the triply glycosylated variant in Fig. 5b. Ligations were carried out in a C-terminal to N-terminal direction, with the N-terminus of the central synthetic protein segment protected as a thiazolidine residue to prevent cyclization and polymerization side reactions. After the second ligation, cysteine residues at the ligation sites were desulfurized to yield native alanine residues at these positions. The triply glycosylated variant was able to inhibit aggregation of unmodified α-synuclein and to inhibit toxicity of extracellular α-synuclein fibres, suggesting O-glycosylation as a potential therapeutic strategy103. Similarly to α-synuclein, the Alzheimer-disease-associated amyloidogenic protein Tau is also extensively modified and semi-synthetic modified variants have given insights into the role of acetylation, phosphorylation, glycosylation and carboxymethylation PTMs in aggregation and tubulin binding104–106.

Amyloidogenic protein
A protein that produces or tends to produce fibrillar aggregates.

Kinase enzymes, which install phosphorylation PTMs on other proteins, are themselves often activated by phosphorylation. The AKT kinases are one such group and are activated by phosphorylation in their activation loop and C-terminus. AKT1 is phosphorylated at Thr308, Ser473, Ser477 and Thr479, and also has a native cysteine residue conveniently located at residue 460, facilitating access to site-specifically phosphorylated AKT1 via an expressed protein-ligation strategy107 (Fig. 5c). Recombinant AKT1 N-terminal segment fused to an intein and chitin binding domain was expressed in insect cells and phosphorylation at Thr308 in this segment was achieved enzymatically, either by co-expression with the kinase PDK1 or by treatment of the thioester segment after intein cleavage with purified PDK1. The C-terminal segment was synthesized by SPPS and various combinations of the C-terminal phosphorylations were introduced, yielding a library of site-specifically phosphorylated AKT1 variants upon ligation. Using these variants in kinetic and structural analyses, the authors proposed a mechanism for AKT1 activation in which pSer473 interacts with the pleckstrin homology (PH)-kinase domain linker, while phosphorylated Ser477 and Thr479 interact with the activation loop, demonstrating the value of being able to access proteins bearing site-specific PTMs for mechanistic studies.

Consensus sequence
A representative protein or nucleic acid sequence comprising the most frequently occurring residues at each position, calculated by aligning multiple sequences.

Protein trans-splicing. Intein splicing, as noted above, can also be employed to ligate protein segments directly via protein trans-splicing (PTS) of split inteins. Naturally or artificially split inteins fused to protein segments can associate and fold to splice themselves out and ligate the two flanking protein segments69,108 (Fig. 3c). This versatile mechanism has been applied to protein cyclization, conditional protein splicing and segmental isotope labelling, as well as to accessing post-translationally modified proteins109–112. Substantial work has gone into optimizing intein split sites and sequence tolerance for faster splicing. One example is the consensus-fast (Cfa) DnaE intein, which is a consensus sequence designed by alignment of multiple split inteins and protein engineering113. It shows broad tolerance of the flanking extein sequences and ultra-fast kinetics, and was used to assemble site-specifically modified histone H3 in the context of native chromatin (Fig. 5d). The N-terminal H3 segment


[Figure 5 reaction schemes. a: SUMO-fused H3(14–135) undergoes protease cleavage to expose the N-terminal cysteine at position 14, followed by ligation to the H3(1–13) thioester modified at position 5 and Cys alkylation. b: α-syn(1–68)-intein undergoes intein cleavage (thiolysis) to give a thioester; the central α-syn(69–90) thioester carrying R2 (O-GlcNAc, drawn with NHAc) at positions 72, 75 and 81 is ligated to α-syn(91–140), followed by i) deprotection (MeONH2), ii) ligation and iii) desulfurization. c: AKT1(1–459)-intein undergoes PDK1-catalysed phosphorylation (position 308) and intein cleavage (thiolysis), followed by ligation to AKT1(460–480) phosphorylated at positions 473, 477 and 479. d: H3(1–28) hydrazide modified at position 27 undergoes i) oxidation and thiolysis and ii) ligation to CfaN, followed by in nucleo protein splicing with CfaC-H3(29–135) in chromatin.]

Fig. 5 | Site-specifically modified proteins accessed via protein semi-synthesis and trans-splicing. a | Semi-synthetic
histone H3 serotonylated at glutamine 5. Cysteine at the ligation site was alkylated to mimic the native lysine at this position.
H3(1–13) is synthetic, H3(14–135) is recombinant93. b | Semi-synthetic multiply-glycosylated α-synuclein generated using
a three-segment, two-ligation strategy with a synthetic central segment103. c | Phosphorylated AKT kinase generated by an
expressed protein-ligation strategy. The semi-synthetic protein bears one enzymatic phosphorylation in the recombinant
segment AKT(1–459) and three phosphorylated residues in the synthetic C-terminal segment AKT(460–480)107. d | Protein
trans-splicing to generate histone H3 bearing a biotin tag (yellow hexagon) and trimethyllysine in intact chromatin.
The synthetic N-terminal segment H3(1–28) was ligated to an optimized N-intein, which then underwent trans-splicing
with its corresponding C-intein fused to H3(29–135) and incorporated into chromatin113.
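Each strategy in Fig. 5 divides the target protein into numbered segments that must join up without gaps or overlaps. The Python sketch below checks that property for the segment boundaries quoted in the caption and text. The helper name `tiles_protein` and the dictionary `STRATEGIES` are hypothetical; the full-length values for H3 (135) and AKT1 (480) are inferred from the last residue of the final segment, whereas 140 residues for α-synuclein is stated in the text.

```python
def tiles_protein(segments, length):
    """Return True if 1-based inclusive (start, end) segments, given in order,
    cover residues 1..length exactly once with no gaps or overlaps."""
    expected_start = 1
    for start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


# Segment boundaries as quoted in the Fig. 5 caption and the main text.
STRATEGIES = {
    "H3 serotonylation (Fig. 5a)": ([(1, 13), (14, 135)], 135),
    "alpha-synuclein O-GlcNAc (Fig. 5b)": ([(1, 68), (69, 90), (91, 140)], 140),
    "AKT1 phosphorylation (Fig. 5c)": ([(1, 459), (460, 480)], 480),
    "H3 trans-splicing (Fig. 5d)": ([(1, 28), (29, 135)], 135),
}

if __name__ == "__main__":
    for name, (segments, length) in STRATEGIES.items():
        status = "contiguous" if tiles_protein(segments, length) else "gap or overlap"
        print(f"{name}: {status}")
```

The same check could be run on any planned ligation scheme before synthesis; it says nothing about whether a junction is chemically suitable, only that the numbering is self-consistent.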

was synthesized by SPPS bearing an N-terminal biotin tag and a trimethyllysine (K27me3) PTM. This segment was ligated to recombinant N-intein CfaN via NCL, forming a semi-synthetic split-intein fusion protein. The C-intein CfaC was embedded in the H3 sequence between residues 28 and 29 and expressed in HEK293T cells, where it was incorporated into chromatin. Isolated nuclei from these cells were exposed to semi-synthetic biotin-H3(1–28)K27me3-CfaN and the two intein segments underwent PTS, yielding semi-synthetic modified H3 within intact nucleosomes113. This strategy has recently been extended to introduce monomethyllysine or trimethyllysine and a photo-affinity probe into histone H3 to capture proteins that interact with this PTM114.

Nucleosomes
The basic structural units of eukaryotic chromatin, comprising a segment of DNA wrapped around eight histones.

Segmental isotope labelling, for example ²H, ¹⁵N and/or ¹³C labelling of protein segments or methyl-group labelling of specific residues, has advantages for NMR spectroscopy, as it reduces spectral crowding, especially for large proteins109,111. In a recent example, PTS for segmental isotope labelling was combined with enzymatic phosphorylation to study phosphorylation-dependent conformational changes in the β2-adrenoreceptor by NMR115. The G protein coupled receptor β2-adrenoreceptor is a membrane protein


receptor that is phosphorylated at its C-terminus, triggering several downstream signalling events. The N-terminal membrane-embedded segment of β2-adrenoreceptor was expressed in insect cells fused to an N-intein, while the C-terminal cytosolic segment was expressed fused to the corresponding C-intein in E. coli with uniform ¹⁵N/¹³C labelling. After PTS, the resulting segmentally labelled β2-adrenoreceptor was reconstituted into nanodiscs and phosphorylated enzymatically with GRK2 kinase. NMR spectroscopic analysis revealed conformational changes of the C-terminal and membrane-embedded regions upon phosphorylation and, thus, provided insights into the mechanism of how this receptor recognizes β-arrestins in cell-signalling pathways115.

Advantages, disadvantages and challenges. The number of site-specifically modified proteins accessed by protein synthesis and semi-synthesis gives testimony to the value of these tools for deciphering PTMs. As illustrated in many of the examples above, one of the biggest advantages of protein synthesis and semi-synthesis is the ability to introduce multiple PTMs of the same or different types in any part of a protein. PTMs with native linkages can be installed irrespective of their size, in contrast to genetic-code expansion, which would typically require large PTMs to be installed using a bioorthogonal handle. Developments in NCL strategies and other amide-forming ligations have obviated the original requirement for a cysteine residue at the ligation site and intein technologies provide access to recombinant protein thioesters and hydrazides for ligation, removing the constraints on sequence length imposed by SPPS. These advances have allowed access to proteins and PTMs not accessible by other means, such as proteins that are toxic to cells. As more PTMs are discovered, suitably protected building blocks for their installation by SPPS are being developed, as illustrated, for example, by a study on a non-enzymatic glycation end product PTM argpyrimidine116. An SPPS building block for this modi-

challenging targets such as large membrane proteins and protein complexes is needed to demonstrate their utility to a broad range of biological questions.

One of the biggest challenges for protein chemical synthesis and semi-synthesis is being able to analyse the effects of PTMs in an in vivo context. PTS strategies have shown promising developments in this direction, with modification of histones in isolated chromatin and nuclei, conditional switching of inteins in cells and, recently, fluorescent labelling of the outer mitochondrial protein TOM20 (refs 92,118–120). Cell-penetrating peptide tags, microinjection and electroporation have also been used to introduce (semi-)synthetic proteins into cells121,122. In a study demonstrating this potential, a fluorescent label was introduced into cellular histones by conjugating a fluorescent-tag-bearing split intein (IntC) segment to a cell-penetrating peptide123. This construct was then transported into cells to undergo PTS with its corresponding IntN fragment fused to the desired histone. Similar strategies could be envisioned for introducing native PTMs. While in vitro assays remain key to deciphering the precise effects of PTMs, elucidating mechanisms and carrying out structural studies, increasing abilities to manipulate and assemble proteins within cells or introduce (semi-)synthetic proteins into cells should bring exciting developments in deciphering the roles of specific PTMs in a cellular context.

Enzymatic modifications and ligations
Nature’s catalysts, enzymes, are highly optimized for efficient and site-specific chemical transformations under native conditions. Our ability to produce many enzymes recombinantly enables us to use them as tools, resulting in enzymes being widely used for protein labelling, conjugation and modification124. In so-called ‘chemoenzymatic’ strategies, a combination of chemical and enzymatic methods is used for synthesis or conjugation of biomolecules69. As more robust and specific enzymes are engineered and chemical methods are
fication of arginine with methylglyoxal was synthesized, developed, this powerful combination is increasingly
installed in a synthetic segment of heat shock protein used to access post-translationally modified proteins,
HSP27 and ligated to the remaining recombinant seg- using enzymes to either attach PTMs to proteins or to
ment to study the effects of argpyrimidine PTMs on ligate protein segments bearing PTMs.
chaperone activity116.
Disadvantages of protein synthesis and semi-synthesis Installation of PTMs using enzymes. Enzymes install
include the need for synthetic expertise and facilities. PTMs on proteins with high yield and fidelity, but
Synthesis and ligation strategies might also require opti- carry­ing out these reactions in vitro to access homoge-
mization for each target protein. Although numerous neous, site-specifically modified proteins is not trivial3.
options are available to address problems with purity, sol- Firstly, the enzymes responsible for many PTMs are
ubility and ligation efficiency, and efforts have been made unknown and identifying PTM enzyme–substrate pairs
to predict the best ligation junctions117, these bottle­necks is a growing research area, itself employing many elegant
can be unforeseen and time-consuming. A further chemical biology tools125–128. We have already met such
potential disadvantage is that most in vitro ligations are an example, shown in FiG. 4c, in which a SUMO–Ubc9
carried out under denaturing conditions at high protein conjugate was made to capture and identify E3 ligases
concentrations, necessitating a refolding step before the that bind to this complex91. Even if known, the enzyme
modified protein can be used in functional or struc- might be difficult to extract or produce recombinantly,
tural studies. This critical step can prove problematic. and some PTMs such as ubiquitylation involve a sequen-
Although many impressive syntheses and semi-syntheses tial cascade of enzymes that would be challenging to rep-
of large, complex proteins and some membrane proteins licate in vitro. Secondly, many enzymes recognize and
have been demonstrated68,69, the majority of studies install a PTM at more than one site, resulting in hetero­
has focused on small or medium-sized soluble pro- geneity that makes it difficult to attribute a function to
teins. Wider application of new ligation technologies to a particular PTM. Despite these potential drawbacks,


there are many successful studies in which enzymes have been used to access site-specifically modified proteins, as illustrated by selected examples below.

Enzymes that recognize specific protein sequences and install a PTM are valuable tools that can be used to modify recombinant or synthetic protein substrates. One example is the enzyme farnesyltransferase124. Protein prenylation (most commonly farnesylation and geranylgeranylation) is a PTM in which an isoprenoid moiety is transferred to a cysteine side chain, forming a thioether linkage, and usually leading to membrane association of the modified protein. Farnesyltransferase recognizes a C-terminal CaaX sequence motif in which ‘C’ is cysteine, ‘a’ is a small aliphatic amino acid and ‘X’ is an amino acid that determines the substrate specificity124. Using farnesyl diphosphate as a substrate, farnesyltransferase transfers the 15-carbon isoprenoid to the cysteine side chain. In a recent report making use of this site selectivity, farnesyltransferase was used to farnesylate isotope-labelled peroxisomal membrane protein receptor PEX19 (ref.129) (Fig. 6a). Elegant protein-labelling strategies and NMR experiments that enable detection of interactions between labelled protein residues and the unlabelled farnesyl moiety were combined to achieve assignment and structure determination of farnesylated PEX19.

Glycosyltransferase enzymes, in contrast, can be promiscuous, and the way in which glycosyltransferase enzymes recognize different substrates and protein sites for glycosylation is still not fully understood130. Efforts to map these enzyme–substrate specificities include the so-called ‘bump–hole strategy’, in which an enzyme that installs a PTM is engineered with a ‘hole’ (usually substitution of a bulky residue with glycine), in order that it can then accept an analogue of its respective cofactor that bears a bioorthogonal handle ‘bump’ used to tag the substrate protein126. The bioorthogonal handle is, therefore, installed only on substrates of the engineered enzyme. This approach has been used widely for many PTMs, and was recently applied to N-acetylgalactosaminyl transferase enzymes, which were engineered to accept azide-bearing or alkyne-bearing uridine diphosphate N-acetylgalactosamine analogues and install them on their substrate proteins125. While this approach sought to identify the proteins glycosylated by particular enzymes, other research has focused on directing glycosyltransferase enzymes to particular substrates. In one example, O-GlcNAc transferases were targeted to specific proteins by fusing them to a nanobody131. A nanobody that recognizes the sequence ‘EPEA’ at the C-terminus of α-synuclein was fused to an O-GlcNAc transferase and co-expressed with α-synuclein in HEK293T cells (Fig. 6b). Compared to overexpression of full-length O-GlcNAc transferase, which increases overall O-GlcNAc levels in the proteome, co-expression of α-synuclein with the nanobody–O-GlcNAc transferase resulted in selectively increased O-GlcNAcylation of α-synuclein131. However, the site specificity of glycosylation on α-synuclein was not determined. Another approach to the application of glycosyltransferases made use of a sequence of glycosyltransferase enzymes to ‘remodel’ glycans present on an expressed protein. Heterogeneously glycosylated EPO was produced in an engineered mammalian cell line and the glycan structures trimmed to a uniform core-fucosylated GlcNAc-EPO using endoglycosidases (Fig. 6c). This trimmed EPO was then a substrate for successive enzymatic transglycosylation reactions, yielding site-selective complex N-glycosylation132. Recently, a similar strategy has been carried out on living cell surfaces; heterogeneous N-glycoforms were removed selectively by enzymes and then defined glycan structures were assembled at those sites133.

Protein ligation using enzymes. Chemical amide-forming ligations can be extended and complemented by enzymatic methods employing transpeptidases and peptide ligases. Indeed, the tools described above using inteins for protein semi-synthesis and PTS can also be regarded as chemoenzymatic strategies69. The number, selectivity and versatility of enzymes available for protein ligation have grown in the last decade due to discovery and protein-engineering efforts, and they are increasingly being used in combination with synthetic and genetic-code-expansion strategies to access complex site-specifically modified proteins, as illustrated by three examples below134.

The lipid phosphatase PTEN (phosphatase and tensin homologue) is regulated by phosphorylation at its C-terminal region and acts as a tumour suppressor. To access semi-synthetic phosphorylated PTEN without introducing a non-native cysteine at the ligation junction, the ligation was carried out with subtiligase, which ligates a C-terminal ester-bearing or thioester-bearing protein segment to another protein segment and has a broad sequence tolerance135,136. The N-terminal segment of PTEN was fused to an intein and expressed in insect cells, followed by thiol-mediated cleavage to yield an α-thioester (Fig. 6d). Subtiligase was then used to ligate this segment to the synthetic C-terminal segment bearing four phosphorylated residues. The phosphorylated PTEN variant had a slower rate of phosphatase activity than its unmodified counterpart and a more compact conformation135.

Sortase A is a widely used transpeptidase enzyme that recognizes the LPxTG peptide sequence motif, cleaving the Thr–Gly peptide bond to form an acyl-enzyme thioester intermediate. Peptide or protein segments bearing one or more N-terminal glycine residues can then act as nucleophiles, resulting in transpeptidation. In an elegant combination of both enzymatic ligation and PTS, sortase was used to ligate a synthetic PTM-bearing peptide to the C-intein fragment of a split intein, which then underwent PTS with its cognate split intein, splicing itself out and yielding a semi-synthetic protein137. This transpeptidase-assisted intein ligation (TAIL) strategy was applied to several targets, including the 486-residue protein methyl-CpG-binding protein 2 (MeCP2), which regulates transcription in nerve cells. As shown in Fig. 6e, semi-synthetic MeCP2 was generated, bearing a native phosphorylation at Ser423 and a diazirine-containing crosslinker. Although the C-intein was mutated slightly to accommodate an LAxTG motif for recognition by an evolved sortase variant, the resulting semi-synthetic protein bears no ligation ‘scar’ because this motif is in the intein, which is spliced out. The TAIL strategy further


[Fig. 6 graphic: reaction schemes for panels a–f; the chemical structures did not survive text extraction. Recoverable labels: a, PEX19 CaaX box, farnesyltransferase, Zn2+, Mg2+; b, nanobody (EPEA-binding), O-GlcNAc transferase, α-synuclein; c, EPO (positions 24, 38 and 83), deglycosylation (EndoH), core-fucosylated GlcNAc-EPO, transglycosidation (EndoF3-D165A), transglycosidation (EndoA); d, PTEN(1–377)–intein, intein cleavage (thiolysis), subtiligase-mediated expressed protein ligation to PTEN(378–403) phosphorylated at S380, T382, T383 and S385; e, IntC–MeCP2(413–486), transpeptidation, protein trans-splicing with MeCP2(2–412)–IntN; f, PCNA (position 164), genetic-code expansion, Staudinger reduction, transpeptidation with Ub–LALTGG by sortase A.]

Fig. 6 | Enzymatic protein modifications and ligations. a | Farnesylation of selectively isotope-labelled peroxisomal
membrane protein receptor PEX19 by farnesyltransferase, which recognizes a C-terminal ‘CaaX’ motif129. b | Targeting
an O-GlcNAc transferase to α-synuclein by fusing it to a nanobody that binds to an ‘EPEA’ motif at the C-terminus of
α-synuclein131. c | Trimming and site-selective transglycosylation on erythropoietin (EPO). Recombinant N-glycosylated
EPO was trimmed to a uniform core-fucosylated GlcNAc-EPO intermediate. Transglycosylation with sialylated and
azide-tagged glycans yielded homogeneous, multiply-glycosylated EPO132. Chemical structure of the sialyl-oligosaccharide
is shown in Fig. 2e and the red triangle represents L-fucose. d | Semi-synthetic phosphorylated lipid phosphatase PTEN
accessed via enzyme-catalysed expressed protein ligation to avoid introducing a non-native cysteine at the ligation site135.
e | Transpeptidase-assisted intein ligation used to access methyl-CpG-binding protein 2 (MeCP2). Synthetic C-terminal
segment bearing a site-specific phosphorylation and photocrosslinker, and fused to part of a C-intein was ligated to the
remainder of the C-intein using sortase. The split intein then underwent trans-splicing to yield full-length MeCP2 (ref.137).
f | Site-specific ubiquitylation using a combination of genetic-code expansion and sortase-mediated protein ligation,
yielding a native isopeptide linkage138.
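
The sequence-recognition rules behind panels a and f can be illustrated with a short sketch. The Python below checks a one-letter protein sequence for a C-terminal CaaX box and mimics the outcome of sortase transpeptidation at an LPxTG motif. The function names, the residues counted as "small aliphatic" and the toy sequences are illustrative assumptions rather than anything taken from the cited studies, and real enzyme specificity is more nuanced than a pattern match.

```python
import re

# Assumption: residues treated as "small aliphatic" at the two 'a' positions.
ALIPHATIC = "AVLI"
CAAX_BOX = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def has_caax_box(seq: str) -> bool:
    """Return True if the last four residues read C-a-a-X."""
    return CAAX_BOX.search(seq) is not None


def sortase_ligate(donor: str, acceptor: str, motif: str = r"LP[A-Z]TG") -> str:
    """Join donor and acceptor the way a sortase transpeptidation would.

    The donor is cut between Thr and Gly of the first motif match, and the
    acceptor, which must start with glycine, is attached to the Thr.
    """
    if not acceptor.startswith("G"):
        raise ValueError("acceptor needs an N-terminal glycine")
    match = re.search(motif, donor)
    if match is None:
        raise ValueError("donor has no sortase recognition motif")
    # match.end() - 1 keeps everything up to and including Thr.
    return donor[: match.end() - 1] + acceptor


if __name__ == "__main__":
    print(has_caax_box("MKTACVLS"))
    print(sortase_ligate("MSEQLPETGHHHHHH", "GGKW"))
```

The evolved sortase variants mentioned in the text, which accept an LAxTG motif, can be modelled by passing `motif=r"LA[A-Z]TG"`.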

extends the scope of rapidly splicing inteins such as the Cfa DnaE intein, which would be challenging to incorporate by SPPS in a synthetic peptide segment due to the length of the fragments137.

In an impressive combination of genetic-code expansion and sortase-mediated protein ligation, ubiquitin and SUMO were ligated to target proteins in vitro and in live cells, with the possibility to trigger specific ubiquitylation events138. The non-canonical amino acid AzGGK bears a glycine–glycine moiety for sortase ligation but is masked by an azide at the N-terminus (Fig. 6f). This non-canonical residue was incorporated site specifically in the target protein (PCNA in this example) by genetic-code expansion. To induce ubiquitylation,


[Fig. 7 graphic: chemical structures and reaction schemes for panels Aa–Ac and B–G; the structures did not survive text extraction. Recoverable labels: Aa, phosphoserine and aspartic acid as phosphoserine mimic; Ab, pyrophosphoserine and stable pyrophosphoserine analogue for SPPS; Ac, 3-phosphohistidine and 3-phosphoryltriazolylalanine (non-hydrolysable pHis SPPS building block); B, EPO position 38, cysteine alkylation; C, H3 position 56, cysteine alkylation, hydrazone formation, reduction (NaBH3CN), R = CH3, C(CH3)2OH or ubiquityl; D, H3 position 36, cysteine alkylation; E, H3 position 15, i) cysteine alkylation, ii) Acm removal, ligation with Ub–NHNH2 (from Ub–Asp, YUH1, NH2NH2), auxiliary removal; F, dehydroalanine from cysteine (i) DTT, ii) 2,5-dibromohexane diacetamide) or from phosphoserine (Ba(OH)2), then R–X (X = I, Br; R = PTM mimic) with Zn0, CuII or NaBH4, NH4OAc; G, Ub(1–76) position 42, azidohomoalanine installed by SPPS, CuAAC, ADPr homoalanine triazole analogue.]
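
Several of the mimics in Fig. 7 start from a simple point mutation: aspartic acid in place of phosphoserine (panel Aa), or a lysine-to-cysteine handle such as H3K56C that is then alkylated (panels C–E). As a minimal sketch of that bookkeeping, the Python below applies mutations written in the usual `K56C` style and looks up the canonical-residue mimics named in the text. The toy sequence, the function names and the 1-based numbering of the string as given are assumptions made for illustration.

```python
import re

# Canonical-residue mimics named in the text:
# Asp/Glu for phospho-Ser/Thr and Gln for acetyl-Lys.
MIMICS = {("S", "phospho"): "D", ("T", "phospho"): "E", ("K", "acetyl"): "Q"}


def mutate(seq: str, mutation: str) -> str:
    """Apply a point mutation written like 'K56C' (1-based position)."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, pos, new = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    # Refuse to mutate if the stated wild-type residue is not at that position.
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wild_type:
        raise ValueError(f"{mutation}: residue {pos} is not {wild_type}")
    return seq[: pos - 1] + new + seq[pos:]


def install_mimic(seq: str, pos: int, ptm: str) -> str:
    """Replace the residue at pos with its canonical mimic for the given PTM."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    residue = seq[pos - 1]
    if (residue, ptm) not in MIMICS:
        raise ValueError(f"no canonical mimic for {ptm} on {residue}")
    return mutate(seq, f"{residue}{pos}{MIMICS[(residue, ptm)]}")


if __name__ == "__main__":
    toy = "ARTKQSTAK"  # hypothetical peptide, not a real histone tail
    print(mutate(toy, "K4C"))
    print(install_mimic(toy, 6, "phospho"))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which matter here because protein numbering conventions differ on whether the initiator methionine is counted.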

the azide was reduced using Staudinger reduction to the corresponding amine using 2-(diphenylphosphino)benzoic acid (2DPBA). Meanwhile, the C-terminus of ubiquitin was mutated to incorporate the LALTGG sortase-recognition motif. The ligation yielded target proteins conjugated to ubiquitin via a native isopeptide bond. Mutations in the ubiquitin C-terminus prevent its removal by deubiquitinases, but the native isopeptide protein–ubiquitin linkages are still recognized by ubiquitin-binding domains. Using this strategy, ubiquitylation and SUMOylation could also be achieved in mammalian cells by co-expressing all the components and triggering ubiquitylation/SUMOylation by treatment with 2DPBA138.

Advantages, disadvantages and challenges. The ability to carry out enzymatic transformations under native conditions and at low protein concentrations is


a significant advantage for accessing complex proteins that are not amenable to refolding. As demonstrated above, the potential to carry out these reactions in cells and to combine them with other tools for accessing post-translationally modified proteins is also attractive. Further advantages of enzyme-mediated transformations are that the enzymes can potentially be co-expressed with the protein target for in-cell modification or, after in vitro modification, can be separated from the modified protein product by affinity purification. If an enzyme installs a PTM site specifically at a particular protein sequence, this can be used to access PTMs in central protein regions. While this is possible in principle with protein semi-synthesis approaches, it usually requires several ligation steps and functional-group transformations.

Disadvantages of enzyme-mediated protein-ligation strategies arise predominantly from the sequence specificity, or lack thereof, of the enzyme. Whereas engineered ligases such as subtiligase, tryptiligase and peptiligase have broad sequence tolerance and can be used for traceless ligations, their promiscuity precludes their use for semi-synthesis in complex cellular environments. Furthermore, preparation and stability of the acyl donor protein segments (e.g. thioesters) can be challenging, and these have the proclivity to cyclize if the N-terminus of the segment is not protected. On the other hand, transpeptidases such as sortase, butelase and oaAEP have narrower sequence tolerance, which can give high specificity, but they often leave a non-native peptide scar at the ligation site. As enzymatic transformations are usually carried out on folded proteins under native conditions, another potential drawback of chemoenzymatic ligation strategies can be lack of accessibility of the respective enzyme to the ligation or PTM site.

The benefits of combining different ligation strategies have been illustrated in several examples above, but a remaining challenge is to combine multiple enzymatic strategies for protein modification and ligation. Unlike synthetic and bioorthogonal strategies, which rely on orthogonality of functional groups, enzymatic modifications generally rely on sequence and/or structural recognition, which offers great potential for orthogonality, even in a one-pot setting. Developments in protein engineering and structural understanding of enzyme mechanisms are yielding enzymes with highly tuned activities and the challenge of combining them to access site-specifically modified proteins to decipher the roles of PTMs is well within reach.

PTM mimics and stable analogues
The goal of most of the genetic-code expansion, protein (semi-)synthesis and enzymatic strategies we have discussed so far has been to access site-specifically modified proteins bearing native PTMs. However, native PTMs might not be necessary or even desirable for certain applications; chemical stability and resistance to enzymes that remove native PTMs can be distinct advantages. The simplest PTM mimics are substitutions of the modified residue with a canonical amino acid mimic, such as aspartic/glutamic acid for phosphoserine/phosphothreonine and glutamine for acetyllysine (Fig. 7A). Although these canonical amino acid mimics have been widely used because they are easy to install using modern molecular biology methods, care should be taken in extrapolating effects of such mutations, as they may be poor mimics of the chemical, structural and functional properties of the native PTM, as has been noted in several studies100,139–141. Most large and complex PTMs, however, have no suitable mimic amongst the canonical amino acids, and more versatile strategies are needed. Such strategies usually involve selective conjugation to a canonical or non-canonical amino acid side chain and have the advantage that late-stage conjugation of a PTM or mimic increases flexibility and modularity, enabling a variety of modified proteins to be generated quickly.

Fig. 7 | Mimics and stable analogues of PTMs. Aa | Aspartic acid as a phosphoserine mimic. Ab | Solid-phase peptide synthesis (SPPS) building block for installing a stable analogue of pyrophosphoserine. Ac | SPPS building block for installing a stable analogue of 3-phosphohistidine. B | Glycosylation of erythropoietin (EPO) via cysteine alkylation with iodoacetamide-functionalized GlcNAc147. C | Hydrazide mimics of lysine acylation post-translational modifications (PTMs) on histone H3 generated via cysteine alkylation, hydrazone formation and reduction to a hydrazide148. D | Thioether-linked mimics of lysine trimethylation on histone H3 generated by alkylation of cysteine150. E | Ubiquitination of histone H3 via cysteine alkylation and auxiliary-mediated native chemical ligation (cysteine-aminoethylation-assisted chemical ubiquitylation)151. F | β-Elimination of cysteine or phosphoserine to yield dehydroalanine as a bioorthogonal handle for thiol nucleophiles or radicals. Reaction of carbon-centred radicals with dehydroalanine was used to access a wide variety of PTMs with native carbon–carbon linkages154,155. G | An azide-functionalized non-canonical amino acid was incorporated into ubiquitin, providing a bioorthogonal handle for ligation of an ADP-ribosylation analogue via azide–alkyne click chemistry156. Acm, acetamidomethyl; Boc, tert-butyloxycarbonyl; Fmoc, fluorenylmethyloxycarbonyl.

Chemoselective reactions with side chains of canonical amino acids are valuable tools because proteins can be expressed using standard recombinant expression methods and then modified. The thiol side chain of cysteine is the most widely used and is discussed as an example, but chemoselective reactions at other residues are also available and have been reviewed recently13,62,142. As we have seen in previous sections, non-canonical amino acids bearing bioorthogonal handles can be installed using genetic-code expansion, protein (semi-)synthesis and enzymatic ligations, allowing for conjugation of PTM mimics or PTMs with non-native linkages. PTM mimics can also be installed directly by SPPS, and some SPPS building blocks for stable mimics of otherwise unstable PTMs have enabled these PTMs to be studied63. An illustrative example of the latter is pyrophosphoserine, a labile and less studied PTM in which a second phosphoryl group is transferred to a phosphoserine side chain from inositol pyrophosphate. An SPPS building block was synthesized in which the labile phosphoanhydride bond was replaced with a stable methylene-bisphosphonate group143 (Fig. 7A). The building block was incorporated into peptides using SPPS and showed enhanced chemical and biological stability. Another labile PTM for which stable mimics have been developed as SPPS building blocks is phosphohistidine, a PTM involved in prokaryotic signal transduction but also found in eukaryotic cells. Azide–alkyne cycloadditions catalysed by either Cu(I) or Ru(II) yield stable triazolylalanine derivatives of 3-phosphohistidine


or 1-phosphohistidine, respectively144 (Fig. 7A). The 3-phosphohistidine analogue was incorporated into histone H4 via SPPS and protein semi-synthesis, and used to generate anti-phosphohistidine antibodies.

Conjugation to cysteine. The relatively low abundance of cysteine in proteins, the nucleophilicity of its thiol side chain and the ease with which it can be introduced by mutation at the gene level make it a useful handle for installing PTMs and mimics142,145. One of the most common strategies for protein modification at cysteine is direct alkylation with electrophiles, such as iodoacetamide, a reagent also used for mass-spectrometric mapping of disulfides146. In a study on glycosylation of EPO, glycosylation-site asparagine residues were mutated to cysteine and the protein was expressed in E. coli and folded. Homogeneous N-glycosylation mimics were generated by reaction of the cysteine thiol with glycosyl-β-N-iodoacetamide, as shown in Fig. 7B (ref.147). More recently, cysteine alkylation was also used to generate hydrazide mimics of both large and small lysine acylation PTMs148. Histones containing lysine-to-cysteine mutations (such as H3K56C shown in Fig. 7C) were expressed in bacteria, purified and treated with chloroacetaldehyde. Subsequent reaction of the aldehyde moiety with an acyl-hydrazide yielded a hydrazone, which was then reduced with sodium cyanoborohydride to the corresponding hydrazide. This strategy was used to install acetyl, 2-hydroxyisobutyryl and ubiquityl PTM mimics into histones, which were then incorporated into nucleosomes to study their effects on nucleosome stability and dynamics148. In a further example illustrating the use of α-halocarbonyls for cysteine alkylation, a homogeneous, monoglycosylated insulin was synthesized that has comparable activity to native insulin but is more stable and does not form fibrils149. In this strategy, the three native intrachain and interchain disulfide bonds were formed stepwise using orthogonal protecting-group removal, followed by selective deprotection of an additional cysteine introduced at the N-terminus of the insulin B-chain. This seventh cysteine was then alkylated with a bromo-acetamidyl undecasaccharide isolated from egg yolk and derivatized149.

Numerous cysteine aminoethylation strategies have been used to access proteins bearing mimics of lysine PTMs to decipher mechanistic and structural effects9. To study structural interactions of methylated histones, histones containing a cysteine mutation were treated with (2-bromoethyl)trimethylammonium bromide to yield a trimethyllysine mimic150 (Fig. 7D). Cryo-electron microscopy revealed the structure of a PWWP reader domain of human transcriptional co-activator LEDGF bound to H3 bearing the trimethyllysine mimic in a nucleosome. The reader domain binds the charged H3K36me3 mimic side chain with a cation–π interaction with a cluster of aromatic residues and also binds to both turns of nucleosomal DNA150. Another common lysine PTM that has been mimicked by cysteine alkylation is ubiquitylation. In a ‘cysteine-aminoethylation-assisted chemical ubiquitylation’ (CAACU) strategy, cysteine alkylation was combined with auxiliary-mediated native chemical ligation to produce mono-ubiquitylated and di-ubiquitylated histone analogues with thioether linkages151. As shown in Fig. 7E, recombinant histones with lysine-to-cysteine mutations were treated with a bifunctional handle comprising 2-bromoethylamine N-alkylated with a protected 2-mercapto-2-phenethyl auxiliary. Separately, ubiquitin bearing a C-terminal acyl hydrazide or thioester functionality was generated via intein technology or from ubiquitin appended with a C-terminal aspartic acid and cleaved with the ubiquitin C-terminal hydrolase enzyme YUH1 in the presence of 5% aqueous hydrazine to yield the ubiquitin acyl hydrazide. After removal of the thiol protecting group on the auxiliary attached to the histone, hydrazide-based, auxiliary-mediated NCL to the ubiquitin acyl hydrazide yielded the ubiquitin–histone conjugate. Finally, removal of the auxiliary afforded the ubiquitylated histone linked by a native isopeptide bond151.

For completeness, we note that alkylation of cysteine can also be achieved using Michael acceptors such as maleimides, vinyl sulfones and α,β-unsaturated systems, and is widely used for conjugation of tags and labels to proteins142,145. Alternatively, a disulfide bond can provide a reversible cysteine-conjugation strategy. Illustrating this application for ubiquitylation of histones, an aminoethane thiol linker was ligated to the C-terminus of ubiquitin via NCL, and the ubiquitin and histone proteins were then linked via a disulfide bond152. As either the histone or the ubiquitin proteins could be uniformly isotope labelled, a segmentally labelled ubiquitylated histone was used in NMR studies to show how ubiquitin interacts with the acidic patch on nucleosomes to modify chromatin structure152. Using a similar strategy, disulfide-linked ubiquitylated α-synuclein variants were also generated. After aggregation, the PTM could be removed by reduction of the disulfide, allowing structures of the α-synuclein amyloid fibrils to be compared99. Furthermore, desulfurization of disulfide bonds to the corresponding thioether has been used to generate thioether-linked glycoproteins153.

Finally, β-elimination of cysteine to yield dehydroalanine can provide a valuable bioorthogonal handle that acts as a Michael acceptor for thiol nucleophiles or radicals. Two groups have used radical-based mechanisms to form carbon–carbon bonds in proteins and, thereby, access a wide variety of site-selective PTMs154,155 (Fig. 7F). In one study, the dehydroalanine handle was generated from phosphoserine incorporated by genetic-code expansion and the phosphate eliminated under alkaline conditions155. In the other study, dehydroalanine was generated by bis-alkylation-elimination of cysteine154. Carbon-centred radicals were then generated from alkyl bromides or iodides and conjugated to dehydroalanine. Although the resulting modified proteins contain native carbon–carbon bond linkages to the PTMs, stereochemistry at the α-carbon cannot be directed154,155.

Conjugation to non-canonical amino acids. Conjugation of PTM mimics to non-canonical amino acids incorporated by genetic-code expansion or SPPS provides increased bioorthogonality compared to cysteine, avoiding the need to protect or mutate native cysteines. Several examples have already been mentioned in


previous sections — representing only a small selection position relative to the kinase docking site and had differ-
of biocompatible, chemoselective conjugation reactions. ing effects on transcriptional activation. Based on these
As bioorthogonal ligations for conjugation of PTMs and studies, the authors concluded that multiple PTM sites
labels to proteins have been extensively reviewed10–13,18, on a protein can be self-limiting, and that PTM states are
we only illustrate the principle and potential advan- not only controlled by PTM-removing enzymes such as
tages here with one example. Peptides bearing phosphatases158.
ADP-ribosylation PTMs are susceptible to both acidic The tools discussed so far have mostly focused on
and basic conditions, making synthetic approaches installing PTMs on proteins, but an understanding of
challenging, so an azide–alkyne click chemistry strategy enzymes and processes that remove PTMs will be equally
was used to access ADP-ribosylated proteins156. While important in understanding PTM dynamics160. The ability
the same group has reported the synthesis of an SPPS to remove selected PTMs in a controlled manner could be
building block for mono-ADP-ribosylation and incor- a potential tool for monitoring and controlling dynamic
porated it into peptides via SPPS157, azide-functionalized protein interactions and cellular processes in real time.
and alkyne-functionalized amino acids can also be A recent study demonstrated the potential of selectively
readily incorporated into proteins either via SPPS or removing lysine acylations. Using directed evolution,
genetic-code expansion. Using the SPPS approach, an lysine deacetylases were selected that were 400 times more
alkyne-bearing ADP-ribosylation analogue was syn- selective for butyryl-lysine over crotonyl-lysine and could
thesized and ligated to β-azidoalanine-functionalized shift the proportion of cellular acetylation towards croto-
or azidohomoalanine-functionalized peptides or nylation by removing butyryl PTMs161. A wider range of
ubiquitin via azide–alkyne click chemistry, yield- selective bond-cleavage reactions could also be valuable
ing a stable triazole-linked mono-ADP-ribosylation for controlling removal of PTMs in in vitro and in vivo
analogue156 (FiG. 7G). contexts162. Photo-inducible cleavage (photo-decaging)
reactions and inteins represent examples of such cleav-
New frontiers in deciphering PTMs age reactions that can be precisely controlled and strat-
In the context of a growing appreciation for the diversity egies developed for prodrug activation could also be
of the proteome, the chemical biology tools discussed applied to deciphering dynamic PTM roles.
here have much to offer in deciphering the roles of PTMs
and addressing gaps in our understanding of biological Structural effects of PTMs
processes. While each of the tools has its advantages, Genetic-code expansion, protein semi-synthesis, enzy-
disadvantages and challenges, in combination, they ena- matic modifications and bioorthogonal ligation tools can
ble us to access a vast array of proteins bearing specific all provide access to site-specifically modified proteins
PTMs. An increasing number of examples illustrate the for structural biology studies to decipher the structural
potential of combining tools and collaborating across consequences of PTMs. However, proteins bearing
disciplines. Three frontiers are discussed in which the native PTMs are under-represented in structural data-
ability to access proteins bearing specific PTMs could bases. This is partly because proteins for structural
provide exciting advances: understanding PTM dynam- biology studies are typically produced by recombinant
ics; understanding structural effects of PTMs; and expression in bacteria, which might lack the machinery
understanding PTM crosstalk. to install relevant PTMs. Furthermore, PTMs are often
present in flexible loops of proteins that might not be
PTM dynamics visible in structures derived by X-ray crystallography
The reversible nature of many PTMs allows precise or cryo-electron microscopy163,164. To address these
control of cellular processes, in some cases, by distinct issues, integrated structural biology approaches have the
on/off switching, but in other cases, by more nuanced potential to combine the advantages of X-ray crystal-
control. However, most PTM identification tools (Box 1) lography, NMR spectroscopy, cryo-electron microscopy
give only a snapshot of the overall proteome, and stud- and other structure-determination techniques to give a
ies on modified proteins might not consider that several more complete picture of three-dimensional protein
proteoforms are likely to be present at a given time or interactions involving PTMs. Moreover, wider access
location. Combining access to site-specifically modified to the tools described in this Review and increasing
proteins with techniques that can monitor PTMs in real collaborations between chemical biologists and struc-
time offers the potential to understand PTM dynamics. tural biologists should lead to a better understanding
One such technique is NMR spectroscopy. In an example of how structural changes as a result of PTMs regulate
illustrating how the timing of PTMs can be investigated, protein activity, as well as by direct interactions with
an elegant study using time-resolved NMR spectroscopy binding partners. Indeed, a recent study has shown
showed that phosphorylation of the transcriptional acti- that ubiquitylation alters the energy landscape of folded
vation domain of ELK1 transcription factor occurs at proteins and can destabilize native structures and lead
different rates, depending on the site158. Using the char- to proteasomal degradation, depending on the site of
acteristic down-field NMR chemical shift change of ubiquitylation165. Several of the examples mentioned in
amide resonances upon phosphorylation141,159, the rates this Review have also illustrated how tools for accessing
of phosphorylation of ELK1 transcriptional activation site-specifically modified proteins can also be employed
domain by recombinant ERK2 kinase were monitored for segmental isotope labelling and installing paramag-
in vitro. Fast, intermediate and slow phosphorylation netic relaxation tags to gain structural insights into
sites were identified that were determined by their post-translationally modified proteins111,166,167.


PTM crosstalk
With several orthogonal tools for installing site-specific PTMs in hand, we can now tackle the challenging question of how PTMs work in combination to control protein structure and function. Although our ability to identify PTM sites on proteins has generally far outpaced our understanding of the functions and regulation of these PTMs, there have been relatively few studies focused on the simultaneous detection of multiple PTMs. The value of such studies is illustrated by a recent mass-spectrometry-based proteomic study on site-specific modification with SUMO. In addition to identifying many new SUMOylation sites, co-modifications with phosphorylation, ubiquitylation, acetylation and methylation were also identified, as well as structural features of SUMOylation sites168. Investigating how these PTMs work together and their effects on cellular processes will provide far deeper insights into SUMOylation than studies focused on SUMOylation alone.
Studying PTM crosstalk often requires a combination of tools to be used together. In a recent example, a combination of antibody-based detection, phosphomimetic mutations and enzymatic modifications were used to study the crosstalk of PTMs on folliculin-interacting protein 1 (FNIP1), a co-chaperone of HSP90 (ref. 169). The study showed that accumulation of serine phosphorylations led to increasing interaction with HSP90, whereas dephosphorylation promoted O-GlcNAcylation, ubiquitylation and degradation. Crosstalk of histone PTMs is another challenging area, as different PTMs can occur on each of the two copies of a histone within a nucleosome. Access to asymmetrically modified nucleosomes has been demonstrated using protein semi-synthesis tools and enzymatic modification92,170,171. In one study, the two copies of H3 were differently isotope labelled, and acetylation and phosphorylation were monitored by NMR spectroscopy in the presence of pre-existing modifications171. Further development of such methods that allow monitoring of multiple PTMs in real time and in live cells and tissues will be key to understanding the intricate network of PTM crosstalk.

Conclusions
Deciphering how PTMs regulate protein functions in core biological processes has been facilitated by development of chemical biology tools to access proteins bearing site-specific PTMs. The examples mentioned in this Review provide a small selection of studies illustrating the variety of PTM types, target proteins and biological processes that have been accessed by these tools. They also demonstrate the value of the robust bioorthogonal and chemoselective reactions that form the foundations of this research area, reactions that can be carried out on proteins in aqueous solution under near-native conditions with high yield and selectivity. Although groups focusing on identifying PTMs in the proteome and those focusing on generating modified proteins have relied largely on the same chemistries, these two areas have tended to be mutually exclusive. Valuable insights might be gained by collaborative efforts underway to combine high-throughput proteomic data on PTMs with the structural and functional information gained by relatively low-throughput studies on site-specifically modified proteins. Similarly, increasing use of combinations of the tools described here for accessing site-specifically modified proteins has the potential to address new frontiers in challenging areas such as understanding PTM crosstalk, structural effects and dynamic processes. Exciting new developments in monitoring PTMs on a proteome-wide level and in cellular contexts suggest an increasing impact in understanding the diversity of the proteome and the chemistry of life.

Published online xx xx xxxx

1. Walsh, C. T., Garneau-Tsodikova, S. & Gatto, G. J. Jr 12. Bondalapati, S., Jbara, M. & Brik, A. Expanding 23. Arranz-Gibert, P., Vanderschuren, K. & Isaacs, F. J.
Protein posttranslational modifications: the chemistry the chemical toolbox for the synthesis of large and Next-generation genetic code expansion. Curr. Opin.
of proteome diversifications. Angew. Chem. Int. Ed. uniquely modified proteins. Nat. Chem. 8, 407–418 Chem. Biol. 46, 203–211 (2018).
44, 7342–7372 (2005). (2016). 24. Neumann, H., Peak-Chew, S. Y. & Chin, J. W.
2. Aebersold, R. et al. How many human proteoforms 13. Hoyt, E. A., Cal, P. M. S. D., Oliveira, B. L. & Genetically encoding Nε-acetyllysine in recombinant
are there? Nat. Chem. Biol. 14, 206–214 (2018). Bernardes, G. J. L. Contemporary approaches to proteins. Nat. Chem. Biol. 4, 232–234 (2008).
3. Barber, K. W. & Rinehart, J. The ABCs of PTMs. site-selective protein modification. Nat. Rev. Chem. 3, 25. Nguyen, D. P., Garcia Alai, M. M., Kapadnis, P. B.,
Nat. Chem. Biol. 14, 188–192 (2018). 147–171 (2019). Neumann, H. & Chin, J. W. Genetically encoding
4. Farley, A. R. & Link, A. J. Identification and 14. Radziwon, K. & Weeks, A. M. Protein engineering Nε-methyl-l-lysine in recombinant histones. J. Am.
quantification of protein posttranslational for selective proteomics. Curr. Opin. Chem. Biol. 60, Chem. Soc. 131, 14194–14195 (2009).
modifications. Methods Enzymol. 463, 725–763 10–19 (2020). 26. Groff, D., Chen, P. R., Peters, F. B. & Schultz, P. G.
(2009). 15. UniProt Consortium. Controlled vocabulary of A genetically encoded ε-N-methyl lysine in mammalian
5. Chuh, K. N. & Pratt, M. R. Chemical methods for the posttranslational modifications (PTM). UniProt cells. ChemBioChem 11, 1066–1068 (2010).
proteome-wide identification of posttranslationally https://www.uniprot.org/docs/ptmlist (2020). 27. Nguyen, D. P., Garcia Alai, M. M., Virdee, S. &
modified proteins. Curr. Opin. Chem. Biol. 24, 27–37 16. Liu, C. C. & Schultz, P. G. Adding new chemistries to Chin, J. W. Genetically directing ɛ-N, N-dimethyl-l-
(2015). the genetic code. Annu. Rev. Biochem. 79, 413–444 lysine in recombinant histones. Chem. Biol. 17,
6. Harmel, R. & Fiedler, D. Features and regulation (2010). 1072–1076 (2010).
of non-enzymatic post-translational modifications. 17. Chin, J. W. Expanding and reprogramming the genetic 28. Akahoshi, A., Suzue, Y., Kitamatsu, M., Sisido, M.
Nat. Chem. Biol. 14, 244–252 (2018). code. Nature 550, 53–60 (2017). & Ohtsuki, T. Site-specific incorporation of arginine
7. Muir, T. W., Sondhi, D. & Cole, P. A. Expressed protein 18. Lang, K. & Chin, J. W. Cellular incorporation of analogs into proteins using arginyl-tRNA synthetase.
ligation: a general method for protein engineering. unnatural amino acids and bioorthogonal labeling Biochem. Biophys. Res. Commun. 414, 625–630
Proc. Natl Acad. Sci. USA 95, 6705–6710 (1998). of proteins. Chem. Rev. 114, 4764–4806 (2014). (2011).
8. Chuh, K. N., Batt, A. R. & Pratt, M. R. Chemical 19. Koh, M., Yao, A., Gleason, P. R., Mills, J. H. & 29. Park, H. S. et al. Expanding the genetic code of
methods for encoding and decoding of posttranslational Schultz, P. G. A general strategy for engineering Escherichia coli with phosphoserine. Science 333,
modifications. Cell Chem. Biol. 23, 86–107 (2016). noncanonical amino acid dependent bacterial 1151–1154 (2011).
9. Wang, Z. A. & Cole, P. A. The chemical biology of growth. J. Am. Chem. Soc. 141, 16213–16216 30. Rogerson, D. T. et al. Efficient genetic encoding
reversible lysine post-translational modifications. (2019). of phosphoserine and its nonhydrolyzable analog.
Cell Chem. Biol. 27, 953–969 (2020). 20. Wang, L., Brock, A., Herberich, B. & Schultz, P. G. Nat. Chem. Biol. 11, 496–503 (2015).
10. Sletten, E. M. & Bertozzi, C. R. Bioorthogonal Expanding the genetic code of Escherichia coli. 31. Zhang, M. S. et al. Biosynthesis and genetic encoding
chemistry: fishing for selectivity in a sea of functionality. Science 292, 498–500 (2001). of phosphothreonine through parallel selection and
Angew. Chem. Int. Ed. 48, 6974–6998 (2009). 21. Goto, Y., Katoh, T. & Suga, H. Flexizymes for genetic deep sequencing. Nat. Methods. 14, 729–736 (2017).
11. Schumacher, D. & Hackenberger, C. P. More than code reprogramming. Nat. Protoc. 6, 779–790 32. Chen, S. et al. Incorporation of phosphorylated
add-on: chemoselective reactions for the synthesis of (2011). tyrosine into proteins: in vitro translation and study
functional peptides and proteins. Curr. Opin. Chem. 22. Brown, W., Liu, J. & Deiters, A. Genetic code expansion of phosphorylated IκB-α and its interaction with NF-κB.
Biol. 22, 62–69 (2014). in animals. ACS Chem. Biol. 13, 2375–2386 (2018). J. Am. Chem. Soc. 139, 14098–14108 (2017).


33. Hoppmann, C. et al. Site-specific incorporation of in wild-type bacterial cells. ACS Chem. Biol. 15, 84. Bode, J. W. Chemical protein synthesis with the
phosphotyrosine using an expanded genetic code. 1852–1861 (2020). α-ketoacid–hydroxylamine ligation. Acc. Chem. Res.
Nat. Chem. Biol. 13, 842–844 (2017). 59. Dunkelmann, D. L., Willis, J. C. W., Beattie, A. T. 50, 2104–2115 (2017).
34. Luo, X. et al. Genetically encoding phosphotyrosine & Chin, J. W. Engineered triply orthogonal 85. Baldauf, S., Ogunkoya, A. O., Boross, G. N. & Bode, J. W.
and its nonhydrolyzable analog in bacteria. pyrrolysyl-tRNA synthetase/tRNA pairs enable the Aspartic acid forming α-ketoacid–hydroxylamine
Nat. Chem. Biol. 13, 845–849 (2017). genetic encoding of three distinct non-canonical (KAHA) ligations with (S)-4,4-difluoro-5-oxaproline.
35. Liu, C. C., Cellitti, S. E., Geierstanger, B. H. & amino acids. Nat. Chem. 12, 535–544 (2020). J. Org. Chem. 85, 1352–1364 (2020).
Schultz, P. G. Efficient expression of tyrosine-sulfated 60. Merrifield, R. B. Solid phase peptide synthesis. I. 86. Harmand, T. J., Pattabiraman, V. R. & Bode, J. W.
proteins in E. coli using an expanded genetic code. The synthesis of a tetrapeptide. J. Am. Chem. Soc. Chemical synthesis of the highly hydrophobic antiviral
Nat. Protoc. 4, 1784–1789 (2009). 85, 2149–2154 (1963). membrane-associated protein IFITM3 and modified
36. Italia, J. S. et al. Genetically encoded protein sulfation 61. Bertran-Vicente, J. et al. Chemoselective synthesis variants. Angew. Chem. Int. Ed. 56, 12639–12643
in mammalian cells. Nat. Chem. Biol. 16, 379–382 and analysis of naturally occurring phosphorylated (2017).
(2020). cysteine peptides. Nat. Commun. 7, 12703 (2016). 87. Dumas, A. M., Molander, G. A. & Bode, J. W.
37. Porter, J. J. et al. Genetically encoded protein tyrosine 62. deGruyter, J. N., Malins, L. R. & Baran, P. S. Amide-forming ligation of acyltrifluoroborates and
nitration in mammalian cells. ACS Chem. Biol. 14, Residue-specific peptide modification: a chemist’s hydroxylamines in water. Angew. Chem. Int. Ed. 51,
1328–1336 (2019). guide. Biochemistry 56, 3863–3873 (2017). 5683–5686 (2012).
38. Xiao, H., Xuan, W., Shao, S., Liu, T. & Schultz, P. G. 63. Hauser, A., Penkert, M. & Hackenberger, C. P. R. 88. Noda, H., Erős, G. & Bode, J. W. Rapid ligations with
Genetic incorporation of ε-N-2-hydroxyisobutyryl- Chemical approaches to investigate labile peptide equimolar reactants in water with the potassium
lysine into recombinant histones. ACS Chem. Biol. 10, and protein phosphorylation. Acc. Chem. Res. 50, acyltrifluoroborate (KAT) amide formation. J. Am.
1599–1603 (2015). 1883–1893 (2017). Chem. Soc. 136, 5611–5614 (2014).
39. Zheng, Y., Gilgenast, M. J., Hauc, S. & Chatterjee, A. 64. Hartrampf, N. et al. Synthesis of proteins by 89. White, C. J. & Bode, J. W. PEGylation and dimerization
Capturing post-translational modification-triggered automated flow chemistry. Science 368, 980–987 of expressed proteins under near equimolar conditions
protein–protein interactions using dual noncanonical (2020). with potassium 2-pyridyl acyltrifluoroborates.
amino acid mutagenesis. ACS Chem. Biol. 13, 65. Dawson, P. E., Muir, T. W., Clark-Lewis, I. & Kent, S. B. ACS Cent. Sci. 4, 197–206 (2018).
1137–1141 (2018). Synthesis of proteins by native chemical ligation. 90. Lee, C. L., Liu, H., Wong, C. T., Chow, H. Y. & Li, X.
40. Wang, Z. A. et al. A versatile approach for site-specific Science 266, 776–779 (1994). Enabling N-to-C Ser/Thr ligation for convergent
lysine acylation in proteins. Angew. Chem. Int. Ed. 56, 66. Bode, J. W., Fox, R. M. & Baucom, K. D. Chemoselective protein synthesis via combining chemical ligation
1643–1647 (2017). amide ligations by decarboxylative condensations of approaches. J. Am. Chem. Soc. 138, 10477–10484
41. Nilsson, B. L., Kiessling, L. L. & Raines, R. T. N-alkylhydroxylamines and α-ketoacids. Angew. Chem. (2016).
Staudinger ligation: a peptide from a thioester Int. Ed. 45, 1248–1252 (2006). 91. Zhang, Y. et al. Chemical synthesis of atomically
and azide. Org. Lett. 2, 1939–1941 (2000). 67. Zhang, Y., Xu, C., Lam, H. Y., Lee, C. L. & Li, X. Protein tailored SUMO E2 conjugating enzymes for the
42. Saxon, E., Armstrong, J. I. & Bertozzi, C. R. chemical synthesis by serine and threonine ligation. formation of covalently linked SUMO–E2–E3
A “traceless” Staudinger ligation for the chemoselective Proc. Natl Acad. Sci. USA 110, 6657–6662 (2013). ligase ternary complexes. J. Am. Chem. Soc. 141,
synthesis of amide bonds. Org. Lett. 2, 2141–2143 68. Conibear, A. C., Watson, E. E., Payne, R. J. 14742–14751 (2019).
(2000). & Becker, C. F. W. Native chemical ligation in protein 92. David, Y. & Muir, T. W. Emerging chemistry strategies
43. Rostovtsev, V. V., Green, L. G., Fokin, V. V. & synthesis and semi-synthesis. Chem. Soc. Rev. 47, for engineering native chromatin. J. Am. Chem. Soc.
Sharpless, K. B. A stepwise Huisgen cycloaddition 9046–9068 (2018). 139, 9090–9096 (2017).
process: copper(I)-catalyzed regioselective “ligation” 69. Thompson, R. E. & Muir, T. W. Chemoenzymatic 93. Farrelly, L. A. et al. Histone serotonylation is a
of azides and terminal alkynes. Angew. Chem. Int. Ed. semisynthesis of proteins. Chem. Rev. 120, permissive modification that enhances TFIID binding
41, 2596–2599 (2002). 3051–3126 (2020). to H3K4me3. Nature 567, 535–539 (2019).
44. Tornoe, C. W., Christensen, C. & Meldal, M. 70. Kulkarni, S. S., Sayers, J., Premdjee, B. & Payne, R. J. 94. Dikiy, I. et al. Semisynthetic and in vitro
Peptidotriazoles on solid phase: [1,2,3]-triazoles Rapid and efficient protein synthesis through phosphorylation of alpha-synuclein at Y39 promotes
by regiospecific copper(I)-catalyzed 1,3-dipolar expansion of the native chemical ligation concept. functional partly helical membrane-bound states
cycloadditions of terminal alkynes to azides. Nat. Rev. Chem. 2, 0122 (2018). resembling those induced by PD mutations.
J. Org. Chem. 67, 3057–3064 (2002). 71. Agouridas, V. et al. Native chemical ligation and ACS Chem. Biol. 11, 2428–2437 (2016).
45. Rosner, D., Schneider, T., Schneider, D., Scheffner, M. extended methods: mechanisms, catalysis, scope, 95. Fauvet, B. & Lashuel, H. A. Semisynthesis and
& Marx, A. Click chemistry for targeted protein and limitations. Chem. Rev. 119, 7328–7443 enzymatic preparation of post-translationally
ubiquitylation and ubiquitin chain formation. (2019). modified α-synuclein. Methods Mol. Biol. 1345,
Nat. Protoc. 10, 1594–1611 (2015). 72. Wang, P. et al. Erythropoietin derived by chemical 3–20 (2016).
46. Streichert, K. et al. Synthesis of erythropoietins synthesis. Science 342, 1357–1360 (2013). 96. Levine, P. M. et al. O-GlcNAc modification inhibits
site-specifically conjugated with complex-type 73. Wilson, R. M., Dong, S., Wang, P. & Danishefsky, S. J. the calpain-mediated cleavage of α-synuclein.
N-glycans. ChemBioChem 20, 1914–1918 (2019). The winding pathway to erythropoietin along the Bioorg. Med. Chem. 25, 4977–4982 (2017).
47. Wang, Y., Yang, S. H., Brimble, M. A. & Harris, P. W. R. chemistry–biology frontier: a success at last. 97. El Turk, F. et al. Exploring the role of post-translational
Recent progress in the synthesis of homogeneous Angew. Chem. Int. Ed. 52, 7646–7665 (2013). modifications in regulating α-synuclein interactions by
erythropoietin (EPO) glycoforms. ChemBioChem 74. Unverzagt, C. & Kajihara, Y. Chemical assembly studying the effects of phosphorylation on nanobody
https://doi.org/10.1002/cbic.202000347 (2020). of N-glycoproteins: a refined toolbox to address binding. Protein Sci. 27, 1262–1274 (2018).
48. Dedkova, L. M. & Hecht, S. M. Expanding the scope a ubiquitous posttranslational modification. 98. Chen, H., Zhao, Y.-F., Chen, Y.-X. & Li, Y.-M. Exploring
of protein synthesis using modified ribosomes. J. Am. Chem. Soc. Rev. 42, 4408–4420 (2013). the roles of post-translational modifications in the
Chem. Soc. 141, 6430–6447 (2019). 75. Murakami, M. et al. Chemical synthesis of pathogenesis of Parkinson’s disease using synthetic
49. Oller-Salvia, B. & Chin, J. W. Efficient phage display erythropoietin glycoforms for insights into the and semisynthetic modified α-synuclein. ACS Chem.
with multiple distinct non-canonical amino acids relationship between glycosylation pattern and Neurosci. 10, 910–921 (2019).
using orthogonal ribosome-mediated genetic code bioactivity. Sci. Adv. 2, e1500678 (2016). 99. Moon, S. P., Balana, A. T., Galesic, A., Rakshit, A. &
expansion. Angew. Chem. Int. Ed. 58, 10844–10848 76. Li, Y., Tran, A. H., Danishefsky, S. J. & Tan, Z. Pratt, M. R. Ubiquitination can change the structure of
(2019). Chemical biology of glycoproteins: from chemical the α-synuclein amyloid fiber in a site selective fashion.
50. Reinkemeier, C. D., Girona, G. E. & Lemke, E. A. synthesis to biological impact. Methods Enzymol. 621, J. Org. Chem. 85, 1548–1555 (2020).
Designer membraneless organelles enable codon 213–229 (2019). 100. Pan, B., Rhoades, E. & Petersson, E. J. Chemoenzymatic
reassignment of selected mRNAs in eukaryotes. 77. Ramage, R. et al. Synthetic, structural and biological semisynthesis of phosphorylated α-synuclein enables
Science 363, aaw2644 (2019). studies of the ubiquitin system: the total chemical identification of a bidirectional effect on fibril formation.
51. Anderson, J. C. et al. An expanded genetic code synthesis of ubiquitin. Biochem. J. 299, 151–158 ACS Chem. Biol. 15, 640–645 (2020).
with a functional quadruplet codon. Proc. Natl Acad. (1994). 101. Marotta, N. P. et al. O-GlcNAc modification blocks
Sci. USA 101, 7566–7571 (2004). 78. Sun, H. & Brik, A. The journey for the total chemical the aggregation and toxicity of the protein α-synuclein
52. Neumann, H., Wang, K., Davis, L., Garcia-Alai, M. & synthesis of a 53 kDa protein. Acc. Chem. Res. 52, associated with Parkinson’s disease. Nat. Chem. 7,
Chin, J. W. Encoding multiple unnatural amino acids 3361–3371 (2019). 913–920 (2015).
via evolution of a quadruplet-decoding ribosome. 79. Sun, H. et al. Diverse fate of ubiquitin chain moieties: 102. Lewis, Y. E. et al. O-GlcNAcylation of α-synuclein at
Nature 464, 441–444 (2010). the proximal is degraded with the target, and the serine 87 reduces aggregation without affecting
53. Zhang, Y. et al. A semi-synthetic organism that stores distal protects the proximal from removal and membrane binding. ACS Chem. Biol. 12, 1020–1027
and retrieves increased genetic information. Nature recycles. Proc. Natl Acad. Sci. USA 116, 7805–7812 (2017).
551, 644–647 (2017). (2019). 103. Levine, P. M. et al. α-Synuclein O-GlcNAcylation alters
54. Fischer, E. C. et al. New codons for efficient production 80. Fang, G. M. et al. Protein chemical synthesis by aggregation and toxicity, revealing certain residues as
of unnatural proteins in a semisynthetic organism. ligation of peptide hydrazides. Angew. Chem. Int. Ed. potential inhibitors of Parkinson’s disease. Proc. Natl
Nat. Chem. Biol. 16, 570–576 (2020). 50, 7645–7649 (2011). Acad. Sci. USA 116, 1511–1519 (2019).
55. Iwane, Y. et al. Expanding the amino acid repertoire 81. Hua, X., Chu, G. C. & Li, Y. M. The ubiquitin enigma: 104. Schwagerus, S., Reimann, O., Despres, C.,
of ribosomal polypeptide synthesis via the artificial progress in the detection and chemical synthesis Smet-Nocca, C. & Hackenberger, C. P. Semi-synthesis
division of codon boxes. Nat. Chem. 8, 317–325 of branched ubiquitin chains. ChemBioChem https:// of a tag-free O-GlcNAcylated tau protein by sequential
(2016). doi.org/10.1002/cbic.202000295 (2020). chemoselective ligation. J. Pept. Sci. 22, 327–333
56. Fredens, J. et al. Total synthesis of Escherichia coli with 82. Watson, E. E. et al. Rapid assembly and profiling of (2016).
a recoded genome. Nature 569, 514–518 (2019). an anticoagulant sulfoprotein library. Proc. Natl Acad. 105. Haj-Yahya, M. & Lashuel, H. A. Protein semisynthesis
57. Lajoie, M. J. et al. Genomically recoded organisms Sci. USA 116, 13873–13878 (2019). provides access to tau disease-associated post-
expand biological functions. Science 342, 357–360 83. Maxwell, J. W. C. & Payne, R. J. Revealing the translational modifications (PTMs) and paves the
(2013). functional roles of tyrosine sulfation using synthetic way to deciphering the tau PTM code in health and
58. Kuru, E. et al. Release factor inhibiting antimicrobial sulfopeptides and sulfoproteins. Curr. Opin. Chem. Biol. diseased states. J. Am. Chem. Soc. 140, 6611–6621
peptides improve nonstandard amino acid incorporation 58, 72–85 (2020). (2018).


106. Ellmer, D., Brehs, M., Haj-Yahya, M., Lashuel, H. A. & 131. Ramirez, D. H. et al. Engineering a proximity- and proteins. Angew. Chem. Int. Ed. 57, 1659–1662
Becker, C. F. W. Single posttranslational modifications directed O-GlcNAc transferase for selective protein (2018).
in the central repeat domains of Tau4 impact its O-GlcNAcylation in cells. ACS Chem. Biol. 15, 157. Kistemaker, H. A. et al. Synthesis and macrodomain
aggregation and tubulin binding. Angew. Chem. 1059–1066 (2020). binding of mono-ADP-ribosylated peptides.
Int. Ed. 58, 1616–1620 (2019). 132. Yang, Q. et al. Glycan remodeling of human Angew. Chem. Int. Ed. 55, 10634–10638 (2016).
107. Chu, N. et al. Akt kinase activation mechanisms erythropoietin (EPO) through combined mammalian cell 158. Mylona, A. et al. Opposing effects of Elk-1 multisite
revealed using protein semisynthesis. Cell 174, engineering and chemoenzymatic transglycosylation. phosphorylation shape its response to ERK activation.
897–907.e14 (2018). ACS Chem. Biol. 12, 1665–1673 (2017). Science 354, 233–237 (2016).
108. Shah, N. H., Eryilmaz, E., Cowburn, D. & Muir, T. W. 133. Tang, F. et al. Selective N-glycan editing on living 159. Theillet, F. X. et al. Site-specific NMR mapping and
Naturally split inteins assemble through a “capture cell surfaces to probe glycoconjugate function. time-resolved monitoring of serine and threonine
and collapse” mechanism. J. Am. Chem. Soc. 135, Nat. Chem. Biol. 16, 766–775 (2020). phosphorylation in reconstituted kinase reactions and
18673–18681 (2013). 134. Schmidt, M., Toplak, A., Quaedflieg, P. J. & Nuijens, T. mammalian cell extracts. Nat. Protoc. 8, 1416–1432
109. Muona, M., Aranko, A. S., Raulinaitis, V. & Iwai, H. Enzyme-mediated ligation technologies for peptides (2013).
Segmental isotopic labeling of multi-domain and and proteins. Curr. Opin. Chem. Biol. 38, 1–7 160. Köhn, M. Turn and face the strange: a new view
fusion proteins by protein trans-splicing in vivo and (2017). on phosphatases. ACS Cent. Sci. 6, 467–477
in vitro. Nat. Protoc. 5, 574–587 (2010). 135. Henager, S. H. et al. Enzyme-catalyzed expressed (2020).
110. Wood, D. W. & Camarero, J. A. Intein applications: protein ligation. Nat. Methods. 13, 925–927 161. Spinck, M., Neumann-Staubitz, P., Ecke, M.,
from protein purification and labeling to metabolic (2016). Gasper, R. & Neumann, H. Evolved, selective erasers
control methods. J. Biol. Chem. 289, 14512–14519 136. Henager, S. H., Henriquez, S., Dempsey, D. R. & of distinct lysine acylations. Angew. Chem. Int. Ed. 59,
(2014). Cole, P. A. Analysis of site-specific phosphorylation 11142–11149 (2020).
111. Liu, D. & Cowburn, D. Segmental isotopic labeling of of PTEN by using enzyme-catalyzed expressed protein 162. Li, J. & Chen, P. R. Development and application of
proteins for NMR study using intein technology. ligation. ChemBioChem 21, 64–68 (2020). bond cleavage reactions in bioorthogonal chemistry.
Methods Mol. Biol. 1495, 131–145 (2017). 137. Thompson, R. E., Stevens, A. J. & Muir, T. W. Nat. Chem. Biol. 12, 129–137 (2016).
112. Di Ventura, B. & Mootz, H. D. Switchable inteins Protein engineering through tandem transamidation. 163. Bah, A. & Forman-Kay, J. D. Modulation of intrinsically
for conditional protein splicing. Biol. Chem. 400, Nat. Chem. 11, 737–743 (2019). disordered protein function by post-translational
467–475 (2019). 138. Fottner, M. et al. Site-specific ubiquitylation and modifications. J. Biol. Chem. 291, 6696–6705
113. Stevens, A. J. et al. A promiscuous split intein with SUMOylation using genetic-code expansion and (2016).
expanded protein engineering applications. Proc. Natl sortase. Nat. Chem. Biol. 15, 276–284 (2019). 164. Theillet, F. X. et al. Cell signaling, post-translational
Acad. Sci. USA 114, 8538–8543 (2017). 139. Chen, Z. & Cole, P. A. Synthetic approaches to protein modifications and NMR spectroscopy.
114. Burton, A. J. et al. In situ chromatin interactomics protein phosphorylation. Curr. Opin. Chem. Biol. 28, J. Biomol. NMR 54, 217–236 (2012).
using a chemical bait and trap approach. Nat. Chem. 115–122 (2015). 165. Carroll, E. C., Greene, E. R., Martin, A. & Marqusee, S.
12, 520–527 (2020).
115. Shiraishi, Y. et al. Phosphorylation-induced conformation of β2-adrenoceptor related to arrestin recruitment revealed by NMR. Nat. Commun. 9, 194 (2018).
116. Matveenko, M., Cichero, E., Fossa, P. & Becker, C. F. Impaired chaperone activity of human heat shock protein Hsp27 site-specifically modified with argpyrimidine. Angew. Chem. Int. Ed. 55, 11397–11402 (2016).
117. Jacobsen, M. T., Erickson, P. W. & Kay, M. S. Aligator: A computational tool for optimizing total chemical synthesis of large proteins. Bioorg. Med. Chem. 25, 4946–4952 (2017).
118. Liszczak, G. P. et al. Genomic targeting of epigenetic probes using a chemically tailored Cas9 system. Proc. Natl Acad. Sci. USA 114, 681–686 (2017).
119. Gramespacher, J. A., Burton, A. J., Guerra, L. F. & Muir, T. W. Proximity induced splicing utilizing caged split inteins. J. Am. Chem. Soc. 141, 13708–13712 (2019).
120. Bhagawati, M. et al. In cellulo protein semi-synthesis from endogenous and exogenous fragments using the ultra-fast split Gp41-1 intein. Angew. Chem. Int. Ed. https://doi.org/10.1002/anie.202006822 (2020).
121. Stewart, M. P. et al. In vitro and ex vivo strategies for intracellular delivery. Nature 538, 183–192 (2016).
122. Bruce, V. J. & McNaughton, B. R. Inside job: methods for delivering proteins to the interior of mammalian cells. Cell Chem. Biol. 24, 924–934 (2017).
123. David, Y., Vila-Perello, M., Verma, S. & Muir, T. W. Chemical tagging and customizing of cellular chromatin states using ultrafast trans-splicing inteins. Nat. Chem. 7, 394–402 (2015).
124. Zhang, Y., Park, K. Y., Suazo, K. F. & Distefano, M. D. Recent progress in enzymatic protein labelling techniques and their applications. Chem. Soc. Rev. 47, 9106–9136 (2018).
125. Choi, J. et al. Engineering orthogonal polypeptide GalNAc-transferase and UDP-sugar pairs. J. Am. Chem. Soc. 141, 13442–13453 (2019).
126. Islam, K. The bump-and-hole tactic: expanding the scope of chemical genetics. Cell Chem. Biol. 25, 1171–1184 (2018).
127. Garre, S., Gamage, A. K., Faner, T. R., Dedigama-Arachchige, P. & Pflum, M. K. H. Identification of kinases and interactors of p53 using kinase-catalyzed cross-linking and immunoprecipitation. J. Am. Chem. Soc. 140, 16299–16310 (2018).
128. Mathur, S., Fletcher, A. J., Branigan, E., Hay, R. T. & Virdee, S. Photocrosslinking activity-based probes for ubiquitin RING E3 ligases. Cell Chem. Biol. 27, 74–82.e6 (2020).
129. Tripsianes, K., Schutz, U., Emmanouilidis, L., Gemmecker, G. & Sattler, M. Selective isotope labeling for NMR structure determination of proteins in complex with unlabeled ligands. J. Biomol. NMR 73, 183–189 (2019).
130. Li, C. & Wang, L. X. Chemoenzymatic methods for the synthesis of glycoproteins. Chem. Rev. 118, 8359–8413 (2018).

140. Pedersen, S. W. et al. Site-specific phosphorylation of PSD-95 PDZ domains reveals fine-tuned regulation of protein–protein interactions. ACS Chem. Biol. 12, 2313–2323 (2017).
141. Conibear, A. C., Rosengren, K. J., Becker, C. F. W. & Kaehlig, H. Random coil shifts of posttranslationally modified amino acids. J. Biomol. NMR 73, 587–599 (2019).
142. Krall, N., da Cruz, F. P., Boutureira, O. & Bernardes, G. J. Site-selective protein-modification chemistry for basic biology and drug development. Nat. Chem. 8, 103–113 (2016).
143. Yates, L. M. & Fiedler, D. A stable pyrophosphoserine analog for incorporation into peptides and proteins. ACS Chem. Biol. 11, 1066–1073 (2016).
144. Kee, J. M., Villani, B., Carpenter, L. R. & Muir, T. W. Development of stable phosphohistidine analogues. J. Am. Chem. Soc. 132, 14327–14329 (2010).
145. Chalker, J. M., Bernardes, G. J., Lin, Y. A. & Davis, B. G. Chemical modification of proteins at cysteine: opportunities in chemistry and biology. Chem. Asian J. 4, 630–640 (2009).
146. Lakbub, J. C., Shipman, J. T. & Desaire, H. Recent mass spectrometry-based techniques and considerations for disulfide bond characterization in proteins. Anal. Bioanal. Chem. 410, 2467–2484 (2018).
147. Macmillan, D., Bill, R. M., Sage, K. A., Fern, D. & Flitsch, S. L. Selective in vitro glycosylation of recombinant proteins: semi-synthesis of novel homogeneous glycoforms of human erythropoietin. Chem. Biol. 8, 133–145 (2001).
148. Bhat, S. et al. Hydrazide mimics for protein lysine acylation to assess nucleosome dynamics and deubiquitinase action. J. Am. Chem. Soc. 140, 9478–9485 (2018).
149. Hossain, M. A. et al. Total chemical synthesis of a nonfibrillating human glycoinsulin. J. Am. Chem. Soc. 142, 1164–1169 (2020).
150. Wang, H., Farnung, L., Dienemann, C. & Cramer, P. Structure of H3K36-methylated nucleosome–PWWP complex reveals multivalent cross-gyre binding. Nat. Struct. Mol. Biol. 27, 8–13 (2020).
151. Chu, G. C. et al. Cysteine-aminoethylation-assisted chemical ubiquitination of recombinant histones. J. Am. Chem. Soc. 141, 3654–3663 (2019).
152. Debelouchina, G. T., Gerecht, K. & Muir, T. W. Ubiquitin utilizes an acidic surface patch to alter chromatin structure. Nat. Chem. Biol. 13, 105–110 (2017).
153. Bernardes, G. J. et al. From disulfide- to thioether-linked glycoproteins. Angew. Chem. Int. Ed. 47, 2244–2247 (2008).
154. Wright, T. H. et al. Posttranslational mutagenesis: a chemical strategy for exploring protein side-chain diversity. Science 354, aag1465 (2016).
155. Yang, A. et al. A chemical biology route to site-specific authentic protein modifications. Science 354, 623–626 (2016).
156. Liu, Q. et al. A general approach towards triazole-linked adenosine diphosphate ribosylated peptides

Site-specific ubiquitination affects protein energetics and proteasomal degradation. Nat. Chem. Biol. 16, 866–875 (2020).
166. Freiburger, L. et al. Efficient segmental isotope labeling of multi-domain proteins using Sortase A. J. Biomol. NMR 63, 1–8 (2015).
167. Nitsche, C. & Otting, G. Pseudocontact shifts in biomolecular NMR using paramagnetic metal tags. Prog. Nucl. Magn. Res. Spectrosc. 98–99, 20–49 (2017).
168. Hendriks, I. A. et al. Site-specific mapping of the human SUMO proteome reveals co-modification with phosphorylation. Nat. Struct. Mol. Biol. 24, 325–336 (2017).
169. Sager, R. A. et al. Post-translational regulation of FNIP1 creates a rheostat for the molecular chaperone Hsp90. Cell Rep. 26, 1344–1356.e5 (2019).
170. Lechner, C. C., Agashe, N. D. & Fierz, B. Traceless synthesis of asymmetrically modified bivalent nucleosomes. Angew. Chem. Int. Ed. 55, 2903–2906 (2016).
171. Liokatis, S., Klingberg, R., Tan, S. & Schwarzer, D. Differentially isotope-labeled nucleosomes to study asymmetric histone modification crosstalk by time-resolved NMR spectroscopy. Angew. Chem. Int. Ed. 55, 8262–8265 (2016).
172. Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
173. Jiang, H. et al. Protein lipidation: occurrence, mechanisms, biological functions, and enabling technologies. Chem. Rev. 118, 919–988 (2018).
174. Heal, W. P. & Tate, E. W. Getting a chemical handle on protein post-translational modification. Org. Biomol. Chem. 8, 731–738 (2010).

Acknowledgements
A.C.C. is supported by a UQ Development Fellowship (project 613982) and an Early Career Researcher Grant (project 616535) from the University of Queensland. J. Rosengren and O. Gajsek are gratefully acknowledged for helpful discussions and feedback.

Author contributions
A.C.C. gathered literature, and wrote and edited the manuscript and figures.

Competing interests
The author declares no competing interests.

Peer review information
Nature Reviews Chemistry thanks Y. Kajihara, M. Pratt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© Springer Nature Limited 2020