Broad specificity profiling of TALENs results in engineered nucleases with improved DNA-cleavage specificity
John P Guilinger1,2, Vikram Pattanayak1,2, Deepak Reyon3,4, Shengdar Q Tsai3,4, Jeffry D Sander3,4,
J Keith Joung3,4 & David R Liu1,2
Although transcription activator-like effector nucleases
(TALENs) can be designed to cleave chosen DNA sequences,
TALENs have activity against related off-target sequences.
To better understand TALEN specificity, we profiled 30 unique
TALENs with different target sites, array length and domain
sequences for their abilities to cleave any of 10^12 potential
off-target DNA sequences using in vitro selection and high-throughput sequencing. Computational analysis of the selection
results predicted 76 off-target substrates in the human genome,
16 of which were accessible and modified by TALENs in human
cells. The results suggest that (i) TALE repeats bind DNA
relatively independently; (ii) longer TALENs are more tolerant
of mismatches yet are more specific in a genomic context; and
(iii) excessive DNA-binding energy can lead to reduced TALEN
specificity in cells. Based on these findings, we engineered a
TALEN variant that exhibits equal on-target cleavage activity but
tenfold lower average off-target activity in human cells.
1Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA. 2Howard Hughes Medical Institute, Harvard University,
Cambridge, Massachusetts, USA. 3Molecular Pathology Unit, Center for Cancer Research, and Center for Computational and Integrative Biology, Massachusetts General
Hospital, Charlestown, Massachusetts, USA. 4Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA. Correspondence should be addressed to
D.R.L. (drliu@fas.harvard.edu).
Received 25 September 2013; accepted 11 January 2014; published online 16 February 2014; doi:10.1038/nmeth.2845
Articles
[Figure 1 graphic. Labels recovered from panel a: RVD, RVD code, N-terminal domain, TALE repeats, 5′ and 3′ ends, C-terminal domain variants (canonical, Q3, Q7 and 28-aa). Labels recovered from the selection workflow: DNA library of ~10^12 target sites; circularization (overnight); rolling-circle amplification (overnight); TALEN expression plasmid; in vitro-coupled transcription and translation (2 h); gel purification; (i) high-throughput sequencing (overnight); (ii) data analysis (1 d); specificity profile.]
Figure 1 | TALEN architecture and selection scheme. (a) Schematic of a TALEN. The 12th and 13th amino acids (the RVD) of each TALE repeat recognize
a specific DNA base pair. Two different TALENs bind their corresponding half-sites, which allows FokI dimerization and DNA cleavage. The C-terminal
domain variants used in this study are shown at the bottom. (b) Outline of the selection. A single-stranded library of DNA oligonucleotides contains
a partially randomized left half-site (L), spacer (S), right half-site (R) and constant region (thick black line). Double arrows represent concatemeric repeats
of DNA target-site variants. (c) Selection of TALEN-cleaved library members by adaptor ligation and high-throughput DNA sequencing.
are highly specific for their intended target base pair at 103 of
the 104 positions profiled, with specificity increasing near the
N-terminal TALEN end of each TALE repeat array (corresponding to the 5′ end of the bound DNA); (ii) longer TALENs are
more specific in a genomic context, whereas shorter TALENs
have higher specificity per nucleotide; (iii) TALE repeats each
bind their respective base pairs relatively independently; and (iv)
excess DNA-binding affinity leads to increased TALEN activity
against off-target sites and therefore decreased specificity.
RESULTS
Specificity profiling of CCR5- and ATM-targeted TALENs
We profiled the specificities of 30 unique heterodimeric TALEN
pairs (hereafter referred to as TALENs) harboring different
C-terminal, N-terminal and FokI domain variants, and targeted to half-sites of various lengths. The number of base pairs
recognized by each half-site that we list in this paper includes
the 5′ thymine (T) recognized by the N-terminal domain. Most
of the TALENs we tested were obligate heterodimers with FokI
Q105E,I118L in one TALEN and FokI E109K,I157K for the
other (EL/KK); we also used a more active heterodimeric variant with FokI Q105E,I118L,N115D in one TALEN and FokI
E109K,I157K,H156R in the other (ELD/KKR), and homodimeric
FokI nuclease domains, as specified below3,25.
We designed TALENs as previously reported12 to target one
of three distinct sequences, CCR5A, CCR5B or ATM, in two
different human genes, CCR5 and ATM (Supplementary Fig. 1).
We determined the specificity profiles using a previously described
in vitro selection method22,24. Briefly, we digested preselection
libraries of >10^12 DNA sequences, each theoretically containing at
least 10 copies of all possible DNA sequences with six or fewer mutations relative to the on-target sequence, with 3 nM to 40 nM of an
in vitro-translated TALEN (Online Methods, Supplementary
Table 1, Supplementary Fig. 2 and Supplementary Results).
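As a rough plausibility check on the library coverage described above, the number of distinct sequences within a given number of substitutions of a target can be counted combinatorially. The sketch below is illustrative only: it assumes 36 recognized positions (two 18-bp half-sites), ignores the spacer, and treats every variant as equally represented, which a partially randomized library is not. The function name is hypothetical.

```python
from math import comb


def count_variants(n_positions: int, max_mutations: int) -> int:
    """Count distinct sequences with at most max_mutations substitutions.

    Each mutated position can take any of the 3 non-target bases.
    """
    return sum(comb(n_positions, k) * 3**k for k in range(max_mutations + 1))


# Assumption: two 18-bp half-sites, spacer positions not counted.
n_positions = 36
variants = count_variants(n_positions, 6)
molecules_for_10_copies = 10 * variants

print(f"variants with <= 6 mutations: {variants:.3e}")
print(f"molecules needed for 10 copies each: {molecules_for_10_copies:.3e}")
print("fits within 10^12 molecules:", molecules_for_10_copies < 10**12)
```

Under these simplifying assumptions, tenfold coverage of all variants with six or fewer mutations requires far fewer than 10^12 molecules, which is consistent with the stated library size.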
Cleaved library members harbored a free 5′ monophosphate that
enabled them to be captured by adaptor ligation (Fig. 1b,c). We
isolated DNA fragments of length corresponding to 1.5 target
sites (an intact target site and a repeated half-site up to the
point of TALEN-induced DNA cleavage) by gel purification.
High-throughput sequencing and computational analysis of
TALEN-treated or control samples that survived this selection
revealed the abundance of all TALEN-cleaved sequences as
well as the abundance of the corresponding sequences before
selection (Supplementary Notes). In the control sample, all
members of the preselection library were cleaved by a restriction
endonuclease at a constant sequence to enable their capture by
adaptor ligation and isolation by gel purification. We calculated
the enrichment value for each library member that survived selection by dividing the abundance of its sequence after selection by
that before selection.
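The enrichment calculation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis pipeline: it assumes that "abundance" means the fraction of sequencing reads carrying a given sequence, and the reads shown are hypothetical toy data.

```python
from collections import Counter


def enrichment_values(pre_reads, post_reads):
    """Return {sequence: post-selection frequency / preselection frequency}."""
    pre, post = Counter(pre_reads), Counter(post_reads)
    n_pre, n_post = sum(pre.values()), sum(post.values())
    values = {}
    for seq, count in post.items():
        if pre[seq] == 0:
            continue  # ratio is undefined without preselection reads
        values[seq] = (count / n_post) / (pre[seq] / n_pre)
    return values


# Hypothetical toy reads (not data from the study).
pre_reads = ["TTCA"] * 2 + ["TTCG"] * 4 + ["TACG"] * 4
post_reads = ["TTCA"] * 6 + ["TTCG"] * 3 + ["TACG"] * 1

ev = enrichment_values(pre_reads, post_reads)
for seq, value in sorted(ev.items(), key=lambda kv: -kv[1]):
    print(seq, round(value, 2))
```

A value above 1 means the sequence became more abundant after selection; a value below 1 means it was depleted.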
For all TALEN variants and under all tested conditions, the
DNA that survived the selection contained significantly fewer
mutations on average in the targeted half-sites than were present in
the preselection libraries (P < 10^-7, Fisher's exact test; Fig. 2a,b
and Supplementary Tables 2 and 3). For all selections, the
on-target sequences were enriched 8-fold to 640-fold with
Figure 2 | In vitro selection results. (a,b) Sequences surviving selection (TALEN digestion) compared
to preselection libraries for CCR5A TALENs (a) and ATM TALENs (b) with EL/KK FokI domains as a
function of the number of mutations in both half-sites (left and right half-sites combined excluding
the spacer). Each selection was performed once with more than 28,900 sequences analyzed per
selection. (c,d) Specificity scores for the CCR5A TALENs (c) and ATM TALENs (d) at all positions in
the target half-sites plus a single flanking position. A score of 0 indicates no specificity. Boxed data
represent the intended target base. For the right half-site (the R18 TALENs), data for the sense strand
are shown. (e) Correspondence between discrete in vitro TALEN cleavage efficiency (cleaved DNA as a
fraction of total DNA) for mutant sequences normalized to on-target sequence cleavage (value of 1)
versus their enrichment values in the selection normalized to the enrichment values of the on-target
sequence (value of 1) listed in Supplementary Figure 3. Pearson's r = 0.90 between normalized
cleavage efficiency and normalized enrichment value.
[Figure 2 plots. (a,b) Sequences (%) versus total number of mutations in half-sites. (c,d) Specificity score by position for the left and right half-sites. (e) Cleavage efficiency of discrete sequence (relative to on-target) versus enrichment value (relative to on-target).]
predicted off-target genomic sites

CCR5A TALENs
Site       No TALEN (%)   (header not recovered) (%)   (header not recovered) (%)
OnCCR5A    <0.006         28                           47
OffC-5     <0.006         2.3                          2.3
OffC-15    <0.020         0.23                         0.043
OffC-16    <0.006         0.031                        <0.006
OffC-28    <0.009         0.16                         0.056
OffC-36    <0.006         0.15                         0.028
OffC-38    <0.006         ND                           0.067
OffC-49    <0.006         ND                           0.110
OffC-69    <0.010         ND                           0.089
OffC-76    <0.006         ND                           0.149

ATM TALENs
Site       No TALEN (%)   ELD/KKR FokI (%)   Homo FokI (%)
OnATM      0.007          16                 18
OffA-1     <0.006         0.026              0.077
OffA-11    <0.006         0.036              0.39
OffA-13    <0.006         0.025              <0.006
OffA-16    <0.006         <0.006             0.057
OffA-17    <0.051         <0.17              0.94
OffA-23    0.018          0.29               0.23
OffA-35    <0.006         <0.006             0.070
For cells treated with either no TALEN or CCR5A TALENs containing heterodimeric EL/KK,
heterodimeric ELD/KKR or the homodimeric (Homo) FokI cleavage domain variants, cellular
modification rates are shown as the percentage of observed insertions or deletions (indels)
consistent with TALEN cleavage relative to the total number of sequences for on-target (On)
and predicted off-target sites (Off). ND, no data were collected. Same as above for ATM
TALENs. Sample sizes and P values are given in Supplementary Tables 7 and 8.
purely computational approach for the identification of TALEN-induced off-target substrates in cells (Supplementary Tables 7
and 8, Supplementary Fig. 11, and Supplementary Results).
Repeat-binding independence and effects of TALEN length
The extensive number of quantitatively characterized off-target
substrates in the selection data enabled us to address several
key questions about TALEN specificity. First, we assessed
whether mutations at one position in the target sequence
affect the ability of TALEN repeats to productively bind other
positions. We found that TALE repeats bound their respective
DNA base pairs independently, beyond a slightly increased
tolerance for adjacent mismatches (Supplementary Fig. 12 and
Supplementary Results).
The independent binding of TALE repeats simplistically
predicts that TALEN specificity per base pair is independent of
target-site length. To experimentally characterize the relationship between TALE array length and off-target cleavage, we constructed TALENs targeting 10 bp, 13 bp and 16 bp (including
the 5′ T) for both the left (L10, L13 and L16) and right (R10,
R13 and R16) half-sites. We subjected TALENs representing all
nine possible combinations of left and right CCR5B TALENs
to in vitro selection. The results revealed that shorter TALENs
had greater specificity per targeted base pair than longer
TALENs (Supplementary Table 2). For example, sequences
cleaved by the L10 + R10 TALEN contained a mean of 0.032
mutations per recognized base pair, whereas those cleaved by
the L16 + R16 TALEN contained a mean of 0.067 mutations per
recognized base pair.
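To put the per-base-pair figures above on a per-site footing, they can be multiplied by the number of recognized base pairs. This is a small arithmetic sketch; it assumes that "mutations per recognized base pair" is the mean mutation count divided by the combined length of both half-sites, and the function name is hypothetical.

```python
def mutations_per_site(per_bp_rate: float, left_bp: int, right_bp: int) -> float:
    """Convert mutations per recognized base pair into mutations per full site."""
    return per_bp_rate * (left_bp + right_bp)


# Values quoted in the text for the shortest and longest CCR5B TALEN pairs.
short_pair = mutations_per_site(0.032, 10, 10)  # L10 + R10, 20 bp recognized
long_pair = mutations_per_site(0.067, 16, 16)   # L16 + R16, 32 bp recognized

print(f"L10 + R10: {short_pair:.2f} mean mutations per cleaved site")
print(f"L16 + R16: {long_pair:.2f} mean mutations per cleaved site")
print(f"ratio: {long_pair / short_pair:.2f}")
```

Read this way, the roughly twofold difference per base pair corresponds to a more than threefold difference in the mean number of mutations tolerated per cleaved site.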
[Figure 3 plot: enrichment value (post-selection abundance ÷ pre-selection abundance; log scale, 10^-3 to 10^2) versus total number of mutations in left and right half-sites (1 to 6), with one series for each of the nine TALEN pairs from L10 + R10 to L16 + R16.]
For selections with the longest CCR5B TALENs targeting 16
bp plus 16 bp or CCR5A and ATM TALENs targeting 18 bp plus
18 bp, the mean selection enrichment values did not follow a simple
exponential decrease as a function of mutation number (Fig. 3 and
Supplementary Table 9). It is possible these TALENs had greater
affinity than is required to substantially bind and cleave the target
site (referred to hereafter as excess DNA-binding energy). Thus,
we hypothesize that excess DNA-binding energy from the larger
number of TALE repeats in longer TALENs reduces specificity by
enabling the cleavage of sequences with more mutations, without a corresponding increase in the cleavage of sequences with
fewer mutations, because the latter are already nearly completely
cleaved. Indeed, the in vitro cleavage efficiencies of discrete DNA
sequences for these longer TALENs were independent of the presence of a small number of mutations in the target site (Fig. 4c–f),
suggesting there was nearly complete binding and cleavage of
sequences containing few mutations. Likewise, higher TALEN
concentrations also resulted in decreased enrichment values of
sequences with few mutations and increased enrichment values
of sequences with many mutations (Supplementary Table 4).
These results together support a model in which excessive TALEN
binding arising from either long TALE arrays or high TALEN
concentrations decreases the observed TALEN DNA-cleavage
specificity for each recognized base pair. Despite the fact that
TALENs designed to cleave longer target sites are less specific per
base pair, this model predicts that such TALENs have higher overall specificity than those that target shorter sites, when considering the number of potential off-target sites in the human genome
(Supplementary Fig. 13 and Supplementary Results).
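The genomic-context argument can be illustrated with a back-of-the-envelope estimate. The sketch below is not the analysis in the Supplementary Results: it assumes a random genome with equal base frequencies, about 6 × 10^9 candidate start positions (roughly both strands of a 3 × 10^9 bp genome), and it ignores the spacer and heterodimer geometry. The function name and the mismatch thresholds are hypothetical choices for illustration.

```python
from math import comb


def expected_random_sites(site_len: int, max_mismatches: int,
                          genome_positions: float = 6e9) -> float:
    """Expected number of positions in a random genome that match a site
    of length site_len with at most max_mismatches substitutions."""
    n_sequences = sum(comb(site_len, k) * 3**k for k in range(max_mismatches + 1))
    return genome_positions * n_sequences / 4**site_len


# A short pair tolerating 1 mismatch versus a long pair tolerating 3.
short_site = expected_random_sites(20, 1)  # e.g. 10 bp + 10 bp
long_site = expected_random_sites(32, 3)   # e.g. 16 bp + 16 bp

print(f"20-bp site, <= 1 mismatch: {short_site:.3g} expected sites")
print(f"32-bp site, <= 3 mismatches: {long_site:.3g} expected sites")
```

Even when the longer site is allowed three times as many mismatches, the expected number of matching genomic positions is lower by orders of magnitude, because sequence space grows as 4^n with site length.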
Engineering TALENs with improved specificity
The findings above suggest that TALEN specificity could be
improved by reducing nonspecific DNA-binding energy to the level
needed to support efficient on-target cleavage. We hypothesized that
reducing the cationic charge of the canonical 63-amino-acid
(aa) TALE C-terminal domain, which contains ten cationic residues4,5,7,9,10,12, would decrease nonspecific DNA binding32 and
improve the specificity of TALENs.
We constructed two variants in which we changed three (K788Q,
R792Q and R801Q; named Q3) or seven (K777Q, K778Q, K788Q,
R789Q, R792Q, R793Q and R801Q; named Q7) cationic arginine
or lysine residues in the canonical 63-aa C-terminal domain to
glutamine (Fig. 1a). We performed in vitro selections on CCR5A
and ATM TALENs containing the canonical C-terminal domain,
the engineered Q3 domain and the engineered Q7 domain as
well as a previously reported 28-aa-truncated C-terminal
domain5 with a theoretical net charge (−1) identical to that
Figure 4 | In vitro selection specificity and discrete cleavage efficiencies of TALENs containing canonical or engineered C-terminal domains. (a,b) On-target enrichment values for selections of CCR5A TALENs (a) and ATM TALENs (b) containing indicated domains with EL/KK FokI cleavage domains (Supplementary Table 4a,b). 28-aa indicates 28-aa-truncated variant. Each selection was performed once with more than 4,622 sequences analyzed per selection. (c) CCR5A on-target sequence (OnC) and
double-mutant sequences with mutations highlighted in red. (d) ATM on-target sequence (OnA), single-mutant sequences and double-mutant sequences
with mutations highlighted in red. (e,f) Discrete in vitro cleavage efficiency of DNA sequences listed in c with CCR5A TALENs (e) and of DNA sequences
listed in d with ATM TALENs (f) containing indicated domains with EL/KK FokI domains. Error bars, s.d. from three biological replicates. Average is shown
for C4 from two replicates. See Supplementary Results for P values.
[Figure 4e,f plots: cleavage efficiency (0 to 1.0) for canonical and Q7 TALENs on sequences OnC and C1 to C8, and on sequences OnA and A1 to A8.]
[Figure graphic without recovered caption: cellular modification efficiency (%, log scale from 0.01 to 100) at the on-target site and at off-target site 5 for TALENs with canonical (Can.), Q3 and Q7 C-terminal domains and with Homo, ELD/KKR or EL/KK FokI domains. On:off-target activity values in extraction order: 19, 176, >8,760; 12, 284, >1,450; 19, 635, >576.]
of cellular complications enabled the elucidation of the inherent DNA-cleavage specificity of TALENs. The small number of
genomic sequences that are closely related to a target sequence
also intrinsically limits studies of cellular off-target cleavage. In
contrast, we evaluated each active, dimeric TALEN in this study
for its ability to cleave any of 10^12 close variants of its on-target
sequence, a library size several orders of magnitude greater than
the number of different sequences in a mammalian genome. This
dense coverage of off-target sequence space enabled the elucidation of detailed relationships between DNA-cleavage specificity
and target base pair position, TALE repeat length, TALEN concentration, mismatch location and TALEN domain composition.
These results collectively reveal principles for characterizing and
improving TALENs with greater specificity that may enable a
wider range of genome-engineering applications.
Methods
Methods and any associated references are available in the online
version of the paper.
Accession codes. Sequence Read Archive: SRP035232.
Note: Any Supplementary Information and Source Data files are available in the
online version of the paper.
Acknowledgments
J.P.G., V.P. and D.R.L. were supported by Defense Advanced Research Projects
Agency HR0011-11-2-0003 and N66001-12-C-4207, US National Institutes of
Health (NIH) NIGMS R01 GM095501 (D.R.L.), and the Howard Hughes Medical
Institute (HHMI). D.R.L. was supported as an HHMI Investigator. V.P. was
supported by award T32GM007753 from the US National Institute of General
Medical Sciences. D.R., S.Q.T., J.D.S. and J.K.J. were supported by an NIH Director's
Pioneer Award (DP1 GM105378). J.K.J. was supported by the Jim and Ann
Orr Massachusetts General Hospital Research Scholar Award. We thank M.L.
Maeder for performing transfections and isolating genomic DNA, and C. Khayter
and M. Goodwin for technical assistance.
AUTHOR CONTRIBUTIONS
J.P.G., V.P., D.R., J.D.S. and S.Q.T. performed the experiments, designed
the research, analyzed the data and wrote the manuscript. J.K.J. and D.R.L.
designed the research, analyzed the data and wrote the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the
online version of the paper.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
1. Moscou, M.J. & Bogdanove, A.J. A simple cipher governs DNA recognition
by TAL effectors. Science 326, 1501 (2009).
2. Boch, J. et al. Breaking the code of DNA binding specificity of TAL-type
III effectors. Science 326, 1509–1512 (2009).
3. Doyon, Y. et al. Enhancing zinc-finger-nuclease activity with improved
obligate heterodimeric architectures. Nat. Methods 8, 74–79 (2011).
4. Cade, L. et al. Highly efficient generation of heritable zebrafish gene
mutations using homo- and heterodimeric TALENs. Nucleic Acids Res.
40, 8001–8010 (2012).
5. Miller, J.C. et al. A TALE nuclease architecture for efficient genome
editing. Nat. Biotechnol. 29, 143–148 (2011).
6. Bedell, V.M. et al. In vivo genome editing using a high-efficiency TALEN
system. Nature 491, 114–118 (2012).
7. Hockemeyer, D. et al. Genetic engineering of human pluripotent cells
using TALE nucleases. Nat. Biotechnol. 29, 731–734 (2011).
ONLINE METHODS
Oligonucleotides, PCR and DNA purification. All oligonucleotides were purchased from Integrated DNA Technologies
(IDT). Oligonucleotide sequences are listed in Supplementary
Notes. PCR was performed with 0.4 µl of 2 U/µl Phusion Hot
Start II DNA polymerase (Thermo-Fisher) in 50 µl with 1×
HF buffer, 0.2 mM dNTP mix (0.2 mM dATP, 0.2 mM dCTP,
0.2 mM dGTP and 0.2 mM dTTP) (NEB), 0.5 µM to 1 µM of
each primer and a program of 98 °C, 1 min; and 35 cycles of
(98 °C, 15 s; 62 °C, 15 s; 72 °C, 1 min) unless otherwise noted.
Many DNA reactions were purified with a QIAquick PCR
Purification Kit (Qiagen), referred to below as Q-column
purification, or MinElute PCR Purification Kit (Qiagen) referred
to below as M-column purification.
TALEN construction. The canonical TALEN plasmids were
constructed by the fast ligation-based automatable solid-phase
high-throughput (FLASH) method12 with each TALEN targeting
10–18 bp. Sequences encoding proteins with substitutions in
the N termini were cloned by PCR with Q5 Hot Start Master
Mix (NEB) (98 °C, 22 s; 62 °C, 15 s; 72 °C, 7 min) using phosphorylated TAL-N1fwd (for N1), phosphorylated TAL-N2fwd
(for N2), or phosphorylated TAL-N3fwd (for N3) and
phosphorylated TAL-Nrev as primers. 1 µl DpnI (NEB) was added, and the
reaction was incubated at 37 °C for 30 min and then subjected
to M-column purification. ~25 ng of eluted DNA was blunt-end
ligated intramolecularly in 10 µl of 2× Quick Ligase buffer with 1 µl of
Quick Ligase (NEB) in a total volume of 20 µl at room temperature
(~21 °C) for 15 min. 1 µl of this ligation reaction was transformed
into Top10 chemically competent cells (Invitrogen). Sequences
encoding proteins with C-terminal domain substitutions were
cloned by PCR using TAL-Cifwd and TAL-Cirev primers, and
then Q-column-purified. ~1 ng of this eluted DNA was used as
the template for PCR with TAL-Cifwd and either TAL-Q3 (for Q3)
or TAL-Q7 (for Q7) as primers and then Q-column-purified.
~1 ng of this eluted DNA was used as the template for PCR with
TAL-Cifwd and TAL-Ciirev as primers, and then
Q-column-purified. ~1 µg of this DNA fragment was digested with HpaI
and BamHI in 1× NEBuffer 4 and cloned22 into ~2 µg of desired
TALEN plasmid pre-digested with HpaI and BamHI. TALENs
containing the N1, N2 and N3 N-terminal variant domains
and TALENs containing the canonical, Q3 and Q7 C-terminal
domains are available from Addgene (51438–51449). Protein
sequences are listed in Supplementary Notes.
In vitro TALEN expression. TALEN proteins, all containing a
3× Flag tag, were expressed by in vitro transcription-translation.
800 ng of TALEN-encoding plasmid or no plasmid (empty lysate
control) was added to an in vitro transcription-translation reaction using the TNT Quick Coupled Transcription-Translation
System, T7 Variant (Promega) in a final volume of 20 µl at 30 °C
for 1.5 h. Western blots were used to visualize protein using
1 µl of anti-Flag M2 monoclonal antibody (Sigma-Aldrich, SKU
F3165). TALEN concentrations were calculated by comparison to
a standard curve of 1 ng to 16 ng of N-terminally Flag-tagged bacterial
alkaline phosphatase (Sigma-Aldrich).
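A standard-curve quantification of this kind can be sketched as a linear fit followed by interpolation. The example is illustrative only: the band intensities are hypothetical, a linear response over the 1 ng to 16 ng range is assumed, and the result is expressed as nanogram equivalents of the Flag-tagged standard rather than as a molar TALEN concentration. The function names are not from the source.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mass_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate mass (ng equivalents)."""
    return (signal - intercept) / slope


# Hypothetical western blot band intensities for the standard (arbitrary units).
standard_ng = [1, 2, 4, 8, 16]
standard_signal = [110, 205, 410, 790, 1620]

slope, intercept = fit_line(standard_ng, standard_signal)
sample_signal = 600  # hypothetical TALEN band intensity
estimate = mass_from_signal(sample_signal, slope, intercept)
print(f"estimated amount: {estimate:.2f} ng equivalents of the standard")
```

Converting such an estimate to a molar concentration would additionally require the molecular masses of the standard and the TALEN and the loaded volume, none of which are assumed here.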
In vitro selection for DNA cleavage. Preselection libraries
were prepared with 10 pmol of oligo libraries containing