Chemical Science

EDGE ARTICLE

DP4-AI automated NMR data analysis: straight from spectrometer to structure†

Cite this: Chem. Sci., 2020, 11, 4351

Alexander Howarth, Kristaps Ermanis * and Jonathan M. Goodman *

All publication charges for this article have been paid for by the Royal Society of Chemistry

Received 23rd January 2020
Accepted 2nd March 2020

DOI: 10.1039/d0sc00442a

rsc.li/chemical-science

A robust system for automatic processing and assignment of raw 13C and 1H NMR data, DP4-AI, has been developed and integrated into our computational organic molecule structure elucidation workflow. Starting from a molecular structure with undefined stereochemistry or other structural uncertainty, this system allows for completely automated structure elucidation. Methods for NMR peak picking using objective model selection were developed, together with algorithms for matching the calculated 13C and 1H NMR shifts to peaks in noisy experimental NMR data. When rigorously evaluated using a challenging test set of molecules, DP4-AI achieved a 60-fold increase in processing speed and nearly eliminated the need for scientist time. DP4-AI represents a leap forward in NMR structure elucidation and a step-change in the functionality of DP4. It enables high-throughput analyses of databases and large sets of molecules, which were previously impossible, and paves the way for the discovery of new structural information through machine learning. This new functionality has been coupled with an intuitive GUI and is available as open-source software at https://github.com/KristapsE/DP4-AI.

Introduction

Structural elucidation of molecules is a challenging problem in both synthetic organic and natural product chemistry. Structural near-isomers (for example regioisomers and protecting group localisation) and diastereomers usually exhibit only subtle differences in their 1D NMR spectra. This makes it very difficult to determine the structure and relative stereochemistry. The problem can be addressed by additional NMR experiments such as NOESY spectra, or by synthesizing isomers of the natural product and comparing the resulting observed NMR spectra with those published. Both approaches are very expensive and time-consuming.

An attractive and now established alternative1,2 is to use computational NMR prediction. This process uses DFT to calculate NMR shifts for all the diastereomers of the uncertain structure. It then compares these predictions with the published spectra using parameters such as the correlation coefficient, the mean absolute error (MAE) and the corrected mean absolute error (CMAE).3 DP4 analysis is particularly powerful because it does more than predict the relative stereochemistry and other variations of the molecule. It also uses Bayes' theorem to give a probability that each candidate molecule is the correct one (assuming one of the provided or generated structures is correct).3,4 DP4 has been successfully used to determine the stereochemistry of many natural-product-like molecules, synthetic intermediates, natural product fragments and pharmaceutical compounds,5–10 and has been explored further in the DP4+ and J-DP4 analyses.11,12

Since its publication, the calculation of DP4 has been streamlined and user input minimized, as all calculations are now automatically managed by the Python program PyDP4.11–13 The user only needs to supply a structure of the molecule with undefined stereochemistry and assigned experimental 1D 13C and 1H NMR spectra. The most user-intensive part of relative stereochemistry elucidation using this program is now the assignment of the NMR spectra. This is not only very time-consuming but also often laborious and error-prone.14 Automated interpretation of NMR spectra has been a major goal of analytical chemistry for many years.15 Much of this work has focused on developing CASE (Computer Aided Structure Elucidation) software16–18 for automated 2D structure determination and dereplication, rather than on automated assignment of atoms in a known structure. Typically, a number of 2D NMR spectra must be provided in addition to the 1D NMR spectra.19
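In outline, DP4 multiplies per-atom error probabilities for each candidate structure and then normalizes across all candidates using Bayes' theorem. The function below is a minimal sketch of that final normalization step only, and several things in it are assumptions. It assumes equal priors and a single zero-mean Gaussian error model with a placeholder `sigma`. DP4 and DP4-AI each use their own fitted error distributions instead. The function name `dp4_style_probabilities` is hypothetical.

```python
import numpy as np


def dp4_style_probabilities(errors_by_candidate, sigma):
    """Posterior probability of each candidate structure (simplified DP4-style sketch).

    errors_by_candidate: one 1D array per candidate diastereomer, holding its
        scaled calculated-minus-experimental shift errors (ppm).
    sigma: standard deviation of an ASSUMED zero-mean Gaussian error model
        (placeholder, not a fitted DP4 parameter).
    Equal prior probabilities are assumed for all candidates.
    """
    # Log-likelihood of each candidate = sum of Gaussian log-densities of its errors.
    log_like = np.array([
        np.sum(-0.5 * (np.asarray(e, dtype=float) / sigma) ** 2
               - np.log(sigma * np.sqrt(2.0 * np.pi)))
        for e in errors_by_candidate
    ])
    # Bayes' theorem with equal priors: normalise the likelihoods.
    # Subtracting the maximum first avoids underflow when exponentiating.
    weights = np.exp(log_like - log_like.max())
    return weights / weights.sum()


probs = dp4_style_probabilities(
    [np.array([0.5, -1.0, 0.8]), np.array([2.5, -3.1, 1.9])], sigma=2.0
)
print(probs)  # roughly [0.90, 0.10]: the candidate with smaller errors dominates
```

With both 1H and 13C data, the usual approach would be to add each candidate's log-likelihoods for the two nuclei, each under its own error model, before normalizing.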
A small number of commercial soware packages offer
expert-guided NMR assignment algorithms for 1H NMR spectra,
Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, notably Mestrelab Mnova.20 This soware has focused on aiding
Lenseld Road, Cambridge CB2 1EW, UK. E-mail: ke291@cam.ac.uk; jmg11@cam.ac.
a user to interpret NMR spectra as opposed to automated pro-
uk
† Electronic supplementary information (ESI) available: Detailed description of
cessing and assignment of raw NMR data.
the program, the full results from the program evaluation and discussion, In this work a system for fully automatic robust processing and
DP4-AI processed and assigned spectra and NMR shi calculations and assignment of both 1H and 13C NMR spectra is presented (Fig. 1). A
experimental values. All DFT and MM calculations, as well as the raw NMR data schematic of this program is displayed in Fig. 2. It provides
at https://doi.org/10.17863/CAM.47721. See DOI: 10.1039/d0sc00442a

This journal is © The Royal Society of Chemistry 2020 Chem. Sci., 2020, 11, 4351–4359 | 4351
automated relative stereochemistry and structural ambiguity prediction using 1D 1H and 13C NMR spectra. Chemical shift values are calculated using the DFT GIAO method. Shift prediction by this method can also be performed using only free and open source software such as NWChem21 and Tinker.22 This makes the method more accessible than any other software currently available.

Fig. 1 (a) The structure of DP4-AI. This system affords fully automated stereochemistry elucidation; the raw NMR data is the only input required from the user. (b) Example structures with stereochemistry correctly predicted fully automatically using DP4-AI integrated in PyDP4.

Fig. 2 The overall structure of DP4-AI. Raw NMR data is processed in a series of stages to yield experimental multiplet shift values and their integrals. The program then takes shifts calculated using DFT for each atom in the molecule and assigns them to the experimental peaks. This assignment is then used to calculate a DP4 probability for each diastereomer.

The automation DP4-AI affords is exciting, as it will allow high-throughput DP4 analysis of databases and large sets of molecules, which was previously impossible. In addition, automatic processing and assignment of NMR spectra will reduce the time constraints of synthesis, allowing for more opportunities in chemical discovery. Moreover, this system will
also provide a framework for the development of automated interpretation of more complex NMR experiments in the near future, and could be used in conjunction with CASE software to solve structural elucidation problems from analytical data.

Computational methods

The calculation of DP4 was performed following previous works.3,11–13 All molecular mechanics calculations were performed using MacroModel (Version 9.9). All conformational searches were performed in the gas phase utilizing the MMFF force field23–28 and a mixture of Low Mode following and Monte Carlo search algorithms.29,30 The step count for MacroModel was set so that all low energy conformers were found at least 5 times.

Quantum mechanical calculations were carried out using Gaussian09. NMR shielding constants were found using the GIAO method.31–33 The functional mPW1PW91 (ref. 34) was chosen with the 6-311G(d) basis set35 for NMR shift prediction, as this combination has been shown to be optimal for DP4 calculation.12 For molecules containing iodine, the basis set def2-SVP36,37 was chosen. All DFT calculations were performed using the implicit PCM solvent model.38 The molecular geometries were also optimized at the DFT level of theory, using the B3LYP functional39,40 with the 6-31G(d) basis set. Finally, single-point energies were separately calculated using the M06-2X functional and the def2-TZVP basis set.

The calculations were managed by PyDP4, a script written in Python 3.7 which is now part of DP4-AI. DP4-AI is available from http://www-jmg.ch.cam.ac.uk/tools/nmr/ and GitHub https://github.com/KristapsE/DP4-AI/. Some elements of NMR processing were performed using the package NMRglue.41

Program description

Automated NMR processing would remove the need for the user to laboriously write an NMR description, radically increasing the productivity of the process. In order to assign atoms in a molecule to peaks in an NMR spectrum, peak locations and integral values must be extracted from the raw NMR data, as shown in Fig. 2. Fully automated processing and analysis of NMR data is a complex problem, as all NMR spectra are different and each stage in the processing can affect subsequent stages. DP4-AI has been designed to process NMR data as robustly and reliably as possible in spite of these challenges. An overview of this section of the program is given below; a more detailed description is given in the ESI (Section S2.1†).

After a Fourier transform, the spectrum may display phasing errors which must be corrected before further processing. Unfortunately, none of the existing methods phased the test set of spectra as reliably as required. To alleviate this issue, a hybrid method was employed. It combines the signal classification method developed by Wang et al.,42 the entropy-based objective function of the phasing algorithm ACME,43 and the robust framework of the weighted linear regression (WLR) approach developed by Zorin et al.44

Many spectra also display baseline distortions which must be removed. A modified version of the algorithm developed by Wang et al.42 was incorporated into the final program (ESI Section S2.1.4†).

For 1H spectra, peak picking is initially performed using the first and second derivatives of the spectrum. Potential peaks are found as points that are simultaneously zero in the first derivative and minima in the second derivative. These candidate peaks are picked if they are both above an amplitude threshold and below a second threshold in the second derivative. These threshold values are adaptive, as they are set to multiples of the noise standard deviation. Peak picking in this manner allows both threshold values to be set very low, screening out as much noise as possible whilst missing as few signals as possible. In addition, the use of derivatives ensures baseline independence. This process is summarized in Fig. 3.

Fig. 3 Figure illustrating the gradient peak picking process. Peaks are picked if they are below a threshold in the second derivative (orange) and above an intensity threshold (blue). The final picked peaks are highlighted in green.

In 1H spectra, signal peaks must be grouped together to establish where the multiplet centers are located. The maximum coupling constant expected between protons in 1H spectra is around 18 Hz, so any peaks less than 18 Hz apart can be grouped together as multiplets. To avoid missing any signal peaks, the peak picking threshold for signal-to-noise ratio is deliberately set very low. However, this increases the probability of noise peaks being mistaken for signal peaks and can cause over-grouping of peaks (ESI Section S2.1.6†).

To mitigate this issue, an algorithm for removing noise utilizing objective model selection was developed. Picked peaks separated by less than 18 Hz are grouped together to define signal-containing regions. For each region a line shape model is constructed with multiple generalized Lorentzian line shape functions.45 The parameters in the model of each region are varied iteratively until the integral of the model converges to
within 1% of the corresponding region of the spectrum. To avoid overfitting, the groups of parameters describing each peak are then tested for their information content. A new model is constructed without each line shape function in turn. Removing a function may lower the model's Bayesian Information Criterion (ESI Section S2.1.6†) by more than a threshold value. In that case, its parameters are assumed to describe a noise peak (as they do not increase the information content of the model) and are deleted. Once all of the peaks have been tested, the remaining signals are regrouped to produce the final multiplets.46 An example of this modelling process is displayed in Fig. 4.

Using this modelling process, solvent peaks and other contaminants can also be selectively removed. The solvent used is defined by the user to adjust the DFT solvent model. To identify the solvent multiplet in the experimental data, each peak in the region of the spectrum expected to contain the solvent is given a score. This score takes into account how closely the pattern of peak locations and amplitudes around each peak matches that of the expected solvent multiplet, and also the distance from the expected solvent location. The peaks that most closely match those of the simulated solvent multiplet are removed from the model, and the spectrum is referenced (see ESI Section S2.1.9†).

Finally, the multiplets in 1H spectra must be integrated. Because the natural abundance of the 1H isotope of hydrogen is essentially 100%, the integrals of multiplets in the spectra are proportional to the number of protons in each chemical environment. If this constant of proportionality can be estimated, the assignment algorithm (AA) can be told explicitly how many protons can be assigned to each multiplet.

The algorithm for estimating this constant of proportionality for 1H spectra incorporated into the program has been developed from previous work in this area.20 The algorithm iterates the constant k upwards from the minimum possible number of protons in the spectrum, which is the number of protons in the structure minus the number of labile protons. It stops at a maximum value, set to twice the total number of protons in the ambiguous structure. For each value it calculates a score based on how integer-like the corresponding set of integrals is (the integrals of the multiplets are calculated using the model, as described by Schoenberger et al.45). The value of k producing the highest score is taken as the constant of proportionality and is used to normalize the integrals (ESI Section S2.1.10†). This scoring method is particularly advantageous because it accounts for deviations from integer integral values. Such deviations are often observed due to, for example, the choice of shimming parameters or incomplete relaxation.

Peak picking of 13C spectra is performed using a similar algorithm. The most intense peak in the spectrum is picked and a simple Lorentzian function is fitted to it to create an initial model; this is repeated for the next most intense peak. This process continues until all the unpicked peaks fall within three times the standard deviation of the noise of the fitted model. This algorithm was chosen because it effectively discards noise peaks whilst identifying low-intensity signal peaks such as quaternary carbons.

Assignment algorithm

The final challenge in the development of DP4-AI is the assignment algorithm (AA), which assigns the atoms in each diastereomer of the molecule to observed peaks in the spectra. This assignment is made using the GIAO predicted shifts.

The core of the AA calculates the assignment probability matrix M. Each element $M_{ij}$ gives the probability that calculated shift i corresponds to experimental peak j. The matrix M is used to find the most probable assignment by the Hungarian linear sum minimization47 method, as shown in Fig. 5.

The matrix M is calculated using a statistical model (ESI Section S2.2†). This model takes into account the distribution of DFT prediction errors observed for the chosen computational conditions. In the case of 13C NMR, it also takes into account the amplitudes of the experimental peaks.
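The sketch below shows how a most-probable assignment can be read off a matrix like M. Taking negative logarithms turns "maximize the product of the $M_{ij}$" into a linear sum minimization. SciPy's `linear_sum_assignment` solves that problem. Recent SciPy versions use a Jonker-Volgenant-type solver rather than the textbook Hungarian steps, but the optimal total cost is the same. This is an illustration under an assumption, not DP4-AI's code. The entries of M come from a single zero-mean Gaussian error model with a placeholder `sigma`, and the 13C amplitude weights are omitted. The function name `assign_shifts` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_shifts(calc_shifts, exp_peaks, sigma):
    """Most probable one-to-one assignment of calculated shifts to peaks.

    ASSUMPTION: log M_ij comes from a zero-mean Gaussian error model with a
    placeholder standard deviation `sigma` (ppm), not DP4-AI's fitted model.
    Returns {calculated shift index: experimental peak index}.
    """
    calc = np.asarray(calc_shifts, dtype=float)[:, None]  # shape (n_calc, 1)
    exp = np.asarray(exp_peaks, dtype=float)[None, :]     # shape (1, n_exp)
    # log M_ij: Gaussian log-density of the error (calc_i - exp_j); working in
    # log space avoids underflow for large errors.
    log_M = -0.5 * ((calc - exp) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    # Maximising prod(M_ij) == minimising sum(-log M_ij): a linear sum assignment.
    rows, cols = linear_sum_assignment(-log_M)
    return dict(zip(rows.tolist(), cols.tolist()))


assignment = assign_shifts([170.2, 77.5, 21.3], [20.9, 76.8, 171.0, 128.0], sigma=2.0)
print(assignment)  # {0: 2, 1: 1, 2: 0}; the peak at 128.0 ppm is left unassigned
```

As written, each peak can be used at most once. DP4-AI's 13C AA also allows a peak to be assigned more than once, through a separate penalty system.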
GIAO shi predictions are subject to systematic errors that
vary depending on position within the spectrum and the
computational conditions.12,48 These systematic errors must be
corrected prior to calculation of M. Classical DP4 alleviates this
problem by performing an internal scaling process.3 It is not
possible to use this method in this program as the assignments
are unknown.
To mitigate this issue, the assignment process is performed
in three stages. In the rst round of assignment, prior to
calculation of M a linear scaling is performed using known
external scaling factors (ESI Section S2.2.1†). Aer the rst
assignment has been completed, the assigned shis and peaks
are used to calculate internal linear scaling factors in a similar
fashion to DP4. The calculated shis are then rescaled and the
assignment repeated.
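The first two stages of this scheme are sketched below. The remaining stage is not shown. This is an illustration under stated assumptions, not DP4-AI's implementation. It assumes a DP4-style linear model in which calculated ≈ slope × experimental + intercept, so a corrected shift is (calculated − intercept) / slope. The external factors are caller-supplied placeholders. The `assign` helper is a hypothetical stand-in for the probabilistic AA: it only minimizes the total absolute deviation with SciPy's `linear_sum_assignment`. The names `linear_scale`, `assign` and `scaled_assignment` are also hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def linear_scale(calc, slope, intercept):
    """Correct calculated shifts, ASSUMING calc ≈ slope * exp + intercept."""
    return (np.asarray(calc, dtype=float) - intercept) / slope


def assign(shifts, peaks):
    """Hypothetical stand-in for the AA: minimise total absolute deviation."""
    cost = np.abs(np.asarray(shifts, dtype=float)[:, None]
                  - np.asarray(peaks, dtype=float)[None, :])
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))


def scaled_assignment(calc, peaks, ext_slope, ext_intercept):
    """Two assignment rounds: external scaling, then internal rescaling."""
    calc = np.asarray(calc, dtype=float)
    peaks = np.asarray(peaks, dtype=float)
    # Round 1: apply external scaling factors (placeholders here) and assign.
    first = assign(linear_scale(calc, ext_slope, ext_intercept), peaks)
    # Internal scaling: fit calc = slope * exp + intercept over the assigned pairs.
    idx = np.array(sorted(first))
    assigned_exp = peaks[[first[i] for i in idx]]
    slope, intercept = np.polyfit(assigned_exp, calc[idx], 1)
    # Round 2: rescale with the internal factors and repeat the assignment.
    rescaled = linear_scale(calc, slope, intercept)
    return assign(rescaled, peaks), rescaled


exp_true = np.array([20.0, 45.0, 77.0, 128.0, 170.0])
calc = 1.05 * exp_true + 2.0  # synthetic shifts with a systematic linear error
peaks = np.array([170.0, 20.0, 128.0, 45.0, 77.0])
final, rescaled = scaled_assignment(calc, peaks, ext_slope=1.0, ext_intercept=0.0)
print(final)  # {0: 1, 1: 3, 2: 4, 3: 2, 4: 0}
```

In this synthetic example the internal fit recovers the slope and intercept exactly, so the rescaled shifts coincide with the experimental values. With real GIAO errors the residual scatter would remain.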
Fig. 4 An example multiplet (blue) and deconvolved model (orange). The signal peaks are highlighted in cyan; the peaks determined to be noise are highlighted in red.

In 13C spectra, the number of experimental peaks may not be equal to the number of carbon atoms in the molecule. The GIAO shift predictions may also not reflect the degeneracy seen in the spectrum. The 13C AA is therefore provided with additional flexibility to
assign peaks in the spectrum multiple times, using a penalty system given by eqn (1).

$$
\mathrm{Penalty}_i = \left(\frac{1}{8}\right)^{k_i t_i} \tag{1}
$$

In the multiple assignment penalty for experimental peak i, the factor $k_i$ depends on which amplitude KDE group peak i belongs to. A value of $k = 1$ is given to the group containing the most intense peaks, $k = 2$ to the group with the second most intense peaks, and so on. The value of $t_i$ is the number of times the peak has already been assigned.

$$
B_i =
\begin{cases}
\dfrac{\max\left(\mathbf{A}'_{\mathrm{unassigned}}\right)}{A_{\mathrm{assigned},i}}, & \text{if } \dfrac{\max\left(\mathbf{A}'_{\mathrm{unassigned}}\right)}{A_{\mathrm{assigned},i}} > 1,\\[6pt]
1, & \text{otherwise}
\end{cases}
\tag{2}
$$

Eqn (2) gives the bias for calculated shift i. Here $\mathbf{A}'_{\mathrm{unassigned}}$ is a vector containing the amplitudes of all unassigned peaks within 10 ppm of the peak assigned to calculated shift i. $A_{\mathrm{assigned},i}$ is the amplitude weight of the peak assigned to calculated shift i.

The 13C algorithm also takes into account the amplitudes of experimental peaks. Each element of M, $M_{ij}$, is multiplied by a weight $A_j$ derived from the amplitude of experimental peak j. This has been incorporated to prioritise the assignment of more intense peaks over those more likely to be noise. The peaks in 13C spectra typically fall into three groups which can be distinguished by amplitude: noise, 1-atom signals, and signals corresponding to multiple equivalent carbon atoms. To capture this variation, the probability density function of peak amplitudes in the spectrum is estimated.49 The peaks are then grouped according to which minima in the second derivative of this function their amplitudes fall between. The amplitude weights are then calculated using the number of peaks in each group and the expected number of carbon atoms in the structure, as shown in Fig. 6.

Fig. 5 Figure illustrating how calculated shifts can be assigned to experimental peaks using the assignment probability matrix M. (a) The peaks in the simulated calculated spectrum (blue) are assigned to those in the experimental spectrum (orange). (b) The matrix M is calculated and the optimum assignment (cyan) is determined. (c) The final assignment found in this example.

Fig. 6 Peaks (left) are grouped by amplitude, depending on which minima in the second derivative of the amplitude probability density function (right) they fall between (dashed lines). In this simulated example, the number of carbon atoms in the structure is nine. The cumulative sum of peaks above each group's lower boundary is calculated. The weight assigned to each group is the number of carbon atoms in the structure divided by this value. The weights are then normalized so that the largest weight is one.

The 13C AA is also able to bias the assignment towards position or amplitude information (ESI Section S2.2.2†) by considering the distribution of peak intensities and positions in the local environment around each calculated shift. After the second round of assignment, the unassigned peaks within 10 ppm of the experimental peak assigned to each calculated shift are analyzed. The bias for calculated shift i is given by eqn (2). Any shifts with biases above one are reassigned, in order of bias, to unassigned experimental peaks within 10 ppm, which are taken in order of amplitude.

The role of the bias is to assess whether any signal peaks have been missed during the initial assignment. This is particularly useful in spectra where a large amount of noise has been carried through, because in the first pass the AA typically favors assigning nearby noise peaks over more distant intense signal peaks.

In contrast, the 1H AA does not require amplitude weighting, biasing or the multiple assignment penalty, because it can be told explicitly how many times each peak may be assigned using the integral information. The 1H AA also has an additional stage
for the assignment of methyl protons. Protons in methyl groups consistently appear as equivalent in 1H NMR spectra and hence should be assigned to the same peak. The 1H AA assigns these protons as groups to peaks with sufficient integrals before assigning the remaining protons.

Graphical user interface

DP4-AI may be run either from the command line, for a fully automated workflow, or from the accompanying GUI. The GUI allows the user to easily calculate DP4 probabilities, visualize the assignments made by DP4-AI, and investigate the populated conformers and prediction errors.

Fig. 7 Figure illustrating the 47 molecules utilized to evaluate the performance of DP4-AI. Molecules AT3, TS3A, TS4 and NL1A only have 1H NMR data; all other molecules have both 1H and 13C NMR data. The spectra for molecules JB7, JB11, JB5 and JB8 were taken in methanol, benzene, DMSO and methanol respectively, whilst all others were taken in CDCl3. Sources for the spectral data: AT1-3,50,51 BYH1-2,52 JB1-13B,53,54 TP1-4 (personal correspondence), TS1-4 (personal correspondence), OD1 (personal correspondence).

Results

In order to evaluate the performance of NMR-AI, a test set of 47 molecules (with an average of 3.49 stereocentres per molecule) with a diverse range of carbon skeletons was constructed (Fig. 7).50–55 This test set has been designed to include natural
products, synthetic intermediates and natural product fragments, to represent a wide cross section of potential use cases for DP4-AI. These molecules display challenging properties for both the AA and DP4. Previous work12,13 has demonstrated that flexible structures, particularly five-membered rings, and well-separated stereocentres make spectral interpretation difficult. All of these molecules are expected to present significant challenges to DP4-AI; a dataset of smaller, rigid molecules would have been much more straightforward to analyse. The corresponding spectra were also recorded in a range of solvents. Some display very low signal-to-noise ratios and some contain mixtures of compounds. This test set therefore represents a demanding test of the performance of DP4-AI.

To predict the relative stereochemistry of a molecule in the current release of DP4, the user must provide an NMR description. At a minimum, the NMR description must give the experimental peak locations. It must also state either which atoms in the molecule are chemically equivalent or how many times each peak can be assigned. With this information, DP4 assigns the atoms in the molecule, in order of chemical shift, to the peaks in the NMR description. We call this approach "the pairwise AA", and it is used as the benchmark for comparison with DP4-AI.

The pairwise AA was performed for all the molecules in the test set. This was very hard work, as it required manual analysis of all of the NMR spectra in order to break the signal into individual peaks and their multiplicities. This is the most time-consuming part of classical DP4, and it also carries the potential for subjectivity and the introduction of errors. DP4 probabilities were calculated using three different sets of computational conditions. The first level of theory tested used MM-derived geometries with GIAO shift predictions. These predictions used the mPW1PW91 functional, the 6-311G(d) basis set (def2-SVP for molecules containing iodine) and the PCM solvent model, as recommended in previous work.11 DP4 calculations were also performed after optimizing the geometries at the DFT level using the B3LYP functional, prior to the GIAO NMR shift predictions. The highest level of theory tested utilized the same DFT-optimized geometries, with single-point energies calculated using the M06-2X functional and the def2-TZVP basis set.

DP4 also requires a statistical model describing the NMR shift prediction error probabilities. As the prediction error distribution is expected to change with computational conditions, a different model is required for each set of conditions used. Four different statistical models were tested (ESI Section S3.1†). The highest performance was obtained with a single-region 3-Gaussian model fitted to an empirical prediction error distribution derived from the test set. This statistical model was constructed from the molecules in the test set and then used to calculate DP4 probabilities for that same test set. A cross-validation study was therefore also completed to assess whether any overfitting was occurring. This cross-validation study was performed in a leave-one-class-out fashion for each group of molecules denoted in Fig. 7 by their initials.

DP4-AI was tested at all three levels of theory described, with each statistical model (ESI Section S3†). A comparison of DP4-AI and the pairwise AA for the highest level of theory and most reliable statistical model is presented in Fig. 8.

Fig. 8 The correct prediction rates for DP4-AI (orange) and the pairwise AA (blue) at the three levels of theory tested for the compounds in Fig. 7 (average number of stereocentres equal to 3.49). These predictions were produced using the fitted, cross-validated 3-Gaussian statistical model.

Fig. 9 DP4-AI processed and assigned 1H spectrum of molecule BYH1 (taken in chloroform).

Discussion

DP4-AI, at the highest level of theory tested, interprets spectra with a similar reliability to the traditional, labour-intensive pairwise AA, which requires a highly-trained chemist to pre-process the spectra (Fig. 8). This is an impressive result given the challenging nature of the dataset. The probability of correctly assigning the stereochemistry this effectively in this data set by chance is about 3 × 10⁻⁸. This indicates that DP4-AI is very reliably performing better than chance (ESI Section S3†). Most impressively, DP4-AI correctly assigned the relative stereochemistry of molecules NP1 and NP2, out of 32 and 64 possible diastereomers respectively. The pairwise AA represents the upper limit of DP4-AI's performance in this study, because the NMR descriptions it uses have been meticulously written to remove any errors. In reality, errors are often incorporated into NMR descriptions and
Conflicts of interest
There are no conicts to declare.

Acknowledgements
We thank EPSRC (A. H.), Leverhulme Trust (K. E. ECF-2017-255)
and Isaac Newton Trust (K. E. 17.08(d)) for nancial support.
This work has been performed using resources provided by the
Cambridge Tier-2 system operated by the University of Cam-
bridge Research Computing Service (http://www.hpc.cam.ac.uk)
Fig. 10 NMR-AI can process a molecule for DP4 calculation in around funded by EPSRC Tier-2 capital grant EP/P020259/1. We are very
one minute, a task that previously would require roughly 8 hours of the grateful to Prof. Ian Paterson (University of Cambridge), Prof.
users time. This corresponds to a 60 fold increase in the number of
Matthew Gaunt (University of Cambridge), Prof. Jonathan Bur-
molecules that can be processed per day.
ton (University of Oxford) and Prof. Michael Porter (University
College London) for providing NMR data used in this study (ESI
Section S3.1†).
assignments, in these cases it would be possible for NMR-AI to
outperform the pairwise AA.
The performance of DP4-AI, relative to pairwise AA, increases References
with the level of theory (Fig. 8). As in previous work13 shows that
as the level of theory is increased in the DP4 calculation, the 1 G. Barone, L. Gomez-Paloma, D. Duca, A. Silvestri, R. Riccio
correct prediction rate of the pairwise AA also increases. DP4-AI and G. Bifulco, Chem.–Eur. J., 2002, 8, 3233.
shows a greater sensitivity to the level of theory. This is because 2 G. Barone, D. Duca, A. Silvestri, L. Gomez-Paloma, R. Riccio
both the assignment and the DP4 calculation are dependent on and G. Bifulco, Chem.–Eur. J., 2002, 8, 3240.
the accuracy of the NMR shift calculations. Therefore, it can be concluded that, when using DP4-AI, the conditions that produce the most accurate shift predictions should always be used.

DP4-AI's performance could be improved even further by robustly addressing some of the remaining challenges in GIAO NMR prediction, including conformational flexibility, specific solvent interactions and the presence of heavy atoms. Performance may also be improved by adding explicit support for spectra containing mixtures of compounds (such as IP2; see ESI Section S3.2†). These issues will be addressed in future developments of DP4-AI. An example of a spectrum assigned by DP4-AI is given in Fig. 9 (all the processed and assigned spectra are provided in the ESI, Section S4†).

Conclusion

DP4-AI has been developed and released as open-source software. It is a robust system for the automatic resolution of structural uncertainty, through automatic processing and assignment of raw ¹³C and ¹H NMR spectra. This automation allows rapid DP4 analyses of databases and large sets of molecules, which were previously impossible (Fig. 10). DP4-AI maintains the same high rate of correct structure elucidation as DP4 run on NMR descriptions written by an expert chemist. Moreover, it can reliably process and assign an NMR spectrum around 60 times faster, releasing time for experimentation and discovery. In addition, the new system provides a robust framework for developing new functionality in the future, such as J value analysis, 2D NMR assignment, assignment of spectra of complex mixtures and aiding conformational analysis. DP4-AI is available as open-source software at https://github.com/KristapsE/DP4-AI.

3 S. G. Smith and J. M. Goodman, J. Am. Chem. Soc., 2010, 132, 12946–12959.
4 S. G. Smith and J. M. Goodman, J. Org. Chem., 2009, 74, 4597–4607.
5 K. M. Snyder, J. Sikorska, T. Ye, L. Fang, W. Su, R. G. Carter, K. L. McPhail and P. H.-Y. Cheong, Org. Biomol. Chem., 2016, 14, 5826.
6 J. K. Cooper, K. Li, J. Aubé, D. A. Coppage and J. P. Konopelski, Org. Lett., 2018, 20, 4314–4317.
7 Y. Tang, Z.-Z. Zhao, K. Hu, T. Feng, Z.-H. Li, H.-P. Chen and J.-K. Liu, J. Org. Chem., 2019, 84, 1845–1852.
8 C. I. MacGregor, B. Y. Han, J. M. Goodman and I. Paterson, Chem. Commun., 2016, 52, 4632–4635.
9 N. Grimblat, J. A. Gavín, A. Hernández Daranas and A. M. Sarotti, Org. Lett., 2019, 21, 4003–4007.
10 N. Grimblat, M. M. Zanardi and A. M. Sarotti, J. Org. Chem., 2015, 80, 12526–12534.
11 K. Ermanis, K. E. B. Parkes, T. Agback and J. M. Goodman, Org. Biomol. Chem., 2016, 14, 3943–3949.
12 K. Ermanis, K. E. B. Parkes, T. Agback and J. M. Goodman, Org. Biomol. Chem., 2017, 15, 8998–9007.
13 K. Ermanis, K. E. B. Parkes, T. Agback and J. M. Goodman, Org. Biomol. Chem., 2019, 17, 5886–5890.
14 K. C. Nicolaou and S. A. Snyder, Angew. Chem., Int. Ed., 2005, 44, 1012–1044.
15 M. Perez, Magn. Reson. Chem., 2017, 55, 15–21.
16 A. V. Buevich and M. E. Elyashberg, J. Nat. Prod., 2016, 79, 3105–3116.
17 J.-M. Nuzillard and B. Plainchont, Magn. Reson. Chem., 2018, 56, 458–468.
18 P. Kessler and M. Godejohann, Magn. Reson. Chem., 2018, 56, 480–492.


19 D. C. Burns, E. P. Mazzola and W. F. Reynolds, Nat. Prod. Rep., 2019, 36, 919–933.
20 C. Cobas, F. Seoane, E. Vaz, M. A. Bernstein, S. Dominguez, M. Pérez and S. Sýkora, Magn. Reson. Chem., 2013, 51, 649–654.
21 A. M. Torres and W. S. Price, Concepts Magn. Reson., Part A: Bridging Educ. Res., 2017, 45A, e21387.
22 L. Lagardère, L. H. Jolly, F. Lipparini, F. Aviat, B. Stamm, Z. F. Jing, M. Harger, H. Torabifard, G. A. Cisneros, M. J. Schnieders, N. Gresh, Y. Maday, P. Y. Ren, J. W. Ponder and J. P. Piquemal, Chem. Sci., 2018, 9, 956–972.
23 T. A. Halgren, J. Comput. Chem., 1996, 17, 490–519.
24 T. A. Halgren, J. Comput. Chem., 1996, 17, 520–552.
25 T. A. Halgren, J. Comput. Chem., 1996, 17, 553–586.
26 T. A. Halgren and R. B. Nachbar, J. Comput. Chem., 1996, 17, 587–615.
27 T. A. Halgren, J. Comput. Chem., 1996, 17, 616–641.
28 T. A. Halgren, J. Comput. Chem., 1999, 20, 720–729.
29 I. Kolossváry and W. C. Guida, J. Comput. Chem., 1999, 20, 1671–1684.
30 I. Kolossváry and W. C. Guida, J. Am. Chem. Soc., 1996, 118, 5011–5019.
31 F. London, J. Phys. Radium, 1937, 8, 397–409.
32 K. Wolinski, J. F. Hinton and P. Pulay, J. Am. Chem. Soc., 1990, 112, 8251–8260.
33 R. Ditchfield, J. Chem. Phys., 1972, 56, 5688–5691.
34 C. Adamo and V. Barone, J. Chem. Phys., 1998, 108, 664–675.
35 K. B. Wiberg, J. Comput. Chem., 1986, 7, 379.
36 F. Weigend and R. Ahlrichs, Phys. Chem. Chem. Phys., 2005, 7, 3297.
37 F. Weigend, Phys. Chem. Chem. Phys., 2006, 8, 1057.
38 E. Cancès, B. Mennucci and J. Tomasi, J. Chem. Phys., 1997, 107, 3032–3041.
39 A. D. Becke, Phys. Rev. A: At., Mol., Opt. Phys., 1988, 38, 3098–3100.
40 C. Lee, W. Yang and R. G. Parr, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785–789.
41 J. J. Helmus and C. P. Jaroniec, J. Biomol. NMR, 2013, 55, 355–367.
42 K.-C. Wang, S.-Y. Wang, C. Kuo and Y. J. Tseng, Anal. Chem., 2013, 85, 1231–1239.
43 L. Chen, Z. Weng, L. Goh and M. Garland, J. Magn. Reson., 2002, 158, 164–168.
44 V. Zorin, M. A. Bernstein and C. Cobas, Magn. Reson. Chem., 2017, 55, 738–746.
45 T. Schoenberger, S. Menges, M. A. Bernstein, M. Pérez, F. Seoane, S. Sýkora and C. Cobas, Anal. Chem., 2016, 88, 3836–3843.
46 T. S. Hughes, H. D. Wilson, I. M. S. de Vera and D. J. Kojetin, PLoS One, 2015, 10, e0134474.
47 H. W. Kuhn, Nav. Res. Logist. Q., 1956, 3, 253–258.
48 G. K. Pierens, J. Comput. Chem., 2014, 35, 1388–1394.
49 D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, 2015.
50 K. F. Hogg, A. Trowbridge, A. Alvarez-Pérez and M. J. Gaunt, Chem. Sci., 2017, 8, 8198–8203.
51 J. R. Cabrera-Pardo, A. Trowbridge, M. Nappi, K. Ozaki and M. J. Gaunt, Angew. Chem., Int. Ed., 2017, 56, 11958–11962.
52 B. Y. Han, N. Y. S. Lam, C. I. MacGregor, J. M. Goodman and I. Paterson, Chem. Commun., 2018, 54, 3247–3250.
53 S. Ainsua Martinez, M. Gillard, A. C. Chany and J. W. Burton, Tetrahedron, 2018, 74, 5012–5021.
54 L. B. Marx and J. W. Burton, Chem.–Eur. J., 2018, 24, 6747–6754.
55 N. Y. S. Lam, G. Muir, V. R. Challa, R. Britton and I. Paterson, Chem. Commun., 2019, 55, 9717–9720.

