Course: PG Pathshala-Biophysics
Paper 14: Bioinformatics
Module 22: Molecular Dynamics – I
Content Writer : Dr. Manoj kumar, Biophysics, AIIMS
Objectives:
1. Introduction to Molecular Dynamics (MD) simulations
2. Essential element of MD simulations - Molecular Mechanics Force Field
3. Setting up MD simulations
4. Summary
Introduction:
1.1 What is Molecular Dynamics Simulation?
Molecular Dynamics (MD) is a computational method to study the dynamic behavior of molecules
using derived physicochemical parameters with the help of Newton’s second law of motion (Force =
mass x acceleration). That means if we know the force “F” on molecule of mass “m”, then we can
calculate the acceleration “a” of the molecule. From this known acceleration “a”, velocity “v” of the
molecule can be calculated (a =dv/dt) and similarly position of the molecule can be calculated (dx/dt)
from velocity. Using the mathematical integration of this equation, motion of the molecule can be
traced. Net force on a molecule, like protein, can be calculated from forces between each atom of the
protein and that depends upon charges on each atom of the protein and their environment. Knowing
the forces and mass of each atom of a protein molecule, their overall motion can be calculated with
respect to time.
dv d 2x
F ma m m ……………… (i)
dt dt 2
1.2 Basis behind MD simulation
MD simulation is based on the concept of statistical mechanics developed by Maxwell, Boltzman &
Gibbs to study the kinetic behavior of gases (kinetic theory of gases). It helps to determine the
macroscopic properties (pressure, temperature, heat capacity) of molecules based on microscopic
calculations (small number or one molecule) on the basis of ergodic hypothesis. Ergodic hypothesis
states that the time average of thermodynamic properties (in ergodic system) corresponds to its
microcanonical ensemble average. Ensembles are microstates of a system at different time/phase
space. In analogy, conformations of the protein at different time point during MD simulation are
ensembles of bulk protein. That means evolution of MD simulation of a single molecule can be used to
determine the macroscopic properties of the bulk molecules.
1.3 How is MD simulations performed in computers?
To perform any kind of molecular simulations in computers, we need to develop (i) a computer model
or representation of the molecule (based on their physicochemical parameters of size, shape, charges,
bond and hybridization), (ii) interaction potential of each atom of the molecule and (iii) a mathematical
function/equation to calculate the forces/interactions acting on them. A molecule like a protein or a
nucleic acid is composed of several atoms of carbon, oxygen, hydrogen, nitrogen and sulphur or
phosphorous connected through covalent bonds. They can be represented by either a molecular model
(MM) based on the classical concept of chemical representation of molecule or a quantum mechanical
model (QM) based on the solution of Schrodinger wave equation. In molecular modelling (MM), each
atom is considered as a particle (of spherical shape and size depending on their van der Waals radii)
having fixed electrostatic charge (depending on the differential electronegativity with neighbouring
atom) at centre of the particle and they are connected by spring (with force constant depending on
strength of bond). The geometry of the molecule is determined by bond angles (depending on
hybridization of the atoms in the molecule) and torsion/dihedral rotation around single bond
(depending on steric parameters). The van der Waals radii, bond types, bond angles, torsions and
electrostatic charges are parameters needed to define the molecule. These parameters are associated
with interaction potential/force (F= -dU/dx) based on the chemical configuration of the molecule.
These parameters and interaction potentials define the potential energy surface (PES) of the molecule
and a mathematical equation is required to calculate the PES of the molecule. This combination of
interaction potentials and mathematical equation is known as Molecular Mechanics (MM) Force Field
(FF) or simply force field (FF). Interaction potentials are derived from chemical properties observed in
the experiments or by a precise quantum calculation. They are parameterized to reproduce the
physicochemical property of the molecules and hence force field is also known as empirical force
field. This force field is the soul of molecular modeling and simulations.
2 Molecular Mechanics (MM) Force Field (FF)
Force Field is an approximate method used for simulations of the molecules. It is based on Born-
Oppenheimer approximation which ignores the electronic motion and considers that potential energy is
a function of nuclear motion. This approximation is quite precise because mass of the atom is mainly
concentrated in its nucleus (nucleus is heavier than electrons by approx. 2 x 1836 fold). It simplifies
the calculation of potential energy of the molecule by removing the complexity of the electronic
motion. However ignoring the electrons snaps its ability to study the covalent bond breaking or
formation. FF is based on assumption that atom is a particle of fixed charge and attached to each other
through a relatively rigid springs. Strength of the spring is represented by a force constant which
depends on the strength of chemical bond.
Figure 1: Force Field Parameters
FF has two components: interaction potential associated with parameters (Figure 1) and a
mathematical equation or energy expression (Equation ii) to calculate the potential energy of the
system. Interaction potentials are classified into two types – bonded potential and non-bonded
potential. Bonded potentials are potential of the molecule arises as a result of chemical bonding and it
includes bond stretching potential, angle bending potential and torsinal potential. While non-bonded
potentials arise as a result of non-covalent interactions between atoms and it includes mainly
electrostatic potential between charged atoms and van der Waals potential between uncharged atoms.
V = Vbonded (Vbond-stretch + Vangle-bend + Vtorsion)
+ ……… (ii)
Vnon-bonded (Velectrostatic + Vvan der Waals)
2.1 Bonded Potential: Bonded potential arises from vibrational property of shared electrons between two
atoms which form covalent bond. Bond stretching, bond angle bending and rotation around single bond
contributes to bonded potential. Every chemical bond has equilibrium bond length and bonded potential is
maximum (negative) at the equilibrium length. Any change in bond length (increase or decrease) from
equilibrium value reduces (make less negative) the bond stretching potential. This follows a harmonic
pattern of variation in potential. This is analogous to the behavior of energy stored by mechanical spring
which follows Hooke’s law. Similarly bond angle bending behaves like spring and bond angle potential
also follows a harmonic pattern. Hence bond stretching potential and bond angle bending potential can be
described by Hooke’s law for spring (Figure 2). Torsional potential varies in a wave like pattern (gradual
increase and decrease in cyclic fashion) and its potential can be described as sinusoidal or cosine function.
Figure 2: Bonded Parameters
In addition to these three essential bonded terms, one more bonded term i.e “out-of-plane bending” is
included in the force fields and represented by improper torsional term (Figure 3). This helps to maintain
the planarity of the molecule and mainly useful for maintaining the correct geometry of π-
system/aromatic part of the molecule. It is also modeled similar to proper torsional term.
Figure 3: Improper Torsion
2.2 Non-bonded Potential: Non-bonded potential arises due to attraction or repulsion between two atoms
which are not covalently bonded to each other. That means non-bonded potential is calculated between
two atoms in two different molecules or two atoms in the same molecule which are minimum three bonds
apart. Potential arises between atoms within three bonds are taken care by bonded potential terms. Non-
bonded potential can be sub-divided mainly into electrostatic and van der Waals potential. Electrostatic
potential is calculated between two atoms having integral charges (due to loss or gain of electron) and/or
partial charges (due to differences in their electronegativity) using Coulomb’s law while van der Waals
potential is calculated between two neutral atoms which interact with each other due to London dispersion
force. It arises between instantaneous dipoles which arise due to continuous movement of electrons in
their orbit. It is calculated by Lennard-Jones potential or Morse potential.
Enon bonded Eelectrosta tic Evander Waals
.……(iii)
qi qk Aik
Cik
Eelectrostatic
4 rik
r 12 r 6
nonbonded
pairs
nonbonded ik ik
pairs
Commonly used molecular mechanics force field is CHARMM, AMBER, OPLS and GROMOS.
There are different versions of each of them which are either optimized for proteins, DNA or small
molecules or simply updated version of previous one.
3 Setting up MD simulation
Prime objective of MD simulation is to study the dynamic behavior of a molecule in a particular
environment (vacuum, water, organic solvent or phospholipid bilayer). Therefore it is important to
perform MD in accurate environmental condition with accurate physicochemical parameters of the
molecules. That means accuracy of simulations depends on accuracy of the force field parameters for
the model of the molecule (protein, DNA, ligands) and environmental condition (solvent medium, pH,
tempearture).
3.1 Preparation of molecule for simulation
If we want to perform MD simulation of a cellular protein, model structure of the protein
(crystallography or computational prediction) should be correct in terms of steriochemical and
physicochemical parameters. That means protein molecule should be in lowest energy conformation
and having correct bond order, ionization state, accurate bonded and non-bonded parameters like bond
length, bond angle, torsion (phi, psi and chi values) and atomic charges.
3.2 Solvation of molecule
Simulation environment affects the behavior of molecules and hence it should be close to the real
condition with respect to pH, temperature, pressure and electrostatic properties. Dielectric effect of the
surrounding environment (like water solvents) influences the electrostatic interactions between atomic
charges on the molecules to be simulated and hence this impacts the thermodynamic behavior of the
molecule. Therefore it is essential to perform MD simulation in proper medium. Biological molecules
(soluble proteins, DNA, RNA, vitamins or minerals) are almost always surrounded by water molecules
(in cytosol or extracellular fluid). Transmembrane proteins and lipids require model of the non-polar
medium like lipid bilayer or non-polar solvents like octanol.
Solvent Models
Model for the water solvents are either implicit or explicit. Simplest model to mimic the surrounding
water medium is implicit solvent model where a dielectric constant is used to emulate the dielectric
effect of the solvents without the physical presence of solvent molecules. Implicit solvent models for
water include distance dependent dielectric; Generalized Born and Poisson-Boltzmann Surface Area
model. Generally value of dielectric constant used for interior of protein is 2-4 while it is 80 for bulk
water outside the protein. They require less computational cost as compared to explicit solvents since
they do not increase any atom in the simulation system. But major limitation of implicit model is the
uniform dielectric effect while actual dielectric effect varies inside the protein. Some parts of protein
are highly hydrated while some parts completely lack any water molecule.
Explicit solvation is always performed in a defined volume (box/cell) of different shape and size.
Commonly used box shapes are spherical, cubic, triclinic, orthorhombic, octahedron and truncated
octahedron. In explicit solvation, model of water molecules (SPC, TIP3P) are present along with
molecule to be simulated rather merely mimicking the dielectric effect of water in implicit model.
Hence it increases the computational cost by increasing the number of atoms in simulation system. But
it is much more accurate in comparison to implicit models. Like a simulation of protein molecule with
N number of atoms, need to calculate approximately N2 pairs of non-covalent interactions (LJ potential
and Coulombic interaction). Hence any increase in number of atoms in the simulation system increases
the computation cost exponentially. On an average, a simulation system contains about 10-fold water
atoms as compared to protein atoms and thus this can increases the simulation time by 100-fold.
The commonly used water models are SPC (Simple Point Charge) (Berendesen et al, 1987) and TIP3P
(Jorgensen et al, 1987). Both the models are 3-site water model where three interaction points
correspond to three atoms of the water (Figure 4). They have rigid geometry and each point carries
respective partial charges. SPC model has an ideal tetrahedral bond angle (109.5°) and bond length
(1.0Å) while TIP3P has experimentally observed bond angle (104.5°) and bond length (0.95Å) of
water in liquid state.
Figure 4: Water Models
Four site water models are also available like TIP4P (Jorgensen et al, 1987). It has similar geometry to
its TIP3P model but has fourth interaction site which is a dummy atom near oxygen which carries only
charge. Similarly five site water models like TIP5P (Mahoney & Jorgensen, 1987) has five interaction
site where three sites represent three atoms of water and two sites represent two lone pairs on oxygen.
This improves simulation quality due better distribution of electrostatics of water model but
computationally cost increases with interaction points.
3.2 Boundary condition
Boundary condition: Solvation box must have a boundary to check the flow of any atom (solute or
solvents) to keep the total number of atoms constant in the simulation box. This is the essential condition
of MD simulation otherwise simulation process will crash. Simplest to way to limit the flow is to restrain
the outer most layers of the solvents. One such example is spherical boundary with harmonic restraint.
However this method forcefully restricts the natural movement of molecules and accumulates errors.
Hence it is not a preferred choice.
Figure 5: PBC
Another commonly used boundary condition in MD simulation is Periodic Boundary Condition (PBC).
Here (primary) simulation box produces the exact image (replica) all around and they are related by
simple translation. It means that when any atom leaves the primary simulation box from one face during
simulation, then image (atom having same identity in terms of position vector and velocity) of that will
enter the primary simulation box from the opposite face of the replica and hence it keeps the total number
of atoms constant in the simulation box. PBC also allows the calculation of interaction forces on the
atoms in the primary box with the image box and hence it enables molecule to experience forces as they
are in bulk solution. This can be used by efficient method of electrostatic calculation using Ewald
summation. Therefore PBC is the preferred choice of boundary condition.
3.3 Run parameters of MD simulation
MD simulation is performed under feasibly accurate experimental conditions with correct run parameters
which include type of ensemble (NPT, NVT & NVE), desired temperature and proper thermostat to
regulate the temperature, pressure and appropriate barostat to regulate the pressure, duration of MD run,
time step/step size, dynamics integrator, interactions calculator and any required restraint or constraint.
MD Ensembles: MD simulation by default produces trajectories similar to NVE (micro-canonical)
ensemble where number of molecules, volume and energy is conserved. Only exchange of potential and
kinetic energy happen in a micro-canonical MD trajectory. However biological system is equivalent to
closed system where energy is exchanged with outside environment while temperature and volume are
almost constant. Hence its MD trajectory matches with NVT (canonical) ensemble. While in vitro
chemical reactions occur under atmospheric condition where pressure and temperature is almost constant
and hence its MD trajectory matches with NPT (isothermal-isobaric) ensembles. But most of the
biological processes occur at constant pressure like atmospheric pressure. Hence MD simulations of
biological systems are also equilibrated in NPT ensemble along with NVT ensemble before the final
production trajectory of NVT ensemble. Therefore MD simulations of biological process require a
thermostat and barostat.
Temperature & Thermostat: Temperature is statistical quantity and instantaneous temperature is
proportional to the kinetic energy of the atoms in the simulation system at particular time (kinetic
temperature theory). Hence continuous increase in kinetic energy due to forces will lead to continuous
rise in temperature of the simulation system. However a simulation is always performed at desired (target)
temperature as required by experiment condition. Therefore a thermo-regulator is required to sense the
temperature and maintain the desired temperature of simulation system. The simplest method to regulate
the temperature is velocity rescaling where velocity of the atoms in simulation system is rescaled at every
time step to modify the temperature to maintain the desired temperature. But this method causes
significant changes in the atomic trajectory and hence does not produce correct statistical ensemble (NVT
& NPT). Moreover this method is totally dependent on averages of velocity and ignores the distribution
while temperature is a function of velocity of particles as well as distribution of particles (Maxwell-
Boltzmann distribution). Andersen thermostat assigns velocity to each atom based on proper statistical
distribution in such a manner that new velocity corresponds to the desired temperature (Andersen, 1980).
Hence it can calculate the thermodynamics averages correctly but unable to produce correct trajectory due
to perturbation of the atomic velocity. In Berendsen thermostat, simulation system is coupled to an
external heat bath with fixed temperature and velocity is scaled at each step to match the target
temperature (Berendsen et al, 1984). This thermostat also suppresses the kinetic energy of the system
(velocity is multiplied by an attenuation factor after each time step to maintain the temperature) and hence
it is not suitable for producing correct trajectories for canonical ensemble. However it produces better
trajectory due to weak coupling with external bath and it is very efficient to achieve the desired
temperature even when they are far away from the equilibrium. Nose-Hoover thermostat produces the
most accurate canonical trajectory. Most of the biological processes occur at constant temperature of
310K and hence simulations of biological molecules should be performed at 310K.
Pressure & Barostat: Biological processes occur around atmospheric pressure of approximately 1 bar.
But continuous motion of particles will lead clash with box and it will lead to increase in pressure.
Therefore we need a pressure regulatory mechanism to keep the pressure at desired level. Berendsen
barostat regulates the pressure at desired value by changing the dimension of simulation box by a scaling
factor which is function of isothermal compressibility of the system and coupling time constant for
pressure scaling (Berendsen et al, 1984). Scaling is done for all the components of atom position and
simulation cell dimension. It helps to achieve the desired pressure very quickly and hence it is suitable for
systems which are far away from equilibrium. It is used in combination of Berendsen thermostat to obtain
realistic fluctuations in temperature and pressure. Andersen barostat uses a fictional piston with a fixed
mass to control the volume of the simulation box (Andersen, 1980). Parrinello-Rahman barostat
control the pressure by changing box size as well as box shape (Parrinello & Rahman, 1981). It adds an
extra degree of freedom similar to Nose-Hoover thermostat. Hence it produces more accurate trajectory.
Generally Parinello-Rahman barostat is used in combination with Nose-Hoover thermostat for realistic
fluctuations in temperature and pressure. If the simulation system is far away from equilibrium then it is
better to use Berendsen barostat to reach the desired pressure and then shift to Parrinello-Rahman barostat
for more accurate trajectories.
MD Duration: Duration of MD run depends on the objective for which simulation is performed (Figure
6). To study the complete folding of a protein, we may need to run MD simulation for millisecond to
many second but refinement of model structure of that protein require a simulation of just 100 picosecond
to few nanosecond only.
Figure 6: Time scale
Time step/step size: MD run is numerical product of time step and number of steps. Time step is the time
interval between successive evaluations of forces in MD simulation. It should be approximately 10-20
times smaller than the fastest motion in the simulation system to capture their effect. The fastest motion in
a molecule is the vibrational frequency of bond between hydrogen atoms and other atoms. This is about
1014 per second or simply 10 femtosecond. Shorter will be the time step, higher will be the accuracy of
MD trajectory and vice versa. But shorter will be the time step, higher will be the computational cost.
Therefore ideally time step should be 0.5 to 1 femtosecond. It is an important parameter that influences
the stability as well as quality of the MD trajectories. Time step can be increased to 2 picosecond by
constraining the hydrogen atoms without significant compromise on accuracy of simulation trajectory.
Dynamics Integrator: This integrates the equation of motion based on forces acting on each atom in the
simulation system after every time step and produces trajectory (conformation with information of
position, velocity and energy with respect to time). The commonly used dynamics integrators are Verlet
and Leapfrog. Verlet algorithm calculates the position at t + Δt and t – Δt using 3rd order Taylor
expansion. It calculates the position independent of velocity. It is simple and efficient and requires less
memory and CPU. It is stable even for large simulation system and also time reversible. Leapfrog
integrator is a variant of Verlet algorithm and it calculates the position at half of the time step (t + Δt/2).
It is dependent on velocity information at each time step. Trajectories are identical to Verlet trajectories.
However the potential energy and kinetic energy can differ as they are calculated at different time (Δt and
Δt/2).
Interactions/Force calculation: Forces are dependent on bonded (bond stretching, angle bending and
torsional potential) and non-bonded (electrostatic and van der Waals) interaction potential. Forces are
calculated at every time step during MD simulation. Calculations of forces due to non-bonded potentials
are the most time consuming part of the MD simulation. Simplest way to calculate these forces is
summation of all the interactions between every possible pair of atoms in the simulation system. However
this increases the number of interaction pairs proportional to the square of number of atoms (N2). Hence it
increases the huge computational cost especially for larger simulation system. One way to speed up the
process is to consider interaction pair up to a particular distance like 10 Å and ignore the interactions
beyond that. This is a good approximation as non-covalent interactions are too small beyond 10 Å.
However recent studies has indicated the significant role of long range interactions particularly where
metal ions play a role. Therefore long range interactions must also be calculated for more accurate results.
People have made alternative strategy of force calculation to reduce the computation cost. Particle Mesh
Ewald (PME) is one such strategy which is used most often for long range force calculation (Darden et al,
1993). PME cost is proportional to NlogN as compared to N2. Hence it speeds up the simulation process.
It is possible to calculate the long range interactions between immediate image of simulation system in
PBC due to reduced computational cost by PME. Since the force calculations are performed at every time
step, so pair-list/neighbor list should be updated after every step due to change in position of atom. This
pair list update also has computational cost and updating this list after every 10 steps rather than every
step will further reduce the computational cost. This strategy will not cause significant change in accuracy
as movement of atom is very small at femtosecond scale.
Restraint/Constraint: Sometimes part of the simulation system can be restarained/constrained without
significantly affecting the objective of MD simulation. It speeds up the simulation process because
bonded interactions do not vary for the constrained part.
4. Summary
MD simulation requires co-ordinates of molecules to be simulated in correct chemical
configuration and sterical conformation.
MD simulation is based on Newton’s second law of motion i.e rate of change of momentum is
proportional to force.
Force Field or interaction potential allows the simulation of molecules in virtual space i.e
computer.
Interaction potentials are bonded (Bond stretching, Bond angle bending and torsional potential)
and non-bonded (Electrostatic and van der Waals potential).
Forces on each particle of simulation system can be calculated and integrated to determine the
position of each atom of simulation system with respect to time.
Simulation is performed in proper environment (solvent, pH, ionic concentration, temperature and
pressure) with feasibly accurate run parameters (step size, integrator, thermostat and barostat).