Commun Nonlinear Sci Numer Simulat 16 (2011) 2999–3004
Short communication
Using information to generate derivative coordinates from noisy time series
B.P. Mann a,*, F.A. Khasawneh a, R. Fales b
a Department of Mechanical Engineering & Material Science, Duke University, Durham, NC 27708, USA
b Department of Mechanical & Aerospace Engineering, University of Missouri, Columbia, MO 65211, USA
Article info
Article history:
Received 25 August 2010
Received in revised form 16 November 2010
Accepted 21 November 2010
Available online 3 December 2010
Keywords:
Information theoretic
Signal derivatives
Derivative coordinates
Abstract
This paper describes an approach for recovering a signal, along with its derivatives, from a noisy time series. To mimic an experimental setting, noise was superimposed onto a deterministic time series. Data smoothing was then used to successfully recover the derivative coordinates; however, the appropriate level of data smoothing must be determined. To investigate the level of smoothing, an information theoretic measure is applied to show that a loss of information occurs for increased levels of noise; conversely, we show that data smoothing can recover information by removing noise. An approximate criterion is then developed to balance the notion of information recovery through data smoothing with the observation that nearly negligible information changes occur for a sufficiently smoothed time series.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
It is rarely practical to measure each state variable in an experimental setting. To circumvent this issue, methods exist for
reconstructing a pseudo-state space from a small number of measured observables [1–3]. A primary benefit of the reconstruction is that it produces a pseudo-state space with dynamics equivalent to those of the original state space. As a consequence of the equivalence, an attractor in the reconstructed state space has the same invariants, such as Lyapunov exponents
and dimension, as the original attractor [4].
While delayed embedding is the predominant choice for attractor reconstruction [3,5–8], the use of derivative coordinates is appealing, given their obvious physical meaning. For instance, many physical systems can be described by a set of equations with states that are related through their derivatives. While the numerical derivatives of a signal can be used for noise-free data, the presence of noise renders the signal derivatives poor approximations, owing to noise amplification in the differentiation process.
The inherent goal of signal analysis is to extract useful information about a system from the observed data. Consider a continuous and deterministic system that has produced the scalar time series q(t). To mimic realistic data from an experiment, noise is superimposed onto the noise-free data as follows: g(t) = q(t) + σa(t), where a(t) is a time series of normally distributed random noise with zero mean, and σ is the standard deviation of the superimposed noise. The underlying goal of this investigation is to extract q(t) and the derivatives of q(t) from the noisy time series g(t).
This work investigates smoothing g(t) to recover q(t) and its derivatives. While the process of data smoothing is well
known, the question of how much smoothing yields accurate derivative coordinates is unclear. To answer this question,
we explored the use of an information theoretic measure, known as the average mutual information, to develop a criterion for the appropriate amount of data smoothing.
* Corresponding author. Tel./fax: +1 919 660 5214.
E-mail address: brian.mann@duke.edu (B.P. Mann).
doi:10.1016/j.cnsns.2010.11.011
The work of this paper is organized as follows. The next section describes the data smoothing technique and average mutual information tools used in our analyses. These discussions are followed by a series of example results that apply the average mutual information to investigate the influence of noise and an approximate criterion for the level of data smoothing.
2. Example implementation
The investigations that follow use synthetic data generated from a Duffing oscillator,
q'' + μq' + ω²q + βq³ = Γ cos(Ωt),  (1)
where a prime denotes a derivative with respect to time. Numerical simulation was used to generate a chaotic time series for q(t), q'(t), and q''(t) while using the parameters μ = 0.2, ω = 1, β = 1, Γ = 27, and Ω = 1.33; however, in an effort to mimic the realistic challenges of an experiment, we have assumed only the noisy time series g(t) = q(t) + σa(t) is observable. The remainder of this section describes the application of data smoothing for noise removal and the use of the average mutual information to determine the appropriate smoothing level to recover the derivatives of q(t).
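As a concrete illustration of this setup, the sketch below integrates Eq. (1) numerically and superimposes Gaussian noise to form the observable g(t). The time grid, initial condition, and random seed are assumptions for illustration (the paper does not state them); SciPy is used for the integration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Duffing parameters from the paper
mu, omega, beta, Gamma, Omega = 0.2, 1.0, 1.0, 27.0, 1.33

def duffing(t, y):
    """State form of Eq. (1): y = [q, q']."""
    q, qdot = y
    return [qdot, -mu * qdot - omega**2 * q - beta * q**3 + Gamma * np.cos(Omega * t)]

# N = 18e3 samples, as used later in the paper; the time span is assumed
t = np.linspace(0.0, 100.0, 18_000)
sol = solve_ivp(duffing, (t[0], t[-1]), [1.0, 0.0], t_eval=t, rtol=1e-9, atol=1e-9)
q = sol.y[0]

# Superimpose noise: g(t) = q(t) + sigma * a(t), with a(t) unit-variance Gaussian
rng = np.random.default_rng(0)
sigma = 0.1                                   # noise level of Fig. 1b
g = q + sigma * rng.standard_normal(q.size)   # the only "observable" signal
```

Only g is treated as measured in what follows; q and its derivatives serve as the ground truth for assessing the recovery.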
2.1. Noise removal with smoothing
Cubic splines are often applied to empirical data to estimate interim points, i.e. data points that lie between two measurements. The basic idea is to fit the data with a piecewise polynomial,

s(t) = b_{i0} + b_{i1}(t − t_i) + b_{i2}(t − t_i)² + b_{i3}(t − t_i)³,  (2)

where the polynomial coefficients b_{i0}, b_{i1}, b_{i2}, and b_{i3} have subscripts that denote their validity between two neighboring data points, i.e. the time interval from t_i to t_{i+1}. Therefore, cubic splines provide a piecewise fit to the data while simultaneously giving a functional relationship for the derivatives of the data after differentiation of Eq. (2). Furthermore, cubic splines enforce continuity of the signal and its first two derivatives at the intersection of neighboring time steps [9].
Smoothing splines, which differ from the typical cubic spline fitting operation, provide a refinement to the idea of using a piecewise polynomial to fit empirical data. The basic difference lies in the introduction of a smoothing parameter, which reduces noise amplification in the signal derivatives by balancing the fit to the measured data g(t) against the smoothness of the second derivative [10]. In the results that follow, cubic smoothing splines have been implemented; this approach obtains the polynomial coefficients for Eq. (2) by minimizing
Fig. 1. Two-dimensional state space with graphs showing: (a) the actual q and q'; (b) the noisy time series g and the numerical derivative g' when σ = 0.1; and (c) the smoothed version s of the noisy time series and the derivative of the smoothed signal s' obtained when γ = 99.95 × 10⁻².
Fig. 2. Average mutual information for different levels of noise intensity. Graphs denote the mutual information between the observable signal g and the following signals: (a) q, (b) q', (c) q''.
(1 − γ) ∑_{i=1}^{N} |g_i − s_i|² + γ ∫_{t_1}^{t_N} |d²s(t)/dt²|² dt,  (3)
where the notation g_i = g(t_i) and s_i = s(t_i) has been applied over N data points. The smoothing parameter γ provides a weight to balance the smoothness of the second derivative against the fit to the measured data.
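A minimal sketch of this smoothing step can be written with SciPy's `make_smoothing_spline` (SciPy ≥ 1.10). Note that its functional is ∑|g_i − s_i|² + λ∫|s''|² dt, so its parameter λ plays the role of γ/(1 − γ) up to normalization rather than matching Eq. (3) exactly; the sinusoidal stand-in signal and the value of λ below are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Stand-in clean signal, its derivative, and a noisy observable
t = np.linspace(0.0, 10.0, 2000)
q = np.sin(1.33 * t)
qdot_true = 1.33 * np.cos(1.33 * t)
rng = np.random.default_rng(1)
g = q + 0.1 * rng.standard_normal(t.size)     # g(t) = q(t) + sigma*a(t)

# Cubic smoothing spline: larger lam -> smoother s(t)
spl = make_smoothing_spline(t, g, lam=1e-2)
s = spl(t)                                    # smoothed signal s(t)
s1 = spl.derivative()(t)                      # its analytic derivative s'(t)
```

In contrast to `np.gradient(g, t)`, which amplifies the noise, the spline derivative s'(t) stays close to the true q'(t) because the second-derivative penalty suppresses the high-frequency noise before differentiation.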
The smoothing process can be illustrated with the following example. Consider the superposition of the scalar time series q(t), generated from Eq. (1), with normally distributed random noise; this takes the form g(t) = q(t) + σa(t). In the absence of noise, accurate derivative coordinates may be obtained from the numerical derivatives of the observed signal since g(t) = q(t) when σ = 0 (see the attractor in Fig. 1a). If normally distributed random noise, with a standard deviation σ = 0.1, is added to the signal, Fig. 1b shows that the attractor generated by g(t) and g'(t) will significantly differ from the attractor given by the states q(t) and q'(t). While the numerical derivative, denoted by g'(t), contains an amplified amount of noise, the noise in the higher-order numerical derivatives often overtakes the deterministic component of the signal. Here we note that the numerical derivative and the derivative of a cubic spline fit without smoothing are identical, and both yield the time series shown in Fig. 1b. The attractor of Fig. 1c shows the potential of smoothing the data, since the time series for s(t) and s'(t) closely replicate the time series shown in Fig. 1a for q(t) and q'(t).
2.2. Average mutual information
The mutual information between the measurement x_i = x(t_i), drawn from a set of measurements x, and a measurement y_j = y(t_j), drawn from a set of measurements y, is the amount learned by the measurement of x_i about the measurement of y_j. The average amount of information learned by comparing the measurements of signal x to the measurements of the signal y is
I(x, y) = ∑_{x_i, y_j} p(x_i, y_j) log [ p(x_i, y_j) / (p(x_i) p(y_j)) ],  (4)
where log was taken to be the natural log and p(x_i, y_j) is the joint probability density that a measurement drawn from the signals x and y will result in the values x_i and y_j [3,11]. The individual probability densities for the measurements x_i in
Fig. 3. Average mutual information and error results for a range of smoothing values and a fixed noise level σ = 0.1. Graphs (a) and (b) are error plots for the smoothed velocity and acceleration, respectively. Graphs (c)–(f) show average mutual information plots for the corresponding variables. The smoothing level that gives the smallest error is marked on the plots for s' and s''.
x and y_j in y are given by p(x_i) and p(y_j), respectively. If the two signals are completely independent of each other, which indicates no information transfer, the joint probability factorizes to p(x_i, y_j) = p(x_i)p(y_j) [12]. For computational purposes, it is convenient to rewrite Eq. (4) as
I(x, y) = ∑_{x_i, y_j} p(x_i, y_j) log p(x_i, y_j) − ∑_{x_i} p(x_i) log p(x_i) − ∑_{y_j} p(y_j) log p(y_j).  (5)
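Eq. (5) can be estimated numerically by binning both signals and forming a joint histogram; the sketch below is one such estimator. The bin count is an assumption, since the paper does not state its binning choice.

```python
import numpy as np

def average_mutual_information(x, y, bins=32):
    """Histogram estimate of Eq. (5): I(x, y) = H(x) + H(y) - H(x, y), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                      # joint probabilities p(x_i, y_j)
    px = pxy.sum(axis=1)                       # marginal p(x_i)
    py = pxy.sum(axis=0)                       # marginal p(y_j)
    # Sum only over nonzero cells to avoid log(0)
    h_xy = -np.sum(pxy[pxy > 0] * np.log(pxy[pxy > 0]))
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))
    h_y = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return h_x + h_y - h_xy
```

For two independent signals the estimate is close to zero (up to finite-sample bias), while for identical signals it equals the entropy of the binned signal, consistent with the factorization p(x_i, y_j) = p(x_i)p(y_j) noted above.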
Fig. 2 shows a series of plots comparing the mutual information between the observable signal g and the signals q(t), q'(t), and q''(t). For each case, the shared information between the signals rapidly decreases for an increase in the noise intensity; conversely, these graphs also show that a reduction in noise, either from smoothing the data or through an alternative approach, increases the shared information between the signals.
Before investigating the usefulness of the shared information and noise level trends, the following error equation is introduced to ascertain the goodness of fit between any two signals
E(x, y) = (1/N) ∑_{i=1}^{N} |x_i − y_i|,  (6)
where E(x, y) is the average error obtained when the time series x and y are compared at each t_i with i ∈ [1, N]. For the investigations that follow, N = 18 × 10³ data points were used.
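Eq. (6) translates directly into code; a minimal transcription:

```python
import numpy as np

def fit_error(x, y):
    """Eq. (6): mean absolute difference between two equally sampled signals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.abs(x - y).mean())
```

This is the metric used below to score how well the smoothed estimates s'(t) and s''(t) match the true q'(t) and q''(t).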
The effect of different levels of data smoothing is illustrated through the graphs of Figs. 3 and 4. While the top graphs show the error obtained when estimating q'(t) and q''(t) by smoothing the noisy observable g(t), the lower graphs show the mutual information between various signals. Focusing on the results of Fig. 3, it is evident that smoothing initially causes a drastic increase in the mutual information between g(t) (or the smoothed version s(t)) and the smoothed estimates of q'(t) and q''(t), which have been labeled s'(t) and s''(t). However, once a sufficient amount of noise has been removed, the mutual information curve essentially plateaus, indicating that over-smoothing the data will not significantly alter the shared information; in addition, the error graphs show that over-smoothing will increase the errors of estimating q'(t) and q''(t) with s'(t) and s''(t).
While the results of Fig. 3 used σ = 0.1, an increased noise level of σ = 0.5 was used for the analyses shown in Fig. 4. Together with the multiple other cases investigated, these two figures demonstrate nearly identical trends. For instance, both figures show that different levels of smoothing are required to produce the most accurate derivative coordinates. The mutual information graphs also indicate that the best level of smoothing, the value that produces the most accurate derivative coordinates, occurs just after the bend in the mutual information curves.
Fig. 4. Average mutual information and error results for a range of smoothing values and a fixed noise level σ = 0.5. Graphs (a) and (b) are error plots for the smoothed velocity and acceleration, respectively. Graphs (c)–(f) show average mutual information plots for the corresponding variables. The smoothing level that gives the smallest error is marked on the plots for s' and s''.
2.3. Approximate smoothing criterion
While it was initially tempting to search for an exact smoothing level that yields the absolute minimum error, we found this approach impractical, given that the noisy time series is the only observable. Instead, we have focused on an approximate criterion that leverages the trends in the mutual information curves. For instance, Fig. 5 shows unit-normalized curvature and slope plots of the mutual information curves of Figs. 3 and 4; these were obtained by fitting the function f(γ) = (c₀ + c₁γ)(1 + e^{−c₂γ}) to the I(x, y) curves, where c₀–c₂ are fitted constants, and taking the derivatives with respect to γ. Here, Ĩ_{x,y} represents the normalized slope of I(x, y) and κ(I_{x,y}) denotes the normalized curvature of I(x, y). The correlation coefficients obtained when fitting f(γ) to the mutual information curves of Figs. 3 and 4 are (a) 0.96, (b) 0.99, (c) 0.97, and (d) 0.96, where the letters (a)–(d) correspond to the letters assigned to the graphs appearing in Fig. 5.
The trends shown in Fig. 5 indicate that the slope is initially large, but it rapidly decays to a constant value, indicating a leveling off in the shared information between the smoothed signals for further levels of smoothing. Thus one idea would be to consider a small value ε and to choose the smoothing level after the normalized slope has dropped below ε. The problem with this approach is that the choice of ε is arbitrary. As an alternative, a threshold value could be adopted using the normalized curvature. For instance, a range of accurate smoothing levels could be identified when the curvature exceeds a threshold ε. However, once again, the choice of ε is rather arbitrary, unless the two ideas are combined. Specifically, we contend that the first intersection of the normalized curvature and slope, where Ĩ_{x,y} = κ(I_{x,y}), should be used as the threshold ε; the acceptable range for γ is then defined by the normalized curvature that exceeds this threshold, κ(I_{x,y}) ≥ ε. This was the approach taken to successfully smooth the data shown in Fig. 1.
To summarize, the proposed criterion considers a range of acceptable smoothing levels as opposed to a single value. The
idea stems from the notion that data smoothing can remove noise and therefore recover information; this is confirmed by
the shape of the I(x, y) curves and quantified through the normalized slope and curvature.
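The fitting step of the criterion can be sketched as follows. The stand-in I(γ) samples are generated from f itself with assumed constants (an empirical mutual information curve would be used in practice), and the fitted curve then yields the unit-normalized slope and curvature whose first intersection sets the threshold ε.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(c, c0, c1, c2):
    """The paper's fitting function f(gamma) = (c0 + c1*gamma)(1 + exp(-c2*gamma))."""
    return (c0 + c1 * c) * (1.0 + np.exp(-c2 * c))

# Stand-in mutual-information samples over the smoothing range (assumed constants)
gamma = np.linspace(0.01, 1.0, 100)
rng = np.random.default_rng(4)
I_curve = f(gamma, 0.1, 1.5, 4.0) + 0.002 * rng.standard_normal(gamma.size)

# Fit f to the sampled curve, then differentiate the smooth fit numerically
popt, _ = curve_fit(f, gamma, I_curve, p0=[0.2, 1.0, 2.0], maxfev=20000)
fit = f(gamma, *popt)

slope = np.gradient(fit, gamma)
curv = np.abs(np.gradient(slope, gamma))
slope_n = slope / np.max(np.abs(slope))   # unit-normalized slope
curv_n = curv / np.max(curv)              # unit-normalized curvature
# The threshold eps is read off where slope_n and curv_n first intersect;
# gamma values with curv_n >= eps then form the acceptable smoothing range.
```

Differentiating the fitted f(γ) rather than the raw I(γ) samples is what makes the slope and curvature usable: numerical derivatives of the noisy curve itself would suffer exactly the amplification problem the paper describes.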
3. Conclusions
This paper described an approach to recover the derivative coordinates from a noisy time series. We have assumed the
underlying system was deterministic and continuous. Although a data smoothing technique was shown to successfully recover the derivative coordinates, a criterion was needed to ascertain the appropriate level of data smoothing. To investigate
Fig. 5. Unit-normalized slope (solid line) and curvature (dotted line) plots of the mutual information curves of Figs. 3 and 4. Graphs give the slope and curvature for: (a) I(s, s') and σ = 0.1, (b) I(s, s'') and σ = 0.1, (c) I(s, s') and σ = 0.5, and (d) I(s, s'') and σ = 0.5. The smoothing level that gives the smallest error is marked on the plots for s' and s''.
the best level of smoothing, we applied an information theoretic measure to show that information loss occurs for increased levels of noise intensity. Conversely, we also showed that data smoothing can recover information.
An approximate criterion was then developed that balanced the notion of recovering information from data smoothing with the observation that nearly negligible information changes occur once the time series has been sufficiently smoothed. In particular, the slope and curvature of the mutual information curves were used to describe an approximate criterion for an appropriate range of data smoothing, i.e. the range that provides the most accurate recovery of the derivative coordinates.
Acknowledgement
Support from the National Science Foundation (CMMI-0900266) is gratefully acknowledged.
References
[1] Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from time series. Phys Rev Lett 1980;45:712–6.
[2] Takens F. Detecting strange attractors in turbulence. In: Rand D, Young LS, editors. Dynamical systems and turbulence. Lecture notes in mathematics, vol. 898. New York: Springer-Verlag; 1981. p. 366–81.
[3] Abarbanel HD. Analysis of observed chaotic data. 2nd ed. New York, NY: Springer; 1996.
[4] Nayfeh AH, Balachandran B. Applied Nonlinear Dynamics. 1st ed. New York, NY: John Wiley & Sons, Inc.; 1995.
[5] Kennel MB, Abarbanel HD. False neighbors and false strands: A reliable minimum embedding dimension algorithm. Phys Rev E 2002;66:026209.
[6] Fraser AM. Reconstructing attractors from scalar time series: A comparison of singular and redundancy criteria. Physica D 1989;34:391–404.
[7] Fraser AM. Independent coordinates for strange attractors from mutual information. Phys Rev A 1986;33:1134–40.
[8] Garcia SP, Almeida JS. Nearest neighbor embedding with different time delays. Phys Rev E 2005;71:037204.
[9] Kreyszig E. Advanced Engineering Mathematics. 8th ed. New York, NY: John Wiley and Sons, Inc.; 1999.
[10] Boor CD. A Practical Guide to Splines. New York, NY: Springer-Verlag; 1978.
[11] Nichols JM. Inferences about information flow and dispersal for spatially extended population systems using time-series data. Proc R Soc London B
2005;272:871–6.
[12] Nichols JM, Seaver M, Trickey ST, Salvino LW, Pecora DL. Detecting impact damage in experimental composite structures: an information-theoretic approach. Smart Mater Struct 2006;15:424–34.