Mixture Models of Endhost Network Traffic

John Mark Agosta (Toyota ITC), Jaideep Chandrashekar (Technicolor Research), Mark Crovella (Boston University), Nina Taft (Technicolor Labs), Daniel Ting (Facebook)
Abstract—We model a little-studied type of traffic, namely the network traffic generated from endhosts. We introduce a parsimonious model of the marginal distribution for connection arrivals consisting of mixture models with both heavy- and light-tailed component distributions. Our methodology assumes that the underlying user data can be fitted to one of several models, and we apply a Bayesian model selection criterion to choose the preferred combination of components. Our experiments show that a simple Pareto–exponential mixture model is preferred over more complex alternatives, for a wide range of users. This model has the desirable property of modeling the entire distribution, effectively clustering the traffic into heavy-tailed and non-heavy-tailed components. The method also quantifies the wide diversity in the observed endhost traffic.
I. INTRODUCTION
In the last decade or so there has been a tremendous amount of research done in the area of Internet traffic modeling—[4], [14], [19], to name just a few. This research has predominantly focused on traffic from inside a network. The reason endhost traffic models are so scarce (with some exceptions [10]) is that it is difficult to obtain the raw measurements needed, which requires installing a collection tool directly on each user's machine and getting the express consent of users.
Nevertheless it is important to obtain a deeper understanding of end-user traffic because IT management is driving computing towards self-diagnosis for troubleshooting and user-controlled performance tuning.
We obtained end user data via a measurement tool that
resides on laptops and thus moves with users and continues to
observe network traffic as the user switches between different
networks and different environments (e.g., work and home),
for a population of 270 enterprise users over five weeks (§III).
Starting with this rich dataset, we focus on modeling end
host activity, in particular the rate of flow initiations. A
necessary first task is to estimate reliably the probability
distribution of flow rates. Modeling heavy-tailed data is a
notoriously fraught problem, often approached by just estimating the exponent, known as the scaling parameter of
the distribution tail. Commonly used methods for estimating
the scaling parameter include the Hill estimator, which is
tricky since it relies on estimating a cut-off below which the
central part of the distribution is disregarded [15]. In [3], the
authors highlight the lack of care pervasive in the literature on
estimating power laws. In light of this concern, we demonstrate
an efficient estimator that uses the entire data set by means of
mixture models. Hence, our first contribution is a new method
(§IV-A) to estimate heavy-tailed scaling parameters.
Since we fit the entire sample, we need a method to choose
models, the commonly applied one being goodness-of-fit testing. The limitation of this approach and its associated P-values is that they are meant to rule out hypotheses. This is certainly useful for steering data collection, but it does not provide
an acceptance criterion. In our situation, with effectively an
endless stream of data as a source, any reasonable model
will eventually be rejected. Model selection methods give us
a quantitative criterion that lets us explore a wider class of
models than has hitherto been considered. Thus we do not
presuppose a single parametric distribution model; instead we
start with a class of nested mixture models (i.e. a family of
models where one is a subset of another) and use Bayes Factors, approximated in large samples by the Bayesian Information Criterion (BIC), to select the best model for a user's data. Since
it is a requirement for Bayes Factors comparisons to compare
both models on the full sample data, as a side benefit we
produce models that comprise the complete distribution.
Thus our second contribution is a richer, non-parametric
class of models for traffic modeling (§V). The strength of this
approach is that, over a population of users, the choice of
which model is best can be explained statistically.
Our third contribution lies in applying this tool to an
extensive, diverse set of endhosts’ traffic data (§VI). We began
by observing that distributions of users’ flow arrival counts
are monotonically declining from a mode at zero. Preliminary analysis eliminated conventional component distributions (e.g., Gaussian or Poisson) in favor of mixtures of heavier-tailed exponential and Pareto distributions. Since mixtures of exponentials constitute a very flexible framework, restricting to these two distribution classes is a good approximation to the properties of the samples. We find that the
majority of the endhost population can be described by a two
component model whose diversity is expressed by the revealed
distribution over users of estimated parameters.
II. RELATED WORK
Heavy-tailed statistics have been documented in numerous network traffic phenomena: in the popularity of web pages [2], in traffic demands [8], in network topology [15], in TCP inter-arrival times [7], in wireless LAN traffic [16], and many others. The seminal work by Leland et al. [14] studied
LAN traffic and convincingly demonstrated that actual network
traffic is self-similar or long-range dependent in nature (i.e.,
bursty over a wide range of time scales). Our work differs
by revealing the distribution of models over users rather than
aggregating all users. Secondly, we observe the power law
nature of traffic in the first-order statistics of traffic rates, rather
than in the second-order autocorrelation properties. For a more
detailed comparison to prior art, we refer the reader to a longer
version of the paper at [1].
The idea of using mixture models for Internet traffic has
been proposed in other contexts before [9]. That work proposes
using hyperexponential models as a tractable way to approximate a heavy-tailed distribution. Our work, in contrast, does not assume the presence of a heavy tail, and instead uses mixture models to extend the range of models considered.
III. DATASET DESCRIPTION
The dataset consists of traces collected at 270 enterprise
end-hosts (90% laptops) over a period of approximately 5
weeks. Each host was associated with a unique user for the
entire trace collection period, and ran a corporate standard
build of Windows XP that included a number of enterprise IT
applications.
Packet-level traces were collected on the end-hosts, providing a longitudinal view of the traffic even as they moved in
and out of the network and across interfaces (wired and wireless). The trace logging software included a wrapper around
WinDump to log only packet headers. It also tracked changes
in IP address or interface, restarting the trace collection as
required. The logged data was uploaded opportunistically a
few times a day to a central server (the logging was paused
during the upload). Overall, we obtained over 400 Gb of packet
traces, which were then converted into flows using Bro [18].
The starting time of each flow generates a point process
in continuous time that we bin over non-overlapping, fixed
size time-windows to create a time series for each user. Each
user trace was binned for 8 different window sizes, starting
at 4 seconds, and increasing in multiples of 2, up to 512
seconds. Each bin contains a count of the new flow arrivals.
The flow count events within each time-window or bin are
the random variables modeled in this work. In our datasets
the median sample size was 9771 intervals, and the maximum
was 264,000. Zeros could occur in bins because the host was
turned off (or asleep), or else the host was disconnected from
the network during that bin. We filter out all such bins and in
the resulting data, we see zeros only because there were no
flows originated in that bin (and the machine was turned on).
That being said, we model the flow events when the counts
are nonzero since our goal is to characterize the distribution
of active traffic.
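The binning procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and interface are our own.

```python
from collections import Counter

def bin_flow_starts(start_times, window_sec):
    """Bin a point process of flow start times (in seconds) into counts over
    non-overlapping windows of `window_sec` seconds, dropping empty bins
    (which the text attributes to the host being off or disconnected)."""
    counts = Counter(int(t // window_sec) for t in start_times)
    # keep only the nonzero-count bins, in time order
    return [counts[b] for b in sorted(counts)]

# e.g. four flows binned at the smallest window size used in the paper (4 s)
print(bin_flow_starts([0.5, 1.0, 3.9, 8.2], 4))  # → [3, 1]
```

Repeating this for window sizes 4, 8, ..., 512 seconds yields the eight time series per user analyzed in the paper.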
IV. METHODOLOGY
A. Mixture Models with Heavy Tails
A probability mixture model is a convex combination of
probability densities. A mixture model can be thought of as
a hierarchical model where the mixing weights determine
the probability of each of the component models, which in
turn generate the sample. Since all components share the
same support, any sample point could in principle have been
generated by any component, but with possibly vanishingly
small probability. Such models are familiar in the statistics literature [6], [11] and have become a mainstay in the machine learning community [12].
A finite mixture model’s probability density is defined by
k component densities, fi (x), and mixture fractions mi , with
parameters m, θ as given by:
f(x | m, θ) = Σ_{i=1}^{k} m_i f_i(x | θ_i),   (1)

subject to Σ_{i=1}^{k} m_i = 1, m_i > 0,

where the θ_i are the component parameters, and m = (m_1, ..., m_k).
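The hierarchical view of Eq. (1) suggests a direct sampling procedure: first choose a component i with probability m_i, then draw from that component. A minimal sketch, with an illustrative two-exponential mixture (the weights and rates are arbitrary, not fitted values from the paper):

```python
import random

def sample_mixture(weights, samplers, n, seed=0):
    """Hierarchical sampling from a finite mixture (Eq. 1): pick a
    component i with probability m_i, then draw from that component."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for m_i, draw in zip(weights, samplers):
            acc += m_i
            if u < acc:
                draws.append(draw(rng))
                break
    return draws

# illustrative two-component mixture: 30% fast exponential, 70% slow
xs = sample_mixture([0.3, 0.7],
                    [lambda r: r.expovariate(2.0),
                     lambda r: r.expovariate(0.1)],
                    n=10_000)
```

Since all components share the same support, a given draw cannot be attributed with certainty to one component, which is exactly the situation the estimation procedure below must handle.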
We consider the following nested family of models: a Pareto
only model labeled (P), a mixture of one exponential and one
Pareto (EP), and a mixture of two exponentials and one Pareto
(EEP). The “pure power-law” model we fit is
f(x | α) = C x^{−α} = (1 / ζ(α, x_min)) x^{−α},  x ∈ N,   (P) (2)

ζ(α, x_min) = Σ_{n=0}^{∞} (n + x_min)^{−α},
where x takes on positive integer values, for which we use the
discrete version of the Pareto density (referred to also as the
Zeta distribution).
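The normalizing constant ζ(α, x_min) can be computed by truncating the sum and approximating the remaining tail by its integral. This is a sketch under our own choice of truncation point; the paper does not specify its numerical procedure.

```python
import math

def hurwitz_zeta(alpha, x_min, terms=200_000):
    """zeta(alpha, x_min) = sum_{n>=0} (n + x_min)^-alpha, via a truncated
    sum plus an integral approximation of the remaining tail (valid for
    alpha > 1; truncation error is O((terms + x_min)^-alpha))."""
    head = sum((n + x_min) ** -alpha for n in range(terms))
    tail = (terms + x_min) ** (1.0 - alpha) / (alpha - 1.0)
    return head + tail

def zeta_pmf(x, alpha, x_min=1):
    """Discrete-Pareto (Zeta) probability of integer x >= x_min, Eq. (2)."""
    return x ** -alpha / hurwitz_zeta(alpha, x_min)
```

As a sanity check, α = 2 with x_min = 1 recovers ζ(2) = π²/6.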
The exponential–Pareto model is defined as
f(x | m, λ_1, α) = m_1 λ_1 e^{−λ_1 x} + (1 − m_1) C x^{−α}.   (EP)
(EP)
The mixture variable adds another degree of freedom, revealing the relative contributions of the components.
The two exponential–Pareto mixture density model is:
f(x | m, λ_1, λ_2, α) = m_1 λ_1 e^{−λ_1 x} + m_2 λ_2 e^{−λ_2 x} + (1 − m_1 − m_2) C x^{−α}.   (EEP)
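The three nested densities can be evaluated with one function, since P and EP are special cases of EEP. A minimal sketch; the normalizer C = 1/ζ(α, x_min) is assumed precomputed, and the function name is our own.

```python
import math

def eep_density(x, m1, m2, lam1, lam2, alpha, C):
    """EEP mixture density: two exponential components plus a discrete
    Pareto whose normalizer C = 1/zeta(alpha, x_min) is precomputed.
    Setting m2 = 0 recovers the EP model; m1 = m2 = 0 recovers P."""
    return (m1 * lam1 * math.exp(-lam1 * x)
            + m2 * lam2 * math.exp(-lam2 * x)
            + (1.0 - m1 - m2) * C * x ** -alpha)
```

This nesting is what makes the family amenable to the pairwise model comparisons of §IV-C.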
We were originally motivated to consider this set of models because visual exploration of the data showed traffic flow distributions with a left-most mode, then a monotone decrease with a linear segment on a semi-log plot in the dense part of the distribution, followed by a long, heavy tail.
The intent behind using a family of models is to capture
the diversity across users. The EEP model is capable of fitting
any combination of the 3 component distributions, although
in practice we almost always see a heavy-tailed component.
In terms of degrees of freedom, these are very parsimonious
models; the EP model has 3 parameters, and the EEP has only
5.
B. Estimating Model Parameters
Desirable properties of maximum likelihood estimation
(MLE) recommend its use to estimate model parameters. Besides being asymptotically efficient, if the model does contain the true data-generating distribution and is differentiable in quadratic mean (DQM) [20], the MLE converges to the true parameters at a rate O(1/√n). If not, the MLE still converges to the best approximation to the true distribution within the model's constraints at the same rate.
Instead of conventional Expectation-Maximization (EM) methods, we solved for the MLE as a constrained optimization problem, using an interior point method to enforce the constraints on the model parameters. We found EM converged slowly, probably due to the common mode of the components. The method uses a concave barrier function that decreases steeply to −∞ at the boundary of the constraint set, preventing estimates from violating constraints and making the problem amenable to unconstrained solution methods. The barrier's weight is reduced on each iteration until it becomes negligible. These unconstrained problems are solved using the optim() function in the statistical package R, which implements a quasi-Newton optimization method. To exclude bad solutions, we also added constraints α < 4 and λ < 3.5 so that the component parameters do not grow too large. Since the mixture model typically contains local optima, we performed the optimization multiple times with random initializations to find the global maximum.
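The log-barrier idea can be illustrated on a toy one-dimensional problem. This is not the authors' R code: the objective is a stand-in with a known constrained minimum, and plain gradient descent stands in for the quasi-Newton step that the paper delegates to R's optim().

```python
def barrier_minimize(g, dg, x0, mu=1.0, shrink=0.2, outer=8, inner=300, lr=0.05):
    """Interior-point sketch: minimize g(x) subject to x > 0 by adding a
    log barrier -mu*log(x) that blows up at the constraint boundary, then
    shrinking mu each outer iteration until the barrier is negligible."""
    x = x0
    for _ in range(outer):
        for _ in range(inner):
            grad = dg(x) - mu / x        # d/dx [g(x) - mu*log(x)]
            x = max(x - lr * grad, 1e-12)
        mu *= shrink
    return x

# toy objective with known constrained minimum at x = 2
x_hat = barrier_minimize(lambda x: (x - 2.0) ** 2,
                         lambda x: 2.0 * (x - 2.0),
                         x0=0.5)
```

The barrier gradient −μ/x repels the iterate from the boundary x = 0, so no step ever leaves the feasible set; as μ shrinks, the solution of the penalized problem approaches the constrained optimum.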
C. Model Selection
Given multiple probability models for the same sample,
model selection uses a comparative metric called a Bayes
Factor (BF) to judge which model is more probable. In practice the estimated mixing weights will find the correct model: components with insubstantial weights can be ignored, leaving only the desired components. Model selection, in addition, reveals the strength of the comparison, and can be applied generally, not only to mixture models. Our explanation of model selection borrows extensively from Kass and Raftery [13]. It can be understood using the odds-ratio form of Bayes rule, where the posterior odds—the ratio of posteriors—between two models is expressed as the product of the BF and the prior odds. So, for example, to compare the model M_P to the proposed model M_EP, the posterior odds will be
P(M_EP | D) / P(M_P | D) = [P(D | M_EP) / P(D | M_P)] · [P(M_EP) / P(M_P)]   (3)

where the middle term in this equation, the Bayes factor BF, is defined as the ratio of marginal likelihoods:

BF_EP,P = P(D | M_EP) / P(D | M_P).   (4)
The larger BF , the greater the weight of evidence for the EP
model. As for the prior term, an unprejudiced rule implies
equal model priors, in which case the Bayes Factor and
the posterior odds-ratio are equal. This criterion is similar
to the maximum likelihood ratio, but rather than taking the
probability at the maximum, one integrates over the range
of model parameters θ, resulting in a correction for the
degrees of freedom of the models. Adding more parameters
to a model and thus increasing its degrees of freedom can
only increase the likelihood at the maximum but does not necessarily improve the marginal likelihood. This criterion trades off simplicity with accuracy—a built-in "Occam's Razor."

TABLE I: Interpretation of Bayes Factor strengths

Odds     log10(BF)   log(BF)   Strength of comparison
20:1     1.3         3         "substantial"
100:1    2           4.5       "strong"
1000:1   3           7         "decisive"
D. Interpreting The Weight of Evidence
Interpreting the magnitude of a BF is commonly done by
considering the ratio as an odds ratio, e.g., odds of 20 to
1 in favor of the model in the numerator corresponds to a
BF = 20, or, using natural logs, to log BF ≃ 3. Table I
shows a standard convention [13] that we adopt for interpreting
the strength of Bayes Factors, with their suggested labels. For
comparison between the P and EP models, we give precedence to the conventional model, and hence require a log odds-ratio significantly greater than zero—we use 10—which is well into the "decisive" range, corresponding to an odds ratio of greater than 20,000. If the EP model is selected, then we compute
log BFEEP,EP . Again, if this factor is above 10, then EEP
is selected, otherwise the final choice is EP. Of course, the
test is symmetric and the ratio may be expressed either way.
A negative log BFEP,P would be evidence against the EP
model, in favor of P.
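The thresholds of Table I reduce to a small lookup. A sketch; the sub-threshold label "inconclusive" is our own placeholder, not part of the table.

```python
import math

def bf_strength(log10_bf):
    """Map log10(BF) to the qualitative labels of Table I."""
    if log10_bf >= 3.0:
        return "decisive"      # odds beyond 1000:1
    if log10_bf >= 2.0:
        return "strong"        # odds beyond 100:1
    if log10_bf >= 1.3:
        return "substantial"   # odds of roughly 20:1 and up
    return "inconclusive"      # our label for anything weaker
```

The selection rule in the text requires a natural-log BF above 10, i.e. log10 BF above 10/ln 10 ≈ 4.3, well inside the "decisive" band.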
E. Approximation by BIC
In practice the marginal likelihood P(D | M) = ∫ P(D | M, θ) π(θ) dθ requires a prior over the θ. Recall these parameters are bounded, so the integral is equivalent to setting a proper, uniform prior over their range. Since with such large samples likelihood values are strongly peaked around their maximum and numerical integration works poorly, we use a common approximation to BF.
With large samples, BF is approximated by the Bayesian Information Criterion (BIC). BIC is often presented as a
correction to maximum log likelihood to account for the
degrees of freedom of a model. BIC is defined as
BIC = log P(D | M, θ̂) − (d/2) log N   (5)

where N is the sample size and d is the number of parameters
in the model. In our experimental work we computed both
Laplace approximations and BIC corrections and found to
our satisfaction that they agreed with each other to within
a fraction of a percent on the dataset.
With the BIC approximation, the log Bayes Factor becomes

log BF_EP,P = BIC_EP − BIC_P.
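Equation (5) and the BIC-difference log Bayes factor can be exercised on synthetic data. This sketch uses simple stand-in models with closed-form MLEs (exponential vs. uniform), not the paper's P/EP/EEP fits, so only the comparison machinery is illustrated.

```python
import math
import random

rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(2000)]
n = len(data)

def bic(log_lik, d):
    """Eq. (5): BIC = log P(D | M, theta_hat) - (d/2) log N."""
    return log_lik - 0.5 * d * math.log(n)

# Stand-in model 1: exponential; closed-form MLE lambda = 1/mean (d = 1)
lam = n / sum(data)
ll_exp = sum(math.log(lam) - lam * x for x in data)

# Stand-in model 2: uniform on [0, max(data)]; MLE is the sample max (d = 1)
ll_unif = -n * math.log(max(data))

# log Bayes factor of the exponential over the uniform, as a BIC difference
log_bf = bic(ll_exp, 1) - bic(ll_unif, 1)
```

On exponential data, log_bf comes out large and positive, decisively favoring the exponential model; note also how the (d/2) log N penalty strictly lowers the score of any added parameter that does not improve the fit.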
V. VALIDATION
We validated our model-fitting and selection method by
first showing that the estimates produced are accurate, in
comparison to a widely used power-law tail fitting procedure.
We used synthetic data from an EP mixture that is of a
TABLE II: Sample sizes and the strength of comparison they achieve with simulated data, for different model comparisons.

Model choice   Comparison    Min number of samples   log10 BF strength
EP             EP vs. P      1000                    substantial
EP             EP vs. P      5000                    decisive
EP             EEP vs. EP    1000                    substantial
EP             EEP vs. EP    10,000                  strong
EEP            EEP vs. EP    9000                    substantial
type covered by both procedures, where the true value of the
parameters of the generating data is known.
The tail-fitting method used for comparison is a widely used
tool for estimating the α parameter of α-stable distributions,
based on a scaling property of sums of heavy-tailed random
variables [17]. We used a publicly available implementation,
called aest [5]. We see in Fig. 1 that the range of α̂’s in the
columns subtitled “ML” for the mixture model estimates, is
within a few percent of the true value, unlike the aest α̂’s that
have high variance and bias. See [1] for details.
Next we validate that model selection by pair-wise comparison of BIC scores does indeed select the right model. Since the EEP model subsumes the other two, the model with more parameters will always fit better, so the model choice is driven by the BIC penalty term.
The test data consisted of pseudo-random samples with
known parameters α̂, m̂{1,2} , λ̂{1,2} , generated from each of
the three models, P, EP and EEP. We ran 100 test cases over a
range of sample sizes from 500 to 20,000 points, in the style
of an empirical “design of experiments” to find what sample
sizes were necessary to show adequate model selection results.
We ran 3 pair-wise comparisons: EP vs. P on EP data, EEP
vs EP on EP data and EEP vs EP on EEP data.
In Table II, we summarize the ability of our model selection
method to distinguish the 3 hypotheses. For each test, we state
the number of samples and the Bayes Factor level so achieved,
using the conventions substantial, strong, or decisive in Table
I. For the first two tests we list sample sizes for two levels.
The more complicated the model comparison, the larger the
sample required for the same strength of differentiation. In
short, the EP model can be selected “substantially” with traces
of no less than about 1000 samples. The EEP model requires
about 10 times the sample to be selected at the same level.
This is reason to believe that requiring samples on the order of
a few thousand (or at most 10,000) is a fairly light requirement
compared to the typical size of our sample traces.
For the sake of comparison, the results (§VI) reveal that the actual Bayes Factors computed on the data have values ranging in the hundreds, with sample sizes in the thousands and tens of thousands—clearly at the "decisive" level, and orders of magnitude larger than seen in these validation tests!
Fig. 1: With synthetic Pareto-tailed data over 1 < α ≤ 2, an EP mixture model estimator performs accurately, and with less variance, than the aest method. (Plot: estimated alpha vs. true alpha, AEST and ML estimates side by side.)

Fig. 2: Boxplot of log10 Bayes Factor (BIC difference) for P vs. EP models, by bin size (64–512 seconds).

Fig. 3: Boxplot of log10 Bayes Factor (BIC difference) comparison for EP vs. EEP models, by bin size (64–512 seconds).
VI. RESULTS
Choice of Models: We use our methodology to select the best
model for each of our 270 users. In Fig. 2 we show a box-plot
of the log of the Bayes Factor (or difference in BICs) of the P
and EP models against bin-size on the x-axis. We can see that
for nearly all users we can select the two-component EP model as 'decisive', according to Table I. There are a very small number of users—roughly a dozen—whose log BF_EP,P was
near zero, suggesting a Pareto-only model. Not only is the two
component mixture model EP preferred for all the other users,
but it is strongly preferred as evidenced by the high Bayes
factor values. We observe a small trend: as the bin sizes increase, the log Bayes factor gets larger, indicating
that for larger bin sizes, the exponential component plays an
increasingly dominant role. Next we compare the EP and EEP
models. Fig. 3 plots the BF distribution for all users, for each
of 4 bin sizes. Interestingly, we see that at bin sizes of 64 and
128, the Bayes factors are close to zero for the majority of the
users. Since the two models are fairly indistinguishable here,
we again select the model of lower complexity, namely EP
for nearly all the users save a few outliers. At larger bin sizes,
we do see some users for whom the EEP model is selected.
Overall, our method assigns the EEP model to roughly 30%
of the users and the EP model to the remaining 70%.
The percentage of users that were assigned a given model
depends upon the bin size. However, we see clear trends. The
fraction of users assigned a Pareto-only model was always
less than 5%, the fraction assigned an EP model varied
from 50-85% and the fraction assigned an EEP ranged from
15-40%. We conclude two things from this section. First, the
flexibility we have built into our methodology is important
and needed because the best model for one endhost is not
necessarily the same for another endhost. Second, for the
Fig. 4: Histogram of estimated α values across users (Zeta/discrete-Pareto alpha parameter; all users, 64 sec binning; mean = 1.6).

Fig. 5: Histogram of estimated mα across users (Zeta/discrete-Pareto mixture weight; all users, 64 sec binning; mean = 0.252).
majority of the endhosts, the mixture model consisting of one
exponential and one Pareto is clearly the preferred model.
User Behavior: As indicated in §II, there is a growing interest
in understanding the range of variation of user behavior. We
now look at some model details to explore the range of
parameters selected across users, and the amount of mixing
between the two model components. We computed an EP
model for all our users, and examined the resulting α and
λ values. We first observed that there is little correlation
between α and λ values within the set of endhost EP models.
This is reassuring, as it indicates that the fitting process
does not introduce dependencies between the two component
distributions, and that properties of one distribution do not
affect the other.
In Fig. 4 we show the histograms of α values over users
for a bin size of 64 sec. We see that the values of α range
from 1.3 to 2.3 across the users; different users have very
different properties in terms of the heaviness of the tail of the
distribution. Roughly 1/6 of our users have α < 1.5 implying
a fairly heavy tail, while most users have α values around 1.6
or 1.7. It is interesting that we do have a small number of users (4) with α > 2, indicating a finite second moment.
We now look more closely at the user mix of the two
components of the model. A value of m close to 0 implies that
the model is dominated by the exponential distribution (when m = 0 there is no Pareto component in the model). Similarly
when m is close to 1 the Pareto component dominates the
behavior of the model. To see the range of m values chosen
across our users, we provide a histogram of this mixing factor
in Fig. 5. The frequency on the y-axis denotes the number
of users whose m parameter is that indicated on the x-axis.
Only 3 users picked an m very close to 1, indicating that
the pure Pareto model suits practically none of our users—in
agreement with the Bayes Factors conclusions. Most of the
users have an m parameter less than 0.4, and roughly half
of our users had m < 0.25 indicating the dominance of the
exponential component in the model. The m values are fairly
well spread across the range 0 to 0.5 (roughly). We can also interpret this range of m as an indication of user diversity, in that users' mixing fractions differ substantially.
VII. CONCLUSION
To the best of our knowledge, this is the first paper to study heavy tails of traffic from endhosts, and to study heavy-tailed network traffic using mixture models with model selection. We have shown strong evidence that the rate of
initiation of flows in end host traffic, over a variety of users,
is almost always heavy-tailed. The scaling parameter varies widely, between 1.0 and 2.0, and on average the heavy-tailed component makes up about one quarter of the traffic. We demonstrated that a model selection approach using the Bayesian Information Criterion (BIC) rather than a goodness-of-fit test, applied to a family of mixture models, is both accurate and versatile. We showed that this versatility was needed to yield good models for all 270 diverse users. This underscores the value of a method that does not presuppose a single distribution model for flow traffic.
Acknowledgements— We wish to thank Eve Schooler and others
at Intel who were closely involved in the collection of the data. Most
of the work in this paper was carried out when three of the authors
were employed at Intel.
REFERENCES
[1] Agosta, J. M., Chandrashekar, J., Crovella, M., Taft, N., and Ting, D. Mixture models of endhost network traffic. arXiv:1212.2744 [cs.NI] (2012).
[2] Breslau, L., Cao, P., Fan, L., Phillips, G., and Shenker, S. Web caching and Zipf-like distributions: Evidence and implications. In INFOCOM (1999), pp. 126–134.
[3] Clauset, A., Shalizi, C. R., and Newman, M. E. J. Power-law distributions in empirical data. SIAM Review (2009).
[4] Crovella, M. E., and Bestavros, A. Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Trans. on Networking 5, 6 (1997).
[5] Crovella, M. E., and Taqqu, M. S. Estimating the heavy tail index from scaling properties. Methodology and Computing in Applied Probability (1999).
[6] Everitt, B. S., and Hand, D. J. Finite Mixture Distributions. Chapman and Hall, London, 1981.
[7] Feldmann, A. Self-Similar Network Traffic and Performance Evaluation, Chapter 2. John Wiley & Sons, New York, 2002.
[8] Feldmann, A., Greenberg, A., Lund, C., Reingold, N., Rexford, J., and True, F. Deriving traffic demands for operational IP networks: Methodology and experience. IEEE/ACM Transactions on Networking 9 (2001), 265–279.
[9] Feldmann, A., and Whitt, W. Fitting mixtures of exponentials to long-tail distributions to analyze network performance models. In Proceedings of IEEE INFOCOM '97 (April 1997).
[10] Giroire, F., Chandrashekar, J., Iannaccone, G., Papagiannaki, K., Schooler, E., and Taft, N. The Cubicle vs. The Coffee Shop: Behavioral Modes in Enterprise End-Users. PAM (2008).
[11] Marin, J.-M., Mengersen, K., and Robert, C. Bayesian modelling and inference on mixtures of distributions. Tech. rep., CEREMADE, Universite Paris Dauphine, February 2004.
[12] Jordan, M. I., and Jacobs, R. A. Hierarchical mixtures of experts and the EM algorithm. Neural Computation 6 (1994), 181–214.
[13] Kass, R. E., and Raftery, A. E. Bayes factors. Journal of the American Statistical Association 90, 430 (1995), 773–795.
[14] Leland, W. E., Taqqu, M. S., Willinger, W., and Wilson, D. V. On the self-similar nature of Ethernet traffic. In ACM SIGCOMM.
[15] Li, L., Alderson, D., Willinger, W., and Doyle, J. C. A First-Principles Approach to Understanding the Internet's Router-level Topology. Proc. ACM SIGCOMM (2004).
[16] Luo, S., Li, J., Park, K., and Levy, R. Exploiting Heavy-Tailed Statistics for Predictable QoS Routing in Ad-Hoc Wireless Networks. IEEE INFOCOM (2008).
[17] Papagiannaki, K., Taft, N., and Diot, C. Impact of flow dynamics on traffic engineering design principles. In INFOCOM (2004).
[18] Paxson, V. Bro: A system for detecting network intruders in real-time. Computer Networks (1999).
[19] Paxson, V., and Floyd, S. Wide-area traffic: the failure of Poisson modeling. In SIGCOMM (1994).
[20] van der Vaart, A. W. Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge University Press, June 2000.