Environmental Modelling & Software 20 (2005) 547–559
www.elsevier.com/locate/envsoft
Neural network prediction model for fine particulate
matter (PM2.5) on the US–Mexico border in
El Paso (Texas) and Ciudad Juárez (Chihuahua)
J.B. Ordieres a,*, E.P. Vergara a, R.S. Capuz b, R.E. Salazar c
a Universidad de La Rioja, c/ Luis de Ulloa 20, 26004, Logroño, La Rioja, Spain
b Universidad Politécnica de Valencia, Camino de Vera, s/n. 46022 Valencia, Spain
c Instituto Tecnológico de Mexicali, Av. Tecnológico, s/n Col. Elías Calles, 21396 Mexicali B.C., Mexico
Received 5 June 2003; received in revised form 4 October 2003; accepted 8 March 2004
Abstract
Forecasting the daily average PM2.5 concentration is now a leading component of air quality research, since it is
needed to assess the impact of air quality on the health and welfare of every living being. The present work analyzes
and benchmarks a neural-network approach to the prediction of average PM2.5 concentrations. The resulting model is
intended as a control tool for preventing dangerous situations that may arise. To this end we have used
data and measurements based on samples taken during the early hours of the day. Results from three different neural
network topologies were compared so as to identify their strengths and weaknesses: Multilayer Perceptron
(MLP), Radial Basis Function (RBF) and Square Multilayer Perceptron (SMLP). Moreover, two classical models were built (a
persistence model and a linear regression), so as to compare their results with those provided by the neural network models. The
results clearly demonstrated that the neural approach not only outperformed the classical models but also showed fairly similar
values among the different topologies. Differences in stability and in the length of the training phase also emerged
during testing. The RBF proved to be the network with the shortest training times, combined with greater stability during
the prediction stage, which makes this topology an ideal solution for environmental applications in place of the
widely used and less effective MLP.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: US–Mexico border; Air quality; Particulate matter; PM2.5; Neural Network Modeling; Multilayer Perceptron (MLP); Radial Basis
Function (RBF); Square Multilayer Perceptron (SMLP)
1. Introduction
In the past few years, the U.S.–Mexican border has
been an important environmental concern for both
the U.S. and Mexico (Vega et al., 2002). Since 1983,
when the La Paz Agreement¹ was signed, both countries
have combined efforts to improve the environmental conditions in the region.
* Corresponding author. Universidad de La Rioja, Edificio
Departamental, Luis de Ulloa 20, 26004, Logroño, La Rioja, Spain.
Fax: +34-941-299-478.
E-mail address: joaquin.ordieres@dim.unirioja.es (J.B. Ordieres).
1 Acuerdo de Cooperación para la Protección y Mejoramiento del
Medio Ambiente en la Región Fronteriza (Agreement on Cooperation for the Protection and Improvement of the Environment in the Border Area).
1364-8152/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2004.03.010
The U.S.–Mexican border region stretches over 100 km on
both sides of the boundary line between the two countries
(Fig. 1); from the Gulf of Mexico to the Pacific Ocean,
it is 3100 km long from end to end. The terrain
includes large deserts, numerous mountain ranges,
shared rivers, wetlands, large estuaries, aquifers, national parks, and protected areas. Its weather varies dramatically from the Pacific Ocean to Arizona-Sonora,
where harsh conditions rule day and night. Such differences give rise to an extraordinary variety of wildlife
which must be preserved.
At present, 11.8 million people live along the border,
almost equally divided between the two countries.
Projected population growth rates in the border region
J.B. Ordieres et al. / Environmental Modelling & Software 20 (2005) 547–559
Fig. 1. U.S.–Mexican border region, as defined under the La Paz Agreement (map showing the 100 km border strip and the cities of San Diego, Yuma, Nogales, Douglas, Sunland Park, El Paso and Ciudad Juárez).
exceed anticipated national average growth rates for
each country. If current growth rates remain consistent
with the projected figures, the population in the border
area will increase by 7.6 million people by 2020 (US-EPA and SEMARNAT, 2002).
In 1990, 1770 assembly plants were operating in
Mexico. By 2001, this figure was projected to double,
reaching 3800 factories, 2700 of which would
be based in border states. Compared with other
regions in Mexico, the border area is characterized by
very low unemployment rates and high wages. In spite
of this economic growth, the region’s infrastructure has
yet to see much improvement. Thus, natural resources,
environment and public health are adversely affected on
both sides of the border.
The U.S. Environmental Protection Agency has
designated cities including San Diego, CA; Yuma, AZ;
Nogales, AZ; Douglas, AZ; Sunland Park, NM; and
El Paso, TX, as non-attainment cities because they
have failed to meet the National Ambient Air Quality
Standards (US-EPA, 1998, 2000b; Mukerjee et al.,
2001; Mukerjee, 2001; Watson and Chow, 2001).
The Border 2012 program proposes goals aimed at
meeting the most difficult challenges concerning air quality.
Among these, goal #2 proposes three objectives to
reduce air pollution (US-EPA, 2000a; US-EPA and
SEMARNAT, 2002):
1. By the year 2003, define baseline and alternative scenarios for emission reduction along the border area.
2. By the year 2004, based on results from sub-objective #1, define specific emission reduction strategies.
3. By 2012 or sooner, reduce air emissions as much as possible in order to reach national ambient air quality standards.
Our work deals with pollution by particulate matter in
the central-south border region of the U.S., particularly
in the area of El Paso, Texas, and Ciudad Juárez in
Chihuahua (Mexican central-north border). Air pollution in this area has been a major concern for years
due to the environmental problems already described
(US-EPA, 1996, 1998, 2000b, 2000c; US-EPA and
SEMARNAT, 2002). According to the Texas Commission on Environmental Quality (TCEQ), El Paso is
affected by the air emissions from Ciudad Juárez
(Mexico) and, in order to avoid federal penalties, must
comply with the Clean Air Standard by 2007. Instead of
the classical PM10 approach (Chow and Watson, 2001;
Fuller et al., 2002), for this particular area the data must
be based on PM2.5, since natural mineral emissions
(e.g. sand storms) mask the anthropogenic sources in the PM10 parameter. Fortunately, the PM2.5 measurements allow us
to identify anthropogenic sources for later analysis
(Magliano et al., 1999).
Several works have tried to isolate and identify the
sources and composition of this type of air pollution, e.g.
Querol et al. (2001a,b), Lenschow et al. (2001),
Rodríguez et al. (2002) and Lu (2002), in order to evaluate
and avoid its health-related effects (McDonnell et al.,
2000; Ostro et al., 1999a,b). Prediction models can be
used to this end. Although there are few works on
prediction models for PM2.5, there have been several
attempts to analyze its behavior and evaluate and/or
predict the variations in fine particulate concentrations.
Mukerjee et al. (2001) and Kukkonen et al. (2001)
developed a model based on the correlation between
PMxx and NOx, with a resulting fractional bias (FB)
ranging from 0.05 to +0.09 and an index of agreement
(IA) from 0.85 to 0.96 when predicting yearly average PM10
concentrations, but unfortunately with an index of agreement (IA) ranging from 0.45 to 0.65 for hourly average
concentrations. In Kuopio, Finland, Tiittaa et al. (2002)
developed a semi-empirical model and applied Multiple
Regression Analysis, obtaining a squared correlation
coefficient of 0.67; however, the concentrations predicted were
much lower than the actual measurements. Jorquera
et al. (2001) used box models to analyze air pollution in
Santiago, Chile, and developed a linear model for PM2.5,
PM10 and coarse PM2.5–PM10 fractions. They estimated
a decrease of 50% and 22% for PM10 and PM2.5,
respectively, in the last 10 years.
Nevertheless, in spite of the various types of modeling
methodologies available, the use of neural
networks seems to be growing (Reich et al., 1999; Pérez
and Reyes, 2001; Podnar et al., 2002; Chaloulakou et al.,
2003), as they rest firmly upon the classical statistical
approaches. Some interesting cases in which these methodologies have previously been used
in environmental sciences, and in particular for atmospheric pollution modeling, are as follows. Pérez et al.
(2000) developed a non-linear prediction model for PM2.5
using neural networks in Santiago, Chile, but the
prediction errors of the model ranged from 30% in the
early hours to 60% in the late hours. Later, Pérez and
Reyes (2002) developed a model to predict daily average
PM10 concentrations 30 h in advance, with a prediction
error of about 20%. Pérez and Trier (2001) have also
succeeded in modeling and forecasting other polluting
agents by means of neural networks in the surrounding
area of Santiago, Chile. Another approach also based on
neural networks is that of Kolehmainen et al.
(2001), in which periodic components were employed in
order to predict polluting agent concentrations.
However, it is important to keep on working along
this line and to try to find the most accurate prediction
model in order to prevent serious environmental damage,
and health-related problems in susceptible groups such
as children.
In this paper, we introduce three short-term PM2.5 non-linear prediction models and compare the accuracy provided by the different systems (Ho et al., 2002). The classical
ARIMA linear modeling tool becomes unreliable inasmuch as air-pollution sources tend not to behave in a
linear fashion. This is the reason why we used artificial
neural networks (ANN): to find a prediction model with
the lowest possible real prediction error according to the
data. The first prediction model was developed using
a Multilayer Perceptron neural network (MLP), the
second using a Square Multilayer Perceptron (SMLP), and
the third using a Radial Basis Function network (RBF).
In order to carefully appraise the need for complex
models like the neural networks, their outcomes have
been compared against the ones pertaining to the traditional models. Thus, a persistence model and a linear
regression have been assessed.
These models predict PM2.5 behavior using the data
of the previous 24 h and the first 8 h of the day to
determine PM2.5 concentration in the remaining 16 h in
the area of Paso del Norte. The demand for a prediction
of the average level for the current day in the early
morning hours led us to use only the first 8 h of the day,
thus making the estimation available at 08:00, when
the working day typically starts.
The tools used in the data pre-processing and
conditioning steps were our own Linux-based software
tools and R (Ihaka, 1996), and the tools used in the
neural network processing were Linux-based tools
we had built using the excellent NODELIB library
(Flake, 1998), available from http://www.neci.nec.com/homepages/flake/nodelib/html.
2. Materials and methods
2.1. Data
Geographically, Ciudad Juárez and El Paso are
located next to each other and have a common
airshed. Air pollution sources on either side of the border
have an impact on air quality in both cities. These cities
share the Chihuahua Desert (3710 feet above sea level),
are shielded by mountains on three sides, and enjoy
more than 200 days of sunshine a year. The annual
mean temperatures range from 27 °F (−3 °C) in winter
to 100 °F (38 °C) in summer. We assumed that these
data represent the urban environment on both sides of
the border (Ciudad Juárez and El Paso) due to the
reasons previously explained.
The data set was obtained from the urban air quality
monitoring network of the Texas Commission on
Environmental Quality (TCEQ). The station used was
El Paso UTEP-C12 (EPA number 48-141-0037). The
station is located in downtown El Paso (31° 46′ 06″ N,
106° 30′ 05″ W), at 1158 m above sea level. This
station monitors the air continuously and provides the
latest hourly averaged data available for carbon
monoxide, sulfur dioxide, nitric oxide, nitrogen dioxide,
oxides of nitrogen, ozone, wind speed, resultant wind
speed, resultant wind direction, maximum wind gust,
standard deviation of wind direction, outdoor temperature, dew point temperature, relative humidity, solar
radiation, ultraviolet radiation, PM10 (standard conditions) and PM2.5 (local conditions) (Fig. 2).
The data were recorded on an hourly basis (24 h
a day) from 2000 to 2002. According to the data, for
the year 2000, the average PM2.5 concentration was
Fig. 2. Urban air quality monitoring network on the US–Mexico border in El Paso (Texas) and Ciudad Juárez, Chihuahua (Mexico).
8.27 µg/m3, with a standard deviation of 4.71 µg/m3,
reaching a maximum value of 26.74 µg/m3. For the year
2001, the average was 8.87 µg/m3, with a standard deviation of 7.54 µg/m3, and a maximum value of 87.44 µg/m3.
In turn, the average PM2.5 concentration for the year
2002 was 9.61 µg/m3, with a standard deviation of
7.42 µg/m3, and a maximum value of 69.21 µg/m3.
Because quality standards for PM2.5 have not been
defined in Mexico yet, U.S. NAAQS (National Ambient
Air Quality Standards) guidelines and threshold values,
shown in Table 1, were used in this work.
Other researchers, such as Jorquera et al. (2001),
Rodríguez et al. (2001), and Yang (2002), to name
a few, found a seasonal behavior in different
locations. In El Paso–Ciudad Juárez we
could not identify such a component, although we did
observe similarities between the years on approximately
the same days (Fig. 3).
On relatively similar dates in the three years, we
observed that the highest peaks deviated from the 24-h
average standard (NAAQS). Some of these peaks
reached values including 87.4 µg/m3 (April 10, 2001),
65.0 µg/m3 (March 14, 2002), and 69.21 µg/m3 (April 26,
2002). These peaks coincided with the so-called “spring
dust storms” events reported by the Texas Commission
on Environmental Quality, although these events do not
occur only in spring.
As these peaks were due to a meteorological phenomenon (dust storms), we assumed that it would be
difficult to deduce proper rules and avoid these high
PM2.5 concentrations. However, it would be very useful
to predict PM2.5 concentrations before air pollution
Table 1
National Ambient Air Quality Standard (NAAQS) for the US
Particulate (PM2.5), annual arithmetic mean: 15.1 µg/m3 (primary and secondary NAAQS). The three-year average of annual arithmetic mean concentrations from single or multiple community-oriented monitors is not to be at or above this level.
Particulate (PM2.5), 24-h average: 66 µg/m3 (primary and secondary NAAQS). The three-year average of the annual 98th percentile for each population-oriented monitor within an area is not to be at or above this level.
Primary NAAQS: the levels of air quality that the EPA judges necessary, with an adequate margin of safety, to protect the public health.
Secondary NAAQS: the level of air quality that the EPA judges necessary to protect the public welfare from any known or anticipated adverse effects.
Fig. 3. Daily mean PM2.5 value registered in 2000 and 2001 (measured PM2.5 24-h average, µg/m3, against days of 2000 and 2001).
events occur in order to take preventive actions such as
alerting the population.
A new set of variables was designed (see Table 2)
from the features actually available at the station, with
the aim of complying with the requirement imposed,
i.e. having a daily-average level prediction using only
the samples from the first 8 h of the day. Of particular interest is the variable corresponding to the Wind
Direction Index (WDI), introduced so as to avoid the
discontinuity that the Wind Direction
variable would cause if used instead. We define the WDI according
to the following expression:
$$\mathrm{WDI} = 1 + \sin\left(\mathrm{WD} + \frac{\pi}{4}\right) \qquad (1)$$
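As a hedged illustration of Eq. (1), the following Python sketch computes the index for a bearing given in degrees. The function name and the degree-to-radian conversion are assumptions made for the example; the paper does not state the unit in which WD is expressed.

```python
import math


def wind_direction_index(wd_degrees: float) -> float:
    """Wind Direction Index, WDI = 1 + sin(WD + pi/4), as in Eq. (1).

    Assumption: WD is supplied in degrees and converted to radians here.
    """
    wd_rad = math.radians(wd_degrees)
    return 1.0 + math.sin(wd_rad + math.pi / 4.0)


# Bearings just either side of north differ by almost 360 degrees as raw
# angles, but their index values are close to each other.
print(wind_direction_index(359.0), wind_direction_index(1.0))
```

The index stays within the interval from 0 to 2, which matches the range of the rdvsin8 axis in Fig. 4.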
So as to select, among these features, those more
appropriate to serve as explanatory variables in the
different models, a linear correlation analysis was
initially performed. The analysis revealed that the only
notable relationships were those between the real average
level of PM2.5 and the average corresponding with the
first 8 h, between the latter and the maximum level of
PM2.5, and between the wind direction and the Wind
Direction Index. Fig. 4 shows the results of this analysis.
Table 2
Input variables used in the forecast models
Symbol | Description | Units
Pm25m8 | Average level of PM2.5 during the first 8 h of the day | µg/m3
Pm25max8 | Maximum level of PM2.5 during the first 8 h of the day | µg/m3
Tam8 | Average temperature during the first 8 h of the day | °F
Hrm8 | Average relative humidity during the first 8 h of the day | %
Vvm8 | Average wind speed during the first 8 h of the day | m/s
rdvm8 | Average wind bearing during the first 8 h of the day | –
rdvsin8 | Wind direction index | –
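A minimal sketch of how the variables of Table 2 could be derived from one day of hourly records. The function name, the argument layout (24 hourly values per variable, starting at midnight) and the choice to evaluate the wind direction index on the averaged bearing are assumptions for illustration, not details given in the paper.

```python
import math
from statistics import mean


def early_morning_features(pm25, temp, rel_hum, wind_speed, wind_dir_deg):
    """Build the seven Table 2 inputs from the first 8 hourly values of a day.

    Each argument is a sequence of hourly values for one day (hypothetical
    layout); only hours 0 to 7 are used.
    """
    first8 = slice(0, 8)
    rdvm8 = mean(wind_dir_deg[first8])
    return {
        "Pm25m8": mean(pm25[first8]),
        "Pm25max8": max(pm25[first8]),
        "Tam8": mean(temp[first8]),
        "Hrm8": mean(rel_hum[first8]),
        "Vvm8": mean(wind_speed[first8]),
        "rdvm8": rdvm8,
        # Assumption: the index of Eq. (1) is applied to the mean bearing.
        "rdvsin8": 1.0 + math.sin(math.radians(rdvm8) + math.pi / 4.0),
    }
```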
Next, a stepwise regression analysis was performed, which retained only one explanatory variable,
Pm25m8. The residual analysis for such a model
plainly exhibited its low quality. Thus, by virtue of the
large number of samples, the whole set of designed
variables was considered in the input layer.
2.2. Classical models
The persistence model is an extremely simple model,
with no adjustable parameters; consequently we can
expect nothing but poor precision. Due to its simplicity,
it represents the minimum acceptable quality for any
other model proposed. Basically, it assumes that the
PM2.5 concentration level at a particular time of day
corresponds to the value which occurred the day before
at the same hour. That is to say:
$$y_t = x_{t-1} \qquad (2)$$
where $y_t$ is the predicted value for day $t$ and $x_{t-1}$ is the value observed the day before.
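The persistence rule can be sketched in a few lines of Python. The function name and the use of `None` for the first day, which has no previous observation, are choices made for this illustration only.

```python
def persistence_forecast(daily_values):
    """Forecast each day with the value observed the day before.

    The first day has no predecessor, so its forecast is None.
    """
    values = list(daily_values)
    return [None] + values[:-1]


observed = [8.0, 9.5, 7.2, 12.1]
print(persistence_forecast(observed))
```

Because the model has no adjustable parameter, nothing is fitted: the forecast series is simply the observed series shifted by one day.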
On the other hand, linear regression models can be
applied to both categorical and continuous explanatory
variables in the prediction of continuous variables. The
mathematical formulation is a model in which, for
each observation $i$, $i = 1, \ldots, N$, the value $y_i$ of the
variable to be explained is linearly fitted according to
the observed values of the samples. The error of the
prediction is represented by $\varepsilon$. The complete model can
be expressed as:
$$y_i = \beta_0 + \sum_{j=1}^{N} \beta_j x_{ij} + \varepsilon_i \qquad (3)$$
The reader interested in applying linear models in an
atmospheric context may find useful the work of
Castejón et al. (2001).
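For the special case of a single explanatory variable, the least-squares fit of the linear model has a closed form. The sketch below is only an illustration of that case (the function name and the toy data are hypothetical); the model in the paper uses several explanatory variables.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x with one regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Toy data lying exactly on the line y = 1 + 2x.
b0, b1 = fit_simple_regression([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1)
```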
2.3. Neural network models
Artificial neural networks (ANN) are powerful
data modeling tools with a proven efficiency in dealing
Fig. 4. Linear relations between variables (scatter-plot matrix of real, vvm8, pm25m8, tam8, hrm8, rdvm8, pm25max8 and rdvsin8, with pairwise correlation coefficients).
with complex problems, particularly in the fields of
association, classification and prediction. Many researchers have shown that neural networks can solve
almost any problem more efficiently than the traditional
modeling and statistical methods (Hornik et al., 1989;
Masters, 1993). In this paper, we compare three
neural network architectures, Multilayer
Perceptron (MLP), Square Multilayer Perceptron
(SMLP), and Radial Basis Function (RBF), in order to obtain the
best possible efficiency in the outcomes.
Typically, a neural network is composed of a set of
neurons laid out in layers. Commonly, those layers are
classified as input layer, hidden layers and output layer.
Some neural networks do not have hidden layers and
behave much like linear statistical techniques. These
networks (with input and output layers only) are useful
in many linear or semi-linear applications, but in
general it is difficult to get accurate results from them in non-linear problems (McCullagh and Nelder, 1989). We
understand that particulate matter is clearly a non-linear
Fig. 5. Artificial neural network structure (inputs E1 and E2, weights w1 and w2, transference function, outputs S1 and S2; input layer, hidden layer and output layer).
problem, at least in the current region of the study.
However, there are no specific rules to define how many
hidden layers a neural network must have (Fig. 5).
For MLP and RBF neural networks, one hidden layer
with a large number of neurons usually yields good results
(Hornik et al., 1989; Hornik, 1993; Bishop, 1995).
A similar situation occurs in terms of the quantity of
data needed to obtain the best training results from the
network. The neural network has the capacity to
“learn” and make predictions from new data;
that is to say, it generalizes observed behavior, rather
than simply memorizing a given training data set. As
a “rule of thumb”, the quantity of data necessary in
a neural network analysis would be, for a noise-free
quantitative target variable, twice as many training cases
as weights, while for a very noisy target variable, 30
times as many training cases as weights may not be
enough. The high number of input variables frequently
present in these models implies an even higher number
of weights to train (if the networks have a fully
connected topology), thus turning the
size of the training data set required into one of the main
obstacles associated with this methodology.
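The rule of thumb can be made concrete by counting the weights of a fully connected network with one hidden layer. The sketch below is illustrative: the function names are hypothetical, and including one bias term per hidden and output unit is an assumption, since the text does not say how biases are counted. The example sizes (7 inputs, 18 hidden nodes, 1 output) are those reported for the MLP in the results section.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1, biases=True):
    """Number of trainable weights in a fully connected 3-layer network."""
    extra = 1 if biases else 0  # assumption: one bias per receiving unit
    return (n_inputs + extra) * n_hidden + (n_hidden + extra) * n_outputs


def training_cases_needed(n_weights, factor=2):
    """Training cases suggested by the rule of thumb for a given factor."""
    return factor * n_weights


n_w = count_weights(7, 18)
print(n_w)                             # 163 weights
print(training_cases_needed(n_w))      # 326 cases for a noise-free target
print(training_cases_needed(n_w, 30))  # 4890 cases for a very noisy target
```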
2.3.1. Multilayer Perceptron (MLP) and Square
Multilayer Perceptron (SMLP)
MLP is the most common and successful neural
network architecture with feed-forward network (FFN)
topologies (three layers of neurons: input layer, hidden
layer and output layer).
Each layer uses a linear combination function. The
inputs are fully connected to the hidden layer, which is
fully connected to the outputs.
These networks are used to create a model that maps
the input to the output using historical data. The trained
model can then be run to produce an output even when the
desired output is unknown. These networks
are called supervised networks because they need a
desired output to learn (supervised training). For MLP
applications in the atmospheric sciences, see Gardner
and Dorling (1998).
The most common supervised training algorithm is
the so-called backpropagation (Haykin, 1994). With
backpropagation, the input data are repeatedly presented to the neural network. With each presentation,
the output of the neural network is compared to the
desired output and an error is computed. This error is
then fed back (backpropagated) to the neural network
and used to adjust the weights such that the error
decreases with each iteration and the neural model
gets closer and closer to the desired output. This process is known as ‘‘training’’. This kind of training is
relatively easy and offers good support for prediction
applications.
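The training loop described above can be sketched in plain Python as follows. This is a toy illustration and not the NODELIB-based implementation used in the paper: the tanh hidden units, the linear output unit, the learning rate, the number of epochs and the toy data are all assumptions.

```python
import math
import random


def train_mlp(samples, n_hidden=3, lr=0.05, epochs=200, seed=0):
    """Online backpropagation for a one-hidden-layer MLP.

    samples: list of (inputs, target) pairs.
    Returns (predict, history), where history holds the mean squared
    error accumulated during each pass over the data.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each row: input weights of one hidden unit, bias stored last.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    # Output weights, bias stored last.
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(wj, x)) + wj[-1])
             for wj in w_h]
        y = sum(w * v for w, v in zip(w_o, h)) + w_o[-1]
        return h, y

    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, target in samples:
            h, y = forward(x)
            err = y - target  # error between network output and desired output
            sq_err += err * err
            # Error fed back to the hidden units (computed with the
            # output weights as they were before this update).
            delta_h = [err * w_o[j] * (1.0 - h[j] ** 2)
                       for j in range(n_hidden)]
            for j in range(n_hidden):
                w_o[j] -= lr * err * h[j]
            w_o[-1] -= lr * err
            for j in range(n_hidden):
                for i in range(n_in):
                    w_h[j][i] -= lr * delta_h[j] * x[i]
                w_h[j][-1] -= lr * delta_h[j]
        history.append(sq_err / len(samples))
    return (lambda x: forward(x)[1]), history


# Toy problem: the target is the sum of the two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)]
predict, history = train_mlp(samples)
print(history[0], history[-1])
```

Each presentation of a pattern adjusts the weights in the direction that reduces the squared error, so the per-epoch error recorded in `history` falls as training proceeds.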
It is generally accepted that a correctly designed MLP
network performs comparably to, but not better than,
classical statistical techniques.
Nevertheless, MLP networks outperform classical
statistical techniques in their much shorter development
time and in their capacity to adapt when faced
with changes.
Generalized Additive Models (GAM) (Hastie and
Tibshirani, 1990) are a relevant statistical framework. Within
this framework, Flake suggested an architecture similar
to GAM, in which each hidden unit had a parametric
activation function which could change from a projection-based to a radial function in a continuous way
(Flake, 1998). He called this architecture SMLP.
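One way to picture this continuous change is to assume that each hidden unit receives the squares of the inputs in addition to the inputs themselves. That reading, the function name and the parameter names below are assumptions for illustration; the text does not spell out the unit's formula.

```python
def smlp_net_input(x, w, u, b):
    """Net input of a square-augmented unit: sum(w_i x_i) + sum(u_i x_i^2) + b.

    Assumed form: w weights the raw inputs, u weights their squares.
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    squared = sum(ui * xi * xi for ui, xi in zip(u, x))
    return linear + squared + b


x = [1.0, 2.0]
c = [0.5, -1.0]

# With u = 0 the unit is projection-based, like an ordinary MLP unit.
print(smlp_net_input(x, w=[1.0, -2.0], u=[0.0, 0.0], b=0.0))

# With w = 2c, u = -1 and b = -|c|^2 the net input equals minus the
# squared Euclidean distance between x and c, i.e. a radial response.
radial = smlp_net_input(x, w=[2 * ci for ci in c], u=[-1.0, -1.0],
                        b=-sum(ci * ci for ci in c))
print(radial)
```

Intermediate values of `u` move the unit smoothly between the two behaviors.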
2.3.2. Radial Basis Function (RBF)
The architecture of RBF neural networks is less well known than that of the MLP, although it has been used
in time series prediction with good results.
This kind of architecture is also a feed-forward
network (like an MLP), but every
unit of the hidden layer has a “radial basis function”
(a statistical transformation based on the Gaussian distribution function). Like MLP neural networks, RBF
networks are suited for applications such as pattern
discrimination and classification, interpolation, prediction, forecasting, and process modeling.
The “basis function” (often a Gaussian function) has
the parameters “center” and “width”. The
center of the basis function is a vector of numbers Ci
of the same size as the inputs to the unit, and normally
there is a different center for each unit in the neural
network.
In the first computation, the “radial distance” between
the input vector and the center of the basis function is
computed for every unit, using the Euclidean
distance. The basis function is then applied to this
distance, so that each hidden unit responds non-linearly
to the input vector. See
Fig. 6.
The hidden layer of the RBF neural network is non-linear
whereas the output layer is linear. Because of these properties,
RBF neural networks can model complex mappings more
easily and quickly than MLP (Haykin, 1994). For
further details and a more complete description, see Tao
(1993).
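The two computations, radial distance followed by a Gaussian basis function and a linear output, can be sketched as below. The Gaussian form exp(-r²/(2·width²)), the single width shared by all units and the function name are assumptions for illustration; the paper does not give the exact expression used.

```python
import math


def rbf_predict(x, centers, width, weights, bias=0.0):
    """Output of an RBF network with Gaussian hidden units and a linear output."""
    out = bias
    for center, weight in zip(centers, weights):
        # Squared Euclidean distance between the input and the unit's center.
        r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        # Gaussian basis function (assumed form), then linear combination.
        out += weight * math.exp(-r2 / (2.0 * width ** 2))
    return out


centers = [[1.0, 2.0], [4.0, 0.0]]
weights = [3.0, -1.0]
print(rbf_predict([1.0, 2.0], centers, width=1.5, weights=weights))
```

An input that coincides with a center activates that unit fully, while units whose centers are far away contribute almost nothing.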
2.4. Missing data
Missing data were carefully managed because there
were relevant periods without information on one
parameter, and therefore we could not apply the
popular filling strategy described by Dixon (1979). In
our particular case, such samples were discarded from
the database, even when it was clear that this strategy
Fig. 6. Radial Basis Function structure (unit inputs, radial distance computation, Gaussian basis function, summation and unit output; input layer, hidden layer and output layer).
could reduce the number of patterns available. We
did not strive particularly to identify the outliers in
the data that exceeded the first validation level since
this level was considered to be sufficient in terms of
quality.
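The discarding strategy amounts to keeping only complete samples. A minimal sketch, assuming each sample is held as a dictionary of variable values in which a missing reading is marked by `None` or NaN (this representation and the function name are hypothetical):

```python
import math


def drop_incomplete(records):
    """Keep only the records in which every value is present."""
    def present(value):
        if value is None:
            return False
        return not (isinstance(value, float) and math.isnan(value))

    return [r for r in records if all(present(v) for v in r.values())]


records = [
    {"pm25": 8.1, "temp": 60.0},
    {"pm25": None, "temp": 61.0},
    {"pm25": float("nan"), "temp": 59.0},
]
print(drop_incomplete(records))
```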
2.5. Error measurement
In this kind of problem, there are many statistical
indices that provide a numerical description of the
accuracy of the estimates. One of those is the Root
Mean Square Error (RMSE), calculated according to Eq. (4):
$$\mathrm{RMSE} = \left( \frac{1}{N} \sum_{i=1}^{N} \left[ P_i - O_i \right]^2 \right)^{1/2} \qquad (4)$$
where $N$ is the number of observations, $O_i$ is the
observed value, and $P_i$ is the predicted value.
Another is the Mean Absolute Error (MAE), defined as:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| O_i - P_i \right| \qquad (5)$$
Another interesting index is the Correlation Factor
(R2), defined as:
$$R^2 = \frac{\sum_{i=1}^{N} \left[ P_i - \bar{O} \right]^2}{\sum_{i=1}^{N} \left[ O_i - \bar{O} \right]^2} \qquad (6)$$
where $N$ is the number of observations, $O_i$ is the observed
value, $P_i$ is the predicted value, and $\bar{O}$ is the average value of
the explained variable on $N$ observations.
So as to avoid distortions in the residuals due to the
random initialization of the neuron weights during the
training phase, each topology has been trained one
hundred times, thus obtaining the distribution of the
errors.
3. Results and discussion
The objective was to model the PM2.5 daily average
concentration using the mean PM2.5, wind speed,
maximum level of PM2.5 during those first 8 h, wind
direction, humidity, and temperature values registered
in the first 8 h of the day, insofar as these were assumed
to be the main parameters needed to predict it. The
model was based on a neural network and we used
Fig. 7. MAE index evolution as a function of the number of hidden nodes for the different variances assessed (mean absolute error against number of hidden nodes, from 3 to 30, for Var=0.25, 0.50, 0.75, 1.50 and 3.50).
MLP, SMLP, and RBF basic architectures for two
reasons:
1. To evaluate the three kinds of neural computing
topologies.
2. And particularly, to identify the best prediction
model.
According to the assumption of the major parameters
and the neural network architectures used, the neural
network that was required had 7 nodes on the input
layer and one node on the output layer. The quantity of
neurons on the hidden layer was estimated by performing experimental analysis on the errors. So as to
Fig. 8. Daily time series of the measured and predicted concentrations of PM2.5 (panels: linear model, MLP model, SMLP model, and RBF model with Var=1.5; measured and predicted PM2.5 24-h average, µg/m3, against days of 2002).
determine the number of hidden nodes, the MAE index
was used.
In all the training and test cycles, and as there
were no observable data clusters, the 2000 and 2001 data
sets were used as sources of the training data, and the
2002 data were used as the test data.
To handle the over-fitting problem, a regularization
scheme was applied in the neural networks.
In the case of the Multilayer Perceptron (MLP), BPM
(Back Propagation with Momentum) was used as the learning
rule. The number of nodes on the hidden layer was tested
from 1 to 30. The stability reached, with the exception of
some “resonant-like” topologies, was exceptional. In order
to monitor the quality of the prediction over the time axis,
Fig. 8 shows the real and predicted values for the MLP with 18
hidden nodes for the test data, i.e. the 2002 data.
In the case of the SMLP network, some tests were
carried out to see the Square Multilayer Perceptron in
action. Using the same chart structure, Fig. 9 shows the
errors against the number of hidden units. It was
apparent that a relatively low number of hidden neurons
gave a kind of weighted-mean estimator, and
therefore more units were necessary to represent the
real model. To illustrate this particular aspect, an SMLP
with 20 hidden neurons was used. As expected, the
operation was slower with the test data.
As for Radial Basis Function (RBF) networks, they
were trained with different variance values in the
Gaussian radial function: 0.25, 0.5, 0.75, 1.5, and 3.5.
Fig. 7 clearly shows the evolution of the MAE index
as a function of the number of hidden nodes for the different
variances assessed.
Fig. 8 shows the time series corresponding to the
daily average values of the PM2.5 observed during 2002
and the fitted values corresponding to the different models.
The behavior of the linear model lacks refinement, so to
[Fig. 9 panels: SMLP, RBF Var=0.25, RBF Var=0.5, RBF Var=0.75, RBF Var=1.5, RBF Var=3.5. Axes: Measured (µg/m3) vs. Predicted (µg/m3).]
Fig. 9. Scatter plots of the measured and predicted concentrations of PM2.5.
speak, insofar as it proves reluctant to depart from the
low ranges. The MLP, SMLP and RBF models make the
predictions follow the observed values more closely.
Fig. 9 shows the scatter plots of the observed values
vs. the fitted values, marking both the identity line
and the trend obtained.
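The trend line of such a scatter plot can be obtained by ordinary least squares and compared with the identity line (slope 1, intercept 0). The sketch below assumes this is how the trend is fitted, which the text does not state; the data are hypothetical.

```python
def fit_trend(measured, predicted):
    """Least-squares line: predicted = slope * measured + intercept."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    mean_x = sum(measured) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    if sxx == 0:
        raise ValueError("measured values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values (µg/m3): a model that underpredicts the high range.
measured = [5.0, 10.0, 20.0, 40.0]
predicted = [7.5, 10.0, 15.0, 25.0]

slope, intercept = fit_trend(measured, predicted)
print(f"trend: predicted = {slope:.2f} * measured + {intercept:.2f}")
```

A slope well below 1, as in this made-up example, is the signature of a model that stays in the low ranges while the measured values rise.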
According to the results obtained, the RBF neural
network is particularly suited to our aims, with
predictions as good as those obtained by the MLP,
a lower training effort and more stability (probably due
to the bounded, derivative property of RBF networks).
The results also show that the SMLP does not have any
particular advantage over the MLP in our case in terms
of prediction.
Fig. 10 shows the histograms of the residuals for the
different models assessed.
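A residual histogram of this kind can be sketched as follows. The sign convention (predicted minus measured) and the bin width of 10 µg/m3 are assumptions, and the sample values are hypothetical.

```python
import math
from collections import Counter


def residual_histogram(measured, predicted, bin_width=10.0):
    """Count residuals (predicted - measured, an assumed convention) per bin."""
    counts = Counter()
    for m, p in zip(measured, predicted):
        residual = p - m
        # Lower edge of the fixed-width bin that contains this residual.
        lower_edge = math.floor(residual / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical values (µg/m3), for illustration only.
measured = [10.0, 20.0, 50.0]
predicted = [12.0, 15.0, 20.0]

hist = residual_histogram(measured, predicted)
for lower_edge, count in hist.items():
    print(f"[{lower_edge:6.1f}, {lower_edge + 10.0:6.1f}): {count}")
```

With this convention, a long tail on the negative side indicates that the measured peaks are underpredicted.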
4. Conclusions
[Fig. 10 panels: Persistence, Linear, MLP, SMLP, RBF Var=0.25, RBF Var=0.5, RBF Var=0.75, RBF Var=1.5, RBF Var=3.5. Axes: Residuals vs. Frequency.]
A short-term PM2.5 prediction model has been built
from a large number of samples of a non-linear data set
with a high degree of internal noise.
The model can be used as a tool for short-term control
and planning in difficult areas such as the U.S.–Mexican
border at El Paso–Ciudad Juárez. The comparative
analysis of neural network architectures has allowed
RBF networks to be compared with MLP and SMLP networks,
which are the most commonly used.
The MLP network provides acceptable predictions in
spite of the difficult environmental conditions of the
location (i.e. even though PM2.5 data were considered,
the samples still show sharp seasonal concentration peaks,
mainly due to the common presence of dust storms in this
area). The SMLP network shows very similar behavior,
although a few more neurons in the hidden layer are
necessary to obtain the same error as in the former
case. This has a direct influence on the sample size
necessary for proper training, as previously mentioned.
Finally, the RBF network shows the best behavior, with
the shortest training times and the best stability. These
results suggest that the widely used MLP should be
replaced by the more convenient RBF network.
Fig. 10. Histograms of the residuals for the different models assessed.
Table 3
Performance statistics for the models

Performance measure   Model               Topology   Validation set
RMSE                  Persistence         -          0.41
RMSE                  Linear regression   -          0.27
RMSE                  NNMLP               7-18-1     1.58 × 10^-3
RMSE                  NNSMLP              7-20-1     1.28 × 10^-3
RMSE                  NNRBF Var=0.25      7-21-1     1.32 × 10^-3
RMSE                  NNRBF Var=0.50      7-20-1     1.23 × 10^-3
RMSE                  NNRBF Var=0.75      7-16-1     1.23 × 10^-3
RMSE                  NNRBF Var=1.5       7-10-1     1.30 × 10^-3
RMSE                  NNRBF Var=3.5       7-10-1     1.27 × 10^-3
MAE                   Persistence         -          4.41
MAE                   Linear regression   -          0.02
MAE                   NNMLP               7-18-1     3.58 × 10^-3
MAE                   NNSMLP              7-20-1     3.08 × 10^-3
MAE                   NNRBF Var=0.25      7-21-1     3.21 × 10^-3
MAE                   NNRBF Var=0.50      7-20-1     2.96 × 10^-3
MAE                   NNRBF Var=0.75      7-16-1     2.93 × 10^-3
MAE                   NNRBF Var=1.5       7-10-1     2.95 × 10^-3
MAE                   NNRBF Var=3.5       7-10-1     2.91 × 10^-3
R2                    Persistence         -          0.0806
R2                    Linear regression   -          0.3983
R2                    NNMLP               7-18-1     0.3805
R2                    NNSMLP              7-20-1     0.3712
R2                    NNRBF Var=0.25      7-21-1     0.4611
R2                    NNRBF Var=0.50      7-20-1     0.4554
R2                    NNRBF Var=0.75      7-16-1     0.4451
R2                    NNRBF Var=1.5       7-10-1     0.4380
R2                    NNRBF Var=3.5       7-10-1     0.4143
Nevertheless, it should be pointed out that ANNs, in
general, have limitations inherent to their own
structure. The main handicap is the impossibility of
generalizing what has been trained for the 24-h range to
another context, the per-hour context for instance.
Location is also a key element: the networks must be
trained with data sets corresponding to the periods and
locations to be analysed.
Table 3 summarizes the results of the analyses.
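For orientation, the three performance measures and the persistence baseline can be sketched as below. The definitions are the usual textbook ones and are assumptions here: R2 is taken as the coefficient of determination (the paper may use the squared correlation instead), and any data scaling behind the values in Table 3 is not reproduced. The series is hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def persistence_forecast(series):
    """Persistence model: each day's forecast is the previous day's value."""
    return series[:-1]


# Hypothetical daily averages (µg/m3), for illustration only.
series = [10.0, 12.0, 11.0, 15.0, 14.0]
observed = series[1:]
predicted = persistence_forecast(series)

print(f"RMSE={rmse(observed, predicted):.4f}")
print(f"MAE={mae(observed, predicted):.4f}")
print(f"R2={r_squared(observed, predicted):.4f}")
```

Any model worth deploying should improve on the persistence baseline for all three measures on the same test period.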
Acknowledgements
We gratefully acknowledge the funding support
from the SUPERA program (Mexico) and the Spanish
Ministerio de Ciencia y Tecnología grant DPI2001-1408,
which made this work possible.
References
Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford
University Press, Oxford.
Castejón Limas, M., Ordieres Meré, J.B., De Cos Juez, F.J., Martı́nez
de Pisón, F.J., 2001. Control de Calidad, Metodologı́a para el
análisis previo a la modelización de datos en procesos industrials,
Universidad de La Rioja.
Chaloulakou, A., Saisana, M., Spyrellis, N., 2003. Comparative
assessment of neural networks and regression models for forecasting summertime ozone in Athens. The Science of the Total
Environment 313, 1e13.
Chow, J.C., Watson, J.G., 2001. Zones of representation for PM10
measurements along the US/Mexico border. The Science of the
Total Environment 276, 49e68.
Dixon, J.K., 1979. Pattern recognition with partly missing missing
data. IEEE Transactions on Systems Man and Cybernetics SMC-9
10, 617e621.
Flake, G.W., 1998. Square unit augmented, radially extended,
multilayer perceptrons. In: Orr, G., Müller, K.-R., Caruana, R.
(Eds.), Tricks of the Trade: How to Make Algorithms Really
Work, LNCS State-of-the-Art-Surveys. Springer-Verlag.
Fuller, G.W., Carslaw, D.C., Lodge, H.W., 2002. An empirical
approach for the prediction of daily mean PM10 concentrations.
Atmospheric Environment 36, 1431e1441.
Gardner, M.W., Dorling, S.R., 1998. Artificial neural networks (the
multi-layer perceptron)da review of applications in the atmospheric sciences. Atmospheric Environment 32, 2627e2636.
Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models.
Chapman & Hall, London.
Haykin, S., 1994. Neural Networks: A Comprehensive Foundation.
Prentice-Hall, Englewood Cliffs, NJ.
Ho, S.L., Xie, M., Goh, T.N., 2002. A comparative study of neural
network and Box-Jenkins ARIMA modeling in time series
prediction. Computers & Industrial Engineering 42, 371e375.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2,
359e366.
Hornik, K., 1993. Some new results on neural network approximation.
Neural Networks 6, 1069e1072.
Ihaka, R., Gentleman, R., 1996. R: a language for data analysis and
graphics. Journal of Computational and Graphical Statistics
299e314.
Jorquera, H., Palma, W., Tapia, J., 2000. An intervention analysis of
air quality data at Santiago, Chile. Atmospheric Environment 34,
4073e4084.
Kukkonen, J., Harkonen, J., Karppinen, A., Pohjola, M., Pietarila, H.,
Koskentalo, T., 2001. A semi-empirical model for urban PM10
concentrations, and its evaluation against data from an urban
measurement network. Atmospheric Environment 35, 4433e4442.
Kolehmainen, M., Martikainen, H., Ruuskanen, J., 2001. Neural
networks and periodic components used in air quality forecasting.
Atmospheric Environment 35, 815e825.
Lenschow, P., Abraham, H.-J., Kutzner, K., Lutz, M., Preuß, J.-D.,
Reichenbächer, W., 2001. Some ideas about the sources of PM10.
Atmospheric Environment 35 (Supplement No. 1), S23eS33.
Lu, H.-C., 2002. The statistical characters of PM10 concentration in
Taiwan area. Atmospheric Environment 36, 491e502.
Magliano, K.L., Hughes, V.M., Chinkin, L.R., Coe, D.L., Haste, T.L.,
Kumar, N., Lurmann, F.W., 1999. Spatial and temporal variations
in PM10 and PM2.5 source contributions and comparison to emissions during the 1995 integrated monitoring study. Atmospheric
Environment 33, 4757e4773.
Masters, T., 1993. Practical Neural Network Recipes in CCC.
Academic Press, San Diego.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second
ed. Chapman and Hall, London.
McDonnell, W.F., Nishino-Ishikawa, N., Peterson, F.F., Chen, L.H.,
Abbey, D.E., 2000. Relationship of mortality with the fine and
coarse fraction of long term ambient PM10 concentrations in
nonsmokers. Journal of Exposure Analysis and Environmental
Epidemiology 10, 427e436.
Mukerjee, S., 2001. Selected air quality trends and recent air pollution
investigations in the USeMexico border region. The Science of the
Total Environment 276, 1e18.
J.B. Ordieres et al. / Environmental Modelling & Software 20 (2005) 547e559
Mukerjee, S., Shadwick, D.S., Smith, L.A., Somerville, M.C., Dean,
K.E., Bowser, J.J., 2001. Techniques to assess cross-border air
pollution and application to a USeMexico border region. The
Science of the Total Environment 276, 205e224.
Ostro, B.D., Eskeland, G.S., Sánchez, J.M., Feyzioglu, T., 1999a. Air
pollution and health effects: a study of medical visits among
children in Santiago, Chile. Environmental Health Perspective 107,
69e73.
Ostro, B., Chesnut, L., Vichit-Vadakan, N., Laixuthai, A., 1999b. The
impact of particulate matter an daily mortality in Bangkok,
Thailand. Journal of Air and Waste Management Association 49,
100e107.
Pérez, P., Trier, A., Reyes, J., 2000. Prediction of PM2.5 concentrations several hours in advance using neural networks in
Santiago, Chile. Atmospheric Environment 34, 1189e1196.
Pérez, P., Reyes, J., 2001. Prediction of particulate air pollution using
neural techniques. Neural Computing and Applications 10,
165e171.
Pérez, P., Trier, A., 2001. Prediction of NO and NO2 concentrations
near a street with heavy traffic in Santiago, Chile. Atmospheric
Environment 35, 1783e1789.
Pérez, P., Reyes, J., 2002. Prediction of maximum of 24-h average of
PM10 concentrations 30 hours in advance in Santiago, Chile.
Atmospheric Environment 36, 4555e4561.
Podnar, D., Koračin, D., Panorska, A., 2002. Application of artificial
neural network to modeling the transport and dispersion of tracers
in complex terrain. Atmospheric Environment 36, 561e570.
Querol, X., Alastuey, A., Rodrı́guez, S., Plana, F., Mantilla, E., Ruiz,
C.R., 2001a. Monitoring of PM10 and PM2.5 around primary
particulate anthropogenic emission sources. Atmospheric Environment 35, 845e858.
Querol, X., Alastuey, A., Rodrı́guez, S., Plana, F., Ruiz, C.R., Cost,
N., Massagué, G., Puig, O., 2001b. PM10 and PM2.5 source
apportionment in the Barcelona metropolitan area, Catalonia,
Spain. Atmospheric Environment 35, 6407e6419.
Reich, S.L., Gómez, D.R., Dawidowski, L.E., 1999. Artificial neural
network for the identification of unknown air pollution sources.
Atmospheric Environment 33, 3045e3052.
Rodrı́guez, S., Querol, X., Alastuey, A., Kallos, G., Kakaliagou, O.,
2001. Saharan dust contributions to PM10 and TSP levels in
Southern and Eastern Spain. Atmospheric Environment 35,
2433e2447.
Rodrı́guez, S., Querol, X., Alastuey, A., Mantilla, E., 2002. Origin of
high summer PM10 and TSP concentrations at rural sites in
Eastern Spain. Atmospheric Environment 36, 3101e3212.
Tao, K.M., 1993. A Closer Look at the Radial Basis Function (RBF)
Networks. Conference Record of the 27th Asilomar Conference on
Signals, Systems and Computers, vol. 1. IEEE Comput. Soc. Press,
Los Alamitos, CA, pp. 401e405.
Tiittaa, P., Raunemaa, T., Tissari, J., Yl-Tuomi, T., Leskinen, A.,
Kukkonen, J., Harkonen, J., Karppinen, A., 2002. Measurements
and modelling of PM2.5 concentrations near a major road in
Kuopio, Finland. Atmospheric Environment 36, 4057e4068.
US-EPA, 1996. Air Quality Criteria for Particulate Matter.
EPA/600/P-95/001F. US Environment Protection Agency, Washington, DC.
US-EPA, 1998. USeMexico Border XXI Program. United StateseMexico Border Environmental Indicators EPA909-R-98-001. US
Environment Protection Agency, Washington, DC.
559
US-EPA, 2000a. Summary of Selected Environmental Indicators from
the U.S.eMexico Border XXI Program: Progress Report
1996e2000. EPA 909-R-00-002. US Environment Protection
Agency, Washington, DC.
US-EPA, 2000b. National Air Quality and Emissions Trends Report
1998. EPA 454-R-00-003. US Environment Protection Agency,
Washington, DC.
US-EPA, 2000c. Latest Findings on National Air Quality: 1999 Status
and Trends. EPA-454-F-00-002. US Environment Protection
Agency, Washington, DC, 00e002.
US-EPA, SEMARNAT, 2002. FRONTERA 2012: Programa Ambiental Mexico-Estados Unidos, US Environment Protection
Agency, Washintong, DC. Secretarı́a de Medio Ambiente y
Recursos Naturales de México.
Watson, J.G., Chow, J.C., 2001. Source characterization of major
emission sources in the Imperial and Mexicali Valleys along the
US/Mexico border. The Science of the Total Environment 276,
33e47.
Yang, K.-L., 2002. Spatial and seasonal variation of PM10 mass
concentrations in Taiwan. Atmospheric Environment 36,
3403e3411.
Further readings
Chen, L., Sandhu, H.S., Angle, R.P., McDonald, K.M., Myrick, R.H.,
2000. Rural particulate matter in Alberta, Canada. Atmospheric
Environment 34, 3365e3372.
Gauvin, S., Reungoat, P., Cassadou, S., Dechenaux, J., Momas, I.,
Just, J., Zmir, D., 2002. Contribution of indoor and outdoor
environments to PM2.5 personal exposure of children VESTA
study. The Science of the Total Environment 297, 175e181.
Hien, P.D., Binh, N.T., Truong, Y., Ngo, N.T., 1999. Temporal
variations of source impacts at the receptor, as derived from air
particulate monitoring data in Ho Chi Minh City, Vietnam.
Atmospheric Environment 33, 3133e3142.
Hien, P.D., Bac, V.T., Tham, H.C., Nhan, D.D., Vinh, L.D., 2002.
Influence of the meteorological conditions on PM2.5 and PM2.5-10
during the monsoon season in Hanoi, Vietnam. Atmospheric
Environment 36, 3473e3484.
Hopke, P.K., 1985. Receptor Modeling in Environmental Chemistry.
Wiley, New York, 319 pp.
McClellan, O.R., 2001. Setting ambient air quality standards for
particulate matter, University of New Mexico, 13701 Quaking
Aspen Place NE, Albuquerque, NM 87111, USA, Toxicology 00,
119 pp.
Morel, B., Yeh, S., Cifuentes, L., 1999. Statistical distributions for air
pollution applied to the study of the particulate problem in
Santiago. Atmospheric Environment 33, 2575e2585.
Salcedo, R.L.R., Alvim Ferraz, M.C.M., Alves, C.A., Martins, F.G.,
1999. Time-series analysis of air pollution data. Atmospheric
Environment 33, 2361e2372.
Vega, E., Reyes, E., Sanchez, G., Ortiz, E., Ruiz, M., Chow, J.,
Watson, J., Edgerton, S., 2002. Basic statistics of PM2.5 and PM10
in the atmosphere of Mexico City. The Science of the Total
Environment 287, 167e176.