1. Introduction
The electrical power system is a dynamic system widely regarded as one of the most complex systems ever designed by humans. This system, consisting of generators, transmission and distribution lines, transformers, loads, and protective devices, must maintain its stability at all times, including under fault conditions [1]. Moreover, the system is equipped with smart technologies and keeps pace with developments in communication and intelligent power control while preserving its smart grid character [2]. One of the most important requirements in power system operation is maintaining the supply–demand balance so that the electrical load demand is met instantly and without interruption. This demand-dependent requirement calls for a sufficient energy production scheme at all times [3]. Therefore, prior knowledge of electrical energy consumption enables power utility companies to manage each distributed energy source correctly and to advance the future power market. Such knowledge is also extremely important for preventing sudden power system outages and the losses caused by supply–demand imbalances. In this regard, accurate load estimation is of great interest, especially in power system management [4]. The accuracy and reliability of the estimates directly affect power system operators in the deregulated power market environment, yielding substantial savings in operating and maintenance costs. Indirectly, end-users also benefit through lower electricity consumption costs. Furthermore, most operationally decisive processes, such as operational energy planning, dynamic electricity pricing in deregulated markets, generation and maintenance scheduling, energy transactions, optimal unit commitment, and economic power dispatch, require accurate load forecasting [5].
Electrical load forecasting (ELF) is studied over three different time horizons, namely short-term, medium-term, and long-term, depending on the power system application area. Long-term forecasts cover periods ranging from months to years, medium-term forecasts from one week to several months, and short-term forecasts from an hour to a week. Power system planning generally requires long-term forecasts, maintenance and fuel supply planning require medium-term forecasts, and hourly and daily power system operation, management, and planning require short-term forecasts. ELF can generally be performed at two levels: the aggregated level and the building (individual) level. At the building level, load estimation is a priority in energy management systems. Aggregated-level forecasting estimates the total load demand of a group of users at particular levels such as the system level [6,7], regional level [8], and feeder level [9]. As data-driven methods, machine learning (ML)-based methods rely on historical load data aggregated over previous periods. These methods attract great interest from researchers, but their accuracy can decrease when they are applied to new electrical load data [10].
Hour-ahead electrical demand forecasts are used by energy providers to capture fluctuations and power outages that cannot be anticipated by day-ahead and week-ahead demand prediction. Although operationally decisive processes such as optimal unit commitment, economic power dispatch, and maintenance scheduling need hour-ahead demand prediction, energy sector stakeholders have benefited less from ML techniques for hour-ahead prediction than from the commissioning of month-ahead and day-ahead prediction models [11]. The topic has not been studied extensively in the literature because, until now, it has rarely been used in practice; related articles mostly focus on day-ahead load forecasting or next-day peak load forecasting.
Methods for hour-ahead ELF, as a form of short-term load forecasting (STLF), can be classified into three categories: traditional, artificial intelligence (AI)-based, and hybrid. Traditional time-series approaches may not achieve sufficient accuracy because they consider only the electrical load, whereas multiple linear regression (MLR) models have been widely used in STLF studies in recent decades owing to their satisfactory accuracy [12]. Nevertheless, traditional methods perform poorly in terms of accuracy, performance, and speed compared with AI-based methods, especially the ML methods used in power system applications. The most popular ML-based methods in STLF are artificial neural networks (ANNs) [13,14,15], support vector machines (SVMs) [16], regression trees (RTs) [17], MLR [12,18], k-nearest neighbor (KNN) algorithms [19], recurrent neural networks (RNNs) [20,21], convolutional neural networks (CNNs) [22,23], various ML ensemble models [4,24,25], and hybrid models that overcome the shortcomings of conventional and AI forecasting models [26,27,28,29].
On the other hand, short-term load sequences are highly fluctuating and stochastic, which makes it difficult to obtain accurate STLF results. The factors shaping this behavior are varied and complex, including climatic and geographical conditions, energy prices, and holidays. Owing to this complexity, the original load sequence contains a large amount of non-periodic information. If raw data are fed directly into a deep learning model, the complexity of the hidden-layer neuron coefficients rises and the fitting performance decreases [30]. Feature selection therefore plays an important role in reducing this information complexity. Advanced ML methods such as mutual information (MI), ReliefF (RF), and correlation-based feature selection (CFS) can be used to grade candidate features and assign them weights [31].
The remainder of this article is organized as follows: the literature review and problem statement are presented in the rest of Section 1; Section 2 describes the datasets in detail, gives information on the methodologies used in the experimental phase, and outlines the development of the proposed methods; Section 3 presents the experimental results and a benchmarking study with the support of related tables; finally, Section 4 summarizes the findings of the suggested model, emphasizes its originality, and offers recommendations for further research.
1.1. Literature Review
There is a variety of research on STLF, but most of it addresses day-ahead, week-ahead, and longer horizons rather than the hour-ahead horizon. This section reviews selected literature on hour-ahead load forecasting.
Chitalia et al. applied deep learning (DL) algorithms to hour-ahead and day-ahead building-level ELF and reported a superior performance, with an improvement of 20–45% over the latest results available in the literature. Moreover, a 30% improvement was observed in the hour-ahead results when 15 min resolution data were available [21].
A multi-scaled RNN-based DL technique was proposed for one-hour-ahead ELF by Bui et al. In this approach, observations were grouped into two scales, a short scale (6 h duration) and a long scale (1 week duration), and fed to the RNNs [32].
Rafati et al. proposed LTF-RF-MLP, a new hybrid method, and applied it to New England ISO data for one-hour-ahead forecasting. The method extracts innovative features from load data using load tracing features, then uses RF to select the most suitable features and a multilayer perceptron (MLP) as the forecaster. Unlike other approaches, it uses only load data, yet it is more accurate than the benchmark models by 0.42% in terms of the yearly mean absolute percentage error (MAPE) [27].
Panapakidis et al. developed day-ahead and hour-ahead forecasting models for several types of buses (urban, suburban, and industrial) based on an ANN. In addition to the ANN model, a time-series clustering model was used to improve the accuracy of the ANN. According to the overall test results, hour-ahead forecasting was more accurate than day-ahead forecasting owing to the operation of the ANN itself. Taking the plain ANN model as the reference, a 5.23% average improvement in the MAPE value was achieved with the developed model [14].
Dagdougui et al. carried out day-ahead and hour-ahead ELF on real-world data from a campus in downtown Montreal. An ANN model was used to analyze performance with respect to several aspects, such as the back-propagation algorithm, the hidden layers, and the number of inputs. Increasing the number of neuron layers contributed significantly to the learning process while making no significant difference in MAPE. Bayesian regularized learning and hour-ahead prediction displayed higher accuracy and performance than the Levenberg–Marquardt learning algorithm and day-ahead prediction, respectively [15].
Kaur et al. proposed an ensemble re-forecast method for predicting one-hour-ahead and day-ahead electrical loads on a state-wide load dataset. With the proposed method, the hour-ahead and day-ahead forecasts showed an improvement in MAPE of up to 47%, whereas the re-forecasts showed a considerable improvement during off-peak hours and a smaller improvement during peak hours [24].
Du et al. carried out one-hour-ahead forecasting with a new data-driven approach in the context of the energy Internet. A spatiotemporal correlation-based multidimensional input automatically adds miscellaneous information on load variations, followed by data preprocessing based on low-rank decomposition and load gradients, which clarifies the features so that more targeted features can be learned. Finally, the developed 3D CNN-GRU model was implemented and achieved good accuracy, with a mean absolute error (MAE) of 2.14% [28].
Laouafi et al. developed three electrical-demand-forecasting models by applying the adaptive neuro-fuzzy inference system approach to parallel load series. The three models were used to forecast the one-hour-ahead load from the real-time quarter-hourly metropolitan France electricity load. The best of the models achieved an absolute percentage error of nearly 0.5 for 56% of the forecasted loads. The authors concluded that a highly accurate model can be developed with less data through a combined neural network and fuzzy logic model [33].
Fallah et al. carried out a study of the computational intelligence techniques used for load forecasting. They concluded that the performance of these methods depends first of all on the datasets, and that hybrid methods outperform single methods in terms of accuracy [34].
1.2. Problem Statement
This paper provides research results on hour-ahead ELF, a subfield of STLF, and reviews other state-of-the-art papers in the field. The estimation performance of ELF methods depends first of all on the datasets used in the estimation procedure [34]. Hence, datasets with different structures from two countries, the Czech Republic and Australia, were used in the present study. The hour-ahead electrical consumption is forecasted using fine-tuned time-series prediction models that make use of supervised and ensemble learning methods. Hour-ahead MLR, CNN, and RNN forecasters were built for the first model, and hour-ahead support vector regression (SVR), KNN, and tree forecasters were built for the second model, with the goal of shortening the training period. Additionally, feature selection was carried out independently for each analyzed method on the two datasets in order to observe the impact of each feature on each dataset. For feature selection, a backward-eliminated exhaustive method based on the ANN performance on the validation set is proposed. The proposed ensemble models and the state-of-the-art techniques were then evaluated on z-score-normalized data [35] using two accuracy metrics: MAPE [36] and MAE [37]. These metrics were preferred because they are commonly used in similar studies, which facilitates the comparison of results. Experiments were conducted in MATLAB R2021a, which provides rich support for ML.
2. Methodology
Selecting a proper forecaster structure is the first important step when designing a forecasting system. Several forecaster architectures are presented here for this study on one-hour-ahead load forecasting. These forecasters used the last six days of data as input. The number of previous days (i.e., six) used to form the input sequence was coarsely determined by evaluating the performance of a few candidate values on the validation set. The sequence length s was thus 144 for the hourly resolution data (Czech data) and 288 for the 30 min resolution data (Australian data). The adjustable parameters of the forecasters were set on a separate subset of the data, according to the cross-validation protocol described in Section 3.1.
2.1. Input Datasets
The Czech data form a one-hour resolution dataset that includes year, day, month, hour, wind power plant (WPP) energy generation, photovoltaic power plant (PVPP) energy generation, load consumption including pumping load, temperature, wind speed (10 m), direct horizontal radiation, and diffuse horizontal radiation features from 1 January 2012 to 31 December 2016 in the Czech Republic. All of the information in the dataset was considered for predicting the next hour's electrical energy demand value. Thus, 11 separate features were used as input.
Ausdata is a 30 min resolution (48 measurements per day) dataset with data on day, month, year, hour, dry bulb temperature, dew point, electricity consumption price, electrical energy consumption, wet bulb temperature, and humidity from 1 January 2006 to 31 December 2010 in Australia. Electrical load consumption and price data were provided by the Australian Energy Market Operator, and meteorological data were obtained from the Bureau of Meteorology for Sydney and New South Wales. Here too, the next hour's electrical energy demand value was predicted by taking all of the information in the dataset into consideration. In this case, nine separate features were used as input.
For data preprocessing, the month, day, and hour values were divided by their corresponding maximum values, and one was added to the result. The other values were preprocessed by subtracting the mean and dividing by the standard deviation, yielding zero mean and unit variance.
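As an illustration of this preprocessing step, a small pandas sketch follows; the column names ("month", "day", "hour") are placeholders, and the original pipeline was implemented in MATLAB.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Scale calendar fields by their maxima plus one; z-score the rest."""
    out = df.copy()
    calendar = ["month", "day", "hour"]              # assumed column names
    for col in calendar:
        out[col] = df[col] / df[col].max() + 1.0     # divide by max, then add 1
    others = [c for c in df.columns if c not in calendar]
    # zero mean, unit variance for the remaining features
    out[others] = (df[others] - df[others].mean()) / df[others].std()
    return out
```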
2.2. Backward-Eliminated Exhaustive Feature Selection Approach
A backward-eliminated exhaustive approach based on the ANN performance on the validation set was proposed for feature selection. The algorithm was initialized with the complete set of features. In the first stage, a reasonable number of best features (i.e., six) was selected using backward feature elimination. Backward elimination works by leaving out each remaining feature one at a time, finding the best-performing subset with one feature fewer, and thereby discarding the worst-performing feature. This process was repeated until six features remained. In the second stage, the best feature subset was determined using exhaustive feature selection, which requires only $2^6 - 1 = 63$ iterations (every subset except the empty one). The performance of candidate feature subsets was assessed using the MAE metric. The proposed algorithm is summarized in Algorithm 1.
Algorithm 1: Feature selection
Initialize the feature set R as the complete set of features.
while |R| > 6 do
  for each feature r in R, evaluate MAE(model, X) on the validation set with the feature subset R \ {r}
  remove from R the feature whose removal yields the lowest MAE
end while
for every nonempty subset S of R (63 candidates), evaluate MAE(model, X) on the validation set with the feature subset S
Output the subset r* with the lowest MAE.
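A compact Python sketch of Algorithm 1 is given below; `val_mae` stands for an assumed routine that trains the reference ANN on a candidate feature subset and returns its MAE on the validation set, which is not spelled out here.

```python
from itertools import combinations

def backward_then_exhaustive(features, val_mae, target_size=6):
    """Backward elimination down to `target_size` features, then exhaustive
    search over the 2**target_size - 1 nonempty subsets (63 when target_size=6).
    `val_mae(subset)` is assumed to return the validation MAE of the reference
    ANN trained on that feature subset."""
    selected = list(features)
    # Stage 1: repeatedly drop the feature whose removal hurts least.
    while len(selected) > target_size:
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = min(candidates, key=val_mae)
    # Stage 2: exhaustively score every nonempty subset of the survivors.
    best, best_err = None, float("inf")
    for r in range(1, len(selected) + 1):
        for subset in combinations(selected, r):
            err = val_mae(list(subset))
            if err < best_err:
                best, best_err = list(subset), err
    return best, best_err
```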
2.3. Feedforward ANN
Neural networks are appropriate structures for developing forecasting methods owing to their nonlinear architecture and their capability to learn from data, and they were among the earliest approaches applied to load forecasting [38].
This work utilized two different ANNs. The first ANN had one hidden layer of 12 hidden units; the second ANN had five hidden layers. A hyperbolic tangent sigmoid transfer function was used in the hidden layers, and a pure linear transfer function was used in the output layers. Figure 1 shows the two ANNs. The trained ANN forecaster is denoted $F_{\mathrm{ANN}_c}(x^{\mathrm{test}})$ for a test sample $x^{\mathrm{test}}$, where $c \in \{1, 2\}$ indexes the two networks.
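The two networks could be sketched as follows; the per-layer widths of the five-hidden-layer ANN are not specified in the text, so the values used below are placeholders, and the sketch is in Python although the study itself used MATLAB.

```python
from sklearn.neural_network import MLPRegressor

# ANN_1: one hidden layer of 12 units; ANN_2: five hidden layers (widths assumed).
# Hidden activation is tanh; MLPRegressor's output is linear, matching the
# pure-linear output layer described above.
ann1 = MLPRegressor(hidden_layer_sizes=(12,), activation="tanh", max_iter=2000)
ann2 = MLPRegressor(hidden_layer_sizes=(32, 32, 32, 32, 32),
                    activation="tanh", max_iter=2000)

# ann1.fit(X_train, y_train); y_hat = ann1.predict(X_test)
```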
2.4. CNN
A CNN is a feed-forward neural network whose neurons are created by imitating human neurons. As a deep learning method, CNNs have been widely applied in areas such as visual and audio processing and natural language processing, and they are employed to process the power system data topology within the scope of ELF [39].
In the present study, CNNs were utilized to learn continuous load values for the forecasting task. Two independent CNNs ($\mathrm{CNN}_1$ and $\mathrm{CNN}_2$) and an ensemble of these two forecasters were used. The CNN input size was $f \times s$ (the number of features times the sequence length). Figure 2 shows the proposed CNN architectures. The trained CNN forecaster is denoted $F_{\mathrm{CNN}_c}(x^{\mathrm{test}})$ for a test sample $x^{\mathrm{test}}$, where $c \in \{1, 2, 3, 4\}$.
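Since the exact layer configuration is given only in Figure 2, the following Keras sketch of a 1D-CNN regressor over an $s \times f$ input window is purely illustrative; the filter counts and kernel sizes are assumptions, not the architectures used in the study.

```python
import tensorflow as tf

def make_cnn(seq_len, n_features, filters=16, kernel=3):
    """Small 1D-CNN regressor over a (seq_len, n_features) input window.
    Filter counts and kernel sizes are placeholders, not the paper's values."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(filters, kernel, activation="relu",
                               input_shape=(seq_len, n_features)),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(filters * 2, kernel, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1),  # single next-hour load value
    ])

# model = make_cnn(seq_len=144, n_features=11)
# model.compile(optimizer="adam", loss="mae")
```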
2.5. MLR
The goal of regression analysis is to specify a function that defines the correlation among the variables in such a way that the value of the dependent variable can be forecasted from a set of independent variable values [12]. MLR describes a linear relationship between a dependent variable $y$ and a matrix $X$ of observations of one or more independent variables (features):

$$y_i = \beta_0 + \sum_{k=1}^{s f} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n,$$

where $y_i$ is the $i$th response, $\beta_k$ is the $k$th coefficient, $x_{ik}$ is the $i$th observation on the $k$th element of the input sequence, $s$ is the sequence length, $f$ is the number of features, $n$ is the number of observations, and $\varepsilon_i$ denotes the $i$th noise term. The fitted linear function can be expressed as

$$\hat{y}_i = \hat{\beta}_0 + \sum_{k=1}^{s f} \hat{\beta}_k x_{ik},$$

where $\hat{y}_i$ is the estimated response and the $\hat{\beta}_k$ are the coefficients, which are learned by minimizing the mean squared difference between $y$ and $\hat{y}$. The multiple linear forecaster function $F_{\mathrm{MLR}}$ for a test sample $x^{\mathrm{test}}$ of length $s f$ is thus given as

$$F_{\mathrm{MLR}}(x^{\mathrm{test}}) = \hat{\beta}_0 + \sum_{k=1}^{s f} \hat{\beta}_k x^{\mathrm{test}}_k.$$
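A minimal numpy sketch of fitting and applying such a linear forecaster by ordinary least squares (equivalent to minimizing the mean squared difference above) is shown here for illustration.

```python
import numpy as np

def fit_mlr(X_train, y_train):
    """Ordinary least squares: returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(X_train)), X_train])  # prepend bias column
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return beta

def predict_mlr(beta, X):
    """Apply the fitted linear forecaster to new windows X of shape (n, s*f)."""
    return beta[0] + X @ beta[1:]
```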
2.6. SVM
SVMs were introduced by Vapnik in 1995 for machine learning problems such as regression, pattern recognition, and density estimation. As a learning mechanism, an SVM operates in a high-dimensional feature space and yields functions defined in terms of a subset of support vectors, so it can model complex structures with only a few support vectors. SVMs comprise support vector classification (SVC), which handles classification, and support vector regression (SVR), which handles regression [40].
In this study, the cost parameter was chosen as 0.08 and the second hyper-parameter as 0.5, without a detailed grid search for fine-tuning. The trained SVR forecaster is denoted $F_{\mathrm{SVR}}(x^{\mathrm{test}})$ for a test sample $x^{\mathrm{test}}$. The hyper-parameters were coarsely optimized on the validation set by evaluating a few candidate values rather than fine-tuning them; the aim was to avoid overfitting to the validation set and to remain general and adaptive to the best feature subset found by the proposed feature selection algorithm.
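A hedged sketch of such an SVR forecaster follows; mapping the reported 0.08 to the cost parameter C follows the text, whereas treating the second reported value (0.5) as the epsilon-tube width is an assumption, since that symbol is not rendered above.

```python
from sklearn.svm import SVR

# C corresponds to the reported cost of 0.08; epsilon=0.5 is an assumption
# about which hyper-parameter the second value refers to.
svr = SVR(C=0.08, epsilon=0.5)
# svr.fit(X_train, y_train); y_hat = svr.predict(X_test)
```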
2.7. RT
Through its splitting conditions, a decision tree clarifies the relationship between input and output variables and provides a visual representation of the problem structure as a binary tree; that is, a decision tree can uncover the rules embedded in the input and output data, which is why the method is counted among data mining techniques. Decision trees fall into two groups, classification trees and regression trees: classification trees handle a discrete output, whereas regression trees deal with a continuous output variable [41].
In the current study, the hyper-parameters of the tree (such as the minimum leaf size, the maximum number of splits, and the number of variables to sample) were automatically optimized for each training set by Bayesian optimization with an inner cross-validation inside the current training set, where the current training set itself came from the outer cross-validation explained in Section 3.1. The trained tree forecaster is denoted $F_{\mathrm{RT}}(x^{\mathrm{test}})$ for a test sample $x^{\mathrm{test}}$.
2.8. KNN
In the KNN method, the k nearest neighbors of an unknown pattern are found, and the pattern is classified by considering the labels of these neighbors. For regression problems, KNN is used by taking the average of the ground-truth values of the k nearest neighbors [7].
In the present study, a few tests on the validation set (the first partition of the Czech data; see Section 3.1) showed that the simplest KNN architecture, $k = 1$ with the Euclidean distance as the metric, performs well; hence, this simple architecture was preferred for the KNN tests. The trained KNN forecaster is denoted $F_{\mathrm{KNN}}(x^{\mathrm{test}})$ for a test sample $x^{\mathrm{test}}$.
2.9. RNN
An RNN is a type of ANN that exploits sequential information through directed connections between the units in each layer. RNNs are called 'recurrent' because they perform the same task for every element of the sequence [42]. They are able to identify patterns in time-series data: these models process an input sequence one element at a time and keep hidden units in a state vector that carries knowledge about all of the previous elements of the series. Nevertheless, RNNs suffer from the vanishing gradient problem and hence cannot maintain long-range dependences [43].
Here, the fully connected layer produced a single output from the input sequence of length s (e.g., 144 for the Czech data), as only one feature (i.e., the load) was forecasted. Figure 3 shows the proposed RNN. The trained RNN forecaster is denoted $F_{\mathrm{RNN}}(x^{\mathrm{test}})$ for a test sample $x^{\mathrm{test}}$.
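A minimal Keras sketch of such a recurrent forecaster is given below; the cell type and hidden width are assumptions, as the actual layout is shown only in Figure 3 and the study used MATLAB.

```python
import tensorflow as tf

def make_rnn(seq_len, n_features, units=32):
    """Recurrent layer over the (seq_len, n_features) window followed by a
    fully connected layer with a single output (the next-hour load).
    Cell type and width are placeholders, not the paper's configuration."""
    return tf.keras.Sequential([
        tf.keras.layers.SimpleRNN(units, input_shape=(seq_len, n_features)),
        tf.keras.layers.Dense(1),
    ])
```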
2.10. Proposed Forecaster Models
Depending on the characteristics of the forecasters, two different models were utilized. The input was represented either as an image of size $s \times f$ (sequence length × number of features, for the CNN and RNN) or as a one-dimensional vector (for the other forecasters). The first model ($M_1$) forecasted the next hour's single value. The forecasters were trained with all of the training data, using the past six days' feature values as input to predict just the next hour, including all in-between hours (e.g., using the previous 144 hourly measurements, forecast the value at 14:00). One forecaster instance of each kind suffices to generate the $M_1$ forecast. Hour-ahead ANN, MLR, CNN, and RNN forecasters were built following this approach. $M_1$ is shown in Figure 4.
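The training pairs for this first model can be built as a sliding window over the preprocessed series, as in the sketch below; placing the load in column 0 is an assumption made only for illustration.

```python
import numpy as np

def make_windows(data, seq_len):
    """Each input is the previous `seq_len` time steps of all features
    (flattened for the vector-input forecasters); the target is the load at
    the next time step. `data` is an (n_steps, n_features) array with the
    load assumed to be in column 0."""
    X, y = [], []
    for t in range(seq_len, len(data)):
        X.append(data[t - seq_len:t].ravel())  # past s steps, all features
        y.append(data[t, 0])                   # next-hour load
    return np.asarray(X), np.asarray(y)

# Czech data: s = 144 (six days at hourly resolution)
# X, y = make_windows(series, seq_len=144)
```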
Similar to $M_1$, the second model ($M_2$) forecasted the next hour's single value, and all of the training data were utilized during training. Different from $M_1$, a specialized forecaster was trained for each hour to be forecasted; in total, there were $H$ different forecasters making up $M_2$. Hour-ahead SVR, KNN, and tree forecasters were built following this approach. The main motivation was to reduce the training time dramatically while preserving performance. $M_2$ is illustrated in Figure 5.
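A sketch of this per-hour scheme follows; the SVR used as the default forecaster is only a placeholder (KNN and tree forecasters follow the same pattern), and H would be 24 for hourly targets or 48 for the 30 min Australian data.

```python
import numpy as np
from sklearn.svm import SVR

def train_model2(X_train, y_train, target_hours, H=24,
                 make_forecaster=lambda: SVR(C=0.08, epsilon=0.5)):
    """Train one specialized forecaster per target hour.
    target_hours[i] is the hour of day of the i-th training target."""
    forecasters = {}
    for h in range(H):
        mask = np.asarray(target_hours) == h
        model = make_forecaster()
        model.fit(X_train[mask], y_train[mask])  # samples whose target is hour h
        forecasters[h] = model
    return forecasters

def predict_model2(forecasters, x_window, hour):
    """Route a test window to the forecaster specialized for its target hour."""
    return forecasters[hour].predict(x_window.reshape(1, -1))[0]
```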
Hourly Ensemble
The best ensemble for each hour was determined independently by finding a weight set for each hour $h$ and applying an hourly weighted sum of the forecasts produced by the different forecasters:

$$\hat{y}_h = \sum_{k=1}^{N} w_{h,k} \, \hat{y}_{h,k},$$

where $\hat{y}_{h,k}$ is the output of forecaster model $k$ (e.g., CNN, RNN, ...) for hour $h$, $w_{h,k}$ is the corresponding weight (with $w_{h,k} \ge 0$ and $\sum_{k=1}^{N} w_{h,k} = 1$), and $N$ is the number of different forecasters. The weights were determined using the validation set. A forecaster ensemble can also help to improve single forecasters of the same kind with different hyper-parameters (e.g., ANNs, RNNs, CNNs) by averaging their forecasts.
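One way to obtain such weights on the validation set is a coarse grid search over convex combinations, as in this sketch; the grid step and the use of MAE as the selection criterion are assumptions, since the text does not state how the weights were searched.

```python
import numpy as np
from itertools import product

def best_hourly_weights(preds, y_true, step=0.1):
    """Grid-search convex weights for one hour.
    preds: (N, n_val) array of the N forecasters' validation outputs for that
    hour; y_true: the corresponding validation targets."""
    N = preds.shape[0]
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best_w, best_mae = None, np.inf
    for w in product(grid, repeat=N):
        if abs(sum(w) - 1.0) > 1e-9:     # keep only weights that sum to one
            continue
        err = np.mean(np.abs(np.asarray(w) @ preds - y_true))
        if err < best_mae:
            best_w, best_mae = np.asarray(w), err
    return best_w, best_mae
```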
3. Experimental Results and Benchmarking
3.1. Cross-Validation
The days were split into five equally sized folds for cross-validation. In each test, one fold was reserved for testing while the remaining four were used for training. The first fold served as the development set for ensemble weight optimization, selective ensemble hours, model hyper-parameter optimization, and feature selection; in the remaining tests it was included in the training data, and each of the other folds took a turn as the test set. The reported results are therefore averages and standard deviations over four cross-validation tests.
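The protocol can be sketched as follows; a chronological split of the days into folds is assumed here, as the text does not specify how the days were partitioned.

```python
import numpy as np

def five_fold_day_splits(n_days):
    """Split the days into five equal folds: fold 0 is the development
    (validation) fold, and each of the remaining four folds serves once as
    the test set while all other folds (including fold 0) form the training
    set."""
    folds = np.array_split(np.arange(n_days), 5)
    dev_fold = folds[0]
    splits = [(np.concatenate([f for j, f in enumerate(folds) if j != i]), folds[i])
              for i in range(1, 5)]       # four cross-validation tests
    return dev_fold, splits
```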
3.2. Feature Selection
The initially selected features for the Czech dataset were day, hour, PVPP generation, load, wind speed (10 m), and diffuse horizontal radiation. The best subset of these features was exhaustively found to be day, hour, load, and diffuse horizontal radiation. The initially selected features for Ausdata were day, month, hour, dew point, load, and humidity. The best subset of these features was exhaustively identified as hour and load.
3.3. Proposed Models’ Results with Full Set of Features and Selected Features
The results with the full set of features and selected features are presented in
Table 1 and
Table 2 for Czech data and Ausdata, respectively.
Examining the hour-ahead load forecasting results obtained with the Czech Republic dataset, the highest MAPE and MAE values were obtained with the KNN method, both when all features were used and when feature selection was performed, whereas the lowest values of these metrics were obtained with the suggested hourly ensemble method. The MAPE and MAE values closest to the suggested method were obtained with the SVR method when all features were used, and with the ensemble method in (13) when feature selection was performed.
The results obtained with the Australian dataset are slightly better than those obtained with the Czech dataset, primarily, it is thought, because this dataset has a 30 min resolution (48 measurements per day). Here, too, the highest MAPE and MAE values were obtained with the KNN method, both when all features were used and when feature selection was performed, whereas the lowest values of these metrics were obtained with the suggested hourly ensemble method. With all features, the MAPE and MAE values closest to the suggested method were obtained with the ensemble method in (11); with feature selection, the closest values were again obtained with the ensemble method in (13).
3.4. Benchmarking
As can be seen from Table 3, a comparison of the present study with related studies in the literature reveals that the lowest MAPE values were obtained in our study. Although studies performed with data at other levels were also included in the comparison table, the results differ somewhat because of the internal dynamics of the data at each level. According to the table, the MAPE results span a wide range, since the dynamics of energy consumption vary significantly within a building in studies using building-level loads [15,21]. The reason is that consumer behavior, which we refer to as the internal dynamics of the data, differs at each data level: whereas data at the aggregated level follow a smoother load consumption pattern, data at the building level show higher variability and larger fluctuations due to the residential consumer load profile [44]. Among the other studies, the bus-load-level data of ref. [14] are the closest to the aggregated-level data used in this study in terms of data structure; however, the MAPE value there is 5.23%, which is considerably higher than that of the present study. Although the lowest MAPE value of 1.47% was reached in [24] using data with characteristics comparable to ours, it is still considerably higher than the value obtained in the present study.
4. Conclusions and Future Work
This article presents an approach that utilizes ensemble and supervised learning techniques for hour-ahead load forecasting, aiming to improve dynamic-data-driven smart power system applications. The proposed ensemble approach was thoroughly tested and evaluated by comparing it with state-of-the-art STLF methods.
For hour-ahead load forecasting with the Czech data, SVR performs best, followed by MLR and the ANNs. The deep models perform worse, indicating that simpler models become appropriate as the problem becomes simpler; among the DL methods, however, the CNN performs better than the RNN. This effect is even more pronounced for Ausdata, where MLR performs best, followed by the ANNs. Nevertheless, the proposed ensemble method still provides the best results for both datasets. The errors of the mean and random-guess baselines are more than 20-fold those of the hourly ensemble, showing the significance of the proposed forecasters and ensembles.
In addition, feature selection was performed for each dataset. Although many feature selection methods have been used in previous ELF studies, the proposed backward-eliminated exhaustive approach has not been applied to ELF before. The procedure was observed to improve the hour-ahead forecasting results noticeably, by almost 1%. The datasets used in this study differ not only in time step but also in their features, and feature selection revealed that the influence of individual features on the estimation results is different for each dataset. Therefore, tailoring feature selection to each dataset in ELF studies will produce more effective results.
A more comprehensive comparison of the proposed ensemble approach with other state-of-the-art methods can be made in future studies. Moreover, it is planned to implement or adapt the proposed method for more diverse energy forecasting problems, such as solar energy forecasting, electricity price forecasting, and other load forecasting horizons such as very short term and day ahead.