Peyton-Manning Dataset Fbprophet Daily Australia Temperature Dataset Beijing PM2.5 Dataset
validation sets are chosen. This model is then retrained on the training data for the
corresponding BM-fold. The average error across all test sets provides a robust
estimate of the model performance with this forecast horizon.
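The averaging step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `seasonal_naive` is a hypothetical stand-in for a real forecasting model, `rolling_benchmark` is a hypothetical helper name, each test set is reduced to a single point, and the inner cross-validation used to select the model is omitted.

```python
from statistics import mean

def seasonal_naive(train, horizon, season=7):
    """Hypothetical stand-in model: repeat the value seen one season before the target.

    Valid for 1 <= horizon <= season.
    """
    return train[len(train) - season + (horizon - 1) % season]

def rolling_benchmark(series, horizon, periods_between_splits, n_splits):
    """Average absolute error of the horizon-step-ahead forecast over n_splits test points."""
    errors = []
    for k in range(n_splits):
        # Test point index, stepping backward from the end of the series.
        test_idx = len(series) - 1 - k * periods_between_splits
        # Training data ends exactly `horizon` steps before the test point.
        train = series[: test_idx - horizon + 1]
        forecast = seasonal_naive(train, horizon)
        errors.append(abs(series[test_idx] - forecast))
    return mean(errors)

# A perfectly weekly-periodic series is forecast without error by this stand-in model.
weekly = [i % 7 for i in range(500)]
print(rolling_benchmark(weekly, horizon=1, periods_between_splits=16, n_splits=25))
print(rolling_benchmark(weekly, horizon=7, periods_between_splits=16, n_splits=25))
```

Because every test point contributes one error, a single unusual period has limited influence on the reported average.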
For our benchmark, we ran the models on two different forecast horizons, 1-day
ahead and 7-day ahead. We chose consistent benchmark settings suitable for all
algorithms, including the slower ones, and used a single CV fold to speed up runtime
across algorithms.
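We chose datasets with at least two years of training data so that the models could
accurately estimate yearly seasonality patterns. The models were run on the following
datasets: Peyton-Manning Dataset (fbprophet), Daily Australia Temperature Dataset, and
Beijing PM2.5 Dataset.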
The number of periods between successive test sets and total number of splits are
chosen to ensure the following:
1. The predictive performance of the models is measured over a year to ensure that
the test sets provide a representative sample across time properties, e.g.,
seasonality, holidays.
2. The test sets are completely randomized in terms of time features. For daily data,
we avoid setting "periods between successive test sets" to any multiple of 7, because
that would result in the training and test sets always ending on the same day of the
week.
3. The total computation time is minimized while maintaining the previous points.
For daily data, setting "periods between successive test sets" to 1 and the number of
splits to 365 would be a more thorough CV procedure, but it would massively increase
the total computation time. Hence, we set periods between successive test sets to 16
and the number of splits to 25.
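The arithmetic behind these choices can be checked with a short Python sketch. The helper name `split_properties` is hypothetical and not part of Greykite. It reports how many days separate the first and last test sets, and how many distinct days of the week the test sets end on.

```python
def split_properties(periods_between_splits, n_splits, week=7):
    """Return (days between first and last test set, distinct weekdays covered)."""
    span_days = periods_between_splits * (n_splits - 1)
    # Offset of each test set's end date, reduced to a day-of-week index.
    weekdays_hit = {(k * periods_between_splits) % week for k in range(n_splits)}
    return span_days, len(weekdays_hit)

print(split_properties(16, 25))   # (384, 7): over a year, all weekdays covered
print(split_properties(14, 25))   # (336, 1): a multiple of 7 always hits the same weekday
print(split_properties(1, 365))   # (364, 7): thorough, but 365 model fits instead of 25
```

With 16 periods and 25 splits, the test sets span slightly more than a year and rotate through every day of the week, at roughly one fifteenth of the model fits required by the 365-split alternative.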
As shown in Table 1, Silverkite performs better out-of-the-box for 1-day and 7-day
forecast horizons. On average, Silverkite and Auto-Arima run 4 times faster than
Prophet. Note that the average test MAPE values are high because MAPE divides each
error by the actual value, and the Beijing PM2.5 dataset contains values close to 0.
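To see why near-zero actuals inflate MAPE, consider the small Python sketch below. The numbers are made up for illustration and are not taken from the Beijing PM2.5 dataset. Both cases have the same absolute error of 10 on every point.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100 * sum(ratios) / len(ratios)

# Illustrative values only: every forecast is off by exactly 10 in both cases.
typical = mape([100.0, 120.0, 80.0], [90.0, 130.0, 90.0])
near_zero = mape([100.0, 120.0, 0.5], [90.0, 130.0, 10.5])

print(f"typical actuals:   {typical:.1f}%")
print(f"one actual near 0: {near_zero:.1f}%")
```

A single actual value near 0 dominates the average, so a high MAPE on such data does not by itself indicate poor forecasts.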
Greykite: A flexible, intuitive, and fast forecasting library | LinkedIn Eng... https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive...
Conclusion
The Greykite library provides a fast, accurate, and highly customizable algorithm
(Silverkite) for forecasting. Greykite also provides intuitive tuning options and
diagnostics for model interpretation. It is extensible to multiple algorithms, and
facilitates benchmarking them through a single interface. Currently, Greykite also
supports Facebook Prophet (fbprophet), and we plan to add other useful open-source
algorithms in the future to give users more options through that same interface.
Acknowledgements
The Greykite library is developed by the Data Science Research and Productivity
team at LinkedIn. Special thanks to Rachit Kumar and Saad Eddin Al Orjany for their
contributions to this project, and to our close collaborators in Data Science,
Engineering, SRE, FP&A, and BizOps for adopting the library. In particular, Ashok
Sridhar, Mingyuan Zhong, and Jerry Shan provided valuable ideas and feedback. We
also thank our management team, Ya Xu and Sofus Macskássy, for their continued
encouragement and support.
Topics
data science, machine learning, Data, Open Source