

Peyton-Manning Dataset Fbprophet Daily Australia Temperature Dataset Beijing PM2.5 Dataset


Greykite: A flexible, intuitive, and fast forecasting library | LinkedIn Eng... https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive...

validation sets are chosen. This model is then retrained on the training data for the
corresponding BM-fold. The average error across all test sets provides a robust
estimate of the model performance with this forecast horizon.

For our benchmark, we ran the models on two different forecast horizons, 1-day
ahead and 7-day ahead. We chose consistent benchmark settings suitable for all
algorithms, including the slower ones, and used a single CV fold to speed up runtime
across algorithms.

We chose datasets with at least two years of training data so that the models could
accurately estimate yearly seasonality patterns. The models were run on the following
datasets:

1. Peyton-Manning Dataset from the fbprophet package

2. Daily Australia Temperature Dataset, Temperature column

3. Beijing PM2.5 Dataset, pm2.5 concentration column

The number of periods between successive test sets and total number of splits are
chosen to ensure the following:

1. The predictive performance of the models is measured over a year to ensure that
the test sets provide a representative sample across time properties, e.g.,
seasonality, holidays.

2. The test sets are completely randomized in terms of time features. For daily data,
we avoid setting "periods between splits" to any multiple of 7, because that would
result in the training and test sets always ending on the same day of the week.

3. The total computation time is minimized while maintaining the previous points.
For daily data, setting "periods between successive test sets" to 1 and the number
of splits to 365 would be a more thorough CV procedure, but it massively increases
the total computation time. Hence, we set periods between successive test sets to
16 and the number of splits to 25.
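
To make the three criteria above concrete, here is a minimal sketch of how such test windows could be laid out. It is not Greykite's benchmarking code: the function name `rolling_test_windows`, the end date, and the choice to step backward from the last observation are all assumptions made for illustration.

```python
from datetime import date, timedelta


def rolling_test_windows(last_date, horizon, periods_between_splits, n_splits):
    """Return (test_start, test_end) date pairs for daily data, most recent first.

    Hypothetical helper: each test window is `horizon` days long, and successive
    windows are shifted back by `periods_between_splits` days.
    """
    windows = []
    for k in range(n_splits):
        test_end = last_date - timedelta(days=k * periods_between_splits)
        test_start = test_end - timedelta(days=horizon - 1)
        windows.append((test_start, test_end))
    return windows


# Settings from the text: 16 periods between test sets, 25 splits, 7-day horizon.
# The end date is an arbitrary placeholder.
windows = rolling_test_windows(date(2021, 1, 1), horizon=7,
                               periods_between_splits=16, n_splits=25)

# Days covered from the start of the oldest window to the end of the newest.
span_days = (windows[0][1] - windows[-1][0]).days + 1

# Distinct weekdays on which the test sets end.
weekdays = {end.weekday() for _, end in windows}

# Same layout with a multiple of 7 between splits, for comparison.
weekdays_14 = {end.weekday() for _, end in
               rolling_test_windows(date(2021, 1, 1), 7, 14, 25)}

print(span_days, len(weekdays), len(weekdays_14))
```

Under these assumptions the 25 windows span more than a year and end on all seven days of the week, whereas a spacing of 14 would make every test set end on the same weekday.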

We used the out-of-the-box configurations for Auto-Arima (pmdarima) and Facebook
Prophet (fbprophet). The Silverkite out-of-the-box configuration was also chosen prior
to running the benchmark. It uses ridge regression to fit the model and contains linear
growth, appropriate seasonalities (e.g., monthly, quarterly, and yearly seasonality for
daily data), automatic changepoint detection, holiday effects, autoregression, and
seasonality interaction terms with the trend and changepoints.

As shown in Table 1, Silverkite performs better out-of-the-box for 1-day and 7-day
forecast horizons. On average, Silverkite and Auto-Arima run 4 times faster than
Prophet. Note that the average test MAPE values are high due to values close to 0 in
the Beijing PM2.5 dataset.
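
The sensitivity of MAPE to near-zero actuals can be seen in a small sketch. The numbers below are made up for illustration and are not taken from the benchmark; `mape` is a plain implementation of the standard definition, not Greykite's evaluation code.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actuals, forecasts)) / len(actuals)


# Hypothetical values: every forecast is off by exactly 10 units.
typical = mape([100.0, 120.0, 80.0], [110.0, 110.0, 90.0])

# Same absolute errors, but one actual value is close to 0.
near_zero = mape([100.0, 120.0, 1.0], [110.0, 110.0, 11.0])

print(round(typical, 1), round(near_zero, 1))
```

Although the absolute errors are identical in both cases, the single small actual value dominates the average, which is why a dataset containing values close to 0 can inflate the reported MAPE.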


Table 1. Benchmark comparison of Silverkite against Auto-Arima and Prophet

This is an initial benchmark on a few public datasets. We hope to benchmark
additional datasets, forecast horizons, and data frequencies that better match our
industry applications. (Please reach out to us on GitHub if you have a public dataset to
recommend!) For example, unlike the weather datasets used above, LinkedIn’s
metrics tend to show strong changepoint and event/holiday effects with temporal
dependencies. Thus, the benefits of Silverkite are more apparent on our internal
datasets: for (i) revenue forecasts 1-day ahead and 7-day ahead and (ii) Weekly
Active User forecasts 2 weeks ahead, Silverkite decreased the MAPE by more than
50% and 30%, respectively.

Conclusion

The Greykite library provides a fast, accurate, and highly customizable algorithm
(Silverkite) for forecasting. Greykite also provides intuitive tuning options and
diagnostics for model interpretation. It is extensible to multiple algorithms, and
facilitates benchmarking them through a single interface. Currently, Greykite also
supports Facebook Prophet (fbprophet), and we plan to add other useful open-source
algorithms in the future to give users more options to choose from, through a unified
interface.

We have successfully applied Greykite at LinkedIn for multiple business and
infrastructure metrics use cases. If you are interested, please visit GitHub or PyPI to
try it out.

Acknowledgements

The Greykite library is developed by the Data Science Research and Productivity
team at LinkedIn. Special thanks to Rachit Kumar and Saad Eddin Al Orjany for their
contributions to this project, and to our close collaborators in Data Science,
Engineering, SRE, FP&A, and BizOps for adopting the library. In particular, Ashok
Sridhar, Mingyuan Zhong, and Jerry Shan provided valuable ideas and feedback. We
also thank our management team Ya Xu and Sofus Macskássy for their continued
encouragement and support.

Topics
data science, machine learning, Data, Open Source
