Abstract
Many organizations require detailed and real time analysis of their business data for effective decision making. OLAP is one of the commonly used methods for the analysis of static data and has been studied by many researchers. OLAP is also applicable to data streams, however the requirement to produce real time analysis on fast and evolving data streams is not possible unless the data to be analysed reside on memory. Keeping in view the limited size and the volatile nature of the memory, we propose a novel architecture AOLAP which in addition to storing raw data streams to the secondary storage, maintains data stream’s summaries in a compact memory-based data structure. This work proposes the use of piece-wise linear approximation (PLA) for storing such data summaries corresponding to each materialized node in the OLAP cube. Since the PLA is a compact data structure, it can store the long data streams’ summaries in comparatively smaller space and can give approximate answers to OLAP queries.
OLAP analysts query different nodes in the OLAP cube interactively. To support such analysis by the PLA-based data cube without the unnecessary amplification of querying errors, inherent in the PLA structure, many nodes should be materialized. However, even though each PLA structure is compact, it is impossible to materialize all the nodes in the OLAP cube. Thus, we need to select the best set of materialized nodes which can give query results with the minimum approximation errors within the given memory bound. This problem is NP-hard. Hence this work also proposes an optimization scheme to support this selection. Detailed experimental evaluation is performed to prove the effectiveness of the use of PLA structure and the optimization scheme.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
TPC-H. http://www.tpc.org/tpch/.
References
Han, J., Chen, Y., et al.: Stream cube: an architecture for multi-dimensional analysis of data streams. Distrib. Parallel Databases 18(2), 173–197 (2005)
Duan, Q., Wang, P., Wu, M.X., Wang, W., Huang, S.: Approximate query on historical stream data. In: Hameurlain, A., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds.) DEXA 2011. LNCS, vol. 6861, pp. 128–135. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23091-2_12
Sadoghi, M., Bhattacherjee, S., Bhattacharjee, B., Canim, M.: L-store: a real-time OLTP and OLAP system. In: CoRR (2016)
Elmeleegy, H., Elmagarmid, A.K., Cecchet, E., Aref, W.G., Zwaenepoel, W.: Online piece-wise linear approximation of numerical streams with precision guarantees. Proc. VLDB Endow. 2(1), 145–156 (2009)
Xie, Q., Zhu, J., et al.: Efficient buffer management for piecewise linear representation of multiple data streams. In: ACM CIKM, pp. 2114–2118 (2012)
Luo, G., Yi, K., Cheng, S.W., Li, Z., Fan, W., He, C., Mu, Y.: Piecewise linear approximation of streaming time series data with max-error guarantees. In: 2015 IEEE 31st ICDE, pp. 173–184 (2015)
Wei, Z., Luo, G., Yi, K., Du, X., Wen, J.-R.: Persistent data sketching. In: ACM SIGMOD 2015, pp. 795–810 (2015)
Harinarayan, V., Rajaraman, A., Ullman, J.D.: Implementing data cubes efficiently. In: ACM SIGMOD, pp. 205–216 (1996)
O’Rourke, J.: An on-line algorithm for fitting straight lines between data ranges. Commun. ACM 24(9), 574–578 (1981)
O’Neil, P., O’Neil, E., Chen, X., Revilak, S.: The Star Schema Benchmark and Augmented Fact Table Indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009). doi:10.1007/978-3-642-10424-4_17
Garofalakis, M., Kumar, A.: Deterministic wavelet thresholding for maximum-error metrics. In: ACM PODS (2004)
Karras, P., Mamoulis, N.: One-pass wavelet synopses for maximum-error metrics. In: PVLDB, pp. 421–432 (2005)
De Rougemont, M., Cao, P.T.: Approximate answers to OLAP queries on streaming data warehouses. In: Proceedings of the Fifteenth International Workshop on Data Warehousing and OLAP, pp. 121–128 (2012)
Talebi, Z.A., Chirkova, R., Fathi, Y., Stallmann, M.: Exact and inexact methods for selecting views and indexes for OLAP performance improvement. In: EDBT (2008)
Zhang, R., Koudas, N., Ooi, B.C., Srivastava, D., Zhou, P.: Streaming multiple aggregations using phantoms. VLDB J. 19(4), 557–583 (2010)
Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Lupei, D., Shaikhha, A.: DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J. 23(2), 253–278 (2014)
Lazaridis, I., Mehrotra, S.: Capturing sensor-generated time series with quality guarantees. In: ICDE, pp. 429–440 (2003)
Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: ACM SIGMOD, pp. 563–574 (2003)
Gray, J., Chaudhuri, S., et al.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Discov. 1(1), 29–53 (1997)
Acknowledgment
This research was partly supported by the program “Research and Development on Real World Big Data Integration and Analysis” of the RIKEN, Japan.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Shaikh, S.A., Kitagawa, H. (2017). Approximate OLAP on Sustained Data Streams. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science(), vol 10178. Springer, Cham. https://doi.org/10.1007/978-3-319-55699-4_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-55699-4_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55698-7
Online ISBN: 978-3-319-55699-4
eBook Packages: Computer ScienceComputer Science (R0)