Abstract
As web documents proliferate rapidly, the need for real-time computation of the changes (edit script) between web documents increases. Although fast heuristic algorithms have been proposed recently, the quality of the edit scripts they produce is not satisfactory. In this paper, we propose X-tree Diff+, which produces higher-quality edit scripts by introducing a tuning step based on the notion of consistency of matching. We also add a copy operation for user convenience. Tuning and the copy operation increase the matching ratio drastically. X-tree Diff+ produces better edit scripts while running with a time complexity equivalent to that of the fastest heuristic algorithms.
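To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the X-tree Diff+ algorithm itself. It matches identical subtrees of two small trees by hash and reports a repeated match as a copy. The `Node` class, the `NOP`/`COPY`/`UNMATCHED` labels, the use of SHA-256, and the definition of matching ratio as the fraction of new-tree nodes covered by a match are all assumptions made for illustration; the tuning step is not modelled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: an element label, optional text, children."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def subtree_hash(node):
    """Hash of a node together with its whole subtree.

    Recomputed on every call for brevity; a real implementation
    would cache one hash per node.
    """
    h = hashlib.sha256()
    h.update(node.label.encode())
    h.update(b"\0")
    h.update(node.text.encode())
    for child in node.children:
        h.update(b"\1")
        h.update(subtree_hash(child).encode())
    return h.hexdigest()


def walk(node):
    """Yield every node of the subtree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def size(node):
    return sum(1 for _ in walk(node))


def match_identical_subtrees(old, new):
    """Match new subtrees to identical old subtrees, top-down.

    Returns (number of matched new-tree nodes, list of operations).
    An old subtree matched a second time is reported as COPY.
    """
    index = {}
    for n in walk(old):
        index.setdefault(subtree_hash(n), n)

    used, ops, matched = set(), [], 0

    def visit(n):
        nonlocal matched
        h = subtree_hash(n)
        if h in index:
            matched += size(n)
            ops.append(("COPY" if h in used else "NOP", n.label))
            used.add(h)
            return
        ops.append(("UNMATCHED", n.label))
        for child in n.children:
            visit(child)

    visit(new)
    return matched, ops


def matching_ratio(old, new):
    """Assumed definition: matched new-tree nodes / all new-tree nodes."""
    matched, _ = match_identical_subtrees(old, new)
    return matched / size(new)


old_doc = Node("root", children=[Node("a", "x"), Node("b", "y")])
new_doc = Node("root", children=[Node("a", "x"), Node("a", "x"), Node("c", "z")])
matched_count, operations = match_identical_subtrees(old_doc, new_doc)
```

In this toy input the second `a` subtree is covered by a copy rather than left unmatched, so two of the four new-tree nodes count as matched. Under the stated assumptions, this is one way a copy operation can raise the matching ratio.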
This research was supported by the research fund of Dankook University in 2005.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, S.K., Kim, D.A. (2006). X-Tree Diff+: Efficient Change Detection Algorithm in XML Documents. In: Sha, E., Han, SK., Xu, CZ., Kim, MH., Yang, L.T., Xiao, B. (eds) Embedded and Ubiquitous Computing. EUC 2006. Lecture Notes in Computer Science, vol 4096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11802167_104
DOI: https://doi.org/10.1007/11802167_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36679-9
Online ISBN: 978-3-540-36681-2
eBook Packages: Computer Science, Computer Science (R0)