Abstract
Obtaining a set of homogeneous classes of pages from the browsing patterns identified in web server log files can be very useful for analyzing the organization of a site and its adequacy to user needs. Such a set of homogeneous classes is often obtained from a dissimilarity measure between the visited pages, defined via the visits extracted from the logs. There are, however, many possible ways to define such a measure. This paper presents an analysis of different dissimilarity measures, based on a comparison between the semantic structure of the site identified by experts and the clusterings produced by standard algorithms applied to the dissimilarity matrices generated by the chosen measures.
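The abstract does not say which dissimilarities, clustering algorithms or comparison criteria the paper uses, so the following is only a minimal Python sketch of the overall pipeline under illustrative assumptions. Each visit is reduced to the set of pages it contains. The page dissimilarity is assumed to be the Jaccard distance between the sets of visits containing each page (one possible choice among many). Pages are clustered with a simple alternating k-medoids variant (not the full PAM swap search). Agreement with an expert partition is measured with the adjusted Rand index. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def page_dissimilarity(visits, pages):
    """Assumed measure: Jaccard distance between the sets of visits containing each page."""
    # For each page, the indices of the visits in which it appears.
    occ = {p: {i for i, v in enumerate(visits) if p in v} for p in pages}
    n = len(pages)
    d = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        union = occ[pages[a]] | occ[pages[b]]
        inter = occ[pages[a]] & occ[pages[b]]
        # Assumption: two pages that never appear in any visit are maximally dissimilar.
        d[a][b] = d[b][a] = 1.0 - len(inter) / len(union) if union else 1.0
    return d


def k_medoids(d, k, max_iter=100):
    """Simple alternating k-medoids on a precomputed dissimilarity matrix (requires k <= n).

    Not the full PAM swap search: it alternates between assigning each object
    to its nearest medoid and re-selecting, in each cluster, the member with
    the smallest total dissimilarity to the other members.
    """
    n = len(d)
    # Deterministic start: the most central object, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            # Keep the old medoid if a cluster happens to end up empty.
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def adjusted_rand_index(x, y):
    """Adjusted Rand index between two partitions given as label lists (len >= 2)."""
    n = len(x)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    sum_a = sum(comb(c, 2) for c in Counter(x).values())
    sum_b = sum(comb(c, 2) for c in Counter(y).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial): treat as perfect agreement.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy data: each visit is the set of pages viewed in one session.
    visits = [{"home", "news"}, {"home", "news", "sport"}, {"shop", "cart"},
              {"shop", "cart", "pay"}, {"news", "sport"}, {"cart", "pay"}]
    pages = ["home", "news", "sport", "shop", "cart", "pay"]
    expert = [0, 0, 0, 1, 1, 1]  # hypothetical expert grouping of the pages
    labels = k_medoids(page_dissimilarity(visits, pages), k=2)
    print(adjusted_rand_index(labels, expert))  # prints 1.0
```

Replacing `page_dissimilarity` with another measure while keeping the clustering step and the comparison criterion fixed gives, in miniature, the kind of comparison between dissimilarities that the abstract describes.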
Copyright information
© 2006 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Rossi, F., De Carvalho, F., Lechevallier, Y., Da Silva, A. (2006). Dissimilarities for Web Usage Mining. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_5
DOI: https://doi.org/10.1007/3-540-34416-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34415-5
Online ISBN: 978-3-540-34416-2
eBook Packages: Mathematics and Statistics (R0)