By Rodrygo L. T. Santos, University of Glasgow, UK, rodrygo@dcs.gla.ac.uk | Craig Macdonald, University of Glasgow, UK, craigm@dcs.gla.ac.uk | Richard McCreadie, University of Glasgow, UK, richardm@dcs.gla.ac.uk | Iadh Ounis, University of Glasgow, UK, ounis@dcs.gla.ac.uk | Ian Soboroff, National Institute of Standards and Technology, USA, ian.soboroff@nist.gov
Blogs have recently emerged as a new open, rapidly evolving and reactive publishing medium on the Web. Rather than being managed by a central entity, the content on the blogosphere (the collection of all blogs on the Web) is produced by millions of independent bloggers, who can write about virtually anything. This open publishing paradigm has led to a growing mass of user-generated content on the Web, which can vary tremendously both in format and quality when looked at in isolation, but which can also reveal interesting patterns when observed in aggregation. One field particularly interested in studying how information is produced, consumed, and searched in the blogosphere is information retrieval. In this survey, we review the published literature on searching the blogosphere. In particular, we describe the phenomenon of blogging and the motivations for searching for information on blogs. We cover both the search tasks underlying blog searchers' information needs and the most successful approaches to these tasks. These include blog post and full blog search tasks, as well as blog-aided search tasks, such as trend and market analysis. Finally, we also describe the publicly available resources that support research on searching the blogosphere.
Disclaimer: Certain companies and/or products are identified in this paper in order to describe concepts and to specify experimental procedures adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the companies or products identified are necessarily the best available for the purpose.
The last decade has witnessed a tremendous shift in publishing power. In particular, the emergence of the Web has influenced not only the way information is distributed and consumed but also how it is produced and by whom. The advent of blogging as a publishing paradigm has led to an increasing mass of content being produced collectively by millions of bloggers worldwide. The large volume and diversity of blog content make searching for trustworthy, high-quality information on the blogosphere a challenging task. Information Retrieval on the Blogosphere provides a comprehensive, up-to-date, and critical review of the published literature on searching the blogosphere for the research community. It details the phenomenon of blogging and the search tasks that it encompasses. These tasks include blog post and full blog search tasks, as well as blog-aided search tasks, such as trend and market analysis. For each search task, the survey thoroughly reviews the most effective approaches in the literature as well as the publicly available resources that can aid further research. Finally, this monograph provides an overview of ongoing and open research directions on searching the blogosphere and other social media channels. Information Retrieval on the Blogosphere is aimed primarily at researchers and developers in the broad area of information retrieval, as well as industrial practitioners. It is an essential companion for students and researchers examining search tasks using user-generated content in general, and the blogosphere in particular.