DRAFT - please do not cite without permission of the authors.
Where do bloggers blog? Studying platform transitions within the Dutch blogosphere
Esther Weltevrede and Anne Helmond
Paper Presentation. MiT7 unstable platforms: the promise and peril of transition
Abstract
The blogosphere has played an instrumental role in the transition and evolution of linking
technologies and practices. This research traces and maps historical transitions of the Dutch
blogosphere and the glue that creates interconnections between blogs which - traditionally
considered - turn the collective of blogs into a blogosphere. This paper aims to problematize
the definition of the blogosphere by questioning who the actors are that form the blogosphere
through its interconnections. Blogs included in the Loglijst, an early manual initiative to
index the Dutch blogosphere, as well as several other expert lists, serve as starting points to
be retrieved from the Internet Archive. Archives have become indispensable tools to study
early web cultures. Whereas the Internet Archive’s interface, the Wayback Machine,
privileges single-site histories, this research aims to repurpose the Wayback Machine to trace
and map transitions in linking technologies and practices in the blogosphere over time using
digital methods and custom software. We are thus able to create yearly network
visualizations of the historical Dutch blogosphere (1999-2009). This approach allows us to
study the evolution of linking practices, which suggests that particular blogging practices can
be distinguished through the distinct linking patterns of linklogs, lifelogs and platformlogs.
Moreover, this approach not only allows us to study the emergence and decline of blog
platforms and social media platforms within the blogosphere but it also allows us to
investigate whether particular linking technologies or practices are specific to local blog
cultures.
Introduction
Why map the blogosphere? The blogosphere has played an instrumental role in the transition
and evolution of linking technologies and practices. Think for example of the introduction
and development of the trackback, pingback and RSS and how these were used by bloggers
to develop a culture of blogging as a distinct online culture. Important research in this area is
practice, event or issue based, and seeks to capture an otherwise fleeting phenomenon in the
moment, before it is deleted, overwritten or no longer available. Now that the blogosphere
has matured, the first historical accounts are being created. This study seeks to contribute to
this body of literature by investigating the more structural platform and software
infrastructure of the blogosphere. More specifically, we seek to contribute to empirical
research of the national blogosphere by putting forward new methods to explore transitions
in the historical Dutch blogosphere in different ways. Additionally, we consider the
implications of these methods for thinking about transitions in blogs and the local
blogosphere.
Working from method, we aim to contribute to the more theoretical body of literature
about blogs and blogosphere as choices in method shape the definition of blogs and the
blogosphere. In doing so, this paper both addresses methodological questions related to
empirically researching the national historical blogosphere, and presents findings of initial
research into the Dutch blogosphere. The approach put forward combines techniques used
by search engines and web archive crawlers as they discover and analyze the content of the
entire web with editorial techniques commonly used in human and social sciences. While
preliminary, the research both acts as proof of concept and as a model for studying national
and historical blogospheres, as well as providing new insights into the shape of the Dutch
blogosphere and its interconnections.
The blogosphere is often studied by mapping and visualizing the interconnections
between blogs to make the blogosphere tangible and visible. Put differently, in order to
become visible, the image of the blogosphere needs to be constructed. There are two common
ways the image of the blogosphere is constructed: First, by blogosphere related services such
as directories, web rings and blog search (Stevenson 2010:11). Second, by academic work
producing network visualizations. Using techniques similar to those of contemporary blog related
services, current network visualizations are commonly constructed by employing RSS feed crawlers to fetch the content - current and newly updated blogposts and their links - of blogs
using their feeds (Bross 2010) or using web crawlers such as the IssueCrawler which
performs issue network crawls based on hyperlink network analysis to identify “patterns of
interconnections in the population of websites discovered in the process” (Bruns 2007: 1).
While different tools and methods produce different network visualizations they provide
graphical representations of interconnections and insights into the overall structure of the
blogosphere and its actors (Highfield 2009). We posit that choices in method shape not only
the blogosphere but thereby also the definition of the blogosphere and blogs.
Historical blogosphere research mainly consists of ethnographic research providing
personal stories and anecdotes (Blood 2000; Rosenberg 2009) alongside empirical work,
including work by web historian Michael Stevenson researching the early A-list blogosphere,
Rudolf Ammann’s (2009) research project on the birth of the blogosphere, and Ravi
Kumar et al researching the structure and evolution of the 2004 LiveJournal blog space.
Kumar et al suggest a method based on blogrolls and time stamps to map a blog space over
time. The Internet Archive provides a way into studying previous states of the web by
providing timestamped snapshots of the web. Although the single-site history is preferred,
Internet Archive data may be used in a variety of ways. For example, Ammann studies the
emerging blogosphere by mapping linking patterns of early blogs with the Internet Archive
and Stevenson outlines a method to re-purpose the Internet Archive – based on single-site
histories as one can only look up single URLs – to create a custom archive by using the early
blog index EatonWeb as a historical resource to ‘conjure up’ the blogosphere. Our research
builds on the methods and tools described by Stevenson and develops a number of novel
techniques and methods.
First, the historical blogosphere research is operationalized by constructing snapshots
of the Dutch blogosphere paying specific attention to reconfiguring actor definitions and by
reconsidering interlinking practices by introducing fine-grained URL analysis and source
code analysis. Second, we seek to further enrich historical blogosphere analysis outside of the
Anglo-American context with a specific focus on the Dutch blogosphere. We seek to
contribute to the definition of a “national blogosphere” by investigating the Dutchness of Top
Level Domains, software and platforms. Finally, we aim to contribute to hyperlink network
analysis and issue network analysis research by reconfiguring the actor definition.
1. Defining Dutch blogs: Where do bloggers blog?
What is a Dutch blog? The question of how to formally define the nationality of an online site
has been an object of attention in the web archiving community and is often answered by
turning to locative technical indicators such as the IP address or Top Level Domain (TLD).
Importantly, indicators for location on the web are always ambiguous and their usefulness
depends heavily on the purpose and application. As an example, for the purpose of saving
digital heritage for posterity, the Dutch web archiving institution has formulated three
defining characteristics, including language, TLD, and the more difficult to automate “subject
matter related to the Netherlands” (Weltevrede 2009). For the definition of a Dutch blog in
this research project, we initially rely on authoritative sources and thus also on their
selection criteria for including blogs in their lists. In a second step we transform the question
of “what is a Dutch blog?” into “where do Dutch bloggers blog?” in order to enrich and
complicate the understanding of the location of web content.
The collection of blogs in our corpus is retrieved from a 2001 database dump (containing
631 unique blogs) from the Loglijst, an early Dutch blogosphere indexing
initiative. In addition to this list, we compiled expert lists from interviews, books and
authoritative lists found on the web and in the Internet Archive. These expert lists include
long list nominations from the Dutch blog awards, the Dutch Bloggies, from 2001 to 2008, all
blogs mentioned in two seminal pieces on the history of the Dutch blogosphere by Schaap
(2005) and Meeuwsen (2010), and finally a list citing “Weblogs that really matter” in a
December 2010 blogpost 1 by Bert Brussen, blogger for the famous Dutch ‘shocklog’ Geenstijl.
Relying on these sources to provide us with a collection of Dutch blogs led us to include a
small number of Belgian blogs that were considered to be part of the Dutch blogosphere by
our sources.
1
http://www.dejaap.nl/2010/12/28/verplicht-in-uw-rss-reader-weblogs-die-er-echt-toe-doen/
We compiled a collection of blogs serving as our starting points from the above-mentioned
expert lists and, using custom tools, aimed to retrieve all these blogs from the
Internet Archive. We queried the Internet Archive’s new Wayback Machine for each blog URL
and, for each year, selected the result closest to the middle of the year. This method creates a
collection of archived copies of historical Dutch blogs for each year which all have a timestamp
near the middle of the year. Only those blogs with a copy in the Internet Archive were retained for
further analysis. The following table represents the number of blogs per year that were
retrieved from the Internet Archive to serve as starting points 1:
Year    Starting points
1999    24
2000    138
2001    456
2002    816
2003    778
2004    863
2005    850
2006    788
2007    717
2008    860
2009    723
Table 1: Starting points retrieved from the Internet Archive per year
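The selection of one archived copy per blog per year can be illustrated with a minimal sketch. It is not the custom tool used in this research. It assumes that the available snapshots of a blog are already known as 14-digit Wayback Machine timestamps (YYYYMMDDhhmmss), and it takes 1 July as the "middle of the year", which is our assumption; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def closest_to_midyear(timestamps, year):
    """Return the snapshot timestamp within `year` that lies nearest to 1 July.

    `timestamps` is a list of 14-digit strings (YYYYMMDDhhmmss).
    Returns None when the blog has no snapshot in that year.
    """
    midpoint = datetime(year, 7, 1)  # assumed definition of "middle of the year"
    in_year = [ts for ts in timestamps if ts.startswith(str(year))]
    if not in_year:
        return None

    def distance(ts):
        return abs(datetime.strptime(ts, "%Y%m%d%H%M%S") - midpoint)

    return min(in_year, key=distance)


# Hypothetical snapshot list for a single blog URL.
snapshots = ["20010105120000", "20010618093000", "20011130000000", "20020701000000"]
print(closest_to_midyear(snapshots, 2001))
```

Blogs for which the function returns None in a given year would simply be left out of that year's starting points, mirroring the retention rule described above.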
While attention has been paid to the role of blog software technology in relation to popularity
and the success of weblogs (Du and Wagner 2006) and in relation to blogging practices as
enabling or restricting certain actions (Schmidt 2007), we want to interrogate our starting
points to look at the question ‘where’ bloggers blog by analyzing the TLDs, platforms and
software they use. To our knowledge, this question is thus far understudied and,
where it is addressed, this is done by analyzing the demographics of bloggers, such as the location
provided in user profiles of blog platforms (Kumar et al 2004). The limitation of this approach
is that such information is optional and limited to some platforms only. This study introduces
an approach, with the focus on online culture – national digital culture, if you will –
recognizing further specificity in Dutch online practice: software and platform use as well as
applications that persist, despite Twitter, Technorati and other dominant services from the
U.S. Our main effort is to describe Dutch blogging practices by paying specific attention to
‘where’ Dutch bloggers blog. In this study the question of ‘where’ is three-fold and includes
TLD analysis, platform analysis and self-hosted blog software analysis.
TLD analysis
Where do bloggers blog? The Top Level Domain (TLD) analysis presented here is part of a
larger series of URL analysis methods discussed in this paper. As a first step we counted the
TLD use of our starting points per year by entering URLs in batches corresponding to a single
year using the TLDCount tool 3. In a next step these counts are copied into a Google
Spreadsheet where the absolute number of TLDs is transformed into the relative number of
TLDs and visualized using the built-in chart generator. Figure 1 shows the relative distribution
of TLD usage over time.
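The counting step and the conversion from absolute to relative numbers can be sketched as follows. This is an illustration only and does not reproduce the TLDCount tool or the spreadsheet used in the project; the function names and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last label of the host name (e.g. 'nl'), or '' if none is found."""
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def relative_tld_shares(urls):
    """Map each TLD to its share of the URLs for one year (shares sum to 1)."""
    counts = Counter(tld_of(u) for u in urls)
    counts.pop("", None)  # ignore URLs without a recognizable host
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()} if total else {}


# Hypothetical starting points for a single year.
urls_one_year = [
    "http://example.nl/",
    "http://weblog.example.nl/index.html",
    "http://example.blogspot.com/",
    "users.example.be/blog.html",
]
print(relative_tld_shares(urls_one_year))
```

Running the function once per year yields the per-year distributions that a chart such as Figure 1 would be drawn from.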
1
Method described in detail: https://wiki.digitalmethods.net/Dmi/DutchBlogosphere
3
https://tools.digitalmethods.net/beta/tldCounts/
The Dutch blogs in our collection favor the .nl domain over all other domains throughout the
years. Moreover, a significant increase in the .nl domain is visible, whereas the .com domain
is steadily losing share over time, which – as our preliminary findings in the next section will
show – is related to Dutch bloggers moving away from .com blogging platforms such as
Blogger’s Blogspot to Dutch .nl blogging platforms. The Dutch .nl domain is one of the top five
largest country code Top Level Domains (ccTLDs) in the world (SIDN 2010), which is also
reflected in the Dutch blogs. It is remarkable, however, that the .nl domain is dominant from
the beginning, because the .nl domain only became available to private individuals as of 2003.
As a forerunner, individuals had been allowed since 2000 to register third-level domains in the
form jansen.123.nl (SIDN 2007), but these domains were very uncommon and are absent from
our collection of blogs. As mentioned previously, there are a number of .be blogs present in
the Dutch blog collection and from 2000 onwards they remain steadily present. Furthermore,
the peak of .tk domains in 2002 is notable. Dot.tk “Renaming the Internet” offers free domain
names and includes URL redirection and forwarding services. Lastly, there are a number of
domains that are unconventionally used for “commercial or vanity” purposes, including .nu
(Niue), marketed as ‘now’ in Dutch, and .is (Iceland), which is used as the verb ‘to be’
(Wikipedia/ccTLD).
Figure 1: Relative distribution of Top Level Domains (TLDs) in the Dutch blogs over time
Platform analysis
A second way to answer the question of ‘Where do bloggers blog?’, which complements the
TLD analysis, is by rendering visible the variety and proportion of blog platforms used in the
Dutch blogosphere. The second URL analysis presented here requires a stronger focus on
editorial skills. A list of blog platforms was initially compiled by reading URLs and writing
down the blog platforms found. This requires a basic background knowledge of blog
platforms and an occasional look-up. With the use of Google Refine, “a power tool for
working with messy data” 4, we ‘coded’ each of the blog platforms in GREL (Google Refine
Expression Language) to automatically search, transform and count the platforms in our set of URLs. 5
The results are presented in figure 2, a custom made visualization combining the blog
platform analysis with the self-hosted software analysis as discussed in the next section.
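The ‘coding’ of platforms can be illustrated with a short sketch. Note that it is written in Python with regular expressions rather than in GREL, and that it does not reproduce the expressions used in the project. The host patterns are our assumptions, derived from the platform names mentioned in this paper, and cover only a few platforms.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed host patterns for a handful of platforms named in the text.
PLATFORM_PATTERNS = {
    "Blogspot": re.compile(r"(^|\.)blogspot\.com$"),
    "Web-Log.nl": re.compile(r"(^|\.)web-log\.nl$"),
    "WordPress.com": re.compile(r"(^|\.)wordpress\.com$"),
    "Pitas.com": re.compile(r"(^|\.)pitas\.com$"),
}


def platform_of(url):
    """Return the platform label for a blog URL, or None if no pattern matches."""
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    for name, pattern in PLATFORM_PATTERNS.items():
        if pattern.search(host):
            return name
    return None


def count_platforms(urls):
    """Count the coded platforms in a list of URLs, ignoring uncoded ones."""
    return Counter(p for p in map(platform_of, urls) if p is not None)


# Hypothetical URLs for one year.
sample = [
    "http://example.blogspot.com/",
    "http://another.blogspot.com/2004/",
    "http://example.web-log.nl/",
    "http://www.example.nl/weblog/",
]
print(count_platforms(sample))
```

Anchoring the pattern on the host rather than on the whole URL avoids coding a blog as a platform blog merely because it links to or mentions the platform in its path.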
The graph shows the rise and popularization of Blogspot, the Blogger platform, in the
beginning of 2000. The decline of Blogspot coincides with the rise of Dutch blogging platform
Web-Log.nl. The rise of Web-Log.nl is accompanied by the rise of other Dutch blog
platforms such as BlogNL, Blogo, Blogse, Punt and Blogeiland. Figure 2 powerfully shows how
from 2004-2005 onwards Dutch bloggers – except for a relatively small number of Blogspot
and WordPress.com users – move to Dutch platforms, which are color coded orange. It is
visible that there are a few bloggers on legacy platforms such as Pitas.com which no longer
accept new members but are still functional for old members.
Dutch software and platforms play an important role in the Dutch blogosphere:
between 2004 and 2009 over 40% of all bloggers using blog software or blog platforms are
running Dutch software or are on Dutch platforms. Zooming in on the use of platforms,
almost all bloggers on blog platforms make use of Dutch platforms (see figure 3).
4
http://code.google.com/p/google-refine/
5
Method described in detail: https://wiki.digitalmethods.net/Dmi/DutchBlogosphere
Figure 2: Relative distribution of self-hosted blog software & blog platforms in Dutch blogs
Figure 3: The relative amount of Dutch blog platforms over time compared to other blog platforms.
Self-hosted software analysis
The question of ‘Where do bloggers blog?’ was approached with a URL analysis to investigate
the distribution of TLDs and platforms used in the Dutch blogosphere. Our findings suggest
that the early Dutch bloggers, the founding fathers of the Dutch blogosphere, do not make use
of blog platforms. In general, the early Dutch bloggers prefer either to create their blogs manually
(written in HTML, meaning bloggers have to enforce reverse chronology by hand in
order to place the latest blogpost on top) or to use specifically designed self-hosted blog
software. To include these blogs using self-hosted blog software in our “where the Dutch
bloggers blog” analysis we developed a method which moves beyond the blog’s URL and
instead searches the page’s source code to look for the blog software powering the blog in
order to carefully create a list of blog software.
Initially, the list was compiled by analyzing the Gephi 6 maps (see section 4) and
iteratively enhanced with newly found software throughout the research project. In compiling
the list of self-hosting software, we made use of the reflexive blog culture to complete the list.
Typically, bloggers tend to analyze and describe the practice of blogging (Hourihan 2002;
Blood 2002). Searching for our initial list of software thus led to blog posts comparing or
mentioning different types of software (see figure 4). For each year we searched the source
code of the collection of archived blog front pages for the presence of the blog software types.
The SourceCodeSearch 7 tool has the option to return the query entered as well as the trailing
characters following the query. The results were editorially checked to establish whether the
reference to the software entailed that the blog was running on that software. Especially in
the beginning, references to self-hosted blog software were not standardized. In later years
the ‘powered by’ button in the side bar or footer became standard for most self-hosting
software.
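The idea of returning a query together with its trailing characters can be sketched in a few lines. This is a stand-in for illustration and makes no claim about how the SourceCodeSearch tool is implemented; the function name, the default of 40 trailing characters and the sample HTML are hypothetical.

```python
import re


def search_source(html, query, trailing=40):
    """Return every case-insensitive match of `query` in `html`,
    each followed by up to `trailing` characters of context."""
    pattern = re.compile(
        re.escape(query) + ".{0,%d}" % trailing,
        re.IGNORECASE | re.DOTALL,
    )
    return [match.group(0) for match in pattern.finditer(html)]


# Hypothetical front-page source of an archived blog.
page = '<div id="footer">Powered by <a href="http://example.org/">Pivot</a></div>'
for hit in search_source(page, "powered by", trailing=20):
    print(hit)
```

The trailing context is what makes the editorial check possible: a bare hit on a software name does not show whether the blog runs on that software or merely mentions it in a post.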
Figure 4: “Not tonight love / I’m busy playing around with weblog software” 7 October 2002, by blogger
users.pandora.be/vrints.html
6
Gephi is open-source software for visualizing and analyzing large network graphs: http://gephi.org
7
https://tools.issuecrawler.net/beta/SourceCodeSearch
Contrary to the blog platform counts, the self-hosted blog software results suggest that the
Dutch blog software Pivot/PivotX has been powering Dutch blogs from the start and was the
most used software in the heyday of Dutch blogging. The decline of Blogger, the first blog
platform used by Dutch bloggers, coincides with the rise of Blogspot - Blogger’s platform.
Furthermore, the bar graph shows a boost of blogs powered by WordPress.org in the
blogosphere from 2006 onwards. Movable Type and the Belgian Nucleus have a small but
loyal share of bloggers running the software.
In terms of blog software and blog platforms, the heyday of Dutch blogs was around
2005 for platforms and 2006 for software. Notably, the share of self-hosted software
outnumbers one-click publishing platforms, which was not expected by the bloggers
themselves. A number of posts by the early bloggers express
their fear that soon everybody will be blogging, and other posts express rivalry between self-hosting bloggers and platform bloggers (see figure 5). However, figure 2 also clearly shows
that the large majority of blogs do not run specifically designed blog software or use blog
platforms. A next step in further developing this methodology is to formalize the various
types of references to software, throughout the years, and design queries to automate the
process of collection and analysis of the results.
Figure 5: Early Dutch blogger about the rise of free blog platforms 8
In this first part we focused on designing methods to address the question of ‘Where do bloggers
blog?’ in order to enrich current methods used to determine the nationality of blogs by
enhancing a TLD analysis with a platform and software analysis. In the following part we will
look into the interconnections between these blogs and put forward a method to create
historical blogospheres.
8
Translation: “Ah, the free blog services. How we, the bloggers of the first hour, with our own domain and a
self-made site, despised them, the services that allowed you to put up a blog in a few clicks. Look at him, he
has a blogspot, or worse, a web-log.nl, which we scornfully called a web-dash-log.”
http://vandenb.com/archive/2009/10/10/898-woorden-mijn-webloggeschiedenis
2. Defining the blogosphere
The term ‘blogosphere’ was originally coined in 1999 by Brad L. Graham to mark the end of
cyberspace (“Goodbye, cyberspace! Hello, blogiverse! Blogosphere? Blogmos?”) and was
revived in 2001 by William Quick as “the intellectual cyberspace we bloggers occupy” with an
explicit reference to the blogosphere as a space for serious discourse. Echoing the idea of the
blogosphere as a discursive space “the imagined public sphere” (boyd 2006) was presented
alongside the idea of blogs as counter voices to mainstream media (Lovink 2007). Besides the
notion of the blogosphere as a space for discourse other definitions stress the formalistic
characteristics of the blogosphere as an interlinked set of blogs which “allows for the
networked, decentralised, distributed discussion and deliberation on a wide range of topics”
(Bruns and Kirchhoff 2010). A complementary approach to the blogosphere as an interlinked
set of blogs looks at how blogs are “embedded into a much bigger picture: a segmented and
independent public that dynamically evolves and functions according to its own rules and
with ever-changing protagonists, a network also known as the 'blogosphere' ” (Bross et al
2010:453). Following this line of thinking of blogs which are embedded in a larger networked
ecology with shifting protagonists the blogosphere may also be defined by including the
actors they link to and thereby include in their networked ecology: “The notion of a mini-blogsphere additionally rests on the extent to which the set of blogs doing an issue are
interconnected by links and/or by textual referencing. Blogs also make [sic] be 'connected'
together through common references to a third party, e.g., all blogs linking to or referencing a
particular piece in the New York Times” (Rogers 2005). Although these two dominant
approaches to research the blogosphere may be distinguished by their object of research,
they do not exclude each other, as is for instance demonstrated by the US political
blogosphere research of Benkler and Shaw (2010).
Although highly formal, the blogosphere has more of a cultural meaning than
a technical meaning because, as the previous section has illustrated, there are many different
blog platforms and blog software types available to customize how the blog is being used. Our
approach may initially be defined as formalistic, because the definition of the blogosphere
follows from the method outlined below, which is based on link analysis. However, by
mapping the formal changes in linking patterns and URLs over time, we are able to suggest
findings about specific local cultures of use.
The annual blogospheres are created from a collection of blogs retrieved from the
Internet Archive using custom tools. One of the consequences of studying transition with a
static Internet Archive is that it is only possible to do research on front-page level and not on
a post level. Thus, this method may be viewed as a more structural ‘blogosphere’ analysis
instead of ‘issue’ or ‘event’ analysis. While we are aware that the choice of our starting points
shapes the Dutch blogosphere, the methodology used only retains those blogs found relevant
by the other blogs. It is a co-link analysis, the analysis module used by the IssueCrawler. 9 The
co-link analysis is performed in two steps: first, for each blog all links on front-page level are
extracted (one depth) and subsequently, in Gephi, only nodes receiving at least two links
from the starting points are kept in the network visualization (one iteration). The outlinks of
the retrieved blogs are co-linked into a network, which means that nodes in the network have
to receive at least two links from the starting points to be retained.
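The two-step co-link procedure can be made concrete with a minimal sketch. It is not the IssueCrawler module or the Gephi filter used in this research. It assumes that the front-page outlinks of each starting point have already been extracted, counts distinct starting points rather than individual links, and ignores self-links; these are our assumptions, and all host names are hypothetical.

```python
from collections import defaultdict


def colink_filter(outlinks, min_inlinks=2):
    """Keep only targets that receive links from at least `min_inlinks`
    distinct starting points.

    `outlinks` maps each starting point to the targets linked from its
    front page. Returns the set of retained nodes and the sorted edges
    (source, target) that point at them.
    """
    sources_per_target = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # self-links do not count (assumption)
                sources_per_target[target].add(source)

    kept = {t for t, srcs in sources_per_target.items() if len(srcs) >= min_inlinks}
    edges = sorted((s, t) for t in kept for s in sources_per_target[t])
    return kept, edges


# Hypothetical front-page outlinks of three starting points.
outlinks = {
    "blog-a.example": ["stats.example", "news.example"],
    "blog-b.example": ["stats.example", "blog-a.example"],
    "blog-c.example": ["blog-a.example", "solo.example"],
}
kept, edges = colink_filter(outlinks)
print(sorted(kept))
```

In this example news.example and solo.example each receive a single link and drop out, which is how a link tied to one blogger's passing interest disappears from the yearly map.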
Whereas the co-link analysis is an analysis module most successfully used for locating
issue networks, in our case, the result of the co-link analysis is that issue or event-based links
are excluded from analysis. There are three main reasons for this. First, the starting points are not
chosen because they share an issue or an interest in an event; rather, what the starting
points share is the practice of blogging in the Dutch web space. Second, only the
front-pages are crawled, which means that the more structural links, for example in blog rolls,
to blog related services and to blog software, are the stable variable in the analysis, whereas
links in posts are only taken into account if they are present on the front-page. Third, the time
frame of the network is one year. Combined with the previous point that links from posts are
only crawled one level deep, the effect is that links to volatile issues that may dominate the
Dutch blogosphere for a short period of time will be excluded; only the more structural issues
will prevail. Studying a structural blogosphere follows the idea that blogs are embedded in a
larger networked ecology created by bloggers through their linking practices where they also
include other actors than blogs, for example blog portals, webrings, news websites and social
media platforms. In the following we describe how we have constructed the Dutch
blogosphere through the Internet Archive and prepared it for further analysis. Specific
attention is paid to the process of construction by reconfiguring actor definitions and
reconsidering interlinking practices. We further develop methods to study transitions in the
historical blogosphere with the static Internet Archive. Methodologically we contribute in
three ways. First, by refining network analysis with “actor definition” using Gephi and G Atlas
software 10. Second, by introducing fine-grained URL analysis to study transition in platforms
and TLD use. Third by introducing source code analysis to study transitions in software use.
9
A software tool that locates and visualizes networks on the web, see: www.issuecrawler.net
10
http://ediasporas.ticmigrations.fr/?lang=en
3. Actor definition
As previously described, we retrieved snapshots of our blogs from 1999 to 2009 through the
Internet Archive and extracted their outlinks on a front-page level and put the results in
Gephi’s GEXF format. In Gephi, a simplified version of co-link analysis is performed so that
only blogs that receive at least two links from our starting list are kept. Co-link analysis is
performed on a ‘by site’ level, which is more inclusive than the ‘by page’ option because it
counts all links from site to site. In other words, the co-link analysis is performed on the hosts
and not on the deep pages.
An important methodological contribution is made to a common problem in online
network visualizations: the problem of big platform nodes that take a prominent position in
the graph. Analysis of these maps often suggests the conclusion that the debate is moving
elsewhere (i.e. to social media). In an attempt to demystify the position of the big platform
nodes in the Dutch blogosphere, we propose to redefine the nodes of the network to actors. 11
Most network analysis software treats the host and in some cases sub-host as the actor.
However, in our case the ‘actor’ or blogger is often defined after the slash. Think, for example,
of the early bloggers that started blogging from their personal homepage to the recent micro
bloggers on Twitter.
To identify nodes in the blogosphere as actors, we redefined what actors are on a URL
level. This is not unproblematic, because not all URLs follow the same pattern. For instance,
with most websites the ‘actor’ is defined by ‘host’ (e.g. example.com) while actors on blog
software are usually defined before the host on a subdomain (e.g. example.blogger.com),
actors on personal homepages are often defined by their ~ after the slash (e.g.
xs4all.nl/~example) and micro-bloggers on Twitter are also defined after the slash (e.g.
twitter.com/example). In the actor definition project we sought to formalize ‘URL patterns’ in
the network. 12
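The URL patterns described above can be sketched as a small normalization function. It is a simplified illustration and not the formalization developed in the actor definition project. The set of hosts where the actor sits after the slash contains only the Twitter example from the text, and the stripping of a leading "www." is our assumption.

```python
from urllib.parse import urlsplit

# Hosts on which the actor is the first path segment (assumed, illustrative).
PATH_ACTOR_HOSTS = {"twitter.com"}


def actor_of(url):
    """Reduce a URL to the actor it represents.

    - personal homepages: host plus the '~name' segment after the slash
    - listed platforms such as Twitter: host plus the first path segment
    - everything else: the full host, which already carries any subdomain
    """
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]
    first = segments[0] if segments else ""

    if first.startswith("~"):
        return host + "/" + first
    if host in PATH_ACTOR_HOSTS and first:
        return host + "/" + first
    return host


for example in [
    "http://www.xs4all.nl/~example/index.html",
    "twitter.com/example/status/1",
    "http://example.blogger.com/2004/",
    "http://example.com/about",
]:
    print(example, "->", actor_of(example))
```

Applying such a function before the co-link step means that a large platform node is split into the individual bloggers hosted on it, instead of appearing as a single dominant node.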
11
See also section 4 on social media analysis
12
https://wiki.digitalmethods.net/Dmi/DutchBlogosphere
4. Analysis of the Dutch blogosphere in transition
Mapping the outlinks of the blogs we retrieved from the Internet Archive from 1999 till 2009
allows us to go back in time and study how and where the Dutch blogosphere originated.
Using the fine-grained actor definition, the network is visualized with Gephi for each year.
Figure 6 shows the rise, evolution and first signs of decline of the Dutch blogosphere, where
grey depicts the hyperlink network of all years collapsed and red the blogosphere of that
year. While the first Dutch bloggers started in mid-1999, they are not yet interlinked into a ‘sphere’,
and we can trace the beginning of a structural Dutch blogosphere to 2000.
Figure 6: The Dutch blogosphere in transition.
In 1999 there are only four nodes on the map (not displayed) which do not link to each other
but they are on the map because they receive at least two links from our selected starting
points. The four nodes are Nedstat, Nedstatbasic, Wired and a Dutch linkdump blog by
blogger Wessel Zweers 13. A familiar node on the map is Wired, a technology magazine that
was also prominent in the early USA blogosphere (Stevenson 2010). The only Dutch blogger
on the map is hosted on one of the oldest Dutch hosting services providing free personal
homepages, the Digital City, or De Digitale Stad (DDS). Known Dutch blogs from that period,
for example Sikkema, Prolific and Alt0169, are notably absent because they do not receive two
links from the blogs in our starting list. The map in Figure 7 shows that some of the known
Dutch bloggers, as for instance mentioned in Meeuwsen (2010), together with less well known
bloggers, are present but do not form a blogosphere yet. Most notably Alt0169, ~Wzweers and
~Onnoz reach out to other Dutch blogs, which may be read as an effort to establish a community
between blogs. Exemplary are links to blogs that list blogs, like Beboo.org/metalog, where the
top 50 (international) blogs are listed.
13
huizen.dds.nl/~wzweers
Figure 7: The pre-blogosphere in 1999. Early blogs linking outward.
Cluster analysis over time
The year 2000 shows the Dutch blogosphere for the first time (see figure 8), dominated by bloggers on
personal homepage providers (blue) and student pages (pink). On the left side of the map
there is a loosely defined news-tech cluster of Dutch news sites, surrounded by USA and UK
news and tech blogs. Similar to the early USA blogosphere, tech and news are prominent in
the Dutch blogosphere (Stevenson 2010). On the right side of the blogosphere there is a
cluster of Dutch homepages (~) and student homepages. The free homepage provider DDS
and Dutch internet service provider XS4ALL are the most prominent homepage providers.
The larger nodes in the center are the founding blogs of the Dutch blogosphere, such as
Alt0169, Sikkema, S-lr, Smoel, Rikmulder, Tonie, Prolific, Pjoe, Stronk, Ben Bender, Vandenb,
Retecool. Together they form a closely linked cluster. Alt0169.com, which was a heavy linker in
1999 but did not receive any links back, is a central node in 2000. Figure 9 shows the Dutch
marketing cluster which emerged in 2005 and will continue to be a very dominant cluster in
the Dutch blogosphere. Another distinct cluster in the later blogosphere is the Blog.nl cluster.
Blog.nl has a very distinct shape because all Blog.nl blogs list and link the other blogs on that
platform as can be seen on the right in figure 11.
Figure 8: The Dutch blogosphere in 2000. Blue: personal homepages. Pink: student pages. Yellow: blog
platforms.
Figure 9: The Dutch marketing cluster in 2005.
Using the same method as for coding blog platforms in our platform analysis, we created
several categories in order to follow specific transitions in the Dutch blogosphere. The
following categories were created and coded in Google Refine: Homepages, University
Homepages, Blog Related Services, Platforms, Social Media Platforms, Starting Points and
Statistics. The categorization was created through expert URL reading and iteratively
enhanced with new findings throughout the project. It allows us to color
actors belonging to a specific category in Gephi, making it easier to locate actors and to track
changes over time. As we demonstrate below, this allows us, for example, to look
at the role of blog related services and social media in the blogosphere over time.
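The coding step described above can be illustrated with a minimal sketch. It is not the procedure used in the project, where categories were assigned by expert URL reading in Google Refine; it only shows how such a coding could be expressed as host-based rules. The rule table, the host names in it and the function names are assumptions made for illustration, and only a few of the categories are included.

```python
from urllib.parse import urlparse

# Illustrative rules only. The host suffixes are assumptions chosen for
# the example; the project's real coding was done by hand in Google Refine.
CATEGORY_RULES = [
    ("Statistics", ("nedstat.nl", "feedburner.com")),
    ("Homepages", ("xs4all.nl", "dds.nl")),
    ("Platforms", ("blog.nl",)),
    ("Social Media Platforms",
     ("twitter.com", "flickr.com", "youtube.com", "facebook.com", "hyves.nl")),
]


def host_of(url):
    """Return the lowercased host of an absolute URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def categorize(url):
    """Return the first category whose host suffix matches, else 'Uncategorized'."""
    host = host_of(url)
    for category, suffixes in CATEGORY_RULES:
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return category
    return "Uncategorized"
```

The resulting label could then be exported as a node attribute and used for coloring in Gephi. Hosts that match no rule are left as "Uncategorized" so that they can still be coded by hand.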
Blog related software: statistics
The blogosphere includes a variety of blog-related actors. The network is not only formed by
the interconnections between the blogs but also by the interconnections the blogs make with
other actors through using external services for the content/monitoring of their blog and by
the blog software which automatically creates connections. Blog related services include
portals, manual and automatic blog indexers, external comment services and statistics
providers.
One of the most prominent nodes from 1999 onwards is Nedstat, the Dutch statistics
provider. Nedstat, together with its basic/free service Nedstatbasic, provides
statistics for webmasters and bloggers about their visitors and remains present in
the blogosphere over time, together with other statistics providers. Most bloggers made their
statistics publicly available, which supports the claim that “the blogosphere is obsessed with
measuring, counting, and feeding” (Lovink 2007: 30). Zooming into the node (see Figure 10)
shows us all the bloggers linking to, and thereby presumably using, Nedstat as a statistics
provider. Statistics providers expand their presence over time, in part by diversifying: new
actors monitor new natively digital objects introduced by blogs, such as the site feed, which
allows users to subscribe to updates to the blog. Feedburner is one of the new actors on the
map; it monitors the number of subscribers to a blog.
Figure 10: Links to Nedstat
Social media analysis
The early blogosphere is characterized by larger nodes such as Alt0169, Sikkema and Zweers, the
founding fathers of the Dutch blogosphere. The heyday of the Dutch blogosphere is
characterized by the rise of specific clusters, such as the marketing cluster and the blog
platform cluster of Blog.nl, and by the emergence of blog related services such as statistics.
The late period is characterized by social media, the widgetized self and content links. In this
social media research project, we set out to develop measures to analyze more closely the
practices between blogs and social media.
Frank Schaap (2005) empirically researched what he calls “the dichotomous nature of
the Dutch blogosphere” caused by the sharp division between two distinct types of weblog
forms: the linklog and the lifelog. Contributing to the distinction between the lifelog and the
linklog, we propose to include the ‘platformlog’ as a third type of blog with particular
characteristics. Whereas lifelogs primarily post about their daily life in a diary style and link
sparingly, mainly to their about page, their offline contexts and other bloggers, the linklogs
link abundantly to other blogs and media through their role of pointing out the best of the
web (Schaap 2005). The platformlog is characterized by embedding and linking content
from social media platforms like Flickr, YouTube and Facebook and by referring to the
author's presence on these platforms in sidebar widgets. The platformlog is often used to
present the widgetized self, or the distributed self across social media platforms. Whereas in
the mid and late 90s the self was defined on the personal homepage and later on the blog,
with the rise of social networking sites and content platforms the self is now also defined and
performed elsewhere. Blog software popularized the creation of the widgetized self with its
easy drag and drop widgets that allowed bloggers to easily embed content from their other
platforms into their blog through the sidebar. The sidebar is no longer only used to link to
other bloggers, using the blogroll, but also to link to the self on other platforms such as
Last.fm for music, Flickr for photos and YouTube for videos. As our method collects outlinks
from the front page and subsequently performs a co-link analysis, it may capture the
widgetized self in the sidebar on the front page as a new actor in the blogosphere. In
traditional hyperlink analysis, social media nodes are disproportionately big because all
references to a platform collapse into one node.
Comparing the 2009 blogosphere with and without actor definition (Figure 11), it
becomes clear that the social media platforms call for a more fine-grained analysis. Social
media are the big nodes in the network without actor definition; with actor
definition, however, the social media platforms seem to lose prominence in the blogosphere.
We investigate further how the difference between the two maps may be
explained. This undertaking has similarities with Benkler and Shaw’s
work on the U.S. political blogosphere, where they seek to analyze what is inside the large
network nodes in order to specify their internal differences:
[...] earlier studies have counted DailyKos.com and Instapundit.com each as a single,
highly connected node in a graph. Doing this masks the fundamental difference between
how these two visible blogs function as discursive platforms. [...] Link analysis studies
have treated both sites as the same phenomenon: a single node with a very large number
of in-links and out-links (2010:4)
Figure 11: Big social media nodes. The 2009 blogosphere with and without actor definition.
The strategy for research is to further specify what is linked to within social media: user
pages or content (e.g. video, photo, status update). Figure 12 shows the big social media
platform nodes, with smaller nodes inside. Comparing the various social media platforms, the
results suggest that some platforms may be defined as ‘media sharing’ platforms, such as
YouTube and Flickr, which mainly receive embedded content links from blogs. In the
blogosphere map with actor definition, these nodes decreased in size. Facebook is a relatively
small node in the Dutch blogosphere and the links it receives dissolve into a diverse set of
profiles, pages, apps, events, and groups. Hyves, the Dutch social network that continues to
lead Facebook in the Netherlands[14], is one of the smallest social media references. Although
the Dutch blogosphere has a preference for Dutch software and platforms, this is not
reflected in social media platform links. Twitter, the largest node in the network, is a
platform that mainly receives links to user pages. This means that bloggers refer to
themselves or friends on the micro-blogging platform.
Figure 12. Social media in the 2009 Dutch blogosphere. A fine-grained URL analysis of the Big social media
nodes. References to social media platforms demystified.
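The distinction between user-page links and content links can be sketched as a small URL classifier. This is an illustrative simplification and not the coding used for Figure 12: the path patterns below are assumptions about how Twitter, YouTube and Flickr URLs were typically structured, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_link(url):
    """Return (platform, kind, owner) with kind in {'user', 'content', 'other'}."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]

    if host == "twitter.com":
        if len(parts) == 1:                      # twitter.com/<user>
            return (host, "user", parts[0].lower())
        if len(parts) >= 3 and parts[1] in ("status", "statuses"):
            return (host, "content", parts[0].lower())
    elif host == "youtube.com":
        if parts and parts[0] in ("watch", "v"):  # a video page or embed
            return (host, "content", None)
        if len(parts) >= 2 and parts[0] == "user":
            return (host, "user", parts[1].lower())
    elif host == "flickr.com":
        if len(parts) == 2 and parts[0] == "photos":
            return (host, "user", parts[1].lower())
        if len(parts) >= 3 and parts[0] == "photos":
            return (host, "content", parts[1].lower())
    return (host, "other", None)


def tally(urls):
    """Count links per (platform, kind) pair."""
    return Counter(classify_link(u)[:2] for u in urls)
```

Counting links per platform and kind in this way indicates whether a platform node is mostly made up of user pages or of content links. Links that match none of the patterns are kept as "other" so that they can be inspected by hand.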
Link analysis as it has been used has clear limitations for analyzing the share of social media
in blogosphere networks. Our study suggests that the uniformly large platform nodes are
misleading. Similar to Benkler and Shaw’s study, it raises a concern with link analysis that
zooms out to look at platforms as a whole by treating the entire platform domain as the node,
and in doing so effaces the individual content link and the individual author. The platform
nodes require more nuanced exploration.
[14] http://www.comscore.com/Press_Events/Press_Releases/2011/4/The_Netherlands_Ranks_number_one_Worldwide_in_Penetration_for_Twitter_and_LinkedIn
From this undertaking, the next step is to find out to what extent the bloggers mainly
refer to their own user page on Twitter as a widget of the self. Using Google Refine, we
developed a methodology to study platform migration. For the Twitter users, we stripped the
http://twitter.com prefix, lowercased the user names and excluded characters such as the
underscore. In a similar way, for the blogs we only retained the domain name and subsequently
matched the Twitter user name with the blog domain name. The remaining user names were checked
by eye for small differences in spelling and added to the list. We find that of the 160 unique
bloggers who link to Twitter user pages, 98 (also) link to themselves. For Twitter at least, this
supports the claim that the widgetized self can be found in the sidebar, as a new actor in the
blogosphere.
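The matching steps described above can be sketched as follows. The project carried them out in Google Refine; this Python version is only an approximation under stated assumptions. In particular, "the domain name" is taken here to be the first host label after an optional "www.", all non-alphanumeric characters are dropped, and the function names and example URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def twitter_handle(url):
    """Normalized user name from a Twitter user-page URL."""
    first_segment = urlparse(url).path.strip("/").split("/")[0]
    return normalize(first_segment)


def blog_name(url):
    """Normalized first host label of a blog URL (assumed to name the blog)."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return normalize(host.split(".")[0])


def self_referencing_blogs(links):
    """links: iterable of (blog_url, twitter_url) pairs.

    Returns the normalized names of blogs that link to a Twitter user
    page whose handle matches the blog's own name exactly.
    """
    return {
        blog_name(blog_url)
        for blog_url, twitter_url in links
        if blog_name(blog_url) == twitter_handle(twitter_url)
    }
```

Only exact matches after normalization are found in this way. Near matches with small differences in spelling would still have to be reviewed by hand, as was done in the study.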
Conclusion and further research
The results of the research indicate that the methods outlined here for investigating the national
historical blogosphere are useful overall. This paper
aimed to contribute to the growing amount of literature on blogs and the blogosphere by
proposing new methods to empirically investigate transitions in the historical blogosphere
over time. In doing so, a method was developed and outlined to create a so-called structural
blogosphere due to the medium specific characteristics of the Internet Archive which allows
for the re-construction of a blogosphere on domain level and not on a post level. The
advantage of this method is that it allows for a ‘structural’ blogosphere analysis instead of an
‘issue’ or ‘event’ analysis. Questions that may be addressed with a structural analysis involve
software and platform analysis, in this case study used as a new way into the nationality of
blogs. We sought to contribute to the ‘where’ question of web content by turning it into
“where do Dutch bloggers blog?” by looking into TLD, platform and software usage. Our study
suggests that Dutch bloggers increasingly blog in the .nl space despite the more general
trend of software concentration and the dominance of actors like Blogger and WordPress.
Methodologically, the paper further developed three analytical techniques and
methods to study the national historical blogosphere: URL analysis, source code analysis
and hyperlink analysis. URLs are very rich information sources that often follow a certain
syntax which makes them very suitable for analysis. In this work we used URL analysis in
two ways: TLD analysis and platform analysis. With source code analysis we contribute to the
study of software more generally and the study of national software specifically. The method
developed provides insight into what software powers a blogosphere. Further research may
include a fine-grained feature analysis over time, placing special emphasis on collaborative
and discursive features such as the comment, plugins and the permalink. Our contribution to
link analysis considers ways to treat the big platform nodes in network visualizations. We
propose two methods: the first is an actor definition and the second is a fine-grained social media
analysis. Whereas traditionally the host is considered the actor, with platforms the actor, or
blogger in this case, is often defined after the slash. By detecting URL patterns, new actor
definitions may be operationalized before co-link analysis. The fine-grained social media
analysis is similar in technique, but instead of only looking for actors, it is aimed at
distinguishing actor links from content links. The analysis is performed after co-link analysis.
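The idea of defining the actor "after the slash" can be illustrated with a short sketch. It is not the custom software used in the project. The set of hosts whose first path segment names the actor, the tilde rule for personal homepages and the function names are assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of hosts where the first path segment identifies the actor.
PATH_ACTOR_HOSTS = {"twitter.com"}


def actor_of(url):
    """Return host/<first segment> where the path names the actor, else the host."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    # Personal homepages such as host/~name also carry the actor in the path.
    if segments and (host in PATH_ACTOR_HOSTS or segments[0].startswith("~")):
        return f"{host}/{segments[0].lower()}"
    return host


def node_sizes(urls, by_actor):
    """Count inlinks per node, with or without actor definition."""
    if by_actor:
        return Counter(actor_of(u) for u in urls)
    return Counter(actor_of(u).split("/")[0] for u in urls)
```

Without actor definition, every link to a platform is counted towards a single host node. With actor definition, the same links are distributed over the individual user pages, which is why platform nodes shrink in the second map.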
Research questions we would like to answer in the future concern to what extent
blog-native software features such as the permalink, trackback or RSS feed contributed to the
construction of the early blogosphere and later transitions in the blogosphere, in other
words, how changing linking practices relate to new features introduced by blog software.
We would also like to further develop our hyperlink analysis by looking into different types
of hyperlinks, besides the <a href> link now captured from the front pages, by distinguishing
between ‘traditional’ hyperlinks, embed codes and social buttons and plugins in a hyperlink
analysis.
We would further like to explore how national blogospheres are defined and
demarcated by enriching our research with content analysis, with a special interest in
language transitions in the Dutch blogosphere. Content clusters do not only arise from linking
practices but may also be defined through the common language they use. The Dutch
blogosphere may further be analyzed through its usage of web words, choosing its most
distinctive and significant ones as points of departure, as for example Retecool’s jargon or the
specific language used by GeenStijl. A final wish is to further develop the migration technique
used as a method for studying platform diversification and possibly migration.
About the authors
Anne Helmond is a PhD candidate with the Digital Methods Initiative, the new media PhD
program at the Department of Media Studies, University of Amsterdam. In her research she
focuses on software-engine relations in the blogosphere and cross-syndication politics in
social media. She also teaches new media courses in the Media Studies department.
Contact: anne@digitalmethods.net
Esther Weltevrede is a PhD candidate with the Digital Methods Initiative, the new media PhD
program at the Department of Media Studies, University of Amsterdam, where she also
teaches. Esther’s research interests include national web studies as well as platform and
engine politics. Additionally, Esther has been coordinating the DMI Summer schools and is
also a member of Govcom.org, a foundation dedicated to creating political web tools. Contact:
esther@digitalmethods.net
Acknowledgments
We would like to express our sincere thanks to Erik Borra for developing custom tools,
discussing methods and posing sharp questions, Mathieu Jacomy for his help with creating
Gephi maps and giving access to the Atlas tool for analyzing maps, and Jan-Willem Hiddink &
Robert-Reinder Nederhoed for providing a database dump from the Loglijst.
Tools
https://tools.digitalmethods.net
http://gephi.org/
Wiki
Data and detailed methodological sheets for the projects will be made available shortly on
https://wiki.digitalmethods.net/Dmi/DutchBlogosphere.
References
Ammann, Rudolf. 2009a. 'Blogosphere 1998: Analysis.' Tawawa.org (2009) Available online at:
<http://tawawa.org/ark/2009/11/5/blogosphere-1998-analysis.html> Retrieved 1 May 2011.
Benkler, Yochai and Aaron Shaw. ‘A tale of two blogospheres: Discursive practices on the left
and right.’ Berkman Center for Internet and Society Working Paper Series (2010)
Blood, Rebecca. We've Got Blog: How Weblogs Are Changing Our Culture. Cambridge, MA:
Perseus, 2002.
Blood. ‘How blogging software reshapes the online community.’ Communications of the ACM
(2004) vol. 47 (12) pp. 53-55
boyd, danah. “A Blogger’s Blog: Exploring the Definition of a Medium.” Reconstruction (2006)
vol. 6 (4). <http://reconstruction.eserver.org/064/boyd.shtml>
Bruns, Axel. ‘Methodologies for mapping the political blogosphere: An exploration using the
IssueCrawler research tool.’ First Monday (2007) vol. 12 (5)
Bruns and Kirchhoff. ‘Mapping the Australian political blogosphere.’ In WebSci '09, 18-20
Mar. 2009, Athens, Greece.
Bross et al. ‘Mapping the blogosphere with rss-feeds.’ 24th IEEE International Conference on
Advanced Information Networking and Applications (2010)
Du and Wagner. ‘Weblog success: Exploring the role of technology.’ International Journal of
Human-Computer Studies (2006) vol. 64 (9) pp. 789-798
Graham, Brad L. ‘Friday, September 10, 1999.’ Bradlands. Available online at:
<http://www.bradlands.com/weblog/comments/september_10_1999/> Retrieved 3 May
2011.
Highfield, Timothy. ‘Which way up? Reading and drawing maps of the blogosphere.’
Ejournalist (2009) vol. 9(1) pp. 99-114.
Hourihan, Meg. ‘What We're Doing When We Blog.’ 2002. Available online at:
<http://oreilly.com/pub/a/javascript/2002/06/13/megnut.html> Retrieved 1 May 2011.
Kumar et al. ‘Structure and evolution of blogspace.’ Communications of the ACM (2004) vol. 47
(12) pp. 35-39
Lovink, Geert. Zero Comments: Blogging and Critical Internet Culture. New York: Routledge,
2007.
Meeuwsen, Frank. Bloghelden. Utrecht: AW Bruna, 2010.
Quick, William. ‘Tuesday, January 01, 2002.’ Daily Pundit. Available online at:
<http://replay.web.archive.org/20020603131137/http://www.iw3p.com/DailyPundit/2001_12_30_dailypundit_archive.php#8315120>
Retrieved 4 May 2011.
Rosenberg, Scott. Say Everything. New York: Crown Publishing Group, 2009.
Rogers, Richard. ‘Old and New Media: Competition and Political Space’ Theory & Event (2005)
vol. 8 (2)
Schaap, Frank. ‘Links, Lives, Logs: Presentation in the Dutch Blogosphere’ in Gurak et al.
(eds), Into the Blogosphere. Rhetoric, Community and Culture of Weblogs (2004). Available
online at <http://blog.lib.umn.edu/blogosphere/> Retrieved 4 May 2011.
Schmidt, Jan. ‘Blogging practices: An analytical framework.’ Journal of Computer-Mediated
Communication (2007) vol. 12 (4)
SIDN. ‘Jaarverslag 2010: Het jaar dat internet het nieuws beheerste.’ SIDN (2010). Available
online at: https://www.sidn.nl/fileadmin/docs/PDF-files_NL/SIDN_Jaarverslag_2010.pdf
Retrieved 4 May 2011.
SIDN. ‘SIDN kondigt uitfasering persoonsdomeinnamen aan.’ SIDN (2007). Available online
at: <https://www.sidn.nl/nieuws/nieuwsbericht/article/sidn-kondigt-uitfaseringpersoonsdomeinnamen-aan/> Retrieved 4 May 2011.
Stevenson, Michael. ‘The archived blogosphere: exploring web historical methods using the
Internet Archive.’ Paper presented at Digital Methods mini-conference, University of
Amsterdam, January 2010.
Weltevrede, E., Thinking Nationally with the Web: A Medium-Specific Approach to the
National Turn in Web Archiving. M.A thesis, University of Amsterdam, 2009.
Wikipedia. ccTLD. Available online at:
<http://en.wikipedia.org/wiki/Country_code_top-level_domain#Commercial_and_vanity_use>
Retrieved 4 May 2011.