
DRAFT - please do not cite without permission of the authors.

Where do bloggers blog? Studying platform transitions within the Dutch blogosphere

Esther Weltevrede and Anne Helmond

Paper Presentation. MiT7 unstable platforms: the promise and peril of transition

Abstract

The blogosphere has played an instrumental role in the transition and evolution of linking technologies and practices. This research traces and maps historical transitions of the Dutch blogosphere and the glue that creates interconnections between blogs, which, traditionally considered, turn the collective of blogs into a blogosphere. This paper aims to problematize the definition of the blogosphere by questioning which actors form the blogosphere through its interconnections. Blogs included in the Loglijst, an early manual initiative to index the Dutch blogosphere, as well as in several other expert lists, serve as starting points to be retrieved from the Internet Archive. Archives have become indispensable tools for studying early web cultures. Whereas the Internet Archive’s interface, the Wayback Machine, privileges single-site histories, this research aims to repurpose the Wayback Machine to trace and map transitions in linking technologies and practices in the blogosphere over time, using digital methods and custom software. We are thus able to create yearly network visualizations of the historical Dutch blogosphere (1999-2009). This approach allows us to study the evolution of linking practices, and suggests that particular blogging practices can be distinguished through the distinct linking patterns of linklogs, lifelogs and platformlogs. Moreover, it allows us not only to study the emergence and decline of blog platforms and social media platforms within the blogosphere, but also to investigate whether particular linking technologies or practices are specific to local blog cultures.

Introduction: Why map the blogosphere?
The blogosphere has played an instrumental role in the transition and evolution of linking technologies and practices. Think, for example, of the introduction and development of the trackback, the pingback and RSS, and of how bloggers used these to develop blogging as a distinct online culture. Important research in this area is practice-, event- or issue-based, and seeks to capture an otherwise fleeting phenomenon in the moment, before it is deleted, overwritten or no longer available. Now that the blogosphere has matured, the first historical accounts are being written. This study seeks to contribute to this body of literature by investigating the more structural platform and software infrastructure of the blogosphere. More specifically, we seek to contribute to empirical research on the national blogosphere by putting forward new methods to explore transitions in the historical Dutch blogosphere. Additionally, we consider the implications of these methods for thinking about transitions in blogs and the local blogosphere. Working from method, we aim to contribute to the more theoretical literature on blogs and the blogosphere, as choices in method shape the definition of both. In doing so, this paper both addresses methodological questions related to empirically researching the national historical blogosphere and presents findings of initial research into the Dutch blogosphere. The approach put forward combines techniques used by search engines and web archive crawlers to discover and analyze the content of the entire web with editorial techniques commonly used in the humanities and social sciences. While preliminary, the research acts both as a proof of concept and as a model for studying national and historical blogospheres, and it provides new insights into the shape of the Dutch blogosphere and its interconnections.
The blogosphere is often studied by mapping and visualizing the interconnections between blogs to make it tangible and visible. Put differently, in order to become visible, the image of the blogosphere needs to be constructed. There are two common ways in which this image is constructed: first, by blogosphere-related services such as directories, web rings and blog search (Stevenson 2010: 11); second, by academic work producing network visualizations. Using techniques similar to those of contemporary blog-related services, current network visualizations are commonly constructed either by employing RSS feed crawlers to fetch the content of blogs (current and newly updated blog posts and their links) via their feeds (Bross 2010), or by using web crawlers such as the IssueCrawler, which performs issue network crawls based on hyperlink network analysis to identify “patterns of interconnections in the population of websites discovered in the process” (Bruns 2007: 1). While different tools and methods produce different network visualizations, they all provide graphical representations of interconnections and insights into the overall structure of the blogosphere and its actors (Highfield 2009). We posit that choices in method not only shape the blogosphere but thereby also the definition of the blogosphere and of blogs. Historical blogosphere research mainly consists of ethnographic research providing personal stories and anecdotes (Blood 2000; Rosenberg 2009), alongside empirical work. The latter includes work by web historian Michael Stevenson on the early A-list blogosphere, Rudolf Ammann’s (2009) research project on the birth of the blogosphere, and research by Ravi Kumar et al. on the structure and evolution of the 2004 LiveJournal blog space. Kumar et al. suggest a method based on blogrolls and timestamps to map a blog space over time. The Internet Archive provides a way into studying previous states of the web by offering timestamped snapshots of it.
Although the Wayback Machine privileges single-site histories, Internet Archive data may be used in a variety of ways. For example, Ammann studies the emerging blogosphere by mapping the linking patterns of early blogs with the Internet Archive. Stevenson outlines a method to repurpose the Internet Archive, whose interface is built around single-site histories because one can only look up single URLs, to create a custom archive by using the early blog index EatonWeb as a historical resource to ‘conjure up’ the blogosphere. Our research builds on the methods and tools described by Stevenson and develops a number of novel techniques and methods. First, historical blogosphere research is operationalized by constructing snapshots of the Dutch blogosphere, paying specific attention to reconfiguring actor definitions and reconsidering interlinking practices through fine-grained URL analysis and source code analysis. Second, we seek to enrich historical blogosphere analysis outside of the Anglo-American context, with a specific focus on the Dutch blogosphere. We seek to contribute to the definition of a “national blogosphere” by investigating the Dutchness of Top Level Domains, software and platforms. Finally, we aim to contribute to hyperlink network analysis and issue network analysis research by reconfiguring the actor definition.

1. Defining Dutch blogs: Where do bloggers blog?

What is a Dutch blog? The question of how to formally define the nationality of a website has been an object of attention in the web archiving community and is often answered by turning to locative technical indicators such as the IP address or the Top Level Domain (TLD). Importantly, indicators of location on the web are always ambiguous, and their usefulness depends highly on the purpose and application.
As an example, for the purpose of saving digital heritage for posterity, the Dutch web archiving institution has formulated three defining characteristics: language, TLD, and the more difficult-to-automate “subject matter related to the Netherlands” (Weltevrede 2009). For the definition of a Dutch blog in this research project, we initially rely on authoritative sources, and thus also on their selection criteria for including blogs in their lists. In a second step, we transform the question “what is a Dutch blog?” into “where do Dutch bloggers blog?” in order to enrich and complicate the understanding of the location of web content. The collection of blogs in our corpus is retrieved from a 2001 database dump, containing 631 unique blogs, of the Loglijst, an early Dutch blogosphere indexing initiative. In addition to this list, we compiled expert lists from interviews, books and authoritative lists found on the web and in the Internet Archive. These expert lists include the long-list nominations for the Dutch blog awards, the Dutch Bloggies, from 2001 to 2008; all blogs mentioned in two seminal pieces on the history of the Dutch blogosphere by Schaap (2005) and Meeuwsen (2010); and finally a list citing “Weblogs that really matter” in a December 2010 blog post [1] by Bert Brussen, blogger for the famous Dutch ‘shocklog’ Geenstijl. Relying on these sources for our collection of Dutch blogs led us to include a small number of Belgian blogs that our sources considered part of the Dutch blogosphere. From the above-mentioned expert lists we compiled a collection of blogs to serve as our starting points, and using custom tools we aimed to retrieve all of these blogs from the Internet Archive. We queried the Internet Archive’s new Wayback Machine for each blog URL and selected the result closest to the middle of the year.
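The yearly selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' custom tool: it assumes the capture timestamps for a blog have already been obtained (for instance from the Wayback Machine's CDX server, which is not queried here), takes 1 July as "the middle of the year", and only considers captures made within the year in question. The function names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional

# Wayback capture timestamps use the 14-digit form YYYYMMDDhhmmss.
WAYBACK_FORMAT = "%Y%m%d%H%M%S"


def mid_year(year: int) -> datetime:
    """Assumed definition of 'the middle of the year': 1 July, 00:00."""
    return datetime(year, 7, 1)


def closest_to_mid_year(timestamps: List[str], year: int) -> Optional[str]:
    """Return the capture from `year` whose timestamp is closest to mid-year.

    Captures from other years are ignored (an assumption). None is returned
    when a blog has no capture in that year, in which case the blog is left
    out of that year's collection of starting points.
    """
    target = mid_year(year)
    in_year = [t for t in timestamps if t.startswith(str(year))]
    if not in_year:
        return None
    return min(
        in_year,
        key=lambda t: abs(datetime.strptime(t, WAYBACK_FORMAT) - target),
    )


captures = ["20030115120000", "20030620083000", "20031201000000", "20020701000000"]
print(closest_to_mid_year(captures, 2003))  # 20030620083000
print(closest_to_mid_year(captures, 2004))  # None
```

Restricting the candidates to the year itself keeps each yearly snapshot from borrowing copies made in neighbouring years; whether the original tooling did the same is not stated in the text.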
This method creates a collection of archived copies of historical Dutch blogs for each year, all with a timestamp near the middle of the year. Only those blogs with a copy in the Internet Archive were retained for further analysis. Table 1 shows the number of blogs per year that were retrieved from the Internet Archive to serve as starting points [2].

Year:  1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
Blogs:   24  138  456  816  778  863  850  788  717  860  723

Table 1: Starting points retrieved from the Internet Archive per year

While attention has been paid to the role of blog software technology in relation to the popularity and success of weblogs (Du and Wagner 2006), and in relation to blogging practices as enabling or restricting certain actions (Schmidt 2007), we want to interrogate our starting points to address the question of ‘where’ bloggers blog by analyzing the TLDs, platforms and software they use. To our knowledge, this question has thus far been understudied and, where it has been addressed, it has been approached by analyzing the demographics of bloggers, such as the location provided in user profiles on blog platforms (Kumar et al. 2004). The limitation of this approach is that such information is optional and available on some platforms only. This study introduces an approach focused on online culture (national digital culture, if you will) that recognizes further specificity in Dutch online practice: software and platform use, as well as applications that persist despite Twitter, Technorati and other dominant services from the U.S. Our main effort is to describe Dutch blogging practices by paying specific attention to ‘where’ Dutch bloggers blog. In this study the question of ‘where’ is threefold and includes TLD analysis, platform analysis and self-hosted blog software analysis.

[1] http://www.dejaap.nl/2010/12/28/verplicht-in-uw-rss-reader-weblogs-die-er-echt-toe-doen/

TLD analysis: Where do bloggers blog?
The Top Level Domain (TLD) analysis presented here is part of a larger series of URL analysis methods discussed in this paper. As a first step, we counted the TLD use of our starting points per year by entering the URLs into the TLDCount tool [3] in batches corresponding to a single year. Next, these counts were copied into a Google Spreadsheet, where the absolute numbers of TLDs were transformed into relative numbers and visualized using the built-in chart generator. Figure 1 shows the relative distribution of TLD usage over time.

[2] Method described in detail: https://wiki.digitalmethods.net/Dmi/DutchBlogosphere
[3] https://tools.digitalmethods.net/beta/tldCounts/

The Dutch blogs in our collection favor the .nl domain over all other domains throughout the years. Moreover, a significant increase in the .nl domain is visible, whereas the .com domain steadily loses share over time. As our preliminary findings in the next section will show, this is related to Dutch bloggers moving away from .com blogging platforms such as Blogger’s Blogspot to Dutch .nl blogging platforms. The Dutch .nl domain is one of the five largest country code Top Level Domains (ccTLDs) in the world (SIDN 2010), which is also reflected in the Dutch blogs. It is, however, remarkable that the .nl domain is dominant from the beginning, because .nl domains only became available to private individuals in 2003. As a forerunner, individuals had been allowed since 2000 to register third-level domains in the form jansen.123.nl (SIDN 2007), but these domains were very uncommon and are absent from our collection of blogs. As mentioned previously, a number of .be blogs are present in the Dutch blog collection, and from 2000 onwards they remain steadily present. Furthermore, the peak of .tk domains in 2002 is notable. Dot.tk, “Renaming the Internet”, offers free domain names and includes URL redirection and forwarding services.
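The counting and normalization steps described at the start of this section can be approximated in a few lines. This is a hedged stand-in for the TLDCount tool and the spreadsheet step, not a reimplementation of either: it assumes that the TLD is simply the last label of a URL's host, and the function names are hypothetical. The example URLs are hosts mentioned in this paper, grouped under an arbitrary year key; they are not data from the study.

```python
from collections import Counter
from typing import Dict, List
from urllib.parse import urlparse


def tld(url: str) -> str:
    """Return the last label of the URL's host, e.g. 'nl' for www.dejaap.nl."""
    if "://" not in url:  # urlparse only fills in the host when a scheme is present
        url = "http://" + url
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    return host.rstrip(".").rsplit(".", 1)[-1]


def relative_tld_shares(urls_by_year: Dict[int, List[str]]) -> Dict[int, Dict[str, float]]:
    """Count TLDs per yearly batch and turn absolute counts into shares."""
    shares: Dict[int, Dict[str, float]] = {}
    for year, urls in urls_by_year.items():
        counts = Counter(tld(u) for u in urls)
        total = sum(counts.values())
        shares[year] = {t: n / total for t, n in counts.items()}
    return shares


batch = {
    2001: [
        "http://www.dejaap.nl/",
        "http://huizen.dds.nl/~wzweers/",
        "http://users.pandora.be/vrints.html",
        "http://alt0169.com/",
    ]
}
print(relative_tld_shares(batch))  # {2001: {'nl': 0.5, 'be': 0.25, 'com': 0.25}}
```

Taking only the last label treats third-level forms such as jansen.123.nl simply as .nl, which is sufficient for a per-TLD distribution like Figure 1.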
Finally, a number of domains are used unconventionally for “commercial or vanity” purposes, including .nu (Niue), marketed as ‘now’ in Dutch, and .is (Iceland), which is used as the verb ‘to be’ (Wikipedia/ccTLD).

Figure 1: Relative distribution of Top Level Domains (TLDs) in the Dutch blogs over time

Platform analysis

A second way to answer the question ‘Where do bloggers blog?’, which complements the TLD analysis, is to render visible the variety and proportion of blog platforms used in the Dutch blogosphere. The second URL analysis presented here requires a stronger focus on editorial skills. A list of blog platforms was initially compiled by reading URLs and writing down the blog platforms found. This requires basic background knowledge of blog platforms and an occasional look-up. Using Google Refine, “a power tool for working with messy data” [4], we ‘coded’ each of the blog platforms in GREL (Google Refine Expression Language) to automatically search, transform and count the platforms in our set of URLs [5]. The results are presented in Figure 2, a custom-made visualization combining the blog platform analysis with the self-hosted software analysis discussed in the next section. The graph shows the rise and popularization of Blogspot, the Blogger platform, in the early 2000s. The decline of Blogspot coincides with the rise of the Dutch blogging platform Web-Log.nl. The rise of Web-Log.nl is accompanied by the rise of other Dutch blog platforms such as BlogNL, Blogo, Blogse, Punt and Blogeiland. Figure 2 powerfully shows how, from 2004-2005 onwards, Dutch bloggers, except for a relatively small number of Blogspot and WordPress.com users, move to Dutch platforms, which are color-coded orange. A few bloggers remain on legacy platforms such as Pitas.com, which no longer accept new members but are still functional for existing members.
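The coding step, in which each platform is matched by an expression and then counted, can be illustrated with Python regular expressions in place of GREL. This is a minimal sketch, not the project's actual expressions, which are documented on the project wiki and not reproduced here. The host patterns below are illustrative assumptions, and so are the example URLs. It also computes the share of platform-hosted blogs sitting on Dutch platforms, the quantity plotted in Figure 3.

```python
import re
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Illustrative host patterns only: (platform, is Dutch platform, regex on host).
PLATFORM_PATTERNS = [
    ("Blogspot", False, re.compile(r"(^|\.)blogspot\.com$")),
    ("WordPress.com", False, re.compile(r"(^|\.)wordpress\.com$")),
    ("Pitas.com", False, re.compile(r"(^|\.)pitas\.com$")),
    ("Web-Log.nl", True, re.compile(r"(^|\.)web-log\.nl$")),
]
DUTCH_PLATFORMS = {name for name, dutch, _ in PLATFORM_PATTERNS if dutch}


def platform_of(url: str) -> Optional[str]:
    """Return the first platform whose pattern matches the URL's host."""
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    for name, _dutch, pattern in PLATFORM_PATTERNS:
        if pattern.search(host):
            return name
    return None


def count_platforms(urls: Iterable[str]) -> Counter:
    """Count platforms in one yearly batch; unmatched URLs count as 'other'."""
    return Counter(platform_of(u) or "other" for u in urls)


def dutch_platform_share(counts: Counter) -> float:
    """Share of platform-hosted blogs that sit on a Dutch platform."""
    on_platforms = sum(n for p, n in counts.items() if p != "other")
    if on_platforms == 0:
        return 0.0
    return sum(counts[p] for p in DUTCH_PLATFORMS) / on_platforms


urls = [  # made-up example URLs
    "http://jan.blogspot.com/",
    "http://piet.web-log.nl/",
    "http://kees.web-log.nl/",
    "http://www.dejaap.nl/",
]
counts = count_platforms(urls)
print(counts["Web-Log.nl"], counts["Blogspot"], counts["other"])  # 2 1 1
print(dutch_platform_share(counts))  # 0.6666666666666666
```

Anchoring each pattern to the end of the host, with a dot or the start of the string before it, prevents false positives such as a host that merely ends in the letters "blogspot.com".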
Dutch software and platforms play an important role in the Dutch blogosphere: between 2004 and 2009, over 40% of all bloggers using blog software or blog platforms were running Dutch software or were on Dutch platforms. When zooming into the use of platforms, almost all bloggers on blog platforms make use of Dutch platforms (see Figure 3).

[4] http://code.google.com/p/google-refine/
[5] Method described in detail: https://wiki.digitalmethods.net/Dmi/DutchBlogosphere

Figure 2: Relative distribution of self-hosted blog software & blog platforms in Dutch blogs

Figure 3: The relative amount of Dutch blog platforms over time compared to other blog platforms

Self-hosted software analysis

The question ‘Where do bloggers blog?’ was so far approached with a URL analysis to investigate the distribution of TLDs and platforms used in the Dutch blogosphere. Our findings suggest that the early Dutch bloggers, the founding fathers of the Dutch blogosphere, do not make use of blog platforms. In general, the early Dutch bloggers prefer either to create their blogs manually in HTML, which means they had to enforce reverse chronology by hand in order to place the latest blog post on top, or to use specifically designed self-hosted blog software. To include blogs running self-hosted blog software in our analysis of ‘where Dutch bloggers blog’, we developed a method that moves beyond the blog’s URL and instead searches the page’s source code for the blog software powering the blog, in order to carefully compile a list of blog software. Initially, the list was compiled by analyzing the Gephi [6] maps (see section 4), and it was iteratively enhanced with newly found software throughout the research project. In compiling the list of self-hosted software, we made use of the reflexive blog culture: bloggers typically tend to analyze and describe the practice of blogging (Hourihan 2002; Blood 2002).
Searching for our initial list of software thus led to blog posts comparing or mentioning different types of software (see Figure 4). For each year, we searched the source code of the collection of archived blog front pages for the presence of these blog software types. The SourceCodeSearch [7] tool has the option to return the query entered as well as the characters trailing the query. The results were checked editorially to establish whether the reference to the software meant that the blog was running on that software. Especially in the beginning, references to self-hosted blog software were not standardized. In later years, the ‘powered by’ button in the sidebar or footer became standard for most self-hosted software.

Figure 4: “Not tonight love / I’m busy playing around with weblog software”, 7 October 2002, by blogger users.pandora.be/vrints.html

[6] Gephi is open-source software for visualizing and analyzing large network graphs: http://gephi.org
[7] https://tools.issuecrawler.net/beta/SourceCodeSearch

In contrast to the blog platform counts, the self-hosted blog software results suggest that the Dutch blog software Pivot/PivotX has been powering Dutch blogs from the start and was the most used software in the heyday of Dutch blogging. The decline of Blogger, the first blog platform used by Dutch bloggers, coincides with the rise of Blogspot, Blogger’s platform. Furthermore, the bar graph shows a boost in blogs powered by WordPress.org from 2006 onwards. Movable Type and the Belgian Nucleus have a small but loyal share of bloggers running the software. In terms of blog software and blog platforms, the heyday of Dutch blogs was around 2005 for platforms and 2006 for software. Notably, self-hosted software outnumbers one-click publishing platforms, which the bloggers themselves did not expect.
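The source code search with trailing characters, described at the start of this section, can be sketched as follows. This is a hypothetical stand-in for the SourceCodeSearch tool, whose interface is not reproduced here: it assumes the archived front page is available as an HTML string and that matching is case-insensitive. The function name, the default context length and the example page are all made up for illustration.

```python
import re
from typing import List, Tuple


def source_code_search(html: str, queries: List[str], trailing: int = 40) -> List[Tuple[str, str]]:
    """Return (query, match plus trailing characters) for every occurrence.

    Matching is case-insensitive. The trailing context is what an editor
    reads to decide whether the page actually runs on that software or
    merely mentions it in a post.
    """
    hits = []
    for query in queries:
        for m in re.finditer(re.escape(query), html, flags=re.IGNORECASE):
            hits.append((query, html[m.start():m.end() + trailing]))
    return hits


page = ('<p>Playing with Movable Type tonight</p>'
        '<div id="footer">Powered by Pivot - 1.24</div>')
for query, context in source_code_search(page, ["powered by", "pivot", "movable type"], trailing=10):
    print(query, "->", context)
```

In this made-up page, the "movable type" hit comes from a post rather than from a footer, which is exactly the kind of case the editorial check is meant to filter out.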
Early bloggers wrote a number of posts expressing their fear that soon everybody would be blogging, as well as posts expressing rivalry between self-hosting bloggers and platform bloggers (see Figure 5). However, Figure 2 also clearly shows that the large majority of blogs neither run specifically designed blog software nor use blog platforms. A next step in further developing this methodology is to formalize the various types of references to software throughout the years and to design queries that automate the collection and analysis of the results.

Figure 5: Early Dutch blogger about the rise of free blog platforms [8]

[8] Translation: “Ah, the free blog services. How we, the bloggers of the first hour, with our own domain and a self-made site, despised them, the services that allowed you to put up a blog in a few clicks. Look at him, he has a blogspot, or worse, a web-log.nl, which we scornfully called a web-dash-log.” http://vandenb.com/archive/2009/10/10/898-woorden-mijn-webloggeschiedenis

In this first part we focused on designing methods to address the question ‘Where do bloggers blog?’ in order to enrich current methods used to determine the nationality of blogs, enhancing a TLD analysis with a platform and software analysis. In the following part we look into the interconnections between these blogs and put forward a method to create historical blogospheres.

2. Defining the blogosphere

The term ‘blogosphere’ was originally coined in 1999 by Brad L. Graham to mark the end of cyberspace (“Goodbye, cyberspace! Hello, blogiverse! Blogosphere? Blogmos?”) and was revived in 2001 by William Quick as “the intellectual cyberspace we bloggers occupy”, with an explicit reference to the blogosphere as a space for serious discourse. Echoing the idea of the blogosphere as a discursive space, “the imagined public sphere” (boyd 2006) was presented alongside the idea of blogs as counter-voices to mainstream media (Lovink 2007).
Besides the notion of the blogosphere as a space for discourse, other definitions stress the formal characteristics of the blogosphere as an interlinked set of blogs which “allows for the networked, decentralised, distributed discussion and deliberation on a wide range of topics” (Bruns and Kirchhoff 2010). A complementary approach to the blogosphere as an interlinked set of blogs looks at how blogs are “embedded into a much bigger picture: a segmented and independent public that dynamically evolves and functions according to its own rules and with ever-changing protagonists, a network also known as the 'blogosphere'” (Bross et al. 2010: 453). Following this line of thinking, in which blogs are embedded in a larger networked ecology with shifting protagonists, the blogosphere may also be defined by including the actors that blogs link to and thereby include in their networked ecology: “The notion of a miniblogsphere additionally rests on the extent to which the set of blogs doing an issue are interconnected by links and/or by textual referencing. Blogs also make [sic] be 'connected' together through common references to a third party, e.g., all blogs linking to or referencing a particular piece in the New York Times” (Rogers 2005). Although these two dominant approaches to researching the blogosphere may be distinguished by their object of research, they do not exclude each other, as demonstrated, for instance, by the US political blogosphere research of Benkler and Shaw (2010). Although it can be defined in highly formal terms, the blogosphere has more of a cultural than a technical meaning because, as the previous section has illustrated, there are many different blog platforms and blog software types available to customize how a blog is used. Our approach may initially be defined as formalistic, because the definition of the blogosphere follows from the method outlined below, which is based on link analysis.
However, by mapping the formal changes in linking patterns and URLs over time, we are able to suggest findings about specific local cultures of use. The annual blogospheres are created from a collection of blogs retrieved from the Internet Archive using custom tools. One of the consequences of studying transition with a static Internet Archive is that research is only possible at the front-page level and not at the post level. Thus, this method may be viewed as a more structural ‘blogosphere’ analysis rather than an ‘issue’ or ‘event’ analysis. While we are aware that the choice of starting points shapes the Dutch blogosphere, the methodology retains only those blogs found relevant by the other blogs. It is a co-link analysis, the analysis module used by the IssueCrawler [9]. The co-link analysis is performed in two steps: first, for each blog, all links at front-page level are extracted (one depth); subsequently, in Gephi, only nodes receiving at least two links from the starting points are kept in the network visualization (one iteration). In other words, the outlinks of the retrieved blogs are co-linked into a network, which means that nodes have to receive at least two links from the starting points to be retained. Whereas co-link analysis has been used most successfully for locating issue networks, in our case its result is that issue- or event-based links are excluded from the analysis. There are three main reasons for this. First, the starting points are not chosen because they share an issue or an interest in an event; rather, what they share is the practice of blogging in the Dutch web space. Second, only the front pages are crawled, which means that the more structural links (for example in blogrolls, to blog-related services and to blog software) are the stable variable in the analysis, whereas links in posts are only taken into account if they are present on the front page.
Third, the time frame of the network is one year. Combined with the previous point that links from posts are only crawled one level deep, the effect is that links to transient issues that may dominate the Dutch blogosphere for a short period of time are excluded, and only the more structural issues prevail. Studying a structural blogosphere follows the idea that blogs are embedded in a larger networked ecology created by bloggers through their linking practices, in which they also include actors other than blogs, for example blog portals, webrings, news websites and social media platforms. In the following, we describe how we constructed the Dutch blogosphere through the Internet Archive and prepared it for further analysis. Specific attention is paid to the process of construction by reconfiguring actor definitions and reconsidering interlinking practices. We further develop methods to study transitions in the historical blogosphere with the static Internet Archive. Methodologically, we contribute in three ways: first, by refining network analysis with an “actor definition” using the Gephi and G Atlas software [10]; second, by introducing fine-grained URL analysis to study transitions in platform and TLD use; and third, by introducing source code analysis to study transitions in software use.

[9] A software tool that locates and visualizes networks on the web, see: www.issuecrawler.net
[10] http://ediasporas.ticmigrations.fr/?lang=en

3. Actor definition

As previously described, we retrieved snapshots of our blogs from 1999 to 2009 through the Internet Archive, extracted their outlinks at front-page level and put the results in Gephi’s GEXF format. In Gephi, a simplified version of co-link analysis is performed so that only blogs that receive at least two links from our starting list are kept. Co-link analysis is performed at the ‘by site’ level, which is more lenient than the ‘by page’ option because it counts all links from site to site.
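The co-link step at the ‘by site’ level can be sketched as follows. This is a simplified illustration, not the IssueCrawler or Gephi procedure itself. It rests on two assumptions: "at least two links" is read as links from at least two distinct starting points, and links from a blog to its own host are ignored. The function names and example front pages are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Collapse a (deep) URL to its host: the 'by site' level."""
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).hostname or ""


def colink(outlinks: Dict[str, Iterable[str]], min_sources: int = 2) -> Dict[str, Set[str]]:
    """Keep hosts that receive links from at least `min_sources` starting points.

    `outlinks` maps each starting point to the outlinks found on its archived
    front page (one depth). Each starting point counts once per target host,
    however many links it has to that host; self-links are ignored.
    """
    sources = defaultdict(set)
    for start, urls in outlinks.items():
        start_host = host_of(start)
        for url in urls:
            target = host_of(url)
            if target and target != start_host:
                sources[target].add(start_host)
    return {host: s for host, s in sources.items() if len(s) >= min_sources}


outlinks = {  # made-up front pages
    "http://blog-a.example.nl/": ["http://stats.example.nl/count?id=1", "http://www.wired.com/news/"],
    "http://blog-b.example.nl/": ["http://stats.example.nl/", "http://blog-a.example.nl/archive.html"],
    "http://blog-c.example.nl/": ["http://www.wired.com/", "http://stats.example.nl/",
                                  "http://blog-c.example.nl/about"],
}
print(sorted(colink(outlinks)))  # ['stats.example.nl', 'www.wired.com']
```

In the example, blog-a.example.nl is dropped because only one starting point links to it, mirroring how known early blogs can be absent from the yearly maps.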
The co-link analysis is thus performed on hosts rather than on deep pages. An important methodological contribution is made to a common problem in online network visualizations: big platform nodes that take a prominent position in the graph. Analyses of such maps often suggest the conclusion that the debate is moving elsewhere (i.e., to social media). In an attempt to demystify the position of the big platform nodes in the Dutch blogosphere, we propose to redefine the nodes of the network as actors [11]. Most network analysis software treats the host, and in some cases the sub-host, as the actor. In our case, however, the ‘actor’, or blogger, is often defined after the slash. Think, for example, of the early bloggers who started blogging from their personal homepages, or of the recent microbloggers on Twitter. To identify nodes in the blogosphere as actors, we redefined what actors are at the URL level. This is not unproblematic, because not all URLs follow the same pattern. For instance, on most websites the ‘actor’ is defined by the host (e.g. example.com), whereas actors on blog platforms are usually defined by a subdomain in front of the host (e.g. example.blogger.com), actors on personal homepages are often defined by a ~ after the slash (e.g. xs4all.nl/~example), and microbloggers on Twitter are also defined after the slash (e.g. twitter.com/example). In the actor definition project we sought to formalize these ‘URL patterns’ in the network [12].

[11] See also section 4 on social media analysis
[12] https://wiki.digitalmethods.net/Dmi/DutchBlogosphere

4. Analysis of the Dutch blogosphere in transition

Mapping the outlinks of the blogs we retrieved from the Internet Archive from 1999 to 2009 allows us to go back in time and study how and where the Dutch blogosphere originated. Using the fine-grained actor definition, the network is visualized with Gephi for each year.
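The URL patterns for actor definition given in section 3 can be formalized roughly as follows. This is a hedged sketch, not the project's formalization, which is documented on its wiki. It rests on three assumptions: a leading "www." is dropped, Twitter is the only host listed where the actor is the first path segment, and any first path segment starting with "~" names a personal homepage. The function name is hypothetical.

```python
from urllib.parse import urlparse

# Hosts where the actor is the first path segment (assumed, illustrative list).
PATH_ACTOR_HOSTS = {"twitter.com"}


def actor_of(url: str) -> str:
    """Map a URL to an actor label following the URL patterns in section 3."""
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.startswith("www."):  # assumption: www.example.com == example.com
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    first = segments[0] if segments else ""
    # Personal homepages: the actor is the ~user directory after the slash.
    if first.startswith("~"):
        return f"{host}/{first}"
    # Microblogging: the actor is the username after the slash.
    if host in PATH_ACTOR_HOSTS and first:
        return f"{host}/{first}"
    # Default, including blog platforms with one subdomain per blogger.
    return host


for u in ["http://www.xs4all.nl/~example/index.html",
          "http://twitter.com/example/status/1",
          "http://example.blogger.com/2003/01/post.html",
          "http://www.wired.com/news/"]:
    print(actor_of(u))
```

Applying such a mapping before the co-link step splits a single large platform node (for example twitter.com) into its individual bloggers.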
Figure 6 shows the rise, evolution and first signs of decline of the Dutch blogosphere. Grey depicts the hyperlink network of all years collapsed, and red depicts the blogosphere of the given year. While the first Dutch bloggers started in mid-1999, they were not interlinked into a ‘sphere’, and we can trace the beginning of a structural Dutch blogosphere to 2000.

Figure 6: The Dutch blogosphere in transition

In 1999 there are only four nodes on the map (not displayed). They do not link to each other, but they are on the map because they receive at least two links from our selected starting points. The four nodes are Nedstat, Nedstatbasic, Wired and a Dutch linkdump blog by blogger Wessel Zweers [13]. A familiar node on the map is Wired, a technology magazine that was also prominent in the early US blogosphere (Stevenson 2010). The only Dutch blogger on the map is hosted on the Digital City, or De Digitale Stad (DDS), one of the oldest Dutch hosting services providing free personal homepages. Known Dutch blogs from that period, for example Sikkema, Prolific and Alt0169, are notably absent because they do not receive two links from the blogs in our starting list. The map in Figure 7 shows that some of the known Dutch bloggers, for instance those mentioned in Meeuwsen (2010), are present together with less well-known bloggers, but do not yet form a blogosphere. Most notably, Alt0169, ~Wzweers and ~Onnoz reach out to other Dutch blogs, which may be read as an effort to establish a community between blogs. Exemplary are links to blogs that list blogs, like Beboo.org/metalog, where the top 50 (international) blogs are listed.

[13] huizen.dds.nl/~wzweers

Figure 7: The pre-blogosphere in 1999. Early blogs linking outward.

Cluster analysis over time

The year 2000 shows the Dutch blogosphere for the first time (see Figure 8), dominated by bloggers on personal homepage providers (blue) and student pages (pink).
On the left side of the map there is a loosely defined news-tech cluster of Dutch news sites, surrounded by US and UK news and tech blogs. As in the early US blogosphere, tech and news are prominent in the Dutch blogosphere (Stevenson 2010). On the right side of the blogosphere there is a cluster of Dutch homepages (~) and student homepages. The free homepage provider DDS and the Dutch internet service provider XS4ALL are the most prominent homepage providers. The larger nodes in the center are the founding blogs of the Dutch blogosphere, such as Alt0169, Sikkema, S-lr, Smoel, Rikmulder, Tonie, Prolific, Pjoe, Stronk, Ben Bender, Vandenb and Retecool, which form a closely linked cluster. Alt0169.com, a heavy linker in 1999 that did not receive any links back, is a central node in 2000. Figure 9 shows the Dutch marketing cluster, which emerged in 2005 and remains a very dominant cluster in the Dutch blogosphere. Another distinct cluster in the later blogosphere is the Blog.nl cluster. It has a very distinct shape because all Blog.nl blogs list and link to the other blogs on that platform, as can be seen on the right in Figure 11.

Figure 8: The Dutch blogosphere in 2000. Blue: personal homepages. Pink: student pages. Yellow: blog platforms.

Figure 9: The Dutch marketing cluster in 2005.

Using the same method for coding blog platforms as in our platform analysis, we created several categories in order to follow specific transitions in the Dutch blogosphere. The following categories were created and coded in Google Refine: Homepages, University Homepages, Blog Related Services, Platforms, Social Media Platforms, Starting Points, Statistics. The categorization was created through expert URL reading and iteratively enhanced with new findings throughout the project. It allows us to color actors belonging to a specific category in Gephi, making it easier to locate actors and track changes over time.
This categorization allows us, as we demonstrate below, to look at, for example, the role of blog related services and social media in the blogosphere over time.

Blog related software: statistics

The blogosphere includes a variety of blog-related actors. The network is formed not only by the interconnections between blogs but also by the connections blogs make with other actors, through the external services they use for the content and monitoring of their blog, and through blog software that creates connections automatically. Blog related services include portals, manual and automatic blog indexers, external comment services and statistics providers. One of the most prominent nodes from 1999 onwards is Nedstat, the Dutch statistics provider. Nedstat, with its basic free service Nedstatbasic, provides webmasters and bloggers with statistics about their visitors, and it remains present in the blogosphere, together with other statistics providers, over time. Most bloggers made their statistics publicly available. This supports the claim that “the blogosphere is obsessed with measuring, counting, and feeding” (Lovink 2007: 30). Zooming into the node (see Figure 10) shows us all the bloggers linking to, and thereby presumably using, Nedstat as a statistics provider. Statistics providers expand their presence over time, in part through new actors that monitor natively digital objects introduced by blogs, such as the site feed, which allows users to subscribe to updates to the blog. Feedburner, which monitors the number of subscribers to a blog, is one of these new actors on the map.

Figure 10: Links to Nedstat.

Social media analysis

The early blogosphere is characterized by larger nodes such as Alt0169, Sikkema and Zweers, the founding fathers of the Dutch blogosphere.
The heyday of the Dutch blogosphere is characterized by the rise of specific clusters, such as the marketing cluster and the blog platform cluster of Blog.nl, and by the emergence of blog related services such as statistics providers. The late period is characterized by social media, the widgetized self and content links. In this social media research project, we set out to develop measures to analyze more closely the linking practices between blogs and social media. Frank Schaap (2005) empirically researched what he calls “the dichotomous nature of the Dutch blogosphere”, caused by the sharp division between two distinct weblog forms: the linklog and the lifelog. Building on this distinction, we propose the ‘platformlog’ as a third type of blog with particular characteristics. Whereas lifeloggers primarily post about their daily life in a diary style and link sparingly, mainly to their about page, their offline contexts and other bloggers, linkloggers link abundantly to other blogs and media in their role of pointing out the best of the web (Schaap 2005). The platformlog is characterized by embedding and linking content from social media platforms like Flickr, YouTube and Facebook, and by referring to the author's presence on these platforms in sidebar widgets. The platformlog is often used to present the widgetized self, or the self distributed across social media platforms. Whereas in the mid and late 1990s the self was defined on the personal homepage, and later on the blog, with the rise of social networking sites and content platforms the self is now also defined and performed elsewhere. Blog software popularized the widgetized self through drag-and-drop widgets that allowed bloggers to embed content from their other platforms into the sidebar of their blog.
The sidebar is no longer used only to link to other bloggers through the blogroll, but also to link to the self on other platforms, such as Last.fm for music, Flickr for photos and YouTube for videos. As our method collects outlinks from the front page and subsequently performs a co-link analysis, it may capture the widgetized self in the front-page sidebar as a new actor in the blogosphere. In traditional hyperlink analysis, social media nodes are disproportionately large because all references to a platform collapse into one node. Comparing the 2009 blogosphere with and without actor definition (Figure 11) makes clear that the social media platforms call for a more fine-grained analysis. Social media are the big nodes in the network without actor definition; with actor definition, however, the social media platforms seem to lose prominence in the blogosphere. We investigate this difference between the two methods and maps further by asking how it may be explained. This undertaking has similarities with Benkler and Shaw’s work on the U.S. political blogosphere, which seeks to analyze what is inside the large network nodes in order to specify their internal differences:

[...] earlier studies have counted DailyKos.com and Instapundit.com each as a single, highly connected node in a graph. Doing this masks the fundamental difference between how these two visible blogs function as discursive platforms. [...] Link analysis studies have treated both sites as the same phenomenon: a single node with a very large number of in-links and out-links (2010: 4).

Figure 11: Big social media nodes. The 2009 blogosphere with and without actor definition.

The research strategy is to further specify what is linked to within social media: user pages or content (e.g. video, photo, status update). Figure 12 shows the big social media platform nodes, with smaller nodes inside.
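One way to operationalize the split between user pages and content is a small rule set over URL paths. The sketch below is a hypothetical heuristic in Python, not the classification used for Figure 12; the URL patterns (youtube.com/watch?v=&lt;id&gt;, twitter.com/&lt;user&gt;/status/&lt;id&gt;, flickr.com/photos/&lt;user&gt;/&lt;id&gt;) are assumptions about typical platform URLs of the period and would need to be checked against the actual link data.

```python
from urllib.parse import urlparse, parse_qs


def classify_social_link(url: str) -> str:
    """Heuristically label a social media URL as 'content', 'user page' or 'other'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]

    if host == "youtube.com":
        # Watch pages (/watch?v=<id>) and old-style embeds (/v/<id>, /embed/<id>) are videos.
        if segments == ["watch"] and "v" in parse_qs(parsed.query):
            return "content"
        if len(segments) >= 2 and segments[0] in ("v", "embed"):
            return "content"
        if len(segments) >= 2 and segments[0] == "user":
            return "user page"
    elif host == "twitter.com":
        # /<user>/status/<id> is a single tweet; /<user> is the user page.
        if len(segments) >= 3 and segments[1] in ("status", "statuses"):
            return "content"
        if len(segments) == 1:
            return "user page"
    elif host == "flickr.com" and segments[:1] == ["photos"]:
        # /photos/<user>/<numeric id> is a single photo; other /photos/<user>
        # paths are treated as the user's page.
        if len(segments) >= 3 and segments[2].isdigit():
            return "content"
        if len(segments) >= 2:
            return "user page"
    return "other"
```

Counting the labels per platform (for instance with `collections.Counter`) then gives the kind of breakdown shown inside the platform nodes.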
Comparing the various social media platforms, the results suggest that some platforms may be defined as ‘media sharing’ platforms, such as YouTube and Flickr, whose incoming links mainly consist of content embedded in blogs. In the blogosphere map with actor definition, these nodes decrease in size. Facebook is a relatively small node in the Dutch blogosphere, and the links it receives dissolve into a diverse set of profiles, pages, apps, events and groups. Hyves, the Dutch social network that at the time of writing still leads Facebook in the Netherlands (see http://www.comscore.com/Press_Events/Press_Releases/2011/4/The_Netherlands_Ranks_number_one_Worldwide_in_Penetration_for_Twitter_and_LinkedIn), is one of the smallest social media references. Although the Dutch blogosphere has a preference for Dutch software and platforms, this is not reflected in links to social media platforms. Twitter, the largest node in the network, is a platform that mainly receives links to user pages. This means that bloggers refer to themselves or to friends on the micro-blogging platform.

Figure 12: Social media in the 2009 Dutch blogosphere. A fine-grained URL analysis of the big social media nodes. References to social media platforms demystified.

Link analysis as it has been used has clear limitations for analyzing the share of social media in blogosphere networks. Our study suggests that the uniformly large platform nodes are misleading. Similar to Benkler and Shaw’s study, it raises a concern with link analysis that zooms out to look at platforms as a whole by treating the entire platform domain as the node, and in doing so effaces the individual content link and the individual author. The platform nodes require more nuanced exploration. From this undertaking, the next step is to find out to what extent bloggers mainly self-reference their user page on Twitter as a widget of the self. Using Google Refine, we developed a methodology to study platform migration.
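A minimal Python sketch of such a matching procedure reduces each Twitter user-page URL to a normalized user name and each blog URL to a bare domain name, then compares the two. It is a hypothetical re-implementation for illustration, not the Google Refine procedure itself. The list of hosting platforms whose subdomain is taken as the blog name, and the choice to strip all non-alphanumeric characters rather than only the underscore, are assumptions; near-matches with small spelling differences would still have to be checked by eye.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of hosting platforms where the blog name is the subdomain.
HOSTED_PLATFORMS = ("blogspot.com", "wordpress.com", "blog.nl")


def normalize(name: str) -> str:
    """Lowercase and drop characters such as underscores, hyphens and dots."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def twitter_name(url: str) -> str:
    """Chop off the twitter.com part and keep the user name."""
    path = urlparse(url).path.strip("/")
    return normalize(path.split("/")[0])


def blog_name(url: str) -> str:
    """Reduce a blog URL to its bare domain name, without the TLD."""
    host = urlparse(url).netloc.lower()
    for platform in HOSTED_PLATFORMS:
        if host.endswith("." + platform):
            # someone.blogspot.com -> someone
            return normalize(host[: -len(platform) - 1].split(".")[-1])
    labels = host.split(".")
    # www.janjansen.nl or blog.janjansen.nl -> janjansen
    return normalize(labels[-2] if len(labels) >= 2 else labels[0])


def self_references(links):
    """Return the blogs that link to a Twitter user page carrying their own name."""
    return {blog for blog, twitter_url in links
            if twitter_name(twitter_url) == blog_name(blog)}
```

Here `links` is assumed to be an iterable of (blog URL, Twitter user-page URL) pairs taken from the front-page outlinks.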
For the Twitter users, we chopped off http://twitter.com, lowercased the user names and removed characters such as the underscore. Similarly, for the blogs we retained only the domain name, and subsequently matched the Twitter user name with the blog domain name. The remaining user names were eyeballed for small differences in spelling and added to the list where they matched. Of the 160 unique bloggers who link to Twitter user pages, 98 link (among others) to their own user page. For Twitter at least, this supports the claim that the widgetized self can be found in the sidebar as a new actor in the blogosphere.

Conclusion and further research

The results indicate that the methods outlined here for investigating the national historical blogosphere are, overall, useful. This paper aimed to contribute to the growing literature on blogs and the blogosphere by proposing new methods to empirically investigate transitions in the historical blogosphere over time. In doing so, we developed and outlined a method to create a so-called structural blogosphere: owing to the medium-specific characteristics of the Internet Archive, the blogosphere can be reconstructed at the domain level rather than at the post level. The advantage of this method is that it allows for a ‘structural’ blogosphere analysis instead of an ‘issue’ or ‘event’ analysis. Questions that may be addressed with a structural analysis involve software and platform analysis, used in this case study as a new way into the nationality of blogs. We sought to contribute to the ‘where’ question of web content by turning it into “where do Dutch bloggers blog?”, looking into TLD, platform and software usage. Our study suggests that Dutch bloggers increasingly blog in the .nl space despite the more general trend of software concentration and the dominance of actors like Blogger and WordPress.
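At its simplest, the TLD analysis behind this finding counts, per year, the share of blogs under each top-level domain. The following Python sketch assumes a hypothetical input that maps each year to the blog URLs found in that year's network; it uses only the last label of the host name, so second-level constructions such as .co.uk are not handled, and a Dutch blog on blogspot.com counts as .com, which is precisely the distinction the analysis is after.

```python
from collections import Counter
from urllib.parse import urlparse


def tld(url: str) -> str:
    """Return the last label of the host name, e.g. 'nl' or 'com'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def tld_shares(blogs_by_year: dict) -> dict:
    """For each year, return the share of distinct blog URLs per top-level domain."""
    shares = {}
    for year, urls in blogs_by_year.items():
        counts = Counter(tld(u) for u in set(urls))
        total = sum(counts.values())
        shares[year] = {t: n / total for t, n in counts.items()}
    return shares
```

Distinct blog URLs rather than hosts are counted, so several homepages on a shared host such as huizen.dds.nl each count as a separate blog.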
Methodologically, the paper further developed three analytical techniques to study the national historical blogosphere: URL analysis, source code analysis and hyperlink analysis. URLs are very rich information sources that often follow a certain syntax, which makes them very suitable for analysis. In this work we used URL analysis in two ways: TLD analysis and platform analysis. With source code analysis we contribute to the study of software more generally and to the study of national software specifically. The method developed provides insight into what software powers a blogosphere. Further research may include a fine-grained feature analysis over time, placing special emphasis on collaborative and discursive features such as comments, plugins and the permalink. Our contribution to link analysis considers ways to treat the big platform nodes in network visualizations. We propose two methods: first, an actor definition, and second, a fine-grained social media analysis. Whereas traditionally the host is considered the actor, on platforms the actor, or blogger in this case, is often defined after the slash. By detecting URL patterns, new actor definitions may be operationalized before co-link analysis. The fine-grained social media analysis is similar in technique, but instead of only looking for actors, it aims to distinguish actor links from content links; it is performed after co-link analysis. Research questions we would like to answer in the future concern the extent to which blog-native software features such as the permalink, trackback or RSS feed contributed to the construction of the early blogosphere and to later transitions in it; in other words, how changing linking practices relate to new features introduced by blog software.
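The actor definition and the subsequent co-link step can be sketched together in Python. This is a hypothetical illustration, not the custom tools used in the project: the table of platforms and path depths is an assumption, and the inclusion rule (an actor must receive links from at least two starting points, as described for the 1999 map) is read here as two distinct sources, with self-links ignored.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical actor definitions: how many path segments identify the actor
# on a platform (twitter.com/<user>, flickr.com/photos/<user>).
ACTOR_DEPTH = {"twitter.com": 1, "flickr.com": 2}


def actor(url: str) -> str:
    """Return the host, or host plus the user part 'after the slash' on known platforms."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]
    depth = ACTOR_DEPTH.get(host)
    if depth and len(segments) >= depth:
        return "/".join([host] + segments[:depth])
    return host


def colink_actors(outlinks: dict, min_sources: int = 2) -> set:
    """Keep actors linked to by at least `min_sources` distinct starting points."""
    sources = defaultdict(set)
    for start_url, targets in outlinks.items():
        source = actor(start_url)
        for target_url in targets:
            target = actor(target_url)
            if target != source:  # ignore self-links
                sources[target].add(source)
    return {a for a, s in sources.items() if len(s) >= min_sources}
```

Setting `ACTOR_DEPTH` to an empty dict reproduces the traditional host-level analysis, in which twitter.com/jan and twitter.com/piet collapse into a single large twitter.com node.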
We would also like to further develop our hyperlink analysis by looking into types of hyperlinks other than the <a href> links now captured from the front pages, distinguishing between ‘traditional’ hyperlinks, embed codes, and social buttons and plugins. We would further like to explore how national blogospheres are defined and demarcated by enriching our research with content analysis, with a special interest in language transitions in the Dutch blogosphere. Content clusters do not only arise from linking practices but may also be defined through the language they share. The Dutch blogosphere may further be analyzed through its usage of web words, choosing its most distinctive and significant ones as points of departure, for example Retecool’s jargon or the specific language used by GeenStijl. A final wish is to further develop the migration technique used here as a method for studying platform diversification and possibly migration.

About the authors

Anne Helmond is a PhD candidate with the Digital Methods Initiative, the new media PhD program at the Department of Media Studies, University of Amsterdam. In her research she focuses on software-engine relations in the blogosphere and cross-syndication politics in social media. She also teaches new media courses in the Media Studies department. Contact: anne@digitalmethods.net

Esther Weltevrede is a PhD candidate with the Digital Methods Initiative, the new media PhD program at the Department of Media Studies, University of Amsterdam, where she also teaches. Esther’s research interests include national web studies as well as platform and engine politics. Additionally, Esther has been coordinating the DMI Summer Schools and is a member of Govcom.org, a foundation dedicated to creating political web tools.
Contact: esther@digitalmethods.net

Acknowledgments

We would like to express our sincere thanks to Erik Borra for developing custom tools, discussing methods and asking sharp, probing questions; to Mathieu Jacomy for his help with creating Gephi maps and for giving access to the Atlas tool for analyzing maps; and to Jan-Willem Hiddink and Robert-Reinder Nederhoed for providing a database dump of the Loglijst.

Tools
https://tools.digitalmethods.net
http://gephi.org/

Wiki
Data and detailed methodological sheets for the projects will be made available shortly on https://wiki.digitalmethods.net/Dmi/DutchBlogosphere.

References
Ammann, Rudolf. ‘Blogosphere 1998: Analysis.’ Tawawa.org (2009a). Available online at: <http://tawawa.org/ark/2009/11/5/blogosphere-1998-analysis.html> Retrieved 1 May 2011.
Benkler, Yochai and Aaron Shaw. ‘A tale of two blogospheres: Discursive practices on the left and right.’ Berkman Center for Internet and Society Working Paper Series (2010).
Blood, Rebecca. We've Got Blog: How Weblogs Are Changing Our Culture. Cambridge, MA: Perseus, 2002.
Blood, Rebecca. ‘How blogging software reshapes the online community.’ Communications of the ACM (2004) vol. 47 (12) pp. 53-55.
boyd, danah. ‘A Blogger’s Blog: Exploring the Definition of a Medium.’ Reconstruction (2006) vol. 6 (4). Available online at: <http://reconstruction.eserver.org/064/boyd.shtml>
Bross et al. ‘Mapping the blogosphere with rss-feeds.’ 24th IEEE International Conference on Advanced Information Networking and Applications (2010).
Bruns, Axel. ‘Methodologies for mapping the political blogosphere: An exploration using the IssueCrawler research tool.’ First Monday (2007) vol. 12 (5).
Bruns and Kirchhoff. ‘Mapping the Australian political blogosphere.’ In WebSci '09, 18-20 Mar. 2009, Athens, Greece.
Du and Wagner. ‘Weblog success: Exploring the role of technology.’ International Journal of Human-Computer Studies (2006) vol. 64 (9) pp. 789-798.
Graham, Brad L. ‘Friday, September 10, 1999.’ Bradlands. Available online at: <http://www.bradlands.com/weblog/comments/september_10_1999/> Retrieved 3 May 2011.
Highfield, Timothy. ‘Which way up? Reading and drawing maps of the blogosphere.’ Ejournalist (2009) vol. 9 (1) pp. 99-114.
Hourihan, Meg. ‘What We're Doing When We Blog.’ (2002). Available online at: <http://oreilly.com/pub/a/javascript/2002/06/13/megnut.html> Retrieved 1 May 2011.
Kumar et al. ‘Structure and evolution of blogspace.’ Communications of the ACM (2004) vol. 47 (12) pp. 35-39.
Lovink, Geert. Zero Comments: Blogging and Critical Internet Culture. New York: Routledge, 2007.
Meeuwsen, Frank. Bloghelden [Blog Heroes]. Utrecht: AW Bruna, 2010.
Quick, William. ‘Tuesday, January 01, 2002.’ Daily Pundit. Available online at: <http://replay.web.archive.org/20020603131137/http://www.iw3p.com/DailyPundit/2001_12_30_dailypundit_archive.php#8315120> Retrieved 4 May 2011.
Rogers, Richard. ‘Old and New Media: Competition and Political Space.’ Theory & Event (2005) vol. 8 (2).
Rosenberg, Scott. Say Everything. New York: Crown Publishing Group, 2009.
Schaap, Frank. ‘Links, Lives, Logs: Presentation in the Dutch Blogosphere.’ In Gurak et al. (eds), Into the Blogosphere: Rhetoric, Community and Culture of Weblogs (2004). Available online at: <http://blog.lib.umn.edu/blogosphere/> Retrieved 4 May 2011.
Schmidt, Jan. ‘Blogging practices: An analytical framework.’ Journal of Computer-Mediated Communication (2007) vol. 12 (4).
SIDN. ‘Jaarverslag 2010: Het jaar dat internet het nieuws beheerste.’ [Annual report 2010: The year the internet dominated the news.] SIDN (2010). Available online at: <https://www.sidn.nl/fileadmin/docs/PDF-files_NL/SIDN_Jaarverslag_2010.pdf> Retrieved 4 May 2011.
SIDN. ‘SIDN kondigt uitfasering persoonsdomeinnamen aan.’ [SIDN announces phase-out of personal domain names.] SIDN (2007). Available online at: <https://www.sidn.nl/nieuws/nieuwsbericht/article/sidn-kondigt-uitfasering-persoonsdomeinnamen-aan/> Retrieved 4 May 2011.
Stevenson, Michael. ‘The archived blogosphere: exploring web historical methods using the Internet Archive.’ Paper presented at Digital Methods mini-conference, University of Amsterdam, January 2010.
Weltevrede, Esther. Thinking Nationally with the Web: A Medium-Specific Approach to the National Turn in Web Archiving. MA thesis, University of Amsterdam, 2009.
Wikipedia. ‘ccTLD.’ Available online at: <http://en.wikipedia.org/wiki/Country_code_top-level_domain#Commercial_and_vanity_use> Retrieved 4 May 2011.