Dynamics of Open Source
VOL. 96 NO. 2 THE ROOTS OF INNOVATION 115
the possibility of multiple equilibria. The same project may attract few programmers because programmers expect other programmers will not be interested; or it may flourish as programmers (rationally) have faith in the project.

When we consider the delayed rewards of working on an open-source project, the ability to signal a high level of competence may be stronger in the open-source mode for three reasons. First, in an open-source project, outsiders can see the contribution of each individual, whether that component “worked,” whether the task was hard, if the problem was addressed in a clever way, whether the code can be useful for other programming tasks in the future, and so forth. Second, the open-source programmer takes full responsibility for the success of a subproject, with little interference from a superior, which generates information about ability to follow through with a task. Finally, since many elements of the source code are shared across open-source projects, more of the knowledge they have accumulated can be transferred to new environments, which makes programmers more valuable to future employers. To compare programmers’ incentives in the open source and proprietary settings, we need to examine how the features of the two environments shape incentives. From the standpoint of the individual, commercial projects typically offer better current compensation than open-source projects, because employers are willing to offer salaries to software programmers with the expectation that they will capture a return from a proprietary project.

Commercial companies may interact with an open-source project in a number of ways. While improvements in the open-source software are not appropriable, commercial companies can benefit if they also offer expertise in some proprietary segment of the market that is complementary to the open-source program. Firms may temporarily encourage their programmers to participate in open-source projects to learn about the strengths and weaknesses of this development approach. For-profit firms may compete directly with open-source providers in the same market. Firms may also be able to learn about potential employees when their staff interacts with open-source programmers. Finally, commercial companies may interface with the open-source world because it generates good public relations with programmers and customers.

A for-profit firm that seeks to provide services and products that are complementary to the open-source product, but are not supplied efficiently by the open-source community, can be referred to as “living symbiotically.” IBM, which has made open-source software into a major focus of its consulting business, exemplifies this approach. A commercial company in this situation will want to have extensive knowledge about the open-source movement and may even want to encourage and subsidize open-source contributions, both of which may cause it to allocate some programmers to the open-source project. Because firms do not capture all the benefits of the investments in the open-source project, however, the free-rider problem often discussed in the economics of innovation should apply here as well. Subsidies by commercial companies for open-source projects should remain somewhat limited.

The code-release strategy arises when companies release some existing proprietary code and then create a governance structure for the resulting open-source development process. This strategy is akin to giving away the razor (the released code) to sell more razor blades (for instance, the related consulting services that IBM and HP hope to provide). In general, it will make sense for a commercial company to release proprietary code under an open-source license if the increase in profit in the proprietary complementary segment offsets any profit that would have been made in the primary segment, had it not been converted to open source. Thus, the temptation to go open source is particularly strong when the product is lagging behind the leader and making few profits, but the firm sees a possibility that if the released code becomes the center of an open-source project and is utilized more widely, the profitability of the complementary segment will increase.

II. The Sample

We built a panel dataset of the contributors to approximately 100 open-source projects (for full details on the dataset, see Lerner et al., 2006). These projects are stratified to overrepresent the largest open-source projects. We extract
116 AEA PAPERS AND PROCEEDINGS MAY 2006
Sample:
  20 large projects                Started tracking: 05-2001
  78 randomly selected projects    Ended tracking:   07-2004
  98 total projects

                                 min                    median           max
Lines of source code*            1,253 (Wings 3D)       81,671 (jEdit)   4,032,921 (Linux)
Absolute change in source code*  −145,395 (AOLServer)   18,951 (Licq)    1,628,979 (Linux)
Number of new versions           1 (Dev-C++, Imprints,  8 (Koffice)      20 (Wine, BZFlag,
                                   Kxicq)                                   glibc, KDE)
the contributors to the project in each new official version of the program that has been released, using a variety of text editing tools.

Table 1 summarizes the projects and highlights that they differ considerably in their size and other characteristics. Open-source projects periodically introduce new versions. The number of versions introduced between the beginning of data collection and July 2004 varies between one and 20.² We obtained information on the projects from SourceForge (the leading on-line depository of open-source projects), press searches, and project Web sites. Key information includes the type of license of the project, whether venture capitalists funded the company, and whether a corporation released some of its code as an open-source project.

For each project, we opened the Tape Archive (known as “tarball”) to count the number of distinct references to each individual contributor. The archive preserves information such as user and group permissions, dates, and directory structures. Open-source projects are scrupulous about keeping track of contributors, which reflects the fact that giving credit to authors is essential in the open-source movement. This principle is included as part of the nine key requirements in the “Open-Source Definition.”³ This point is also emphasized by Eric Raymond (1999), who points out “surreptitiously filing someone’s name off a project is, in cultural context, one of the ultimate crimes.” This point was also emphasized in our conversations with open-source project managers and SourceForge officials. Each project release was then associated with a set of e-mails that appeared in the archive.⁴

We aimed to distinguish individuals who were contributing code on their own behalf from those doing so as part of their employment. Our approach divided the contributors into five classes based on their e-mail addresses: corporate employees, individual hobbyists,

² We did an initial analysis using 20 SourceForge projects beginning in May 2001. In January 2002, we expanded the data collection to include the entire sample, which was tracked until July 2004.
³ http://www.opensource.org/docs/definition_plain.php (accessed December 4, 2005).
⁴ The database went through an extensive cleaning process to remove invalid e-mail addresses and to deal with situations where there were two e-mail addresses for the same individual. Examples of the decisions made are in Lerner et al. (2006).
three classes of otherwise “other” contributors, unidentified international contributors, and those from organizations with top-level domains (TLDs) denoted “.org” and “.net,” which frequently indicate nonprofit and technical Web sites. We included as corporate contributors all those with a “.com” address, excluding those sites used primarily as e-mail mailboxes, Internet service providers (ISPs), or portals (e.g., “hotmail.com”). We also included overseas addresses that are associated with corporations (for instance, “co.uk” and “caldera.de”). We included as hobbyists contributions by individuals affiliated with universities and governments (again, employing both addresses with TLDs such as “.edu” and overseas domains like “umontreal.ca”), as well as those who made contributions from addresses associated with portals, ISPs, and mailboxes.⁵ The remaining categories (those from TLDs “.org” and “.net,” as well as the remaining international domains) were not classified in either category, but rather treated separately, because we were not able to readily assign them. Table 2 presents details of characteristics of contributors.

III. Analysis of Project Contributions

Our initial analysis seeks to understand the distribution of contributions to open-source projects by class of contributor, focusing on contributions by corporations and “hobbyists.” Table 3 presents some breakdowns, using the most direct measure: the number of contributions by each class of contributor for various classes of projects. This table presents the proportion of all contributions that are corporate.⁶ The table also presents the result of F- and t-tests of the significance of the reported differences.

The table shows the share of corporate contributions is twice as large in the largest quartile of projects as in the smallest quartile. The pattern is similar, though somewhat less dramatic, when we compare the versions divided into quartiles based on their growth rates, defined here as the difference between the number of lines of code in the current and previous version. Both differences are highly statistically significant.

Patterns regarding license type and venture capital backing are less sharp. The share of corporate contributions is lowest among those

⁵ One complication was posed by sites such as “aol.com,” which are used by both corporate employees and as an e-mail service. We treat these cases as corporate contributors. We have experimented with further portioning the corporate contributors into subcategories, where cases like “aol.com” will be considered separately. With this further breakout of the corporate sample, the qualitative results are similar.
⁶ Results looking at the ratio of contributions by corporate contributors and hobbyists generate similar results.
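
The e-mail-based classification of contributors described in Section II, together with the corporate share of contributions used in Section III, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build the dataset: the function names, the class labels, and the lookup lists are hypothetical, only the quoted domains (“hotmail.com,” “co.uk,” “caldera.de,” “umontreal.ca,” “.edu”) come from the text, and the “gov” entry is an assumed example of a government TLD.

```python
from collections import Counter

# Illustrative lookup lists (assumptions): only the quoted examples come from the text.
MAILBOX_DOMAINS = {"hotmail.com"}              # mailbox services, ISPs, portals
CORPORATE_OVERSEAS = {"co.uk", "caldera.de"}   # overseas corporate domains
ACADEMIC_OVERSEAS = {"umontreal.ca"}           # overseas universities and governments
ACADEMIC_TLDS = {"edu", "gov"}                 # "gov" is an assumed example


def _in_domains(domain, names):
    """True if domain equals, or is a subdomain of, any name in names."""
    return any(domain == n or domain.endswith("." + n) for n in names)


def classify(email):
    """Return one of: corporate, hobbyist, org, net, international."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    tld = domain.rsplit(".", 1)[-1]
    if _in_domains(domain, MAILBOX_DOMAINS):
        return "hobbyist"          # portals, ISPs, and mailboxes count as hobbyists
    if _in_domains(domain, CORPORATE_OVERSEAS):
        return "corporate"
    if tld in ACADEMIC_TLDS or _in_domains(domain, ACADEMIC_OVERSEAS):
        return "hobbyist"          # universities and governments
    if tld == "com":
        return "corporate"         # includes mixed-use sites such as aol.com
    if tld in ("org", "net"):
        return tld                 # treated as separate classes
    return "international"         # remaining, unidentified international domains


def corporate_share(emails):
    """Proportion of contributions whose address is classified as corporate."""
    counts = Counter(classify(e) for e in emails)
    total = sum(counts.values())
    return counts["corporate"] / total if total else 0.0
```

Mailbox and overseas corporate lists are checked before the generic “.com” rule so that an address such as one at “hotmail.com” is not counted as corporate, while “aol.com” falls through to the corporate class, matching the treatment described in footnote 5.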