Beyenetwork Research Report Open Source Solutions: Managing, Analyzing and Delivering Business Information
by
Mark Madsen
BeyeNETWORK • 1790 30th Street • Suite 310 • Boulder, CO 80301 • 303-339-7255 • www.BeyeNETWORK.com
TABLE OF CONTENTS
About the Survey
  Methodology
  Respondent Profile
Executive Summary and Key Findings
Introduction
  A Primer on Open Source
  Community Versus Commercial Open Source
  Open Source in the Business Intelligence and Data Warehouse Market
Detailed Findings
  Open Source is Maturing
    Indications of Open Source Growth in the Business Intelligence Market
    Use is Mainly for New Projects
  People are Using Open Source Across All Software Categories
    Databases
    Data Integration
    Business Intelligence
    Advanced Analytics
  Who is Using Open Source and How are They Using It?
    Organization Size and the Use of Open Source
    Scope of Use
    What are Organizations Buying?
    Use by Consultants and Systems Integrators
    Rationale for Use
  Problems Encountered When Adopting Open Source
  Information Resources Found Useful
Recommendations
Profile of Survey Participants
Pentaho Solution Overview
About the Survey
The report presents conclusions and recommendations based on a survey about open
source software for reporting and analytics. It covers all parts of the data warehouse
stack from the database to end‐user delivery. It is written for business and technical
managers who are responsible for delivering reporting, business intelligence (BI) or
analytics, whether part of a BI program or embedded in applications and websites.
The research evaluated the rationale, practices and benefits that are driving use of open
source as an alternative to the traditional vendors in this market. It also looked at the
specific software projects, the scope and status of their deployment, and the challenges
and practices of participating organizations.
The business intelligence, reporting and analytics market has different drivers and
requirements from the typical IT development and applications market. Most open
source studies target open source impacts on operating systems, development tools and
application infrastructure. The point of the research was to get a better picture of the
factors influencing IT adoption in the BI and data warehouse segment.
Methodology
The research for this report is based on a survey and on interviews with consultants
and IT professionals that Third Nature conducted between July and August of 2009.
Survey participation was solicited via the BeyeNETWORK, sponsors' email
distribution lists and websites, and the annual MySQL conference. More than 1,000
people completed the survey, although not all respondents answered every question.
The aim was to gather a broad perspective of the evaluation, use and practices in both
open source‐centric communities and in the broader IT market.
Respondent Profile
The majority of survey respondents are corporate IT professionals across firms of all
sizes, with consultants being the next largest group. The composition of roles is shown
in Figure 1. Most respondents are in North America and Europe, with 81 countries
represented in the sample. Computer hardware, software and service companies are
the largest industry represented with 22% of the total, with the rest spread across 15
other industry categories.
Figure 1: Roles of survey respondents
Open Source Solutions: Managing, Analyzing and Delivering Business Information Page 1
© BeyeNETWORK and Third Nature 2009
Executive Summary and Key Findings
Venture capital flooding into open source start‐ups over the past several years resulted
in an explosion of enterprise‐ready tools and applications. Many of these start‐ups are
focused on the business intelligence market. Open source rose quickly in the
information management market, from almost nothing a few years ago to community
and commercially supported projects for every possible use.
The goal of this report is to explain aspects of the usage, challenges and practices of
organizations adopting open source in the business intelligence and data warehouse
market. Key questions explored in this research were:
What organizations are using open source in the BI/DW segment of the market?
What software is being deployed?
What are the benefits and challenges?
The survey found that interest and adoption are widespread. One‐third of respondents
stated they use open source reporting, data integration or database software for
analytic uses. More than one‐third are currently evaluating open source alternatives.
Only 12% reported no plans to use open source.
The top reason for adoption is still cost savings, although reduced vendor dependence
and ease of integration followed closely behind. Some companies used open source
deployments as a means of keeping their incumbent vendors honest.
Highlights of the survey findings include:
When dealing with database performance problems, people are more than twice
as likely to migrate a data warehouse to an analytic database as they are to a
different traditional database, open source or not.
While this is good news for analytic database vendors, it's not that good because
people are still married to their current choice of database. They are more likely
to change, redesign or replace every other tool in the BI stack before replacing
the database.
In all software categories except advanced analytics, the most commonly used
open source projects were from commercial open source vendors. The
perception that open source is done largely by amateurs and volunteers is not
true in this market.
Experience breeds adoption. Organizations with less than one year of experience
with open source use only one open source product, i.e., a BI tool, while the rest
of the system is built from proprietary software. If the organization has been
using open source longer, it is likely to be using more tools in different
categories. All organizations with more than three years of production use are
using more than one open source product.
Open source is all about new projects. More than half the usage of open source
was for new projects, with minimal focus on replacing existing tools. This is good
news for open source projects and vendors, and potentially bad news for
traditional vendors. It means open source is being adopted in the growth areas
of the market, and that could be taking new customers from traditional vendors
or taking away the mid‐to‐smaller‐sized organizations which have previously
been priced out of the market.
Traditional BI and data integration vendors have been introducing midmarket
programs as they look for growth. By losing new projects or midmarket
companies, they lose revenue in the fastest growing part of the BI market.
There's a fine line between a community edition of an open source project and
crippleware, and some vendors are crossing that line. By holding back features in
the community edition in order to entice people to pay for the professional
version, some vendors are inadvertently turning away customers. Survey
respondents complained that some community versions were feature limited or
scale limited to the point where they couldn't be used on a real project.
The primary complaints about open source are related to maturity of the software, lack
of internal skills and availability of consulting services. Given the rapid pace of
innovation in open source projects, the gap in both core features and maturity between
open source and traditional vendors is quickly closing.
Roughly one‐third of open source users are purchasing services and support from open
source vendors today. Based on this pattern in what is largely an early adopter segment,
expect the commercial open source vendors to continue their growth.
Introduction
A Primer on Open Source
Open source software (OSS) is released under a license that differs from traditional
software licenses. The license guarantees several freedoms: access to the source code,
the ability to share the code with others and the freedom to modify or deploy it as you
wish.
One misconception is that you must share any changes you make to the code. The
requirement to share only applies if you give or sell the software to others outside your
organization. If you redistribute, then any changes or additions you made must be
provided as source code. If you don't redistribute, you do not need to share your work.
Open source software is available as a project which is maintained by a community of
people who write the code and documentation, provide quality assurance and help to
manage distribution. These people may be independent volunteers, contract
programmers or they may all work for a software company that maintains the software.
Vendors use open source to enable a means of software production and distribution
that provides lower operating costs and other benefits back to the vendor.
Commercial open source evolved with recognition that companies are willing to pay for
support, service, and other less tangible items like indemnification or certifying
interoperability, for example of a BI tool with a given proprietary database.
A commercial open source vendor is just like a traditional software vendor, except that
the source code is not shrouded in secrecy. This enables more and deeper interaction
between customers and developers, making the open source model more community‐
focused than the traditional model.
In contrast to the majority of FOSS projects, commercial open source vendors employ
most of the project's developers and expect to make a profit while doing so. They
provide the same services and support that traditional vendors do, frequently with more
flexibility and lower cost. COSS vendors use elements of the proprietary model such as
providing support contracts or selling non‐open source components that can be
purchased in addition to, or in place of, the free version of the software.
The two different versions of software (community and enterprise) can cause confusion.
When you evaluate software it is important to note whether you are looking at the free
or paid version.
Some less scrupulous software companies have obscured this line or are calling
themselves "open source" without an open source license or with software that you can
get only after paying them for services. If you don't have access to the source code but
they give you a software executable for free, then they are really offering a free trial
version. The terms of your use can change at any time.
Unless the vendor delivers software with source code that is under an OSI‐certified
license, it is not open source. There is no regulation of terms or labels so these
"fauxpensource" vendors will continue to operate until there is a backlash.
COSS vendors are still software companies. If you purchase a paid enterprise version
then you'll find that the experience is not substantially different from buying software
from a proprietary vendor. The key difference is the transparency with which COSS
vendors operate.
As one interviewee noted, "We can see bug reports and enhancement requests made by
anyone and help with prioritization by voting on their implementation. The same applies
to features for upcoming releases." This level of transparency is not often found in
proprietary vendors.
As such, it applies to the commodity market rather than innovative or emerging
technologies.
Commercial open source changes the dynamics of software development by bringing
these technologies to market as open source before they have a chance to go through a
standard proprietary growth phase. This acceleration of the commodity process is one
of the biggest effects open source brings to the enterprise software market.
The survey conducted for this research asked people what open source software they
are using or evaluating to assess the popularity of projects in the BI market. The
emergence of commercial open source accelerated development of the software and
open source adoption over the last several years.
In three of the four software categories examined, the top‐ranked open source software
is provided by commercial open source vendors. The holdout category is still relatively
new to most organizations, whether using open source or not.
FOSS and COSS are available for every possible BI application, from traditional reporting
and OLAP tools to advanced data mining and statistics. Even more exotic tools like
advanced data visualization, simulation and web‐based geographic information systems
are available.
Data integration software is a more recent entrant in the developer tools market, with
ETL, data quality and data federation options available. The survey results showed that
these tools are being applied equally in BI and transaction‐processing environments.
Regardless of what you are seeking, it is likely there is an open source project to fill your
need. Enterprise caliber software is readily available for the core data warehouse and BI
components. The primary question is whether it has the features you are looking for.
Software developed as open source is no different from traditional commercial
software. The difference lies in a license that gives you more freedom with the code
than a proprietary license. This means you should evaluate open source tools as you
would any other software, by asking whether it meets your requirements at a
competitive price.
Detailed Findings
Open Source is Maturing
Indications of Open Source Growth in the Business Intelligence Market
The software for most categories has matured, thanks in large part to venture capital
that allowed commercial open source vendors to add important features and fill in
major gaps. Now that the products meet most organizations’ minimum functional
requirements, the vendors hope to capitalize on the economics behind the open source
method of deploying and distributing.
Download statistics for the most‐used projects have in many cases surpassed the million
mark. While downloads are not a good measure of production use, they are a good
indicator of interest. If even a fraction of a percent of downloads turn into active users,
these products will have as many users as the major vendors. The difference is in paying
customers ‐ approximately half of the active users of open source BI and data
warehouse tools did not purchase anything.
One vendor claims that they have more than 300,000 registered users of the software.
While that claim does not indicate production use, the software is freely downloadable
without registration. This means the users voluntarily went through the process of
registration, so it's safe to assume that they were either doing a hands‐on evaluation or
actively using the product. A single digit conversion rate puts them on par with many
companies that have been in this market for twice as long, a considerable growth rate
for any start‐up business.
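The conversion arithmetic behind this estimate can be sketched in a few lines. This is only an illustration of scale, not survey data: the 300,000 figure is the vendor's claimed registered user count cited above, while the conversion rates and the function name are assumptions chosen for the example.

```python
def estimated_customers(registered_users: int, conversion_rate: float) -> int:
    """Return the number of customers implied by a given conversion rate."""
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return round(registered_users * conversion_rate)


REGISTERED_USERS = 300_000  # the vendor's claim quoted in the report

# Hypothetical single-digit conversion rates, chosen only for illustration.
for rate in (0.01, 0.03, 0.05):
    customers = estimated_customers(REGISTERED_USERS, rate)
    print(f"{rate:.0%} conversion -> {customers:,} customers")
```

Even at the low end of these assumed rates, the implied customer count is in the thousands, which is the sense in which a single digit conversion rate is significant for a start‐up.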
The current growth rate looks like it will continue. Roughly 30% of respondents said they
are currently evaluating or piloting open source in one of the four software categories
surveyed. About 20% of the respondents indicate that they are "considering," which is
really just an indication of interest.
We are still in a very early stage of open source use in the BI market. 43% of the
respondents to the survey are not using any open source in their BI environments today.
[Pie chart. Legend: Less than 1 year; 1 to 3 years; More than 3 years. Segment values: 23%, 46%, 31%]
Figure 2: Length of use
Almost half of those who are using open source in a production system have been doing
so for less than one year, as shown in Figure 2. This indicates that many are still on the
initial learning curve, and the people running in production today should be considered
early adopters.
An indicator of future growth is that experience with open source breeds new adoption.
There is a direct correlation between the years of experience with open source and the
number of different types of tools in use.
No organization with less than one year of use is deploying more than one open source
product. Some respondents said they wanted to see how well the first tool worked
before deciding whether to replace any of their other tools.
All organizations with more than three years of use had at least three different open
source tools in place.
The use of open source for data delivered or obtained outside the organization is
another area showing strong response. 14% of respondents are delivering information
externally. In this group, more than two‐thirds are using open source BI tools instead of
traditional vendors' products.
[Chart by software category: Database, Data Integration, BI, Adv. Analytics. Values shown: 53%, 50%, 41%, 36%, 25%]
and midmarket products and sales programs. By losing new projects or midmarket
companies, they lose a long‐term source of revenue in the fastest growing part of the BI
market.
The other possibility is that open source is providing capabilities to an unserved
segment of the market. Traditional vendors' products have been priced too high for
most small and medium‐sized companies. This sample doesn't provide a clear answer. It
appears that a little of both exists.
Figure 7: Purchases related to open source
Database 18% 13% 18% 29% 22%
Interest in all categories is strong and growing as shown by the number of organizations
in the currently evaluating phase. One interesting finding from the survey is that
experience with open source leads to increased adoption of other open source tools in a
sort of virtuous cycle.
This means we should expect to see more use of open source in the BI and data
warehouse stack as more companies gain experience. It's also a sign the proprietary
vendors will face more open source competition in the future.
The remainder of this section examines each of the four software categories in more
detail.
Databases
Of the four categories, open source databases show the greatest production use. Open
source databases have been in existence for many years, while many prominent
projects in other categories are less than five years old.
The nature of analytic workloads is holding back open source database adoption. Most
of the engineering effort for OSS databases is focused on transaction processing.
Analytic use requires better handling of complex queries, large single query data
volumes and variable user concurrency. Overall, interview data shows that open source
database use would be higher if it weren't for poor analytic support capabilities and
lower query performance.
[Chart values, category labels missing: 24%, 15%, 14%, 14%, 13%, 4%, 3%, 1%]
terabyte in size. Even so, 36% are more than 500GB in size, which is not insignificant for
many organizations today.
Poor interactive BI or analytics performance 69%
Poor performance loading data 37%
Poor ETL or data integration performance 33%
Poor batch reporting performance 33%
Figure 6: Complaints related to open source performance
Query performance dominates complaints. This is not a surprise because query is the
most visible element and affects the largest number of people. It's also the most difficult
to diagnose because there are many design and technology factors that can affect query
speed.
Getting data loaded turns out to be less problematic for most people. That these
complaints rank so low is a surprise because meeting batch ETL windows has historically
been a major complaint in data warehouse and data mart projects. The standard
solutions for these problems apply: database and application tuning and redesign.
database on the market. MySQL's popularity bodes well for MySQL‐compatible and
MySQL engine products aimed at analytic workloads.
MySQL 75%
Postgres 44%
Infobright 11%
EnterpriseDB 10%
BerkeleyDB 8%
Ingres 7%
Firebird 7%
Palo 3%
CouchDB 3%
SQLite 3%
MonetDB 3%
LucidDB 2%
Kickfire 2%
Bizgres 2%
Figure 7: Open source databases in use
When faced with database performance problems, if the choice is to move from MySQL
to an expensive traditional database, less expensive analytic databases become more
appealing. This is borne out in the survey data which shows that people are more than
twice as likely to migrate a data warehouse to an analytic database as they are to a
different database in the same class.
Because most open source analytic databases are aimed at databases less than 5
terabytes in size, they align well with the bulk of the data warehouse market and
particularly with the open source database market.
Data Integration
As a category, data integration (DI) tools are almost as commonly used as databases,
outranking business intelligence tools. Given the relative youth of these projects, it is
surprising that they are as common as they are. The commercially supported open
source integration tools have been available for a much shorter time than open source
BI tools.
The investments in commercial open source tools have had a significant impact on
product maturity, making most of them suitable for organizations looking for "good
enough" tools. These products support all the basics needed in data integration
projects.
Their weaknesses are in the areas of administration, team support and advanced
integration features such as data quality functions dealing with semi‐structured data.
The open source tools are primarily single purpose, like the early ETL tools, although
Talend has been extending its product line with data quality and master data
management features.
Batch ETL for a data warehouse or mart 30%
Operational integration 21%
Data migration efforts 15%
Data quality efforts 15%
Master data management efforts 10%
Low‐latency ETL for a data warehouse or mart 8%
Figure 8: Uses of open source integration software
A factor increasing the popularity of this category is the use of these tools for
operational as well as analytic data integration, which ranks as second most popular in
the list shown in Figure 8.
There are two different ways to use data integration tools: linking transactional
applications or feeding data to business intelligence systems. These uses affect the
approach, methods, feature requirements and best tools for the job.
BI systems are most often loaded in batch cycles according to a fixed schedule, bringing
data from many systems to one central repository. They have relatively large volumes of
data to process in a short time, but have little concurrent loading activity. Most data
integration products were originally designed to meet the specific needs of the analytic
data integration market.
Most application integration projects need data from one or two other systems, not the
many sources and tables feeding a data warehouse. The scope is usually smaller, with
lower data volumes and narrower sets of data being transferred and minimal
transformation required.
These differences in frequency of execution, data volume, latency and scope of sources
are technical elements that differentiate operational and analytic data integration. Data
integration is a small element of an application project, unlike a data warehouse where
DI may consume 80% of the project budget and timeline.
Hand coding is common in application projects because data integration is thought of in
terms of application glue. In BI projects, hand coding is most often a way to save money
on the high cost of enterprise ETL products.
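To make the batch pattern concrete, here is a minimal sketch of what a hand-coded batch load can look like, written in Python with the standard library's sqlite3 module. It is an illustration under stated assumptions, not a description of any respondent's system: the table names, columns, source names and sample rows are all hypothetical, and in-memory databases stand in for the separate operational systems a real load would read from.

```python
import sqlite3


def make_source(rows):
    """Build an in-memory stand-in for an operational database (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


def run_batch_load(sources, warehouse):
    """Extract rows from each source, apply a light transform, load one central table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(source TEXT, order_id INTEGER, amount_usd REAL)"
    )
    loaded = 0
    for name, conn in sources.items():
        # Extract: pull the full batch from the source system.
        rows = conn.execute("SELECT order_id, amount_cents FROM orders").fetchall()
        # Transform: convert cents to dollars and tag each row with its origin.
        batch = [(name, order_id, cents / 100.0) for order_id, cents in rows]
        # Load: insert the whole batch inside a single transaction.
        with warehouse:
            warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", batch)
        loaded += len(batch)
    return loaded


# Hypothetical source systems feeding one central repository.
sources = {
    "web_store": make_source([(1, 1999), (2, 4500)]),
    "call_center": make_source([(7, 12000)]),
}
warehouse = sqlite3.connect(":memory:")
loaded = run_batch_load(sources, warehouse)
total = warehouse.execute("SELECT SUM(amount_usd) FROM sales_fact").fetchone()[0]
print(loaded, total)
```

Scripts like this are cheap to start, but scheduling, error handling, restartability and metadata all have to be built by hand, which is where dedicated data integration tools earn their keep.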
Community open source data integration tools can provide the cost advantages of hand
coding with the productivity advantages of traditional data integration software. The
most popular open source data integration tools in the survey are Pentaho DI/Kettle,
Talend and Jitterbit, all commercially supported products.
Pentaho and Talend make up 75% of the use in this category. One interesting element is
the breakdown by use. While both are used for ETL, Talend is more likely to be used for
operational data integration than any of the others. The full list of tools and their level
of use is shown in Figure 9.
Pentaho DI / Kettle 42%
Talend 33%
Jitterbit 13%
DataCleaner 8%
Red Hat 5%
Apatar 5%
OSDQ 2%
Open Data Quality 2%
Clover 2%
Figure 9: Open source data integration tools in use
These tools are established in the developer market which has been the traditional
stronghold of open source software. Expect open source to be a key element of data
integration (and especially of operational data integration) in the future, similar to open
source use in application development environments today.
Business Intelligence
A detailed breakdown of the use of BI tools that are in production today is listed in
Figure 10. Traditional reporting and dashboards are the most popular uses. This mirrors
what we see with non‐open source BI tools in the market.
Static reports 20.7%
Dashboards or scorecards 17.1%
End user or interactive reporting 16.5%
Reporting against an application database 15.9%
OLAP 14.6%
Figure 10: Breakdown of production use for the BI and reporting tools category
Application reporting and embedding are almost as popular, something you do not see
with most of the traditional BI tools. This is partly due to a technical problem:
marrying features to an application requires easy component integration or
customizable code. Most proprietary BI tools are built to be stand‐alone applications,
making them inappropriate for this type of use.
For software providers, another integration need is customizing the BI interface to
match the interface of the application. Software providers also have to worry about the
incremental cost to the product. Leveraging open source can be a zero or low cost
alternative to using BI tools from one of the proprietary vendors.
One interviewee delivering software‐as‐a‐service applications stated that "Without
open source BI tools, I would not have been able to provide reporting capabilities in my
applications. My margins are too narrow to license BI tools from one of the big vendors.
They're also more difficult to manage in a multitenant environment."
Several respondents using open source BI tools embedded in their applications said that
they chose this alternative because it offered neutrality. Because of the BI market
consolidation, partnering with one of the major BI vendors could alienate customers
invested in a competing vendor's applications.
Pentaho and Jaspersoft together are used by three quarters of the survey respondents.
The numbers do not add up to 100% because people in many organizations are using
more than one tool, often for complementary purposes. One other thing to note: Julian
Hyde is the founder of the Mondrian Project and is the Chief OLAP Architect at Pentaho.
Pentaho hosts the Mondrian project, its forums, continuous‐build server, road map and
wiki. Pentaho developers have accounted for more than 90% of the code contributions
and fixes over the past three years. The list of projects is shown in Figure 11.
Pentaho 47%
Jaspersoft 28%
Mondrian (a Pentaho project) 26%
BIRT 19%
Jfree 14%
SpagoBI 9%
OpenI 5%
MarvelIT 5%
Palo 2%
OpenReports 2%
Figure 11: Open Source BI tools in use
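The reason multi-select percentages exceed 100% is easy to see with a small tally. The sketch below uses made-up responses, not the survey data, and the function name is an assumption; it computes each tool's share of respondents when a respondent may name several tools.

```python
from collections import Counter


def tool_usage_shares(responses):
    """Return each tool's share of respondents for a multi-select question."""
    total = len(responses)
    if total == 0:
        return {}
    # Count each tool at most once per respondent.
    counts = Counter(tool for selected in responses for tool in set(selected))
    return {tool: count / total for tool, count in counts.items()}


# Hypothetical responses: each set is one respondent's selections.
responses = [
    {"Pentaho", "Mondrian"},
    {"Jaspersoft"},
    {"Pentaho", "BIRT"},
    {"Pentaho", "Jaspersoft"},
]
shares = tool_usage_shares(responses)
print(shares)
print(sum(shares.values()))
```

Because every respondent can contribute to more than one tool's count, the shares are independent of one another and their sum has no upper bound of 100%.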
One element of open source BI that doesn't appear in the survey data is the people who
chose to develop their own software using open source components. There are SQL
generators, user interface components, graphing libraries and all the other elements
needed for a do‐it‐yourself model.
This is the approach taken by a number of web‐based companies and government
agencies where the number of users is very high and the information delivery
capabilities are well defined or constrained. While this is a relatively small percentage of
the sample, it's useful to know because 14% of the respondents mention delivering data
to external users within the scope of their deployments.
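As a rough illustration of the do‐it‐yourself model described above, the following Python sketch shows the kind of small SQL generator such a team might write. It is a hypothetical example, not code from any surveyed organization: the function name, the table and column names, and the choice of SUM as the only aggregate are all assumptions.

```python
def build_report_query(table, dimensions, measures, allowed_identifiers):
    """Generate a GROUP BY query from a simple report definition.

    Identifiers are checked against a whitelist because table and column
    names cannot be passed to a database as bound parameters.
    """
    for name in [table, *dimensions, *measures]:
        if name not in allowed_identifiers:
            raise ValueError(f"unknown identifier: {name}")
    select_parts = list(dimensions) + [f"SUM({m}) AS total_{m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


# Hypothetical schema for a well-defined, constrained reporting need.
ALLOWED = {"sales", "region", "product", "revenue", "units"}
query = build_report_query("sales", ["region"], ["revenue", "units"], ALLOWED)
print(query)
```

A generator this narrow only works when the reporting requirements are well defined or constrained, which matches the kind of organization that tends to take this route.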
Advanced Analytics
The advanced analytics category is a combination of different types of software that fall
outside the normal "query and reporting" realm, including statistics, data mining,
visualization, modeling and simulation.
Each type of software is different, but they all share a low overall adoption rate (5% in
production in our sample). The low adoption rate is mostly due to the lower applicability
of these tools for many organizations, as well as the level of analytical sophistication
found in businesses rather than the fact that the tools are open source. The most
popular products in use by survey respondents are shown in Figure 12.
R 46%
Weka (a Pentaho project) 42%
RapidMiner 23%
Knime 8%
Graphviz 8%
Orange 7%
Processing 4%
Axiis 4%
Taverna 3%
Cytoscape 2%
Figure 12: Open source analytics tools in use
The R project has long been popular for statistics, so its appearance at the top of the
chart is not a surprise. Weka data mining software is a Pentaho project and pillar of its
commercial offering. Pentaho employs the lead architect and steward of the Weka
project (Mark Hall), drives the road map and release cycles, and is the only company
that can sell a commercial Weka license, including support. Weka has been in existence
for a long time and is often used in university settings, aiding visibility. Unlike all the
other categories, the top two tools are community open source projects. RapidMiner
and Knime are commercial open source products with freely available community
editions.
There is a tremendous amount of analytics software available as open source. The
challenge for most organizations is that the tools are either single purpose, for example
tied to a specific technique, or they are available as libraries of code rather than tools.
This is the case with most data visualization software.
Who is Using Open Source and How are They Using It?
Organization Size and the Use of Open Source
One persistent myth is that small companies are the primary users of open source.
While there are more small organizations using open source today than mid‐sized or
large, as shown in Figure 13, the data also shows that medium and large organizations are
doing more evaluations. This is a change from an earlier survey, where small companies
were leading in both areas.
The change is interesting because it reflects a shift in mid‐size and larger organizations
as users of open source BI and data warehouse products. While the small company base
is important, the products appear to be good enough to meet the more stringent
requirements of larger organizations.
This is good news for commercial open source vendors because the largest group of
production users were the least likely to pay for support or services. As companies with
a better ability to pay move into the market, the revenue growth for open source
vendors should increase and with it the quality of service and support.
Using: Small 32%, Medium 23%, Large 23%
Evaluating: Small 37%, Medium 41%, Large 38%
Figure 13: Open source use and evaluation by size of organization
Scope of Use
One common belief about analytics and BI projects is that open source is more likely to
be used by departments in large organizations and across the company in smaller
organizations.
Figure 14 shows that small organizations are more likely than medium and large to do
company‐wide deployments, and large organizations are doing smaller deployments,
supporting this belief.
Small companies and departments of large organizations share similar characteristics:
they are often constrained by budget, they have a smaller user base and their usage is
more uniform, making smaller deployments easier.
[Bar chart comparing Small, Medium and Large organizations across two scopes: Department
or Division, and Corporate‐wide. Values shown: 40%, 38%, 35%, 32%, 27%, 27%.]
Figure 14: Scope of open source use
Despite this general pattern, there are enterprise‐wide deployments of open source in
large organizations: 40% of large organizations have deployed or plan to deploy a BI or
DW application corporate‐wide with some open source components, demonstrating a level
of software maturity.
[Bar chart of purchases by size of organization. Legend: No purchase; Maintenance or
support contract; Training; Consulting or installation services; Phone, email or on‐site
support from the vendor; Commercial license; Phone, email or on‐site support from a
third party; Subscription to value‐added, enterprise features.]
Small: 54%, 38%, 36%, 30%, 29%, 23%, 14%, 13%
Medium: 53%, 38%, 28%, 28%, 22%, 31%, 9%, 31%
Large: 58%, 45%, 52%, 33%, 24%, 33%, 6%, 21%
Figure 15: Purchases by size of organization
This information should help to budget for an open source implementation. Based on
the size of your organization, you can see what the most common purchases are and
check the prices on these items.
It is notable that 49% of the consultants and systems integrators are evaluating open
source software today, signaling a possible shift in their use.
What the data says is that, far from leading the technology market, systems integrators
(SIs) and consultants seem to trail it, following the money rather than leading their
customers in innovation. Interviews suggest that mostly smaller local or regional
consulting companies are providing services using open source software.
Even with the sudden rise in evaluation, consultants and SIs significantly trail IT
departments. If you are in an IT organization that relies on consultants for project work,
then using open source tools will require that you factor the availability of qualified
consultants into your decision. Given these statistics, they are likely to be rarer than you
expect.
obstacle in analytics and BI projects than the total cost of ownership (as reflected in
ongoing maintenance costs). Figure 17 shows the top reasons given for choosing open
source products.
Reduced vendor dependence is surprisingly high in the list of reasons. The benefits
anticipated go beyond the obvious one of avoiding a vendor's technology lock‐in, for
example, the requirement that one run Windows and SQL Server in order to use
Microsoft's BI tools. Also mentioned were more options to resolve problems,
community support reducing the need for vendor aid and using open source to
offset the effect of vendor acquisitions.
Lower acquisition costs 66%
Open standards 48%
Reduced dependence on a vendor 44%
Lower maintenance costs 43%
Flexibility in deployment 33%
Speed of innovation of the software 32%
Easier to evaluate or procure 32%
Open development process and road … 32%
Extensibility, customizability of software 28%
Access to the source code 28%
Figure 17: Reasons for using open source
The business intelligence and data warehousing market has seen several years of steady
consolidation across all software categories. This consolidation makes it increasingly
likely that a formerly multivendor installation is now entirely dependent on a single
vendor. Many managers view having all of their technology decisions in the hands of a
single vendor as a risk.
In light of recent price increases and restrictions imposed by vendors, using open source
is proving to be a way to reduce dependence and balance the risk of more vendor
acquisitions or unilateral actions like raising prices on a captive customer base.
Other advantages of open source software are the ease of adjusting deployments (for
example, adding or dropping end‐user licenses) and of customizing or extending the
software to fit specific project circumstances. Neither of these is as simple with
traditional software.
One item related to rationale is what influenced people's decisions. The answers (shown
in Figure 18) reinforce the social and community aspects of open source and the web.
Product reviews and peer feedback were believed to be the most influential items. One
caveat to the data shown is that this is a perilous question to analyze because influence is
often rationalized after the fact and people are often unaware of what is really
influencing their decisions.
Product reviews 47%
Prior success with open source software in other systems 34%
Recommendations from consultants / systems integrators 31%
Opinions of independent IT analysts 29%
Community size or activity 27%
Information from friends 21%
Figure 18: Factors that influence the choice of software
Familiarity breeds success ‐ the number three item was developers' prior success with
open source software. Open source vendors were also rated as trustworthy, appearing
ahead of consultants and industry analysts.
One item that was not listed but appeared frequently in survey comments was the
hands‐on evaluation and testing people did on their own time. With open source
tools, evaluations were easier to do than with proprietary software because there were
no restrictions. It was reported to be much harder to obtain evaluation copies of
software from proprietary vendors without first talking to a sales team.
Figure 19: Responses to "Did any open source software fail your evaluation?"
While there were many different reasons cited for the failed evaluations, reasons
clustered around several key issues shown in Figure 20.
While documentation complaints showed up at the bottom of the list, they were the
biggest write‐in complaint in the survey, indicating that this strongly affects some
people. Documentation is something community‐based open source projects often
struggle with. It's one of the gaps COSS vendors are trying to fill to make the software
more enterprise‐friendly.
Difficulty getting answers to questions or problems ranked in the top ten. This runs
counter to the message that open source vendors and communities are more
responsive and quicker to answer questions. We didn't compare this between open
source and proprietary vendors, so we can't conclude that it's any worse than the
situation with non‐open source vendors. However, this is an indication that there can be
challenges, and that there's room for vendors to improve.
Installation or configuration problems 41%
Crashing or other reliability problems 32%
Challenges finding training or education 31%
Difficulty integrating into existing technology environment 30%
Limited data scalability 24%
Difficulty getting answers via the community 21%
Difficulty getting answers through vendor channels 21%
Limited user concurrency 17%
Poor / lacking documentation 5%
Figure 20: Problems encountered with open source
Software installation is the biggest source of problems. Often the causes stem from the
component nature of the open source products. There can be more discrete elements
to configure within the environment. Traditional software components are usually
pre‐integrated.
Installation, configuration and reliability problems are directly related to the maturity of
the software, as is scalability, which appears in two different places in the list.
Performance has been a constant complaint with all BI and data warehouse projects, so
we looked at what people were doing to address performance problems. Figure 21 lists
the most common practices for dealing with performance problems in this environment.
The most common complaint about performance in BI environments focuses on the
database because it is the hub of most activities. The survey response reflects this,
showing the most common solution attempted is tuning the database, followed by
throwing hardware at the problem. The number three result is surprising: change to a
different reporting or BI tool. Given that the database may be the source of the
problem, one would expect database technology changes; however, these appear at the
very bottom of the list.
Database or application tuning 38%
Buy more powerful hardware 34%
Change BI or analytics tools 32%
Redesign the ETL or data integration 32%
Limit the amount of data stored in the system 30%
Rewrite the BI application or reports 26%
Change ETL or data integration tools 18%
Limit the number of users accessing the system 18%
Migrate to an analytic database 10%
Buy a specialized accelerator 8%
Migrate to a different traditional database 4%
Figure 21: Addressing performance problems
A less surprising aspect of database change is that people are more than twice as likely
to move to an analytic database (10%) as to a different traditional database (4%). If you
are changing the database because of performance, going to a similar product makes less
sense than trying a database designed specifically to support BI and analysis workloads.
A key element to performance is understanding that tools are not usually the root cause
of the problems. One IT manager said, "Tools are tools. There is no correcting for bad
design by replacing one with another. The design of the system is still more important
than what any one product can do."
We also asked what people felt the key obstacles were for their organization's use of
open source. The results are shown in Figure 22. Lack of internal skills topped the list,
which might influence spending on training, an item that was trending up in the
purchase data.
Figure 22: Barriers to use
Overall, maturity of the software is still a top concern. Items like products missing
needed features, worry over longevity, quality problems and lack of support are all
related to maturity of the software and vendors. One contradiction among the
complaints concerned longevity: several companies were evaluating open source because
the reporting products they were using had been discontinued or the vendor had been
acquired. It is a lesson that there is no guarantee any product will be around.
In general, procurement issues (corporate standards, IT resistance, license and legal
difficulties) are almost nonexistent problems today as organizations have become more
familiar with open source. Another factor which reduces this complaint is that many
companies purchase commercially licensed or subscription versions.
Overall, open source software is good enough to get onto the short list for many
organizations. The very last item of the top 20 barriers was the vendor or project not
meeting the criteria to be included in an evaluation.
Once into an evaluation or proof of concept, the resources listed in Figure 23 were
found to be helpful.
Online articles 53%
Online documentation / wikis 53%
White papers 48%
Online demos 47%
Community forums 47%
Web seminars or screencasts 37%
Blogs 37%
Vendor evaluation / trial support (free) 32%
Print articles 29%
Web‐based training 28%
Third party books or documentation 27%
Vendor support, paid or as part of a subscription 20%
Outside consultant or systems integrator 19%
Software features in a paid "professional" version of the software 17%
Pre‐bundled software (e.g. a database packaged with a BI tool) 16%
Classroom training 14%
Support from a third party 14%
Internet relay chat (IRC) 7%
Figure 23: Ranking of usefulness of open source resources
Online content holds the top spots, with third party white papers ranked third (the only
resource in the top five that directly costs the vendors money). Community‐driven
elements in the form of wikis and forums are two of the top five.
In a blow to the idea of third‐party support, this was rated next to last. Equally surprising
was the low rating given to the features found in enterprise or professional versions of
the software. This may be due to the high percent of people still in the evaluation phase
in our sample as well as the 50% who didn't buy anything.
Bundling of components made a difference for some respondents. 16% said virtual
machines or pre‐integrated software bundles were useful because they provided the
ability to get up and running quickly. This was described by several people as a
differentiator from most proprietary software, where quick evaluations were not possible.
Recommendations
Open source in the business intelligence and data warehousing field is out of the
innovation stage and moving into the early mainstream, but there are still challenges.
People responsible for evaluating BI and data warehousing tools can benefit from the
following suggestions.
Don't plan to replace existing software with open source. The single biggest
usage scenario is for new projects, not as an attempt to replace other tools.
One of the obstacles to this as a replacement solution is the high cost of
redeveloping reports or integration jobs. Rather than look at saving money
by replacing software, look at gaps in the BI portfolio or data warehouse
stack and use open source to supplement your systems.
Evaluate open source like any other software. It doesn't matter if the
software is free if it doesn't do what's required. Open source software is still
software and should be evaluated against the same set of criteria you would
use with any similar application from a traditional vendor. In evaluations, be
sure to factor in some of the open source elements people listed as benefits.
Open source tools may not be as feature rich as proprietary software, but
offer other potential benefits like time to market, deployment flexibility and
customizability that can make up for this.
Be clear on what you are evaluating. Commercial open source usually comes
in two versions: a free community edition and a paid enterprise edition.
People have evaluated the community edition and ruled it out without
realizing the features they wanted were available in the enterprise edition.
The community and enterprise features can be significantly different, and the
enterprise version is often a fraction of the cost of the alternative products.
Others evaluated the enterprise edition, assuming the community edition
had the same functions and would cost nothing to implement.
Factor consulting into your plan. Finding skilled workers that are
experienced in open source tools may be difficult. Most consulting
companies have not been trained on these tools, so you are paying for their
generic expertise rather than their open source knowledge. If your project is
predicated on external resources, open source tools may not be the best
choice unless you first verify that the resources are available.
Don't focus solely on cost savings. While cost is important, it is only one
factor. The other top‐ranked benefits are reduced dependence on vendors,
ease of integration and deployment flexibility. If you are building a case to
justify an open source tool, it will help to include these other factors. People
often did not mention these as reasons for initial consideration, but as
benefits they discovered later.
Make open source the default option. When in an environment with few or
no tools, open source should be the preferred alternative. It is the simplest,
fastest and likely the least expensive route when compared to hand coding
or purchasing proprietary products. Look to proprietary tools when open
source tools don't have the required features, or when you have products in
the organization already and expanding licenses is not as big an obstruction.
Develop open source policies. Most organizations are adopting open source
in an ad hoc fashion, project by project. While this works, it can also reduce
cost savings by duplicating evaluation effort and maintenance costs. Open
source can bypass the procurement process, leading to situations where
departments deploy their own tools, unaware that someone else has already
done evaluations or deployed different software. There are also some
differences with open source licenses that can put your organization at risk if
your legal department hasn't done proper review.
While the biggest value of open source may appear to be license cost savings, an
important overlooked benefit is time to market. The full purchase cycle for the
enterprise is usually three to six months (or longer). In the time it takes to bring in a
vendor, meet the sales team and get approval to do a proof of concept with the
software, you can download and prototype your entire application in a similar open
source tool.
The advantage to doing this is that your hands‐on experience will tell you whether the
software will work for you. If it works, you can extend and deploy that prototype. Over
time, expect traditional vendors to become more flexible by allowing trials, offering
subscription pricing and using other practices started by open source vendors.
Open source use is growing rapidly in the BI and data warehousing field. As we move
into the early mainstream stage, the software will become more polished and
comparable to traditional products. There are already signs of a shift to mainstream
adoption as consulting companies and systems integrators begin to evaluate open
source for themselves and their clients.
of lower embedding costs, but more importantly because it is easier to incorporate than
traditionally licensed software.
Several companies mentioned that they were able to add missing features to their
software and turn the code over to the vendor for inclusion in the open source package,
thus saving the vendor from having to maintain the code. This is far less likely to happen
with traditional vendors.
Financial services is the largest of the non‐computer industry categories at 11% which is
not a surprise ‐ financial services was an early adopter of other categories of open
source software. Government interest has picked up significantly over the last two
years, now fourth in the list.
Computer Software, Hardware, Services 22%
Consulting, Business Integrator, VAR, Solution Providers (ISV, ASP) 15%
Accounting, Banking, Financial Services, Insurance, Real Estate, Legal 11%
Government (Federal, State, Local, Military) 8%
Healthcare, Medical, Pharmaceutical, Biotech, Biomed 6%
Retail, Consumer Packaged Goods, Distribution, Trade, Wholesale 6%
Education 6%
Media, Publishing, Advertising, PR, Marketing 6%
Manufacturing, Chemicals 5%
Communications, Telecommunications, Cable 4%
Other 4%
Utilities, Petroleum, Oil, Energy 2%
Travel, Hospitality, Recreation, Entertainment 2%
Transportation, Logistics 2%
Agriculture, Mining, Construction, Architecture, Engineering 1%
Aerospace, Defense 1%
Figure 24: Industries and distribution of survey respondents
The bulk of the respondents are from the U.S. and Europe, with a total of 81
different countries represented worldwide. The distribution across regions is shown in
Figure 25. This reflects a similar distribution in other areas of open source software.
[Pie chart of respondents by region. Legend: North America; Europe; Central / South
America; Asia; Africa and Middle East; Oceania. Slice values shown: 55%, 29%, 7%, 5%,
3%, 2%.]
Figure 25: Geographic distribution of survey respondents
The size of organizations represented in the survey is widely distributed, from Global
100 to very small companies. The size of organizations can be measured by revenue or
number of employees. Rather than use company revenue, this report uses employee
count for the size metric because it is applicable across private, public and non‐profit
institutions, and because it is a more reliable gauge of the scale of an
organization's use of BI and analytics than its revenues.
For this survey, small organizations are considered to be those with fewer than 100
employees, mid‐sized organizations are between 100 and 2,000 employees, and large
organizations are those with more than 2,000 employees.
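The size bands above can be written down as a small classification rule. The sketch below is illustrative only: the function name `size_class` is hypothetical, and it assumes that both 100 and 2,000 employees fall in the mid‐sized band, which is one reading of "between 100 and 2,000".

```python
def size_class(employees: int) -> str:
    """Map an employee count to the small / medium / large bands used in this survey.

    Assumption: the mid-sized band is inclusive at both ends (100 and 2,000).
    """
    if employees < 0:
        raise ValueError("employee count cannot be negative")
    if employees < 100:
        return "small"
    if employees <= 2000:
        return "medium"
    return "large"
```

For example, under these assumptions an organization with 99 employees is classed as small, one with 2,000 as medium and one with 2,001 as large.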
The size distribution of organizations in this survey is shown in Figure 26 by employee
count and small‐medium‐large classification.
Less than 100 employees 37%
Figure 26: Revenue and employee size of organizations in the survey sample
Pentaho Solution Overview
This report explores factors influencing IT adoption of open source for business
intelligence (BI); extract, transform and load (ETL); and data warehousing, with an aim
to provide a better picture of how open source is currently being used and determine
organizations’ future plans for open source BI deployment. Pentaho co‐sponsored this
report to provide evaluators with a clear view of the value open source can bring to
their firms.
Pentaho’s mission since its inception as the pioneer in commercial open source business
intelligence has been to provide modern, end‐to‐end BI and ETL capabilities at
dramatically lower cost compared to traditional market offerings. Pentaho’s model
enables firms to avoid the common pitfalls of BI projects ‐ high cost, risk of project
failure, lack of integration, lack of flexibility and long, expensive implementation cycles.
To counter these BI pitfalls, Pentaho provides comprehensive data integration, reporting
and analytics on a modern, tightly integrated and standards‐based architecture.
Through unfettered access to software, organizations can rapidly deliver BI solutions
and give business users the critical information they need to understand and improve
organizational performance. Pentaho’s commercial open source model minimizes both
the high cost and the risk associated with BI projects by eliminating large, upfront
software license fees and providing comprehensive technical support and product
maintenance via low‐cost annual subscription.
Pentaho has worked with organizations of all sizes to successfully deploy data
warehousing and business intelligence solutions. Pentaho customers have shown
dramatic cost savings over traditional, proprietary offerings as well as significant return
on investment (ROI) through the insight gained from better visibility into organizational
performance, better integration with existing systems and faster time‐to‐results. Visit
www.pentaho.com/about/customers/ for customer success stories that span all
industries and company sizes, such as Cardiac Science, Close Premium Finance, Lifetime
Networks, Mozilla, Orbitz, Sheetz, Specsavers, The Swiss Colony, U.S. Naval Air Systems
Command and many others. Thank you for taking the time to learn how commercial
open source ETL and BI from Pentaho is changing the IT landscape.
About the Sponsor
Pentaho Corporation is the commercial open source alternative for business
intelligence (BI). Pentaho BI Suite Enterprise Edition provides comprehensive reporting,
OLAP analysis, dashboards, data integration/ETL, data mining and a BI platform that
have made it the world’s leading and most widely deployed commercial open source BI
suite. Pentaho provides support, services and product enhancements via an annual
subscription that can lower total cost of ownership by 90% compared to traditional,
proprietary BI offerings. Since its 2004 founding as the pioneer in open source BI,
Pentaho's products have been downloaded more than five million times, with
production deployments at companies ranging from small organizations to The Global
2000. For more information, visit www.pentaho.com.
About the Author
MARK MADSEN is president of Third Nature, focused on information management, BI
and analytics. Mark is an award‐winning architect and former CTO whose work has been
featured in numerous industry publications. He is an international speaker and manages
the open source channel at the Business Intelligence Network. For more information or
to contact Mark, visit http://ThirdNature.net.
Third Nature is a research and consulting firm focused on new
practices and emerging technology for business intelligence, data
integration and information management.
Our goal is to help companies learn how to take advantage of new information‐driven
management practices and applications. We offer consulting, education and research
services to support business and IT organizations as well as technology vendors.