Sesion-8-Seizing The Opport-Merged PDF
Executive Summary
Maturity Model for Digital Analytics and Marketing for Financial Services
Optimize the Visitor Website Experience
American Bankers Association, “ABA Survey Shows More Consumers Prefer Online Banking,” press release, October 2010.
Aite Group, “Online Marketing Maturity Model for Financial Institutions,” press release, February 28, 2011.
J.D. Power & Associates, “2010 US Retail Banking Satisfaction Study,” June 2010.
J.D. Power & Associates, “2011 US Retail Bank New Account Study,” March 2011.
• Cost reductions are essential. Further cultivation of online customer interaction and transactions is essential for firms to reduce the high operational costs of physical offices and call centers. Maturation and refinement of online transactional systems, and digital marketing of those services, is becoming a higher priority for all subverticals in financial services.
• Mobile devices surge in adoption. Firms are challenged to deliver a satisfying experience for mobile users and optimize websites with mobile-friendly transactional functionality as the use of smartphones grows exponentially. One example: The percentage of bills paid by mobile devices is expected to surge 377 percent from 2010 to 2013, the Aite Group predicts.5

That assessment is corroborated in a study of 154 banks and credit unions by The Financial Brand, an online portal focused on financial services marketing. The greatest percentage (45 percent) of respondents characterized their firms at the “novice” level of online marketing, and only 8 percent claimed “advanced.”7 Moreover, 65 percent of those engaged in online marketing replied “not really” or “sometimes” when asked if they tracked its effectiveness.

Which Best Characterizes Your Firm’s Online Marketing Efforts?
Novice: 45%
Intermediate: 43%
Aite Group, “How Americans Pay Their Bills,” October 2010.
Aite Group, “Online Marketing Maturity Model for Financial Institutions,” press release, February 28, 2011.
The Financial Brand, “2010 Online Marketing Study,” August 2010.
Forrester Research, “Organizing for Site Optimization,” August 10, 2010.
Though counterproductive, an array of ad hoc tools is commonplace among financial institutions. Most were deployed in haste to meet tactical needs in discrete areas, instead of planning for an integrated digital experience for customers that would support the strategic marketing goals of a relationship-oriented business. Recognizing the risks and limitations of these disparate technologies is a key first step to eliminating the deficiencies they introduce into a firm’s web analytics and digital marketing initiatives.

Progressive marketing, business, and IT leaders at financial services firms are transitioning to an integrated platform for web analytics and digital marketing, built with industry-specific features and metrics for financial services. A unified platform can offer a decided advantage in achieving key business objectives: 1) optimize the visitor website experience, 2) drive targeted site traffic from across digital channels, 3) turn site visitors into repeat visitors and loyal advocates, 4) capitalize on the mobile channel and social media, and 5) maximize customer lifetime value across channels.

In sum, there is great opportunity for financial services marketers who can rise above the challenges and keep pace with the leaders in digital today by harnessing the power of online behavioral data fused with marketing automation technologies across digital channels.

Harnessing the Power of Online Behavior Data

Financial services companies collect huge volumes of data on the online behavior of their customers, as well as back-office data on accounts, transactions, customer profiles, and more. Putting this data to use to increase revenue and deepen customer relationships requires the synergistic use of web analytics and digital marketing technologies. It demands the development of customer behavioral insights that reflect historic and real-time activity, such as an online application for an insurance policy, brokerage account, or home loan.

Maturity Model for Digital Analytics and Marketing for Financial Services

For evaluating the range of opportunities that financial services have with the synergistic use of web analytics and digital marketing technologies, it helps to think along the lines of a maturity model, i.e., growth path.

The two axes of the growth model for financial services in Figure 2 are 1) the granularity with web analytics and 2) the degree of integration across digital marketing channels. The more granular and integrated, the more opportunities can be seized for digital marketing in financial services.

Two Uses of Web Analytics: Aggregate and Individual-Level Insights

Web analytics is the practice of monitoring and measuring customer behavior online, both website usage and marketing campaign response. It covers a broad scope that can include analyzing and understanding:
• Drop-off points in a customer’s online application for a financial product
• Response to email or online advertising by clickthrough, conversion, application value, and other metrics
• Campaign effectiveness by segments of customers that can vary by asset level, gender and age, geographic information, and more
• Consumer usage of the mobile channel and social media and optimizing for those mediums

Conventionally, web marketers use web analytics at an aggregate level, reporting on the performance of their websites and online advertising, so they can adjust their efforts to improve the results. This is an extremely worthwhile application that can deliver excellent return on investment.

However, if marketers do not also leverage web analytics as a rich source of insights on the digital journeys of individual prospects and customers, they are squandering a huge opportunity. Web analytics can play a far more direct role in engaging customers, improving customer experiences, and increasing sales by enabling companies to deeply personalize their communications and interactions.

Two Levels of Digital Marketing: Website-Centric vs. Integrated Across Digital Channels

Online marketing has traditionally been website-centric. The website has been the portal for all digital interactions, and online advertising and email marketing have primarily been used to drive traffic to the website.
Today, financial services are increasingly investing in mobile applications as another portal for customer interactions. Likewise, digital marketers can target emails and display ads that are personalized based on the preceding behavior of each individual prospect or customer on the website or in a mobile application. As such, ads and emails become an off-portal continuation of on-portal experiences. Finally, customers’ conversations on external social networks and employees’ interactions on internal social networks further decentralize where the action is.

To make the step from conventional online marketing to today’s digital marketing means to orchestrate a consistent and compelling experience across digital channels, i.e., on- and off-portal. Creating this experience requires fusing analytics with digital marketing.

Maturity Model with 5 Digital Marketing Milestones

Five major milestones for financial services marketers are plotted across the maturity model in Figure 2, to indicate the granularity with web analytics and integration of digital channels required to capitalize on each milestone opportunity.

1) Optimize the visitor website experience through measurement, testing, and constant optimization of website and online advertising
2) Drive targeted site traffic from across digital channels by targeting relevant ads and emails based on insight into previous behavior of anonymous or registered visitors
3) Turn site visitors into repeat visitors by bringing them back to the website by re-targeting through display ads or email, by using insight into previously abandoned products, etc.
4) Weave the mobile channel and social media into a cross-channel experience
5) Maximize customer lifetime value across channels and customers’ lifecycles by extending digital marketing to offline channels

Read on to learn real-life use cases and opportunities for each of these milestones.
Figure 2. Maturity model for digital marketing in financial services. (Axes: integration across digital channels; granularity with web analytics, ranging from aggregate-level dashboards, etc., to insight into individuals’ digital journeys.)
Optimize the Visitor Website Experience

Financial service websites, arguably a firm’s most powerful marketing vehicle, are uniquely complex. Compared to relatively straightforward sites run by retailers, the websites run by banks, insurers, brokerages, and credit unions offer a broader, more complex array of products and services. A large, diversified financial services institution will offer dozens of products and services across multiple lines of business that can include retail banking, investing, lending, insurance, and more.

This is a double-edged sword: While a rich website offers firms a wealth of opportunities to cross-sell and up-sell across lines of business, it also introduces the daunting task of optimizing multiple elements to deliver a compelling cross-channel customer experience and drive business.

It also means a perpetual balancing act, as firms must offer the advanced functionality that experienced users want, while making the site practical and inviting for new users.

Forrester Research noted the converse effect of greater functionality and diminished usability: “As the largest banks continue to bring ever deeper functionality to their secure sites, it is clear that usability has suffered along the way.”9

• Continually test and tune site usability and secure transactional functionality and ensure clear calls to action and conversion paths across product lines
• Collect and segment site user data to understand which website experiences result in increased customer satisfaction and value over time

The equation is further complicated by the increasing numbers of practical tools and educational resources that financial firms offer in efforts to deepen customer engagement. These resources can include personal financial management microsites, investment guidance and tools, calculators for lending and insurance, and others. Though their intent is sound, such initiatives introduce the risk of a cul de sac and consumer distraction from the firm’s objective. Simply posting a 401K primer guide or a student loan planning tool and neglecting to analyze its usage will limit its value.

Systematic measurement is essential across a firm’s website. Ideally done with an integrated analytics and content management system platform such as IBM® WebSphere® and supported by A/B testing to determine the most effective techniques, web analytics can deliver impressive results. A study by Forrester Research found that testing site features, design and creative elements, navigation paths, and content placement improved performance by multiple metrics.10 (See Figure 3).

Figure 3. What Top 3 Benefits Have You Realized from Site Testing? (Benefits charted include increased conversion, average order value, and increased registration.)
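The A/B testing mentioned in this section can be illustrated with a small sketch. The following is one common way to check whether two page variants differ in conversion rate, a pooled two-proportion z-test; it is not taken from the Forrester study or from any IBM product, and the function name and the visitor counts in the usage note are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A and variant B.

    conv_a / conv_b: number of conversions observed for each variant.
    n_a / n_b: number of visitors shown each variant.
    Returns (z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With hypothetical figures of 100 conversions from 1,000 visitors on the current page and 130 from 1,000 on a redesigned page, `two_proportion_z_test(100, 1000, 130, 1000)` returns a z-score and p-value that a marketer could weigh against a chosen significance level before rolling out the redesign.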
Forrester Research, “2010 US Bank Secure Web Site Rankings,” November 11, 2010.
Forrester Research, “Organizing for Site Optimization,” August 10, 2010.
An important principle in website optimization is to take nothing for granted. Experience has shown that customer behavior will inevitably surprise even the most seasoned online marketer. Rigorous practices and a comprehensive analytics platform are the only sure way to ensure you truly understand customer behavior. This was noted by Observed Online Financial Innovations, publisher of [...]: “Because online initiatives don’t have the proven track record of other marketing techniques, it’s important to measure every conceivable metric to demonstrate the value of the online channel.”11
IBM Coremetrics has helped many financial services firms optimize their websites.
Below are three examples.
1) Account Application Completion
• Low application completion rates despite site redesign
• Used Coremetrics Scenario Analysis to identify points of abandonment
• Optimized form field placement and eliminated unnecessary content
• Moved complex legal content to end of process
• 29% increase in application completions
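Identifying points of abandonment, as in the example above, amounts to comparing how many visitors reach each consecutive step of an application. The sketch below is a generic illustration in Python, not the Coremetrics Scenario Analysis feature; the function names, step names, and counts are hypothetical.

```python
def funnel_dropoff(step_counts):
    """Compute abandonment between consecutive funnel steps.

    step_counts: ordered list of (step_name, visitors_reaching_step).
    Returns a list of (from_step, to_step, abandonment_rate) tuples.
    """
    results = []
    for (name_a, count_a), (name_b, count_b) in zip(step_counts, step_counts[1:]):
        # Share of visitors who reached step A but never reached step B.
        rate = 1 - count_b / count_a if count_a else 0.0
        results.append((name_a, name_b, rate))
    return results


def worst_step(step_counts):
    """Return the transition with the highest abandonment rate."""
    return max(funnel_dropoff(step_counts), key=lambda r: r[2])


# Hypothetical account-application funnel.
steps = [
    ("start", 1000),
    ("personal_details", 800),
    ("legal_disclosures", 400),
    ("submit", 360),
]
print(worst_step(steps))
```

In this made-up funnel the largest loss falls between the personal details and legal disclosures steps, which is the kind of finding that could motivate moving complex legal content to the end of the process.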
Online Financial Innovations, “2009 Planning Guide.”
Drive Targeted Site Traffic from Across Digital Channels

As financial services move online, marketing of those services needs to move online as well. But there’s a huge gulf between simply running generic banner ads and executing coordinated, cross-channel campaigns based on analysis of historical customer behavior, including both website activity and response to online campaigns. In fact, research indicates that many financial services firms have yet to take advantage of online marketing.

Though email marketing has been around for a decade, it is not used by a surprisingly high 31 percent of banks and credit unions surveyed by The Financial Brand.12 Usage of display ads, paid search, and Facebook pages was also lower than the norm in consumer-oriented industries, the survey found. (See Figure 4).

Figure 4. Which Online Marketing Tactics Does Your Firm Use? (Source: The Financial Brand, 2010 Online Marketing Study)
Email marketing: 69%
Off-site display advertising: 54%
On-site promotions: 53%
Facebook page: 46%
Paid search advertising: 36%

The stakes are high, especially as young Gen Y consumers, who grew up with the web, look to acquire insurance, car loans, credit and checking accounts, and more. Unless you can engage a young consumer in ways relevant to his or her digital lifestyle, you’re apt to be disregarded as a brick-and-mortar dinosaur.

Excellence in digital marketing depends fundamentally on an advanced and integrated analytics platform that creates and evolves individual customer profiles comprising the consumer’s digital journey of interactions with your website and digital marketing efforts. Digital marketing has been shown to generate double-digit and even triple-digit ROI by enabling firms to:
• Cross-sell and up-sell products and services
• Increase usage of cost-effective self-service transactional tools
• Reduce costs for branches, call centers, and online chat
• Enhance brand image and increase consumer [...]

Digital marketing automation tools should interoperate closely with an analytics solution to drive continuous measurement, targeting, and optimization: Consumer response is collected by the analytics platform to enable further refinement. Digital marketing tools include:
• Personalized email marketing: Email is a proven means of communicating with customers and deepening engagement. The best solutions will integrate with an analytics platform to automate emails to select customer segments: those abandoning application forms, in the market for a loan, receiving paper statements in the mail, and more.
• Targeted display ads: Display ads targeted to consumers’ known interests and website activity, especially those of anonymous consumers, generate far higher click-through and conversion rates than generic banner ads, and at less cost than paid search ads. Display ads may be launched based on a consumer’s browsing at your website in near real-time, syndicated across multiple ad networks. This does not require sharing any personally identifiable information outside the business.
• Paid search advertising: Paid search, sometimes called pay per click (PPC) advertising, presents ads based on keywords that consumers enter into a search engine. The ideal system offers flexibility in campaign creation and management, cost-effective keyword bidding, and tuning and optimization based on real-time results tracking.
• On-site recommendations: Personalizing content and marketing offers at your site to a customer’s known interest triggers additional business. For instance, if a customer browsed car loans during her last visit, recommendations technology prominently displays a car loan offer when she returns. Advanced algorithms automatically generate intelligent recommendations more effectively than possible by manual coding. To the degree [...]
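The car loan scenario above can be sketched as a simple rule: show the returning visitor an offer for the product category she browsed most on her last visit, skipping products she already holds. This is a deliberately naive, hand-coded illustration in Python, precisely the kind of manual logic the text says advanced algorithms improve upon; the `OFFERS` mapping, category names, and function name are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from a browsed product category to an on-site offer.
OFFERS = {
    "car_loan": "car_loan_offer",
    "mortgage": "mortgage_offer",
    "checking": "checking_offer",
}


def recommend_offer(last_visit_pages, owned_products=(), default="general_welcome"):
    """Pick an offer based on the categories browsed during the last visit.

    last_visit_pages: category names of the pages viewed, in order.
    owned_products: categories the customer already holds (not re-offered).
    Returns the offer for the most-viewed eligible category, or `default`.
    """
    counts = Counter(
        page for page in last_visit_pages
        if page in OFFERS and page not in owned_products
    )
    if not counts:
        return default
    category, _ = counts.most_common(1)[0]
    return OFFERS[category]
```

For a visitor whose last session was mostly car loan pages, the function returns the car loan offer; if she has since taken out a car loan, the next most-browsed category is used instead.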
The Financial Brand, “2010 Online Marketing Study,” August 2010.
Forrester Research, “Increasing Online Insurance Self-Service Adoption,” February 8, 2010.
Retargeting uses a digital marketing platform and cookies that uniquely identify the visitor’s computer and the pages browsed by that visitor. For anonymous visitors, re-targeting enables display ads (broadly syndicated via ad networks). For registered visitors with permission to market, retargeting also uses personalized email. Messages can zero in on the user’s exact interest: if the visitor was browsing auto insurance, the ad can promote auto insurance.

For a relationship-oriented business, retargeting offers much greater opportunity than mere completion of abandoned online applications. Several other examples of retargeting in financial services, across a customer digital lifecycle, are to:
• Offer a cross-sell or up-sell product or service
• Promote new/updated account management tools
• Offer online-only or seasonal promotions or features
• Highlight online statements and bill pay to a new checking account customer
• Reinforce customer-centric brand image

Capitalize on the Mobile Channel and Social Media

The emergence of powerful, feature-rich smartphones and tablets has made the development of mobile applications and optimization of websites for mobile device usage of critical importance. Mobile usage of online banking and other financial services is in a growth spike, with 54 percent of U.S. mobile users expected to conduct transactions with devices by 2015, a study by Mercatus found.14 In fact, Mercatus predicts that more consumers will use mobile banking than online banking by 2015.

Yet financial services firms stand to be frustrated by the mobile channel unless firms adopt a strategic approach to aggressively evaluate opportunities, track mobile usage at a granular level (e.g., device type, applications accessed, visit duration, location, operating system), and optimize the website and marketing efforts to delight customers. “Despite increasing activity and more strategic spending, inconsistent data and analytics will plague mobile marketers hoping to make a business case for testing emerging opportunities,” Forrester Research observed.15
To the extent that customers are registered and authenticated on both their mobile applications and the website, the digital marketing opportunities previously discussed for personalized advertising and retargeting extend to the mobile channel as well. Ads, offers, and recommendations shown within the mobile application can be selected based on analytical insight into each customer’s past and current interactions so that they are as relevant as possible.

‘The End of Credit Cards’

Even as financial services firms strive to optimize mobile applications and capitalize on opportunities with the mobile channel, they face a looming challenge in credit card usage.

In the next few years, consumers will increasingly purchase goods by scanning a smartphone at a register rather than swiping a credit card. Called contactless mobile payments, this transaction channel is expected to grow from “practically none” in 2010 to $22 billion by 2015, according to the Aite Group in an article, “The End of Credit Cards is Coming.”16

Google has partnered with financial institutions and cell phone providers to roll out a test service in 2011 that puts a consumer’s financial account information on a smartphone near-field communications (NFC) chip. Using smartphones at ATMs is also in the pipeline. This phenomenon poses major implications for financial services and underscores the need for speed and agility to react swiftly to unexpected challenges and [...]

Opportunities in social media. In financial services, social media marketing will not become the force that it is in other consumer-oriented industries until financial services come up with ways to make the channel work to their unique situation. Many people are simply disinclined to “like” a bank or lender. Even though they have many millions of customers, large diversified financial institutions typically have only several thousand Facebook fans.

In fact, less than half (46 percent) of the 154 banks and credit unions surveyed by The Financial Brand have a Facebook page.17 Just 35 percent use Twitter, 25 percent YouTube, and only 8 percent offer an online discussion forum. But that’s not to say that social media is not a worthwhile investment. Given its low cost of entry and that younger generations have been weaned on social media, social media can be an asset that poses nominal risk. And though financial services has been slow to embrace social media, that appears likely to change: in 2012, 90 percent of organizations surveyed by the Aite Group said they would have dedicated social media funding in place.18

Credit unions and brokerages are among firms making use of social media. Facebook pages provide an opportunity to promote community events, highlight special offers, engage investors with stock-picking contests, and generally enhance brand image.

As with other digital interactions, the key is to employ analytics that enable you to understand the value of your social media followers. Tools that can correlate typical interaction with your social media vehicles to subsequent business activity are essential to deriving that value, and to plotting your social media strategies.
16, “The End of Credit Cards is Coming,” January 4, 2011.
The Financial Brand, “2010 Online Marketing Study,” August 2010.
American Banker, “Facebook, Twitter Become Online Banking Attractions,” December 28, 2010.
Coremetrics helps financial services firms leverage social media for customer engagement and brand
enhancement. One example:
• Improve brand perception of financial institution
• Launch Facebook page focused on community involvement and financial services benefits
• Provide social forum for customer questions, comments
• Social channel referrals had higher loan and account sign-up rate than more expensive paid search
and email channels
• Customers driven to site through Facebook are fastest to add second product to portfolio
Leverage multichannel data. Many financial firms have built customer relationship management (CRM) and data warehousing systems based on customer transactional data and activity in offline channels. Consolidating data from online activity with other channels provides a complete multichannel view of the customer that can sharpen both online and traditional marketing.

Extend digital marketing to offline channels. The insight gained into customers’ digital journeys can and should help make offline interactions more relevant and helpful to customers. By extending web analytics and digital marketing into cross-channel marketing automation, offline marketing interactions can be rendered more relevant. For example, customers calling into the call center can receive an offer that takes into account their previous website activity, e.g., that they browsed mortgage pages but didn’t make an application.

The long-term, even lifelong, relationship that many customers have with their financial services providers ups the ante. If you lose a customer because of a competitor’s more effective marketing, you haven’t lost just a single sale. You’ve lost years of value with that customer, and the opportunity to extend that value across generations as youngsters often select a financial firm based on their parents’ guidance.

Coremetrics for Financial Services

Coremetrics, an IBM Company, offers a set of solutions suited specifically for the financial services industry. Through the fusion of web analytics and digital marketing automation, Coremetrics empowers marketers to turn site visitors into repeat customers and loyal advocates by orchestrating a personalized and compelling experience throughout each customer’s digital lifecycle.

To achieve this, Coremetrics tracks customers and prospects as they interact with a business’ online presence, providing marketers with a comprehensive view into how consumers are interacting with their brands online over time and across channels. This unique insight is used to automate real-time personalized recommendations, email targeting, display ad targeting across leading ad networks, and search engine bid management, delivered to customers through any digital vehicle including social, mobile, and web.
Coremetrics®, an IBM Company, a leading provider of web analytics and marketing optimization solutions, helps businesses relentlessly optimize their marketing programs to make the best offer, every time, anywhere, automatically. More than 2,100 online brands globally use Coremetrics Software as a Service (SaaS) to optimize their online marketing. Coremetrics integrated marketing optimization solutions include real-time personalized recommendations, email targeting, display ad targeting across leading ad networks, and search engine bid management. The company’s solutions are delivered on the only online analytics platform designed to anticipate the needs of every customer, automate marketing decisions in real time, and syndicate information across all customer channels.
• Insurance
• E-commerce

• Core Reports
• People
• Machine Learning

[...] Increase in company’s overall quote-view to purchase rate since launch
[...] of product manager’s time saved every year for those who use Mixpanel

THE GOALS
• Understand the full customer journey
• Drive business decisions with user insights
• Measure and optimize acquisition channels
• Increase overall company conversion rates

SOLUTION
With Mixpanel, Lemonade has the user insights necessary to determine company strategy, help allocate time and resources, and ensure that teams could make data-informed decisions every step of the way.

Lemonade is not your average insurance company. With no brokers, no runarounds, and an experience powered by artificial intelligence, Lemonade offers consumers a new way to get affordable renters or home insurance, all from the comfort of the web or a mobile device.

So when a company like Lemonade is upending how a longstanding industry does business, it has to learn from user behavior quickly and use that data effectively across the whole organization. That’s where Mixpanel came in.

“Everything we do is based on data,” said Gil Sadis, Head of Product at Lemonade. “Executives, product, marketing, analytics, customer service, and even underwriting teams learn how to use Mixpanel as soon as they join the company.”

Lemonade’s ultimate business goal is to increase its number of policyholders over time. With more than 100,000 policyholders and counting, the company must optimize partner and paid acquisition channels and educate its consumers on how Lemonade is different than the traditional insurance model. Then, the company quickly serves personalized quotes so that it’s easy for a consumer to purchase a policy that’s right for them.
THE SOLUTION

“We’ve had Mixpanel implemented since Day 0 – even before our public launch, to test that our infrastructure was working,” Gil said.

“[...] COMPANIES BEFORE – IT’S THE GO-TO MARKET SOLUTION WHEN YOU NEED PRODUCT AND USER ANALYTICS.”

Through Explore and People profiles, the company has an end-to-end understanding of the user: “Because our product and marketing is omni-channel, it’s really easy to see the entire journey a user takes – from acquisition, to moving through one of our hundreds of funnels, to ultimately purchasing a policy.”

“THE DIFFERENCE BETWEEN [...] TOOLS IS THAT MIXPANEL’S MACHINE LEARNING WILL NOTIFY YOU WITH [...]

“In our busy day-to-day, we’re not looking at every funnel to make sure everything is working. But Mixpanel’s anomaly detection feature tells us when there’s a steep drop or unexpected uptick. This is exactly why we want everyone on Mixpanel. When everyone can access the tool, we quickly spot issues or notice the trends to capitalize on.

“In every product spec and with every feature we build, we have a designated placeholder for Mixpanel. That way, we can determine the events we track and how this feature is helping us measure and work toward our larger business goals.”

Ninety-five percent of its employees use Mixpanel, and their decisions are driven by insights they discover in the tool.

“Part of onboarding a new employee at Lemonade is creating a Mixpanel account. We want people to engage with data, ask questions, and find the answers in data to make the right decisions,” Gil said.

In fact, their reliance on Mixpanel has allowed the company to operate in an incredibly nimble and unconventional way: “At Lemonade, we only focus on the most important stuff, so the entire org can have the biggest impact. And while we have a very focused vision, we don’t chase quarterly roadmaps, which many people find strange.

“However, Mixpanel helps us rigorously prioritize and balance where we innovate versus iterate, all based on what we learn from our user behavior data. Nearly everyone here uses Mixpanel, so we are all empowered to make informed decisions with data. For example, when we release a new innovative feature and measure it with Mixpanel, we see its direct impact on the business and how many people are interested in it.

“Then, we can compare that to the impact a more iterative product change has on our users. Since everything is measured in Mixpanel, we have clear evidence about what to care about and how to balance [...]

From the company level down to the individual, trust in and use of the data saves time:

“MIXPANEL SAVES EVERY PRODUCT MANAGER, AT MINIMUM, HALF A DAY’S WORK, EVERY WEEK. AND WITH [...] IMPORTANT USER PRIORITIES [...]
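The idea of being notified about “a steep drop or unexpected uptick” in a funnel can be illustrated with a very simple statistical check. The sketch below flags a daily conversion rate that sits far from its recent average, measured in standard deviations; it is a generic z-score rule written for illustration, not Mixpanel’s anomaly detection algorithm (which the case study does not describe), and the function name, threshold, and sample rates are assumptions.

```python
from statistics import mean, stdev


def detect_anomaly(history, today, threshold=3.0):
    """Flag today's conversion rate if it deviates sharply from recent history.

    history: recent daily conversion rates (floats), at least two values.
    today: today's conversion rate.
    threshold: number of standard deviations treated as anomalous (assumed).
    Returns "drop", "uptick", or None.
    """
    if len(history) < 2:
        return None  # Not enough data to estimate variability.
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is notable.
        if today == mu:
            return None
        return "uptick" if today > mu else "drop"
    z = (today - mu) / sigma
    if z <= -threshold:
        return "drop"
    if z >= threshold:
        return "uptick"
    return None
```

Given a hypothetical week of conversion rates near 30 percent, a day at 18 percent would be flagged as a drop, while a day at 30.5 percent would pass unnoticed. A production system would also need to account for seasonality and traffic volume, which this sketch ignores.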
THE RESULTS

Fueling high growth and productivity with Mixpanel

As newcomers to insurance, an industry that has operated without major innovation for the past 150 or more years, Lemonade has taken a design and customer-centric approach, and the strategy is paying off. In just one year, the company secured 70,000 policies, serving more than 100,000 policyholders, and counting.

In fact, Lemonade’s rate of acquiring new policyholders doubles every 10 weeks.

In addition to its focus on AI and mobile-first design, Lemonade credits much of this growth to the insights their teams are able to find and leverage with Mixpanel.

Increasing purchase rate of Extra Coverage by 50%, thanks to anomaly detection

With the launch of Lemonade’s Extra Coverage product, Mixpanel’s machine learning helped the team uncover a key insight: new policyholders weren’t fully completing the purchase flow.

“To finalize a policy with Extra Coverage, a user has to submit their policy for review. At this stage there’s no need for payment, but Mixpanel helped us find that most people didn’t submit it for review; they went through the entire flow and then suddenly stopped,” Gil said.

“After anomaly detection notified us of the staggering drop-off rate, we dug deeper into Funnels to find that part of the problem was a browser-based technical issue. In addition, the call to action was not clear enough. It felt like you already submitted something, even though there was still one last step.”

Tackling both the technical and the UX issues led to a dramatic improvement: the product team saw a 50% increase in overall conversion for Extra Coverage.

Improving company’s overall quote-view to purchase rate by 250% since launch

In the beginning, Lemonade’s new user acquisition rate wasn’t always doubling every 10 weeks.

“When we launched in 2016, we didn’t know how our customers would behave. Lemonade is a completely new product and insurance buying experience. We did a lot of user testing, but we didn’t really know how our acquisition funnel would perform,” Gil remembered.

“Post-launch, we saw a huge drop in the funnel where users first see their quote view. We went straight to Mixpanel. It was easy not only to understand at what stage users dropped, but also find and connect with the specific users to ask for qualitative feedback to improve the user experience.”

SINCE MAKING PRODUCT AND UX [...] THROUGH MIXPANEL, THE OVERALL PURCHASE RATE SINCE LAUNCH HAS IMPROVED BY 250%.

“We made the top navigation really lean. It was a bit cluttered at first and introduced too much insurance talk – jargon like ‘annual deductible.’ We simplified everything so if you went through the experience all you would see in the quote view is the price and a CTA to buy. Then there was a small arrow ushering you to send you to the next section.

User analytics for marketers

In addition to product and UX improvements, Mixpanel has helped the Marketing team measure and optimize their paid acquisition channels and spend.

“When we found that a lot of our paid channels came from mobile devices, we started to direct people to download our app and not go through the mobile web flow. However, we saw a steep decline.

“This prompted us to really improve our mobile web flow, instead, and drive app downloads later in the process. The improvements we made to the mobile web flow led to conversion rates that were hundreds of percent better than what we had before,” Gil said.

“THE MARKETING TEAM USES MIXPANEL THE SAME WAY AS THE [...] MEASURE PERFORMANCE PER [...] LEVEL.”

“Moving forward they’ll be utilizing Platform even more, especially in tying marketing and email campaign data to user behavior data. That way they can answer questions like: How did the campaign perform end-to-end? Did the email language convert? And, did these campaigns prompt users to take action within the product?”

By syncing multiple data sources to Mixpanel, all of Lemonade’s teams can see the downstream effects and direct impact they have not only within the product, but ultimately the business.
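The funnel analysis described in this case study can be illustrated with a minimal Python sketch. It is not Mixpanel's implementation or API: the step names and user counts are invented for illustration, and the helper functions are hypothetical. It computes step-to-step conversion, finds the step with the steepest drop-off, and expresses an improvement as a relative lift, which is the sense in which figures such as "50%" and "250%" are read here.

```python
from dataclasses import dataclass


@dataclass
class FunnelStep:
    name: str
    users: int  # number of users who reached this step


def step_conversions(steps):
    """Return (from_step, to_step, conversion_rate) for each adjacent pair."""
    rates = []
    for prev, cur in zip(steps, steps[1:]):
        rate = cur.users / prev.users if prev.users else 0.0
        rates.append((prev.name, cur.name, rate))
    return rates


def worst_drop_off(steps):
    """Return the adjacent pair with the lowest step-to-step conversion."""
    return min(step_conversions(steps), key=lambda r: r[2])


def relative_lift(before, after):
    """Relative change in a rate, e.g. 0.10 -> 0.15 is a 50% increase."""
    return (after - before) / before


# Hypothetical counts, invented for illustration only.
funnel = [
    FunnelStep("quote_view", 1000),
    FunnelStep("coverage_selected", 800),
    FunnelStep("submitted_for_review", 200),
    FunnelStep("purchased", 180),
]

if __name__ == "__main__":
    for src, dst, rate in step_conversions(funnel):
        print(f"{src} -> {dst}: {rate:.0%}")
    print("largest drop-off:", worst_drop_off(funnel)[:2])
    print(f"lift from 10% to 15%: {relative_lift(0.10, 0.15):.0%}")
```

Under this reading, a 250% improvement means the new rate is 3.5 times the baseline, since `relative_lift(0.10, 0.35)` is 2.5.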
& Life Science
Sales & Marketing
Liaison Solution
Liaison Healthcare’s Cloud-based Web Analytics Dashboard and Data Integration Platform
The Challenge
With marketing campaign metrics and clickstream data coming from a multitude of
diverse analytics platforms (e.g. Google Analytics, Facebook Insights, YouTube, etc.),
the company had no cohesive way to analyze the data as a whole. Their only recourse
for obtaining this data was to visit the metric provider sites individually or to request a
report from their marketing agency.
A solution to consolidate all these campaign metrics into a single unified view was
sorely needed for business intelligence decision-making. Through consolidation, the
company would be able to compare effectiveness data as a whole with defined key
performance indicators (KPIs).
The Solution
The customer selected Liaison because it could provide the differentiated value of
faster time to market and flexibility by leveraging its multi-tenant preexisting and
configurable services. In addition, the scalable solution design offered the
future-proofed path the client was looking for to eliminate the obsolescence risk associated
with rapidly evolving technology advances.
The end result was a Liaison solution that seamlessly captures data from seven web
analytics providers and harmonizes it into a single data model for access and analysis
via a thin web application dashboard. Built as a platform-as-a-service solution, it
enabled our pharmaceutical client to dramatically increase the speed and agility with
which valuable information is delivered across the business, while maintaining a high
level of user confidence in its accuracy.
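As a rough illustration of what harmonizing data from several analytics providers into a single data model can look like, here is a minimal Python sketch. It is not Liaison's design: the provider names (`provider_a`, `provider_b`), their field names, and the unified `MetricRecord` shape are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricRecord:
    """Unified row: one metric value for one campaign on one day."""
    source: str
    campaign: str
    day: date
    metric: str
    value: float


# Hypothetical mappings from each provider's field names to unified metric names.
FIELD_MAPS = {
    "provider_a": {"sessions": "visits", "goals": "conversions"},
    "provider_b": {"clicks": "visits", "signups": "conversions"},
}


def harmonize(source, rows):
    """Translate provider-specific rows into unified MetricRecord objects."""
    mapping = FIELD_MAPS[source]
    records = []
    for row in rows:
        day = date.fromisoformat(row["date"])
        for field, metric in mapping.items():
            if field in row:
                records.append(
                    MetricRecord(source, row["campaign"], day, metric, float(row[field]))
                )
    return records


def totals_by_metric(records):
    """Sum values per (campaign, metric) across all providers."""
    totals = {}
    for rec in records:
        key = (rec.campaign, rec.metric)
        totals[key] = totals.get(key, 0.0) + rec.value
    return totals


# Invented sample exports, one row per campaign per day.
provider_a_rows = [{"campaign": "spring", "date": "2017-03-01", "sessions": 120, "goals": 6}]
provider_b_rows = [{"campaign": "spring", "date": "2017-03-01", "clicks": 80, "signups": 4}]

unified = harmonize("provider_a", provider_a_rows) + harmonize("provider_b", provider_b_rows)

if __name__ == "__main__":
    for key, total in sorted(totals_by_metric(unified).items()):
        print(key, total)
```

Keeping the per-provider mapping in one table means that adding another provider only requires a new mapping entry, while dashboards and KPI comparisons read the unified records.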
Case Study: Large Pharmaceutical
Customer Benefits
Immediately, Liaison’s customer gained greater visibility of their customers’ buying power and what products resonated in which
regions. With such detail at their fingertips, they were able to focus marketing dollars on initiatives that generated actual revenue,
while reducing overall marketing dollars spent due to improved targeting of potential customers.
For Customer Insights Professionals
by James McCormick
with Gene Leganza and Emily Miller
November 7, 2017
›› Web analytics is core to digital intelligence practices. Many digital data management, analytics,
and optimization technologies compete for attention within a modern digital intelligence (DI)
practice. However, web analytics remains the single most dominant technique (see Figure 1). Almost
three-quarters of respondents in Forrester’s Q2 2017 Global Digital Intelligence Platforms Forrester
Wave™ Customer Reference Online Survey used web analytics from their DI platform providers.
Web analytics adoption is significantly greater than the next three dominant analytics techniques:
application analytics (48%), interaction analytics (43%), and cross-channel attribution (41%).
›› Understanding customer web engagement is still critical to business success. The past
decade has seen a significant shift in the proportion of internet traffic and customer engagement from
PC web browsers toward mobile apps, mobile browsers, tablet and TV apps, and other internet-of-
things (IoT) devices.3 However, the vast majority of active internet users still interact via browsers,
which also remain the most important digital channel for consumers to make their purchases.4
›› Modern web analytics technologies extend beyond browser analytics. Although understanding
visitor behaviors is still important to CI pros, today’s digital practitioners are using their web
analytics systems for a lot more. The majority also use these systems to aid behavioral targeting
efforts to personalize customer engagements (70%), manage their digital data within a data
warehouse (63%), integrate it with online testing technologies (57%), and perform application
analytics (53%).5
© 2017 Forrester Research, Inc. Unauthorized copying or distributing is a violation of copyright law. 2 or +1 866-367-7378
The Forrester Wave™: Web Analytics, Q4 2017
Incumbent Vendors Lead While Newcomers Invigorate A Mature Market
FIGURE 1 Web Analytics Remains The Most Dominant Digital Intelligence Platform Capability
Recommendations 32%
Spatial analytics 2%
Evaluated Vendors Sit Within Three Of The Seven Digital Intelligence Categories
Vendors that provide web analytics technologies are part of a broader DI landscape of vendors that
either 1) manage digital interaction data; 2) analyze digital interaction and related data; 3) inform and/
or optimize customer interactions based on insights gained from data, analytics, testing, and machine
learning; or 4) provide any combination of these three.6 Forrester has identified seven categories of
digital intelligence, each representing a different combination of capabilities (see Figure 2). The web
analytics vendors evaluated within this Forrester Wave come from one of three DI categories. These
three DI categories all have strong digital analytics capabilities but are differentiated from each other:
›› Category 2 DI vendors are digital analytics specialists. AT Internet falls into this category. Other
analytics specialists within this group include those that focus on mobile analytics (e.g., Appsee),
social analytics (e.g., Sprinklr), and interaction analytics (e.g., User Replay).
›› Category 5 DI vendors emphasize data management with analytics. Cooladata sits within
this category. Other vendor types within this group include those focused on digital performance
management (e.g., New Relic), streaming analytics (e.g., Keen IO), and tag management (e.g.,
Commanders Act).
›› Category 7 DI vendors offer data, analytics, and optimization capabilities. Adobe, Google,
IBM, Mixpanel, and Webtrekk reside within this category. These vendors were also evaluated for
their DI capabilities in our Q2 2017 Forrester Wave evaluation of digital intelligence platforms.7 The
majority of category 7 vendors have data warehousing technology; provide a combination of mobile
analytics, web analytics, and attribution analytics; and include some form of behavioral targeting.
FIGURE 2 Evaluated Web Analytics Vendors Come From Three Of The Seven Digital Intelligence Categories
Category 1: Digital data management technologies. Category 2: Digital analytics technologies (AT Internet). Category 3: Digital engagement optimization technologies. Categories 4 through 7 also appear in the figure, with IBM shown in Category 7.
›› Current offering. The capabilities we evaluated included data ingestion (e.g., web browser
instrumentation and visitor tracking); data repository, model, and access (e.g., data storage and
customer profiling); data ownership (e.g., syndication, privacy, security, and portability); analytics
and reporting (e.g., segmentation, attribution, and mobile analytics); dashboards and alerts;
artificial intelligence; web analytics usability and user experience; and web analytics technology
ecosystem (i.e., APIs and first-party DI products and integrations).
›› Strategy. We reviewed each vendor’s strategy, evaluating its product vision, execution road map,
performance, supporting services, and partner ecosystem. Partner ecosystem is critical because
the web analytics solution sits within a much larger DI and customer engagement technology stack.
›› Market presence. The elements of market presence we evaluated include web analytics revenue,
number of enterprise customers, and average deal size. Forrester examines market presence to
provide assurance that evaluated vendors are financially stable and viable for enterprise customers.
Forrester included seven vendors in the assessment: Adobe, AT Internet, Cooladata, Google, IBM,
Mixpanel, and Webtrekk. Each of these vendors has (see Figure 3):
›› A web analytics solution that motivates client inquiries. Forrester clients often discuss the
vendor’s web analytics products through inquiries; alternatively, the vendor may, in Forrester’s
judgment, warrant inclusion or exclusion in this evaluation because of web analytics technology
trends or market presence.
›› A dedicated web analytics software solution. The vendor offers a software solution specifically
built to deliver web analytics functionality. This functionality is core to the solution and is not
simply an add-on to other analytical functionality, such as interaction analytics from session-based
replay data, customer analytics, or insights-driven optimization. Enterprise customers use the web
analytics solution as a standalone software tool.
›› A solution with a complete set of advanced enterprise web analytics functionality. Evaluated
vendors provide a complete set of advanced web analytics functionality, including in-browser
instrumentation for collecting visitors’ contextual and behavioral data, the provision and
management of an extensive set of out-of-the-box metrics and dimensions, and customizable self-
service dashboards.
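›› An enterprise web user base. Ten or more enterprises are users of the vendor’s web analytics
solution. Forrester defines enterprise-sized customers as firms with at least $1 billion in
annual revenue.
FIGURE 3 Table columns: Vendor, Product evaluated, Product version evaluated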
1. The web analytics solution has sparked client inquiries and/or the vendor has web analytics
technologies that put it on Forrester’s radar. Forrester clients often discuss the vendor’s web
analytics products through inquiries; alternatively, the vendor may, in Forrester’s judgment, warrant
inclusion or exclusion in this evaluation because of web analytics technology trends or market presence.
2. The vendor provides a dedicated web analytics software solution. In other words, the vendor offers
a software solution that has been specifically built to deliver web analytics functionality. This
functionality is core to the solution and is not simply an add-on to other analytical functionality, such as
interaction analytics from session-based replay data, customer analytics, or insights-driven
optimization (e.g., A/B testing or online testing). The web analytics solution is offered to and used as a
standalone software tool by enterprise customers.
3. The software solution has a complete set of functionality for advanced enterprise web analytics
needs. Evaluated vendors provide a complete set of advanced web analytics functionality, including
in-browser instrumentation for collecting visitors’ contextual and behavioral data, the provision and
management of an extensive set of out-of-the-box metrics and dimensions, and customizable
self-service dashboards.
4. The vendor has an enterprise user base. Ten or more enterprises are users of the vendor’s web
analytics solution. Forrester defines enterprise-sized customers as firms with at least $1 billion in
annual revenue.
Vendor Profiles
This evaluation of the web analytics market is intended to be a starting point only. We encourage
clients to view detailed product evaluations and adapt criteria weightings to fit their individual needs
through the Forrester Wave Excel-based vendor comparison tool (see Figure 4).
FIGURE 4 Forrester Wave graphic. Bands: Challengers, Contenders, Strong Performers, Leaders. Dimensions: current offering, strategy, market presence. Plotted vendors include AT Internet, Google, Cooladata, and Mixpanel.
Download the Forrester Wave tool for more detailed product evaluations, feature comparisons, and customizable rankings.
Criterion                             Forrester's weighting   Vendor scores
Current offering                      50%    4.42  3.79  1.82  2.92  1.78  2.13
Data repository, model, and access    10%    4.20  3.00  1.00  3.80  1.60  2.40
Analytics and reporting               30%    4.80  3.95  1.60  3.15  1.50  2.20
Dashboards and alerts                 10%    4.00  3.00  2.00  2.00  2.00  2.00
Web analytics usability and UX        15%    5.00  5.00  4.00  2.00  3.00  1.00
Web analytics technology ecosystem    10%    4.00  3.00  1.00  4.50  1.00  3.50
Execution road map                    20%    5.00  5.00  1.00  5.00  1.00  1.00
Web analytics revenue                 40%    5.00  3.00  1.00  3.00  2.00  2.00
Number of enterprise customers        40%    5.00  4.00  1.00  2.00  2.00  3.00
Average deal size                     20%    4.00  3.00  1.00  5.00  2.00  2.00
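To show how weightings and criterion scores can combine, the sketch below applies a plain weighted average to the three market presence rows above (web analytics revenue, number of enterprise customers, average deal size), whose weightings sum to 100%. This is an assumption about the arithmetic, not Forrester's documented scoring procedure. The vendor names for the six score columns are not available in this copy, so columns are identified by position only. The optional `weights` argument is a hypothetical way to try the custom weightings that the report encourages readers to explore.

```python
# Market presence rows as they appear in the scorecard above.
# Each entry: (criterion, default weighting, scores for score columns 1..6).
MARKET_PRESENCE = [
    ("Web analytics revenue", 0.40, [5.00, 3.00, 1.00, 3.00, 2.00, 2.00]),
    ("Number of enterprise customers", 0.40, [5.00, 4.00, 1.00, 2.00, 2.00, 3.00]),
    ("Average deal size", 0.20, [4.00, 3.00, 1.00, 5.00, 2.00, 2.00]),
]


def weighted_scores(rows, weights=None):
    """Weighted average per score column; weights default to the table's own."""
    if weights is None:
        weights = [w for _, w, _ in rows]
    total = sum(weights)
    columns = len(rows[0][2])
    result = []
    for i in range(columns):
        score = sum(w * row[2][i] for w, row in zip(weights, rows)) / total
        result.append(round(score, 2))
    return result


if __name__ == "__main__":
    print("default weightings:", weighted_scores(MARKET_PRESENCE))
    # Hypothetical custom weighting that emphasizes average deal size.
    print("custom weightings: ", weighted_scores(MARKET_PRESENCE, [0.2, 0.2, 0.6]))
```

Dividing by the sum of the weights keeps the result on the same 1-to-5 scale even when custom weightings do not add up to 1.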
›› Adobe maintains its dominant position and strength within the web analytics market. With its
Adobe Analytics spring 2017 release, Adobe has sought to emphasize features that democratize
meaningful and actionable digital insights to anyone in the enterprise. It has concentrated on
making the UI more intuitive and building on capabilities that allow the exploration of data
breakdowns, relationships, and comparisons. While Forrester has seen some success with this
endeavor, some customers still feel that the interface is a bit daunting and that it can be a challenge
to bring new hires up to speed with the product. B2C customers are by far the largest group using
the product and come from a broad range of buyer types and industry verticals.
›› AT Internet has quietly continued to strengthen its position in the market. Sometimes an
unsung web analytics leader, AT Internet prides itself on Analytics Suite’s focus on helping global
enterprises comply with modern data security, privacy, and confidentiality requirements, backed
by enterprise-class data SLAs — not surprising given the vendor’s European heritage.8 The
vendor’s customer satisfaction scores were some of the best for many of the web analytics areas
assessed in this evaluation. However, some customers we spoke to felt that the UI could be made
simpler and is due for a face-lift. Over half of AT Internet’s revenue comes from the media and
entertainment, financial services, and retail sectors. Its biggest buyers within the enterprise are
digital analytics leads, CMOs, and chief technology officers.
Strong Performers
›› IBM wants to embody easy-to-use web analytics. IBM’s Watson Customer Experience Analytics
product enables its customers to optimize websites, marketing, and digital applications to increase
conversion, loyalty, and customer lifetime value. In absolute terms, IBM has seen success with this.
However, relative to some of its traditional web analytics competitors, the vendor has lost ground
in its current offering and strategic approach. The large majority of enterprise buyers for IBM’s web
analytics product are either CMOs or digital/customer analytics leaders. About half come from the
financial services, retail, and insurance sectors.
›› Google continues its foray into the enterprise web analytics market. With its Analytics 360
Suite, Google combines its digital analytics capacities with those of tag management (Google Tag
Manager 360), site optimization (Google Optimize 360), data visualization (Google Data Studio),
a market survey/research tool (Google Surveys 360), attribution (Google Attribution 360), and
audience management (Google Audience Center 360). The product continues to show its strength
in application usability and UX. However, even though it is now part of a suite, the product still
lacks the level of support and capability for data management, data ownership, analytics, and
reporting provided by some of the leading vendors in this report. Google’s product appeals to a broad
range of users within CMO and digital analytics teams, which make up roughly half of the buyers.
Many buyers are from retail, financial services, and media, entertainment, and leisure. Google did
not actively participate in this Forrester Wave evaluation.
›› Webtrekk impresses with good web analytics capabilities at a competitive price. A new
entrant into the Forrester Wave, Webtrekk wants to provide easy access to raw data to all types
of users and third-party tools for in-depth analysis. The vendor also boasts a DMP and marketing
automation tool into which the web analytics product is closely integrated.9 Many customers feel
the product offers great value for the price range; however, some say that the vendor must improve
supporting material, third-party integration APIs, and a bulky UI. The vendor’s products appeal
mostly to buyers from CMO, chief digital officer, and digital analytics teams at B2C enterprises from
the retail, financial services, and media, entertainment, and leisure verticals.
›› Mixpanel is building web analytics capabilities from the ground up for ease and scale. From
its inception, Mixpanel took advantage of the falling cost of data storage and processing power
to capture large amounts of user detail and event properties to provide deep insights into user
behaviors. However, some customers feel the experience with the UI, dashboarding/reporting, and
administration could be improved. Some customers don’t use the product as their primary tool
for web analytics and insights democratization. Rather, analytics specialists use it for deep-dive
ad hoc analysis. Most buyers come from the chief product officer, CMO, and customer analytics
teams at companies within the media, entertainment, and leisure; consumer services; and high-
tech industries.
›› Cooladata strives for differentiation by measuring websites and beyond. Cooladata provides
a web analytics product with the ability to capture data from many different digital touchpoints
and perform analysis on the fly using a dynamic data scheme. Some customers comment that the
UI needs some work and that it is easier to generate relevant reports in other systems. This partly
explains why some use the vendor’s product less as their primary web analytics tool and more as a
specialist business intelligence tool for products and product teams. Over two-thirds of buyers for
Cooladata’s web analytics product are from chief product officer and chief digital officer teams. The
vendor’s three largest verticals are online gaming, eCommerce, and media.
Supplemental Material
Online Resource
The online version of Figure 4 is an Excel-based vendor comparison tool that provides detailed product
evaluations and customizable rankings. Click the link at the beginning of this report to
Forrester used a combination of four data sources to assess the strengths and weaknesses of each
solution. We evaluated the vendors participating in this Forrester Wave, in part, using materials that
they provided to us by October 13, 2017.
›› Vendor surveys. Forrester surveyed vendors on their capabilities as they relate to the
evaluation criteria.
›› Executive briefings. An executive backed by a product team from each vendor presented and
answered questions on the vendor’s product strategy and market sizing.
›› Product demos. We asked vendors to conduct demonstrations of their products’ functionality and
to answer clarification questions posed to them. We used findings from these product demos to
validate details of each vendor’s product capabilities.
›› Customer surveys and reference calls. To validate product and vendor qualifications, Forrester
also surveyed and conducted phone interviews with three of each vendor’s current customers.
We conduct primary research to develop a list of vendors that meet our criteria for evaluation in this
market. From that initial pool of vendors, we narrow our final list. We choose these vendors based on:
1) product fit; 2) customer success; and 3) Forrester client demand. We eliminate vendors that have
limited customer references and products that don’t fit the scope of our evaluation.
After examining past research, user need assessments, and vendor and expert interviews, we develop
the initial evaluation criteria. To evaluate the vendors and their products against our set of criteria,
we gather details of product qualifications through a combination of lab evaluations, questionnaires,
demos, and/or discussions with client references. We send evaluations to the vendors for their review,
and we adjust the evaluations to provide the most accurate view of vendor offerings and strategies.
We set default weightings to reflect our analysis of the needs of large user companies — and/or
other scenarios as outlined in the Forrester Wave evaluation — and then score the vendors based
on a clearly defined scale. We intend these default weightings to serve only as a starting point and
encourage readers to adapt the weightings to fit their individual needs through the Excel-based tool.
The final scores generate the graphical depiction of the market based on current offering, strategy, and
market presence. Forrester intends to update vendor evaluations regularly as product capabilities and
vendor strategies evolve. For more information on the methodology that every Forrester Wave follows,
please visit The Forrester Wave Methodology Guide on our website.
Survey Methodology
Forrester fielded its Q2 2017 Global Digital Intelligence Platforms Forrester Wave™ Customer
Reference Online Survey to 44 individuals who were current clients of the vendors included in “The
Forrester Wave™: Digital Intelligence Platforms, Q2 2017.” We asked each vendor to supply at least
three customers. For quality assurance, we required all respondents to provide contact information and
answer basic questions about their firms’ revenues and budgets. Forrester fielded the survey between
January and February 2017.
Exact sample sizes are provided in this report on a question-by-question basis. Panels are not
guaranteed to be representative of the population. Unless otherwise noted, statistical data is intended
to be used for descriptive and not inferential purposes. During this research, Forrester questioned
end users about the features and state of their DI practices. We also asked about the value that DI
approaches are currently providing and their intentions to mature such approaches to attain greater
value in their respective firms. This research was intended to generate a qualitative understanding of
the state of continuous optimization.
Integrity Policy
We conduct all our research, including Forrester Wave evaluations, in accordance with the Integrity
Policy posted on our website.
1. Forrester estimates that insights-driven businesses are growing at an average of over 30% annually, which will enable
them to globally earn $1.8 trillion per annum by 2021. See the Forrester report “Insights-Driven Businesses Set The
Pace For Global Growth.”
2. Forrester defines digital intelligence as the practice that includes the capture, management, and analysis of customer
data and insights to deliver a holistic view of customers’ digital interactions for the purposes of continuously
optimizing business decisions and customer experiences across the customer life cycle. See the Forrester report
“Optimize Customer Experiences With Digital Intelligence.”
3. The Global Mobile Report from comScore shows that just less than three-quarters of the global digital population use
their desktops to access the internet. And a significant proportion of the population in many countries still only uses
desktop computers. Source: Ben Martin, “The Global Mobile Report,” comScore, September 12, 2017 (https://www.
4. Seventy-eight percent of internet users in the US made their most recent purchase via computers and browsers.
Source: Forrester Data Consumer Technographics® North American Consumer Technology, Media, And Telecom
Customer Life Cycle Survey, Q2 2017 (US).
5. Source: Forrester’s Q2 2017 Global Digital Intelligence Platforms Forrester Wave™ Customer Reference Online Survey.
6. Forrester has organized DI vendors into seven categories based on their digital data, digital analytics, and digital
optimization capabilities to guide decision makers on the combination of technology vendors they need to partner with
to build a mature DI practice. See the Forrester report “Vendor Landscape: Digital Intelligence Technology Providers
You Should Care About.”
7. In Forrester’s 26-criteria evaluation of DI platform providers, we identified the 10 most significant ones (Adobe,
Cxense, Evergage, Google, IBM, Localytics, Mixpanel, Optimizely, SAS, and Webtrekk) and researched, analyzed,
and scored them. See the Forrester report “The Forrester Wave™: Digital Intelligence Platforms, Q2 2017.”
Figure 2: Hierarchy of steps in implementing web analytics emergence strategy

Steps in implementing web analytics

This section provides the key steps for implementing web analytics. Essentially, any effective web analytics implementation should cover these three steps:

• Acquire the key metrics
• Analyze the information acquired
• Act based on the analysis

The following diagram depicts these three steps with sample metrics information.

[Figure 3: diagram with three columns, Acquire, Analyze and Act. The Acquire and Analyze columns list site usability, sources, visitor profile and conversion statistics; the Act column lists actions such as enhancing site design, designing effective campaigns and promoting targeted content for users.]

Figure 3: The AAA framework of web analytics implementation

A. Acquiring

This step consists of acquiring the metrics-related information. The metrics can be obtained either by mapping them to business goals (intended strategy) or from markers at key flows and transactions (emerging strategy).

A few common metrics obtained during this step for an e-commerce site are given below:

• Items per order
• New Visitor Conversion Rate
• Return Visitor Conversion Rate
• Visitor Loyalty & Visitor Recency
• Landing Page Bounce Rates
• Landing Page Load Times
• Availability
• Speed
• Geo specific page load times

These metrics need to be carefully selected based on the critical business goals and flows/functionalities. The list also needs to be refined based on the industry domain. For instance, checkout abandonment rate is an essential metric for an e-commerce site, whereas it is not applicable to information display sites.

B. Analyzing

Once the required information about key metrics is acquired, the business analysts need to analyze it to make sense of the obtained information. Most web analytics frameworks provide intuitive visualizations and dashboards that give a holistic view of the captured information in near real time. The following list shows the information that can be analyzed from the metrics captured in the previous step:

Site usability
• Information discoverability
• Issues with page design
• Site interactivity
• Path/Information effectiveness

Sources
• Campaign effectiveness
• SEO effectiveness
• External ad effectiveness
• Search effectiveness

Conversion statistics
• Visitor stickiness
• Transaction completion/abandonment rate
• Exit rate steps/pages

Visitor Profile
• Geo/browser visitor percentage

C. Act

After the analysis, the business team can come up with an action plan to “act” on the analysis. This is the crucial step in the model. The following is an indicative list of actions that can be taken from the analysis:

• Promote targeted content for users
• Make site more navigable
• Fix issues in the steps/pages where users are exiting
• Improve SEO to increase site traffic from external search engines
• Understand site performance in different geographies
• Personalize the information and flow based on user’s interest/usage history

A business framework for automated web analytics implementation

Now that we have understood the AAA model of web analytics implementation, we can focus on implementation details. A typical implementation process involves the following sequence of steps:

• Business identifies metrics
• Business provides detailed requirements to the implementation team
• Implementation team updates relevant page tags and performs testing
• Operations team deploys the updated code to production

As we can see, the above process involves multiple teams, and there is definitely a lag between the business decision and its actual implementation in the production environment. However, in some mission-critical applications this lag is not acceptable.

This section discusses methods and techniques to completely automate the above sequence of steps. We will also go one step further and eliminate the involvement of other teams. So when the business decides on changes to its web analytics framework, it can use this framework, which deploys the changes
2. Upon page load, the required tracking code/JS variables are populated.

3. The tracking code is sent to the web analytics server as an image request. The code is sent when the appropriate trigger happens (page load or event).

4. The web analytics server builds near-real-time reports based on the data it receives.*

The web analytics implementation framework is built on top of these steps and provides the following additional capabilities:

• Automatic injection of tracking code values, for instance the population of values related to locale, user attributes, etc.
• Automatic addition of the events that need to be tracked, for instance tracking the click of a new button.
• Automatic re-deployment of the updates to all publishing servers.

These steps would essentially equip a

The business team has to specify that the products landing page needs “page visit” tracking, along with the values for the above variables. They can specify those details in the interface:

Page        Analytics Tracking Code
            URL = D_url
            Browser = D_browser
            User = D_user
            Geo = D_geo
            Locale = D_locale
            Campaign = S_1234
            productName = D_prodname

Table 3: Framework interface details

Let’s examine what the above table does. The “page” column indicates to the web analytics framework that the specified URL needs to be tracked. In the “Analytics Tracking code” column, the business team can also specify the values for tracking

Note: The details of the tracking code are technical; to hide the inner details of this tracking code from the business team, the interface pre-populates the tracking code names required for tracking. Business team

Let’s consider another scenario in which the business wants to track the click of a button. The business team wants to specify the values as given below:

• Page: The URL of the page
• Section/Module: The module which contains the button
• Element: Id or class name of the button
• Event name: Click or mouseover
• Tracking code: Tracking code as specified in the previous table.

The interface also offers highly intuitive features to visualize the sections while specifying markers. For instance, the business team can preview the section and component (button) for which the tagging is specified to ensure that the marking is done for the correct component.

Specifying the tracking code in the interface is the only key step that the business needs to perform. Once it is done, they can submit the details to the framework, and it takes care of handling all subsequent steps till
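The flow described on this page (tracking variables are populated on page load and then sent to the web analytics server as an image request) can be sketched in Python. This is only an illustration under stated assumptions, not the framework's actual implementation: the reading of the `D_` prefix as "dynamic value resolved at page load" and of `S_` as "static value sent as-is", the helper names `resolve_tracking_values` and `build_beacon_url`, the page context, and the collector URL are all hypothetical.

```python
from urllib.parse import urlencode

# Tracking specification as a business user might enter it in the interface
# (names and codes taken from the "Framework interface details" table).
TRACKING_SPEC = {
    "URL": "D_url",
    "Browser": "D_browser",
    "User": "D_user",
    "Geo": "D_geo",
    "Locale": "D_locale",
    "Campaign": "S_1234",
    "productName": "D_prodname",
}


def resolve_tracking_values(spec, page_context):
    """Replace tracking codes with concrete values.

    Assumption: "D_x" means "look up x in the page context at load time",
    and "S_x" means "send the literal x".
    """
    resolved = {}
    for name, code in spec.items():
        prefix, _, key = code.partition("_")
        if prefix == "D":
            resolved[name] = page_context.get(key, "")
        elif prefix == "S":
            resolved[name] = key
        else:
            raise ValueError(f"unknown tracking code prefix in {code!r}")
    return resolved


def build_beacon_url(collector, values):
    """Build the URL of the image request that carries the tracking data."""
    return f"{collector}?{urlencode(values)}"


# Hypothetical page context and collector endpoint.
context = {"url": "/products", "browser": "Firefox", "user": "guest",
           "geo": "IN", "locale": "en_IN", "prodname": "laptop"}
values = resolve_tracking_values(TRACKING_SPEC, context)
beacon = build_beacon_url("https://analytics.example.com/pixel.gif", values)
print(beacon)
```

Keeping the specification as plain data is what would let a business user change what is tracked without a code change; only the resolver needs to know how each prefix is interpreted.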
Key words: web analytics tools, web metrics, user satisfaction, business models, survey
of the IT and marketing sectors in Croatia
Received: October 5, 2014; accepted: October 5, 2015; available online: October 31, 2015
DOI: 10.17535/crorr.2015.0029
1. Introduction
The world is becoming increasingly aware that the Internet is evolving rapidly
and constantly growing as more and more users get online. A presence in the
web sphere is necessary for all organizations and businesses. The Internet
provides numerous multimedia features enabling and changing the way
organizations communicate with their customers, suppliers, competitors and
employees [8]. The web sphere has a direct impact on a user's perception of
business success [4] and the strategic importance of web context for modern
business. It also shifts numerous business activities towards the web, creating at
the same time a new context of business models, the so-called web business models.
According to [10], a business model is described as a business method used by a
particular company to generate revenue and add new value to its
product/services. The same author has also distinguished nine basic categories
of web business models such as: (1) brokerage model, (2) advertising model, (3)
model of information agent, (4) commercial model, (5) manufacturing model, (6)
affiliate/collaborative model, (7) virtual community model, (8) subscription
model and (9) utility/ancillary services model.
Within these business models five common goals [10] can be identified:
• Selling products or services online and measuring the outcomes by
the number of products or services sold
• Creation of potential client databases and measures of the outcomes
based on the number of collected visitor contacts via web sites
• Content publication directed towards attracting as many visitors as
possible and thereby increase revenue from advertising
• Providing information to the website visitors
• Company branding
Without proper web metrics applied to a business model on a website, it is
almost impossible to measure the effects on visitors; hence, the proper
choice of a web analytics tool is important. It is also important to gain
insight into which tool suits the specific needs of its users. Based on what has
been said so far, the research task is to investigate the following: Web analytics
tools track and improve a user’s satisfaction with web-based business models.
pages, images or PDF files [3]. On the client side, site tagging is carried out
using JavaScript code inserted into every web page and is run and recorded
each time a user opens a tagged webpage. Visitor behavior is then recorded in a
separate file [12].
Furthermore, the purpose of analyzing data (c) is to transform data into
information useful for a decision-making process [12]. In that sense, special
attention should be given to selecting appropriate web analytics tools while
taking into account a company’s specific characteristics and goals, as well as
employing the staff who are competent in “discovering” useful information for
supporting decisions that are based on large amounts of acquired data.
Finally, reports are generated (d) based on selected metrics outputs which
in turn are useful for company management.
Data originating from the Internet offers relevant information on website
traffic, website transactions, server performance and information submitted by
users themselves [9]. Understanding of the web and website optimization
provides a more adapted approach to a target audience with the goal of
increasing conversion rates [12], as well as customer loyalty [4]. Analyzing
website traffic provides insight into the number of visitors, their geolocation,
time spent on websites and other parameters. Web analytics
also provides other advantages such as increasing efficiency and cost reduction
[3]. Marketers can also find web analytics data useful for improving
products/services and evaluating the success of a marketing campaign. In
addition, web designers and web developers use such data for improving website
usability and consequently, website user satisfaction. Web analytics provides
company management with the insight into how to generate revenue from a
website, how to create appropriate user experience and improve its competitive
advantage [6], as well as to support continuous improvement and
competitiveness [12].
The said author proposes a definition of web analytics as follows: the
analysis of qualitative and quantitative data on the website in order to
continuously improve the online experience of visitors, which leads to more
efficient and effective realization of the company’s planned goals. Quantitative
data provide insight into visitor behavior such as the previous web page prior to
reaching the actual website. In addition, the acquisition of qualitative data
provides answers as to why visitors behave in a certain way. Continuous
improvement of the online user experience based on information obtained from web
analytics is a key aspect of the web analytics concept.
Improved business results based on decisions supported by information
gained from web analytics certainly justify further expenditure in web analytics.
Web analytics is not a technology for just reporting, but a cyclical process of
website optimization which, among other things, measures costs, identifies the
376 Ivan Bekavac and Daniela Garbin Praničević
Figure 1: Activities in the web analytics process, Waisberg and Kaushik [12]
Web analytics tools and web metrics tools: An overview and comparative analysis 377
A variety of web analytics tools have been developed and are available on the
market that aim to obtain quantitative and qualitative data as a basis for the
decision-making process. The author of [11] has classified web analytics tools into
five categories:
1. Traditional web analytics tools that mostly relied on clickstream data
obtained by the visitors themselves, competitors and data from the
company’s internal sources.
Clickstream data generally corresponds to the question “what happens
on websites” or “visitor behavior while browsing the website” and “how
many conversions have been achieved on the website”.
2. Web analytics tools that track performance on social networks
3. Web analytics tools for gathering visitor feedback, which aim to answer the
question of the "reasons for the visitor behaving or not behaving in a certain
way".
4. Web analytics tools for mobile websites with a growing importance in
line with an increase in website turnover caused by the use of mobile
devices. These tools provide insight into visitor behavior on websites
accessed via mobile devices similar to traditional web analytics tools,
and are necessary for achieving compatibility with mobile devices [5].
5. Web analytics tools for experimenting, testing and finding optimal
technical or design solutions that should improve visitor satisfaction.
In terms of the process of selecting a web analytics tool, the working team is
responsible for the following [7]:
• Distinguishing whether the company needs to implement either
reporting or an analyzing process in its business model. Accordingly,
certain ineffective tools are eliminated.
• Assessing a company’s temporary IT capabilities.
• Taking into consideration web tool features in line with a company’s
Here, the focus is restricted to traditional web analytics tools based on
clickstream data, which can be divided into two categories according to the form
in which the tool is available: first, software installed on an organization’s
computers and, second, software provided as a service (SaaS, Software as a
Service) by an ASP (Application Service Provider) [4]. The traditional web metrics
tools are available on the market as open source or as a commercial package. Each
of these has its advantages and disadvantages (Table 1).
The online source for each presented web analytics tool is found at the end of this paper.
The wide range of web analytics tools makes the selection process more
complex and time consuming. Accordingly, selecting the appropriate tool should
take into consideration a company’s unique characteristics. In the process of
selecting the web tool, the team using the web tool should consider usability,
functionalities, technical details and the total cost of the tool. In other words,
the market should be studied while focusing on the tool’s features such as the
possibility of installing and deploying software locally, customer support, costs,
data segmentation possibilities, download options, ownership of collected data
and the possibilities of integrating data from other sources in the actual web
analytics tool.
The common features of web metrics include collecting specific visitor actions
and excluding search engine robots that crawl content on the website
while indexing it. Effective web metrics have to be based on generally accepted
terms, definitions and practices [13]. Web analytics incorporates web metrics,
thus providing benefits for online businesses [14] such as the ability to analyze
and increase sales, the ability to track revenue generated by the site, the ability
to identify exit pages and consequently improve website content, the monitoring
of visitor traffic and the detection of website errors. The most common types of
web metrics are available as options in web analytics tools, as presented in Table 1.
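The exclusion of search engine robots mentioned above can be illustrated with a small Python sketch. It assumes that each recorded hit carries a user-agent string and that crawlers can be recognized by a few common substrings; the marker list, the `is_robot` and `human_hits` helpers, and the sample hits are hypothetical and far simpler than what a production tool would use.

```python
# Substrings that commonly appear in crawler user-agent strings.
# This short list is illustrative, not exhaustive.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent):
    """Return True when the user-agent string looks like a crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_hits(hits):
    """Keep only hits that do not appear to come from search engine robots."""
    return [hit for hit in hits if not is_robot(hit["user_agent"])]


# Hypothetical sample of recorded hits.
hits = [
    {"page": "/", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0"},
    {"page": "/", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"page": "/pricing", "user_agent": "Mozilla/5.0 (compatible; bingbot/2.0)"},
]
filtered = human_hits(hits)
print(len(filtered), "of", len(hits), "hits kept")
```

Substring matching is a deliberately crude heuristic here; it only shows where such a filter sits between data collection and metric calculation.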
The empirical section of the paper presents the results of a one-month survey
(March 2015) conducted among employees from 200 Croatian IT and
marketing firms. Employees were asked to assess their satisfaction with the web
analytics tools used by their company and with the associated business model.
The questionnaire was created using the Google Form option on
Google Drive and was sent out via e-mail. The return rate was almost 54%,
i.e., 107 questionnaires were completed.
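As a quick check of the reported return rate, the arithmetic can be reproduced in a few lines of Python. The sketch assumes that one questionnaire was sent per firm, so that 200 is the denominator; the text does not state this explicitly.

```python
# Assumption: one questionnaire per contacted firm.
questionnaires_sent = 200
questionnaires_completed = 107

return_rate = 100 * questionnaires_completed / questionnaires_sent
print(f"Return rate: {return_rate:.1f}%")
```

Under that assumption the rate is 53.5%, which agrees with the "almost 54%" reported.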
The methodology analyzed the use of web analytics tools and user
satisfaction, and included descriptive statistics, Friedman test, bivariate
correlation and multiple regression. The SPSS Statistics 17.0
software package was used for statistical processing and calculations.
Descriptive analyses revealed that the majority of survey respondents
were male (58.9%), in the age group of 21 to 30 years (73%), university
educated (85%), and users of web metrics for business purposes (67%).
Web analytics tools are used for different purposes by the respondents:
marketing (75.7%), management (36.4%), web development (13.1%) and other
fields (4.7%). The most frequently used web analytics tool was Google Analytics
(93.5%), with the other tools (6.5%) being Webtrends Analytics, FireStats,
Webalizer, Tableau, Flurry and ARIS Connect.
When comparing the frequency of using web analytics tools for activities
such as measuring, collecting, analyzing and reporting data, it was observed
that web tools are mostly used for collecting (mean = 3.43) and analyzing data
(mean = 3.35). For each activity, the users' satisfaction with how well the
web tool supports that activity was also analyzed. The
respective mean values are presented in Table 3.
Activity     Frequency   Satisfaction
Measuring    3.07        3.64
Collecting   3.43        3.75
Analyzing    3.35        3.73
Reporting    2.96        3.52

Table 3: Mean values of frequency and satisfaction with web analytics tools according
to activities (N=107)
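The ranking of activities implied by Table 3 can be read off programmatically. The following Python sketch simply re-enters the table's mean values; the variable names are illustrative.

```python
# Mean values from Table 3: activity -> (frequency, satisfaction).
table3 = {
    "Measuring": (3.07, 3.64),
    "Collecting": (3.43, 3.75),
    "Analyzing": (3.35, 3.73),
    "Reporting": (2.96, 3.52),
}

# Activities ordered from most to least frequently performed.
ranked_by_frequency = sorted(table3, key=lambda a: table3[a][0], reverse=True)

# Activity with the highest mean satisfaction.
most_satisfying = max(table3, key=lambda a: table3[a][1])

print("By frequency:", ranked_by_frequency)
print("Highest satisfaction:", most_satisfying)
```

Collecting and analyzing lead on both measures, while reporting ranks last on both.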
In order to explore which activities supported by a given web
analytics tool contribute most to the level of satisfaction with the business
model integration ability, multiple regression is applied. It was found that
collecting and reporting activities significantly contribute to the
level of satisfaction with a business model. Table 6 indicates that the users are
significantly satisfied with the integration of web analytics tools in business
models during data collection activities (p=0.035) and the reporting of data
Coefficients (a)

                           Unstandardized         Standardized
                           Coefficients           Coefficients
Model                      B        Std. Error    Beta         t        p
(Constant)                 1.821    0.282                      6.447    0.000
Satisfaction_measuring    -0.009    0.079        -0.013       -0.109    0.913
Satisfaction_collecting    0.188    0.088         0.258        2.133    0.035
Satisfaction_analyzing     0.076    0.083         0.107        0.913    0.363
Satisfaction_reporting     0.143    0.077         0.220        1.864    0.065

a. Dependent Variable: Satisfaction of web analytical tools integration/models

Table 6: Multiple regression, the business model using web analytics tools (N=107)
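In a regression table of this kind, each t statistic is the unstandardized coefficient B divided by its standard error. The Python sketch below recomputes t from the rounded values printed in Table 6 as a consistency check; because the inputs are rounded to three decimals, small deviations from the reported t values are expected, and the tolerance of 0.02 is an arbitrary choice made for this illustration.

```python
# Rows of Table 6: name -> (B, standard error, reported t).
coefficients = {
    "(Constant)": (1.821, 0.282, 6.447),
    "Satisfaction_measuring": (-0.009, 0.079, -0.109),
    "Satisfaction_collecting": (0.188, 0.088, 2.133),
    "Satisfaction_analyzing": (0.076, 0.083, 0.913),
    "Satisfaction_reporting": (0.143, 0.077, 1.864),
}

TOLERANCE = 0.02  # allowance for rounding in the published values

recomputed = {}
for name, (b, std_error, t_reported) in coefficients.items():
    t = b / std_error  # t statistic = coefficient / standard error
    recomputed[name] = t
    status = "ok" if abs(t - t_reported) <= TOLERANCE else "check"
    print(f"{name:26s} t = {t:7.3f} (reported {t_reported:7.3f}) {status}")
```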
The empirical results based on the analysis of data collected from 107 survey
respondents indicate that web analytics tools are well accepted and applied in
IT and marketing companies. A descriptive statistic output revealed that
acceptance of web analytics tools conforms to any other acceptance of
technological innovation. In this survey, younger users dominate the age group
of users, possibly implying that the use of web analytic tools is more popular
among the younger generation [2]. The use of web analytics tools in business
fields, as expected, is most frequent in the marketing industry.
Specifically, a comparison of results points out that web analytics tools are
mostly used for data collection and analysis (Table 3), indicating the highest
observed correlation corresponds to satisfaction with measuring and analyzing
activities (Table 4), and evidently there is some room for improvement in
functionalities regarding user satisfaction in collecting data. The
recommendation is that software companies developing tools should place
additional focus on this particular set of functionalities. The survey has shown
that data collection is the most frequent tool activity and any improvement or
enhancement in this activity could have a significant impact on user
Moreover, according to the obtained results, web analytics tools are most
frequently used in advertising and commercial models where users also expressed
satisfaction. Additional results indicated that the highest correlation between
the usage frequency of web analytics tools and the satisfaction with such tools is
observed in the brokerage model and in the advertising model. The expectation
was that more frequent usage of web analytics tools in the advertising model
implies a higher user satisfaction. This research has shown that use in the
brokerage model is low, and the correlation between usage frequency and usage
satisfaction of web analytics tools is the highest. It implies that web analytics
tools in the brokerage model are not recognized enough and promotion
campaigns for additional use in the model are required. The results providing a
correlation between frequency and satisfaction for the mentioned activities
(Table 4) and a correlation between frequency and satisfaction in the business
models (Table 5) are as expected.
Finally, emphasis should be placed on the fact that user satisfaction with
web analytics tools used for the collection of data and reporting of activities
improves significantly when such tools are integrated in the business models.
The conclusion that the authors stress, based on the outcome of theoretical
research, is that web analytics tools and the associated web metrics as statistical
indicators of website activity can potentially improve user satisfaction if a
business model’s website. Regardless of the fact that some of the analyzed web
analytics tools and the associated web metrics are freely available, whereas
others are not, it is indisputable that each of the tools can be integrated into
the respective business models. Successful implementation of web analytics tools
requires proper selection, given that each website is unique and determined by
the nature of the related business model and its supporting technologies. Thus,
focusing on the proper tools and metrics forms the basis for strengthening
management support, which may in turn imply better business results.
The empirical research results based on the perception of web analytics
tools indicate that web analytics tools support user satisfaction with web-based
business models, although further modifications are evidently needed at the
technical and organizational level.
Future studies in this field should include an assessment of web analytics
tools based on clickstream data for web analytics tools featured to track
performance on social networks or mobile devices, as well as tools for collecting
feedback from visitors, and for conducting different testing and experiments.
[1] Burby, J., Brown, A. and WAA Standards Committee (2007). Web Analytics
Definitions – Version 4.0, Web Analytics Association.
[2] Chung, J. E., Park, N., Wang, H., Fulk, J. and McLaughlin, M. (2010). Age
differences in perceptions of online community participation among non-users: An
extension of the Technology Acceptance Model. Computers in Human Behavior, 26,
6, 1674–1684. doi:10.1016/j.chb.2010.06.016.
[3] Clifton, B. (2010). Advanced Web Metrics with Google Analytics (2nd ed.). Indiana:
Wiley Publishing, Inc.
[4] Creese, G. and Veytsel, A. (2000). Web Analytics: Translating Clicks into Business.
Boston: The Aberdeen Group, Inc.
[5] Gupta, R., Mehta, K., Bhavsar, K. and Joshi, H. (2013). Mobile web analytics.
International Journal of Advanced Research in Computer Science and Electronics
Engineering (IJARCSEE) 2, 3, 288–292.
[6] Kaushik, A. (2007). Web Analytics: An Hour a Day. Indiana: Wiley Publishing, Inc.
[7] Kaushik, A. (2009). Web Analytics 2.0: The Art of Online Accountability and
Science of Customer Centricity. Indiana: Wiley Publishing, Inc.
[8] Omidvar, M. A., Mirabi, V. R. and Shokry, N. (2011). Analyzing the impact of
visitors on page views with Google Analytics. International Journal of Web &
Semantic Technology (IJWesT), 2, 1, 14–32. doi:10.5121/ijwest.2011.2102.
[9] Peterson, E.T. (2004). Web Analytics Demystified: A Marketer's Guide to
Understanding How Your Web Site Affects Your Business. Celilo Group Media and
[10] Rappa, M. (2010). Business models on the web. [Accessed on 12 March 2014].
[11] Teixeira, J. (2011). Get Involved: 5 Types of Web Analytics tools to start using
of-web-analytics-tools-to-start-using-today.html [Accessed on 03 June 2014].
[12] Waisberg, D. and Kaushik, A. (2009). Web Analytics 2.0: Empowering Customer
Centricity, 2, No. 2.
[13] Web Analytics Association (2008). Web analytics definitions – draft for public
sDefinitions.pdf [Accessed on 10 September 2014].
[14] Zara, I. A., Velicu, B. C., Munthiu, M. C. and Tuta, M. (2012). Using analytics for
understanding the consumer online. [Accessed on 10
September 2014].
With the introduction of Web Analytics into Web Marketing, organizations now have the opportunity to measure, track, and
analyze the behavior of website users. The REAN model, standing for Reach, Engage, Activate and Nurture, appears to be
the most relevant model to plan and measure activities. This model is used to set goals, objectives and define metrics in order
to improve a website’s performance using Web analytics. Based on academic papers, official sources, white papers, and best
practices, the main research objective of this paper is to establish a list of optimization actions to be implemented, and to test
if these actions have a positive impact on website performance. Preliminary findings from this research-in-progress paper
may assist managers on: 1) how to attract new visitors to expand website traffic, 2) how to transition visitors to users with an
increase in registrations, and 3) how to build a loyal audience with repeat website visitors.
REAN model, Web Analytics, website performance, optimization actions.
In January 2012, there were more than 500 million websites worldwide (Netcraft, 2012). The abundance of websites made
the Internet a highly competitive environment, which mandated the need for websites to constantly improve performance in
order to be the best among its competitors. Visitors are key for websites. Therefore, improving website performance involves
being able to follow and measure visitor/user behavior. There are several measurement tools that provide large database
reporting on user behavior.
With the emergence of Web analytics tools early in 2000, data analysis became a strategic element for website optimization.
The availability of measures for quantifying website visibility—such as user traffic and behaviors—has resulted in growing
strategic importance of these measures for companies. However, having a manager monitor audience behaviors and manage
measurement tools is not enough as most websites also use a Web analytic tool. Indeed, it is hard to determine what actions
to undertake in order to have a positive impact on traffic analytic indicators. Besides, Web analytics tools are efficient for
assessment and optimization of a website. A Web manager’s time should therefore be used to respond to new items requiring
in-depth, critical thinking to inform strategic decision-making and problem solving. In addition to following Key
Performance Indicators (KPI), it is necessary to master the different levers of website performance and, more specifically, the
actions that can lead to website optimization. Data collected by Web analytics tools allow advertisers to not only rely on the
feeling that this particular lever works properly but also drive business strategy based on concrete data while optimizing
operations in real time.
This research aims to provide an overview of website performance based on pre-selected criteria as well as show if defined
optimization actions have a positive impact on performance. As there are endless ways to define a website’s performance, the
main objective is to determine a model showing the main topics that illustrate performance. Additionally, there are infinite
ways to improve a website. The second objective is therefore to select optimization actions from different sources according
to the defined objectives and to test each of these actions and verify if indeed there is a positive impact on said performance.
Proceedings of the Nineteenth Americas Conference on Information Systems, Chicago, Illinois, August 15-17, 2013. 1
Coursaris et al. Website Performance Optimization via Web Analytics
Finally, it is necessary to select relevant metrics to measure performance in order to be able to monitor the performance
evolution according to different actions.
Hence, the research has two main objectives: 1) Establish a list of optimization actions to be implemented based on extant
academic papers, official sources, white papers, and best practices; and 2) Test if these actions have a positive impact on
website performance. The main query of this research is how to drive a website so as to improve its performance. As it is a
very broad issue, it is mandatory to factor performance aspects of a website and to define the appropriate assessment criteria.
Based on our research objectives, here are the sub questions that we want to answer which form the guidelines for the study:
o How does a website attract new visitors, thereby increasing site traffic?
o How are visitors transformed into users by generating registrations?
o How does a website establish a loyal audience base and incite users to return?
Measuring a website’s performance is complex because there is a multitude of criteria to define performance. Fortunately, the
literature on this issue is plentiful and broad and it is therefore necessary to choose an appropriate model covering relevant
criteria which reflect website performance measurement. At first sight, the Awareness, Interest, Desire and Action (AIDA)
model seemed to be relevant because it offers a general understanding of the effectiveness of communication endeavors with
respect to advertising (Lewis, 1898; Glowa, 2002). This model could be applied to illustrate website performances in the
sense that it implies a website must generate awareness, interest, desire, and action (Ber and Jouffroy, 2012). However, given
the inherent subjective nature of the desire construct, quantitative operationalization and measurement through Web analytics
is infeasible (Jackson, 2009).
Most studies report using the ACT model (Kabani, 2010), a three-pillar model based on Attraction, Conversion and
Transformation. However, the ACT model only accounts for the initial attraction of users, but ignores the equally important
activity of user retention. Therefore, the REAN model appears to be the most relevant and complete to cover the key research
questions (Blanc, Kokko, 2006). Indeed, it takes into account four essential goals—reach, acquire, convert and retain—that
define a successful website (Kermorgant, 2008).
The REAN model is a powerful framework that gives a clear overview of a website’s performance structure and helps to
define a measurement strategy (Jackson, 2009). The model can be used to define and plan online activities for optimization in
order to measure Return On Investment (ROI) (Shannak and Qasrawi, 2011).
“Reach sources the methods you use to attract people to your offer. It also includes how you raise awareness among
your target audience” (Jackson, 2009).
Thus, the aim is to generate traffic to the website. According to Visser and Weideman (2010), traffic is composed of four
types:
• Direct Traffic: “when a visitor visits the website directly (by typing in the URL directly into the browser or by
means of bookmarks and/or favorites),”
• Referral Traffic: “when a visitor visits the website via a link from another website, also without making use of a
search engine,”
• Search Traffic (Organic): traffic generated by “unpaid search result listings,”
• Search Traffic (Paid): traffic generated by “paid search result listings” (Visser and Weideman, 2010).
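The four traffic types above can be illustrated with a minimal sketch that labels a single visit from its referrer. The function name, the list of search-engine hosts, and the rule that paid clicks carry a `gclid` or `utm_medium=cpc` parameter are assumptions made for illustration; they are not part of Visser and Weideman's definitions.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of search-engine host markers; extend as needed.
SEARCH_HOSTS = ("google.", "bing.", "yahoo.", "duckduckgo.")


def classify_traffic(referrer: str, landing_url: str) -> str:
    """Return 'direct', 'referral', 'organic' or 'paid' for one visit."""
    if not referrer:
        # Typed URL or bookmark: the browser sends no referrer.
        return "direct"
    host = urlparse(referrer).netloc.lower()
    from_search = any(marker in host for marker in SEARCH_HOSTS)
    if not from_search:
        return "referral"
    # Assumption: paid search clicks carry 'gclid' or 'utm_medium=cpc'.
    params = parse_qs(urlparse(landing_url).query)
    is_paid = "gclid" in params or params.get("utm_medium") == ["cpc"]
    return "paid" if is_paid else "organic"


print(classify_traffic("https://www.google.fr/search?q=x", "https://example.com/"))
```

In practice an analytics tool performs this classification itself; the sketch only makes the distinction between the four categories explicit.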
High ranking on a search engine’s results page increases website traffic (Oneupweb, 2005). Moreover, 91.8% of search
queries in France (the context of our case study) are made from Google (Médiametrie, 2012). Finally, 75% of users do not
look beyond the first page of the search engine results (Jenkins, 2011). This is why it is necessary to optimize a website’s
presence on search engines.
To do this, there are Search Engine Optimization (SEO) actions that are essential to implement and likely to dramatically
increase the number of page visits (Berger, 2011). Such SEO actions are summarized in Table 1.
• In-Page optimization
Web pages must be optimized at the level of their HTML source. Several in-page criteria must be respected (King, 2008). Google
provides a starter guide for SEO, which includes several actions to be implemented within website source code, as
summarized in Table 1.
• Off-Page optimization
The very first search engines (e.g., AltaVista) operated solely on the basis of in-page criteria. Then Google arrived and
started to use relevance criteria based on context, environment, or popularity (Andrieu, 2012). Google measures page
popularity with PageRank (PR), which is essentially based on the number of pages linking to the website (Brin and Page,
1998). Net linking, the creation of new links pointing to a website, is the most efficient strategy to promote it and therefore
improve its PR (Prat, 2011).
Name | Action to implement
Description meta tag | Use a 150-character description within the meta description summarizing the page's content.
Improve URL structure | Use simple-to-understand URLs in order to enhance Google's spider crawling.
Use a sitemap | Make a sitemap consisting of a hierarchical listing of the pages of the website.
Use <hn> tags | Present in <hn> tags the structure of the page.
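Two of the in-page actions listed above (the description meta tag and the <hn> tags) can be checked automatically. The following is a minimal sketch using Python's standard `html.parser` module; the class name `SeoAudit`, the function `audit`, and the sample page are hypothetical, and the 150-character limit is taken from the table.

```python
from html.parser import HTMLParser


class SeoAudit(HTMLParser):
    """Collect the meta description and the heading tags of one page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.headings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.headings.append(tag)


def audit(html: str, max_len: int = 150) -> dict:
    """Report whether a description exists, fits max_len, and list headings."""
    parser = SeoAudit()
    parser.feed(html)
    desc = parser.description
    return {
        "has_description": desc is not None,
        "description_ok": desc is not None and 0 < len(desc) <= max_len,
        "headings": parser.headings,
    }


# Hypothetical page used only to exercise the check.
page = (
    '<html><head><meta name="description" content="Short summary."></head>'
    "<body><h1>Title</h1><h2>Section</h2></body></html>"
)
print(audit(page))
```

URL structure and sitemap coverage are not covered by this sketch, since they depend on the site as a whole rather than on a single page.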
“Engage is how people interact with your business. Engage is essentially the process before a point of action that
helps your prospect come to decisions” (Jackson, 2009).
However, as mentioned above, user engagement is not studied in this research and will not be discussed further.
“Activate means a person has taken a preferred point of action. Typical examples include a person purchasing a
product, a newsletter subscription or a sign-up” (Jackson, 2009).
“Conversion Rate (CR) is the art and science of persuading your site visitors to take actions that benefit you” (King, 2008,
p. 111). As defined in the research questions, one of the strategic objectives of a website is to increase registrations.
To this end, it is essential to understand the impact a registration form can have on a website. Eighty-six percent of Internet
users are likely to leave a website when they are asked to sign in (Rolka, 2012). According to Rolka (2012), 42% of users
think that this process is too long. Therefore, it is essential to simplify the process to enhance conversions. Making
registration processes quick and easy, with intuitive navigation and a minimum number of clicks, will decrease the
abandonment rate and therefore increase the CR (Dodson and Davis, 2011).
In addition, in order to increase the CR, it is essential to provide reasons to register by clearly defining and communicating
why the visitor should register on the website with benefit-oriented headlines (Page, Ash, and Ginty, 2012). Table 2 shows
actions to transform visitors into users.
“Nurture describes the method of retaining and re-engaging with activated consumers. The consumer is a person
who has already taken at least one preferred point of action” (Jackson, 2009).
Nurture can also be defined as the capacity of the website to make users return (Kermorgant, 2008). In other words, it is
necessary to ensure that visitors have a reason to come back, thus building visitor (customer) loyalty. There are
very few actionable resources that deal with the concept of website e-loyalty. However, in order to better understand the
concept of loyalty, it is relevant to review the best practices of Social Networking Sites (SNSs). As SNSs are highly addictive
(Kuss and Griffiths, 2011), they can provide insights into features that incite users to return. Thus, it is relevant to
review the key success factors of some SNSs.
First, the Logged-In Landing Page (LP) has to be user-centric. This means that the LP needs to have a personalized
dashboard, which contains the main features available to that user (Fanelli, 2010). The “who's visited your profile?” feature is
popular (Glad, 2011): the query "who's viewed my Facebook profile" in Google generates 497,000,000 results. To
illustrate this feature's popularity, the professional SNS LinkedIn provides it, and Viadeo even monetizes it.
“News Feed highlights what's happening in your social circles” (Sanghvi, 2006). The news feed feature is one of the key
success factors of SNSs and is a reason why users become so loyal, i.e. why they come back (Yu, Hsu, Yu and Hsu, 2012).
Table 3 shows these nurture options.
Web Analytics
Web analytics is a fairly recent domain whose implications and value-added benefits are still being discovered. The Web Analytics
Association defines it as: “the objective tracking, collection, measurement, reporting and analysis of quantitative Internet data
to optimize websites and marketing initiatives” (Burby and Brown, 2007).
Three factors have contributed to the emergence of this discipline within companies of all sizes and from all sectors (Arson,
2012): 1) the possibility to measure a greater part of actions performed by website users, 2) the increasing contribution of
online activities in earnings, and 3) the growing availability of Web analytics tools and options.
The literature on this subject is therefore plentiful. Web analytics belongs to the family of Web marketing and is its
cornerstone: without analytics, it is impossible to measure the Return On Investment (ROI) of Web marketing or any other
actions (e.g. an e-mail campaign). Web analytics allows the analysis of quantitative and qualitative data of a website and its
competitors in order to continuously improve its users’ experience (Chardonneau, 2011). There are many Web
analytics solutions on the market that can measure different variables (e.g. Omniture, Urchin, Google Analytics, and many others).
The free tool Google Analytics was already implemented within our case study’s website (explained further
below), meaning that data had been collected over a period of two years; this was the primary reason for selecting
this tool in this study. Google Analytics is a quantitative analytics tool that measures the volume of clicks, shows
where visitors come from, and informs web administrators about users’ behaviors. Google Analytics provides several metrics
that can be categorized according to the Digital Analytics Association (DAA) as follows:
- “Count: the most basic unit of measure; a single number, not a ratio”
- “Ratio: typically a count divided by a count, although a ratio can use either a count or a ratio in the numerator or
denominator”
- “KPI (Key Performance Indicator): while a KPI can be either a count or a ratio, it is frequently a ratio” (Burby and Brown,
2007).
We propose a conceptual framework (see Figure 2) which encapsulates the various optimization actions according to the
REAN Model, and can be used to develop our hypotheses below:
• H1: Improving URL structure, implementing a sitemap, using <hn> tags, operating net linking, and improving
description meta tags will have a positive impact on traffic.
• H2: Enhancing the conversion funnel by implementing a call-to-action feature will have a positive impact on the
number of registered members.
• H3: Changing the sign-in page by implementing news feeds and the “who's visited my profile?” widget will have a
positive impact on user loyalty, resulting in repeat visits.
It should be noted that this study will only focus on the first of the three hypotheses, although all three were presented for the
sake of providing a comprehensive consideration of the REAN model in action.
As the proposed study will be conducted within the context of a specific website (i.e., through a case study), it is necessary
to determine a clear plan of action in line with managerial objectives. The action plan includes four main phases. The first
phase is preparation: determining the objectives and then choosing the Key Performance
Indicators (KPIs). The second phase is essentially a literature review: given the objectives,
optimization actions are drawn from various studies in order to address the managerial problem (or
objectives). In the third phase, the proposed actions are implemented. Finally, the implemented actions are evaluated;
this step defines how and from where data are extracted.
Preparation Phase
Define the purpose of the website.
For any business, it is key to define the website’s purpose, and how this site will contribute to the success of the
The strategic objectives are explicitly tied to the website’s purpose; however, they need to be measurable. Hence, the
following three main strategic objectives were defined:
1. Increase site traffic
2. Increase the number of user registrations
3. Increase site loyalty
Setting operational objectives can pave the way to the achievement of the aforementioned strategic objectives of a
website. Operational objectives are distinct from strategic objectives in that they are more closely connected to an action.
Hence, if an operational objective does not reach its target, it is easier to determine what action plans to subsequently
implement. We propose the following list of key operational objectives:
• Increase Search traffic (Organic)
• Increase Conversion Rate (CR)
• Increase Click-Through Rate (CTR)
• Increase visit frequency
• Increase direct access
With the various types of objectives having been defined, it is now necessary to translate them into metrics.
Metric | Definition
Search Traffic | Search engine traffic refers to the volume of visitors who arrive at a website by clicking search results leading to that particular website.
Conversion Rate | The number of times a visitor completes a target action divided by the number of times that link was viewed.
Click-Through Rate | The number of click-throughs for a specific link divided by the number of times that page was viewed.
Visits per month | A visit is an interaction, by an individual, with a website consisting of one or more requests for an analyst-definable unit of content (e.g. “page view”). If an individual has not taken another action (typically additional page views) on the site within a specified time period, the visit session will terminate.
Direct Traffic | Visitors who visited the site by typing the URL directly into their browser.
Unique Visitors | The number of inferred individual people (filtered for spiders and robots), within a designated reporting timeframe, with activity consisting of one or more visits to a site. Each individual is counted only once in the unique visitor measure for the reporting period.
Registrations | The number of users who have completed the registration process.
Returning Visitors | The number of Unique Visitors with activity consisting of a Visit to a site during a reporting period and where the Unique Visitor also Visited the site prior to the reporting period.
Source: DAA, Web Analytics Definitions (2007)
Table 4. List of Key Metrics for Web Analytics
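The ratio metrics in Table 4 are simple quotients of counts. A minimal sketch follows; the helper `ratio` and all of the counts are hypothetical values chosen for illustration, not data from the case study.

```python
def ratio(numerator: int, denominator: int) -> float:
    """A 'ratio' metric: a count divided by a count (0.0 if the denominator is 0)."""
    return numerator / denominator if denominator else 0.0


# Hypothetical monthly counts, for illustration only.
counts = {
    "link_views": 2000,
    "link_clicks": 150,
    "target_actions": 30,
    "unique_visitors": 1200,
    "returning_visitors": 300,
}

# Click-Through Rate: click-throughs divided by views.
ctr = ratio(counts["link_clicks"], counts["link_views"])
# Conversion Rate: completed target actions divided by views.
conversion_rate = ratio(counts["target_actions"], counts["link_views"])
# Share of unique visitors who had also visited before the period.
returning_share = ratio(counts["returning_visitors"], counts["unique_visitors"])

print(f"CTR {ctr:.1%}, CR {conversion_rate:.1%}, returning {returning_share:.1%}")
```

Guarding against a zero denominator matters in practice, because a new page or campaign may have no views in a given reporting period.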
Implementation Plan
Once the previous work is done, it remains to convince managers of both the accuracy of the findings and the validity of the
recommended actions. The ability to convey a message properly is essential for any web practitioner.
Visual data analysis and presentation skills are therefore key. It is necessary to be able to “give a picture to information
and ideas” (McCandless, 2011). Visuals make it easy to highlight findings, lessons, and actions to be
taken from the considerable amount of data provided. Well-designed graphics highlight the facts and reveal opportunities.
Indeed, the presentation of results should lead to decisions and actions. The audience must be convinced by the
demonstration based on facts and figures and leave the presentation with the idea of implementing the recommendations
Data Collection Plan
Each Google Analytics report can be exported in CSV (Comma-Separated Values) format, opened in a tool such as Excel, and
reviewed for insights. At the time of writing, results are not yet available for presentation.
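As a rough illustration of working with such an export outside Excel, the sketch below reads a small CSV with Python's standard `csv` module and computes each source's share of visits. The column names `Source` and `Visits` and the sample rows are assumptions; a real Google Analytics export may use different headers and include extra comment lines that would need to be skipped first.

```python
import csv
import io

# Hypothetical export content; real exports may differ in layout.
SAMPLE_EXPORT = """Source,Visits
google,120
(direct),45
example.org,15
"""


def visits_by_source(csv_text: str) -> dict:
    """Map each traffic source to its visit count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Source"]: int(row["Visits"]) for row in reader}


totals = visits_by_source(SAMPLE_EXPORT)
all_visits = sum(totals.values())
share = {source: n / all_visits for source, n in totals.items()}

for source, fraction in share.items():
    print(f"{source}: {fraction:.1%}")
```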
10. Fanelli, M. (2010) Guide pratique des réseaux sociaux - Twitter, Facebook... des outils pour communiquer, Dunod, Paris.
11. Fétique R., (2010), Internet Marketing, Elenbi Editeur, Paris.
12. Glad V., (2011), Peut-on savoir qui visite son profil Facebook?, Slate,
13. Glowa, T. (2002) Advertising Process Models, White Paper, P8-10.
14. Jackson, S. (2009) Cult of Analytics: Driving online marketing strategies using web analytics, Butterworth-Heinemann.
Oxford, UK.
15. Jenkins K., (2011) Overview of Search Marketing: SEO & SEM, Sanger & Eby Design,
16. Kabani S., (2010), The Zen of Social Media Marketing, BenBella Books, Inc. Dallas, USA.
17. Kermorgant V., (2008), Evaluating your on-line success with web analytics, White Paper, P3-5.
18. King A., (2008), Website Optimization, O'Reilly Media Inc. California, USA.
19. Kuss D., Griffiths M., (2011) "Online Social Networking an addiction - a review of the psychological literature"
International Journal of Environmental Research and Public Health, 8 (9), 3529-3552.
20. Linkedin (2011), Qui a consulté votre profil?, Linkedin,
21. McCandless, D. (2011) Datavision, Robert Laffont Ed. Paris
22. Médiamétrie (2012) La fréquentation des sites internet français, Médiamétrie,
23. Netcraft (2012) March 2012 Web Server Survey, Netcraft
24. Oneupweb (2005) Target Google's Top Ten to Sell Online,
25. Page, R.; Ash, T.; and Ginty, M. (2012) Ten best practice to drive on site engagement, Janrain | Amplifying the Power of
the Social Web,
26. Prat, M. (2011) Référencement de votre site Web, Editions ENI. Paris.
27. Rolka, L. (2012) How to Solve the Online Registration Challenge, Janrain,
28. Sanghvi, R. (2006) Facebook gets a facelift, Official Facebook Blog,
29. Shannak, R. and Qasrawi, E. (2011) Using Web Analytics to Measure the Effectiveness of Online Advertising Media: A
Proposed Experiment, Eurojournal,
30. Viadeo (2010) Guide d'utilisation,
31. Visser, B. and Weideman, M. (2010) An empirical study on website usability elements and how they affect search
engine optimisation, South African Journal of Information Management, 13 (1), 1-9.
32. Yu, S.; Hsu, W.; Yu, M.; and Hsu, H. (2012) Is the use of Social Networking Sites Correlated with Internet Addiction?
Facebook Use among Taiwanese College Students. World Academy of Science, Engineering and Technology, 68, 1659-
For instance, a museum marketing manager
the course of the production’s run this means that on average Rumble Theatre sells 6,125 tickets. For Rumble Theatre to
increase total tickets sold by 15% the theatre would need to sell just over 25 additional tickets per show, or about 920
Nancy believes that by publishing Red Warrior related content on the theatre’s website and diverting audiences to its
website through its promotional strategies, audiences will engage with the storyline, intellectual and emotional themes, and
actors more deeply, increasing the likelihood that a website visitor will buy a ticket. Nancy decides to direct audiences to
the website in all marketing materials so she can use Google Analytics to monitor fluctuations in traffic and user behavior
in response to marketing campaign efforts.
Objectives: Before accessing Google Analytics, Nancy defines her objectives and measurement tools. After all, “you can’t
configure Google Analytics if you don’t know what you need” (Cutroni). She selects two broad objectives: Increase online ticket
Increase online ticket sales: Audience members can purchase tickets by visiting the theatre’s website, calling the box office,
or visiting the box office in person. Rumble Theatre’s website recently underwent a renovation, and it’s now easier than ever
before to buy a ticket online. The theatre has received positive feedback on this feature, especially from those in its current
dominant demographic. Prior to the new website design only 35% of Rumble Theatre’s total main stage performance
tickets, about 2,144 tickets, were sold
website to 50%.
Raise awareness: Based on Rumble
area, Nancy believes that she can increase the number of tickets sold in Rumble’s
choose marketing strategies that were
audiences, and combine them with strategies that reach a larger number of people within her target audience. Not only
will she monitor the number of tickets sold through the website but she will also expect to see an increase in the amount of traffic
frequency, time frame, and digital
opening night and continuing through its final show. She uses this calendar to
increase awareness vs ticket sales, as well
Nancy will monitor Rumble Theatre’s website activity as each new marketing campaign tactic is deployed and each new
alteration to the website is made. Isolating these attributes will help her in assessing the effect of her marketing strategies in
Theatre website.
Nancy believes her marketing strategies will attract an additional 25 people per show. If she is successful, Rumble Theatre’s
total ticket sales will be 7,000 tickets, a 14% increase in ticket sales. Referring back to her first objective, if Rumble
Theatre sold 7,000 tickets Nancy hopes to sell 3,500 of those tickets through the website.
For instance, if Google Analytics displayed social referral spikes similar to those seen in exhibit 2, Nancy would be able to
refer to her editorial calendar to quickly identify which marketing tactics were active when the spikes occurred. Using this
knowledge, Nancy could dive into Google Analytics targeting relevant reports, pages, and
Exhibit 2: Sample social media referral timeline displaying a surge in activity occurring during Red
Warrior’s production.
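The ticket targets quoted in the objectives can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures given in the text (6,125 baseline tickets, a 7,000 ticket target, 25 extra tickets per show, 35% and 50% online shares). The number of shows is not stated in the text; the value below is inferred from those figures and should be read as an assumption.

```python
baseline_tickets = 6125      # average tickets sold per production run
target_tickets = 7000        # Nancy's target for Red Warrior
extra_per_show = 25          # additional people she expects per show

extra_tickets = target_tickets - baseline_tickets      # 875
growth = extra_tickets / baseline_tickets              # about 0.143, the "14% increase"

# Assumption: the run length is inferred, not stated in the text.
implied_shows = extra_tickets / extra_per_show         # 35.0

online_before = round(0.35 * baseline_tickets)         # 2144 tickets sold online
online_target = int(0.50 * target_tickets)             # 3500 tickets sold online

# The 15% scenario: about 920 extra tickets, just over 25 per show.
fifteen_pct_extra = 0.15 * baseline_tickets            # 918.75
per_show_for_15 = fifteen_pct_extra / implied_shows    # 26.25

print(f"growth {growth:.1%}, implied shows {implied_shows:.0f}")
```

Under the inferred run length, the 15% scenario and the 25-per-show scenario are consistent with each other, which supports reading the run as roughly 35 performances.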
In order for Nancy to understand how her site and strategies are performing, she needs some
stage show, It’s All Right With Me. Even if Nancy didn’t have a past show to use as a measurement tool, she still could find an
alternative benchmarking option. For instance, Nancy could compare Rumble’s website usage against that of a peer
organization, against a time period on Rumble Theatre’s website without an active performance, or against an
metrics, for instance a 22% bounce rate, would be impossible. Nancy decides to compare the
by a manager’s best guess or by previous experience, is useful because it advances a manager’s analysis: managers can use the
results from that comparison to inform the next round of measurement (Kanter 49).
Google Analytics commonly displays data comparisons using side-by-side bar charts, tables, or line graphs, as seen below in exhibit
Exhibit 3
Key Performance Indicators Nonprofit.’!
Now that Nancy knows where she wants
derive meaningful insight on activity relating
Audience Segments, Goals, and
measuring increased community interest may
‘Technology’ window Nancy could also views, time spent on a page, or PDF
• Traffic Sources: under ‘Source,’ Google Analytics tracks four different types of
with her objective. She does this using Google Analytics’ ‘goals’. A traditional goal differs
increased sales in terms of types of data that it can collect within the program, like page
. To learn how to construct goals in Google Analytics visit AMT Lab or Google Analytics
one of these actions is referred to as a ‘conversion’. Google Analytics is outcome-oriented, meaning that there are
measurements in place, like goals, to help users assess whether or not a desired outcome was
desired outcome. (Tonkin). After a goal is created, Nancy can then sort data in each of the reporting categories according to that
Exhibit 5: The Acquisition Overview within the Acquisition reporting tab allows Nancy to sort by goals
to see the completion rate for each.
Exhibit 6: Nancy can also filter results according to goals, juxtaposed with other metrics.
Selecting Metrics to follow
So far Nancy has programmed Google Analytics to track one audience profile and two actions. But Nancy also wants to
collect website data relating to the effects of her marketing
newspaper ad is released.
There are eight data collection categories in
Nancy explores the ‘Audience,’ ‘Acquisition,’
metrics are necessary to calculate her KPIs and
There are hundreds of metrics and dimensions
bound to be irrelevant. Nancy uses her KPIs as a starting point for identifying which metrics would best supplement her
website analysis and illustrate the activity surrounding her KPIs
In addition to evaluating which metrics are
Twitter and Facebook. Possible results:
o Visitors from social media
Total orders from a visitor segment/total orders | Increase online ticket sales | Total customers converted
Total new visitors/all visitors | Increase awareness of show | Total visitors | Acquisition > All traffic
 | | Total new visitors | New vs Returning
Exhibit 8
quantity of website visitors sharing that webpage outward back over social media
• Newspaper ad: The newspaper ad is taken out in a local weekly paper and incorporates a redirect site, so Nancy can
track the traffic stimulated by the ad. During the ad’s week on the stands, Nancy expects to see higher website traffic
to the Red Warrior show page.
o Marketers have a difficult time attributing traffic to specific advertising methods like billboards, radio spots, or
magazine or newspaper ads. Google Analytics cannot tell Nancy if the radio ad or the newspaper ad prompted a user
to visit the website or purchase a ticket without additional assistance, like the use of a redirect link, exit survey, or
discount code.
o Redirect links are websites that automatically redirect visitors to an alternative site. For instance, Nancy could refer
newspaper readers to warrior. When visited, this website would immediately redirect the user to the Rumble Theatre’s
main webpage. The advantage of this feature is that Google Analytics will chronicle the redirect website as a new
traffic source. Because Nancy only used this site on the newspaper ad, she knows that any metrics recorded as a result
of this traffic source are specifically attributed to the newspaper ad readers.
o Discount codes are unique coupon codes users can apply upon checkout to reduce the cost of their purchase. If Nancy
wanted to track ticket purchases from radio listeners, one discount code could be ‘rumbleWXPN’. To use discount
codes in Google Analytics, E-Commerce capabilities must be enabled and HTML coding written into the program.
Google Analytics allows the user to sort purchases by code usage, much like traffic sources sort visitor origins.
• Radio: Higher visitor counts to the
Exhibit 9: When Nancy outlined the activity she expects to see resulting from her campaign strategies
she did not know which specific metrics she would be looking for or which would be most relevant to
report. Nancy’s strategy was to use this exercise to understand where this type of activity is
recorded and what supplementary data is recorded as well.
• Email: Each email sent out will have the
increased traffic to both of those pages for 1-3 days after the email is sent out.
Throughout the entire campaign, Nancy will use Facebook’s, Twitter’s, and her emailing program’s analytics to complement
the website
Nancy will examine single traffic metrics like
webpages. Traffic behavior on the Rumble Theatre website will help Nancy determine if
pages associated with her goals and if visitors engage with the site’s content. Google Analytics’ ‘User Flow’ chart,
exhibit 10, nicely
Exhibit 10
Most reporting categories enable the user to
Dimensions are shared attributes, or categories in which metrics can be grouped. For instance,
were on prior to visiting the Rumble
• Medium: how a user finds the website, be it a referral (shared link), organic (keywords in a search engine), direct
(typing in the URL into their browser) or
• Source/Medium: cross filters data
Each reporting category provides different
Exhibit 11
dashboard. Because blog metrics are diverse, from traffic sources to popular pages to unique visitor information, rather than
hunting for each metric every time she wanted to check its status she saves individual graphs, charts, etc. to a dashboard.
Exhibit 14 displays a sample blog dashboard built for a professional
Dashboards help users efficiently review important metrics and dimensions and can be easily printed or shared for quick
reporting. In addition, because dashboards are traditionally created around a theme (e.g. blog activity metrics) and rely on
visual displays of data, like tables and charts, they help create data-friendly
Exhibit 14: The dashboard includes visitor information, traffic sources, audience demographic
information (location), popular page metrics, and social media information. Source:
cultures through visual storytelling.
Google Analytics for immediate use. For
Exhibit 15
Measure: Nancy measures the number of sessions created by referrals from social media between April 1 and April 17. This
information can be found in the “Acquisition” report under “All Traffic” and “Referrals”. She applies the Red Warrior
segment to her search to narrow the results to her target audience.
Analyze: As seen within the red outline in exhibit 15, it appears that Twitter ( is driving about 3 times as much traffic to
the Rumble Theatre website compared to Facebook
that Rumble’s target audience is active on Twitter and positively responding to its content. Although these reasons most
likely explain the Twitter trend, Nancy must keep other possible reasons in mind for the surge in Twitter traffic, like the
time of day the Tweet was published or the day of the week the
Action: Nancy first decides to test this trend by tweeting more frequently, while maintaining the same posting schedule for
Facebook. If Twitter continues to drive an increased volume of visitors to the website, Nancy will decide to maintain the more
frequent posting schedule because it is successfully engaging with her
decline in traffic referrals Nancy’s conclusion could be that once past a certain tweeting frequency, Rumble’s Twitter
activity elicits little response out of the target audience or
To analyze when her target audience is active on Twitter, and thus more likely to follow a link embedded in a tweet, Nancy
conducts another test. She selects two timeframes and issues similar tweets in each. Still using the Acquisition report, Nancy
can break down traffic referrals by hour to identify trends, as
Exhibit 16. Source:
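The hourly breakdown described above can be reproduced on exported referral data. The sketch below groups referral sessions by hour of day for one source and reports the busiest hour. The record layout, the timestamps, and the function name `sessions_by_hour` are hypothetical; they stand in for whatever Nancy would export from the Acquisition report.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, source) referral records, for illustration only.
referrals = [
    ("2015-04-14 09:05", "twitter"),
    ("2015-04-14 09:40", "twitter"),
    ("2015-04-14 13:15", "facebook"),
    ("2015-04-14 18:20", "twitter"),
    ("2015-04-14 18:45", "twitter"),
    ("2015-04-14 18:50", "twitter"),
]


def sessions_by_hour(records, source):
    """Count referral sessions from one source per hour of day."""
    hours = Counter()
    for stamp, src in records:
        if src == source:
            hours[datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour] += 1
    return hours


twitter_hours = sessions_by_hour(referrals, "twitter")
peak_hour, peak_sessions = twitter_hours.most_common(1)[0]
print(f"Twitter peak: {peak_sessions} sessions at {peak_hour}:00")
```

With only two test timeframes the comparison is indicative rather than conclusive, which matches the cautious reading of the Twitter trend in the text.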
By the first week of May, Nancy has published Red Warrior promotional content on Facebook, Twitter, and the website, sent two
email blasts, and released the newspaper ad. A change between It’s All Right with Me’s marketing campaign and Red Warrior’s
campaign is the addition of a second pre-opening email blast and the newspaper ad. Nancy wants to know if there are more
ticket sales leading up to the opening of Red Warrior compared to the same
Right with Me.
Nancy uses her “Warrior tkts” goal, which tracks how many visitors arrive at the “Thank-you for your purchase” online ticket
completion page, to track and compare ticket sales data in Google Analytics. She visits the “Goals Overview” report in
“Conversions.” Within the “Overview” report, under the “Overview” tab, she selects the two goals she
Exhibit 17
Measure: She enters in two time frames, April 1-June 28 (Red Warrior) and Jan 1-March 27 (It’s All Right with Me), and
compares her “Warrior Tkts” ticket sales goal and her previously existing “AllRight Tkts” ticket sales
Analyze: The second email blast and newspaper ad were released on May 8th, a week before opening. Nancy sees that pre-show
ticket sales for It’s All Right With Me outpaced those of Red Warrior until May 11th, when ticket
remain higher than that of It’s All Right With
As mentioned earlier, there were two distinguishing features between the Red Warrior campaign and the It’s All Right With Me
campaign: the second email blast and newspaper ad. Nancy used these channels to raise both awareness and ticket sales.
Consequently, Nancy will analyze the amount of traffic referrals and the amount of tickets
Email analyses: Nancy, with the assistance of a few how-to websites, like Constant Contact, Campaign Monitor, and Web Market
Central, and her IT Director, enabled Google Analytics to track visitor traffic originating from her
Nancy goes back to the ‘All Traffic-Channels’
traffic by the ‘Medium’ primary dimension, ‘Email’ appears as a referral method. She double checks her timeline to make sure
her two pre-show time periods are still in place. Under the ‘Explorer’ tab she selects ‘Goal Set 1’ then, using the below
drop down menu,
by side. She sees that not only has the number of sessions increased but so has the rate of online tickets sold compared to
It’s All Right
Exhibit 18
Newspaper ad analyses: Nancy chose to incorporate the use of a redirect website,
she could clearly delineate traffic caused by the ad. Since a newspaper ad wasn’t used in the It’s All Right With Me
campaign Nancy
Analytics. Nancy visits the ‘All Traffic-Source/Medium’ report under ‘Acquisition’ and
dimension ‘Source/Medium.’ In exhibit 19 she sees the URL for her redirect site as well as conversion metrics for
‘Warrior tkts’ ticket sale
Exhibit 19
She sees that she had success in attracting website visitors from her target demographic. Of the 87 sessions created, the vast
majority belonged to her target segment. In addition, of the 7 tickets sold online, 5 of them belonged to her target
demographic. However, the
5.63% sessions and inspired only 7 Red Warrior online ticket sales. Because no
not have older show data to use as a measurement for success. Instead, she
generated from the ad with the cost of producing and publishing the ad (see exhibit
Ad results                                      Ad costs
All sessions                          87        [...] & formatting the ad                  4
Target segment sessions               74        Total graphic designer cost ($50/hour)     $200
Total tickets sold                    7         [...] fee                                  $200
Total ticket revenue ($40/ticket)     $280      Total newspaper ad cost                    $400
Exhibit 20
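The comparison in exhibit 20 can be worked through as a short calculation. The sketch below is illustrative only: it takes the session, ticket, and cost figures from the exhibit, while the function and variable names are assumptions made for this example and are not part of Nancy's Google Analytics setup.

```python
def return_on_investment(revenue: float, cost: float) -> float:
    """Return ROI as a fraction: (revenue - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost


# Figures taken from exhibit 20.
all_sessions = 87
target_sessions = 74
tickets_sold = 7
ticket_price = 40      # dollars per ticket
designer_cost = 200    # dollars, graphic designer at $50/hour
publishing_fee = 200   # dollars, the row labelled "fee" in the exhibit

revenue = tickets_sold * ticket_price
ad_cost = designer_cost + publishing_fee
target_share = target_sessions / all_sessions
roi = return_on_investment(revenue, ad_cost)

print(f"Revenue: ${revenue}, cost: ${ad_cost}")
print(f"Target segment share of sessions: {target_share:.1%}")
print(f"ROI: {roi:.0%}")
```

On these figures the ad returns less than it cost, which is consistent with the assessment that it did not pay for itself on online ticket revenue alone.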
Insight: Based on exhibit 18, it appears that the email blast generated a greater amount of pre-show ticket sale revenue compared to the same time period of It’s All Right With Me. While the newspaper ad did generate some revenue and visitor traffic, the amount of resources spent on producing and publishing [...]
Action: Nancy makes a note to send two email blasts prior to the next show opening to encourage pre-show ticket sales within this audience segment in future shows. She also pencils in possibly sending out a third email as a follow-up test. Overall, the newspaper ad was not a worthwhile investment. However, the newspaper ad still may be a worthwhile investment if the ad could be reused a second time. Even though the publishing costs would [...] possible the repeated ad might contribute to building ‘buzz’ around the show if used over a longer period of time, as well. If enough revenue was generated to cover the costs of the ad then Nancy may consider continuing to [...] Theatre’s shows. With the next newspaper ad she can use the Red Warrior newspaper ad [...]
Red Warrior concluded its run on June 28th, 2015. On Monday the 29th Nancy sits down at her desk to see if she was successful in either of her original objectives:
• Increase total ticket sales from 6,125 tickets to 7,000 tickets over the course of the show and to increase the [...]
• Raise awareness of the show as [...] target segment to the website and increased online ticket sales.
[...] 28, Nancy sees a total of All Sessions created and sessions created by her target segment, “RedWarrior F/35-55/PGH.” At the top of this screen she selects from the dropdown menu “Conversion:” her goal “Warrior Tkts.” Now, alongside of the session metrics, Nancy also sees her goal’s conversion metrics (exhibit 21).
Exhibit 21
Analyze: Nancy sees that over the 3 months Analytics a total of 3,626 tickets, or
Rumble Theatre was promoting and producing 52.9% of all tickets sold, were sold
the show, a total of 10,560 sessions were through the website. Additionally, of
• Ticket Sales: A quick call to the box through the website, Nancy’s target
office reveals that the theatre sold only segment had a higher frequency of
represents a total ticket sale increase the purchase rate of all visitors total
during the production’s timeframe, effective marketing strategy and using Google
visitors within her target segment Analytics to adjust her strategy based on
represented 4,756 of them, or 45%. visitor behavior has given her a powerful
Attracting new visitors from her target template to experiment with and apply to next
of all the sessions created, about Nancy used Rumble Theatre’s marketing
45.5% were first-time visitors. strategy for Red Warrior as a tool to help her
Moreover, she also sees that there was navigate the program’s seemingly endless
a higher rate of new sessions created supply of data and focus on what is relevant to
by the target segment (66.4%), as well her mission. Above all, she derived insights
as a higher rate of tickets purchased from data trends and used those insights to
online by the target segment (31.6%) alter content, channels used, and publication
compared to tickets purchased online frequency. Every shift in her website campaign
Insight: Even though the theatre collectively activity for Nancy to analyze and explore for
success. Nancy is excited to see that 52.9% of Over the course of the campaign she
all tickets sold were sold through the website, discovered by routinely comparing critical
signifying that she successfully reached her website data points she possessed the ability
goal of increasing the proportion of total to nimbly adjust campaign tactics to better
tickets sold online from 35% to 50%. reach and engage her target audience. She
Heightened activity in both ticket sales as well makes a note to check Google Analytics more
as new sessions created by her target audience regularly, about twice a week. By investing
suggests that Nancy’s campaigns were not time into Google Analytics, Nancy was
only successful in reaching her target audience rewarded with a greater understanding of
but also in persuading them to visit the site which advertising methods and content were
which enticed the audience to learn more
future internet
Understanding the Digital Marketing Environment
with KPIs and Web Analytics
José Ramón Saura 1,†, Pedro Palos-Sánchez 2,*,†
and Luis Manuel Cerdá Suárez 3,†
1 Department of Business and Economics, Rey Juan Carlos University, Paseo Artilleros s/n,
28027 Madrid, Spain
2 Department of Business Management, University of Extremadura, Av. Universidad, s/n,
10003 Cáceres, Spain
3 Department of Business Organization and Marketing and Market Research, International University of La
Rioja, Av. de la Paz, 137, 26006 Logroño, La Rioja, Spain
* Correspondence: Tel.: +34-9-2725-7580
† These authors contributed equally to this work.
Abstract: In the practice of Digital Marketing (DM), Web Analytics (WA) and Key Performance
Indicators (KPIs) can and should play an important role in marketing strategy formulation. It is the
aim of this article to survey the various DM metrics to determine and address the following question:
What are the most relevant metrics and KPIs that companies need to understand and manage in
order to increase the effectiveness of their DM strategies? To answer this question, a
Systematic Literature Review has been carried out based on two main themes: (i) Digital Marketing
and (ii) Web Analytics. The search terms used in the databases were (i) DM and (ii) WA,
yielding a total of n = 378 studies. The databases consulted for the
extraction of data were Scopus, PubMed, PsycINFO, ScienceDirect and Web of Science. In this study,
we define and identify the main KPIs in measuring why, how and for what purpose users interact
with web pages and ads. The main contribution of the study is to lay out and clarify quantitative and
qualitative KPIs and indicators for DM performance in order to achieve a consensus on the use and
measurement of these indicators.
1. Introduction
The growth of the Internet over the past decade is one of the most widely used examples to help
explain globalization. In the information age and the increasingly networked economy, electronic
commerce (e-Commerce) is seen as one of the main instruments to foster business growth, labour
movement and interpersonal relationships. DM is not just a transactional tool, but also generates
change at the commercial and microeconomic level, which in turn demands changes in marketing
practice and theory [1]. From a historical perspective, it is clear that all types of companies have had to
adapt all their business practices to the availability/progress of new technology, new management
techniques and an ever-changing communications landscape.
The rapid spread of computing power in all manner of devices has fostered the creation of the
Digital Economy, or “a new socio-political and economic system characterised by an intelligent space
consisting of information access tools and information processing and communication capabilities” [2].
While WA is widely used by popular websites to provide useful data for client companies, its rising
popularity among users is not necessarily reflected in academic research. The research that is done
also paints a rather discouraging picture showing that most WA use is ad-hoc, the analysis is not
used strategically, and the benefits tend to be imprecise. Thus, in practice, many marketing managers
remain wary of performance measurement data and prefer to rely on intuition and experience for
decision-making [3]. Given the evolving nature of WA, this is understandable. This study therefore
suggests that the main benefits of WA for DM performance measurement will be determined by how
companies exploit the system under specific contextual circumstances.
Understanding the effectiveness of DM strategies requires the ability to analyze and measure
their impact [4]. Appropriate, accurate and timely DM metrics are critical for a company to assess
whether they are achieving their objectives, or whether the selected strategy is appropriate to achieve
organizational goals [5].
DM is the simultaneous integration of strategies on the web, through a specific process
and methodology, looking for clear objectives using different tools, platforms and social media.
The importance of DM for companies resides in changes in the ways that today’s consumers gather
and assess information and make purchasing decisions, in addition to the channels they use for this
process [6].
According to [7], we can distinguish four types of control necessary to guarantee the outcome of a
marketing plan for business: Control of the annual plan; Control of Profitability; Efficiency Control
and Strategic Control. In this research, we cover the necessary actions for the Control of Profitability
in DM, Strategic Control in the web measurement and analytical KPI’s relevant to the consumer or
Internet users.
While there are a great number of possible metrics and indicators, each one designed to measure a
specific aspect of the DM plan [8], the choice of which metrics will enable insightful and useful analysis
remains a tricky question for business managers. It is the aim of this article to survey the various DM
metrics to determine and address the following question: What are the most relevant metrics and
KPIs that companies need to understand and manage in order to increase the effectiveness of their
DM strategies?
By identifying KPIs for DM and WA, marketing professionals and academics can efficiently measure
key indicators related to the development of tactics and actions that are performed in the digital
environment. By identifying the most important indicators, companies could improve conversion
rates and, consequently, increase their visibility on the Internet.
2. Methodology
[Figure 1 flow diagram: Potentially apt articles (n = 82); Excluded articles after analysis of the
complete article (n = 63): not fitting search terms, no relation with the research topic, no quality
evaluation, no description and specification of terms; Included articles (n = 26)]
Figure 1. PRISMA 2009 Flow Diagram.
Future Internet 2017, 9, 76 4 of 13
The objective is to achieve the highest possible amount of evidence in the results based on quality
studies. Some of the variables used in AMSTAR to evaluate the quality of the systematic review were
(i) the relationship of the research question to the criteria included in the study, (ii) the extraction of data
by at least two independent researchers, (iii) the quality of the literature review, (iv) identification and
definition of concepts, and (v) the quality of the references used throughout the study. As developed
by [11], we have included the following criteria for the development of the methodology:
In the first phase, databases and search terms were identified, obtaining a total sample of n = 378.
Secondly, after analyzing each article individually, a total of 296 articles were excluded from the
initial sample due to inadequate topics. Consequently, in the third phase of the systematic review,
a total of n = 82 potentially appropriate articles remained. Finally, after reading the complete articles
and excluding those that did not fit the search terms, had no explicit relation to the research
topic, had no quality evaluation, or lacked a description and specification of terms,
the final sample comprised n = 26 articles.
3. Results
As Table 3 shows, the research presented over the last years is categorized into three main research
themes: Business, Computer Science, and Information Systems.
DM has a significant business perspective, since it is used as a tool for promotion and sale on the
Internet, while the Computer Science and Information Science perspectives provide the technical
value needed to implement and develop these techniques.
The research theme in DM and WA is a mix of these three research sciences. The total number
of investigations that were selected after passing the quality filters developed in the systematic
review of literature is shown in Table 3. The table also indicates the quality of each journal
through its classification by quartiles.
The Journals of Industrial Marketing Management (Business), Interactive Marketing (Business),
International Journal of Research in Marketing (Business), Business Research (Business), International
Journal of Information Management (Information Science and Library Science) and The Journal
of Academic Librarianship (Information Science and Library Science) are key to understanding this
research topic.
In Table 4, we show one of the most common and conceptually simple methods found in the literature
for calculating the profitability of DM actions.
3.4. DM Techniques
According to [34]: “WA is the practice of the measurement, collection, analysis and reporting
of Internet data in order to understand how a site is used by an audience and how to optimize it”.
The focus of WA is to understand the users of a site, their behaviour and activity [35].
If WA is to be meaningful, the data collection process must be carefully designed to deliver
consistent and reliable data. Analysts working in WA should be aware of how systems work and
how they generate data, and they should be able to audit their implementation and operation. The first step
in the analysis is to be sure of the veracity of the information. It is at this point that the technical tools must
perform their actions correctly [36].
As is well-known, there are different techniques of DM such as search marketing (SEO or SEM),
social media marketing, affiliate marketing or content marketing. However, to establish the main
KPIs and metrics for companies we will focus on the analysis of the techniques of SEO and SEM.
Search marketing shares the main KPIs with other DM techniques because search engines are the
main channel of contact between the user and Internet companies [37].
The difference between visibility with search engines compared to other models in traditional
Internet advertising is that the user voluntarily seeks a service, product or information [38].
The accepted thinking in SEO and SEM is that to attract user traffic, it is essential that a website
is among the first two or three positions on the first of the search engine results pages (SERPs),
as derived through keyword ranking.
At this point we refer to the use of SEO as a technique or process for improving the visibility of a
page to search engines (to move up into the top results) in the rankings. From there, the results can be
assessed and analysed to calculate conversion rates (conversion here means moving from being seen to
being acted upon, as in clicking). Being present on the Internet at the right time with a relevant search
term can become a business opportunity [27,38].
The SEM proposition, however, is that it can make DM work better by promoting websites and
raising their visibility in SERPs. Techniques such as SEO and SEM have led to consolidated contracting
models for advertising campaigns that can be applied to both display ads and text ads (see Table 2).
This is a positive development because firms need clear media buying models in order to achieve the
aims of their campaigns [38].
DM use of WA provides indicators of the effectiveness of each individual Internet Marketing
technique employed. In turn, these indicators are related to the different pricing models in digital
advertising, and therefore feed into the payment models used in DM strategies. It is the monetization
of SEO and SEM combined with WA that should enable marketers to calculate the return on investment
of their marketing efforts and to determine the basis for measuring the profitability and effectiveness
of DM campaigns [31,39,40].
However, as seen in the literature review, there is little consensus in the DM ecosystem about
which particular metrics are most useful—for example, clicks, impressions, or number of page views
which are based on user behaviour on a website [41].
In order to reliably use these metrics, we must first examine the different contracting models used
in the calculation of digital advertising rates. The literature review shows that the metrics analyzed
in each type of research are determined by the type of contracting models in which companies have
invested. Table 5 lists the main contracting models used in the studies analyzed.
In addition, qualitative indicators related to social media should not be overlooked. Social
interactions between companies and users should be analyzed, because they
matter to any company in the Digital Marketing environment.
These social factors are defined and identified by many authors who focus their research on evaluating
social indicators and social commerce interactions to improve DM strategies [14,20,21,36,43].
is for a user, but the time spent on the page can be measured. If this is long, we can assume that the
content of the page is useful; Achievable (ii), the objectives considered when setting KPIs must be
credible. Sometimes too much information can be a problem and there are dozens of KPIs to choose
from, but only a few provide information of interest. Finally, a KPI must be Available at least for a time
(iii), KPIs must meet deadlines, and be available for reasonable periods of time [1,45,46].
In order to define a set of indicators for companies, after researching the overview and analysis of
the relevant articles on this topic, we define the following basic KPIs that companies should follow
and analyse with WA in their DM strategies as laid out in Table 8.
KPI in DM: Description
Conversion Rate: The average number of conversions per click in SERP results or in ad clicks (depending on the marketing objective), shown as a percentage. The conversion rate is calculated by dividing the number of conversions by the number of total ad clicks/actions that can be tracked to a conversion during the same time period.
Goals Conversion Rate: A goal represents a completed activity (also called a conversion). Examples of goals include making a purchase (e-commerce), completing a game level (app), or submitting a contact information form (lead generation site).
Type of Users: New Visitors are users who visit your site for the first time. Returning Visitors are users who visit your site for the second or more times. This KPI is important because it shows the target audience’s interest in your business and website.
Type of Sources: Source. Every referral to a website has an origin, or source. Medium. Every referral to a website also has a medium, such as, according to Google Analytics: “organic” (unpaid search), CPC, referral, email and “none” (direct traffic has a medium of “none”). Campaign. The name of the referring AdWords campaign or a custom campaign that has been created.
Keywords/Traffic of Non branded: Keywords in DM are the key words and phrases in web content that make it possible for people to find a site via search engines. A non-branded keyword is one that does not contain the target website’s brand name or some variation. Ranking for non-branded keywords is valuable because it allows a website to obtain new visitors who are not already familiar with the brand.
Keyword Ranking: Rank is an estimate of your website’s position for a particular search term in a search engine’s results pages. The lower the rank number (that is, the closer to the first position), the more easily your website will be found in search results for that keyword.
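The conversion rate definition in the table, combined with the medium dimension, can be illustrated with a short calculation. The following is a minimal sketch, not a Google Analytics API call: the traffic rows are hypothetical figures invented for the example, and the function name `conversion_rate` is an assumption.

```python
from collections import defaultdict


def conversion_rate(conversions: int, clicks: int) -> float:
    """Conversions divided by tracked clicks for the same time period."""
    return conversions / clicks if clicks else 0.0


# Hypothetical (medium, clicks, conversions) rows for one reporting period.
rows = [
    ("organic", 400, 12),
    ("cpc", 250, 10),
    ("email", 100, 8),
    ("referral", 50, 1),
]

# Aggregate clicks and conversions per medium.
totals = defaultdict(lambda: [0, 0])
for medium, clicks, conversions in rows:
    totals[medium][0] += clicks
    totals[medium][1] += conversions

rates = {m: conversion_rate(conv, clicks) for m, (clicks, conv) in totals.items()}
for medium, rate in rates.items():
    print(f"{medium}: {rate:.1%}")

# Overall rate across all media for the same period.
overall = conversion_rate(sum(r[2] for r in rows), sum(r[1] for r in rows))
print(f"overall: {overall:.2%}")
```

Comparing the per-medium rates with the overall rate shows which channels convert above or below average, which is the kind of comparison the Type of Sources KPI is meant to support.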
4. Conclusions
This paper provides a comprehensive and systemic overview of the current status of the theoretical
and empirical literature on DM and WA.
The development of the Internet and electronic commerce involves a change in marketing thinking
and practice due to the fact that traditional marketing has had to develop new techniques for the
Internet. This has resulted in the existence of a gap between the development of new techniques of
DM and measurement processes that have to be performed for the correct measurement of results.
Due to the increased use of DM in the last decade and the investment made by companies
in the last few years, we have carried out an investigation to determine the key indicators to which
companies should pay attention in order to measure their digital marketing actions. Previous studies report
concerns expressed by companies about not knowing which metrics they should use to
justify their marketing investments [15]. This remains true despite the many articles published on
DM measurement topics in the last few years. Researchers use a wide variety of different metrics and
indicators to measure the efficiency and effectiveness of DM techniques and calculate profitability
(ROI). However, our study shows that little consensus has been formed about the use of these indicators,
or on the definition of the key factors for measuring DM performance.
In summary, this research presents the main analytical indicators to measure the performance of
DM. It highlights the most commonly used indicators that might therefore offer potential for increasing
standardisation and comparability of results across studies [47–49].
Second, the indicators defined in this study are based on the use of relevant analytical indicators
in the field of DM and WA. The goal was to define these indicators correctly in order to group the main KPIs
for the measurement of DM return on investment.
The contribution of the theoretical framework demonstrates how companies should understand
the different contracting models in DM to establish relevant indicators and how they should understand
the main models of performance measurement in DM. The study identifies the main
contracting models examined in the principal works on DM. Understanding the
different contracting models for advertising on the Internet is therefore important to determine the indicators to
be measured and to calculate the ROI.
Third, the literature shows the importance of using two types of WA as a basis of assessment
in DM; (i) quantitative analytical indicators, which allow work on real data, quantifying different
goals or conversions and which are the main indicators studied by the authors, and (ii) qualitative
analytical indicators that are used in DM to show how the user understands a website, helping to
define KPIs to understand the on-line buying process and user behaviour. This study makes the
additional contribution of clarifying the main qualitative indicators from the literature.
Following the indicators identified in this research, strategies and actions in DM can be improved.
Marketers and Academics can check the efficiency of their activities by consulting the ROI or CTR
in their actions in DM. To measure and optimize each process carried out by users on the website,
Marketers and Academics can consult the indicated qualitative indicators. In addition, it will allow
them to optimize and structure their strategies. On the other hand, they can use this research to
improve the online shopping process and the User Experience (UX). This will increase the conversion
rates. In order to measure the objectives of online strategies, this research suggests that different KPIs
should be determined to assess the impact of each action. Each marketer or academic could use these
indicators to improve their strategies and account for the goals achieved in DM.
Author Contributions: José Ramón Saura, Pedro Palos-Sánchez and Luis Manuel Cerdá Suárez conceived
and designed the review; José Ramón Saura performed the methodology; Pedro Palos-Sánchez and
Luis Manuel Cerdá Suárez analyzed the results; José Ramón Saura, Pedro Palos-Sánchez and Luis Manuel Cerdá Suárez
wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.
1. Chaffey, D.; Patron, M. From web analytics to digital marketing optimization: Increasing the commercial
value of digital analytics. J. Direct Data Digit. Mark. Prac. 2012, 14, 30–45. [CrossRef]
2. Baye, M.R.; Santos, B.D.; Wildenbeest, M.R. Search engine optimization: What drives organic traffic to retail
sites? J. Econ. Manag. Strategy 2015, 25, 6–31. [CrossRef]
3. Germann, F.; Lilien, G.L.; Rangaswamy, A. Performance implications of deploying marketing analytics. Int. J.
Res. Mark. 2013, 30, 114–128. [CrossRef]
4. Pauwels, K.; Aksehirli, Z.; Lackman, A. Like the ad or the brand? Marketing stimulates different electronic
word-of-mouth content to drive online and offline performance. Int. J. Res. Mark. 2016, 33, 639–655.
5. Yang, Z.; Shi, Y.; Wang, B. Search engine marketing, financing ability and firm performance in E-commerce.
Procedia Comput. Sci. 2015, 55, 1106–1112. [CrossRef]
6. Leeflang, P.; Verhoef, P.; Dahlström, P.; Freundt, T. Challenges and solutions for marketing in a digital era.
Eur. Manag. J. 2014, 32, 1–12. [CrossRef]
7. Kotler, A.E. Principles of Marketing; Pearson: Boston, MA, USA, 2016.
8. Kaushik, A. Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity; John Wiley & Sons:
Hoboken, NJ, USA, 2009.
9. Smith, V.; Devane, D.; Begley, C.M.; Clarke, M. Methodology in conducting a systematic review of systematic
reviews of healthcare interventions. BMC Med. Res. 2009, 11, 15. [CrossRef] [PubMed]
10. AMSTAR is a Reliable and Valid Measurement Tool to Assess the Methodological Quality of Systematic Reviews.
Available online: (accessed on 12 September 2017).
11. Bosch, M.V.; Sang, A.O. Urban natural environments as nature based solutions for improved public
health—A systematic review of reviews. J. Transp. Health 2017, 5, S79. [CrossRef]
12. Seggie, S.H.; Cavusgil, E.; Phelan, S.E. Measurement of return on marketing investment: A conceptual
framework and the future of marketing metrics. Ind. Mark. Manag. 2017, 36, 834–841. [CrossRef]
13. Li, L.-Y. Marketing metrics’ usage: Its predictors and implications for customer relationship management.
Ind. Mark. Manag. 2011, 40, 139–148. [CrossRef]
14. Järvinen, J.; Töllinen, A.; Karjaluoto, H.; Jayawardhena, C. Digital and social media marketing usage in B2B
industrial section. Mark. Manag. J. 2012, 22, 102–117. [CrossRef]
15. Royle, J.; Laing, A. The digital marketing skills gap: Developing a Digital Marketer Model for the
communication industries. Int. J. Inf. Manag. 2014, 34, 65–73. [CrossRef]
16. Bates, J.; Best, P.; Mcquilkin, J.; Taylor, B. Will web search engines replace bibliographic databases in the
systematic identification of research? J. Acad. Librariansh. 2017, 43, 8–17. [CrossRef]
17. Choudhary, V.; Currim, I.; Dewan, S.; Jeliazkov, I.; Mintz, O.; Turner, J. Evaluation set size and purchase:
Evidence from a product search engine. J. Interact. Mark. 2017, 37, 16–31. [CrossRef]
18. Aswani, R.; Kar, A.K.; Ilavarasan, P.V.; Dwivedi, Y.K. Search engine marketing is not all gold: Insights from
Twitter and SEOClerks. Int. J. Inf. Manag. 2018, 38, 107–116. [CrossRef]
19. Dotson, J.P.; Fan, R.R.; Feit, E.M.; Oldham, J.D.; Yeh, Y. Brand attitudes and search engine queries.
J. Interact. Mark. 2017, 37, 105–116. [CrossRef]
20. Oberoi, P.; Patel, C.; Haon, C. Technology sourcing for website personalization and social media marketing:
A study of e-retailing industry. J. Bus. Res. 2017, 80, 10–23. [CrossRef]
21. Jayaram, D.; Manrai, A.K.; Manrai, L.A. Effective use of marketing technology in Eastern Europe: Web
analytics, social media, customer analytics, digital campaigns and mobile applications. J. Econ. Financ.
Adm. Sci. 2015, 20, 118–132. [CrossRef]
22. Fishkin, R.; Høgenhaven, T. Inbound Marketing and SEO: Insights from the Moz Blog; Wiley:
Hoboken, NJ, USA, 2013.
23. Nabout, A.; Skiera, B.; Stepanchuk, T.; Gerstmeier, E. An analysis of the profitability of fee-based
compensation plans for search engine marketing. Int. J. Res. Mark. 2012, 29, 68–80. [CrossRef]
24. Wilson, R.F.; Pettijohn, J.B. Affiliate management software: A premier. J. Website Promot. 2008, 3, 118–130.
25. Wilson, R.D. Using web traffic analysis for customer acquisition and retention programs in marketing.
Serv. Mark. Q. 2004, 26, 1–22. [CrossRef]
26. Kent, M.L.; Carr, B.J.; Husted, R.A.; Pop, R.A. Learning web analytics: A tool for strategic communication.
Public Relat. Rev. 2011, 37, 536–543. [CrossRef]
27. Lee, G. Death of ‘last click wins’: Media attribution and the expanding use of media data. J. Direct Data Digit.
Mark. Pract. 2010, 12, 16–26. [CrossRef]
28. Fagan, J.C. The suitability of web analytics key performance indicators in the academic library environment.
J. Acad. Librariansh. 2014, 40, 25–34. [CrossRef]
29. Plaza, B. Google analytics intelligence for information professionals. Online 2010, 34, 33–37.
30. Xu, Z.; Frankwick, G.L.; Ramirez, E. Effects of big data analytics and traditional marketing analytics on new
product success: A knowledge fusion perspective. J. Bus. Res. 2016, 69, 1562–1566. [CrossRef]
31. Palos Sanchez, P.R. Aproximación a los factores claves del retorno de la inversión en formación e-learning.
3C Empresa 2016, 5, 12. [CrossRef]
32. Fiorini, P.M.; Lipsky, L.R. Search marketing traffic and performance models. Comput. Stand. Interfaces 2012,
34, 517–526. [CrossRef]
33. Järvinen, J.; Karjaluoto, H. The use of Web analytics for digital marketing performance measurement.
Ind. Mark. Manag. 2015, 50, 117–127. [CrossRef]
34. Bourne, M.; Neely, A.; Platts, K.; Mills, J. The success and failure of performance measurement initiatives:
Perceptions of participating managers. Int. J. Oper. Prod. Manag. 2002, 22, 1288–1310. [CrossRef]
35. Digital Analytics Association. 2018. Available online: (accessed on 5 September 2017).
36. Vásquez, G.A.; Escamilla, E.M. Best practice in the use of social networks marketing strategy as in SMEs.
Procedia Soc. Behav. Sci. 2014, 148, 533–542. [CrossRef]
37. Nabout, N.A.; Skiera, B. Return on quality improvements in search engine marketing. J. Interact. Mark. 2012,
26, 141–154. [CrossRef]
38. Hwangbo, H.; Kim, Y.S.; Cha, K.J. Use of the smart store for persuasive marketing and immersive customer
experiences: A case study of Korean apparel enterprise. Mob. Inf. Syst. 2017, 2017, 4738340. [CrossRef]
39. Kim, J.; Xu, M.; Kahhat, R.; Allenby, B.; Williams, E. Designing and assessing a sustainable networked
delivery (SND) system: Hybrid business-to-consumer book delivery case study. Environ. Sci. Technol. 2009,
43, 181–187. [CrossRef] [PubMed]
40. Mathews, S.; Bianchi, C.; Perks, K.J.; Healy, M.; Wickramasekera, R. Internet marketing capabilities and
international market growth. Int. Bus. Rev. 2016, 25, 820–830. [CrossRef]
41. Mavridis, T.; Symeonidis, A.L. Identifying valid search engine ranking factors in a Web 2.0 and Web 3.0
context for building efficient SEO mechanisms. Eng. Appl. Artif. Intell. 2015, 41, 75–91. [CrossRef]
42. Welling, R.; White, L. Web site performance measurement: Promise and reality. Manag. Serv. Qual. 2006,
16, 654–670. [CrossRef]
43. Thaichon, P.; Quach, T.N. Online marketing communications and childhood’s intention to consume unhealthy
food. Australas. Mark. J. 2016, 24, 79–86. [CrossRef]
44. Moreno, J.; Tejeda, A.; Porcel, C.; Fujita, H.; Viedma, E. A system to enrich marketing customers acquisition
and retention campaigns using social media information. J. Serv. Res. 2015, 80, 163–179. [CrossRef]
45. File, K.M.; Prince, R.A. Evaluating the effectiveness of interactive marketing. J. Serv. Mark. 1993, 7, 49–58.
46. Peters, K.; Chen, Y.; Kaplan, A.M.; Ognibeni, B.; Pauwels, K. Social media metrics—A framework and
guidelines for managing social media. J. Interact. Mark. 2013, 27, 281–298. [CrossRef]
47. Meghan, L.M.; Tang, T. Mobile marketing and location-based applications. In Strategic Social Media: From
Marketing to Social Change; John Wiley & Sons: Hoboken, NJ, USA, 2016; pp. 130–143. [CrossRef]
48. Arch, G.; Woodside, J.; Milner, W. Buying and Marketing CPA Services. Ind. Mark. Manag. 1992, 21, 265–272.
49. Palos Sanchez, P.R.; Cumbreño, E.; Fernández, J.A. Factores condicionantes del marketing móvil: Estudio
empírico de la expansión de las apps. El caso de la ciudad de Cáceres. Rev. Estudios Econ. Empres. 2016,
28, 37–72.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (
Goal: The defined action that visitors should perform on a website, or the purpose of the website.

Don’t worry if you’re not a ‘numbers’ person – working with data is very little about number crunching (the technology usually takes care of this for you) and a lot about analysing, experimenting, testing and questioning. All you need is a curious mind and an understanding of the key principles and tools.

Here are some data concepts you should be aware of.

21.3.1 Performance monitoring and trends
Data analytics is all about monitoring user behaviour and marketing campaign performance over time. The last part is crucial. There is little value in looking at a single point of data; you want to look at trends and changes over a set period to encourage a dynamic view of data.

For example, it is not that helpful to say that 10% of this month’s web traffic converted. Is that good or bad, high or low? But saying that 10% more users converted this month than last month shows a positive change or trend. While it can be tempting to focus on single ‘hero’ numbers and exciting-looking figures such as ‘Look, we have 5 000 Facebook fans!’, these really don’t give a full picture if they are not presented in context. In fact, we call these ‘vanity metrics’: they look good, but they don’t tell you much.

NOTE
Pay close attention to any changes in the expected data, good or bad, and investigate any anomalies.

21.3.2 Big data
‘Big data’ is the term used to describe truly massive data sets, the ones that are so big and unwieldy that they require specialised software and massive computers to process. Companies like Google, Facebook and YouTube generate and collect so much data every day that they have entire warehouses full of hard drives to store it all.

Understanding how it works and how to think about data on this scale provides some valuable lessons for all analysts.
• Measure trends, not absolute figures: The more data you have, the more meaningful it is to look at how things change over time.
• Focus on patterns: With enough data, patterns over time should become apparent, so consider looking at weekly, monthly or even seasonal flows.
• Investigate anomalies: If your expected pattern suddenly changes, try to find out why and use this information to inform your actions going forward.

21.3.3 Data mining
Data mining is the process of finding patterns hidden in large numbers and databases. Rather than having a human analyst process the information, an automated computer program pulls apart the data and matches it to known patterns to deliver insights. Often, this can reveal surprising and unexpected results, and tends to break assumptions.

Data mining in action
Krux (2016) offers the example of examining an enormous dataset for an automotive brand that wanted to improve brochure downloads and increase requests for test drives. The data they analysed related to consumers, consumer attributes, and marketing touchpoints.

To determine a pattern, they had to explore 47 000 000 000 000 000 000 000 combinations of factors, obviously too many to evaluate without using machines. These combinations came from 35 touchpoints, including the website, campaigns, and other marketing channels, and 37 analytics points, including auto buyers and smartphone users.

The brand was able to spot relevant patterns, such as that consumers who bought a certain brand of car were more likely to download brochures, but not more likely to request test drives. This allowed them to segment the consumers who bought cars into those who started the purchase process by downloading a brochure, and those who started with a test drive.

The first group was detail-oriented, so ads featuring specific models with links to the specifications page helped to drive conversions.

The second group wanted to know how driving the car felt, so they were targeted with ads that appealed to their senses and included a call to action about scheduling a test drive. This helped to drive media efficiency and campaign performance.

21.4 Tracking and collecting data
A key problem with tracking users on websites used to be that it was impossible to track individual users – only individual browsers, or devices, since this is done through cookies. So, if Joe visits the website from Chrome on his home computer and Safari on his work laptop, the website will think he’s two different users. If Susan visits the site from the home computer, also using Chrome, the website will think she’s the same user as Joe, because the cookie set when Joe visited the site will still be there.

Email opens aren’t tracked with cookies. Instead, when the images in the email load, a tiny 1×1 pixel also loads and tracks open rate. This means that if the user is blocking images, their activity will not be tracked.

To track if those who did open your email then visited your page, or eventually converted, links within the email include UTM tags. UTM tags are codes in the URL that enable your analytics software to track where a user has come from.
In this link:
campaign=AugNewsletter
The campaign tracking tag appended on the end of the URL is:
?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter

An additional concern was the decline of cookies. Most modern browsers allow users to block them. With growing consumer privacy concerns, and laws like the EU Privacy Directive, which requires all European websites to disclose their cookie usage, cookies began to fall out of favour, making tracking more difficult.

Google’s Universal Analytics changed all that. Because of Google’s dominance in the search engine market, we will focus on them for this section.

21.4.1 Universal analytics
Google’s universal analytics allows you to track visitors (that means real people) rather than simply sessions. By creating a unique identifier for each customer,
universal analytics means you can track the user’s full journey with the brand, regardless of the device or browser they use. You can track Joe on his home computer, work laptop, mobile phone during his lunch break, and even when he swipes his loyalty card at the point of sale, allowing you to combine offline and online information about users.

Crucially, however, tracking Joe across devices requires both universal analytics and authentication on the site across devices. In other words, Joe has to be logged in to your website or online tool on his desktop, work laptop and mobile phone in order to be tracked this way. If he doesn’t log in, we won’t know he’s the same person. Users who use Gmail are easy for Google to track because they’ll be logged in across devices.

You can see:
• How visitors behave depending on the device they use (browsing for quick ideas on their smartphone, but checking out through the eCommerce portal on their desktop).
• How visitor behaviour changes the longer they are a fan of the brand: do they come back more often, for longer, or less often but with a clearer purpose?
• How often they’re really interacting with your brand.
• What their lifetime value and engagement is.

Another useful feature of universal analytics is that it allows you to import data from other sources into Google Analytics, for example, CRM information or data from a point-of-sale cash register. This gives a much broader view of the customers and lets you see a more direct link between your online efforts and real-world behaviour.

How does universal analytics work?
Universal analytics has three versions of the tracking code that developers can implement, helping them track users on:
1. Websites
2. Mobile apps
3. Other digital devices such as game consoles and even information kiosks.

It collects information from three sources to provide the information that you can access from your Google Analytics account:
1. The HTTP request of the user: This contains details about the browser and the computer making the request, including language, hostname, and referrer.
2. Browser and system information: This includes support for Java and Flash and screen resolution.
3. First-party cookies: Analytics sets and reads these cookies to obtain user session and ad campaign information.

What kind of privacy concerns might a user have about the data you’re collecting about them?

How to set up Google Analytics
First, you need a primary Google account, used for services such as Gmail or YouTube. You can use this to set up your Analytics account. This should be set up using a Google account that will always be available for your business.

Next, go to and follow the steps to sign up. You can set up multiple accounts here if you want to track a website, an app, or multiple websites and apps.

NOTE
You will need to make adjustments to your Analytics account so that you can get the most out of tracking your users. You can learn a little more about that here: moz.com/blog/absolute-beginners-guide-to-google-analytics.

After the sign-up process, you will be issued a Google Analytics tracking ID. This will be UA followed by a series of numbers. You need to add this code to the HTML file of your website, before the </head> tag, on each of your pages.

Now Google is tracking every visitor to your website!

NOTE
Try it now – go to a random website, such as www.redandyellow., and right click on it, then click ’View page source’ to view the HTML code for the site. Do a search for ’UA-‘ to view the tracking code for that site. The tracking code for the website above is UA-43748615-1.

Google Analytics is, obviously, not the only analytics package available. Other packages exist for detailed tracking of social media accounts, emails, and website data. Website analysis should always account for any campaigns being run. For example, generating high traffic volumes by employing various digital marketing tactics such as SEO, PPC, and email marketing can be a pointless and costly exercise if visitors are leaving your site without achieving one or more of your website’s goals. Conversion optimisation aims to convert as many of a website’s visitors as possible into active customers.

NOTE
Read more about this in the Conversion optimisation chapter.

21.4.2. Gathering data
Google Analytics can measure almost anything about the customers that visit your website. To gather the kind of data that can help you optimise your site, you’ll need to know a little about where to look. When you log into your analytics account, you will see seven main menu items on the left. They are:

Views
The Views button lets you switch between various pictures of the data.
Figure 1. The Views button.

Customization
The Customization tab lets you create dashboards that give you an overview of different data elements, custom reports, shortcuts, or custom alerts.
Real time
Real time allows you to monitor activity as it happens on your website. Data updates
continuously so that you can see how many users are on your site right now, where
they are from, the keywords and sites that referred them, which pages they are
viewing, and what conversions are happening.
Behavior
This section shows how users interact with your content, how the content performs, its searchability and its interactivity. You can see how fast your pages load, how successful users are when searching the site, how any interactive elements on your site are being used, popular content, which pages drive revenue, and more.

Audience
The Audience section helps you understand the characteristics of your audience, including their demographics, interests and behaviour (level of engagement), the mix of new and returning users and how their behaviour differs, and the browsers, networks and mobile devices they are using to access your site.
Figure 4. Part of Google’s Audience overview tab.

Conversions
Conversions does exactly what it says on the box: it shows you how users are converting on your site. You can look at:
• The Goals tab, which shows how well your site meets business objectives
• The eCommerce tab, which shows what your visitors buy and can link it to other data to show what drives your revenue
• Multi-channel funnels, which show how your channels work together to generate sales and conversions (for example, if a customer sees a display ad about your brand, visits your site to do research, and later does a search for a specific product before converting)
• Attribution, which shows you how traffic from various channels converts.

Acquisition
Acquisition lets you compare traffic from search, referrals, email, and marketing campaigns. It shows you which sources drive the most traffic to your site.
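Acquisition reports of this kind depend on the campaign tags carried in each landing URL, such as the ?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter tag shown in section 21.4. The following is a minimal sketch of how such tags could be read and tallied by source, not a description of how Google Analytics does it internally. The example.com URLs, the facebook and AugPromo values, the "(untagged)" label and the campaign_tags helper are all hypothetical and used only for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def campaign_tags(url: str) -> dict:
    """Return the utm_* query parameters of a URL as a flat dict."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


# Hypothetical landing URLs; example.com is a placeholder domain.
landing_urls = [
    "https://example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter",
    "https://example.com/pricing?utm_source=newsletter&utm_medium=email&utm_campaign=AugNewsletter",
    "https://example.com/?utm_source=facebook&utm_medium=social&utm_campaign=AugPromo",
    "https://example.com/about",  # no campaign tag at all
]

# Count visits per traffic source; untagged URLs get a placeholder label.
visits_by_source = Counter(
    campaign_tags(url).get("utm_source", "(untagged)") for url in landing_urls
)

for source, visits in visits_by_source.most_common():
    print(f"{source}: {visits}")
```

A visit without tags cannot be attributed to a campaign from the URL alone, which is why consistent tagging of email and ad links matters for any comparison between sources.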
You can play around on Google’s Analytics site using their free demo account, here: https:// com/analytics/web/

Figure 7. Part of Google’s Goals overview in the Conversions tab.

21.4.3 The type of information captured
By now, you should know the difference between objectives, goals, KPIs, and targets. KPIs are what you’ll be focusing on when you measure data that has been captured. KPIs are the metrics that help you understand how well you are meeting your objectives. A metric is a defined unit of measurement. Definitions can vary between various web analytics vendors depending on their approach to gathering data, but the standard definitions are provided here.

Web analytics metrics are divided into:
• Counts: These are the raw figures that will be used for analysis.
• Ratios: These are interpretations of the data that is counted.

Here are some of the key metrics you will need to get started with website analytics.

Building-block terms
These are the most basic web metrics. They tell you how much traffic your website is receiving. For example, looking at returning visitors can tell you how well your website creates loyalty; a website needs to grow the number of visitors who come back. An exception may be a support website where repeat visitors could indicate that the website has not been successful in solving the visitor’s problem. Each website needs to be analysed based on its purpose.
• Traffic: The number of users that visit a website
• Page: Unit of content (so downloads and Flash files can be defined as pages).
• Page views: The number of times a page was successfully requested.

• Returning visitor: A unique visitor who makes two or more visits (on the same device and browser) within the time period being analysed.

New visitors show that you are reaching new audiences and markets, while returning visitors are an indicator of brand loyalty. Most websites should aim for a healthy balance between the two.

Figure 8. A breakdown of new versus returning visitors in Google Analytics.
Figure 9. A dashboard showing some important KPIs for an eCommerce page.

Visit characteristics
These are some of the metrics that tell you how visitors reach your website, and how they move through the website. The way that a visitor navigates a website is called a click path. Looking at the referrers, both external and internal, allows you to gauge the click path that visitors take.
• Entry page: The first page of a visit.
• Landing page: The page intended to identify the beginning of the user experience resulting from a defined marketing effort.
• Exit page: The last page of a visit.
• Visit duration: The length of time in a session.
• Referrer: The URL that originally generated the request for the current page.
• Internal referrer: A URL that is part of the same website.
• External referrer: A URL that is outside of the website.
• Search referrer: A URL that is generated by a search function.
• Visit referrer: A URL that originated from a particular visit.
• Original referrer: A URL that sent a new visitor to the website.
• Clickthrough: The number of times a link was clicked by a visitor.
• Clickthrough rate: The number of times a link was clicked divided by the number of times it was seen (impressions).
• Page views per visit: The number of page views in a reporting period divided by the number of visits in that same period, giving the average number of pages viewed per visit.

Content characteristics
When a visitor views a page, they have two options: leave the website, or view another page on the website. These metrics tell you how visitors react to your content.

Bounce rate can be one of the most important metrics that you measure. There are a few exceptions, but a high bounce rate usually means high dissatisfaction with a web page.

A high bounce rate is not always bad. On a blog, for example, most users click through from a search to read one article and, having satisfied their curiosity, leave without visiting any other pages.

• Page exit ratio: Number of exits from a page divided by total number of page views of that page
• Single page visits: Visits that consist of one page, even if that page was viewed a number of times
• Bounces (or single page view visits): Visits consisting of a single page view
• Bounce rate: Single page view visits divided by entry pages.

• Conversion metrics: These metrics give insight into whether you are achieving your analytics goals (and through those, your overall website objectives).
• Event: A recorded action that has a specific time assigned to it by the browser or the server.
• Conversion: A visitor completing a target action.

Figure 11. Goal conversions in Google Analytics.

Mobile metrics
When it comes to mobile data, there are no special, new or different metrics to use. However, you will probably be focusing your attention on some key aspects that are particularly relevant here, namely technologies and the user experience.

NOTE
Why do you think Google Analytics has a separate category for tablets, rather than including them under mobile devices?

• Device category: Whether the visit came from a desktop, mobile or tablet device
• Mobile device info: The specific brand and make of the mobile device
• Mobile input selector: The main input method for the device (such as touchscreen, click wheel, stylus)
• Operating system: The OS that the device uses to run, such as iOS or Android.
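The ratio metrics defined in this section (clickthrough rate, page views per visit and bounce rate) are simple divisions of one count by another. As a rough illustration, the sketch below computes them from a set of counts. The numbers are made up, and the ratio helper and dictionary keys are hypothetical names chosen for this example rather than anything exported by an analytics package.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for one reporting period (made-up numbers).
counts = {
    "impressions": 20_000,            # times a link was seen
    "clicks": 500,                    # times that link was clicked
    "visits": 4_000,
    "page_views": 10_000,
    "single_page_view_visits": 1_800,
    "entry_pages": 4_000,
}

# Clickthrough rate: clicks divided by impressions.
clickthrough_rate = ratio(counts["clicks"], counts["impressions"])

# Page views per visit: page views divided by visits in the same period.
page_views_per_visit = ratio(counts["page_views"], counts["visits"])

# Bounce rate: single page view visits divided by entry pages.
bounce_rate = ratio(counts["single_page_view_visits"], counts["entry_pages"])

print(f"Clickthrough rate: {clickthrough_rate:.1%}")
print(f"Page views per visit: {page_views_per_visit:.2f}")
print(f"Bounce rate: {bounce_rate:.1%}")
```

Because every ratio depends on two counts, it is worth checking which count moved before reading anything into a change in the ratio itself.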
improvement. Look at user intent to establish if your website meets users’ goals, and if these match with the website goals. Look at user experience to determine how outcomes can be influenced.

Figure 15. Reviewing conversion paths can give you insight into [caption truncated]. (Surviving figure labels: 13,430 total visitors to the site; Persuasion; 100, 80, 20, 10; of visitors.)

User experience
To determine the factors that influence user experience, you must test and determine the patterns of user behaviour. Understanding why users behave in a certain way on your website will show you how that behaviour can be influenced to improve your outcomes. This is covered in the next chapter on Conversion optimisation.

Event 4: Enter personal and payment details and confirm booking (conversion).

One naturally expects fewer users at each step. Increasing the number of visitors who progress from one step to the next will go a long way to improving the overall conversion rate of the site.

Here are some examples of possible objectives, goals and KPIs for different websites.

Hospitality eCommerce site, such as
Objective: Increase bookings
Objective: Decrease marketing expenses
Length of visit
Average time spent on website
Percentage of returning visitors.
KPIs help you to look at the factors you can influence in the conversion process. For example, if your objective is to increase revenue, you could look at ways of increasing your conversion rate, that is, the proportion of visitors who purchase something on your site. One way of increasing your conversion rate could be to offer a discount. So, you would have more sales, but probably a lower average order value. Or, you could look at ways of increasing the average order value, so that the conversion rate would stay the same, but you would increase the revenue from each conversion.

Once you have established your objectives, goals and KPIs, you need to track the data that will help you to analyse how you are performing, and will indicate how you can optimise your website or campaign.

21.5.3 Segmentation
Every visitor to a website is different, but there are some ways in which we can characterise groups of users, and analyse metrics for each group. This is called segmentation.

Some segments include:

Referral source
Users who arrive at your site via search engines, those who type in the URL directly, and those who come from a link in an online news article are all likely to behave differently. As well as conversion rates, click path and exit pages are important metrics to consider. Consider the page that these visitors enter your website from: can anything be done to improve their experience?

Landing pages
Users who enter your website through different pages can behave very differently. What can you do to affect the page on which they are landing, or what elements of the landing page can be changed to positively influence outcomes?

Connection speed, operating system, browser
Consider the effects of technology on the behaviour of your users. A high bounce rate for low-bandwidth users, for example, could indicate that your site is taking too long to load. Visitors who use open source technology may expect different things from your website to other visitors. Different browsers may show your website differently: how does this affect these visitors?

Geographical location
Do users from different countries, provinces or towns behave differently on your website? How can you optimise the experience for these different groups?

First-time visitors
How is the click path of a first-time visitor different from that of a returning visitor? What parts of the website are most important to first-time visitors?

There are many factors that could be preventing your visitors from achieving specific end goals. From the tone of the copy to the colour of the page, anything on your website may affect conversions. Possible factors are often so glaringly obvious that one tends to miss them, or so small that they are dismissed as trivial. Changing one factor may result in other unforeseen consequences and it is vital to ensure that you don’t jump to the wrong conclusions.

Hotjar ( another popular analytics tool, demonstrates how heatmaps can help you improve your web page. You can find more information here:

21.6 Data Visualisation
In the Data-driven decision making chapter, we discussed the importance of reporting on data and making sure that the information gets to the right users, in the right way. Not everyone is adept at understanding a detailed financial breakdown,
and analytics reports often intimidate people, so how can a data-focused marketer
present information in a way that’s accessible to everyone?
The answer lies in data visualisation, which involves placing data in a visual context
to help users understand it. Data visualisation software can help demonstrate
patterns and trends that might be easily missed in purely text-based data reporting.
It can refer to something as simple as an infographic, or something as complex as a
multi-point interactive program that lets users decide what to compare.
Figure 21. Clever use of the layout of a clock and plotting points for
representing what Americans spend their time doing each day.
Figure 22. Word clouds are becoming popular ways to visualise data,
where the size of the word represents its importance or frequency.
Many data visualisations online are also interactive. Visit this link to see an
interactive data visualisation about voting habits of Americans: https://
For a good lesson in data visualisation, including how to start using it, check out
this article from SAS: Data Visualization: What it is and why it matters - https://
Figure 20. Representing data in different ways.
NOTE
For some tips on how to begin with data visualisation, take a quick look at some tools and some more resources on the topic. The Guardian actually has a remarkably useful article: https:// global-development-professionals- aug/28/interactive-

It can be challenging to decide on what data you want to visualise and the information you want to communicate, but as long as you know how your audience is likely to process visual information and what they need to know, you should be able to choose something that conveys the necessary information simply.

21.7 Tools of the trade
The first thing you need is a web analytics tool for gathering data. Some are free and some need to be paid for. You will need to determine which package best serves your needs. Bear in mind that if you switch vendors, you may lose historical data.

Below are some leading providers:
• Google Analytics –
• AWStats –
• Webalizer –
• Hotjar –
• GoSquared –
• Kissmetrics –
• Clicky –

When it comes to running split tests, if you don’t have the technical capacity to run these in-house, there are some third-party services that can host them for you. Google Optimize, which you would have learnt about in the Conversion optimisation chapter, is Google’s platform for running tests and assessing your website’s

To test the significance of basic split tests, a split-test calculator is available at:

When you use cookie-based tracking, you need to add code tags to your web pages and these need to be maintained, updated and changed occasionally. Google Tag Manager ( tagmanager) makes it easy to add and work with these tags without requiring any coding knowledge. Other professional tag management tools include TagMan (www., Ensighten ( and Tealium (

21.9 Case study: eFinancialCareers

21.9.1 One-line summary
eFinancialCareers, the world’s leading financial services careers website, used Google Analytics 360 and DoubleClick Manager to improve its programmatic display remarketing.

21.9.2 The challenge
eFinancialCareers uses dynamic remarketing ads to drive leads to its site, where their goal – the major conversion they hope for – is for the user to fill out a job application. The company wanted to boost the number of conversions coming to them from programmatic ads.
562 563
Data analytics › Case study Data analytics › Further reading
After collecting data about their website users for six weeks, they segmented them into: 21.11 Summary
• Passive users, who have visited the website and potentially registered for job updates, The ability to track user behaviour on the Internet allows you to analyse almost every level of a
but haven’t viewed or applied for any jobs. digital campaign, which should lead to improved results over time. The foundation of successful
web analytics is to determine campaign and business objectives upfront and to use these to
• Active users, who have viewed and applied for jobs.
choose goals and KPIs grounded in solid targets.
Once they had this data about each segment, they could tailor programmatic remarketing ads Web analytic packages come in two flavours – server-based and cookie-based tracking – although
to send the right message to individual users. Messages could encourage passive users to apply some packages combine both methods.
for vacancies, while active users could be sent tailored ads based on information such as the job Data can be analysed to discover how users behave, whether outcomes have been achieved,
sector in which they had displayed interest. and how appealing the user experience is. Testing to optimise user experience can demonstrate
They created almost 300 different remarketing lists, adding layers of detail to each identified ways in which to influence user behaviour so that more successful outcomes can be achieved.
segment. segmenting the audience allows specific groups of users to be analysed.
Because Analytics 360 and DoubleClick can be integrated, updated remarketing lists can be
automatically passed to DoubleClick to ensure relevant targeting for programmatic ads. 21.12 Case study questions
21.9.4 The results 1. Why did eFinancialCareers create so many remarketing lists?
Because the new system allowed remarketing to reach users within the ideal conversion 2. Describe what analytics data was gathered to create these lists. Why did they choose to
period, and with relevant messaging that was updated based on the user’s site activity, the ads focus on this data?
performed considerably better. eFinancialCareers saw:
3. How did the integration of various digital elements improve this brand’s remarketing
• A 21% increase in site traffic from real-time bidding campaigns efforts?
• A 423% increase in conversion rates for job applications coming from remarketing (Google, 2016)

21.10 The bigger picture

Tracking, analysing and optimising are fundamental to any digital marketing activity, and it is possible to track almost every detail of any online campaign.

Most analytics packages can be used across all digital marketing activities, allowing for an integrated approach to determining the success of campaigns. While it is important to analyse each campaign on its own merits, the Internet allows for a holistic approach to these activities. The savvy marketer will be able to see how campaigns affect and enhance each other.

The data gathered and analysed can provide insights into the following fields, among others:

• SEO: What keywords are users using to search for your site, and how do they behave once they find it?
• Email: When is the best time to send an email newsletter? Are users clicking on the links in the newsletter and converting on your website?
• Paid media: How successful are your paid advertising campaigns? How does paid traffic compare to organic search traffic?
• Social media: Is social media driving traffic to the website? How do fans of the brand behave compared to those who do not engage socially?
• Mobile: How much of your traffic comes from mobile devices? Is it worth optimising your site for these? (It usually is!)

21.13 Chapter questions

1. Why is it so important to use data to inform business decisions?
2. What would you learn from a single-page heat map?
3. What is the difference between a goal and a KPI?

21.14 Further reading

– Avinash Kaushik is an analytics evangelist, and his regular insight on his blog, Occam’s Razor, is essential reading for any digital marketer.
– Web Analytics 2.0 by Avinash Kaushik – if you are looking to get started in web analytics, you can’t go wrong with this book by the web analytics legend.
– Analytics Pros has a blog with great advice and thoughts about analytics.
– Adobe has a good blog with a lot of analytics information as well.
– Believe it or not, the Content Marketing Institute has some great analytics tips.
– Google Analytics Help Center is an excellent starting point for anyone who wants to get to grips with this free, excellent web analytics service.
Taking the measure of your customer experience
Ending the tyranny of the session
In the mid-1990s, website server log file analysts (creating a practice
later to be known as web analytics) created the concept of the
session. The “session” was, philosophically, a measure reflecting
the singular amount of attention that a visitor gave to a website, by
landing on and loading pages, navigating links and reading content.
Out of necessity, these analysts needed to define a time span around
which to base good measurements and arbitrarily chose 30 minutes,
and the session was born. For measurement purposes, the session
might not last 30 full minutes or the session might last much longer,
but if a visitor was inactive for 30 minutes, the session was over. This
framing has dominated the field of web — now digital — analytics ever
since, with Adobe, Google, Webtrends, Coremetrics and other tools
currently defining a session (or “visit”) as the continuous, sequential
behavior of a visitor on an online property until there has been no
activity for 30 minutes.1
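The 30-minute inactivity rule described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `(visitor_id, timestamp)` hit format, the cookie-style IDs and the function name `sessionize` are assumptions made for the example.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the conventional inactivity cutoff


def sessionize(hits):
    """Group (visitor_id, timestamp) hits into sessions per visitor.

    A new session starts when the gap since the visitor's previous hit
    reaches SESSION_TIMEOUT; otherwise the hit joins the current session.
    """
    sessions = {}  # visitor_id -> list of sessions, each a list of timestamps
    for visitor_id, ts in sorted(hits):
        visitor_sessions = sessions.setdefault(visitor_id, [])
        if visitor_sessions and ts - visitor_sessions[-1][-1] < SESSION_TIMEOUT:
            visitor_sessions[-1].append(ts)
        else:
            visitor_sessions.append([ts])
    return sessions


# Hypothetical clickstream: cookie-a goes quiet for 50 minutes, so the
# third hit opens a second session.
hits = [
    ("cookie-a", datetime(2017, 11, 20, 9, 0)),
    ("cookie-a", datetime(2017, 11, 20, 9, 10)),
    ("cookie-a", datetime(2017, 11, 20, 10, 0)),
    ("cookie-b", datetime(2017, 11, 20, 9, 5)),
]
result = sessionize(hits)
print({visitor: len(s) for visitor, s in result.items()})
```

Note that the cutoff is a convention rather than a property of visitor behavior: changing `SESSION_TIMEOUT` changes every session count derived from the same hits.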
two decades ago: a timestamp is combined with a cookie or IP address combination, and sessions are cut off with 30 minutes of gap in the clickstream. Most standard digital analytics behavior is measured within this session framework: “conversion rate” is actually “conversions per session”; “landing pages” are the first page-load within a session; “campaign response” is the campaign that generated a session; “referrers” are the websites immediately before that session, and an “exit page” is the last thing a visitor did in that session before that 30-minute window expired. The session concept has become so ingrained in the way that we think about online behavior that no one questions the concept or why it was created in the first place.
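To make the point about "conversions per session" concrete, the sketch below compares a session-based conversion rate with a visitor-based one over the same data. The session list and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sessions as (visitor_id, converted_in_session) pairs.
# Visitor v1 needs three sessions before converting.
sessions = [
    ("v1", False), ("v1", False), ("v1", True),
    ("v2", False),
    ("v3", True),
]


def session_conversion_rate(sessions):
    """Conversions per session: the usual meaning of 'conversion rate'."""
    return sum(converted for _, converted in sessions) / len(sessions)


def visitor_conversion_rate(sessions):
    """Share of distinct visitors who converted in any of their sessions."""
    visitors = {visitor for visitor, _ in sessions}
    converters = {visitor for visitor, converted in sessions if converted}
    return len(converters) / len(visitors)


per_session = session_conversion_rate(sessions)
per_visitor = visitor_conversion_rate(sessions)
print(f"{per_session:.0%} per session, {per_visitor:.0%} per visitor")
```

The same behavior yields two different rates; the session-based figure penalizes a visitor for taking several visits to decide.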
Over the years, slight nuances have been made to account for outlier traffic, with
Adobe, for example, also cutting a visit off if there has been continuous activity for
24 hours.
In common digital reporting, it is difficult to escape the pervasiveness of the session, whether analysts are using the common web analytics
tools (Adobe, Google, Webtrends, IBM Coremetrics) or are looking at data through clickstream feeds. Below is a sample of such metrics and the
reasons why they have been questioned in recent years:
Session-based metrics
The above table points out issues with individual session-based metrics. To mitigate these issues usually requires very careful reporting,
additional rounds of technical implementation or tool configuration, spreadsheet-based heuristics or data manipulation or an understood
“grain of salt” on the part of metric stakeholders. But the underlying assumption behind these metrics is also flawed: the session, as originally
and currently defined, no longer represents the actionable unit at which a customer’s or prospect’s digital attention to the brand singularly
| 2. Drivers of changes
The growing obsolescence of the session as a unit of measurement can be attributable broadly to several factors: the growth in the multi-device habit among visitors, the expansion of the digital experience beyond owned properties and the greater availability of experience data.

Growth in multi-device engagement

According to ComScore, 66% of the digital population is multiplatform.2 Nielsen reports an average of four digital devices owned by a typical American household, and Google reports that only 10% of consumers surf the internet from only one device.3 The same machine, particularly laptops and desktops, could have multiple browsers like Internet Explorer, Mozilla Firefox, Apple Safari and Google Chrome installed and used concurrently, and all major browsers allow users to browse on multiple tabs. Finally, ComScore reported that more than half of all internet behavior is done through apps, not browsers.4 The practical implication for these trends is that multi-session behavior by the same individual is the norm, not the exception, and that even though cookie-based measurement challenges have increased, the need to measure behavior across browsers and devices has become more acute.

Expansion of digital engagement beyond owned assets

Furthermore, digital customer engagement with the brand no longer takes place mostly on owned websites or apps, but also on social media, search engines, interactive emails and the wider display and video advertising universe. What this means is that a visitor’s engagement with a brand, their true “macro-session,” could really start on an email and end on Facebook. Because of the broad scale at which certain tags are deployed, cookie aggregators like Oracle BlueKai or Google DoubleClick can piece together these visitor macro-sessions and use this data for audience segmentation and targeting. The industry is increasingly moving in this direction and looking for additional solutions to solve ecosystem measurement and audience profiling challenges. Some practical analytics examples of how these unowned properties are beginning to be incorporated into the customer journey include the following:

• Segment targeting of display ads leading to personalization of website content through distinct landing pages and onsite experiences
• Email retargeting campaigns based on website interactions (owned websites and external)
• Social ads delivered based on display ad impressions
• Audience data collected from off-platform programming or native advertising (e.g., YouTube, Snap, Forbes BrandVoice)

In addition to making onsite behavioral measurement problematic, these trends are blurring the line between website optimization and marketing. Outbound marketing campaigns are now informed by website activity and vice versa. The segments being targeted for marketing and communications are the same segments that are activating onsite personalization.

Digitization of the real world

Online interactions have, from the beginning, been highly measurable and analyzable, and now offline interactions are becoming digitized and quantified to a degree that enables the full customer journey to be measured and optimized. Commonly referred to as a “360-View,” joinable data is becoming available, underscoring the smallness of the individual website session. Such data includes:

• Call center data, with voice transcripts often digitized and classified
• In-store analytics: the ability to track shoppers’ behavior in and around a physical store through mobile and video technology, digital kiosks and POS data collection
• Internet-of-Things, including onboard devices (OBD), usage meters or supply chain data
• Customer satisfaction, or net promoter score (NPS) data collected through surveys on either a transactional or overall customer relationship basis

Measuring digital performance across this proliferation of devices, experiences and platforms presents a very different set of challenges than the ones addressed 20+ years ago when a website was simply trying to measure basic traffic and impact across a relatively small number of web pages with limited multimedia, ecommerce or social networking capabilities.

Comscore Inc. Investor Luncheon – Final. Fair Disclosure Wire, Oct 27, 2016
"Mobile Marketing Statistics Compilation," Smart Insights website, http://www.
statistics. Accessed 20 November 2017.
"The 2016 U.S. Mobile App Report," comScore website, https://www.comscore.
Accessed 20 November 2017.
Industry example (media and entertainment): A video-on-demand service knows that members of a single household use the same login to access their services. In order to personalize the recommendations available to the user, device fingerprinting is used in conjunction with look-alike segmentation to identify which devices should be targeted with “kids,” “family,” “drama,” or “action” recommendations, even though only the aggregate mix of content is associated with a single authentication ID.

Walled gardens

Cookie-aggregators emerged during the last decade as part of marketing targeting and audience aggregation, working with online properties to deploy a simple third-party JavaScript tag on their websites in return for better data visibility, competitive reporting and advertising targeting. Today, they have expanded to include a wide range of analytics, personalization and data-joining capabilities and, as a consequence, are often branded as Data Management Platforms or “DMPs.” Many of these have been acquired by traditional database technology companies. About a dozen major vendors account for the major market-share in this space and include Adobe Audience Manager (originally Demdex), Oracle® DMP (originally BlueKai), Salesforce DMP (originally Krux), MediaMath, Neustar®, Nielsen, Acxiom® and Google (through DoubleClick, Google Analytics, AdWords and YouTube).6 To this list should be added the social media platforms (Facebook, Twitter, Snapchat, LinkedIn) that have scaled user data-capture not only on their own sites but very widely on the internet; online retail behemoths (principally Amazon and Walmart); and media conglomerates (e.g., Comcast/NBCU, Disney/ABC/ESPN, Time Warner, Turner, Hearst and Cox).

As soon as a visitor enters any website included in any of these networks (and a single website can belong within more than one), they can be deduplicated down to a single ID, and all their sessions on multiple websites can be joined. These data environments are often referred to as “walled gardens” or “closed platforms” because this deduplicated ID is only natively available to the parent network/DMP, within which visitor-level segmentation can occur for website and marketing personalization purposes. A company wishing to take advantage of this rich segmentation and targeting data-pool must subscribe to the DMP provider, be included in the network, or otherwise pay for these marketing services or visitor-level data.

Industry example (media/entertainment): A studio selling movie tickets wants to target individuals who have watched movie trailers on YouTube. But since YouTube is part of the Google “walled garden,” that organization must subscribe to Google Analytics 360 Suite in order to reach these individuals, or pay for a back-end Google data-integration with their current DMP. At no point does Google provide individual-level data to the organization in such a way that it can be analyzed internally.

All three methodologies above complement each other and can be used together, making it possible today to map the complete customer journey to a degree of accuracy unavailable in the past. In doing so, organizations are now able to break through the single-session focus of traditional web analytics and optimize the customer experience holistically.
Forrester Wave ™ “Data Management Platforms, Q2 2017” (June 1, 2017) lists eleven
platforms, but did not evaluate Google, which released its Audience Center as part of its
Analytics 360 Suite as a beta in 2016.
As the digital world moves inexorably towards omnichannel experiences, session-based metrics must be supplanted by measures of the efficiency with which brands move visitors and customers along their experience. This will deliver a seamless cross-session and cross-device framework for digital analytics reporting, analysis and optimization.

To achieve this measurement, digital analytics teams should adopt the following road map:

Customer journey mapping

Customer journey maps exist as routine work products among CX (Customer Experience) teams in most organizations. But what many analytics teams have failed to do is translate these diagrams, descriptions and PDFs into quantitatively measurable metrics. For the digital analyst, this requires the creation of behavioral signatures and journey micro-conversion metrics.

Behavioral signatures are sets of distinct actions that define a customer’s life cycle stage and their intent or reason for engaging in a particular digital experience. An organization must understand how to measure what these customer intents are and quantify its ability to successfully meet the expectation of the customer vs. that intent. The true measure of engagement then is not necessarily based on what happens in a session, but on what happens relative to one or many behavioral signatures.

In digital analytics, the behavioral signatures are imprints within the clickstream data collected from web behavior. Below is an example of a typical journey map on an acquisition website:
[Figure: journey map for an acquisition website. Recoverable labels: Spurious, Learn, Decide, Buy; Marketing, Social media, Homepage, Landing pages, Calculators or tools, Forms, Re-assurers, detail pages, Shopping cart, Form submissions, Checkout.]
“Spurious” visitors are “one-and-done”: they come to the website, don’t do anything indicating intent and never return. If they come to the site intentionally, they are considered as “Land.” From there, the journey may take them through a “Learn” or “Customize” phase, finally and presumably “Decide” and “Buy.” Behavioral signatures encompass those actions that define a visitor in each phase and can be defined in real time through JavaScript logic within web analytics tagging. The customer’s existing life cycle stage is set within a cookie and modified according to each action taken. Note that this process can take place in the absence of any personal identifiers outside of the standard, anonymous cookie-based tracking, but if authentication is present, then customer data can be made available in real time and pushed into the web analytics solution.
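The stage-in-a-cookie idea can be sketched as a small state update. The text describes JavaScript logic inside the analytics tag; the sketch below uses Python purely for illustration. The action names, the `SIGNATURES` mapping and the rule that a stage only ever moves forward are assumptions, not part of the source.

```python
# Life cycle stages in journey order, as named in the text.
STAGES = ["Spurious", "Land", "Learn", "Customize", "Decide", "Buy"]

# Hypothetical behavioral signatures: which action signals which stage.
SIGNATURES = {
    "campaign_landing": "Land",
    "use_calculator": "Learn",
    "configure_product": "Customize",
    "add_to_cart": "Decide",
    "checkout_complete": "Buy",
}


def update_stage(current_stage, action):
    """Return the stage after an action (assumption: stages never regress)."""
    signalled = SIGNATURES.get(action)
    if signalled is None:
        return current_stage  # action carries no signature
    if STAGES.index(signalled) > STAGES.index(current_stage):
        return signalled
    return current_stage


# The dict stands in for the anonymous cookie that stores the stage.
cookie = {"stage": "Spurious"}
actions = ["campaign_landing", "scroll", "use_calculator", "add_to_cart", "use_calculator"]
for action in actions:
    cookie["stage"] = update_stage(cookie["stage"], action)
print(cookie["stage"])
```

Because the stage lives with the visitor rather than with a session, it survives across visits for as long as the cookie does.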
[Figure: journey map for an existing-customer, self-service website, plotted against time (months). Recoverable labels: Familiarize, Develop, Maintain; Onboarding content, Email, App downloads, Alerts and notifications, At-a-glance, Bill pay, Auto pay, Order summaries, Self-service tools, Order or service issues, Help, forums and chat, Support, Call center, Entry back to prospect journeys.]
Non-digital data related to customer experience (e.g., call center interactions, transactional history or voice-of-customer), when combined with the behavioral signature, can provide additional refinement and color to the above website journey map, signaling churn, identifying acute issues and encouraging upsell.

As a final example, content-oriented websites (for example, publishing or video-on-demand) have an engagement journey lifespan that is very long (similar to existing customer, self-service journeys), but often lack customer data-enrichment opportunities created by account IDs. Such a journey might look like this:
[Figure: journey map for a content-oriented website, plotted against time (months). Recoverable labels: Spurious, Land, Familiarize, Engage, Dedicate, Post; SEO or social, Advertising, Navigation, Content consumption, Video consumption, Bookmark, Consistent visitation, Chat, Social Share.]
[Figure: stage-to-stage micro-conversion diagram. Recoverable labels: Land, Learn, Decide, Buy, with a percentage between stages; Customers and leads.]

Figure 5: Journey-based micro-conversion metrics: My Account or Self-Service website
[Recoverable labels: Familiarize, Develop, Maintain, Acute needs; Familiarize accounts, All new accounts, Maintain accounts, Use product, Contact rate post digital interaction, Issue resolution rate.]
These micro-conversion metrics are only the framework for experience optimization: their tactical and operational use lies in understanding what led a customer through an experience conversion (or conversely, why they failed to do so).

Milestones and KPIs

So far we have avoided the term “KPI,” because a metric by itself, while useful tactically, isn’t a key performance indicator. What makes it “key” is whether a particular behavior can be analytically shown to improve customer satisfaction, increase sales or otherwise get customers or visitors to do what they want to do online, significantly, statistically and demonstrably. Actions and metrics must be analyzed to identify those that stand out as significantly driving customers along their life cycle journey; these actions and metrics then become milestones and KPIs.

For example, suppose I have an ecommerce website with a journey as described in Figure X above. My behavioral signature for the “Learn” phase might consist of the following elements:

• A customer is in “Learn” phase in any of the following situations:
  • They click a homepage banner
  • They use internal search directly
  • They click through from an offer email
  • They browse by category using filters
• Pulling micro-conversion metrics for each use case might show the following results:
Statistical methods, either built into the web analytics tool or done in-house, can reveal whether these percentages are significantly different (simple T-tests to more sophisticated GLM, decision tree or binary logistic models to control for covariate effects). In this case, the compare products tool can be considered a “milestone” for customers in the “Learn” phase, and use of that tool could become a KPI. Experience improvements can, thus, be focused on getting potential customers to that section of the experience.

The above example also illustrates that the customer journey mapping and behavioral signatures are best deployed in an iterative manner, much like customer behavioral segmentation. As improvements are made to online assets, or marketing channels become optimized, customer journeys and behavioral segments can change. Cluster or principal components analysis could show that the “compare products” behavior displays a degree of independence from other website use cases and could constitute its own step in the customer journey.
[Figure: Customer journey refinement process (milestones and KPIs)]
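As one way to check whether two micro-conversion percentages differ significantly, the sketch below runs a two-proportion z-test using only the standard library. This is a stand-in for the "simple T-tests" mentioned in the text, not the authors' method, and the counts are hypothetical because the original results table is not reproduced here.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value) under the pooled-variance normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


# Hypothetical counts: visitors who used a compare-products tool versus
# visitors who arrived through a homepage banner, and how many of each
# moved on to the next stage.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=80, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; it does not control for covariates, which is where the GLM or logistic models mentioned above come in.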
Once milestones have been identified within the digital customer life cycle journey, the use of sessions as the basis for digital behavioral measurement and optimization can disappear. Instead, variables and data can be reorganized around milestones that are agnostic of calendar time or device. Traditionally, web data has been organized like this:

This data architecture places undue emphasis on the session and obscures the journey-to-purchase undertaken by the customer. Instead, the two most important data points, the customer and the item, should be used as the organizing principle:

Organizing data in this manner also allows for the inclusion of additional touchpoints (social, email, display ad impressions, etc.), or even to offline data (call center, point-of-sale). Other table designs could be used towards different data activation use-cases: marketing campaign codes, product category, individual transaction ID. The point is that the framework for digital measurement represented in the first table is traditional, session-based. In the second table, the framework is based on customer journey.
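The shift from a session-keyed layout to a customer-and-item layout can be sketched as a simple re-keying step. The two tables referred to in the text are not reproduced here, so the row fields (`session_id`, `customer_id`, `device`, `milestone`, `item`) and the function name `to_journeys` are hypothetical.

```python
from collections import defaultdict

# Hypothetical session-keyed rows, in the order the sessions occurred.
session_rows = [
    {"session_id": "s1", "customer_id": "c1", "device": "mobile", "milestone": "Learn", "item": "sku-42"},
    {"session_id": "s2", "customer_id": "c1", "device": "desktop", "milestone": "Decide", "item": "sku-42"},
    {"session_id": "s3", "customer_id": "c2", "device": "desktop", "milestone": "Learn", "item": "sku-7"},
    {"session_id": "s4", "customer_id": "c1", "device": "desktop", "milestone": "Buy", "item": "sku-42"},
]


def to_journeys(rows):
    """Re-key rows by (customer, item), keeping milestones in arrival order."""
    journeys = defaultdict(list)
    for row in rows:
        journeys[(row["customer_id"], row["item"])].append(row["milestone"])
    return dict(journeys)


journeys = to_journeys(session_rows)
for key, milestones in journeys.items():
    print(key, " > ".join(milestones))
```

In the re-keyed form, session boundaries and devices drop out of the key, so one customer's path to purchase reads as a single record regardless of how many visits it took.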
| 5. The future of digital analytics
With increased customer omnichannel engagement and the real improvements in technology to link data, the session is no longer the optimal organizing principle for the collection, storage and activation of digital behavioral data. Rather, mature organizations are moving toward the customer journey as this organizing principle, replacing the session with an over-time, multiple touchpoint view organized by life cycle stages and milestones, and collected through behavioral signatures. What are the longer-term implications of this trend for digital analytics? We predict the following:

• Customer journey and experience mapping will increasingly become a data-led exercise informed as much by data scientists and analytics professionals as experience strategists.

The best target state experience designs will be informed by deep insights derived from measuring behavioral signatures, and the best analytics implementations will be designed from understanding a journey map, with a sound test strategy built in to make sure both the analytics and experience people have it right.

• Just as web analytics expanded to become digital analytics, digital analytics teams will blend with customer analytics, customer insights, or customer experience teams.

Between 2008 and 2011, different emerging specializations such as social analytics, mobile analytics, web analytics and, to some degree, marketing analytics became blurred under the heading of “digital analytics.”7 Today, we have customer analytics, digital analytics, finance analytics, supply chain analytics and text analytics; as databases become more comprehensive, it is likely that the line between some of these specializations will become blurred.

• Web analytics implementations will become more focused on those behavioral dimensions that indicate key segments and steps in the customer journey.

Implementations of web analytics tools consisting of hundreds of custom variables will become outdated. Rather, web-specific behavioral dimensions will become standardized in a CMS-fed data layer to become one of many customer dimensions of equal usefulness for segmentation and targeting.

• Web analytics tools will need to adapt to compete, either by expanding to become audience DMPs or by focusing as single-channel, granular data collection platforms that can be integrated with existing DMP solutions.

Web analytics tools that do not enable integration and view of the full customer life cycle do not capture the insights these integrations enable. As the industry evolves away from session-based customer views, tools will need to follow suit.

• Cloud-based data repositories will slowly be brought in-house in order to integrate PII (Personally Identifiable Information), PHI (Protected Health Information), financial, legal, or transactional data.

As powerful as they are and will become in the next year or two, third-party cloud-based DMPs are stymied because they cannot store private information. Secure as they may be (or may become), most large organizations with large repositories of customer information will be legally (or socially) prevented from sending this to third-party DMPs. As the analytics power of full customer journey analytics becomes widely exploited, the most mature organizations will need to bring as much data as possible in-house in order to differentiate themselves analytically.

This means that organizations will demand full feeds, including visitor IDs, from their third-party DMPs. Walled gardens could open their doors, even if they charge a hefty price at their gates.

• IoT (Internet of Things), in-store analytics, OBD, voice and mobile geolocational data will be the new frontiers in customer analytics, eventually to be integrated with the above.

Each new device, sensor and interaction point becomes another data source to better understand behavioral signatures and the context that best drives outcomes along the experience map. Session data is clearly not relevant to the connected home, car or quantified self, so it is time to remove it from our other digital experience data sources.
Forward-thinking organizations have begun to invest valuable time
and resources in journey mapping and planning the customer
experience across omnichannel interactions. Many also have mature
digital analytics implementations that are divorced from this journey
construct. The time has come to bring these worlds together and
retire the session in favor of the customer experience.
Vivat Experientia!

As an example, the Web Analytics Association officially changed its name to the Digital Analytics Association in 2011.
Taking the measure of your customer experience | 11
Chad Richeson
Advisory, PI
Chris Gianutsos
Executive Director
Advisory, PI
Thomas Buchte
Advisory, PI
Jason Bennett
Advisory, PI
Brian Clifton
Copyright Statement: All content © 2010 by Brian Clifton - Copyright holder is licensing this under the Creative Commons License,
Attribution-Noncommercial-No Derivative Works 3.0 Unported, (This means you can post
this document on your site and share it freely with your friends, but not resell it or use as an incentive for action.)
Table of Contents
Introduction
How Web Sites Collect Visitor Data
Page Tags and Logfiles
Cookies in Web Analytics
Understanding Web Analytics Data Accuracy
Issues Affecting Visitor Data Accuracy for Logfiles
Dynamically Assigned IP Addresses
Client-Side Cached Pages
Counting Robots
Issues Affecting Visitor Data From Page Tags
Setup Errors Causing Missed Tags
JavaScript Errors Halt Page Loading
Firewalls Block Page Tags
Logfiles “See” Mobile Users
Issues Affecting Visitor Data When Using Cookies
Visitors Rejecting or Deleting Cookies
Users Owning and Sharing Multiple Computers
Latency Leaves Room for Inaccuracy
Offline Visits Skewing Data Collection
Comparing Data From Different Vendors
First-Party Versus Third-Party Cookies
Page tags: Placement Considerations
Did You Tag Everything?
Pageviews: A Visit or a Visitor?
Cookies Timeouts
Page-tag Code Hijacking
Data Sampling
PDF files: A Special Consideration
E-commerce: Negative Transactions
Filters and Settings: Potential Obstacles
Time Differences
Process Frequency: Understanding glitches
Goal Conversions versus Pageviews
Why PPC Vendor Numbers Do Not Match
Tracking URLs: Missing Paid Search Click-throughs
Slow Page Load Times
Clicks and Visits: Understanding the Difference
PPC Account Adjustments
Keyword Matching: Bid Term versus Search Term
Google AdWords Import Delay
Losing Tracking URLs Through Redirects
Data Misinterpretation
Why Counting Uniques Is Meaningless
Ten Recommendations For Enhancing Accuracy
Summary
Acknowledgements
In the past decade, the Internet has transformed marketing, but anyone expecting to increase their revenue and profitability using the web needs to get their facts straight with respect to web traffic. Of course, the web is a great medium to market and sell products and services. But if you don’t understand the behaviour of your web site visitors in sufficient detail, your business is going nowhere.

So it is no great surprise that the business of web analytics has grown in tandem with business use of the Internet. Put simply, web analytics are tools and methodologies used to enable organisations to track the number of people who view their site and then use this to measure the success of their online strategy.

The danger is, too many businesses take web analytics reports at face value and this raises the issue of accuracy. After all, it isn’t difficult to get the numbers.

However, the harsh truth is that web analytics data can never be 100 percent accurate, and even measuring the error bars is difficult.

With these types of metrics, marketers and webmasters can determine the direct impact of specific marketing campaigns. The level of detail is critical. For example, you can determine if an increase in pay-per-click advertising spend for a set of keywords on a single search engine increased the return on investment during that time period. So, as long as you can minimise inaccuracies, web analytics tools are effective for measuring visitor traffic to your online business. The remainder of this document examines, in detail, how inaccuracies arise and how organisations can counter them.

How Web Sites Collect Visitor Data

Page Tags and Logfiles

There are two common techniques for collecting web visitor data: page tags and logfiles.
Logfiles refer to data collected by your web server independently of a visitor’s browser: the web server logs its activity to a text file that is usually local. The analytics customer views reports from the local server, as shown in Figure 2. This technique, known as server-side data collection, captures all requests made to your web server, including pages, images, and PDFs, and is most frequently used by stand-alone licensed software vendors.

In the past, the easy availability of web server logfiles made this technique the one most frequently adopted for understanding the behaviour of visitors to your site. In fact, most Internet service providers (ISPs) supply a freeware log analyzer with their web-hosting accounts (Analog, Webalizer, and AWstats are some examples). Although this is probably the most common way people first come in contact with web analytics, such freeware tools are too basic when it comes to measuring visitor behaviour and are not considered further in this book.

In recent years, page tags have become more popular as the method for collecting visitor data. Not only is the implementation of page tags easier from a technical point of view, but data-management requirements are significantly reduced because the data is collected and processed by external SaaS servers (your vendor), saving website owners the expense and maintenance of running licensed software to capture, store, and archive information.

Note that both techniques, when considered in isolation, have their limitations. Table 1 summarizes the differences. A common myth is that page tags are technically superior to other methods, but as Table 1 shows, that depends on what you are looking at. By combining both techniques, however, the advantages of one counter the disadvantages of the other. This is known as a hybrid method and some vendors can provide this.

Table 1 – Page Tag versus logfile data collection

Page Tagging: Advantages
• Breaks through proxy and caching servers; provides more accurate session tracking.
• Tracks client-side events, e.g., JavaScript, Flash, Web 2.0 (Ajax).
• Captures client-side e-commerce data; server-side access can be problematic.
• Collects and processes visitor data in nearly real time.
• Allows the vendor to perform program updates for you.
• Allows the vendor to perform data storage and archiving for you.

Page Tagging: Disadvantages
• Setup errors lead to data loss: if you make a mistake with your tags, data is lost and you cannot go back and reanalyze.
• Firewalls can mangle or restrict tags.
• Cannot track bandwidth or completed downloads: tags are set when the page or file is requested, not when the download is complete.
• Cannot track search engine spiders; robots ignore page tags.

Logfile Analysis: Advantages
• Historical data can be reprocessed easily.
• No firewall issues to worry about.
• Can track bandwidth and completed downloads, and can differentiate between completed and partial downloads.
• Tracks search engine spiders and robots by default.
• Tracks legacy mobile visitors by default.

Logfile Analysis: Disadvantages
• Proxy and caching inaccuracies: if a page is cached, no record is logged on your web server.
• No event tracking, e.g., no JavaScript, Flash, Web 2.0 tracking (Ajax).
• Requires your own team to perform program updates.
• Requires your own team to perform data storage and archiving.
• Robots multiply visit counts.
Other Data-Collection Methods

Although logfile analysis and page tagging are by far the most widely used methods for collecting web visitor data, they are not the only methods. Network data-collection devices (packet sniffers) gather web traffic data from routers into black-box appliances. Another technique is to use a web server application programming interface (API) or loadable module (also known as a plug-in, though this is not strictly correct terminology). These are programs that extend the capabilities of the web server, for example by enhancing or extending the fields that are logged. Typically, the collected data is then streamed to a reporting server in real time.

Cookies in Web Analytics

Page tag solutions track visitors by using cookies. Cookies are small text messages that a web server transmits to a web browser so that it can keep track of the user’s activity on a specific website. The visitor’s browser stores the cookie information on the local hard drive as name–value pairs. Persistent cookies are those that are still available when the browser is closed and later reopened. Conversely, session cookies last only for the duration of a visitor’s session (visit) to your site.

For web analytics, the main purpose of cookies is to identify users for later use, most often with an anonymous visitor ID. Among many things, cookies can be used to determine how many first-time or repeat visitors a site has received, how many times a visitor returns each period, and how much time passes between visits. Web analytics aside, web servers can also use cookie information to present personalized web pages. A returning customer might see a different page than the one a first-time visitor would view, such as a “welcome back” message to give them a more individual experience or an auto-login for a returning subscriber.

The following are some cookie facts:

• Cookies are small text files (no larger than 4 KB), stored locally, that are associated with visited website domains.
• Cookie information can be viewed by users of your computer, using Notepad or a text editor application.
• There are two types of cookies: first party and third party.
• A first-party cookie is one created by the website domain. A visitor requests it directly by typing the URL into their browser or by following a link.
• A third-party cookie is one that operates in the background and is usually associated with advertisements or embedded content that is delivered by a third-party domain not directly requested by the visitor.
• For first-party cookies, only the website domain setting the cookie information can retrieve the data. This is a security feature built into all web browsers.
• For third-party cookies, the website domain setting the cookie can also list other domains allowed to view this information. The user is not involved in the transfer of third-party cookie information.
• Cookies are not malicious and can’t harm your computer. They can be deleted by the user at any time.
• A maximum of 50 cookies are allowed per domain for the latest versions of IE8 and Firefox 3. Other browsers may vary (Opera 9 currently has a limit of 30; Safari and Google Chrome have no limit on the number of cookies per domain).
Web Analytics Accuracy Page 6 of 20
! Brian Clifton
Understanding Web Analytics Accuracy
When it comes to benchmarking the performance of your website, web analytics is critical. However, this information is accurate only if you avoid common errors associated with collecting the data, especially comparing numbers from different sources. Unfortunately, too many businesses take web analytics reports at face value. After all, it isn’t difficult to get the numbers. The harsh truth is that web analytics data can never be 100 percent accurate, and even measuring the error bars can be difficult.

So what’s the point?

Despite the pitfalls, error bars remain relatively constant on a weekly, or even a monthly, basis. Even comparing year-by-year behaviour can be safe as long as there are no dramatic changes in technology or end-user behaviour. As long as you use the same yardstick, visitor number trends will be accurate. For example, web analytics data may reveal patterns like the following:

• Thirty percent of site traffic came from search engines.
• Fifteen percent of site revenue was generated by product page x.html.
• We increased subscription conversions from our email campaigns by 20 percent last week.
• Bounce rate decreased 10 percent for our category pages during March.

With these types of metrics, marketers and webmasters can determine the direct impact of specific marketing campaigns. The level of detail is critical. For example, you can determine if an increase in pay-per-click advertising spending (for a set of keywords on a single search engine) increased the return on investment during that time period. As long as you can minimize

Conflicting Data Points Are Common

A UK survey of 800 organizations revealed that almost two-thirds (63 percent) of respondents say they experience conflicting information from different sources of online measurement data (“Online Measurement and Strategy Report 2009,”, June 2009).

Next, I’ll discuss in detail why such inaccuracies arise, so you can put this information into perspective. The aim is for you to arrive at an acceptable level of accuracy with respect to your analytics data. Recall from Table 1 that there are two main methods for collecting web visitor data (logfiles and page tags) and both have limitations.

Issues Affecting Visitor Data Accuracy for Logfiles

Logfile tracking is usually set up by default on web servers. Perhaps because of this, system administrators rarely consider any further implications when it comes to tracking.

Dynamically Assigned IP Addresses

Generally, a logfile solution tracks visitor sessions by attributing all hits from the same IP address and web browser signature to one person. This becomes a problem when ISPs assign different IP addresses throughout the session. A U.S.-based comScore study (
s/2007/Cookie_Deletion_Whitepaper) showed that a typical home PC averages 10.5 different IP addresses per month. Those visits will be counted as 10 unique visitors by a logfile analyzer. This issue is becoming more severe, because most web users have identical web browser signatures (currently Internet Explorer). As a result, visitor numbers are often vastly overcounted. This limitation can be overcome with the use of cookies.

Client-Side Cached Pages

Client-side caching means a previously visited page is stored on a visitor’s computer. In this case, visiting the same page again results in that page being served locally from the visitor’s computer, and therefore the visit is not recorded at the web server.

Server-side caching can come from any web accelerator technology that caches a copy of a website and serves it from its own servers to speed up delivery. This means that all subsequent site requests come from the cache and not from the site itself, leading to a loss in tracking. Today, most of the Web is in some way cached to improve performance. For example, see Wikipedia’s cache description at

Counting Robots

Robots, also known as spiders or web crawlers, are most often used by search engines to fetch and index pages. However, other robots exist that check server performance (uptime, download speed, and so on) as well as those used for page scraping, including price comparison, e-mail harvesters, competitive research, and so on. These affect web analytics because a logfile solution will also show all data for robot activity on your website, even though robots are not real visitors.

unnamed robots exist. For this reason, a logfile analyzer solution is likely to overcount visitor numbers, and in most cases this can be dramatic.

Issues Affecting Visitor Data From Page Tags

Deploying a page tag on every single page is a process that can be automated in many cases. However, for larger sites 100 percent correct deployment is rarely achieved. Perhaps it is because the page tag is hidden to the human eye or there is so much other data available that those errors often go unnoticed for long periods. Having a full deployment is crucial to the accuracy and validity of data collected by this method.

Setup Errors Causing Missed Tags

The most frequent error by far observed for page tagging solutions comes from setup. Unlike web servers, which are configured to log everything delivered by default, a page tag solution requires the webmaster to add the tracking code to each page. Even with an automated content management system, pages can and do get missed.

In fact, evidence from analysts at Maxamine (now part of Accenture Marketing Sciences), who used their automatic page auditing tool, has shown that some sites claiming that all pages are tagged can actually have as many as 20 percent of pages missing the page tag, something the webmaster was completely unaware of. In one case, a corporate business-to-business site was found to have 70 percent of its pages missing tags. Missing tags equals no data for those pageviews.
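A tag-completeness check of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker string `tracker.example.com/tag.js`, the page paths, and the HTML snippets are all hypothetical, and a real audit would fetch pages with a crawler rather than use a hand-built dictionary.

```python
def pages_missing_tag(pages, marker):
    """Return the sorted URLs whose HTML does not contain the tag marker."""
    return sorted(url for url, html in pages.items() if marker not in html)


def tag_coverage(pages, marker):
    """Return the share of pages (0.0 to 1.0) that contain the tag marker."""
    if not pages:
        return 0.0
    missing = pages_missing_tag(pages, marker)
    return (len(pages) - len(missing)) / len(pages)


# Hypothetical marker: the script URL that every tagged page should reference.
TAG_MARKER = "tracker.example.com/tag.js"

# Hypothetical crawl result: page path -> HTML source.
crawled_pages = {
    "/index.html": '<script src="//tracker.example.com/tag.js"></script>',
    "/products.html": '<script src="//tracker.example.com/tag.js"></script>',
    "/pricing.html": "<p>No tracking snippet on this page.</p>",
    "/contact.html": '<script src="//tracker.example.com/tag.js"></script>',
    "/about.html": '<script src="//tracker.example.com/tag.js"></script>',
}

missing = pages_missing_tag(crawled_pages, TAG_MARKER)
coverage = tag_coverage(crawled_pages, TAG_MARKER)
print("Pages missing the tag:", missing)
print("Tag coverage: {:.0%}".format(coverage))
```

A substring match only shows that the snippet is present, not that it executes, so a check like this finds missing tags but not broken ones.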
JavaScript Errors Halt Page Loading

Page tags work well, provided that JavaScript is enabled on the visitor’s browser. Fortunately, only about 1 to 3 percent of Internet users have disabled JavaScript on their browsers, as shown in Figure 3. However, the inconsistent use of JavaScript code on web pages can cause a bigger problem: any errors in other JavaScript on the page will immediately halt the browser scripting engine at that point, so a page tag placed below it will not execute.

Corporate and personal firewalls can prevent page tag solutions from sending data to collecting servers. In addition, firewalls can also be set up to reject or delete cookies automatically. Once again, the effect on visitor data can be significant. Some web analytics vendors can revert to using the visitor’s IP address for tracking in these instances, but mixing methods is not recommended. As discussed previously in “Issues Affecting Visitor Data Accuracy for Logfiles” (comScore report), using visitor IP addresses is far less accurate than simply not counting such visitors. It is therefore better to be consistent with the processing of data.

Logfiles “See” Mobile Users

A mobile web audience study by comScore back in January 2007 ( showed that in the United States, 30 million (or 19%) of the 159 million U.S. Internet users accessed the Internet from a mobile device. At that time, the vast majority of mobile phones did not understand JavaScript or cookies, and hence only logfile tools were able to track visitors who browsed using their mobile phones.

Visitors Rejecting or Deleting Cookies

Cookie information is vital for web analytics because it identifies visitors, their referring source, and subsequent pageview data. The
current best practice is for vendors to process first-party cookies only. This is because visitors often view third-party cookies as infringing on their privacy, opaquely transferring their information to third parties without explicit consent. Therefore, many anti-spyware programs and firewalls exist to block third-party cookies automatically. It is also easy to do this within the browser itself. By contrast, anecdotal evidence shows that first-party cookies are accepted by more than 95 percent of visitors.

Visitors are also becoming savvier and often delete cookies. Independent surveys conducted by Belden Associates (2004), JupiterResearch (2005), Nielsen//NetRatings (2005) and comScore (2007) concluded that cookies are deleted by at least 30 percent of Internet users in a month.

Users Owning and Sharing Multiple Computers

User behaviour has a dramatic effect on the accuracy of information gathered through cookies. Consider the following scenarios:

Same user, multiple computers
• Today, people access the Internet in any number of ways: from work, home, or public places such as Internet cafes. One person working from three different machines results in three cookie settings, and all current web analytics solutions will count each of these anonymous user sessions as unique.

Different users, same computer
• People share their computers all the time, particularly with their families, and, as a result, cookies are shared too (unless you log off or switch off your computer each time it is used by a different person). In some instances, cookies are deleted deliberately. For example, Internet cafes are set up to do this automatically at the end of each session. So even if a visitor uses that cafe regularly and works from the same machine, a web analytics solution will ‘see’ them as a different and new visitor every time.

Correcting Data for Cookie Deletion and Rejection

Calculating a correction factor to account for your visitors either deleting or rejecting your web analytics cookies is quite straightforward. All you need is a website that requires a user login. That way you can count the number of unique login IDs and divide it by the number of unique users your web analytics tool reports. The result is a correction factor that can be applied to subsequent data (number of unique visitors, number of new visitors, or number of returning visitors).

Having a website that requires a user login is, thankfully in my view, quite rare, because people wish to access information freely and as easily as possible. So, although the correction-factor calculation is straightforward, you most probably don’t have any login data to process. Fortunately, a small number of websites can calculate a correction factor to shed light on this issue. These include online banks and popular brands such as Amazon, FedEx, and social network sites, where there is a real user benefit to both having an account and (most importantly) using it when visiting the site.

A specific example is Sun Microsystems Forums (, a global community of developers with nearly 1 million contributors. A 2009 study by Paul Strupp and Garrett Clark, published at, reveals some interesting data.

When using third-party cookies:

• 78% is the correction factor for monthly unique users.
• 20% of users delete (more correctly defined as lose) their measurement cookie at least once per month.
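The correction-factor arithmetic described above (unique login IDs divided by the unique users the analytics tool reports) can be written out as a short Python sketch. The function names and all input numbers below are hypothetical; they are chosen only so that the factor comes out at 78 percent, the monthly figure quoted for third-party cookies.

```python
def correction_factor(unique_login_ids, reported_unique_users):
    """Unique login IDs divided by the unique users the analytics tool reports."""
    if reported_unique_users <= 0:
        raise ValueError("reported_unique_users must be positive")
    return unique_login_ids / reported_unique_users


def corrected_count(reported_count, factor):
    """Apply the correction factor to a later cookie-based count."""
    return round(reported_count * factor)


# Hypothetical calibration month on a login-only site.
factor = correction_factor(unique_login_ids=7800, reported_unique_users=10000)

# Hypothetical later month: the tool reports 25,000 unique visitors.
estimate = corrected_count(25000, factor)

print("Correction factor: {:.0%}".format(factor))
print("Estimated real unique visitors:", estimate)
```

A factor below 1 means the tool over-reports uniques, which is what cookie deletion and multiple devices would be expected to cause. The factor is only as representative as the logged-in audience it was measured on.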
Latency Leaves Room for Inaccuracy

The time it takes for a visitor to be converted into a customer (latency) can have a significant effect on accuracy. For example,

Another issue to consider is how your offline marketing is tracked. Without taking this into account, visitors who result from your offline campaign efforts will be incorrectly assigned or grouped with other referral sources and therefore skew your data.

For page tag solutions, it is not the completed PDF download that is reported, but the fact that a visitor has clicked on a PDF file link.

Data can vary when a filter is set up in one vendor’s solution but not in another. Some tools can’t set up the exact same filter as another tool, or they apply filters in a different way or at a different point during data processing.

Consider, for example, a page-level filter to exclude all error pages from your reports. Visit metrics such as time on site and page depth may or may not be adjusted for the filter depending on the vendor. This is because some vendors treat page-level metrics separately from visitor-level metrics.

Time Differences

A predicament for any vendor when it comes to calculating the time on site or time on page for a visitor’s session involves how to calculate for the last page viewed. For example, time spent on pageA is calculated by taking the difference between the visitor’s timestamp for pageA and the subsequent timestamp for pageB, and so on. But what if there is no pageC? How can the time on page be calculated for pageB if there is no following timestamp?

Different vendors handle this in different ways. Some ignore the final pageview in the calculation; others use an onUnload event to add a timestamp should the visitor close their browser or go to a different website. Both are valid methods, although not every vendor uses the onUnload method. The reason some vendors prefer to ignore the last page is that it is considered the most inaccurate from a time point of view: perhaps the visitor was interrupted to run an errand or left their browser in its current state while working on something else. Many users behave in this way; that is, they complete their browsing task and simply leave their browser open on the last page while working in another application. A small number of pageviews of this type will disproportionately skew the time-on-site and time-on-page calculations; hence, most vendors avoid this issue.

Note: Google Analytics ignores the last pageview of a visitor’s session when calculating the time-on-site and time-on-page metrics.

Process Frequency: Understanding Glitches

The frequency of processing is best illustrated by example: Google Analytics does its number crunching to produce reports hourly. However, because it takes time to collate all the logfiles from all of the data-collecting servers around the world, reports are three to four hours behind the current time. In most cases, it is a smooth process, but sometimes things go wrong. For example, if a logfile transfer is interrupted, then only a partial logfile is processed. Because of this, Google collects and reprocesses all data for a 24-hour period at the day’s end. Other vendors may do the same, so it is important not to focus on discrepancies that arise on the current day.

Goal Conversions versus Pageviews

Using Figure 4 as an example, assume that five pages are part of your defined funnel (click-stream path), with the last step (page 5) being the goal conversion (purchase). During checkout, a visitor goes back up a page to check a delivery charge (step A) and then continues through to complete payment. The visitor is so happy with the simplicity of the entire process that she then purchases a second item using exactly the same path during the same visitor session (step B).

Depending on the vendor you use, this process can be counted in various ways, as follows:

• Twelve funnel page views, two conversions, two transactions
• Ten funnel page views (ignoring step A), two conversions, two transactions
• Five funnel page views, two conversions, two transactions
• Five funnel page views, one conversion (ignoring step B), two transactions

Most vendors, but not all, apply the last rationale to their reports. That is, the visitor has become a purchaser (one conversion); and this can happen only once in the session, so additional conversions (assuming the same goal) are ignored. For this to be valid, the same rationale must be applied to the funnel pages. In this way, the data becomes more visitor-centric.

Note: In the above example, the total number of pageviews is 12 and should be reported as such in all pageview reports. It is the funnel and goal conversion reports that will be different.
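To make the difference between the first and last counting rationale concrete, the sketch below replays one possible version of the session in Python. The page names and the exact point at which the visitor steps back (from page 3 to page 2) are assumptions, since Figure 4 is not reproduced here; only the totals of 12 pageviews, five funnel pages, and two purchases come from the example.

```python
FUNNEL = ["page1", "page2", "page3", "page4", "page5"]
GOAL = "page5"

# Hypothetical click-stream for one session:
# first purchase with one step back (step A), then a second purchase (step B).
session = [
    "page1", "page2", "page3", "page2", "page3", "page4", "page5",
    "page1", "page2", "page3", "page4", "page5",
]


def count_every_hit(pageviews):
    """Count every funnel pageview and every goal hit in the session."""
    funnel_views = sum(1 for page in pageviews if page in FUNNEL)
    conversions = pageviews.count(GOAL)
    return funnel_views, conversions


def count_visitor_centric(pageviews):
    """Count each funnel page and the goal at most once per session."""
    funnel_views = len(set(pageviews) & set(FUNNEL))
    conversions = 1 if GOAL in pageviews else 0
    return funnel_views, conversions


print("Every hit:       ", count_every_hit(session))
print("Visitor-centric: ", count_visitor_centric(session))
```

Both functions read the same 12 pageviews; only the funnel and conversion totals differ, which is the point made in the note above.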
long page load is only two seconds (see

Clicks and Visits: Understanding the Difference

Remember that PPC vendors, such as Google AdWords, measure clicks. Most web analytics tools measure visitors who can accept a cookie. Those are not always going to be the same thing when you consider the effects on your web analytics data of cookie blocking, JavaScript errors, and visitors who simply navigate away from your landing page quickly, before the page tag collects its data. Because of this, web analytics tools tend to slightly underreport visits from PPC networks.

PPC Account Adjustments

Google AdWords and other PPC vendors automatically monitor invalid and fraudulent clicks and adjust PPC metrics retroactively. For example, a visitor may click your ad several times (inadvertently or on purpose) within a short space of time. Google AdWords automatically investigates this influx and removes the additional click-throughs and charges from your account. However, web analytics tools have no access to these systems and so record all PPC visitors. For further information on how Google treats invalid clicks, see:

Keyword Matching: Bid Term versus Search Term

The bid terms you select within your PPC account and the search terms used by visitors that result in your PPC ad being displayed can often be different: think ‘broad match’. For example, you may have set up an ad group that targets the word ‘shoes’ and solely relies on broad match to match all search terms that contain the word ‘shoes’. This is your bid term. A visitor uses the search term ‘blue shoes’ and clicks on your ad. Web analytics vendors may report the search term, the bid term or both.

Google AdWords Import Delay

Within your AdWords account, you’ll see that data is updated hourly. This is because advertisers need this information to control budgets. Google Analytics imports AdWords cost data once a day. This is for the data range minus 48 to 24 hours from 23:59 the previous day (so AdWords cost data is always at least 24 hours old).

Why the delay? Because it allows time for the AdWords invalid-click and fraud-protection algorithms to complete their work and finalize click-through numbers for your account. Therefore, from a reporting point of view, the recommendation is to not compare AdWords visitor numbers for the current day. This recommendation holds true for all web analytics solutions and all PPC advertising networks.

Note: Although most of the AdWords invalid click updates take place within hours, final adjustments may take longer. For this reason, even if all other factors are eliminated, AdWords numbers and web analytics reports may never match exactly.

Losing Tracking URLs Through Redirects

Using third-party ad-tracking systems (such as Adform, Atlas Search, Blue Streak, DoubleClick, Efficient Frontier, and SEM Director) to track click-throughs to your website means your visitors are passed through redirection URLs. This results in the initial click being registered by your ad company, which then automatically redirects the visitor to your actual landing page. The purpose of this two-step hop is to allow the ad-tracking network to collect visitor statistics independently of your organization, typically for billing purposes. Because this process involves a short delay, it may prevent some visitors from landing on your page. The result can be a small loss of data and therefore failure to align data.
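One way to check whether a redirect hop preserves campaign tracking is to compare the query string of the tagged ad URL with that of the URL the visitor finally lands on. The Python sketch below does only the comparison; it makes no network requests. The parameter names `utm_source`, `utm_medium`, and `utm_campaign` and both URLs are assumptions for illustration, so substitute whatever tracking parameters your own tool expects.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed tracking parameters; replace with the ones your analytics tool reads.
REQUIRED_PARAMS = ("utm_source", "utm_medium", "utm_campaign")


def lost_tracking_params(tagged_url, final_url, required=REQUIRED_PARAMS):
    """Return the required parameters sent in tagged_url that are missing
    or changed in final_url."""
    sent = parse_qs(urlsplit(tagged_url).query)
    kept = parse_qs(urlsplit(final_url).query)
    return [name for name in required
            if name in sent and sent[name] != kept.get(name)]


# Hypothetical URLs: what the ad links to, and where the visitor ends up.
tagged = ("https://www.example.com/landing"
          "?utm_source=newsletter&utm_medium=email&utm_campaign=spring")
landed = "https://www.example.com/landing/?utm_source=newsletter"

print("Lost parameters:", lost_tracking_params(tagged, landed))
```

In practice the final URL would be captured by following the ad link in a browser or HTTP client; an empty result means the parameters survived the redirect.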
However, the biggest issue for counting uniques is how many devices people use to access the Web. For example, consider the following scenario:

• You and your spouse are considering your next vacation. Your spouse first checks out possible locations on your joint PC at home and saves a list of website links.

• The next evening you use the same PC to review these links. Unable to decide that night, you email the list to your office, and the next day you continue your vacation checks during your lunch hour at work and also review these again on your mobile while commuting home on the train.

• Day 3 of your search resumes at your friend’s house, where you seek a second opinion. Finally, you go home and book online using your shared PC.

This scenario is actually very common, particularly if the value of the purchase is significant, which implies a longer consideration

Ten Recommendations For Enhancing Accuracy

1. Be sure to select a tool that uses first-party cookies for data collection.
2. Don’t confuse visitor identifiers. For example, if first-party cookies are deleted, do not resort to using IP address information. It is better simply to ignore that visitor.
3. Remove or report separately all non-human activity from your data reports, such as robots and server-performance monitors.
4. Track everything. Don’t limit tracking to landing pages. Track your entire website’s activity, including file downloads, internal search terms, and outbound links.
5. Regularly audit your website for page tag completeness (at least monthly for large websites). Sometimes site content changes result in tags being corrupted, deleted, or simply forgotten.
6. Display a clear and easy-to-read privacy policy (required by law in the European Union). This establishes trust with your visitors because they better understand how they’re being tracked and are less likely to delete cookies.
7. Avoid making judgments on data that is less than 24 hours old, because it’s often the most inaccurate.
8. Test redirection URLs to guarantee that they maintain tracking parameters.
9. Ensure that all paid online campaigns use tracking URLs to differentiate from non-paid sources.
10. Use visit metrics in preference to unique visitor metrics because the latter are highly inaccurate.

These suggestions will help you appreciate the errors often made when collecting web analytics data. Understanding what these errors are, how they happen, and how to avoid them will enable you to benchmark the performance of your website. Achieving this means you’re in a better position to then drive the performance of your online business.

Summary

So, web analytics is not 100 percent accurate and the number of possible inaccuracies can at first appear overwhelming. However, get comfortable with your implementation and focus on measuring trends rather than precise numbers. For example, web analytics can help you answer the following questions:

If the trend shows a 10.5% reduction, for example, this figure should be accurate, regardless of the web analytics tool that was used. These examples are all high-level metrics, though the same accuracy can also be maintained as you drill down and look at, for example, which specific referrals (search engines, affiliates, social networks), campaigns (paid search, email, banners), keywords, geographies, or devices (PC, Mac, mobile) are used.

When all the possibilities of inaccuracy that affect web analytics solutions are considered, it is apparent that it is ineffective to focus on absolute values or to merge numbers from different sources. If all web visitors were to have a login account in order to view your website, this issue could be overcome. In the real world, however, the vast majority of Internet users wish to remain anonymous, so this is not a viable solution.

As long as you use the same measurement for comparing data ranges, your results will be accurate. This is the universal truth of all web analytics.

With thanks to the following people for their generous feedback in compiling this whitepaper: Sara Andersson, Nick Mihailovski, Alex Ortiz-Rasado, Tomas Remotigue.
Brian has been involved in web design and SEO since as far back as 1997, when he built his first website and started defining best practice to advise clients. From 2005 to 2008 he was Head of Web Analytics for Google EMEA, defining the adoption strategy and building a team of pan-European product specialists from scratch. A legacy of his work is the online learning centre for the Google Analytics Individual Qualification (GAIQ).
Advanced Web Metrics with Google Analytics is available from: Amazon (including Kindle), Barnes & Noble and directly from Wiley. A PDF ebook is also available
Add your comments on the blog - Measuring Success
Follow my interests and thoughts @BrianClifton
Join your peers on the LinkedIn Group
Carlos Gonzalo-Penela
Pompeu Fabra University
Department of Communication
Roc Boronat, 138. 08018 Barcelona, Spain
In recent years, a number of digital news media outlets have begun to include paid links in their content. This study seeks
to identify and analyse this content whose sole purpose is to improve the website authority of the advertisers and their
search engine rankings. To do so, it employs two basic methodologies: first, it undertakes a systematic review of off-page
SEO practices, the digital press and native advertising; and, second, it reports a case study based on the identification
and analysis of 150 news items that contain specially commissioned links resulting from a commercial transaction. The
study provides evidence of a new revenue stream for the digital news media, one that is not clearly disclosed and which
is based on the sale of links. The article includes a discussion of the case study findings, and presents future guidelines
for the use of paid links based on the emerging concept of ‘native advertising’.
Digital news media; Online journalism; Digital journalism; SEO; Off-page SEO; Web positioning; Link building; Native
advertising; Journalism ethics.
2. Off-page SEO
It is worth recalling that Google was the first search engine to apply a technique based on hyperlink (i.e. the links between web pages) analysis to determine the relative importance of all pages on the World Wide Web.
For analyses of this type, the inventors of Google based their work on citation analysis in the academic world and its
corresponding impact factor. In this way, they designed a metric –PageRank- that serves to express the results of such
an analysis (Brin; Page, 2000).
Given its enormous efficacy, Google has had an enduring influence on the way in which search engines display their results pages, with all of them adopting the same basic idea (Kleinberg, 1998; Lewandowski, 2012; Giomelakis; Veglis, 2015). The reason for its widespread adoption is that it provided the first genuinely efficient response to all the challenges posed by Internet searches (Gonzalo-Penela; Codina; Rovira, 2015), although initially no firm in the search engine sector seemed to realise this.
More specifically, the new idea developed by Google was the following: instead of calculating the relevance of each page
exclusively in terms of its intrinsic characteristics –including, for example, the number of times the keyword appears-, it
also took into account its extrinsic characteristics, most notably, the number and quality of links it receives (Harry, 2013).
What was the underpinning rationale? In broad terms, given two pages addressing the same theme, the more important
of the two is considered to be the one that receives the greater number of backlinks from websites which, in turn, are
highly linked (Brin; Page, 2000; Thelwall, 2004; Gonzalo-Penela, 2006).
Here, the key point is that part of a page’s PageRank can be transferred to the pages it links to. PageRank
is also a measure of a page’s authority in the same sense that a journal’s impact factor is a measure of its authority.
In this way, the net effect of these links (known interchangeably as backlinks, inbound links or external links) is to
transfer authority from the linking page to the linked page, improving the latter’s visibility in the search engines
(Crowe, 2017; Giomelakis; Veglis, 2016).
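The transfer of authority through links can be illustrated with a minimal sketch of a simplified PageRank iteration. This is not Google's production algorithm; the damping value of 0.85, the handling of pages without outgoing links, and the page names in the example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its transferable rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "shop" is linked from a well-linked newspaper,
# "obscure" is linked from a blog that nobody links to.
graph = {
    "newspaper": ["shop"],
    "blog_a": ["newspaper"],
    "blog_b": ["newspaper"],
    "blog_c": ["obscure"],
    "shop": [],
    "obscure": [],
}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph both "shop" and "obscure" receive exactly one backlink, but "shop" ends up with the higher score because its backlink comes from a page that is itself highly linked.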
Consequently, the number and quality of the links that point to a website are an indicator of its essential relevance,
as well as being one of the most influential positioning factors (Fishkin, 2016; García-Carretero et al., 2016). It is not
surprising, given these circumstances, that firms’ SEO managers seek to implement link building strategies
(Gonzalo-Penela, 2006; Serrano-Cobos, 2015). This, in turn, has led to two major branches of SEO:
- On-page SEO: actions to optimize web page content.
- Off-page SEO: actions to obtain backlinks, that is, link building.
Several link building procedures have been developed (Monterde, 2016; Publisuites, 2018), among which two stand out:
- Natural or editorial link building: this is based on a similar logic to that of the impact factor of academic articles,
whereby a high quality article is one that will be highly cited, thus establishing itself as an article of great authority.
In the case of the web, this type of link building is achieved by creating high quality content.
- Strategic link building: this is a proactive practice that requires direct contact between the website manager and the
author of another site from which a link is requested. If it is performed on a massive scale, Google, Bing, Yahoo, Yandex,
etc. are able to identify patterns of unnatural links and, if they do, penalize those websites by pushing them down the
search result rankings, or even excluding them from their indexes.
The main goals, therefore, of off-page SEO professionals are (Cámaras-León, 2018; Rowe, 2018):
- to search for and obtain a large number of backlinks;
- to multiply the strength of backlinks by ensuring that the sites from which the links originate are in turn highly linked.
There exist various websites where it is possible to obtain free backlinks. Primarily they can be obtained from
web profiles, forums, social networks, blogs 2.0, comments on websites/blogs, wikis, content aggregators,
directories, newspapers, third-party websites, etc., and of course from other websites (Cooper, 2012).
From a technical point of view, but with far-reaching implications for the matter in hand, there are two types of backlink:
dofollow links (also known as follow), and nofollow links (Dean, 2018).
Both types of link are identified by means of the corresponding labelling of the source code (not visible on the page).
They can be explained as follows:
- dofollow links fulfil the original function of hyperlinks, that is, they link related themes. Due to their editorial nature,
Google considers them a way of transferring authority to the linked website, and the amount of authority or of PageRank
transferred depends on the quality or authority of the page that creates the link. Dofollow means that Google will follow
the link and attribute PageRank to the page that receives it. In theory, dofollow links are limited to editorial links. Dofollow
links carry no special marking. In other words, a standard link, without any additional marking, is a dofollow link.
- nofollow links, on the other hand, include a source code tag that tells search engines that this link cannot be used for
PageRank. It is a code that instructs search engine robots not to follow the link (hence its name). Since they correspond
to advertising links, the transmission of authority in this case is zero.
2.1. Anchor text
The links are made up not only of the corresponding URL, but also of a text known as the anchor text
(González-Villa, 2017). This is the portion of the text that activates the link on the web page from which it originates.
For Google, the anchor text forms part of the content of the linked site, and it is used to determine whether that site is
relevant for the keyword contained in the anchor text (Figure 1).
In short, we should stress the following: the authority of the site from which a link originates, the link’s anchor text and
the context in which that link is included are the most important elements of link building.
Figures 1 and 2 illustrate the main concepts associated with links, as presented above.
Figure 1 shows the structure of a link using the source code. It can be seen that:
- the link’s destination, that is, the page that will open in the browser if the user clicks on it is
- the anchor text is Unesco.
This is a dofollow link because it does not include any additional coding (see Figure 2). For this reason, this link transfers
PageRank or authority to the Unesco page. If the page containing this link belongs to a leading digital newspaper, such
as The New York Times, the authority transferred will be very high. Moreover, Google will understand that the Unesco
keyword in the anchor text is part of the content of the destination page.
Figure 2 shows the structure of a nofollow link, since it incorporates the ‘rel’ attribute, with the nofollow value. Due to
this attribute, the link does not transfer authority to the (fictitious) destination page Store. In this case, the authority of
the page containing the link is of no significance. Moreover, because of this attribute, Google will not follow the link and
will not transfer any value.
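The distinction between the two link types can be checked programmatically. The following is a minimal sketch, assuming only Python's standard html.parser module; the sample markup and its example.org and example.com URLs are hypothetical stand-ins and do not reproduce the links shown in Figures 1 and 2.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Collect every <a> element with its URL, anchor text and nofollow status."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            # The rel attribute may hold several space-separated values.
            rel_values = (attributes.get("rel") or "").lower().split()
            self._current = {
                "href": attributes.get("href"),
                "anchor_text": "",
                "nofollow": "nofollow" in rel_values,
            }

    def handle_data(self, data):
        # Text found between <a> and </a> is the anchor text.
        if self._current is not None:
            self._current["anchor_text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["anchor_text"] = self._current["anchor_text"].strip()
            self.links.append(self._current)
            self._current = None


def classify_links(html):
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: one standard (dofollow) link and one nofollow link.
sample = (
    '<p>See <a href="https://example.org/unesco">Unesco</a> and '
    '<a href="https://example.com/store" rel="nofollow">Store</a>.</p>'
)
links = classify_links(sample)
for link in links:
    kind = "nofollow" if link["nofollow"] else "dofollow"
    print(f'{link["anchor_text"]} -> {link["href"]} ({kind})')
```

A link is reported as dofollow simply because no nofollow value is present in its rel attribute, which mirrors the point made above that a standard link needs no additional marking.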
Table 1. PrensaRank
Name: PrensaRank
Description: Its website claims they have 3,660 customers, 408 newspapers from which they can obtain links, and (as of February 2018) they had sold 30,024 articles to the news media.
Digital news media: On registration the user obtains a link purchase interface. We identify 305 newspapers, distributed geographically as follows: Andorra: 1 newspaper; Saudi Arabia: 1; Argentina: 9; Chile: 4; Spain: 236; USA: 2; Mexico: 45; Nicaragua: 1; Peru: 1; Portugal: 1; United Kingdom: 2; Venezuela: 2.
Themes: Current affairs; Love, weddings, relationships, and couples; Betting and casinos; Art, decoration and design; Film and television; Cooking and gastronomy; Dating; Sports; Economics and politics; Education and culture; Company (advertising); Home, decoration and DIY; Humour and leisure; Computers and technology; Games and video consoles; Marketing and SEO; Pets and nature; Music and shows; Fashion and beauty; Cars; Women, babies, and children; Others; Religion, mysticism and esotericism; Health; Estate agency services; Sex shops; Sexuality; Tarot; Travel, hotels and tourism.
Table 2. Unancor
Name: Unancor
Description: Its website claims (as of February 2018) they have 6,000 customers and 500 newspapers from which they can obtain links.
Digital media: On registration the user obtains a link purchase interface. We identify 431 newspapers, distributed geographically as follows: Germany: 53 newspapers; Argentina: 39; Canada: 1; Chile: 13; Colombia: 6; Costa Rica: 1; El Salvador: 1; Arab Emirates: 1; Spain: 213; USA: 18; France: 17; Italy: 1; Morocco: 1; Mexico: 44; Monaco: 1; Nicaragua: 1; Panama: 1; Peru: 2; Uruguay: 3; Venezuela: 5.
Themes: All these newspapers are associated with one or more of the following topics: Art and culture; Health and sport; Economics and business; Education; Home, decoration and DIY; Cooking and recipes, gastronomy; Computers, technology, mobiles and apps; Marketing (offline and online); Nature (animals and plants); Cars and motorcycles; Cinema, TV and music; News and politics; Travel and tourism; Others; Fashion and beauty; Erotica; Love, relationships, couples; Services (locksmiths, home improvements, plumbers, etc.); Legal; Children; Tarot.
Table 3. Publisuites
Name: Publisuites
Description: Its website claims they have 54,967 users and 478 newspapers from which they can obtain links. As of February 2018, they had sold 39,334 articles to the news media and blogs.
Digital media: On registration the user obtains a link purchase interface. We identify 478 newspapers, distributed geographically as follows: Argentina: 19 newspapers; Australia: 1; Bolivia: 1; Brazil: 4; Chile: 6; Colombia: 4; El Salvador: 1; Spain: 304; USA: 3; France: 26; Honduras: 1; Italy: 70; Jersey: 1; Mexico: 14; Nicaragua: 1; New Zealand: 1; Panama: 1; Paraguay: 1; Peru: 6; Portugal: 3; United Kingdom: 1; Dominican Republic: 1; Senegal: 1; South Africa: 1; Venezuela: 6.
Themes: All these newspapers are associated with one or more of the following topics: Betting, casinos and lotteries; Celebrities; Cooking, recipes and gastronomy; Trivia; Sports; Economy; Education and training; Entrepreneurs and SMEs; Computers and programming; Literature and culture; Music and radio; Marketing, SEO and social platforms; Miscellaneous; Fashion and accessories; Cars and motorcycles; Nature and ecology; News; Leisure and free time; Politics; Health; Technology; Mobile telephones and apps; Travel and tourism.
Table 4. RT Gopress
Name: RT Gopress
Description: Its website claims it is the most economically competitive SEO MarketPlace, Social Media and Growth Hacking firm in the market. They do not indicate how many newspapers or customers they have.
Digital media: On registration the user obtains a link purchase interface. We identify 155 newspapers distributed geographically as follows: Argentina: 3; Mexico: 26; Spain: 126.
Themes: All these newspapers are associated with one or more of the following topics: Current affairs; Stock market; Sports; Economics; Gastronomy; Marketing; Cars; Tourism; News; Technology; Health; Video games.
Maximum price for link: The website operates a price filter, but it appears not to be operative.
Table 5. Dofollow
Description: While offering a similar service to the above firms, it operates differently. Thus, they offer what they call the dofollow pack. The customer writes a press release including two links to its website (one for a brand and the other for a keyword) and they undertake to publish the press release in four digital newspapers.
Digital media: Its website includes general, regional, and specialized newspapers of all types. They state that these media may vary depending on availability.
Themes: Unknown.
Maximum price for link: € 339 for its most complete package.
4. Native advertising
The general absence of studies examining the link buying/selling sector in the digital news media and, hence, the
development of any guidelines for its self-regulation, leads us here to consider the possibility of applying best practices
in the so-called native advertising industry.
The Native Advertising Institute defines native advertising as the use of paid ads that match the look, feel and function
of the content of the platform in which they appear (Schauster; Ferrucci; Neill, 2016; Pollitt, 2018).
Native advertising consists of news items, reports and, in general, quality content, its effectiveness being based on
credibility. These characteristics can be used to provide quality content to publications (Sweetser et al., 2016;
Carlson, 2016). An essential point is that native advertising must present a branded message that allows readers to
recognize not only the fact that it is sponsored content (Ferrer-Conill, 2016; Amazeen; Muddiman, 2017; Amazeen;
Wojdynski, 2018), but also the logical intent of the advertisement to persuade and sell (Mathiasen, 2018).
As such, the idea is that the digital press should have a model for including sponsored content that allows it to be
differentiated from their editorial content, and which, moreover, ensures it can be integrated naturally in the publication,
maintaining a level of quality similar to that of the platform that hosts it (Cramer, 2016; Li, 2017; Batsell, 2018).
5. Case study
Having identified the key components and actors operating in the link buying/selling industry, we present our case study,
which consists of a comparative analysis. We examined 150 news items that have been published as a direct result of
the buying/selling of links. As such, we are dealing with content specially commissioned with the aim of including links
to improve the website authority of the customers who purchase them.
To shed greater light on this procedure, we first explain how the whole process works. First, the customer contacts one
of the link building firms described in the section above to purchase backlinks to its website from the digital news media.
The news outlet then publishes content that includes links to the customer’s website. With this, the process is complete,
following payment by the buyer of the price stipulated for the backlinks. It should be stressed that what is
purchased is the link or backlink and that the content is merely the vehicle in which it is included, which generally results
in content unrelated to the newspaper’s normal editorial line.
To explore this market, we conducted an analysis whose object of study was three digital news media of medium to high
Given the nature of this analysis, we do not explicitly identify the name of each news outlet, but rather describe them as
accurately as possible using a series of data files (Tables 6 to 8). In these files we specifically incorporate the data
provided by Alexa Rank, a ranking developed by Amazon, based on web traffic.
In addition, to lend greater credibility to the ranking of these three digital news media companies, we incorporate daily
unique user data for each of the three websites. To do so, we used Site Worth Traffic, which measures website traffic,
providing unique and total user data, social network performance metrics, and a complete analysis of the site’s evolution.
Table 6. Digital news media company 1
Alexa ranking: Ranked 252nd in Spain (June 2018)
Table 7. Digital news media company 2
Alexa ranking: Ranked 702nd in Spain (June 2018)
Table 8. Digital news media company 3
Media company 3 (MC3)
Media type: Generalist
Country: Spain
Alexa ranking: Ranked 4,763rd in Spain (June 2018)
Daily unique users: 5,979
To select the news stories from the three media companies, we purchased three items from the Prensarank
website (one item for each news media). Then, having examined these news stories, we were able to identify a
search pattern for each item, and with this to create what is known as its ‘footprint’: that is, a type of advanced
search (Google, 2018) that allows the highly precise selection of well-characterized web page types.
In this way we identified three footprints that allowed us to locate 50 news items purchased from each of the three
media companies. Each of these footprints, in the form of an advanced search equation, is constructed as follows
(using the site search operator):
- site: MC1 (media company 1) + name of link buying/selling company.
- site: MC2 (media company 2) + the word “remitido” (or “press release”).
- site: MC3 (media company 3) + name of a news item contributor.
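As a rough illustration of how such footprints can be assembled, the sketch below combines the site operator with a quoted search term. The domains, the broker name and the contributor name are hypothetical placeholders, since the study does not disclose the outlets; quoting the term as an exact phrase is also an assumption rather than a detail stated in the study.

```python
def build_footprint(domain, term):
    """Build a search query limited to `domain` that must contain `term`.

    Quoting the term requests an exact-phrase match (an assumption here).
    """
    domain = domain.strip()
    term = term.strip().replace('"', "")
    if not domain or not term:
        raise ValueError("domain and term must be non-empty")
    return f'site:{domain} "{term}"'


# Hypothetical placeholders standing in for the three anonymised outlets.
footprints = {
    "MC1": build_footprint("mc1.example", "Hypothetical Link Broker"),
    "MC2": build_footprint("mc2.example", "remitido"),
    "MC3": build_footprint("mc3.example", "Hypothetical Contributor"),
}
for outlet, query in footprints.items():
    print(outlet, "->", query)
```

Each resulting string can be pasted into a search engine that supports the site operator to list candidate pages from a single outlet.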
Having obtained the 150 news items (50 for each media company) by applying the respective equations, we were then
able to isolate the following elements by responding to the six questions below, based on recommendations made by
the Native Advertising Institute and the Nieman Reports:
- Is the news item specifically identified as sponsored content?
- Is the story reported newsworthy, that is, is the item directly linked to a breaking news story or current affairs?
- How many hyperlinks are included in each news item?
- Are the hyperlinks coherent with the content of the news item?
- Do the hyperlinks point to an authoritative website providing users with complementary quality information?
- What themes are the commissioned news items included in?
6. Results
Below, we first present our main findings. Next, we review our research objectives and questions in order to
present our conclusions, and we finish with proposals for the development of new lines of research.
6.1. Main findings
From our study of the 150 news items commissioned in the three digital news media companies, the following results
can be highlighted:
- News originating from the purchasing of a link is not clearly identified as sponsored content or advertising.
- The content does not describe or narrate a breaking news story or current affairs, that is, it is not a typical news story,
but rather the content is timeless, generally involving recommendations and advice.
- The need to include the literal anchor text (the text that activates the link) as commissioned by the customer leads to
errors of grammar and syntax in the writing of the content. The reason for this is that the authors opt to respect the
keyword or phrase commissioned by the customer even if it does not fit with the syntax or phrase in which it is embedded.
- When a news item contains more than one link, the need to maintain two or more links in the same item for sites of
distinct natures results in a lack of coherence between the links and the content of the news story.
Tables 9 to 11 show the results for each of the three digital news media companies in greater detail.
Table 9. Results for digital news media company 1
Are news items identified as sponsored content? Identification somewhat ambiguous. Items are identified as a Comunicado (or news release). The headline is displayed in the following format: “News release: title of the story”.
Is the content newsworthy? No. It is timeless, involving recommendations and offering tips.
Are the hyperlinks coherent with the content? No. Most are shoehorned into the item; others use a syntactically incorrect generic anchor text. In some of the items with more than one hyperlink, there is no thematic link between them.
What themes are the commissioned news items included under? The main themes are business, the home, beauty, tourism, productivity, cars, weddings, fashion, health, and decoration.
Table 10. Results for digital news media company 2
Are news items identified as sponsored content? Ambiguous. Items are identified with a tag that reads Remitido (or news/press release) followed by the headline.
Is the content newsworthy? No. It is timeless, involving recommendations and offering tips.
Are the hyperlinks coherent with the content? No. Most are shoehorned into the item; others use a syntactically incorrect generic anchor text. In some of the items with more than one hyperlink, there is no thematic link between them.
What themes are the commissioned news items included under? The main themes are work, recipes and gastronomy, business, cars, healthy living, gadgets, cooking, fashion trends and styles, and fortunes and tarot.
Table 11. Results for digital news media company 3
Are the hyperlinks coherent with the content? No. Most are shoehorned into the item; others use a syntactically incorrect generic anchor text. In some of the items with more than one hyperlink, there is no thematic link between them.
What themes are the commissioned news items included under? The main themes are business, virtual spaces, tourism, music, health, cars, investments, holidays, problem pages, and travel.
Objective 1. To analyse and characterise a new line of activity in the digital news media centred on link buying/selling
and to identify the actors involved.
We have shown that a new model of economic activity has emerged based on link buying/selling and that this activity
is becoming increasingly commonplace, as demonstrated by our close monitoring of the sector over the last two
years. As a result, the number of news media companies now included on the websites studied here (Prensarank,
Unancor, Publisuites and RT Gopress) has experienced constant growth.
We have shown that this line of activity adds value to each party involved (the digital news media, the customers that
buy links and the firms that act as intermediaries in the sales transaction), as it seeks to fulfil three main objectives:
- Providing a new revenue stream, albeit that for the time being it remains a fairly marginal stream for news media
- Obtaining greater website authority and improving the visibility of the websites that buy backlinks.
- Generating revenue in the form of commissions to the intermediary firms dealing in hyperlinks.
Objective 2. To classify the content published as a result of this activity and its implications for off-page SEO strategies.
We have shown that the sector does not operate a system of self-regulation, since each of the three news media
companies analysed applies different criteria. Furthermore, unlike native advertising, the sponsored content does not
conform to the look, feel and function of the content of the platform on which it appears.
Different degrees of ethical awareness can also be identified, since news media companies 1 and 2 at least go some way
to specifically identifying this content (by labelling items as comunicados or remitidos), while company 3 avoids drawing
any distinction between editorial and sponsored content.
Objective 3. To provide guidelines for the possible improvement of this activity by developing a set of best practices
modelled on so-called native advertising.
Based on native advertising regulations, an initial proposal of best practices for the writing of news items for link selling
should consider the following guidelines:
- There should be a clear indication that the news story published is sponsored content or advertising (the distinction
being that the latter is provided by the advertiser, the former by the news media company itself).
- The news item should match the look, feel and function of the content of the platform on which it appears.
- The information included in the commissioned news item should be newsworthy or, at least, useful for the reader, and
should be based on current news stories. The news story ought to be written with the user in mind and should not
be motivated solely by the hyperlink that has been purchased. Its features should serve not only the needs of the link
buyers but also those of the readers.
- More than one link can be included in a news story provided there is a thematic connection between them that does
not affect the story’s overall coherence.
- The hyperlinks and their anchor texts must be orthographically and syntactically coherent with the text of the news
- As a rule, if a hyperlink does not lead to an authoritative website that provides useful, complementary information to
readers, then it should not be added to editorial content. Instead, such links should be published in a section
dedicated exclusively to sponsored content or advertising and separated from the newspapers’ usual sections.
Research questions
Next, we return to the research questions posed at the outset to examine the responses obtained from the case study
reported above.
Question 1. What are the main characteristics of this new line of activity centred on link buying/selling in the digital
news media and who are its main actors?
We have shown that it is possible both to clearly identify and to determine the characteristics of this line of activity in
the news media, centred on the acquisition of links and of the content that acts as a vector for those links.
It is a business model in which the three main actors, i.e. the digital news media, their customers, and link buying
intermediaries, all benefit. The news media and the intermediaries obtain an economic return, while the clients obtain
greater website authority and visibility. The losers in the activity are, however, journalistic quality and, with it, the
readers of the news media.
Question 2. What are the main characteristics of the content published as link vectors?
The analysis shows that the news items identified in this case study present the following characteristics:
- They do not carry clear labels identifying their content as advertising or sponsored.
- They are timeless, focusing primarily on providing advice and basic recommendations on a huge variety of topics
ranging from tourism, cooking, and cars, to investments, beauty, and technology, and many others.
- They can include up to four backlinks. These links are often shoehorned into the content, not only because they are
poorly constructed in terms of their semantics but also because they link to websites that do not provide
complementary quality information for their readers.
Question 3. Is it possible to develop a set of best practices for this activity based on native advertising in order to
improve sector practices?
Here, we have taken the concept of native advertising as our reference because it can be considered to provide
interesting precedents and, as such, to be a model for future regulations governing paid links in the digital press.
Broadly speaking, the news items in our sample point clearly to the need to develop a set of best practices, preferably so
that the media companies can self-regulate, rather than depend on an external regulator.
Digital news media readers deserve the highest degree of quality and transparency, characteristics that ultimately
benefit the news media themselves, especially if we consider the acute crisis they are currently experiencing. It is
important that the media generate additional revenue streams, which is why this line of business should be understood
as being both necessary and timely.
However, the sector’s legitimacy calls for a highly transparent and stringent system of self-regulation and, here, we have
identified some of the essential elements that need to be taken into consideration in developing such a system. The
key idea in the process is that the transfer of authority effected by link buying/selling should not negatively impact
the content quality or the reading experience of the news media that participate in this business model. Additionally,
maximum transparency must be guaranteed at all times.
8. Future research
More ethical studies need to be undertaken within the digital news media to determine best practices for the selling of
commissioned news items and hyperlinks. In this way it should be possible to reconcile the sector’s legitimate interest
in sponsorship or advertising revenue with the interests of the users who consume its news and with their right to
receive quality content which, even if sponsored, should be in line with the general orientation of the news outlet.
Within the field of SEO, analyses could be undertaken of the actual impact of links of this type in terms of improving
the ranking of the websites that receive them. To do this, analytical frameworks need to be designed and employed in
conjunction with such SEO tools as Sistrix, SEMrush, Ahrefs, or Majestic, among others.
9. References
Amazeen, Michelle A.; Muddiman, Ashley R. (2017). “Saving media or trading on trust? The effects of native advertising
on audience perceptions of legacy and online news publishers”. Digital journalism, v. 6, n. 2, pp. 176-195.
Amazeen, Michelle A.; Wojdynski, Bartosz W. (2018). “The effects of disclosure format on native advertising recognition
and audience perceptions of legacy and online news publishers”. Journalism, pp. 1-20.
Batsell, Jake (2018). “4 steps to bring ethical clarity to native advertising”. Nieman reports, September 23rd.
Booth, Andrew; Papaioannou, Diana; Sutton, Anthea (2012). Systematic approaches to a successful literature review.
London: Sage. ISBN: 978 0 857021359
Brin, Sergey; Page, Lawrence (2000). “The anatomy of a large-scale hypertextual web search engine”. Stanford Univer-
Cámaras-León, Nuria (2018). “Linkbuilding 2018, guía de enlazado perfecto (+12 predicciones expertos)”. Unancor, 11th
Carlson, Matt (2014). “When news sites go native: Redefining the advertising – editorial divide in response to native
advertising”. Journalism, v. 16, n. 7, pp. 849-865.
Cooper, Jon (2012). “Link building tactics. The complete list”. Point Blank SEO, April 1st.
Cramer, Theresa (2016). “The deal with disclosure and the ethics of native advertising”. Digital content text, Sept. 23rd.
Crowe, Anna L. (2017). “Illustrated guide to link building”. Search engine journal.
Dean, Brian (2018). “The definitive guide (2018 update)”. Backlinko, March 11th.
Ferrer-Conill, Raul (2016). “Camouflaging church as state”. Journalism studies, v. 17, n. 7, pp. 904-914.
Fishkin, Rand (2016). “Targeted link building in 2016 - Whiteboard Friday”. Moz, Jan 29th.
García-Carretero, Lucía; Codina, Lluís; Díaz-Noci, Javier; Iglesias-García, Mar (2016). “Herramientas e indicadores SEO:
características y aplicación para análisis de cibermedios”. El profesional de la información, v. 25, n. 3, pp. 497-504.
Giomelakis, Dimitrios; Veglis, Andreas (2015). “Employing search engine optimization techniques in online news
articles”. Studies in media and communication, v. 3, n. 1, pp. 22-33.
Giomelakis, Dimitrios; Veglis, Andreas (2016). “Investigating search engine optimization factors in media websites. The
case of Greece”. Digital journalism, v. 4, n. 3, pp. 379-400.
González-Villa, Juan (2017). “Cómo hacer link building: estrategias y ejemplos prácticos”. Useo, 30th March.
Gonzalo-Penela, Carlos (2006). “Tipología y análisis de enlaces web: aplicación al estudio de los enlaces fraudulentos y
de las granjas de enlaces”. BiD: textos universitaris de biblioteconomia i documentació, n. 16.
Gonzalo-Penela, Carlos; Codina, Lluís; Rovira, Cristòfol (2015). “Recuperación de información centrada en el usuario y
SEO: categorización y determinación de las intenciones de búsqueda en la Web”. Index comunicación, v. 5, n. 3, pp. 19-27.
Google (2018). Google guide making searches even easier. Search operators.
Harry, David (2013). “How search engines rank web pages”. Search engine watch, Sept. 23rd.
Hart, Chris (2008). Doing a literature review: Releasing the social science research imagination. London: Sage. ISBN: 978
0 761959755
Kleinberg, Jon M. (1998). “Authoritative sources in a hyperlinked environment”. In: Procs. of the ACM-SIAM Symposium
on discrete algorithms, pp. 1-33.
Lewandowski, Dirk (2012). “A framework for evaluating the retrieval effectiveness of search engines”. In: Jouis,
Christophe; Biskri, Ismail; Ganascia, Jean-Gabriel; Roux, Magali. Next generation search engine: Advanced models for
information retrieval. Hershey, PA: IGI Global, pp. 456-479. ISBN: 978 1 466603318
Li, You (2017). “Contest over authority”. Journalism studies, pp. 1-19.
Mathiasen, Stine F. (2018). “10 quick takeaways from native advertising days 2018”. Native Advertising Institute, Sept.
Monterde, Nacho (2016). “Introducción al link building”. SEO azul, March 4th.
Pollitt, Chad (2018). The global guide to technology 2018. A resource for marketers, advertisers, media buyers,
communicators, publishers and ad tech professionals.
Publisuites (2018). “Estudio del uso de linkbuilding”. Publisuite, 15th March.
Rowe, Kevin (2018). “How link building will change in 2018”. Search engine journal, Feb. 2nd.
Schauster, Erin E.; Ferrucci, Patrick; Neill, Marlene S. (2016). “Native advertising is the new journalism: How deception
affects social responsibility”. American behavioral scientist, v. 60, n. 12, pp. 1408-1424.
Serrano-Cobos, Jorge (2015). “SEO: Introducción a la disciplina del posicionamiento en buscadores”. Colección EPI
Scholar. Barcelona: Editorial UOC. ISBN: 978 84 9064 956 5
Sweetser, Kaye D.; Joo, Sun; Golan, Guy J.; Hochman, Asaf (2016). “Native advertising as a new public relations tactic”.
American behavioral scientist, v. 60, n. 12, pp. 1442-1457.
Thelwall, Mike (2004). Link analysis: An information science approach. Amsterdam: Elsevier. ISBN: 978 0 12 088553 4
Yin, Robert K. (2014). Case study research. Design and methods. Canada: SAGE. ISBN: 978 1 452242569
Small businesses that want to learn how to attract more customers to their website through marketing strategies such as search engine optimization will find this booklet useful. You may want to read this booklet in conjunction with other booklets in this series such as Successful Online Display Advertising and Social Media for Small Business.

2. Setting SEO Objectives and Goals
3. Do-It-Yourself Options
4. Choosing an SEO Specialist to Work With
5. Understanding Best Practices, Pitfalls and Barriers
Implementing SEO
1. Keywords
2. Finding the Right Keywords
3. Search Engine Optimization Techniques
4. Link Popularity – A Key Factor for Increasing a Website’s Page Ranking
5. Keyword Conversion
Test, Measure, Test Again
1. Webmaster Tools
2. Tracking Your Progress/Website Analytics
Future of Search Engine Optimization
Glossary of Terms

Key Concepts
Search engine optimization (SEO) involves designing, writing, and coding a website in a way that helps to improve the volume and quality of traffic to your website from people using search engines. These “free,” “organic,” or “natural” rankings on the various search engines can be influenced, but not controlled, by effective SEO techniques. Websites that have higher rankings (i.e. presented higher in the search results) are identified to a larger number of people who will then visit those sites.

The majority of web traffic is driven by major search engines, including Google, Bing, YouTube, AOL, Yahoo, Duck Duck Go, Ask Jeeves and other country-specific ones (e.g. Baidu in China).
Disclaimer: This booklet is intended for informational purposes only and does not constitute legal, technical, business or other
advice and should not be relied on as such. Please consult a lawyer or other professional advisor if you have any questions
related to the topics discussed in the booklet. The Ontario Government does not endorse any commercial product, process
or service referenced in this booklet, or its producer or provider. The Ontario Government also does not make any express or
implied warranties, or assumes any legal liability for the accuracy, completeness, timeliness or usefulness of any information
contained in this booklet, including web-links to other servers. All URLs mentioned in this document will link to an external website.
Search engines have four functions—crawling, building an index, calculating relevancy and rankings, and serving results. They scour your website and, for each page, index all of the text they can pick up, as well as a great deal of other data about that page’s relation to other pages, and in some cases all or a portion of the media available on the page as well. Search engines index all of this information so that they can run search queries efficiently.

Search engines create these databases by performing periodic crawls of the Internet. They must weigh the value of each page and the value of the words that appear on it. Search engines employ secret algorithms (mathematical formulas) to determine the value they place on such elements as inbound and outbound links, density of keywords and placement of words within the site structure, all of which may affect your SEO ranking. Search engines have difficulty indexing multimedia, but there are workarounds which will be discussed later in the booklet.

The newest trend in search engines, and likely the future of search in general, is to move away from keyword-based searches to concept-based personalized searches. When a person clicks on certain search results, search engines like Google, Bing and others record this information to collect trends of interest and then will personalize the search results based on specific interests. This is still a developing field, but appears to have good potential in making searches more relevant.

The following pages outline SEO techniques that will help you to draw more visitors to your website.

4. Who are my allies and associates to help influence my optimization results?
5. What is the budget and time I can allocate to optimizing my website initially and going forward?

2. Setting SEO Objectives and Goals
The ultimate goal of search engine optimization is to boost your revenue by driving traffic to your website. However, there are other important objectives:
• To establish you as an expert in your field. Visibility in search engines creates an implied endorsement effect where searchers associate quality, relevance and trustworthiness with sites that rank highly for their queries.
• To enhance product awareness. It is better to have an image or video displayed as opposed to just text since it will attract more attention.
• To increase sales leads. The goal is to drive the right traffic to your site by encouraging people to provide qualified contact information for future relationship building.
• To reduce cost per order. Free search engine traffic will help you reduce the cost of advertising compared to other media channels.
• To encourage repeat visitors. Optimized pages help customers find additional products or services more easily and quickly after they have purchased from you, thus improving customer support and service.
Many small business owners hire the services of an SEO expert to optimize their websites. If you would prefer to do it yourself, here are some questions to ask yourself first:
• Write appropriate content, utilizing keyword best practices (see below).
• Understand what is entailed to optimize a website. Know who to ask to implement your ideas, or have the appropriate resources (software) or knowledge base to do it yourself.
• Have the appropriate metrics in place for review and analysis so you can continue to improve your positioning.

9. How can I expect to communicate with you? Will you share with me all the changes you make to my site, and provide detailed information about your recommendations and the rationale behind them?
10. How do you charge for your services? Is this a one-time fee or an ongoing contract? What are the deliverables if it is ongoing?

Types of Services Most SEO Specialists Provide
• Review of your site content and/or structure
• Advice on technical aspects of SEO and their impact on your website
• Content development and/or editing
• Management of online business development campaigns
• Keyword and competitive research
• SEO training

CAUTION: Be wary of emails that may appear legitimate but are often spammers offering SEO services and claiming they can “Guarantee #1 Ranking”.
Keyword Search
• Find the words and phrases your customers use rather than industry jargon.
• Look for synonyms.
• Reflect the answers to viewer questions in your keywords and in the content of your pages.
• Move away from thinking of keywords as data. Imagine instead the person who will be typing in that keyword and what they are searching for.

Quality Content
• Content is still “king” and search engines are looking for keyword phrases surrounded by semantic phrasing that supports the overall theme of that section. Try to incorporate synonyms.

SEO Local (Optimizing your site to attract local business)
• Optimize your website to acquire foot traffic or local area interest. Add service location addresses and/or city listings and include good local links to your pages.
• Choose the proper categories in Google Places.
• Make sure Google can recognize that your website and your Google Places page are associated and linked.
• Ask users for citations and reviews which will link back to you.

Social SEO (Optimizing your site for social networks and social media)
• Use social cues such as Twitter shares, Facebook likes and social bookmarking which heavily influence search rankings. For more optimization suggestions such as tagging and keyword titles, see the Social Media for Small Business booklet.

Guest SEO
• Be a guest blogger on other related blogs.
• Consider targeting blogs that aren’t direct matches to your industry to get a leg up on your competition. For more about blogging, see the Blogs for Small Business booklet.

Link Building
• Use reciprocal link exchange moderately. Instead, let link building happen naturally through people retweeting and passing on your good content and articles.
• Ensure quality of links rather than quantity. The higher the quality of links, the more trust and authority will be established.
• Use targeted keywords in anchor text.
• When you do a link exchange, don’t always link back to your home page. Instead, provide a link to the most relevant section of your site that relates to the anchor link – e.g. “what you need to know about gluten-free products” should link to the page about gluten-free products.
• Ensure any out of date or old web pages are redirected to the relevant new pages through a 301 redirect.

Technical Considerations
• Keyword density (how often keyword phrases are used in comparison to total number of words on that page) of over 10% is considered suspicious and does not look like naturally written text. Aim for 3–7% of total words per page to be the keywords you are trying to optimize for.
• Have a short title tag (6 or 7 words at most), with the most important keywords near the beginning and used only once.
• Avoid cumbersome URLs. Instead, create user-friendly URLs for easy accessibility by viewers and search engines – e.g. change &type=2&kind=3&node=5&arg=6 to
• Submit an XML site map to the search engines after making any major structural changes to your site.
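
As a rough illustration of the XML site map mentioned above, the sketch below builds a minimal one in Python. The function name `build_sitemap` and the example.com addresses are made up for this example; only the namespace comes from the public sitemaps.org protocol. Your web developer or content management system may already generate this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; replace with the real addresses of your site.
pages = [
    "https://www.example.com/",
    "https://www.example.com/gluten-free-products",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting text would typically be saved as a file on your site and then submitted through each search engine's webmaster tools.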
Marketing
• When choosing keyword phrases, avoid using the word “free” unless you’re offering something unconditionally.
• Be careful adjusting page titles without altering page content. Google may view it unfavourably.
• Avoid overly aggressive or manipulative SEO techniques, such as loading too many keywords in the website’s content, which might get your site excluded from a search engine.
• Exclude linking affiliates that are not relevant as this could negatively affect your positioning.

Technical
• In structural design, do not use frame set up (check with your developer).
• Avoid free hosting since it is difficult for search engines to wade through all the data when there are many sites sharing free hosting.
• Watch for and remove any broken links (URL links that no longer work and are not accessible).
• Do not use FLASH, videos or images without alternative tags (alt tags), transcripts or synopsis content.
• Do not try to optimize by using an excessive number of keywords, especially unrelated ones, as this will affect the performance of all your keywords.
• Avoid artificially inflated keyword density (over 10%) or you will risk getting banned from search engines.
• Keep the number of links on a page to a reasonable number since Google does not like pages that consist mainly of links.
• Avoid writing text that is the same colour as page background since you will be penalized by Google for this practice.
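
The keyword density guideline in these tables (aim for 3–7%, treat anything over 10% as suspicious) can be checked with a few lines of Python. This is only a sketch under stated assumptions: the booklet does not say exactly how to count, so here density is assumed to be the number of words taken up by the keyword phrase divided by the total number of words on the page, and the function names and sample text are invented for illustration.

```python
import re


def keyword_density(text, phrase):
    """Share of the words in `text` taken up by `phrase` (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    hits, i = 0, 0
    # Count non-overlapping occurrences of the phrase.
    while i <= len(words) - n:
        if words[i:i + n] == target:
            hits += 1
            i += n
        else:
            i += 1
    return hits * n / len(words)


def classify(density):
    """Apply the thresholds quoted in the tables above."""
    if density > 0.10:
        return "suspicious"
    if 0.03 <= density <= 0.07:
        return "within target range"
    return "outside target range"


# Hypothetical page text of 20 words with the keyword used once.
sample = ("Our bakery sells fresh bread every morning and delivers orders "
          "to homes and offices across the whole city each day")
density = keyword_density(sample, "bread")
print(f"{density:.0%} is {classify(density)}")
```

A check like this is a rough guide only; naturally written text that answers your customers' questions matters more than hitting an exact percentage.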
• Prioritize money-making keyword phrases (those that lead to a sale, as identified by examining your metrics) and make sure you position them within your page so that search engines and your audience can easily identify them.
• Use keyword phrases in your web address that best describe the page content – e.g. http://www.
• For each page, choose a title (60 characters), description (150 characters) and Meta keywords reflecting the theme and content of the page, with each word separated by a comma.
• Placement of keywords is very important. Think backwards, and put the “result/conclusion” at the beginning, thus keeping priority keywords “above the fold” (i.e. closest to the top of the page). When indexing your site, search engines move their robots from top to bottom, left to right, so the placement of your keywords should be optimized strategically in order to be picked up by the engines.
• Avoid filling the top part of your site with large images or image navigation (text that is actually an image) as you will miss the opportunity to have keywords captured for indexing.

2. Finding the Right Keywords
Steps for finding the right keywords (or search terms)
1. Determine what your existing and potential customers might be looking for. Ask your current customers what terms they would use to find the products and services offered by your business.
2. Brainstorm a list of keywords that are related to your business.
3. Check out the competition to see how many other websites are listed in search engines (particularly Google) for that keyword. You can use tools such as Google Page Rank—which goes from 1 to 10 (10 being the best ranked)—to see which websites hold the top positions.
4. Select the best keywords—the ones most suited to your business and target audience.
5. Ensure that the content on your website is high quality and is consistent with your keywords.
6. Check out various online resources (some are free) that will help you assess the relevance of your keywords by typing “keyword tools” into a search engine.

3. Search Engine Optimization Techniques
SEO involves a wide range of techniques, some of which you may be able to do yourself and others that will require web development expertise. Techniques include increasing the number of links from other websites to your web pages, editing the content of the website, reorganizing the structure of your website, and coding changes. It also involves addressing problems that might prevent search engines from fully “crawling” a website.

On Page and Off Page Optimization Techniques for Increasing Traffic to Your Website

On Page Optimization Techniques
On page optimization includes those techniques that can be done on the pages of a website. On page optimization relates to those things that are within your control – i.e. the content of your website. On page optimization techniques help the search engine crawlers read the website content. A readable site helps to show quality and will result in higher ranked web pages.

Review the following with your website developer to ensure that all these items have been considered:

On Page SEO Checklist
• Always start with keyword selection, research and testing.
• Have a Meta description tag for each page.
• Create ALT tags for all images.
• Place a keyword phrase in H1, H2, H3, H4 tags and in URL structure (domain name and your pages).
• Develop an internal linking strategy.
• Have relevant, keyword-rich content.
• Follow keyword density rules.
• Create and submit site maps, both XML and user facing.
• Design for usability and accessibility.
• Track target and converting keywords.

Off Page Optimization Techniques
Off page optimization includes those techniques that can be done outside your website to increase traffic to your website. The various free off page optimization techniques (also known as free traffic sources) that you can use to drive traffic to your website and increase its ranking level in major search engines include link swaps, blogging, social networks, white papers, infographics and forum postings.

Additional guidelines to consider:
• Include the keyword in your domain name when registering both your main website and any micro-sites.
• Include the keyword when naming your digital resources such as a white paper, video and images (e.g. Rather than naming your image, “image-1.jpg”, include a descriptive name like “weather-satellite.jpg” or “weather-satellite-tips.pdf”).
• Ensure targeted anchor text is keyword rich so that when people copy blog posts or other content, that linked anchor text will come with it and generate more links back to you.

4. Link Popularity – A Key Factor for Increasing a Website’s Page Ranking
One of the most critical ways to improve your website’s ranking in the search engine results pages is improving the number and quality of websites that link to your site. Google PageRank is a system for ranking web pages used by the Google search engine. PageRank assesses the extent and quality of web pages that link to your web pages. Because Google is currently one of the most popular search engines worldwide, the ranking of your web pages received on Google PageRank will affect the number of visitors to your website. Techniques to improve your page rank are discussed below.

Link Building Techniques
1. Send out product samples and ask for reviews, then ask reviewers to link to your site.
2. Participate as a guest blogger or guest newsletter contributor, with a link back to your site.
3. Host awards or certifications within your industry, create an award ‘badge’ and then have it link back to your site.
4. Utilize social media and search strategies:
Twitter. This can be very effective in capturing attention.
• Follow thought leaders within your industry.
• Make a note of what they like to tweet about.
• Check their personal websites for more info.
• Look at what kind of content they retweet.
• Retweet their content.
• Interact with them constructively.
• Ask for their opinion.
Blog commenting. Find high-quality blogs written by people with whom you are seeking links and provide constructive, useful comments. This can prompt them to click through to your website and create a relationship building opportunity where they can either become influencers willing to share your content and/or invite you to become a guest blogger with your byline pointing back to your site. All this contributes to gaining “inbound” links, which help your overall link popularity.
Profiling. Complete your profiles in the social circles and map listings. This improves the likelihood for your business to be more visible to search engines.
Keyword tags. Use these in content posts and post titles to reinforce the specific keywords that your audience is searching for.
RSS feeds. Constantly making fresh content available, combined with “pinging” and natural share among followers, shows search engines how popular your “brand” is.
5. Build a free tool that links back to your site. If it is useful, people will want to download it and possibly share it. Include a byline that brings them back to your site.
6. Build supplementary micro-sites and link back to your main site.
Participate in Link Campaigns
Businesses can ask partners (e.g. suppliers), other businesses, professional organizations, chambers of commerce and customers to add links from their websites. Be prepared to link back to their sites. The more relevant the partner, the better.

Create Interesting Content
Provide interesting information that users may find useful. Offer tips to benefit users (e.g. Pitfalls to Avoid When Hiring Contractors). Create tools that people will use (e.g. Product Quality Checklists). Create lists (people love Top 10 lists). Submit to article directories, press release websites, and/or guest blog.

Give Testimonials
Many would think it is important to get testimonials, but it is equally important to give testimonials, as long as they are authentic and well deserved. Ask to have your link come back to your site.

Turn Raw Numbers into a Data Story
Be creative in taking your numbers and crafting an interesting story. When you are the resource reference, then all others who use this story will link back to you.
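
To see why the number and quality of inbound links matter, it can help to look at a toy version of a link-based score. The sketch below is a simplified textbook calculation in the spirit of PageRank, not Google's actual ranking method, which is secret. The page names, the `pagerank` function and the damping value of 0.85 (the figure commonly quoted for the original published algorithm) are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy link-popularity score. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages on your site plus one partner site.
links = {
    "home": ["products", "blog"],
    "products": ["home"],
    "blog": ["home", "products"],
    "partner-site": ["home"],
}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this made-up example the home page ends up with the highest score because the most pages link to it, and the partner site, which nobody links to, ends up with the lowest.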
5. Keyword Conversion
Site owners used to be satisfied simply to attract traffic to their sites; now they want to know what is working, such as which keywords most often lead to a sale. If you’re trying to sell a product or service through your website, knowing the keywords that lead to this conversion at a high rate is an enormously valuable marketing asset. With this knowledge, you can adjust your site content accordingly. The following reference link will help you with conversion tracking:

Test, Measure, Test Again
1. Webmaster Tools
The three major search engines, Google, Bing and Yahoo, have webmaster tools that you as a site owner can sign up for at no cost to manage your website statistics, submit your content and site map and view diagnostic errors, malware or other concerns that the search engines find from indexing your site. You can also see broken links and the pages from where they originate in order to fix them quickly.

Google Webmaster tools can be found at webmasters/tools. You require a Google account to set up the webmaster account. You will then need to follow Google’s instructions to verify your website. Verified site owners can see information about how Google crawls, indexes and ranks your site. One of the key indicators that can be gathered from this tool is an understanding of the keywords that visitors to your site are using, and then comparing them to the keywords that Google finds on your site. You can modify your content to reduce the “Bounce Rate”, which occurs when someone enters your site, doesn’t find what they like and leaves before proceeding to any other part of the site.

Yahoo and Bing have now merged together to offer Bing Webmaster Tools. Getting started involves setting up a Bing webmaster tools account at webmaster/WebmasterManageSitesPage.aspx, then validating your website, creating and uploading a sitemap and developing a search optimization plan. You should sign up for both.
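
The two measures discussed in this part of the booklet, keyword conversion and bounce rate, come down to simple arithmetic once you have visit data. The sketch below is a minimal illustration in Python. The visit records, keywords and the `summarize` function are invented for the example, and a visit counts as a bounce when only one page was viewed; real analytics or webmaster tools will report these figures for you.

```python
from collections import defaultdict

# Hypothetical visit records: (search keyword, pages viewed, made a purchase)
visits = [
    ("gluten-free bread", 4, True),
    ("gluten-free bread", 1, False),
    ("bakery near me", 1, False),
    ("bakery near me", 3, False),
    ("gluten-free bread", 2, True),
    ("bakery near me", 1, False),
]


def summarize(visits):
    """Return the conversion rate and bounce rate for each keyword."""
    totals = defaultdict(lambda: {"visits": 0, "sales": 0, "bounces": 0})
    for keyword, pages_viewed, purchased in visits:
        row = totals[keyword]
        row["visits"] += 1
        row["sales"] += int(purchased)
        # A bounce: the visitor saw one page and left.
        row["bounces"] += int(pages_viewed == 1)
    return {
        kw: {
            "conversion_rate": r["sales"] / r["visits"],
            "bounce_rate": r["bounces"] / r["visits"],
        }
        for kw, r in totals.items()
    }


report = summarize(visits)
for keyword, stats in sorted(
    report.items(), key=lambda kv: kv[1]["conversion_rate"], reverse=True
):
    print(f"{keyword}: conversion {stats['conversion_rate']:.0%}, "
          f"bounce {stats['bounce_rate']:.0%}")
```

Keywords with a high conversion rate are candidates for the "money-making" phrases to prioritize, while a high bounce rate suggests the page content does not match what those searchers were looking for.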
Anchor text: Words used in the link text. It is best to stay away from words like [Read More], [See More], and use a more keyword-rich phrase pointing to where the link is directed – e.g. “Discover Niagara’s Hotel Package Deals today”.

Backlinks: Links from external sites that connect to your website. They are also referred to as inbound links.

Blogs: A blog (short for weblog) is an online journal. Most blogs have an open format that allows any Internet user to post entries (comments, questions) to other bloggers. Blog discussions are usually organized according to certain themes or topics.

Bounces/Bounce Rate: Bounce rate is the percentage of visitors that visit one page on your website then exit the site before visiting another page.

Crawlers: (also called spiders, robots, or bots). A program which searches or browses the Web in a logical, automated manner. Search engines use crawlers to find up-to-date information.

Frames: (Frame set-up). A browser display area (web page) is divided into two or more sections (frames). The contents of each frame are taken from a different web page.

Google PageRank: A rough indication of the popularity and importance of sites that point to your page. A higher PageRank indicates a more popular page.

Malware: Malicious software that can destroy a computer. Common examples of malware include viruses, Trojans, worms and spyware.

Meta tags: Keywords, description and content describing your website that are contained in the <head> section of the HTML coding and are not visible on your website.

Outbound links: Links from your website to other websites.

Reciprocal links: Exchange of links between websites.

Referrers: Sites that suggest your site through links coming from their website, blog, email, directory, tool, etc.

Search Engine Results Page (SERP): The pages that result from a search engine query run by a user. You can run a search using certain keywords to assess where your web pages are ranking.

Social media optimization (SMO): Using social media activity to attract visitors to websites by using methods such as adding social media features (e.g. RSS feeds, sharing buttons) to the website content and doing promotional activities like blogging, participating in discussion groups and updating social networking profiles.

Submission: The process of submitting a website to search engines so they are aware of the website and can crawl it.
Gary Marchionini, University of North Carolina, Chapel Hill
XML Retrieval
Mounia Lalmas
Faceted Search
Daniel Tunkelang
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
DOI: 10.2200/S00191ED1V01Y200904ICR006
Series ISSN
ISSN 1947-945X print
ISSN 1947-9468 electronic
Understanding User–Web Interactions via Web Analytics
Bernard J. (Jim) Jansen
Pennsylvania State University
This lecture presents an overview of the Web analytics process, with a focus on providing insight
and actionable outcomes from collecting and analyzing Internet data. The lecture first provides
an overview of Web analytics, providing in essence, a condensed version of the entire lecture. The
lecture then outlines the theoretical and methodological foundations of Web analytics in order to
make obvious the strengths and shortcomings of Web analytics as an approach. These foundational
elements include the psychological basis in behaviorism and methodological underpinning of trace
data as an empirical method. These foundational elements are illuminated further through a brief
history of Web analytics from the original transaction log studies in the 1960s through the informa-
tion science investigations of library systems to the focus on Websites, systems, and applications.
Following a discussion of on-going interaction data within the clickstream created using log files
and page tagging for analytics of Website and search logs, the lecture then presents a Web analytic
process to convert these basic data to meaningful key performance indicators in order to measure
likely converts that are tailored to the organizational goals or potential opportunities. Supplemen-
tary data collection techniques are addressed, including surveys and laboratory studies. The overall
goal of this lecture is to provide implementable information and a methodology for understanding
Web analytics in order to improve Web systems, increase customer satisfaction, and target revenue
through effective analysis of user–Website interactions.
Web analytics, search log analysis, transaction log analysis, transaction logs, log file, query logs,
key performance indicators, query log analysis, Web search research, Webometrics
I have based this lecture on my research and practical work in the Web analytics area, along with the
work of many others. One advantage of sustained work over time in a given field is the ability to go
back and correct, modify, expand, and improve, with any luck, previous efforts, documentations, and
writings. Additionally, sustained contribution to a body of knowledge in an area often leads one to
works by and interchanges with other researchers, practitioners, and scholars who enhance and add
to the field. This lecture is the outcome of this continual and reiterative learning process. I hope the
content within jumpstarts the learning process of others in the field of Web analytics.
My goal with this lecture is to present the conceptual aspects of Web analytics, relating these
facets to processes and concepts that address the pertinence of Web analytics, and provide meaning
to the techniques used in Web analytics. To that end, each section of the lecture represents a major
component of the Web analytics field. While many sections are built on previous publications, I
have enhanced each with updated thoughts and directions in the field, drawing from my own in-
sights as well as those of others who are pushing and defining the field of Web analytics, including
both academics and practitioners. Collectively, the sections offer an integrated and coherent primer
of the exciting field of Web analytics.
This is not necessarily a “how-to” book. The field of Web analytics is dynamic, and any
“how-to” book would likely be out of date before it could be published. Instead, in this lecture, I
offer foundational elements that are more enduring than implementation techniques could be. As
such, the lecture may have some enduring value.
I sincerely acknowledge and thank the many collaborators with whom I have worked in the
Web analytics area. I am appreciative of suggestions of material from Eric Peterson and Mark
Ruzomberka, both excellent practitioners in the art and science of Web analytics. I am also
thankful to the reviewers of this manuscript for their valuable input, specifically Dietmar Wolfram and
Fabrizio Silvestri, both excellent researchers in the field of Web log analysis. Finally, I am indebted
to Diane Cerra and Gary Marchionini for their patience and assistance.
Preface
Acknowledgments
References
Let us pretend for the moment that we run an online retail store that sells a physical product, per-
haps the latest athletic shoe, as just an example. How do potential customers find our online store?
Do they find us via major search engines or from other sites? How will we know, and why should
we care? What might it mean if they come to our Website and then immediately leave? What if
the potential customer explores several pages and then leaves? Do these customers’ actions tell us
anything valuable about our Website or call for actions on our part? If a customer starts to make a
purchase but then leaves before completing the order, should we look at a site redesign? To make our
hypothetical online store successful, we need to understand why potential customers behave as they
do, and the possible answers to our questions lie within the field of Web analytics.
The Web Analytics Association (WAA) defines Web analytics as “the measurement, collec-
tion, analysis, and reporting of Internet data for the purposes of understanding and optimizing Web
usage” (
This seemingly clear-cut definition is not so clear-cut when we consider the numerous un-
stated assumptions, methods, and tools needed for its implementation. In fact, the definition raises
several critically unanswered questions. For example, the definition leaves Internet data undefined.
What is this Internet data? What are its strengths and shortcomings? Where does one get it? Once
defined, collection implies some application that can do the collecting. What application is doing
the collecting? Measurement implies processes and benchmarks. Where are these processes and
benchmarks? Analysis implies both a methodology and strategy for its conduct, which leads one
from the data to understanding and insight. What is this methodology, and how does one define the
strategy? What constructs is this strategy based upon? Reporting implies an organizational unit in
which to report for some external purpose. Optimizing implies a focus on technology or processes.
Understanding implies a focus on people or contexts.
Answering some of these questions is the goal of this lecture, and the questions and assump-
tions of our definition provide the structure for our discussion of the increasingly important field of
Web analytics.
Given the commercial structure of the Web, Web analytics has typically taken a business
perspective in the practitioner arena. Within academia, a near parallel movement is focusing on
transaction log analysis (TLA) and Webometrics areas [147], where interesting academic research
is occurring. In this lecture, the perspective will shift from one paradigm to the other. Because the
jargon from the practitioner side is rapidly gaining wider acceptance, this lecture leverages that
terminology, in the main. As such, much of the discussion will take a “business” and “customer”
perspective that could be somewhat alien to some academic readers. I recommend that these academic
readers adapt. This is the direction that the field is heading, and in the main, it is for the
better. It is where the action is. Practice is informing research.
The commercial force on the Web is pushing Web analytics research outside of academia
at a near unbelievable pace. The drivers for these movements are clear. The Web has significantly
shortened the distance between a business and its customers, both physically and emotionally. The
distance between business and customer is now the duration of a single click. These clicks drive
the economic models that support our Web search engines and provide the economic fuel for an
increasing number of businesses. The click (with the associated customer behavior that accompanies
it) is at the heart of an economic engine that is changing the nature of commerce with the near
instantaneous, real-time recording of customer decisions to buy or not to buy (or some other
analyst-defined conversion other than purchase).
As such, Web analytics deals with Internet customer interaction data from Web systems.
From an academic research perspective, this data is known as trace data (i.e., traces left behind
that indicate human behaviors). A basic premise of Web analytics is that this user-generated In-
ternet data can provide insight to understanding these users better or point to needed changes or
improvements to existing Web systems. This can be data collected directly on a given Website or
gathered indirectly from other applications. Almost all direct data that we can collect is behavioral
data, which is data that relates to the behavior of a user on a Web system. As such, this data pro-
vides wonderful insights into what a user is doing. It tells us the “what.” However, its shortcoming
is that it offers little insight into the motivations or decision processes of that user. These are what
academics call the contextual, situational, cognitive, and affective aspects of the user. For example, a
click online could indicate extreme interest, slight consideration, or perhaps a serendipitous experi-
ence. To explore the “why,” we need attitudinal data (i.e., the contextual, situational, cognitive, and
affective stuff ). For these insights, one must typically use other forms of data collection methods to
supplement behavioral data, such as surveys, interviews, or laboratory studies. However, behavioral
data is a great starting point to isolate the most promising possibilities (based on some external goal)
and then move to attitudinal data collection methods in order to investigate possible meanings or
As researchers, we collect this behavioral data using an application that logs user behavior
on the Website, along with other associated measures. These logging applications come in a variety
of flavors, with continually changing structure, coding, and features. However, they all perform the
same core activities—collect and archive data in some type of storage location. This storage location
is typically a log file, characteristically known as a transaction log, hence the name transaction log
analysis (i.e., Web analytics in academic circles). While transaction log formats vary, they all gener-
ally report similar behavioral data, along with associated contextual data concerning the computer
and related software (i.e., operating system, browser type, etc.).
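As a minimal sketch of what one such log record holds, the following parses a single line written in the
Combined Log Format used by many Web servers. The choice of format is an assumption made for illustration
(the discussion above does not prescribe one), and the sample line, the field names, and the parse_line
helper are hypothetical.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of behavioral and contextual fields, or None if the line is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical sample line; the address and URLs are documentation placeholders.
sample = ('192.0.2.10 - - [10/Oct/2010:13:55:36 -0700] "GET /search?q=loans HTTP/1.1" '
          '200 2326 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1)"')
record = parse_line(sample)
print(record["path"], record["agent"])
```

Here the requested path is the behavioral datum (what the user did), while the user-agent string carries the
contextual data about browser and operating system.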
The issue of the computer is extremely important as it serves to highlight one of the key
shortcomings of Internet data, namely, that the data can sometimes be inaccurate. There are sev-
eral sources of data error. Primarily, Internet behavioral data are traceable back to a computer or
computer browser and not necessarily to an exact person, assuming the person does not log into
an account. In many cases this issue may make little difference. If a product is sold, a product is
sold. However, in other situations such as search and visitor counts, this issue can cause numerous
problems. This is especially so with Web search engines where one can log on anonymously. In
addition, common-use computers can skew the data. Additionally, with the proliferation of scraping
software, one cannot always tell whether the visitor was even a human. In the case of popular
Websites, many times most of the server load is software generated. The behavior of these bots can
significantly skew the data. Other sources of data inaccuracies include the use of cookies, internal
visitors, caching servers, and incorrect page tagging. Finally (and we computer scientists rarely
discuss this), data capture applications are not perfect; based on personal experience, error rates
are often in the 5% to 10% range.
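To make two of these error sources concrete, here is a minimal sketch of a cleaning pass that drops records
that look software generated or that come from internal visitors. The marker strings, the internal address
prefixes, and the record layout are all hypothetical assumptions; a user-agent heuristic of this kind will
miss bots that present themselves as ordinary browsers.

```python
# Hypothetical substrings that suggest an automated client.
BOT_MARKERS = ("bot", "crawler", "spider")
# Hypothetical address prefixes used by the organization's own staff.
INTERNAL_PREFIXES = ("10.", "192.168.")

def is_probable_bot(agent):
    """Flag a user-agent string that contains any of the bot markers."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)

def is_internal(host):
    """Flag a host address that falls in one of the internal prefixes."""
    return host.startswith(INTERNAL_PREFIXES)

def clean(records):
    """Keep only records that are neither probable bots nor internal visitors."""
    return [
        r for r in records
        if not is_probable_bot(r["agent"]) and not is_internal(r["host"])
    ]

records = [
    {"host": "203.0.113.5", "agent": "Mozilla/5.0 (Windows NT 6.1)"},
    {"host": "203.0.113.9", "agent": "ExampleBot/2.1 (+http://example.com/bot)"},
    {"host": "10.0.0.7", "agent": "Mozilla/5.0 (Macintosh)"},
    {"host": "198.51.100.2", "agent": "Mozilla/5.0 (X11; Linux)"},
]
cleaned = clean(records)
removed_share = 1 - len(cleaned) / len(records)
print(len(cleaned), removed_share)
```

Reporting the share of records removed alongside any result helps a reader judge how much the cleaning step
itself shaped the data.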
Once we have collected the data, as accurately as possible, we begin the process of getting
value by reporting and analyzing it. Reporting is somewhat straightforward and generally involves
compiling data in some aggregate way for clarity and simplification. In analysis, we attempt to lever-
age the data to understand some set goal and, perhaps, to make recommendations for improvement,
identify opportunities, or highlight specific findings. In order to enable practitioners to get value
from the analysis, researchers must establish proven processes and methodologies. Methodologies
must be identified and used to correct for data inaccuracy and typically must be scalable (i.e., able to
handle large volumes of data). The tactical aspects of analysis (and the related issues of data clean-
ing before analysis) can be extremely time-consuming until one establishes an efficient procedure.
However, several commercial tools can aid in the process. The effective analysis must generate the
proper metrics and key performance indicators (KPIs).
From a formal perspective KPIs measure performance based on articulated goals for the busi-
ness, user understanding, or Web system. Each KPI, then, should link directly to goals; therefore,
KPIs enable goal achievement by defining and measuring progress. The setting of these KPIs is of
paramount importance in achieving a related technology, user, or organizational goal.
In defining KPIs, we identify the actions that are desired behaviors and then relate these
desired behaviors toward measurable goals. KPIs will vary based on the organization and Web
system. Typically, KPIs for commercial sites are overall purchase conversions, average order size,
and items per order. For lead generation sites, KPIs might be overall conversions, conversion by
campaigns, dropouts, and conversions of leads to actual customers. Customer service sites might
focus on reducing expenses and improving customer experiences. For advertising on content sites,
KPIs could be visits per week, pages viewed per visit, visit length, advertising click ratio, and ratio of
new to returning visitors.
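The commercial-site KPIs named above can be sketched as simple ratios. The Order record, the
commerce_kpis helper, and the sample figures are hypothetical, and the sketch assumes that purchase
conversion is measured as orders per visit; an organization may define it differently (for example, per
unique visitor).

```python
from dataclasses import dataclass

@dataclass
class Order:
    visit_id: str   # visit in which the order was placed
    total: float    # order value in currency units
    items: int      # number of items in the order

def commerce_kpis(visit_count, orders):
    """Compute three commercial-site KPIs from a visit count and a list of orders."""
    n = len(orders)
    return {
        "conversion_rate": n / visit_count if visit_count else 0.0,
        "average_order_size": sum(o.total for o in orders) / n if n else 0.0,
        "items_per_order": sum(o.items for o in orders) / n if n else 0.0,
    }

orders = [Order("v1", 40.0, 2), Order("v7", 60.0, 4)]
kpis = commerce_kpis(visit_count=100, orders=orders)
print(kpis)
```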
Analysis results are of little value until one takes action driven by the data that is in line with
the established KPI. One generally refers to this as actionable outcomes. In academic circles, this
may mean generating publications that shed insight on user behavior, or changes to some methods
or system. In a business, this means calculated change to improve the Website or business process
that is directly dependent on the KPI selected.
We directly link KPIs to goals by monetizing (i.e., assigning value to) the desired behaviors
that these indicators reflect. Generally, these goals relate to generating additional revenue, reducing
costs, or improving the user experience. If we want more visitors, we must determine how much
each visitor is worth to us. If we are interested in items ordered, we identify the value of each addi-
tional item ordered to the organization. By clearly articulating this linkage between KPIs and goals,
we can then see the impact of these indicators and make choices about prioritizing opportunities
and problems. This type of analysis can also aid in eliminating unsuccessful projects and determin-
ing the impact of system changes. Such a linkage process can aid in determining the value of Web
campaigns and recognizing the investment return on Web system use.
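As a worked illustration of monetizing a KPI, the sketch below assigns a value to each visit and uses it to
estimate the return on a campaign. All figures and function names are hypothetical, and the projection
rests on a strong assumption: that additional visitors convert and spend like existing ones.

```python
def value_per_visit(total_revenue, visit_count):
    """Average revenue attributable to one visit."""
    return total_revenue / visit_count

def projected_gain(extra_visits, total_revenue, visit_count):
    """Revenue expected from extra visits, assuming they behave like existing visits."""
    return extra_visits * value_per_visit(total_revenue, visit_count)

per_visit = value_per_visit(total_revenue=5000.0, visit_count=2000)
gain = projected_gain(extra_visits=400, total_revenue=5000.0, visit_count=2000)
campaign_cost = 600.0            # hypothetical cost of the campaign
net_return = gain - campaign_cost
print(per_visit, gain, net_return)
```

With these made-up numbers, each visit is worth 2.5 currency units, so a campaign that brings 400 extra
visits for 600 would be expected to return 400 net.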
In a nutshell, this is the field of Web analytics. In the following sections of this lecture, we
investigate each of the concepts and areas in more detail, beginning with an examination of the
theoretical foundations of Web analytics.
• • • •
What are the foundational elements that provide confidence that Web analytics is providing use-
ful insights? To address such a question, we must investigate the underlying constructs of Web
analytics. This section explains the theoretical and methodological foundations for Web analytics,
addressing the fundamentals of the field from a research viewpoint and the concept of Web logs as
a data collection technique from the perspective of behaviorism. By behaviorism, we take a more
liberal view than is traditional, as will be explained.
From this research foundation, we then move to the methodological aspects of Web analyt-
ics and examine the strengths and limitations of Web logs as trace data. We then review the con-
ceptualization of Web analytics as an unobtrusive approach to research and present the power and
deficiency of the unobtrusive methodological concept, including benefits and risks of Web analytics
specifically from the perspective of an unobtrusive method. The section also highlights some of the
ethical questions concerning the collection of data via Web log applications.
Conducting research involves the use of both a set of theoretical constructs and methods for
investigation [74]. For empirical research, the results are linked conceptually to the data collection
process. High-quality research requires a thorough methodological frame. In order to understand
empirical research and the implications of the results, we must thoroughly understand the tech-
niques by which the researcher collected and analyzed data. A variety of methods is available for
research concerning users and information systems on the Web, including qualitative, quantitative,
and mixed methods. The selection of an appropriate method is critical if the research is to have
efficient execution and effective outcomes. The method of data collection also involves a choice of
methods. Web logs (including both transaction logs and search logs) and Web analytics (including
TLA and search log analysis [SLA]) are approaches to data collection and research methodology,
respectively, for both system performance and user behavior analysis that have been used since 1967
[105], in peer-reviewed research since 1975 [116], and in numerous practitioner outlets since the
1990s [118].
A Web log is an electronic record of interactions that have occurred between a system and
users of that system. These log files can come from a variety of computers and systems (Websites,
online public access catalogs or OPACs, user computers, blogs, listservs, online newspapers, etc.),
basically any application that can record the user–system–information interactions. Web analytics
also takes various forms but commonly involves TLA, which was preceded by log analysis in the
academic fields of library, information, and computer science. TLA is the methodological approach
to studying online systems and users of these systems. Peters [117] defines TLA as the study of
electronically recorded interactions between online information retrieval systems and the persons
who search for information found in those systems. Since the advent of the Internet, we have had
to modify Peters’ (1993) definition, expanding it to include systems other than information re-
trieval systems. In general, the practitioner side of Web analytics seems to have developed relatively
independently, with few people venturing out and sharing learning between the practitioner and
academic camps.
Partly as a result of this separate development, Web analytics is a broad categorization of
methods that covers several sub-categorizations, including TLA (i.e., analysis of any log from a
system), Web log analysis (i.e., analysis of Web system logs), blog analysis (i.e., analysis of Web logs),
and SLA (analysis of search engine logs), among others. The study of digital libraries is also an in-
teresting domain that involves both searching and browsing. Web analytics enables macro-analysis
of aggregate user data and patterns and microanalysis of individual search patterns. The results
from the analyzed data help to develop systems and services based on user behavior or system per-
formance, and these services and performance enhancements are usually leveraged to achieve other
From a user behavior perspective, Web analytics is one of a class of unobtrusive methods
(a.k.a., non-reactive or low-constraint). Unobtrusive methods are those that allow data collection
without directly contacting participants. The research literature specifically describes unobtrusive
approaches as those that do not require a direct response from participants [102, 112, 154]. This
data can be gathered through observation or from existing records. In contrast to unobtrusive meth-
ods, obtrusive or reactive approaches, such as questionnaires, tests, laboratory studies, and surveys,
require a direct response from participants [153]. A laboratory experiment is an example of an
extremely obtrusive method. The metaphorical line between unobtrusive and obtrusive methods is
unquestionably blurred, and instead of one thin line there is a rather large gray area. For example,
conducting a survey to gauge the reaction of users to information systems is an obtrusive method.
However, using the posted results from the survey is an unobtrusive method. Granted, this may be
making a strictly intellectual distinction, but the point is that log data falls in the gray area. In some
respects, users know that their actions are being logged or recorded on Websites. However, logging
applications are generally so unobtrusive that they fade into the background [4].
With this introduction, we now address the specific research and methodological foundations
of Web analytics. We first address the concept of transaction logs as a data collection technique
from the perspective of behaviorism, and then review the conceptualization of Web analytics as
trace data and an unobtrusive method. We present the strengths and shortcomings of the unobtru-
sive methodology approach, including benefits and shortcomings of Web analytics specifically from
the perspective of an unobtrusive method. We end with a short summary and open questions of
transaction logging as a data collection method.
The use of transaction logs for research purposes certainly falls conceptually within the confines
of the behaviorist paradigm of research. Therefore, behaviorism is the conceptual basis for Web
analytics.
Behaviorism is a research approach that emphasizes the outward behavioral aspects of thought.
Strictly speaking, behaviorism also dismisses the inward experiential and procedural aspects [137,
152]; importantly, behaviorism has been heavily criticized for this narrow viewpoint. Some of the
pioneers in the behaviorist field are shown in Figure 2.1.
For the area of Web analytics, however, we take a more open view of behaviorism. In this
more accepting view, behaviorism emphasizes observed behaviors without discounting the inner
aspects (i.e., attitudinal characteristics and context) that may accompany these outward behaviors.
This more open outlook of behaviorism supports the position that researchers can gain much from
studying expressions (i.e., behaviors) of users interacting with information systems. These expressed
behaviors may reflect aspects of the person’s inner self as well as contextual aspects of the environ-
ment within which the behavior occurs. These environmental aspects may influence behaviors while
also reflecting inner cognitive factors.
The primary proposition underlying behaviorism is that all things that people do are be-
haviors. These behaviors include utterances, actions, thoughts, and feelings. With this underlying
proposition, the behaviorist position is that all theories and models concerning people have observa-
tional correlates. Moreover, the behaviors and any proposed theoretical constructs must be mutually
Strict behaviorism would further state that there are no differences between the publicly ob-
servable behavioral processes (i.e., actions) and privately observable behavioral processes (i.e., think-
ing and feeling). Due to affective, contextual, situational, or environmental factors, however, there
may be disconnections between the cognitive and affective processes. Therefore, there are sources of
behavior both internal (i.e., cognitive, affective, and expertise) and external (i.e., environmental and
situational). Behaviorism focuses primarily on only what an observer can see or manipulate.
Behaviorism is evident in any research where the observable evidence is critical to the re-
search questions or methods, and this is especially true in any experimental research where the
“operationalization” of variables is required. A behaviorist approach, at its core, seeks to understand
events in terms of behavioral criteria [134, p. 22]. Behaviorist research demands behavioral evi-
dence, and this is particularly important to Web analytics. Within such a perspective, there is no
knowable difference between two states unless there is a demonstrable difference in the behavior
associated with each state.
Research that is grounded in behaviorism always focuses on somebody doing something in a
situation. Therefore, all derived research questions focus on who (actors), what (behaviors), when
(temporal), where (contexts), and why (cognitive). The actors in a behaviorist paradigm are people,
at whatever level of aggregation (e.g., individuals, groups, organizations, communities, nationalities,
societies), whose behavior is studied. All aspects of what the actors do are studied carefully. These
behaviors have a temporal element, and thus researchers need to study when and how long these
behaviors occur. Similarly, the behaviors occur within some context, which comprises all the environmental
and situational features in which these behaviors are embedded, and this context must be recognized
and analyzed. Finally, the cognitive aspect to these behaviors is the thought and affective processes
internal to the actors executing the behaviors.
From this research perspective, each of these aspects (i.e., actor, behaviors, temporal, context,
and cognitive) is a behaviorist construct. However, for Web analytics, we are primarily concerned
with defining what a behavior is.
Defining a behavior is not as straightforward as it may seem at first glance, yet defining a behavior
is critical for Web analytics. In research, a variable represents a set of events where each event may
have a different value. In Web analytics, session duration or number of clicks may be variables that
interest a researcher. The particular variables that a researcher is interested in stem from the research
questions driving the study.
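To illustrate, the two variables just mentioned can be derived from logged events as follows. This is a
minimal sketch: the event tuples and action names are hypothetical, and session duration is assumed to be
the time between the first and last logged event of a session.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical logged events: (session_id, ISO timestamp, action).
events = [
    ("s1", "2011-03-01T10:00:00", "query"),
    ("s1", "2011-03-01T10:00:40", "click"),
    ("s1", "2011-03-01T10:02:10", "click"),
    ("s2", "2011-03-01T11:15:00", "query"),
]

def session_variables(events):
    """Return the session duration (seconds) and click count for each session."""
    times = defaultdict(list)
    clicks = defaultdict(int)
    for session_id, stamp, action in events:
        times[session_id].append(datetime.fromisoformat(stamp))
        if action == "click":
            clicks[session_id] += 1
    return {
        sid: {
            "duration_seconds": (max(ts) - min(ts)).total_seconds(),
            "clicks": clicks[sid],
        }
        for sid, ts in times.items()
    }

variables = session_variables(events)
print(variables)
```

Note that a session with a single logged event receives a duration of zero under this definition, because
the log holds no later event from which to infer how long the user stayed.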
We can define variables by their use in a research study (e.g., independent, dependent, extra-
neous, controlled, constant, and confounding) and by their nature. Defined by their nature, there are
three types of variables: environmental (i.e., events of the situation, environment, or context), subject
(i.e., events or aspects of the subject being studied), and behavioral (i.e., observable events of the
subject of interest).
For Web analytics, behavior is the essential construct of the behaviorist paradigm. At its
most basic, a behavior is an observable activity of a person, animal, team, organization, or system.
Like many basic constructs, behavior is an overloaded term, as it also refers to the aggregate set of
responses to both internal and external stimuli. Therefore, behaviors can also address a spectrum of
actions. Because of its many associations, it is difficult to characterize a word like behavior without
specifying a context in which it takes place to provide the necessary meaning.
However, one can generally classify behaviors into three general categories:
In some manner, the researcher must observe these behaviors. In other words, the researcher
must study and gather information on a behavior concerning what the actor does. Classically, obser-
vation is visual, where the researcher uses his/her own eyes, but recording devices, such as a camera,
can assist in the observation. Technology has extended the concept of observation to include other
recording devices. For Web analytics, we extend the notion of observation to include logging software.
Logging software is nearly invisible to many users; thus, it allows for a more objective
measure of true user behavior. Web analytics focuses on descriptive observation and logging the
behaviors, as they would occur in a user–system interaction episode.
When studying behavioral patterns with Web analytics and other similar approaches, re-
searchers often use ethograms. An ethogram is a taxonomy or index of the behavioral patterns that
details the different forms of behavior that a particular user exhibits. In most cases it is desirable to
create an ethogram in which the categories of behavior are objective and discrete, not overlapping
with each other. In an ethogram, the definitions of each behavior and category of behaviors should
be clear, detailed, and distinguishable from each other. Ethograms can be as specific or general as
the study or investigation warrants.

Table 2.1. Ethogram of user behaviors interacting with a Web browser during a searching session.

Behavior: Description
View results: Behavior in which the user viewed or scrolled one or more pages from the results listing.
    If a results page was present and the user did not scroll, we counted this as a View Results Page.
but No Results in Window: User was looking for results, but there were no results in the listing.
Selection: Behavior in which the user makes a selection in the results listing.
Click URL (in results listing): Interaction in which the user clicked on a URL of one of the results
    in the results page.
Next in Set of Results List: User moved to the Next results page.
Previous in Set of Results List: User moved to the Previous results page.
Execute Query: Behavior in which the user entered, modified, or submitted a query without visibly
    incorporating assistance from the system. This category includes submitting the original query,
    which was always the first interaction with the system.
Find Feature in Document: Behavior in which the user used the FIND feature of the browser.
Create Favorites Folder: Behavior in which the user created a folder to store relevant URLs.
Switch/Close browser window: User switched between two open browsers or closed a browser window.
Copy–Paste: User copy–pasted all of, a portion of, or the URL to a relevant document.
View/Implement assistance: Behavior in which the user viewed the assistance offered by the application.
Implement Assistance: Behavior in which the user entered, modified, or submitted a query, utilizing
    assistance offered by the application.

Spink and Jansen [140] and Jansen and Pooch [69] outline some of the key behaviors for
SLA, a specific form of Web analytics. Hargittai [52] and Jansen and McNeese [67] present ex-
amples of detailed classifications of behaviors during Web searching. As an example, Table 2.1
presents an ethogram of user behaviors interacting with a Web browser during a searching session
employed in the study.
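One way to keep ethogram categories discrete when coding log data is to represent them as an enumeration
and to map every raw logged action to exactly one category. The sketch below uses a few category names
from Table 2.1; the raw action names, the mapping, and the code_session helper are hypothetical and are not
taken from the cited studies.

```python
from collections import Counter
from enum import Enum

class Behavior(Enum):
    """A subset of the ethogram categories listed in Table 2.1."""
    VIEW_RESULTS = "View results"
    CLICK_URL = "Click URL (in results listing)"
    NEXT_RESULTS = "Next in Set of Results List"
    PREVIOUS_RESULTS = "Previous in Set of Results List"
    EXECUTE_QUERY = "Execute Query"
    IMPLEMENT_ASSISTANCE = "Implement Assistance"

# Hypothetical mapping from raw logged action names to ethogram categories.
# A dictionary gives each action exactly one category, so categories cannot overlap.
ACTION_TO_BEHAVIOR = {
    "submit_query": Behavior.EXECUTE_QUERY,
    "results_shown": Behavior.VIEW_RESULTS,
    "result_click": Behavior.CLICK_URL,
    "next_page": Behavior.NEXT_RESULTS,
    "prev_page": Behavior.PREVIOUS_RESULTS,
    "accept_suggestion": Behavior.IMPLEMENT_ASSISTANCE,
}

def code_session(actions):
    """Translate raw actions into ethogram categories; unknown actions raise KeyError."""
    return [ACTION_TO_BEHAVIOR[a] for a in actions]

session = ["submit_query", "results_shown", "next_page", "results_shown", "result_click"]
coded = code_session(session)
tally = Counter(coded)
print(tally[Behavior.VIEW_RESULTS])
```

Raising an error on an unmapped action, rather than silently skipping it, forces the researcher to decide
how each newly observed action fits the ethogram.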
There are many ways to observe behaviors. In TLA, we are primarily concerned with observ-
ing and recording these behaviors in a file, and we then can view the recorded fields as trace data.
Trace data are the physical remains of interaction [154, pp. 35–52]. These remains can be intentional
(e.g., notes in a diary or initials on a cave wall) or accidental (e.g., footprints in the mud or wear on a carpet). However,
trace data can also be collected through third-party logging applications. In TLA, we are primarily interested
in this data from third-party logging.
Many researchers use physical or, as in the case of Web analytics, virtual traces as indicators
of behavior. These behaviors are the facts or data that researchers use to describe or make inferences
about events concerning the actors. Researchers [154] classify trace data into two general types: ero-
sion and accretion. Erosion is the wearing away of material leaving a trace. Accretion is the buildup
of material, making a trace. Both erosion and accretion have several subcategories. In TLA, we are
primarily concerned with accretion trace data.
Trace data (a.k.a., trace measures) offer a sharp contrast to data collected directly. The great-
est strength of trace data is that it is unobtrusive, meaning the collection of the data does not inter-
fere with the natural flow of behavior and events in the given context. Since the data is not directly
collected, there is no observer present where the behaviors occur to affect the participants’ actions,
and thus, the researcher is getting data that reflects natural behaviors. Trace data is unique; as un-
obtrusive and nonreactive data, it can make a very valuable research contribution. In the past, trace
data was often time consuming to gather and process, making such data costly. With the advent of
transaction logging software, trace data for the studying of behaviors of users and systems in Web
analytics is much cheaper to collect, and consequently, Web analytics and related fields of study have
really taken off.
Interestingly, in the physical world, erosion data is what typically reveals usage patterns (i.e.,
trails worn in the woods, footprints in the snow, fingerprints on a book cover). However, with Web
analytics, logged accretion data indicates the usage patterns (i.e., access to a Website, submission
of queries, Webpages viewed). Specifically, transaction logs are a form of controlled accretion data,
where the researcher or some other entity alters the environment in order to create the accretion
data [154, pp. 35–52]. With a variety of tracking applications, the Web is a natural environment for
controlled accretion data collection. With the use of client apps (such as desktop search bars and
whatnot), the collection of data is nearly unlimited from a technology perspective.
Like all data collection methods, trace data for studying users and systems has strengths and
limitations. Certainly, trace data are valuable for understanding behavior (i.e., behavioral actions)
in naturalistic environments and may offer insights into human activity obtainable in no other way.
For example, data from Web transaction logs is on a scale available in few other places. However,
one must interpret trace data carefully and with a fair amount of caution because trace data can
be incomplete or even misleading. For example, with the data in transaction logs the researcher
can say a given number of Website users only looked at the Website’s homepage and then left
(a.k.a., homepage bounce rate). However, using trace data alone the researcher could not conclude
whether the users left because they found what they were looking for, were frustrated because they
could not find what they were looking for, or were in the wrong place to begin with. However,
with some experimental data one could make some reasonable assumptions concerning this user
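The homepage bounce rate mentioned above can be sketched as follows. The session data and the
homepage_bounce_rate helper are hypothetical, and the sketch assumes the rate is taken over sessions that
entered on the homepage; other denominators (for example, all sessions) are also in use.

```python
def homepage_bounce_rate(sessions, homepage="/"):
    """Share of sessions entering on the homepage that viewed only that page."""
    entered = [pages for pages in sessions.values() if pages and pages[0] == homepage]
    if not entered:
        return 0.0
    bounced = sum(1 for pages in entered if len(pages) == 1)
    return bounced / len(entered)

# Hypothetical sessions: session id -> ordered list of pages viewed.
sessions = {
    "s1": ["/"],
    "s2": ["/", "/rates"],
    "s3": ["/"],
    "s4": ["/contact"],
}
rate = homepage_bounce_rate(sessions)
print(round(rate, 3))
```

The number says only that two of the three homepage entrants left after one page; as noted above, the
trace data alone cannot say whether they left satisfied, frustrated, or misdirected.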
Research using trace data from transaction logs should be analyzed based on the same criteria
as all research data and methods. These criteria are credibility, validity, and reliability.
Credibility concerns how trustworthy or believable the data collection method is. The re-
searcher must make the case that the data collection approach records the data needed to address
the underlying research questions.
Validity addresses whether the measurement actually measures what it is supposed to mea-
sure. There are three kinds of validity:
• Face or internal validity: the extent to which the contents of the test, method, analysis, or
procedure that the researcher is employing measure what they are supposed to measure.
• Content or construct validity: the extent to which the content of the test, method, analy-
sis, or procedure adequately represents all that is required for validity of the test, method,
analysis, or procedure (i.e., are you collecting and accounting for all that you should collect
and account for).
• External validity: the extent to which one can generalize the research results across popu-
lations, situations, environments, and contexts of the test, method, analysis, or procedure.
In inferential or predictive research, one must also be concerned with statistical validity (i.e.,
the degree of strength of the independent and dependent variable relationships). Statistical validity
is actually an important aspect for Web analytics, given the needed ties between data collected and
Reliability is a term used to describe the stability of the measurement. Essentially, reliability
addresses whether the measurement assesses the same thing, in the same way, in repeated tests.
Researchers must always address the issues of credibility, validity, and reliability. Leveraging
the work of Holst [58], the researcher must address six questions in every Web analytics research
project that uses trace data from transaction logs.
• Which data are analyzed? The researcher must clearly communicate in a precise manner
both the format and content of recorded trace data. With transaction log software, this is
much easier than in other forms of trace data, as logging applications can be reverse engi-
neered to articulate exactly what behavioral data is recorded.
• How is this data defined? The researcher must clearly define each trace measure in a man-
ner that permits replication of the research on other systems and with other users. As TLA
has proliferated in a variety of venues, more precise definitions of measures are developing
[114, 151, 158].
• What is the population from which the researcher has drawn the data? The researcher
must be cognizant of the actors, both people and systems, that created the trace data. With
transaction logs on the Web, this is sometimes a difficult issue to address directly, unless the
system requires some type of logon and these profiles are then available. In the absence of
these profiles, the researcher must rely on demographic surveys, studies of the system’s user
population, or general Web demographics.
• What is the context in which the researcher analyzed the data? It is important for the
researcher to explain clearly the environmental, situational, and contextual factors under
which the trace data was recorded. With transaction log data, this includes providing com-
plete information about the temporal factors of the data collection (i.e., the date and time
the data was recorded) and the make-up of the system at the time of the data recording, as
system features undergo continual change. Transaction logs have the significant advantage
of time sampling of trace data. In time sampling, the researcher can make the observations
at predefined points of time (e.g., every 5 minutes, every second), and then record the ac-
tion that is taking place, using the classification of action defined in the ethogram.
• What are the boundaries of the analysis? Research using trace data from transaction logs
is tricky, and the researcher must be careful not to overreach with the research questions
and findings. The implications of the research are confined by the data and the method
of the data collected. For example, with transaction log data we can rather clearly state
whether or not a user clicked on a link. However, transaction log trace data itself will not
inform us as to why the user clicked on a link. Was it intentional? Was it a mistake? Did
the user become sidetracked?
• What is the target of the inferences? The researcher must clearly articulate the relation-
ship among the separate measures in the trace data either to inform descriptively or in order
to make inferences. Trace data can be used for both descriptive research to improve our
understanding and predictive research in terms of making inferences. These descriptions
and inferences can be at any level of granularity (i.e., individual, collection of individuals,
organization, etc.). However, Hilbert and Redmiles [55] point out, based on their experi-
ences, that transaction log data is best used for aggregate level analysis. I disagree with this
position. With enough data at the individual level, one can tell a lot from log data.
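The time sampling mentioned under the context question above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical list of timestamped actions (seconds since the start of a session) and hypothetical action labels standing in for an ethogram; it records which action was the most recent one at each predefined sample point.

```python
from bisect import bisect_right

# Hypothetical trace: (seconds since session start, action label).
events = [
    (0, "submit query"),
    (40, "view results"),
    (130, "click result"),
    (400, "submit query"),
]

def time_sample(log, interval, end):
    """Return the most recent action at each sample point 0, interval, ..., end."""
    times = [t for t, _ in log]
    samples = []
    for point in range(0, end + 1, interval):
        idx = bisect_right(times, point) - 1  # last event at or before the point
        samples.append((point, log[idx][1] if idx >= 0 else None))
    return samples

# Sample every 5 minutes (300 seconds) over a 10-minute window.
for point, action in time_sample(events, interval=300, end=600):
    print(point, action)
```

The sampling interval is a research design choice; a shorter interval captures more actions at the cost of more observations to classify.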
If the researcher addresses each of the six questions, transaction logs are an excellent way to
collect trace data on users of Web and other information systems. The researcher then examines this
data using TLA. The use of trace data to understand behaviors makes transaction logs and
transaction log analysis an unobtrusive research method.
observer bias in the data collection. However, as with other methods, trace data has no effect on the
observer bias in interpreting the results from data analysis.
Given the justifications for using unobtrusive methods, we will now turn our attention to
three types of unobtrusive measurement that are applicable to Web analytics, namely indirect analysis,
content analysis, and secondary analysis. Web analytics is an indirect analysis method. The researcher
is able to collect the data without introducing any formal measurement procedure. In this
regard, TLA typically focuses on the interaction behaviors occurring among the users, system, and
information. There are several examples of utilizing transaction log analysis as an indirect approach [cf.
Refs. 2, 15, 32, 57].
Content analysis is the analysis of text documents. The analysis can be quantitative, quali-
tative, or a mixed methods approach. Typically, the major purpose of content analysis is to iden-
tify patterns in text. Content analysis has the advantage of being unobtrusive and, depending on
whether automated methods exist, can be a relatively rapid method for analyzing large amounts of
text. In Web analytics, content analysis typically focuses on search queries or analysis of retrieved
results. A variety of examples are available in this area of transaction log research [cf. Refs. 7, 16,
51, 151, 158].
Secondary data analysis, like content analysis, makes use of already existing sources of data.
However, secondary analysis typically refers to the re-analysis of quantitative data rather than text.
Secondary data analysis uses data that was collected by others to address different research questions
or to use different methods of analysis than was originally intended during data collection. For ex-
ample, Websites commonly collect transaction log data for system performance analysis. However,
researchers can also use this data to address other questions. Several transaction log studies have
focused on this aspect of research [21, 22, 29, 30, 34, 77, 107, 129].
As a secondary analysis method, Web analytics has several advantages. First, it is efficient in
that it makes use of data collected by a Website application. Second, it often allows the researcher to
extend the scope of the study considerably by providing access to a potentially large sample of users
over a significant duration [81]. Third, since the data is already collected, using existing
transaction log data is cheaper than collecting primary data.
However, the use of secondary analysis is not without difficulties. First, secondary data is
frequently not trivial to prepare, clean, and analyze [66], especially large transaction logs. Second,
researchers must often make assumptions about how the data was collected because third parties
developed the logging applications. A third and perhaps more perplexing difficulty concerns the
ethics of using transaction logs as secondary data. By definition, the researcher is using the data in a
manner that may violate the privacy of the system users [53]. In fact, some critics point to a grow-
ing concern for unobtrusive methods due to increased sensitivity toward the ethics involved in such
research [112]. Log data may be unobtrusive, but it can certainly be quite invasive.
• Scale: Transaction log applications can collect data to a degree that overcomes the critical
limiting factor in laboratory user studies. User studies in laboratories are typically restricted
in terms of sample size, location, scope, and duration.
• Power: The sample size of transaction log data can be quite large, so inference testing can
highlight statistically significant relationships. Interestingly, the amount of data in
transaction logs from the Web is sometimes so large that nearly every relation is significantly
correlated. Because of this large statistical power, researchers must also account for effect size.
• Scope: Since transaction log data is collected in natural contexts, researchers can investi-
gate the entire range of user–system interactions or system functionality in a multi-variable
• Location: Transaction log data can be collected in naturalistic, distributed environments.
Therefore, users do not have to be in an artificial laboratory setting.
• Duration: Since there is no need for recruiting specific participants for a user study, trans-
action log data can be collected over an extended period.
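The point made under Power above can be illustrated numerically. The sketch below is an assumption-laden example, not a prescribed procedure: it uses the standard t statistic for a Pearson correlation with a normal approximation (reasonable for large samples) to show that the same tiny correlation is non-significant at one sample size and highly significant at another, while the effect size (the share of variance explained) stays negligible.

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation r observed on n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation
    to the t distribution, which is adequate when n is large.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

def shared_variance(r: float) -> float:
    """Effect size expressed as the proportion of variance explained."""
    return r * r

r = 0.01  # a very weak relationship
for n in (1_000, 1_000_000):
    p = correlation_p_value(r, n)
    print(f"n={n:>9,}  r={r}  p={p:.3g}  variance explained={shared_variance(r):.4%}")
```

With a million log records the weak correlation passes a conventional significance test, yet it explains only a hundredth of a percent of the variance, which is why significance alone is a poor guide with large transaction logs.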
All methods of data collection have strengths not available with other methods, but they also
have inherent limitations. Transaction logs have several shortcomings. First, transaction log data
is not nearly as versatile as primary data because the data may not have been collected with
the particular research questions in mind. Second, transaction log data is not as rich as data from some other
collection methods and is therefore not suited to investigating the full range of concepts some
researchers may want to study. Third, the fields that the transaction log application records are often
only loosely linked to the concepts they are alleged to measure. Fourth, with transaction logs
the users may be aware that they are being recorded and may alter their actions. Therefore, the user
behaviors may not be altogether natural.
Given the inherent limitations in the method of data collection, Web analytics also suffers
from shortcomings derived from the characteristics of the data collection. Hilbert and Redmiles
[56] maintain that all research methods suffer from some combination of abstraction, selection,
reduction, context, and evolution problems that limit scalability and quality of results. Web analytics
suffers from these same five shortcomings.
• Reduction problem—how does one reduce the complexity and size of the data set before
reporting and analysis?
• Context problem—how does one interpret the significance of events or states within state
• Evolution problem—how can one alter data collection applications without impacting
application deployment or use?
Because each method has its own combination of abstraction, selection, reduction, context,
and evolution problems, astute researchers will employ complementary methods of data collection
and analysis. This is similar to the conflict inherent in any overall research approach. Each research
method for data collection tries to maximize three desirable criteria: generalizability (i.e., the degree
to which the data applies to overall populations), precision (i.e., the degree of granularity of the mea-
surement), and realism (i.e., the relation between the context in which evidence is gathered relative
to the contexts to which the evidence is to be applied). Although the researcher always wants to
maximize all three of these criteria simultaneously, in reality it cannot be done. This is one funda-
mental dilemma of the research process. The very things that increase one of these three features
will reduce one or both of the others.
Recordings of behaviors via transaction log applications on the Web open a new era for researchers
by making large amounts of trace data available for use. The online behaviors and interactions
among users, systems, and information create digital traces that permit collection and analysis of
this data. Logging applications provide data obtained through unobtrusive methods, and impor-
tantly, these collections are substantially larger than any data set obtained via surveys or laboratory
studies. As noted earlier, these applications allow the data to be collected in naturalistic settings
with little to no impact by the observer. Researchers can use these digital traces to analyze a nearly
endless array of behavior topics.
Web analytics is a behaviorist research method, with a natural reliance on the expressions of
interactions as behaviors. The transaction log application records these interactions, creating a type
of trace data. As a reminder, trace data in transaction logs are records of interactions as people use
these systems to locate information, navigate Websites, and execute services. The data in transaction
logs is a record of user–system, user–information, or system–information interactions. Moreover,
transaction logs provide an unobtrusive method of collecting data on a scale well beyond what one
could collect in confined laboratory studies. Figure 2.2 provides a recap of the foundation of Web
The massively increased availability of Web trace data has sparked concern over the ethical
aspects of using unobtrusively obtained data from transaction logs. For example, who does the
trace data belong to—the user, the Website that logged the data, or the public domain? How does
(or should) one seek consent to use such data? If researchers do seek consent, from whom does the
researcher seek it? Is it realistic to require informed consent for unobtrusively collected data? These
are open questions.
• • • •
There have been an increasing number of review articles on Web analytics research in academia.
In one of the first, Jansen and Pooch [69] review Web transaction log research on Web
search engines and individual Websites through 2000, focusing on query analysis. After reviewing
studies conducted between 1995 and 2000, Hsieh-Yee [59] reports that many studies investigate the
effects of certain factors on Web search behavior, including information organization and presenta-
tion, type of search task, Web experience, cognitive abilities, and affective states. Hsieh-Yee [59]
also notes that many studies lack external validity.
Bar-Ilan [13] presents an extensive and integrative overview of Web search engines and the
use of Web search engines in information science research. Bar-Ilan [13] provides a variety of per-
spectives including user studies, social aspects, Web structure, and search-engine evaluation.
Two excellent historical reviews are Penniman [115, 116], who examines log research from
the very beginning as a participant/observer, and Markey [98, 99], who reviews twenty-five years of
academic research in the area.
Given the availability of these comprehensive reviews, we will touch on some of the previous
work simply to identify the overall trends and to provide historical insight for Web analytics today.
Web analytics studies fall into three categories: (1) those that primarily use transaction-log analysis,
(2) those that incorporate users in a laboratory survey or other experimental setting, and (3) those
that examine issues related to or affecting Web searching.
Web-based system to prepare for U.S. college admissions tests. The researchers noted several non-
optimal behaviors, including a tendency toward deferring study and a preference for short-answer
verbal questions. The researchers discussed the relevance of their findings for online learning.
Wen et al. [156] investigated the use of click-through data to cluster queries for question
answering on a Web-based version of the Encarta encyclopedia. The researchers explored the simi-
larity between two queries using the common user-selected documents between them. The results
indicate that a combination of both keywords and user logs is better than using either method alone.
Using a Lucent proxy server, Hansen and Shriver [50] used transaction-log analysis to cluster search
sessions and to identify highly relevant Web documents for each query cluster.
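The idea of comparing two queries by the documents their users selected can be sketched as follows. This is only an illustration: it uses a simple Jaccard overlap, which is an assumption on our part and not necessarily the similarity measure used by Wen et al. [156], and the click-through records are hypothetical.

```python
# Hypothetical click-through records: (query, clicked document id).
clicks = [
    ("atomic bomb", "doc_manhattan"),
    ("atomic bomb", "doc_hiroshima"),
    ("manhattan project", "doc_manhattan"),
    ("manhattan project", "doc_hiroshima"),
    ("manhattan project", "doc_oppenheimer"),
    ("new york boroughs", "doc_nyc"),
]

def clicked_documents(records):
    """Group the clicked document ids by query."""
    docs = {}
    for query, doc in records:
        docs.setdefault(query, set()).add(doc)
    return docs

def click_similarity(docs_a, docs_b):
    """Jaccard overlap of two sets of clicked documents (0.0 to 1.0)."""
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

docs = clicked_documents(clicks)
print(click_similarity(docs["atomic bomb"], docs["manhattan project"]))
print(click_similarity(docs["atomic bomb"], docs["new york boroughs"]))
```

The two queries share no keywords, yet their overlapping clicks mark them as related, which is the intuition behind combining keyword and log evidence.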
Collectively, these studies provide better descriptions of user behaviors and help to refine
transaction log research for Web log analysis of searching and single Websites from an academic
underlying the inconsistent performance of automatic topic identification with statistical analysis
and experimental design techniques. Xie and O’Hallaron [160] investigated caching to reduce both
server load and user-response time in distributed systems by analyzing a transaction log from the
Vivisimo search engine, from 14 January to 17 February 2001. The researchers report that queries
have significant locality, with query frequency following a Zipf distribution. Lempel and Moran
[92] also investigated clustering to improve caching of search engine results using more than seven
million queries submitted to AltaVista. The researchers report that pre-fetching of search engine
results can increase cache–hit ratios by 50 percent for large caches and can double the hit ratios of
small caches. There is much ongoing work in the area of using logs for search engine and server
caching [10].
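A Zipf-like query distribution, as reported above, appears as a roughly straight line when the logarithm of query frequency is plotted against the logarithm of rank. The following sketch estimates that slope by ordinary least squares; the query log is synthetic (an assumption made purely for illustration), and the function name is hypothetical.

```python
import math
from collections import Counter

def zipf_slope(queries):
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(queries).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic log (an assumption for illustration): the query at rank i
# is repeated about 1000 / i times, an idealized Zipf pattern.
synthetic_queries = []
for rank in range(1, 51):
    synthetic_queries.extend([f"query_{rank}"] * round(1000 / rank))

print(f"estimated slope: {zipf_slope(synthetic_queries):.2f}")
```

A slope close to -1 on this synthetic data reflects the idealized pattern; real logs will deviate, and the heavy concentration of traffic in a few popular queries is what makes result caching attractive.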
In what appears currently to be one of the longest temporal studies, Wang et al. [151] ana-
lyzed 541,920 user queries submitted to an academic-Website search engine during a four-year
period (May 1997 to May 2001). Conducting analysis at the query and term levels, the researchers
report that 38% of all queries contained only one term and that most queries are unique. Eiron and
McCurley [38] used 448,460 distinct queries from an IBM Intranet search engine to analyze the
effectiveness of anchor text.
Pu [122] explored the behavior of users searching on two Taiwanese Web search
engines, Dreamer and Global Area Information Servers (GAIS). The average length of English
queries on these two Web search engines was 1.0 term for Dreamer and 1.22 terms for GAIS.
Baeza-Yates and Castillo [9] examined approximately 730,000 queries from TodoCL, a Chilean
search system. They found that queries had an average length of 2.43 terms. A lengthier analysis is
presented in Baeza-Yates and Castillo [8]. Montgomery and Faloutsos [107] analyzed more than
20,000 Internet users who accessed the Web from July 1997 through December 1999 using data
provided by Jupiter Media Metrix. The researchers report that users
revisited 54 percent of URLs at least once during a searching session.
They also report that browsing patterns follow a power law and the patterns remained stable
throughout the period of analysis. Rieh and Xu [127] analyzed queries from 1,451,033 users of
Excite collected on 9 October 2000. The researchers examined how each user reformulated his or
her Web query over a 24-hour period. Out of the 1,451,033 user logs collected, the researchers
used various criteria to select 183 sessions for manual analysis. The results show that while most
query reformulation involves content changes, about 15% of the reformulations relate to format
Huang et al. [60] propose an effective term-suggestion approach for interactive Web search
using more than two million queries submitted to Web search engines in Taiwan. The researchers
propose a transaction log approach to relevant term extraction and term suggestion using relevant
terms that co-occur in similar query sessions.
Jansen and Spink [70] determined from an analysis of click-through data that the typical Web
searching session was about 15 minutes. The researchers report that the Web
search engine users on average view about eight Web documents, with more than 66% of searchers
examining fewer than five documents in a given session. Users on average view about two to three
documents per query. More than 55% of Web users view only one result per query. Twenty percent
of the Web users view a Web document for less than a minute. These results would seem to indicate
that the initial impression of a Web document is extremely important to the user’s perception of
Beitzel et al. [15] examine hundreds of millions of queries submitted by approximately 50
million users to America Online (AOL) over a 7-day period from 26 December 2003 through 1
January 2004. During this period, AOL used results provided by Google. The researchers report that
only about 2% of the queries contain query operators. The average query length is 2.2 terms, and
81% of users view only one results page. The researchers report changes in popularity and unique-
ness of topically categorized queries across hours of the day. Park, Bae, and Lee (Forthcoming)
analyzed transaction logs of NAVER, a Korean Web search engine and directory service. The data
was collected over a one-week period, from 5 January to 11 January 2003, and contained 22,562,531
sessions and 40,746,173 queries. Users of NAVER submit queries with few query terms, seldom
use advanced features, and view few results pages. Users of NAVER had an average session length of
1.8 queries. Wolfram et al. [159] analyze session clusters from three different search environments.
Web analytics is also entering a variety of areas, including keyword advertising and sponsored
search [65].
Clearly, these research projects provide valuable information for understanding and perhaps
improving user–system and system–information interactions.
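The descriptive figures reported in the studies above (average query length, share of one-term queries, queries per session) are simple to compute once a log is in hand. The following is a minimal sketch over a hypothetical six-record log; the record layout and the whitespace-based definition of a term are assumptions, and real studies define terms and sessions more carefully.

```python
from collections import defaultdict

# Hypothetical log records: (session id, query string).
log = [
    ("s1", "weather"),
    ("s1", "weather boston"),
    ("s2", "cheap flights to paris"),
    ("s3", "news"),
    ("s3", "local news"),
    ("s3", "local news today"),
]

def query_statistics(records):
    """Compute basic query-level and session-level descriptive statistics."""
    lengths = [len(query.split()) for _, query in records]  # terms per query
    queries_in_session = defaultdict(int)
    for session_id, _ in records:
        queries_in_session[session_id] += 1
    return {
        "mean_terms_per_query": sum(lengths) / len(lengths),
        "share_one_term_queries": sum(1 for n in lengths if n == 1) / len(lengths),
        "mean_queries_per_session": len(records) / len(queries_in_session),
    }

for name, value in query_statistics(log).items():
    print(f"{name}: {value:.2f}")
```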
What does a historical review of transaction log analysis tell us about the current and possible
future state of Web analytics? In one of the earliest studies employing transaction logs, Penniman
[116, p. 159] stated, “The promise (of transaction logs) is unlimited for evaluating communicative
behavior where human and computer interact to exchange information.” Since the mid-1960s, we
have seen the use of transaction logs evolve from an almost purely descriptive approach focusing
primarily on system effectiveness to one focusing on the combined aspects of both the user and
the system. Today, we see these tools being leveraged for more insightful and predictive aspects of the
user–system interaction. Combined with associated research methods, transaction logs have served
a vital function in understanding users and systems.
• • • •
As the previous brief review of research demonstrates, data for Web analytics is plentiful. How the
data is collected, however, is important. There is a proliferation of techniques (e.g., performance
monitors, Web server log files, cookies, and packet sniffing), but the most common individual tech-
niques generally fall into one of two major approaches for collecting data for Web analysis: log files
and page tagging [80]. Most current Web analytic companies use a combination of the two methods
for collecting data. Therefore, anyone interested in Web analytics needs to understand the strengths
and weaknesses of each.
still producing a log file readable by most Web analytics tools. The extended format contains user
defined fields and identifiers followed by the actual entries, and default values are represented by a
dash (-). Table 4.2 shows an example of an extended log file.
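To make the structure of an extended log file concrete, the sketch below reads a small sample in the W3C extended log file style: directive lines begin with "#", the "#Fields:" directive names the columns, and a dash stands for a value that is absent. The sample entries, IP address, and URLs are invented for illustration, and the parser is a simplified assumption that splits on whitespace rather than a complete implementation of the format.

```python
# Invented sample in the W3C extended log file style.
sample_log = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(Referer)
2009-01-15 10:02:11 192.0.2.10 GET /index.html 200 -
2009-01-15 10:02:15 192.0.2.10 GET /missing.html 404 http://example.com/index.html
"""

def parse_extended_log(text):
    """Parse entries into dictionaries keyed by the names in #Fields."""
    fields = []
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the #Fields directive is needed to label the columns.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # A dash marks a missing value, so store it as None.
        entries.append(
            {name: (None if value == "-" else value)
             for name, value in zip(fields, values)}
        )
    return entries

for entry in parse_extended_log(sample_log):
    print(entry["cs-uri-stem"], entry["sc-status"], entry["cs(Referer)"])
```

Because the second entry carries status 404, the sample also shows how a server-side log can capture a failed page request.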
System log files offer several benefits for gathering data for analysis. First, using system log
files does not require any changes to the Website or any extra software installation to create the log
files. Second, because Web servers automatically create these logs and store them on a company’s
own servers, the company is free to change its Web analytics tools and strategies at will.
Additionally, using system log files does not require any extra bandwidth when loading a page, and since
everything is recorded server-side, it is possible to log both page request successes and failures.
Using log files also has some disadvantages. One major disadvantage is that the collected data
is limited to transactions with the Web server. This means that log files cannot record information that is
independent of the server, such as the physical location of the visitor. Similarly, while it is possible
to log cookies, the server must be specifically configured to assign cookies to visitors in order to do
so. The final disadvantage is that while it is useful to have all the information stored on a company’s
own servers, the log file method is only available to those who own their Web servers.
TABLE 4.3: Web server log files versus page tagging [19].
Log files: advantages | Log files: disadvantages | Page tagging: advantages | Page tagging: disadvantages
Does not require changes to the Website or extra hardware | Can only record interactions with the Web server | Near real-time reporting | Requires extra code added to the Website
Does not require extra bandwidth | Server must be configured to assign cookies to visitors | Easier to record additional information | Uses extra bandwidth each time the page loads
Freedom to change tools with a relatively small amount of hassle | Only available to companies who run their own Web servers | Able to capture visitor interactions within Flash animations | Can only record successful page loads, not failures
only capable of recording page loads, not page failures. If a page fails to load, it means that the tag-
ging code also did not load, and there is therefore no way to retrieve information in that instance.
Although log files and page tagging are two distinct ways to collect information about the
Website users or visitors, it is possible to use both together, and many analytics companies provide
ways to use both methods to gather data. Even so, it is important to understand the strengths and
weaknesses of both. Table 4.3 presents the advantages and disadvantages of log file analysis and
page tagging.
Regardless of whether log files or page tagging is used (or new approaches that may be developed),
the data will eventually end up in a log file for analysis. In other words, while the data collection may
differ, the method of analysis remains the same.
• • • •
To understand and derive the benefits of Web analysis, one must first understand metrics, the different
kinds of measures available for analyzing user information [19, 111]. Although metrics may
seem basic, once they are collected we can use them to analyze Web traffic and improve a Website
to better meet the expectations of the site’s visitors. These metrics generally fall into one of four
categories: site usage, referrers (or how visitors arrived at the site), site content analysis, and quality
assurance. Table 5.1 shows examples of types of metrics that we might find in these categories.
Although the type and overall number of metrics varies with different analytics vendors, a set
of basic metrics is common to most. Table 5.2 outlines eight widespread types of information [63]
that measure who is visiting a Website and what they do during their visits, relating each of these
metrics to specific categories.
Each metric is discussed below.
Metric | Description | Category
Demographics and System Statistics | The physical location and information of the system used to access the Website | Site Usage
Internal Search Information | Information on keywords and results pages viewed using a search engine embedded in the Website | Site Usage
Referring URL and Keyword Analysis | Which sites have directed traffic to the Website and which keywords visitors are using to find the Website | Referrers
Top Pages | The pages that receive the most traffic | Site Content
Visit Length | The total amount of time a visitor spends on the Website | Site Usage
Visitor Path Analysis | The route a visitor uses to navigate through the Website | Site Content
Visitor Type | Who is accessing the Website (returning, unique, etc.) | Site Usage
measured easily. Since there is less uncertainty with visits, it is considered to be a more concrete and
reliable metric than unique visitors. This approach is also more sales-oriented because it considers
each visit an opportunity to convert a visitor into a customer instead of looking at overall customer
behavior [17].
The goal of measuring the data in this way is to keep the percentage of visitors who stay on
the Website for less than five seconds as low as possible. If visitors stay on a Website for such a short
amount of time, either they arrived at the site by accident or the site did not have relevant informa-
tion. By combining this information with information from referrers and keyword analysis, we can
determine which sites are referring well-targeted traffic and which sites are referring poor-quality traffic.
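Combining visit length with referrer data, as described above, amounts to computing the share of very short visits for each referring site. The sketch below assumes hypothetical visit records and referrer names; only the five-second threshold comes from the text.

```python
from collections import defaultdict

SHORT_VISIT_SECONDS = 5  # threshold taken from the discussion above

# Hypothetical visit records: (referring site, visit length in seconds).
visits = [
    ("search.example", 2),
    ("search.example", 240),
    ("search.example", 95),
    ("ads.example", 1),
    ("ads.example", 3),
    ("ads.example", 4),
    ("ads.example", 60),
]

def short_visit_share(records, threshold=SHORT_VISIT_SECONDS):
    """Share of visits shorter than the threshold, per referring site."""
    total = defaultdict(int)
    short = defaultdict(int)
    for referrer, seconds in records:
        total[referrer] += 1
        if seconds < threshold:
            short[referrer] += 1
    return {referrer: short[referrer] / total[referrer] for referrer in total}

for referrer, share in short_visit_share(visits).items():
    print(f"{referrer}: {share:.0%} of visits under {SHORT_VISIT_SECONDS} seconds")
```

In this invented data, the referrer with the higher share of short visits would be the candidate source of poorly targeted traffic.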
In addition to demographic location, companies also need information about the hardware
and software with which visitors access a Website, and system statistics provide information such
as browser type, screen resolution, and operating system. By using this information, companies can
tailor their Websites to meet visitors’ technical needs, thereby ensuring that all customers can access
the Websites.
• Identify products and services for which customers are looking, but that are not yet pro-
vided by the company
• Identify products that are offered, but which customers have a hard time finding
• Identify customer trends
• Improve personalized messages by using the customers’ own words
• Identify emerging customer service issues
• Determine if customers are provided with enough information to reach their goals
• Make personalized offers
By analyzing internal search data, we can use the information to improve and personalize the
visitors’ experiences.
mediately following it. In other words, the only page that influences visitors’ behavior on a Website
is the one they are currently viewing. For example, visitors on a news site may merely peruse the
articles with no particular goal in mind. This method of analysis is becoming increasingly popular
because companies find it easier to examine path data in context without having to reference the
entire site in order to study the visitors’ behavior.
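The assumption that only the current page influences the next step can be sketched by counting page-to-page transitions and turning them into proportions. The visitor paths and page names below are hypothetical, and this is a bare illustration of the idea rather than a full path analysis method.

```python
from collections import Counter

# Hypothetical visitor paths: the ordered pages viewed in each visit.
paths = [
    ["/home", "/news", "/sports"],
    ["/home", "/news", "/weather"],
    ["/home", "/sports"],
]

def transition_probabilities(visitor_paths):
    """Estimate P(next page | current page) from consecutive page pairs."""
    pair_counts = Counter()
    from_counts = Counter()
    for path in visitor_paths:
        for current, following in zip(path, path[1:]):
            pair_counts[(current, following)] += 1
            from_counts[current] += 1
    return {
        pair: count / from_counts[pair[0]]
        for pair, count in pair_counts.items()
    }

for (current, following), p in sorted(transition_probabilities(paths).items()):
    print(f"{current} -> {following}: {p:.2f}")
```

Each estimate depends only on the page being viewed, so the analysis can be read one page at a time without reference to the rest of the site.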
Errors are the final metric. Tracking errors has the obvious benefit of making it possible to identify and fix
any errors in the Website, but it is also useful to observe how visitors react to these errors. The fewer
visitors who are confused by errors on a Website, the fewer who are likely to exit the site because
of an error.
Once we understand these eight fundamental metrics, we can begin to develop a coherent Web
analytics strategy.
• • • •
Through unobtrusive transaction logs and page tags we can gather a massive amount of data about
user–system interaction, and by employing fundamental metrics we can evaluate human behavior
within the interactional context. In order to gain the most from these massive datasets, however, we
must strategically select and employ the fundamental metrics in relation to KPIs. For example, by
collecting various Web analytics metrics, such as number of visits and visitors and visit duration, we
can develop KPIs, thereby creating a versatile analytic model that measures several metrics against
each other to define visitor trends [19, 111]. One primary concern in developing a coherent Web
analytic strategy is understanding the relationships among the foundational metrics and KPIs.
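As a small illustration of measuring metrics against each other, the sketch below derives two ratio-style indicators from the metrics named above (visits, visitors, and visit duration). The records and the choice of these two ratios as KPIs are assumptions for the example; which ratios matter depends on the goals of the Website.

```python
# Hypothetical visit records: (visitor id, visit duration in seconds).
visits = [
    ("v1", 120),
    ("v1", 30),
    ("v2", 300),
    ("v3", 10),
    ("v3", 50),
    ("v3", 90),
]

def basic_kpis(records):
    """Combine visit, visitor, and duration metrics into two ratios."""
    total_visits = len(records)
    unique_visitors = len({visitor for visitor, _ in records})
    total_seconds = sum(seconds for _, seconds in records)
    return {
        "visits_per_visitor": total_visits / unique_visitors,
        "average_visit_seconds": total_seconds / total_visits,
    }

print(basic_kpis(visits))
```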
• Measurement: In the most general terms, measurement can be regarded as the assignment
of numbers to objects (or events or situations) in accord with some rule (measurement
function). The property of the objects that determines the assignment according to that
rule is called magnitude, the measurable attribute; the number assigned to a particular
object is called its measure, the amount or degree of its magnitude. Importantly, the rule
defines both the magnitude and the measure.
• Web Page Significance: Significance metrics formalize the notions of “quality” and “rel-
evance” of Web pages with respect to users’ information needs. Significance metrics are
employed to rate candidate pages in response to a search query and to influence the quality
of search and retrieval on the Web.
• Usage Characterization: Patterns and regularities in the way users browse Web resources
can provide invaluable clues for improving the content, organization, and presentation of
Establish a Process of
Continuous Improvement
to be truly beneficial, it must integrate input from all major stakeholders. Involving people from
different parts of the company also makes it more likely that they will embrace the Website as a
valuable tool.
the Website. It is important that the improvements are adding value to the site and meeting
• What is the difference between your tool and free Web analytics tools? Since the com-
pany who owns the Website will be paying money for a service, it is important to know
why that service is better than free services (e.g., Google Analytics). Look for an answer
that outlines the features and functionality of the vendor. Do not look for answers about
increased costs because of privacy threats or poor support offered by free analytics tools.
• Do you offer a software version of your tool? Generally, a business will want to look for
a tool that is software based and that can run on its own servers. If a tool does not have
a software version but the vendor plans to make one in the future, that shows how prepared
the vendor is to offer future products if there is interest.
• What methods do you use to capture data? As stated earlier, there are two main ways to
capture visitor data from a Website: log files and page tagging. Ideally, we prefer a vendor
that offers both, but what they have used in the past is also important. Because technol-
ogy is constantly changing, we want a company that has a history of keeping up with and
perhaps even anticipating market changes and that has addressed these dynamics through
creative solutions.
• Can you help me calculate the total cost of ownership for your tool? The total cost of
ownership for a Web analytics tool depends on the specific company, the systems they have
in place, and the pricing of the prospective Web analytics tool. In order to make this calcu-
lation, we must consider the following:
1. Cost per page view
2. Incremental costs (i.e., charges for overuse or advanced features)
3. Annual support costs after the first year
4. Cost of professional services (i.e., installation, troubleshooting, or customization)
5. Cost of additional hardware we may need
6. Administration costs (which includes the cost of an analyst and any additional employ-
ees we may need to hire)
• What kind of support do you offer? Many vendors advertise free support, but it is impor-
tant to be aware of any limits that could incur additional costs. It is also important to note
how extensive their support is and how willing they are to help.
• What features do you provide that will allow me to segment my data? Segmentation al-
lows companies to manipulate their data. Look for the vendor’s ability to segment the data
after it is recorded. Many vendors use JavaScript tags on each page to segment the data as
it is captured, meaning that the company has to know exactly what it wants from the data
before having the data itself; this approach is less flexible.
• What options do I have to export data into our system? It is important to know who ulti-
mately owns and stores the data and whether it is possible to obtain both raw and processed
data. Most vendors will not provide companies with the data exactly as they need it, but it
is a good idea to realize what kind of data is available before making a final decision.
• Which features do you provide for integrating data from other sources into your tool?
Best practice, as noted previously, recommends using multiple technologies and methods in
order to inform decision making. If a company has other data it wants to bring to the tool
(such as survey data or data from an ad agency), then it is important to know whether this
information can be integrated into the vendor’s analytic tool.
• What new features are you developing that would keep you ahead of your competition?
Not only will the answer to this question tell how much the vendor has thought about
future functionality, but it will also show how much they know about their competitors.
If they are trying to anticipate changes and market demands, then they should be well in-
formed about their competition.
• Why did you lose your last two clients? Who are they using now? The benefits of this
question are obvious: by knowing how the vendor lost previous business, the company can be
more confident that it is making the right choice.
With an effective Web analytics strategy in place, we can turn our attention to understanding user
behaviors and identifying necessary or potentially beneficial system improvements. In practice, this
is rarely the end. Web analytics strategy typically supports some overarching goals.
• • • •
In order to get the most out of Web analytics, we must first effectively choose which metrics to analyze
and then combine them in meaningful ways [19]. This means knowing the Website’s business
goals and then determining which KPIs will provide the most insight for these business goals.
There are many possible methods for meeting these criteria. One is Alignment-Centric Per-
formance Management [14]. This approach goes beyond merely reviewing past customer trends to
carefully selecting a few key KPIs based on future business objectives. Even though a wealth of met-
rics is available from a Website, not all of the metrics are relevant to a company’s needs. Moreover,
reporting large quantities of data is overwhelming, so it is important to look at metrics in context
and use them to create KPIs that focus on outcome rather than activity. For example, a customer
service Website might view the number of emails responded to on the same day they were sent as
a measurement of customer satisfaction. A better way to measure customer satisfaction, however,
might be to survey the customers on their experience. Although this measurement is subjective, it
is a better representation of customer satisfaction because even if a customer receives a response the
same day he or she sent an email, the customer may still be dissatisfied with the service experience.
Following “The Four M’s of Operational Management,” as outlined by Becher [14], can
facilitate effective selection of KPIs:
By carefully choosing a few high-quality KPIs to monitor and making sure everyone is involved
with the strategy, we can more easily align a Website’s goals with the company’s goals because the
information is targeted and stakeholders are actively participating.
Another method for ensuring actionable data is Online Business Performance Management
(OBPM) [132]. This approach integrates business tools with Web analytics to help companies make
better decisions quickly in an ever-changing online environment where customer data is stored in
a variety of different departments. The first step in this strategy is gathering all customer data in a
central location and condensing it so that only actionable data remains. Once this information is
in place, the next step is choosing relevant KPIs that align with the company’s business strategy and
then analyzing expected versus actual results [132].
In order to choose the best KPIs and measure the Website’s performance against the goals of
a business, there must be effective communication between senior executives and online managers.
The two groups should work together to define the relevant performance metrics, the overall goals
for the Website, and the performance measurements. This method is similar to Alignment-Centric
Performance Management in that it aims to aid integration of the Website with the company’s
business objectives by involving major stakeholders. The ultimate goals of OBPM are increased
confidence, organizational accountability, and efficiency [132].
Of course, one must identify KPIs based on the Website type. Unlike metrics, which are
numerical representations of data collected from a Website, KPIs are tied to a business strategy and
are usually measured by a ratio of two metrics. By choosing KPIs based on the Website type, a business
can save both time and money. Although Websites can have more than one function, each site
belongs to at least one of the four main categories: commerce, lead generation, content/media, and
support/self-service [101]. Table 7.1 shows common KPIs for each Website type.
TABLE 7.1: The four types of Websites and examples of associated KPIs [101].
We discuss each Website type and related KPIs below.
7.1.1 Commerce
The goal of a commerce Website is to get visitors to purchase goods or services directly from the
site, with success gauged by the amount of revenue the site brings in. According to Peterson, “com-
merce analysis tools should provide the ‘who, what, when, where, and how’ for your online purchas-
ers” [118, p. 92]. In essence, the important information for a commerce Website is to answer the
following questions: Who made (or failed to make) a purchase? What was purchased? When were
purchases made? From where are customers coming? How are customers making their purchases?
The most valuable KPIs used to answer these questions are conversion rates, average order value,
average visit value, customer loyalty, and bounce rate [101]. Other metrics to consider with a com-
merce site are which products, categories, and brands are sold on the site and an internal site product
search that could signal navigation confusion or a new product niche [118].
A conversion rate is the number of users who perform a specified action divided by the total
number of a certain type of visitor (e.g., repeat visitors or unique visitors) over a given period. Types of
conversion rates will vary by the needs of the businesses using them, but two common conversion
rates for commerce Websites are the order conversion rate (the percent of total visitors who place
an order on a Website) and the checkout conversion rate (the percent of total visitors who begin
the checkout process). There are also many methods for choosing the group of visitors on which to
base the conversion rate. For example, businesses may want to filter visitors by excluding visits from
robots and Web crawlers [5], or they may want to exclude the traffic that “bounces” from the Web-
site or (a slightly trickier measurement) the traffic that is determined not to have intent to purchase
anything from the Website.
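The conversion rates described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor's implementation: the `Visitor` record, its field names, and the sample data are hypothetical, and we assume one record per visitor. The `include` argument selects the group of visitors on which the rate is based.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    # Hypothetical record: one row per visitor in the reporting period.
    visitor_id: str
    is_robot: bool        # flagged as a robot or Web crawler
    pages_viewed: int
    began_checkout: bool
    placed_order: bool


def conversion_rate(visitors, action, include=lambda v: True):
    """Percentage of included visitors for whom `action` is true."""
    base = [v for v in visitors if include(v)]
    if not base:
        return 0.0
    converted = sum(1 for v in base if action(v))
    return 100.0 * converted / len(base)


visitors = [
    Visitor("a", False, 5, True, True),
    Visitor("b", False, 1, False, False),   # left after one page
    Visitor("c", True, 40, False, False),   # crawler
    Visitor("d", False, 3, True, False),
    Visitor("e", False, 2, False, False),
]


def is_human(v):
    return not v.is_robot


def is_engaged_human(v):
    # Also exclude traffic that "bounces" after a single page.
    return not v.is_robot and v.pages_viewed > 1


order_rate = conversion_rate(visitors, lambda v: v.placed_order, is_human)
checkout_rate = conversion_rate(visitors, lambda v: v.began_checkout, is_human)
order_rate_engaged = conversion_rate(
    visitors, lambda v: v.placed_order, is_engaged_human
)
```

Note that the same orders produce a different rate once bounced traffic is excluded, which is why a reported conversion rate should state the visitor group on which it is based.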
Commerce Websites commonly have conversion rates of around 0.5%, but generally good
conversion rates will fall in the 2% range, depending on how a business structures its conversion rate
[41]. Again, the ultimate goal is to increase total revenue. According to eVision, a search engine
marketing company, each dollar a company spends on improving this KPI yields a 10- to 100-fold
return [39]. The methods a business uses to improve the conversion rate (or rates), however,
differ depending on which target action that business chooses to measure.
Average order value is a ratio of total order revenue to number of orders over a given period.
This number is important because it allows the analyst to derive a cost for each transaction. There
are several ways for a business to use this KPI to its advantage. One way is to break down the
average order value by advertising campaign (e.g., email, keyword, or banner ad). In this way, a
business can see which campaigns are bringing in the best customers and then opt to spend more
effort refining strategies in those areas [119]. Overall, however, if the cost of making a transaction is
greater than the amount of money customers spend for each transaction, then the site is not fulfill-
ing its goal. There are two main ways to correct this. The first is to increase the number of products
customers order per transaction, and the second is to increase the overall cost of purchased products.
A good technique for achieving either of these goals is product promotions [101], but many factors
influence how and why customers purchase what they do on a Website. These factors are diverse
and can range from displaying a certain security image on the site [97] to updating the site’s internal
search [161]. Like many KPIs, improvement ultimately comes from ongoing research and a small
amount of trial and error.
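As an illustration of breaking down average order value by advertising campaign, consider the following sketch. The order data and campaign labels are invented for the example; in practice they would come from the analytics tool's export.

```python
from collections import defaultdict

# Hypothetical orders: (campaign that brought the customer, order revenue).
orders = [
    ("email", 40.0),
    ("email", 60.0),
    ("keyword", 20.0),
    ("banner", 90.0),
    ("keyword", 30.0),
]


def average_order_value_by_campaign(orders):
    """Return {campaign: total order revenue / number of orders}."""
    revenue = defaultdict(float)
    count = defaultdict(int)
    for campaign, amount in orders:
        revenue[campaign] += amount
        count[campaign] += 1
    return {campaign: revenue[campaign] / count[campaign] for campaign in revenue}


overall_aov = sum(amount for _, amount in orders) / len(orders)
aov_by_campaign = average_order_value_by_campaign(orders)

# Campaigns whose customers spend more per order than the site average.
above_average = sorted(c for c, v in aov_by_campaign.items() if v > overall_aov)
```

Comparing each campaign's value with the overall figure is one simple way to see which campaigns bring in the customers who spend the most per order.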
Another KPI, average visit value, is the ratio of total revenue to the total number of visits and
essentially informs businesses about the traffic quality. It is problematic for a commerce site when,
even though it may have many visitors, each visit generates only a small amount of revenue. In that
case, increasing the total number of visits would likely increase profits only marginally. The average
visit value KPI is also useful for evaluating the effectiveness of promotional campaigns. If the aver-
age visit value decreases after a specific campaign, it is likely that the advertisement is not attracting
quality traffic to the site. Another less common factor in this situation could be broken links or a
confusing layout in a site’s “shopping cart” area. A business can improve the average visit value by
using targeted advertising and employing a layout that reduces customer confusion.
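A short sketch shows how average visit value can flag a campaign that adds traffic without adding proportionate revenue. The revenue and visit figures are invented for illustration.

```python
def average_visit_value(total_revenue, total_visits):
    """Revenue generated per visit; 0.0 when there were no visits."""
    if total_visits == 0:
        return 0.0
    return total_revenue / total_visits


# Hypothetical figures for the month before and the month after a campaign.
before = average_visit_value(12_000.0, 8_000)
after = average_visit_value(13_000.0, 13_000)

# Visits rose sharply, but each visit is now worth less on average.
traffic_quality_dropped = after < before
```

Here total revenue grew slightly while the value of each visit fell, which suggests that the campaign attracted lower-quality traffic.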
One way to assess customer quality is to identify customer loyalty. This KPI is the ratio of
new to existing customers. Many Web analytics tools measure this using visit frequency and transac-
tions, but there are several important factors in this measurement including the time between visits
[100]. Customer loyalty can even be measured simply with customer satisfaction surveys [133].
Loyal customers will not only increase revenue through purchases but also through referrals, poten-
tially limiting advertising costs [123].
Yet another KPI that relates to customer quality is bounce rate. Essentially, bounce rate
measures how many people arrive at a homepage and leave immediately. Two scenarios gener-
ally qualify as a bounce. In the first scenario, a visitor views only one page on the Website. In the
second scenario, a visitor navigates to a Website but only stays on the site for 5 seconds or less [6].
This could be due to several factors, but in general visitors who bounce from a Website are not
interested in the content. Like average order value, this KPI helps show how much quality traf-
fic a Website receives. A high bounce rate may reflect counterintuitive site design or misdirected
common conversion rate is the ratio of leads generated to total visitors. The same visitor filtering
techniques mentioned in the previous section can be applied to this measurement (i.e., filtering
out robots and Web crawlers and excluding traffic that bounces from the site). This KPI is an essential
tool in analyzing marketing strategies. Average lead generation sites have conversion rates
of 5–6%, while exceptionally good sites reach 17–19% [46]. Conversion rates that increase
after the implementation of a new marketing strategy indicate that the campaign was successful.
Decreases in conversion rates indicate that the campaign was not effective and probably needs to be
Another way to measure marketing success is CPL, which is the ratio of total expenses to total
number of leads or how much it costs a company to generate a lead; a more targeted measurement
of this KPI would be the ratio of total marketing expenses to total number of leads. A good way to
measure the success of this KPI is to make sure that the CPL for a specific marketing campaign is
less than the overall CPL [155]. Ideally, the CPL should be low, and well-targeted advertising is
usually the best way to achieve this.
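The comparison between campaign-level CPL and overall CPL can be sketched as follows. The campaign names, expenses, and lead counts are hypothetical.

```python
def cost_per_lead(expenses, leads):
    """Expenses divided by leads; infinite when no leads were generated."""
    if leads == 0:
        return float("inf")
    return expenses / leads


# Hypothetical campaigns: name -> (marketing expenses, leads generated).
campaigns = {
    "search_ads": (2000.0, 100),
    "trade_show": (5000.0, 50),
    "newsletter": (500.0, 50),
}

total_expenses = sum(expenses for expenses, _ in campaigns.values())
total_leads = sum(leads for _, leads in campaigns.values())
overall_cpl = cost_per_lead(total_expenses, total_leads)

cpl_by_campaign = {
    name: cost_per_lead(expenses, leads)
    for name, (expenses, leads) in campaigns.items()
}

# Campaigns that generate leads more cheaply than the overall figure.
below_overall = sorted(n for n, cpl in cpl_by_campaign.items() if cpl < overall_cpl)
```

In this invented data set, the trade show raises the overall CPL, while the other two campaigns generate leads at a lower cost than the overall figure.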
Lead generation bounce rate is the same measurement as the bounce rate for commerce sites.
This KPI measures visitor retention based on total number of bounces to total number of visitors
(a bounce is characterized by a visitor entering the site and immediately leaving). Lead generation
sites differ from commerce sites in that they may not require the same level of user interaction. For
example, a lead generation site could have a single page where users enter their contact information.
Even though they only view one page, the visit is still successful if the Website is able to collect the
user’s information. In these situations, it is best to base the bounce rate solely on time spent on the
site. As with commerce sites, the best way to decrease a site’s bounce rate is to increase advertising
effectiveness and decrease visitor confusion.
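The difference between the two bounce definitions can be made concrete with a small sketch. The visit data are invented, and the 5-second threshold follows the definition cited earlier [6]; the `count_single_page` switch is our own hypothetical parameter for choosing between the commerce-style and the time-only definitions.

```python
def bounce_rate(visits, max_seconds=5, count_single_page=True):
    """Percentage of visits that count as bounces.

    visits: list of (pages_viewed, seconds_on_site) tuples.
    A visit bounces if it lasts max_seconds or less, or, when
    count_single_page is True, if only one page was viewed.
    """
    if not visits:
        return 0.0

    def is_bounce(pages, seconds):
        if seconds <= max_seconds:
            return True
        return count_single_page and pages == 1

    bounces = sum(1 for pages, seconds in visits if is_bounce(pages, seconds))
    return 100.0 * bounces / len(visits)


# Hypothetical visits to a single-page lead generation site.
visits = [(1, 3), (1, 120), (4, 200), (2, 4), (1, 90)]

commerce_style = bounce_rate(visits)                       # pages or time
time_only = bounce_rate(visits, count_single_page=False)   # time on site only
```

Under the commerce-style definition, the two visitors who spent more than a minute on one page count as bounces, although they may well have filled in the contact form; the time-only definition does not penalize them.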
The final KPI is traffic concentration, or the ratio of the number of visitors to a certain area
in a Website to total visitors. This KPI shows which areas of a site have the most visitor interest. For
lead generation Websites, it is ideal to have a high traffic concentration on the page or pages where
users enter their contact information.
7.1.3 Content/Media
Content/media Websites focus mainly on advertising, and the main goal of these sites is to increase
revenue by keeping visitors on the Website longer and to keep visitors coming back to the site. In
order for these types of sites to succeed, site content must be engaging and frequently updated. If
content is only part of a company’s Website, the content used in conjunction with other types of
pages can be used to draw in visitors and provide a way to immerse them in the site. The main KPIs
are visit depth, returning visitors, new visitor percentage, and page depth [101].
Visit depth (also referred to as depth of visit or path length) is the measurement of the ratio
between page views and unique visitors, or how many pages a visitor accesses each visit. As a general
rule, visitors with a higher visit depth interact more with the Website. If visitors are only viewing
a few pages per visit, then they are not engaged, indicating that the site’s effectiveness is low. One
way to increase a low average visit depth is by creating more targeted content that would be more
interesting to the Website’s target audience. Another strategy could be increasing the site’s interactivity
to encourage the users to become more involved with the site and to motivate them to return.
Unlike the metric of simply counting the number of returning visitors on a site, the return-
ing visitor KPI is the ratio of unique visitors to total visits. A factor in customer loyalty, this KPI
measures the effectiveness of a Website attracting repeat visitors. A lower ratio for this KPI is best
because it indicates more repeat visitors and more visitors who are interested in and trust the con-
tent of the Website. If this KPI is too low, however, it might signal problems in other areas such as
a high bounce rate or even click fraud. Click fraud occurs when a person or script is used to gener-
ate visits to a Website without having genuine interest in the site. According to a study by Blizzard
Internet Marketing, the average for returning visitors to a Website is 23.7% [157]. As with many of
the other KPIs for content/media Websites, the best way to improve the returning visitor rate is by
having quality content and encouraging interaction with the Website.
Content/media sites are also interested in attracting new visitors, and the new visitor ratio
compares new visitors with unique visitors to determine if a site is attracting new people. New visi-
tors can be brought to the Website in a variety of different ways, so a good way to increase this KPI
is to try different marketing strategies to determine which campaigns bring the most (and the best)
traffic to the site. When using this KPI, we must keep the Website’s goal in mind. Specifically, is
the Website intended more to retain or to attract customers? When measuring this KPI, the age
of the Website plays a role—newer sites will want to attract new people. As a rule, however, the
new visitor ratio should decrease over time as the returning visitor ratio increases.
The final KPI for content/media sites is page depth. This is the ratio of page views for a specific page to the number
of unique visitors to that page. This KPI is similar to visit depth, but its measurements focus more
on page popularity. Average page depth can indicate interest in specific areas of a Website over time
and measure whether the interests of the visitors match the goals of the Website. If one particular
page on a Website has a high page depth, then that page is of particular interest to visitors. An ex-
ample of a page in a Website expected to have a higher page depth would be a news page. Informa-
tion on a news page is updated constantly so that, while the page is still always in the same location,
the content of that page is constantly changing. If a Website has high page depth in a relatively
unimportant part of the site, it may signal visitor confusion with navigation in the site or an incor-
rectly targeted advertising campaign.
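The four content/media KPIs are all simple ratios, so they can be computed from the same page-view records. The following sketch uses the definitions given above; the `hits` data, the visitor and visit identifiers, and the set of new visitors are hypothetical.

```python
# Hypothetical page-view records: (visitor_id, visit_id, page).
hits = [
    ("v1", "s1", "/news"),
    ("v1", "s1", "/sports"),
    ("v1", "s2", "/news"),
    ("v2", "s3", "/news"),
    ("v3", "s4", "/about"),
    ("v3", "s4", "/news"),
    ("v1", "s5", "/news"),
    ("v2", "s3", "/weather"),
]
new_visitor_ids = {"v3"}  # visitors seen for the first time in this period


def ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


page_views = len(hits)
unique_visitors = {visitor for visitor, _, _ in hits}
visits = {visit for _, visit, _ in hits}

# Visit depth: page views per unique visitor.
visit_depth = ratio(page_views, len(unique_visitors))

# Returning visitor KPI: unique visitors per visit (lower means more repeats).
returning_visitor_kpi = ratio(len(unique_visitors), len(visits))

# New visitor ratio: new visitors as a share of unique visitors.
new_visitor_ratio = ratio(len(new_visitor_ids & unique_visitors), len(unique_visitors))


def page_depth(page):
    """Page views of `page` per unique visitor to that page."""
    viewers = [visitor for visitor, _, p in hits if p == page]
    return ratio(len(viewers), len(set(viewers)))


news_depth = page_depth("/news")
```

In this invented log, the news page is viewed more than once per visitor, as we would expect for a page whose content changes constantly.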
7.1.4 Support/Self-Service
Websites offering support or self-service are interested in helping users find specialized answers
for specific problems. The goals for this type of Website are increasing customer satisfaction and
decreasing call center costs; it is more cost-effective for a company to have visitors find information
through its Website than it is to operate a call center. The KPIs of interest are visit length, content
depth, and bounce rate. In addition, other areas to examine are customer satisfaction metrics and
top internal search phrases [101].
Page depth for support/self-service sites is the same measurement as page depth for content/media
sites, namely the ratio of page views to unique visitors. With support/self-service sites, however,
high page depth is not always a good sign. For example, a visitor viewing the same page multiple
times may show that the visitor is having trouble finding helpful information on the Website or even
that the information the visitor is looking for does not exist on the site. The goal of these types of
sites is to help customers find what they need as quickly as possible and with the least amount of
navigation through the site. The best way to keep page depth low is to keep visitor confusion low.
As with the bounce rate of other Website types, the bounce rate for support/self-service sites
reflects ease of use, advertising effectiveness, and visitor interest. A low bounce rate means that
quality traffic is coming to the Website and deciding that the site’s information is potentially useful.
Poor advertisement campaigns and poor Website layout will increase a site’s bounce rate.
Customer satisfaction deals with how users rate their experience on a site and is usually
collected directly from the visitors (not from log files), either through online surveys or through
satisfaction ratings. Although it is not a KPI in the traditional sense, gathering data directly from
visitors to a Website is a valuable tool for figuring out exactly what visitors want. Customer satis-
faction measurements can deal with customer ratings, concern reports, corrective actions, response
time, and product delivery. Using these numbers, we can compare the online experience of the
Website’s customers with the industry’s average and make improvements according to visitors’ ex-
pressed needs.
Site navigation is important to visitors, and top internal search phrases, which apply only to
sites with internal search capabilities, can reveal which information customers are most
interested in and thereby inform site navigation improvements. Moreover, internal search phrases can
be used to direct support resources to the areas generating the most user interest, as well as to iden-
tify which parts of the Website users may have trouble accessing. Other problems may also become
obvious. For example, if many visitors are searching for a product not supported on the Website,
then this may indicate that the site’s marketing campaign is ineffective.
Regardless of Website type, the KPIs listed above are not the only KPIs that can prove useful
in analyzing a site’s traffic, but they provide a good starting point. The main thing to remember is
that no matter what KPIs a company chooses to use, they must be aligned with its goals, and more
KPIs do not necessarily mean better analysis: quality is more important than quantity.
Any organization, business, or Website must start with clearly defined goals because they are essen-
tial for any successful strategy. With a clearly defined and understood strategy, we can then plan and
implement the tactics necessary for executing this strategy. These tactics are based on KPIs—which
are the measures and metrics of performance. As such, KPIs are the foundation for any Web goal
• • • •
While the relatively unobtrusive methods of data collection that we have discussed thus far are very
valuable, proponents of using transaction logs for Web analysis typically admit that the method has
shortcomings [66, 91], as do all methodological approaches. These shortcomings include failing to
understand the affective, situational, and cognitive aspects of system users. Therefore, we must look
to other methods in order to address some of these shortcomings and limitations [124]. Fortunately,
the Web and other information technologies provide a convenient means for employing surveys and
survey research for such a purpose.
As an overview, we discuss surveys and laboratory studies as viable alternative methods for
Web log analysis, and then present a brief review of survey and laboratory research literature, with a
focus on the use of surveys and laboratory studies for Web-related research. The section then identi-
fies the steps in implementing survey research and designing a survey instrument and a laboratory
Survey research is a method for gathering information by directly asking respondents about some
aspect of themselves, others, objects, or their environment. As a data collection method, survey in-
struments are very useful for a variety of research designs. For example, researchers can use surveys
to describe current characteristics of a sample population and to discover the relationship among
variables. Surveys gather data on respondents’ recollections or opinions; therefore, surveys provide
an excellent companion method for Web analytics that typically focus exclusively on actual behav-
iors of participants [125].
After reviewing some studies that have used surveys for Web research, we will discuss how to
select, design, and implement survey research.
demographical aspects of Web use over time [83] or one particular Website feature [150]. Treiblma-
ier [148] presents an extensive review of the use of surveys for Website analysis.
Survey respondents may include general Web users or samples from specific populations. For
example, Huang [61] surveyed users of continuing education programs. Similarly, Jeong et al. [76]
surveyed travel and hotel shoppers, and Kim and Stoel [86] surveyed female shoppers who have
purchased apparel online.
For academic researchers, a convenience sample of students is often used to facilitate survey
studies, including the users of Web search engines [139]. McKinney et al. [103] used both under-
graduate and graduate students as their sample examining Website use. The major advantages of
using students that are often cited include a homogeneous sample, access [62], familiarity with the
Internet [67], and creation of experimental settings [130]. There are concerns in generalizing these
results [1], most notably for Websites and services where students have limited domain or system
knowledge [86, 89]. However, as a sample of a demographic slice of the Web population, students
appear to be a workable convenience sample with results from studies with students [cf. Refs. 67,
84] similar to those using more rigorous sampling methods [cf. Refs. 51, 83]. Organizations such
as the Pew Research Center’s Internet and American Life Project use random samples of the U.S.
Web population for their surveys [125].
For the Web, the most common type of survey instrument is the electronic or Web survey.
Jansen et al. [75] define an electronic survey as “one in which a computer plays a major role in both
the delivery of a survey to potential respondents and the collection of survey data from actual re-
spondents” (p. 1). Several researchers have examined electronic survey approaches, techniques, and
instruments with respect to methodological issues associated with their use [33, 35, 40, 43, 90, 145].
There have been mixed research results concerning the benefits of electronic surveys [85, 104, 142,
149]. However, researchers generally agree that electronic surveys offer faster response times and
decreased costs. Electronic and Web-based surveys allow nearly instantaneous data collection
into a backend database, which reduces potential errors caused by manual transcription.
Regardless of which delivery method is used, survey research requires a detailed project plan-
ning approach.
To execute a survey, the researcher must identify the content area, construct the survey in-
strument, define the population, select a representative sample, administer the survey instrument,
analyze and interpret the results, and then communicate the results. While these steps are some-
what linear, they also overlap and may require several iterations. A 10-step survey research process
is illustrated in Table 8.1, based on a process outlined in Graziano and Raulin [45].
Steps 1 and 2: Determine the specific information desired and define the population to
be studied. The information being sought and the population to be studied are the first tasks of
the survey researcher. The goals of the survey research will determine both the information being
sought and the target population. Additionally, the goals will drive both the construction and ad-
ministration of the survey. If we use a survey to supplement ongoing Web log analysis, then these
decisions will follow the established parameters.
Step 3: Decide how to administer the survey. There are many possibilities for administering
a survey, ranging from face-to-face (i.e., an interview), to pen and paper, to the telephone (i.e., phone
survey), to the Web (i.e., electronic survey). A survey can also be a mixed mode survey, combining
more than one of these approaches. The exact method selected really depends on the answers to steps
one and two (i.e., what information is needed and what population is studied). Used in conjunction
with Web analytics, surveys can be conducted either before or after a laboratory study. A survey can
also be used to gain insight into the demographics of the wider Web population.
Step 4: Design the survey instrument. Developing a survey instrument involves several
steps. The researcher must determine what questions to ask, in what form, and in what order. The
researcher must construct the survey so that it adequately gathers the information being sought. A
basic rule of survey research is that the instrument should have a clear focus and should be guided
by the research questions or hypotheses of the overall study. This implies that survey research is not
well suited to early exploratory research because it requires some orderly expectations and focus
from the researcher.
Step 5: Pretest the survey instrument with a subsample. Once the researcher has the survey
instrument ready and refined, the researcher must pilot test the survey instrument. In this respect,
developing a survey instrument is like developing a system artifact, where a system is beta-tested before wider
deployment. Generally, one conducts the pilot test on a sample that represents the population being
studied, after which the researcher may (generally, will) refine the survey instrument further. De-
pending on the extent of the changes, the survey instrument may require another pilot test.
Step 6: Select a sampling approach and representative sample. Selecting an adequate and
representative sample is a critical and challenging factor in survey research. The population for sur-
vey study is the larger group about or from whom the researcher desires to obtain information. From
this population, we need to survey a representative sample. If we are administering a survey to the
respondents of a laboratory study, the representativeness is not a problem because the respondents
are the representative sample. Selecting a representative sample of Web users, however, requires
careful planning.
Whenever we use a sample as a basis for generalizing to a population, we engage in an induc-
tive inference from the specific sample to the general population. In order to have confidence in
inductive inferences from sample to population, the researcher must carefully choose the sample to
represent the overall population. This is especially true for descriptive research, where the researcher
wishes to describe some aspect of a population that may depend on demographic characteristics. In
other cases, such as verifying the application of universal theoretical constructs, for example, Zipf ’s
Law [164], sampling is not as important since these universal constructs should apply to everyone
within the population.
Sampling procedures typically fall into three classifications:
• Convenience sampling (i.e., selecting a sample with little concern for its representative-
ness to some overall population),
• Probability sampling (i.e., selecting a sample where each respondent has some known
probability of being included in the sample), and
• Stratified sampling (i.e., selecting a sample that includes representative samples of each
subgroup within a population).
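The three sampling classifications can be contrasted with a short sketch. The population, its `group` attribute, and the sample size are invented; simple random sampling stands in here as one instance of probability sampling, and the stratified version allocates the sample proportionally to subgroup size (with other proportions, rounding can make the total differ slightly from the requested size).

```python
import random

# Hypothetical population of 100 Web users: 25 students, 75 professionals.
population = [
    {"id": i, "group": "student" if i % 4 == 0 else "professional"}
    for i in range(100)
]


def convenience_sample(pop, n):
    """Take whoever is easiest to reach (here, simply the first n)."""
    return pop[:n]


def simple_random_sample(pop, n, rng):
    """Probability sampling: every member has the same chance, n / len(pop)."""
    return rng.sample(pop, n)


def stratified_sample(pop, n, key, rng):
    """Sample each subgroup in proportion to its share of the population."""
    strata = {}
    for member in pop:
        strata.setdefault(member[key], []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)  # fixed seed so the example is repeatable
conv = convenience_sample(population, 20)
prob = simple_random_sample(population, 20, rng)
strat = stratified_sample(population, 20, "group", rng)

students_in_strat = sum(1 for m in strat if m["group"] == "student")
```

Only the stratified sample is guaranteed to mirror the 25/75 split of the population; the random sample will approximate it on average, and the convenience sample offers no such assurance.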
Step 7: Administer the survey instrument to the sample. For actually gathering the survey
data, the researcher must determine the most appropriate manner to administer the survey instru-
ment. Many surveys are administered via the Web or electronically, as the Web offers substantial
benefits in its easy access to a wide population sample. Additionally, administering a survey elec-
tronically, even in a laboratory study, has significant advantages in terms of data preparation for
analysis. The survey can be administered once to a cross-sectional portion of the population, or it
can be administered repeatedly to the same sample population.
Step 8: Analyze the data. Once the data is gathered, we must determine the appropriate
method for analysis. The appropriate form of analysis is dependent on the research questions, hy-
potheses, or types of questions used in the survey instruments. The available approaches are qualita-
tive, quantitative, and mixed methods.
Step 9: Interpret the findings. Like many research results, the interpretation of survey data
can be somewhat subjective. Results that are in question may point to the need for further research.
One of the best aids in interpreting results is the literature review. What have results from
previous work pointed out? Are these results in line with that previous research? Or do the results
highlight something new?
Step 10: Communicate the results to the appropriate audience. Finally, the results of any
survey research must be packaged for the intended audience. For academic purposes, this may mean
a scholarly paper or presentation. For commercial organizations, this may mean a white paper for
system developers or marketing professionals.
Each of these steps can be challenging. However, designing a survey instrument (i.e., Steps 4
and 5) can be the most difficult aspect of the survey research. We address this development in more
detail in the following section.
A survey instrument is a data collection method that presents a set of questions to a re-
spondent. The respondent’s responses to the questions provide the data sought by the researcher.
Although seemingly simple, it can be very difficult to develop a set of questions for a survey instru-
ment. Some general guidelines for developing survey instruments are [113]:
• State on the survey instrument the research goal: At the top of the survey instrument, in-
clude a very brief statement explaining the purpose of the survey and assuring respondents
of their anonymity.
• Provide instructions for completing the survey instrument: To assist in ensuring that
survey results are valid, include instructions on how to respond to questions on the survey
instrument. Generally, there is a short introductory set of instructions at the top of the
survey instrument. Provide additional instructions for specific questions if needed.
• Place questions concerning personal information at the end of the survey: Demographic
information is often necessary for survey research. Place these questions at the end of the
survey. Providing personal data may annoy some respondents, resulting in incomplete or
inaccurate responses to the survey instrument.
• Group questions on the instrument by subject: If the survey instrument has more than 10
or so questions, the questions need to be grouped by some classification method. Generally,
grouping the questions by subject is a good organization method. If the instrument has
multiple groups of questions, each group should have a heading identifying the grouping.
Grouping questions allows the respondents to focus their responses around the central
theme of the group of questions.
• Present each question and type of question in a consistent structure: A consistent
structure makes it much simpler for respondents and increases the likelihood of valid data.
Explain the proper method for responding to each question and ensure that the response
methods for similar questions are consistent throughout the instrument.
On a scale of 1–7, would you search individually or together with your workmates if you do not
know anything about the problem?
Individual Collaborate
1 2 3 4 5 6 7
* * * * * * *
FIGURE 8.2: Example of a rating question.
On a scale of 1–5 (1 = never used, 5 = use every day), rank the following items on how
experienced are you with using the following communication/collaboration applications for group
a. _____ Email
b. _____ Instant messaging
c. _____ Face-to-face meetings
d. _____ Telephone
e. _____ Others (please elaborate)
There are three general categories of survey questions, namely multiple-choice, Likert-scale,
and open-ended questions.
Multiple-choice questions. Multiple-choice questions have a closed set of response items
for the respondents to select. Multiple-choice questions are useful when we have a thorough under-
standing of the range of possible responses (see Figure 8.1).
The items for multiple-choice questions must cover all possible alternatives that the
respondents might select, and each of the items must be unique (i.e., they must not overlap). Since
presenting all possible alternatives is a difficult task, we normally include a general catch-all item
(e.g., None of the above or Don’t know) at the end of a list of item choices. This approach helps
improve the accuracy of the data collected.
As part of your project, I believe that you must have confronted a situation when you did not
really know how to proceed in order to solve a problem or perform a task on the Web.
(a) Can you speak about a specific instance of your project work in which you were
uncertain as to how to proceed?
Which features of Instant Messaging programs do you find most useful when it comes to
sharing information with teammates?
a. Real-Time Chat
b. File Sharing
c. Chat logs
d. Others
e. None
Likert-scale questions. With Likert-scale questions, the items are arranged as a continuum
with the extremes generally at the endpoints. Likert-scale questions may have respondents indicate
the degree to which they agree with a statement (see Figure 8.2) or rank a list of items (see Figure
Open-ended questions. As Figure 8.4 demonstrates, open-ended questions have no list of
items for the respondent to choose from.
Open-ended questions are best for exploring new ideas or when the researcher does not know
any of the expected responses. As such, open-ended questions are well suited to qualitative research.
The disadvantage of open-ended questions is that the data can be much more time consuming and
difficult to analyze because each response must be coded in order to derive variables.
If we have a partial list of possible responses, we can create a partially open-ended question
(see Figure 8.5).
variables in order to investigate the effect on a dependent variable, while controlling for all the other
variables (i.e., control variables). In such a setting, only the variable of interest affects the outcome.
There are multiple ways of designing a laboratory study. As such, laboratory studies can be
very nuanced, and a review of laboratory studies is a lecture in itself. For more detailed examinations
of laboratory studies and experiments, I refer the interested reader to the Consolidated Standards of
Reporting Trials (CONSORT), which assist in the design of laboratory
experiments, specifically randomized controlled trials. CONSORT provides a 22-item checklist
and a flow diagram for conducting such studies. The checklist items focus on the study’s design,
analysis, and interpretation. The flow diagram illustrates the progress of participants through a
laboratory study. Together, these tools aid in understanding the design and running of the study, the
analysis of the collected data, and the interpretation of the results.
The Common Industry Format (CIF) is an American National Standards Institute (ANSI)
approved standard for reporting the results of usability studies. The National Institute of Standards
and Technology (NIST) developed this criterion to assist in designing and reporting the results of
usability studies targeted specifically for Websites.
One good way to learn about laboratory studies is to read what others have done. What are
some questions that one should ask when designing (or assessing) a laboratory study? One
methodology for assessing laboratory studies is the Centre for Allied Health Evidence (CAHE) Critical
Appraisal Tools (CATs). The aim of the approach
is to identify possible methodological flaws in the design phase or in the reporting. With the use
of such a questionnaire, we can design better experiments and make informed decisions about the
quality of research evidence. The assessments presented below are based on the CriSTAL Checklist
for appraising a user study.
Does the study address a clearly focused issue? Essentially, the research aims should drive
the study. The issue can deal with the population (user group) studied, the intervention (service or
facility) provided, or the system. The laboratory study design must clearly identify the issue in a us-
able manner and explain how the outcomes (quantifiable or qualitative) are measured.
Is a good case made for the approach that the authors have taken? In designing a user study,
researchers can choose from an extensive array of methods. One way to assess a study, then, is to
review the selection of methodology (e.g., regression, analysis of variance [ANOVA], or factor
analysis) and design setup (e.g., within or between groups). The method and setup should relate directly
to the research questions or objectives, which are tied to the research aim. A good study will clearly
identify the problem and provide justification for the questions or objectives. The methodology
must be appropriate to the research questions or objectives.
Were the methods used in selecting the users appropriate and clearly described? There are
several aspects of recruiting participants for any laboratory study, including:
• Type of sample: This addresses how the participants are recruited. Most studies use
convenience samples (i.e., you use who you can get). In academia, these are usually students, and
the participants self-select into the study. Randomly selected samples are generally preferred;
however, a convenience sample is often not a critical shortcoming if the population
demographics are not essential to the research questions.
• Size of sample: Sample size is an important aspect of user studies and it must be managed
carefully. A sample size calculation (i.e., how many participants do you need to represent
the population you are studying) can determine the appropriate size needed to make the
sample representative of the population (i.e., does the sample represent targeted users).
Representativeness matters for quantitative analysis. For usability studies, this is not so
important. Generally, the demographics of the sample (e.g., age, sex, staff grade, location)
must accurately reflect the demographics of the total population. Any motivation for the
participants (i.e., money, course credit, etc.) must be acknowledged.
• Was the data collection instrument/method reliable? Any questionnaire, survey form,
or interview schedule should be pilot tested before its use in the laboratory study. When
adapting an instrument used in previous research, the case must be made for its appropriate
What was the response rate and how representative were respondents of the population
under study? Whether using a convenience or random sample, researchers must ensure that no
subgroups were either over-represented or under-represented. When using convenience samples for
Web laboratory studies, the sex balance of participants is often an issue.
Are the results complete and have they been analyzed in an easily interpretable way?
Just as there are several choices in methodologies, there are also several choices concerning meth-
ods of analysis and in how to present these results. Regardless, the variables must be defined and
Are there any limitations in the methodology (that might have influenced results) identi-
fied and discussed? No matter how well one designs a study, selects a sample, executes a methodol-
ogy, and analyzes the data, there are always limitations. After the study is completed, reflect on how
the study might be better implemented next time.
Are the conclusions based on an honest and objective interpretation of the results? Some-
times, a study does not tell you what you want or expect. That is actually good but frustrating.
However, one must base the conclusions clearly on the findings from the study’s data.
Just as log analysis and surveys have limitations, laboratory studies also have limitations that
we must consider [94]. The basic assumption underlying laboratory studies and experiments is that
we can extrapolate the results to the real world. However, when people are involved, this is a risky
assumption, and results from laboratory studies do not always hold in contexts outside the
laboratory.
Some of the possible issues that can arise are laboratory effects (i.e., the context of the
laboratory study is not a naturalistic setting), anonymity issues (i.e., the participants know they are being
observed), context (i.e., regardless of the study design, there are aspects beyond the control of the
researcher), and biased samples (i.e., regardless of the sampling method, there are biases created by
participants' self-selection into the study and by the fact that they do not represent the portion of the
population that never participates). By using Web analytics in conjunction with laboratory studies,
we can address many of these shortcomings.
Web analytics via log data are an excellent means for recording the behaviors of system users and the
responses of those systems. Because they focus on behavioral data only, however, transaction logs
are ineffective as a method of understanding the underlying motivations, affective characteristics,
cognitive factors, and contextual aspects that influence those behaviors. Used in conjunction with
Web logs, surveys and laboratory studies can be effective methods for investigating these aspects.
The combined methodological approaches can provide a richer picture of the phenomenon under
In this section, we have reviewed a 10-step procedure for conducting survey research, with
explanatory notes on each step. We then discussed the design of a survey instrument, with examples
of the various types of questions, and then discussed aspects of designing a laboratory study, provid-
ing some key questions that can help us in planning and completing a laboratory study.
• • • •
A special case of Web analytics is analyzing data from search logs. Exploiting the data stored in
search logs of Web search engines, Intranets, and Websites can provide important insights into
understanding the information searching tactics of online users. This understanding can inform
information system design, interface development, and information architecture construction for
content collections.
This section presents a review of and foundation for conducting Web SLA [64, 66]. A basic
understanding of search engines and searching behavior is assumed [for a review, see Ref. 93]. SLA
methodology consists of three stages (i.e., collection, preparation, and analysis), and those stages
are presented in detail with discussions of the goals, metrics, and processes at each stage. Fol-
lowing this, the critical terms in TLA for Web searching are defined and suggestions are pro-
vided on ways to leverage the strengths and address the limitations of TLA for Web searching
Information searching researchers have employed search logs for analyzing a variety of Web
information systems [34, 73, 78, 151]. Web search engine companies use search logs (also referred to as
transaction logs) to investigate searching trends and effects of system improvements (cf. Google or
Yahoo!). Search logs are an unobtrusive method of collecting significant amounts of searching
data on a sizable number of system users. Several researchers have employed the
SLA methodology to study Web searching. Romano et al. [128] present a methodology for general
qualitative analysis of transaction log data. Wang et al. [151] and Spink and Jansen [140] also
present explanations of approaches to TLA.
Generally, there are limited published works concerning how to employ search logs to sup-
port the study of Web searching, the use of Web search engines, Intranet searching, or other Web
searching applications. Yet, SLA is helpful for studying Web searching on Websites and Web search
typically addresses either issues of system performance, information structure, or user interactions.
Blecic et al. [18] define TLA as the detailed and systematic examination of each search command
or query by a user and the following database result or output. Phippen et al. [120] and Spink and
Jansen [140] also provide comparable definitions of TLA.
For Web searching research, we focus on a subset of Web analytics, namely SLA. Web
analytics is useful for analyzing the browsing or navigation patterns within a Website, while
SLA is concerned exclusively with searching behaviors. SLA is defined as the use of data col-
lected in a search log to investigate particular research questions concerning interactions among Web
users, the Web search engine, or the Web content during searching episodes. Within this interaction
context, we can exploit the data in search logs to discern attributes of the search process, such
as the searcher’s actions on the system, the system responses, or the evaluation of results by the
The goal of SLA is to gain a clearer understanding of the interactions among searcher, con-
tent and system or the interactions between two of these structural elements, based on whatever
research questions drive the study. Employing SLA allows us to achieve some stated objective, such
as improved system design, advanced searching assistance, or better understanding of some user
information searching behavior.
• Data collection: the process of collecting the interaction data for a given period in a trans-
action log;
• Preparation: the process of cleaning and preparing the transaction log data for analysis; and
• Analysis: the process of analyzing the prepared data.
• User Identification: the IP address of the client’s computer. Sometimes this is instead an
anonymous user code assigned by the search engine server, which is our example in
Table 9.1.
• Date: the date of the interaction as recorded by the search engine server.
• Time: the time of the interaction as recorded by the search engine server.
• Search URL: the query terms as entered by the user.
Web search engine server software normally records these fields. Other common fields
include:
• Results Page: a code representing a set of result abstracts and URLs returned by the
search engine in response to a query.
• Language: the user preferred language of the retrieved Web pages.
• Source: the federated content collection searched, also known as vertical.
• Page Viewed: the URL that the searcher visited after entering the query and viewing the
results page, which is also known as either click-thru or click-through.
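To make the record structure concrete, the following is a minimal sketch of how one log line might be parsed into these fields. The tab-delimited layout, the field order, the date format, the name SearchLogRecord, and the sample line are all assumptions made for illustration; actual search engine logs vary in layout and delimiter.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchLogRecord:
    """One interaction from a search log (field names are illustrative)."""
    user_id: str         # IP address or anonymous user code
    timestamp: datetime  # date and time as recorded by the server
    query: str           # query terms as entered by the user


def parse_record(line: str) -> SearchLogRecord:
    """Parse one line of an assumed tab-delimited log: user, date, time, query."""
    user_id, date, time, query = line.rstrip("\n").split("\t")
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return SearchLogRecord(user_id, timestamp, query)


# Hypothetical sample line, not taken from any real log.
record = parse_record("u0001\t2005-04-25\t04:08:50\trunning shoes\n")
```

Combining the date and time fields into a single timestamp is a design choice that simplifies the later session-level calculations.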
[Figure: ER diagram of the search log database. Recoverable labels: searching_episode, Query, Terms, Co_occur, Query_Total, Query_Occurrences, boolean, occurrences, and terms, with cardinalities (0, n), (1, n), and (0, 1).]
An ER diagram models the concepts and perceptions of the data and displays the conceptual
schema for the database using standard ER notation. Table 9.2 presents the legend for the schema
Cooc table: term pairs and the number of occurrences of those pairs
Since search logs are in ASCII format, we can easily import the data into most relational
databases. A key thing is to import the data in the same coding schema in which it was recorded
(e.g., UTF-8, US-ASCII). Once imported, each record is assigned a unique identifier or primary
key. Most modern databases can assign this automatically on importation, or we can assign it later
using scripts.
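A minimal sketch of such an import, using Python's built-in sqlite3 module, is shown below. The table name, column names, and sample records are hypothetical, and the log is assumed to be tab-delimited text. Declaring the identifier as INTEGER PRIMARY KEY lets SQLite assign the key automatically on insertion.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited log content. A real log would be read from a file
# opened with the encoding in which it was recorded, e.g. encoding="utf-8".
raw_log = io.StringIO(
    "u0001\t2005-04-25\t04:08:50\trunning shoes\n"
    "u0002\t2005-04-25\t04:09:12\ttrail running shoes\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE interactions (
        id      INTEGER PRIMARY KEY,  -- unique identifier assigned on insert
        user_id TEXT NOT NULL,
        date    TEXT NOT NULL,
        time    TEXT NOT NULL,
        query   TEXT NOT NULL
    )
    """
)

# QUOTE_NONE keeps quotation marks typed by searchers as part of the query.
rows = csv.reader(raw_log, delimiter="\t", quoting=csv.QUOTE_NONE)
conn.executemany(
    "INSERT INTO interactions (user_id, date, time, query) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

imported = conn.execute(
    "SELECT id, user_id, query FROM interactions ORDER BY id"
).fetchall()
```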
(usually on the order of 35% to 40% of all records) as users go to Websites for purposes other than
searching [72].
Term level analysis. The term level of analysis naturally uses the term as the basis for analysis. A
term is a string of characters separated by some delimiter such as a space or some other separator.
At this level of analysis, one focuses on measures such as term occurrence, which is the frequency that
a particular term occurs in the transaction log. Total terms are the number of terms in the dataset.
Unique terms are the terms that appear in the data regardless of the number of times they occur. High
Usage Terms are those terms that occur most frequently in the dataset. Term co-occurrence measures
the occurrence of term pairs within queries in the entire search log. We can also calculate degrees of
association of term pairs using various statistical measures [cf. Refs. 131, 135, 151].
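The term-level measures can be sketched in a few lines of Python. The sample queries are hypothetical, terms are assumed to be delimited by whitespace and compared without regard to case, and a term pair is counted once per query regardless of order.

```python
from collections import Counter
from itertools import combinations

# Hypothetical queries, not taken from any real log.
queries = ["running shoes", "trail running shoes", "shoes sale"]

term_occurrence = Counter()  # frequency of each term in the log
pair_occurrence = Counter()  # frequency of each unordered term pair within queries

for query in queries:
    terms = query.lower().split()
    term_occurrence.update(terms)
    # Sorting makes (a, b) and (b, a) the same pair; set() counts it once per query.
    pair_occurrence.update(combinations(sorted(set(terms)), 2))

total_terms = sum(term_occurrence.values())       # all term occurrences
unique_terms = len(term_occurrence)               # distinct terms
high_usage_terms = term_occurrence.most_common(2)  # most frequent terms
```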
The mutual information formula measures term association and does not assume mutual in-
dependence of the terms within the pair. We calculate the mutual information statistic for all term
pairs within the dataset. Many times, a relatively low frequency term pair may be strongly associ-
ated (i.e., if the two terms always occur together). The mutual information statistic identifies the
strength of this association. The mutual information formula used in this research is
$$I(w_1, w_2) = \ln \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)}$$
where P(w1) and P(w2) are probabilities estimated by the relative frequencies of the two words, and
P(w1, w2) is the relative frequency of the word pair, with order not considered. Relative frequencies
are observed frequencies (F) normalized by the number of queries:
$$P(w_1) = \frac{F_1}{Q}; \quad P(w_2) = \frac{F_2}{Q}; \quad P(w_1, w_2) = \frac{F_{12}}{Q'}$$
Both the frequency of term occurrence and the frequency of term pairs are the occurrence of
the term or term pair within the set of queries. However, since a one-term query cannot have a term
pair, the set of queries for the frequency base differs. The number of queries for the terms (Q) is the
number of non-duplicate queries in the dataset. The number of queries for term pairs (Q') is defined as:
$$Q' = \sum_{n=2}^{m} (2n - 3)\,Q_n$$
where Q_n is the number of queries with n words (n > 1), and m is the maximum query length. So,
queries of length one have no pairs. Queries of length two have one pair. Queries of length three
have three possible pairs. Queries of length four have five possible pairs. This continues up to the
queries of maximum length in the dataset. The formula for the number of queries for term pairs (Q')
accounts for this term pairing.
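As a hedged illustration of the calculation, the sketch below computes the mutual information statistic for every unordered term pair in a small set of queries. Term frequencies are normalized by the number of non-duplicate queries, and pair frequencies by the sum of (2n − 3) over all queries of length n > 1, as described above. The function name, the whitespace tokenization, the case folding, and the sample queries are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(queries):
    """Return {(w1, w2): I(w1, w2)} for unordered term pairs in the queries."""
    # Work on non-duplicate queries, each as a tuple of whitespace-delimited terms.
    distinct = list(dict.fromkeys(tuple(q.lower().split()) for q in queries))

    term_freq = Counter()  # number of queries containing each term
    pair_freq = Counter()  # number of queries containing each unordered pair
    pair_base = 0          # sum over queries of (2n - 3) for lengths n > 1
    for terms in distinct:
        present = sorted(set(terms))
        term_freq.update(present)
        pair_freq.update(combinations(present, 2))
        n = len(terms)
        if n > 1:
            pair_base += 2 * n - 3

    n_queries = len(distinct)
    scores = {}
    for (w1, w2), f12 in pair_freq.items():
        p1 = term_freq[w1] / n_queries
        p2 = term_freq[w2] / n_queries
        p12 = f12 / pair_base
        scores[(w1, w2)] = math.log(p12 / (p1 * p2))
    return scores


# Hypothetical queries; the last one duplicates the first and is ignored.
scores = mutual_information(
    ["running shoes", "trail running shoes", "shoes sale", "running shoes"]
)
```

In this small example there are three non-duplicate queries and a pair base of 1 + 3 + 1 = 5, so the pair (running, shoes), which occurs in two queries, has P(w1, w2) = 2/5.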
Query level analysis. The query level of analysis uses the query as the base metric. A query is defined
as a string list of one or more terms submitted to a search engine. This is a mechanical definition as
opposed to an information searching definition [88]. The first query by a particular searcher is the
initial query. A subsequent query by the same searcher that is different from any of the searcher’s
other queries is a modified query. There can be several occurrences of different modified queries by a
particular searcher. A subsequent query by the same searcher that is identical to one or more of the
searcher’s previous queries is an identical query.
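These three query types can be assigned mechanically once the queries of one searcher are in time order. The following is a minimal sketch; the function name is hypothetical, and treating queries as identical after case folding and whitespace normalization is an assumption that a given study may not share.

```python
def classify_queries(session_queries):
    """Label each query in one searcher's time-ordered list of queries.

    Returns a list of (query, label) pairs, where label is "initial",
    "modified", or "identical".
    """
    seen = set()
    labeled = []
    for position, query in enumerate(session_queries):
        normalized = " ".join(query.lower().split())
        if position == 0:
            label = "initial"
        elif normalized in seen:
            label = "identical"  # matches one of the searcher's earlier queries
        else:
            label = "modified"   # differs from all of the searcher's earlier queries
        seen.add(normalized)
        labeled.append((query, label))
    return labeled


labels = [label for _, label in classify_queries(
    ["shoes", "running shoes", "shoes", "Running  Shoes"]
)]
```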
In many Web search engine logs, when the searcher traverses to a new results page, this inter-
action is also logged as an identical query. In other logging systems, the application records the page
rank. A results page is the list of results, either sponsored or organic (i.e., non-sponsored), returned
by a Web search engine in response to a query. Using either identical queries or some results page
field, we can analyze the result page viewing patterns of Web searchers.
Other measures are also observable at the query level of analysis. A unique query refers to a
query that is different from all other queries in the transaction log, regardless of the searcher. A
repeat query is a query that appears more than once within the dataset by two or more searchers.
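One way to operationalize these two definitions is sketched below, under stated assumptions: a unique query is taken to be one that occurs exactly once in the whole log, and a repeat query is one submitted by two or more different searchers. The (searcher, query) pairs are hypothetical.

```python
from collections import defaultdict

# Hypothetical (searcher, query) pairs from a log.
log = [
    ("u1", "running shoes"),
    ("u2", "running shoes"),
    ("u1", "running shoes"),
    ("u3", "trail maps"),
    ("u2", "shoe sale"),
    ("u2", "shoe sale"),
]

searchers_by_query = defaultdict(set)
count_by_query = defaultdict(int)
for searcher, query in log:
    searchers_by_query[query].add(searcher)
    count_by_query[query] += 1

# Unique: occurs exactly once in the whole log, regardless of searcher.
unique_queries = {q for q, c in count_by_query.items() if c == 1}
# Repeat: submitted by two or more different searchers.
repeat_queries = {q for q, s in searchers_by_query.items() if len(s) >= 2}
```

Under these assumptions, a query resubmitted only by the same searcher ("shoe sale" here) is neither unique nor repeat; it is an identical query within that searcher's session.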
Query complexity examines the query syntax, including the use of advanced searching tech-
niques such as Boolean and other query operators. Failure rate is a measure of the deviation of
queries from the published rules of the search engine. The use of query syntax that the particular IR
system does not support, but may be common on other IR systems, is carry over.
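A simple measure of query complexity is whether a query contains any advanced operators. The sketch below flags a small, assumed operator set (uppercase AND, OR, NOT, quoted phrases, and leading + or - on a term); the published rules of the search engine under study should replace this set, and the function name is hypothetical.

```python
import re

# Assumed operator syntax; each search engine publishes its own rules.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def query_operators(query):
    """Return the set of advanced operators found in a query string."""
    found = {token for token in query.split() if token in BOOLEAN_OPERATORS}
    if re.search(r'"[^"]+"', query):
        found.add("phrase")        # text enclosed in double quotes
    if re.search(r"(?<!\S)\+\w", query):
        found.add("must-include")  # + at the start of a term
    if re.search(r"(?<!\S)-\w", query):
        found.add("exclude")       # - at the start of a term, not a hyphen inside one
    return found


boolean_example = query_operators("running AND shoes")
phrase_example = query_operators('"trail running" -sale')
plain_example = query_operators("non-slip running shoes")
```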
Session level analysis. At the session level of analysis, we primarily examine the within-session
interactions [48]. However, if the search log spans more than one day or assigns some temporal
limit to interactions from a particular user, we could examine between-sessions interactions. A
session interaction is any specific exchange between the searcher and the system (i.e., submitting
a query, clicking a hyperlink, etc.). A searching episode is defined as a series of interactions within
a limited duration to address one or more information needs. This session duration is typically
short, with Web researchers using between 5 and 120 minutes as a cutoff [cf. Refs. 54, 70, 107, 135].
Each choice of time has an impact on the results, of course. The searcher may be multitasking [106,
138] within a searching episode, or the episode may be an instance of the searcher engaged in suc-
cessive searching [95, 110, 141]. This session definition is similar to the definition of a unique visitor
used by commercial search engines and organizations to measure Website traffic. The number of
queries per searcher is the session length.
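Splitting a searcher's interactions into searching episodes with a temporal cutoff can be sketched as follows. The function name and sample data are hypothetical, and the 30-minute default is only one value from the range reported above, chosen here for illustration.

```python
from datetime import datetime, timedelta


def split_into_episodes(interactions, cutoff=timedelta(minutes=30)):
    """Group (user_id, timestamp) interactions into searching episodes.

    A new episode starts when the same user's next interaction arrives more
    than `cutoff` after that user's previous interaction.
    """
    episodes = {}  # user_id -> list of episodes, each a list of timestamps
    for user_id, ts in sorted(interactions):
        user_episodes = episodes.setdefault(user_id, [])
        if user_episodes and ts - user_episodes[-1][-1] <= cutoff:
            user_episodes[-1].append(ts)
        else:
            user_episodes.append([ts])
    return episodes


base = datetime(2005, 4, 25, 9, 0)
episodes = split_into_episodes([
    ("u1", base),
    ("u1", base + timedelta(minutes=10)),
    ("u1", base + timedelta(hours=2)),
    ("u2", base + timedelta(minutes=5)),
])
```

Rerunning the same data with a different cutoff, for example timedelta(minutes=120), merges the two episodes of the first searcher, which shows how the choice of time affects the results.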
Session duration is the total time the user spent interacting with the search engine, including
the time spent viewing the first and subsequent Web documents, except the final document. Ses-
sion duration can therefore be measured from the time the user submits the first query until the
user departs the search engine for the last time (i.e., does not return). The viewing time of the final
Web document is not available since the Web search engine server does not record the time stamp.
Naturally, the time between visits from the Web document to the search engine may not have been
entirely spent viewing the Web document, which is a limitation of the measure.
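Measured this way, session duration reduces to the span between the first and last server-recorded interactions of a session. A minimal sketch, with a hypothetical function name and sample timestamps:

```python
from datetime import datetime


def session_duration(timestamps):
    """Time from the first to the last server-recorded interaction in a session."""
    if not timestamps:
        raise ValueError("a session needs at least one interaction")
    return max(timestamps) - min(timestamps)


session = [
    datetime(2005, 4, 25, 9, 0, 0),   # first query submitted
    datetime(2005, 4, 25, 9, 4, 30),  # searcher returns and submits another query
    datetime(2005, 4, 25, 9, 12, 0),  # last recorded interaction
]
duration = session_duration(session)
```

A session with a single recorded interaction has a duration of zero under this measure, because the viewing time of the final Web document is never recorded.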
A Web document is the Web page referenced by the URL on the search engine’s results page.
A Web document may be text or multimedia and, if viewed hierarchically, may contain a nearly
unlimited number of sub-Web documents. A Web document may also contain URLs linking to
other Web documents. From the results page, a searcher may click on the URLs of (i.e., visit) one or
more results from the listings on the results page. Examining these clicks is click through analysis, which measures the page
viewing behavior of Web searchers. We measure document viewing duration as the time from when
a searcher clicks on a URL on a results page to the time that searcher returns to the search engine.
Some researchers and practitioners refer to this type of analysis as page view analysis. Click through
analysis is possible if the transaction log contains the appropriate data. There are many other factors
one can examine, including query graphs [11].
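Document viewing duration can be sketched as the gap between a click and the searcher's next recorded interaction with the search engine. The event layout (timestamp, kind, value), the function name, and the example.com URLs are assumptions for illustration.

```python
from datetime import datetime


def viewing_durations(events):
    """Return [(url, duration)] for clicks followed by a return to the engine.

    `events` is one searcher's time-ordered list of (timestamp, kind, value),
    where kind is "query" or "click". A final click has no later interaction,
    so its viewing time cannot be measured and it is skipped.
    """
    durations = []
    for (ts, kind, value), (next_ts, _, _) in zip(events, events[1:]):
        if kind == "click":
            durations.append((value, next_ts - ts))
    return durations


events = [
    (datetime(2005, 4, 25, 9, 0, 0), "query", "running shoes"),
    (datetime(2005, 4, 25, 9, 0, 20), "click", "http://example.com/a"),
    (datetime(2005, 4, 25, 9, 3, 20), "query", "trail running shoes"),
    (datetime(2005, 4, 25, 9, 3, 40), "click", "http://example.com/b"),
]
views = viewing_durations(events)
```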
or relate these queries directly to research questions. Figure 9.2 illustrates the application of such
an approach.
Figure 9.2 also shows each query in sequence and provides a descriptive tag describing that
query’s function.
SLA involves a series of standard analyses that are common to a wide variety of Web search-
ing studies. Some of these analyses may directly address certain research questions, and others may
be the basis for more in-depth research analysis.
One typical question is, “How many searchers have visited the search engine during this
period?” This query will provide a list of unique searchers and the number of queries they have sub-
mitted during the period. We can modify this question and determine “How many searchers have
visited the search engine on each day during this period.” Naturally, a variety of statistical results can
be determined using the previous queries. For example, we can determine the standard deviation of
number of queries per searcher.
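Questions of this kind map directly onto aggregate database queries. The sketch below runs them against an in-memory SQLite database from Python; the table name, column names, and sample rows are hypothetical, and the SQL is an illustration under those assumptions rather than the queries used in the original study.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (user_id TEXT, date TEXT, query TEXT)")
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    [
        ("u1", "2005-04-25", "running shoes"),
        ("u1", "2005-04-25", "trail running shoes"),
        ("u2", "2005-04-25", "shoe sale"),
        ("u1", "2005-04-26", "running socks"),
        ("u3", "2005-04-26", "trail maps"),
    ],
)

# Unique searchers and the number of queries each submitted during the period.
per_searcher = conn.execute(
    "SELECT user_id, COUNT(*) FROM interactions GROUP BY user_id ORDER BY user_id"
).fetchall()

# Number of distinct searchers on each day of the period.
per_day = conn.execute(
    "SELECT date, COUNT(DISTINCT user_id) FROM interactions "
    "GROUP BY date ORDER BY date"
).fetchall()

# Sample standard deviation of the number of queries per searcher.
spread = statistics.stdev([count for _, count in per_searcher])
```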
In addition to visits, we may want information about the session lengths (i.e., the number of
queries within a session) for each searcher. Similarly, we may be curious about the number of search-
ers who viewed a certain number of results pages.
We can calculate various statistical results on results page viewing, such as the maximum
number of result pages viewed and queries per day. An important aspect for system designers is
results caching because we need to know the number of repeat queries submitted by the entire set of
searchers during a given period in order to optimize our system’s performance.
Some researchers are more interested in how searchers are interacting with a search engine,
and for this purpose the use of Boolean operators is an important feature. Since most search engines
offer other query syntax than just Boolean operators, we can also investigate the use of these other
Counting the terms within the transaction log is another typical measurement. We certainly
want to know about query length, the frequency of term pairs, and the various term frequencies.
The results from this series of queries provide us with a wealth of information about our data
(e.g., occurrences of session lengths, occurrences of query length, occurrences of repeat queries, most
used terms, most used term pairs) and serve as the basis for further investigations (e.g., session
complexity, query structure, query modifications, term relationships).
It is important to understand both the strengths and limitations of SLA for Web searching.
First, concerning the strengths, SLA provides a method of collecting data from a great number
of users. Given the current nature of the Web, search logs appear to be a reasonable and
non-intrusive means of collecting user-system interaction data during the Web information searching
process from a large number of searchers. We can easily collect data on hundreds of thousands to
millions of interactions, depending on the traffic of the Website.
Second, we can collect this data inexpensively. The costs are the software and storage. Third,
the data collection is unobtrusive, so the interactions represent the unaltered behavior of searchers,
assuming the data is from an operational searching site. Finally, search logs are, at present, the only
method for obtaining significant amounts of search data within the complex environment that is the
Web [37]. Of course, researchers can also undertake SLA from research sites or capture client-side
data across multiple sites using a custom Web browser (for the purpose of data collection) that does
not completely mimic the searcher’s natural environment.
There are limitations with SLA, as with any methodology. First, certain types of data are
not in the transaction log, individuals’ identities being the most common example. An IP address
typically represents the “user” in a search log. Since more than one person may use a computer, an
IP address is an imprecise representation of the user. Search engines are overcoming this limitation
somewhat by the use of cookies.
Second, there is no way to collect demographic data when using search logs in a naturalistic
setting. This constraint is true of many non-intrusive naturalistic studies. However, there are several
sources for demographic data on the Web population based on observational and survey data. From
these data sources we may get reasonable estimations of needed demographic data. However, this
demographic data is still not attributable to specific subpopulations.
Third, a search log does not record the reasons for the search, the searcher motivations, or
other qualitative aspects of use. This is certainly a limitation. In the instances where one needs
this data, one should use TLA in conjunction with other data collection methods. However, this
invasiveness reduces the unobtrusiveness, which is an inherent advantage of search logs as a data
collection method.
Fourth, the logged data may not be complete due to caching of server data on the client
machine or proxy servers. This is an often-mentioned limitation. In reality, this is a relatively minor
concern for Web search engine research due to the method with which most search engines dy-
namically produce their results pages. For example, a user accesses the page of results from a search
engine using the Back button of a browser. This navigation accesses the results page via the cache on
the client machine. The Web server will not record this action. However, if the user clicks on any
URL on that results page, functions coded on the results page redirect the click first to the Web
server, which records the visit to the Website.
We presented a three-step methodology for conducting SLA, namely collecting, preparing, and
analyzing. We then reviewed each step in detail, providing observations, guides, and lessons learned.
We also discussed the organization of the database at the ER-level, and we explained the table
design for standard search engine transaction logs. This presentation of the methodology at a
detailed level of granularity will serve as an excellent basis for novice or experienced search log
Search logs are powerful tools for collecting data on the interactions between users and sys-
tems. Using this data, SLA can provide significant insights into user–system interactions, and it
complements other methods of analysis by overcoming the limitations inherent in those methods.
By combining SLA with other data collection methods or other research results, we can improve the
robustness of the analysis. Overall, SLA is a powerful tool for Web searching research, and the SLA
process outlined here can be helpful in future Web searching research endeavors.
• • • •
This lecture presents an overview of the Web analytics process, with a focus on gaining insight
and actionable outcomes from collecting and analyzing Internet data. The lecture first provides
an overview of Web analytics, providing in essence, a condensed version of the entire lecture. The
lecture then outlines the theoretical and methodological foundations of Web analytics in order to
understand clearly the strengths and shortcomings of Web analytics as an approach. These foundational
elements include the psychological basis in behaviorism and the methodological underpinning of
trace data as an empirical method. The lecture then presents a brief history of Web analytics from
the original transaction log studies in the 1960s, through the information science investigations of
library systems, to the focus on Websites, systems, and applications. The lecture then covers the
various types of ongoing interaction data within the clickstream created using log files and page
tagging for analytics of Website and search logs. The lecture then presents a Web analytic process to
convert this basic data to meaningful KPIs to measure likely converts that are tailored to the
organizational goals or potential opportunities. Supplementary data collection techniques are addressed,
including surveys and laboratory studies. The lecture then discusses the strengths and shortcomings
of Web analytics. The overall goal of this lecture is to provide implementable information and a
methodology for understanding Web analytics in order to improve Web systems, increase customer
satisfaction, and target revenue through effective analysis of user–Website interactions.
Returning to that online retail store selling the latest athletic shoe, Web analytics can tell us
how potential customers find our online store, including those who are referred from other Websites
and those from search engines. Web analytics provides us the methods to know, and our KPIs tell
us why we should care. Our understanding of customer behavior provided by Web analytics gives us
the tool to determine what it might mean if customers come to our Website and then immediately
leave versus if the potential customer explores several pages and then leaves. We can leverage Web
analytics techniques to glean value from this data. Web analytics allows us to focus on organizational
goals, including getting the customer through the entire shopping cart process. In sum, Web
analytics is the strategic tool to make our hypothetical online store successful by understanding why
potential customers behave as they do and what that behavior means.
• • • •
Key Terms
• Abandonment rate: key performance indicator that measures the percentage of visitors
who got to that point on the site but decided not to perform the target action.
• Alignment-centric performance management: method of defining a site’s business goals
by choosing only a few key performance indicators.
• Average order value: key performance indicator that measures the ratio of total revenue to the
total number of orders.
• Average time on site: see visit length.
• Behavior: essential construct of the behaviorism paradigm. At its most basic, a behavior is
an observable activity of a person, animal, team, organization, or system. Like many basic
constructs, behavior is an overloaded term because it also refers to the aggregate set of
responses to both internal and external stimuli. Therefore, behaviors address a spectrum
of actions. Because of the many associations with the term, it is difficult to characterize it
without specifying a context in which it takes place to provide meaning.
• Behaviorism: research approach that emphasizes the outward behavioral aspects of thought.
For transaction log analysis, we take a more open view of behaviorism. In this more
encompassing view, behaviorism emphasizes the observed behaviors without discounting the
inner aspects that may accompany these outward behaviors.
• Checkout conversion rate: key performance indicator that measures the percent of total
visitors who begin the checkout process.
• Commerce Website: a type of Website where the goal is to get visitors to purchase goods
or services directly from the site.
• Committed visitor index: key performance indicator that measures the percentage of
visitors that view more than one page or spend more than 1 minute on a site (these
measurements should be adjusted according to site type).
• Content/media Website: a type of Website focused on advertising.
• Conversion rate: key performance indicator that measures the percentage of total visitors
to a Website that perform a specific action.
• Cost per lead (CPL): key performance indicator that measures the ratio of marketing
expenses to total leads and shows how much it costs a company to generate a lead.
• Customer loyalty: key performance indicator that measures the ratio of new to existing
• Customer satisfaction metrics: key performance indicator that measures how the users
rate their experiences on a site.
• Demographics and system statistics: a metric that measures the physical location and
information of the system used to access the Website.
• Depth of visit: key performance indicator that measures the ratio between page views and
• Electronic survey: method of data collection in which a computer plays a major role in
both the delivery of a survey to potential respondents and the collection of survey data from
actual respondents.
• Ethogram: index of the behavioral patterns of a unit. An ethogram details the different
forms of behavior that an actor displays. In most cases, it is desirable to create an
ethogram in which the categories of behavior are objective, discrete, and not overlapping
with each other. The definitions of each behavior should be clear, detailed, and
distinguishable from each other. Ethograms can be as specific or general as the study or field
• Interactions: physical expressions of communication exchanges between the searcher and
the system.
• Internal search: a metric that measures information on keywords and results pages viewed
using a search engine embedded in the Website.
• Key performance indicator (KPI): a combination of metrics tied to a business strategy.
• Lead generation Website: Website used to obtain user contact information in order to
inform them of a company’s new products and developments and to gather data for market
• Log file: log kept by a Web server of information about requests made to the Website,
including (but not limited to) visitor IP address, date and time of the request, requested page,
referrer, and information on the visitor’s Web browser and operating system.
• Log file analysis: method of gathering Website statistics from the information recorded
in a log file.
• Metrics: statistical data collected from a Website such as number of unique visitors, most
popular pages, etc.
• New visitor: a user who is accessing a Website for the first time.
• New visitor percentage: key performance indicator that measures the ratio of new visitors
to unique visitors.
• Visitor path: a metric that measures the route a visitor uses to navigate through the Web-
• Visitor type: a metric that measures users who access a Website. Each user who visits the
Website is a unique user. If it is a user’s first time to the Website, that visitor is a new visitor,
and if it is not the user’s first time, that visitor is a repeat visitor.
• Web analytics: the measurement of visitor behavior on a Website.
• Web analytics: the measurement, collection, analysis, and reporting of Internet data for
the purposes of understanding and optimizing Web usage (http://www.webanalytics-
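Several of the ratio-style KPIs defined above can be computed directly from aggregate counts. The sketch below is illustrative only: the function names, the sample counts, and the choice to express rates as percentages are assumptions made for this example, not definitions taken from the lecture.

```python
def percentage(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def conversion_rate(conversions, total_visitors):
    # Percentage of total visitors who perform a specific action.
    return percentage(conversions, total_visitors)


def abandonment_rate(reached_step, completed_action):
    # Percentage of visitors who reached a point but did not perform the target action.
    return percentage(reached_step - completed_action, reached_step)


def average_order_value(total_revenue, total_orders):
    # Ratio of total revenue to total number of orders.
    return total_revenue / total_orders if total_orders else 0.0


def new_visitor_percentage(new_visitors, unique_visitors):
    # Ratio of new visitors to unique visitors, as a percentage.
    return percentage(new_visitors, unique_visitors)


# Hypothetical counts for one reporting period.
print(conversion_rate(40, 2000))           # 2.0
print(abandonment_rate(100, 40))           # 60.0
print(average_order_value(3200.0, 40))     # 80.0
print(new_visitor_percentage(1500, 2000))  # 75.0
```

Keeping each KPI as a small function of raw counts makes it easy to adjust a definition (for example, the denominator of the conversion rate) to the goals of a particular site type.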
• • • •
Listed below are several practitioner blogs that offer current and insightful analysis on Web analytics.
• • • •
Fast Moving Consumer Goods
Analytics Framework
Point of view
Amsterdam, 2017
Key Trends impacting FMCG
© 2017 Deloitte The Netherlands Fast Moving Consumer Goods Analytics Framework 2
Source: Deloitte University Press, Consumer product trends: Navigating 2020
Using Analytics to stay ahead of the game
Effective use of analytical capabilities will enable FMCG companies to cope with and even
benefit from the key trends impacting FMCG
• Unfulfilled economic recovery for core consumer segments: Analytics supports the shift to value by identifying
key price points in the market, defining customer segments, developing new pricing strategies based on
competitive intelligence and increasing efficiency in manufacturing and logistics to reduce costs
• Health, wellness and responsibility as the new basis of brand loyalty: Companies will experience greater
pressure to better align offerings and activities with customer interests and values. Big Data and analytics
help to better understand customer sentiment, preferences and behaviour. At the same time data analytics
enables supply chain visibility and identifies potential risks
• Pervasive digitization of the path to purchase: An increasingly larger share of consumer's spend and activity
will take place through digital channels. Analytics is key in better understanding of purchase and
consumption occasions as well as tailoring channel
• Proliferation of customization and personalization: In a world where customized products and personalized,
targeted marketing experiences win companies market share, technologies like digital commerce, additive
manufacturing and artificial intelligence can give a company an edge by allowing it to create customized
product offerings
• Continued resource shortages and commodity price volatility: Analytics can fuel a better understanding of the
resource market volatility and more efficient use of critical resources in the production process
FMCG Analytics Framework
Analytic capabilities for better decisions across the FMCG value chain
• Marketing/Sales: Brand Analysis, Pricing Strategy, Competitor Intelligence, Digital Analytics,
Marketing Mix ROI, Trade Promotion Effectiveness
• Manufacturing: Production Efficiency, Asset Analytics, Quality Analytics, Production Forecasting,
Workforce Safety, Production Planning
• Logistics: Inventory Diagnostics, Supply Chain Diagnostics, Reverse Logistics, Location Analytics,
Resource & Route optimization, Fulfilment Intelligence
FMCG Analytics Framework – Marketing/Sales
FMCG Value Chain – Marketing/Sales
In the Marketing/Sales process of the FMCG value chain, analyses are geared towards
improving commercial performance and customer centricity
Digital Analytics
The online channels are of increasing importance, also in FMCG. Defining a uniform digital KPI framework and
building web analytics capabilities is key to create insights into the digital performance on the ecommerce
platforms.
Pricing Strategy
The analysis focuses on demand variation at different price levels with different promotion/rebate offers. It is
used to determine optimal prices throughout the product/service lifecycle by customer segment. Benefits include
increasing sales margin, decreasing markdowns and aiding inventory management.
Case study – Digital Analytics
Defining a KPI framework and embedding it through online dashboards
Food company
This global food company wanted to undergo a digital transformation. However, there was little visibility on web
analytics capabilities, no access to in-market web analytics, and limited standards, KPI definitions and
reporting. For e-commerce there was little to no online market share data available in the countries
Deloitte supported in defining uniform KPIs and a roadmap for implementation for both domains. Deloitte also
supported in extracting web analytics data and requesting in-market data about online market share from the
countries. The first phases for both marketing and e-commerce were to develop tooling to measure and compare
digital performance across target countries
Food company
• Delivered a (hosted) e-commerce dashboard & KPI framework with global definitions, also making web
analytics more financial by measuring the financial impact of web analytics
• Finally, the roadmaps for both marketing and e-commerce provide clear guidance on maturing in the area of
online marketing and e-commerce
Case study – Brand Analysis
Investigating brand perceptions by assessing positive and negative opinions
regarding the firm
Brand analysis
The firm wanted to investigate its brand perception by assessing positive and negative opinions regarding the firm.
They wanted to be able to highlight locations showing positive and negative perceptions. The client also wanted
to compare their firm with the main competitors in order to create a data-driven brand strategy
Brand analysis
The project involves a web spider which extracts related and unstructured data from the internet from a number
of different sources (social media, blogs, news feeds etcetera). The analysis is then carried out in a text mining
tool to process the data for sentiment related content and output the results to an interactive dashboard for
Brand analysis
The results of the analysis include sentiment scores across the business areas and a root cause analysis. These
enable a real-time understanding of their online brand and identification of the differentiating factors between
positive and negative perceived programs/areas. The delivered insights can be used to determine the necessary
actions in order to promote the firm’s brand in certain programs/areas
Case study – Omni channel voice of the customer
Analysis of customer voice topics and sentiment across multiple channels
Customers leave their voices across different channels such as company website, third party resellers, customer
service emails, telephone and social. Capturing, classifying and combining data from these channels is
challenging. Our solution enables CMOs to focus their attention where it is most required
This proof-of-concept focusses on three channels (own website, third party website and social). First, web
scraping is used to collect raw customer voices from different channels in different markets. Then a classification
model is used to identify key topics and subtopics for each voice, another classification model is used to identify
the product (category) of the topic, and finally sentiment analysis is performed on each of the voices. The results
are visualised in an interactive dashboard
• The solution provides insights into the sentiment of voices per product category, per market
• Key topics are visible and trending topics can be assessed by product category, channel or market
• The solution provides a quick overview of all voices across all products, channels and markets, but also enables
drill-down to the voice level
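The classify-then-score flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the keyword rules and the sentiment word lists are hypothetical stand-ins for the trained classification and sentiment models used in the proof-of-concept, and the sample voices are invented.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules standing in for a trained topic classifier.
TOPIC_KEYWORDS = {
    "delivery": {"delivery", "shipping", "late", "arrived"},
    "taste": {"taste", "flavour", "delicious", "bland"},
    "price": {"price", "expensive", "cheap"},
}
# Hypothetical word lists standing in for a sentiment model.
POSITIVE = {"delicious", "great", "love", "cheap", "fast"}
NEGATIVE = {"late", "bland", "expensive", "bad", "broken"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def classify_topic(tokens):
    """Pick the topic with the most keyword hits, or 'other' if none match."""
    hits = {topic: sum(t in words for t in tokens) for topic, words in TOPIC_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "other"


def sentiment_score(tokens):
    """Count of positive words minus count of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def summarize(voices):
    """Average sentiment per (channel, topic) pair."""
    totals = defaultdict(list)
    for channel, text in voices:
        tokens = tokenize(text)
        totals[(channel, classify_topic(tokens))].append(sentiment_score(tokens))
    return {key: sum(scores) / len(scores) for key, scores in totals.items()}


voices = [
    ("own website", "Delivery was late and the box arrived broken"),
    ("social", "Love the taste, delicious!"),
    ("third party website", "Too expensive for what you get"),
]
for key, average in summarize(voices).items():
    print(key, average)
```

Grouping by (channel, topic) mirrors the drill-down described in the results; adding market or product category to the key would follow the same pattern.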
Case study – Marketing Mix ROI
The use of combined online & offline Marketing Mix Modelling to improve the
Marketing ROI
Scenario analysis
Deloitte was engaged to improve the return on marketing spend and optimize the advertising investment mix
across both offline and online channels simultaneously, in an organization with disparate departments, differing
measurement systems and differing priorities. This case was executed for an omnichannel retailer
Marketing ROI
First the metrics needed for the model were prioritized across products, channels, and categories. A data
warehouse was built to hold the required variables for each product that was needed to continuously run the
Marketing Mix Modelling. With all the data present, the Marketing Mix Model was developed to optimize
marketing ROI by using Scenario analysis and Optimization models. Finally the marketing ROI tracking system
was implemented to continuously track the results of the models
• The most significant result was that the marketing ROI doubled over a two-year period
• To ensure recurring improvement, an investment mix allocation change was implemented
• Finally, there was also a strategy shift to target the most profitable customers
Case study – Pricing Strategy
Using analytics to reshape pricing strategies
Years of inorganic growth and sales-led customer negotiations led to tailored pricing across trade customers,
resulting in large and difficult-to-defend price variance across customers. Pricing differences between accounts
exposed this CPG client to downward pressure on pricing when trade partners consolidated or buyers
moved retailers
Existing pricing and trade terms structures were not compliant with internal accounting standards
Margin analysis
Profitability analysis
Deloitte developed a consistent, commercially justifiable list of pricing and trading terms. The potential impact of
new pricing and terms on customers was assessed and a high-level roadmap for execution was established. The
business is supported in the preparation for the implementation of the new pricing and trading terms
Case study – Trade Promotion Effectiveness
Building a shared reporting and analysis solution that allows for
trade promotion evolution
A client’s desired end state with regard to BI was a single integrated and shared reporting and analysis solution;
delivering value in a single version of the truth throughout the organization. As part of this solution they wanted
to gain insight in trade promotion effectiveness through two key dimensions, promotional performance and
promotional planning. This case was executed for a CPG client
Account performance
Interviews within the company showed that trade promotion management & evaluation is not a focus at
corporate level, but very important at regional level. In order to create a cohesive overview of trade marketing
effectiveness across different dimensions (regions, channels, categories, products & sales persons) Deloitte had to
tie several data sources together, such as GfK panel data, Nielsen scanning data, IRI data and the client’s own
factory data
Budget analysis
• A tool that allows the client to evaluate trade promotion performance. This way they can evaluate the success
Case study – Competitor Intelligence
Creating an overall view of the category market post
Overall view
It is important to understand how products are offered to the end consumers via the different retail outlets.
Therefore, understanding the competitive market of suppliers as well as retailers is key
The aim of this initiative is to combine disparate data sources in order to develop a solid understanding of the
market position on individual product and category level. This case was executed from a retailer’s point of view
but can be directly applied to FMCG companies
Developing a workflow tool to obtain an overall view of the market as well as an interactive dashboard on
product sales and market positioning, by identifying and combining different data sources such as:
• Internal market sales & market research
• Third party (retailers) sales data (e.g. Nielsen)
• External data sources
Dashboard Deepdive
• Ability to focus on root cause analyses for positive or negative developments in product/market sales using
interactive dashboarding
• Uncover relative market positions of product groups vis-à-vis main competitors
FMCG Analytics Framework – Manufacturing
FMCG Value Chain – Manufacturing
In the Manufacturing process of the FMCG value chain, analyses are focused on
optimizing production processes taking in consideration forecasting, planning, efficiency
and risk exposure
Production Forecasting Optimization
Analyses focus on the evaluation of promotion forecasting based on a measurement framework of forecasting
accuracy/error, bias and stability. Improving forecasting accuracy can potentially lead to reductions in excess
inventory, lower labour costs, lower expedite costs, holding costs, spoilage discounts and reduced stock-outs.
Asset Analytics
Analyses focus on the prediction of the lifetime of long term assets such as buildings, large machinery and other
structural elements. This is done by calculating the influence of, for instance, weather, material and usage of
the assets.
Case study – Production Forecasting Optimization
Production forecasting is a key capability for many manufacturers; improving
forecasting performance is vital to reduce product stock-outs while decreasing costs
due to excess inventory
Accurate forecasting is a key ability to ensure competitive advantages for every manufacturer. Improving
forecasting capability should be a continuous effort in which periodic or continuous forecasting performance
evaluation is an important element. Forecasting demand in FMCG is challenging for three main reasons: (1)
demand noise and volatility in the market, (2) the introduction of new products and (3) product promotions
Promotion forecasting evaluation is performed based on a three-pronged measurement framework. Performance
is measured in terms of (1) forecasting accuracy (or forecasting error), (2) forecasting bias and (3) forecasting
stability. For each of these measurements, several metrics exist and care should be taken to use the most
suitable performance metric
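As a rough illustration of the three measurements named above, the sketch below computes one candidate metric for each. The specific choices (mean absolute percentage error for accuracy, mean error for bias, and mean absolute revision between successive forecasts for stability) are assumptions made for this example; the source only states that several metrics exist for each measurement. All figures are invented.

```python
def mape(actuals, forecasts):
    """Accuracy: mean absolute percentage error (requires non-zero actuals)."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)


def mean_bias(actuals, forecasts):
    """Bias: mean forecast error; positive values indicate over-forecasting."""
    return sum(f - a for a, f in zip(actuals, forecasts)) / len(actuals)


def mean_revision(forecast_versions):
    """Stability: mean absolute change between successive forecasts for the same period."""
    changes = [
        abs(later - earlier)
        for earlier, later in zip(forecast_versions, forecast_versions[1:])
    ]
    return sum(changes) / len(changes)


# Hypothetical demand and forecast figures for three periods.
actuals = [100, 200, 400]
forecasts = [110, 180, 400]
print(round(mape(actuals, forecasts), 2))
print(round(mean_bias(actuals, forecasts), 2))

# Hypothetical sequence of forecasts issued over time for one promotion week.
print(mean_revision([120, 100, 110, 110]))
```

A forecast can score well on one measurement and poorly on another (for example, low error on average but frequent large revisions), which is why the three are tracked separately.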
Throughput statistics
At any stage of a company’s evolution, improving operating performance is important. Lean methodologies
applied to nearly any organization enable an efficient and lean enterprise. Analytics can support manufacturers
to proactively address the challenges they face today. If applied correctly, analytics can become a major driver
for Lean Six Sigma and other process improvement disciplines seeking to increase efficiency and reduce costs
Analytics assist management teams to devise the appropriate process control strategy and support its
Different methods are applied to uncover potential inefficiency and cost reduction opportunities such as:
• Outlier detection
• Predictive modelling
• Scenario modelling
• Optimization & simulation
• identifying opportunities for consolidating facilities, outsourcing and off shore transfer solutions
• identifying unprofitable product lines for manufacturing operations
• reducing idle time for production facilities
• reducing defects and waste
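Of the methods listed above, outlier detection is the simplest to illustrate. The sketch below flags readings that lie far from the mean in units of standard deviation (a z-score rule). The z-score rule, the threshold of 2.0, and the throughput figures are assumptions made for this example; the source does not say which outlier detection technique was applied.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all readings identical, nothing to flag
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical hourly throughput (units per hour) for one production line.
throughput = [98, 102, 101, 99, 100, 97, 103, 60, 100, 101]
print(zscore_outliers(throughput))  # [(7, 60)]
```

Because a single extreme reading inflates the standard deviation, more robust variants (for example, based on the median) may be preferable on small samples.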
Case study – Workforce Safety Analytics
Thorough understanding of the dynamics of workplace incidents through the use of
advanced analytics
Traditional safety analytics defines the scale of the safety problem, but routinely lacks the insights as to why
those safety events occur. A strategic safety profiling analysis can:
• Objectively identify the key factors and behaviours that impact safety related incidents and then design
measurable interventions to minimize safety risk
• Use the profiling model to predict which type of person(s) will get hurt and which employees are most at risk
Data sets covering over 1,000 unique employees and three years of employee- or contractor-related data were
analysed. Next, a model was estimated based on this data and the results were visualized in a dashboard
• Actionable and targeted recommendations regarding what operational changes to consider to help
minimise incidents
• Ability to track, measure and report on the effectiveness of the safety compliance program and internal efforts
to minimise risk
Case study – Asset Analytics
Asset Analytics enables effective decision making by identification and quantification
of asset-related risks
Data exploration
For a water distribution utility company, Deloitte developed a model to predict maintenance of pipes. Asbestos
cement pipes may fail due to deterioration caused by lime aggressive water, in combination with other factors
such as traffic loads, point loads and root growth. Errors could have major consequences for the water utility,
customer satisfaction, safety and the environment
Pattern recognition
During a five-week project, asset data such as lime aggressiveness of the water, diameter, wall thickness and age of
the pipes was combined with geographical data such as region, soil type, pH and groundwater level. Based
on this dataset, 3 predictive models were trained and evaluated to predict the deterioration of the cement pipes
due to lime aggressive water
Model evaluation
The analysis revealed which asset properties and geographical variables were most informative in the prediction
of pipe failure. Combined with information about the consequences of pipe failure, a quantitative risk model for
the failure of cement pipes could be developed
Case study – Production Planning
By taking into account certain production planning variables this analysis enables
real time contingency planning for a complex, multi-layered network in case of disruptions
Network visualization
Analytics is supporting production planners to proactively address possible unforeseen planning challenges.
This analysis enables real time contingency planning for a complex, multi-layered supply chain network when
certain disruptions happen, by taking into consideration information about cost, service level, and historical
disruption durations
An optimal routing plan for a supply chain network is generated under normal conditions using network
programming with the following input: manufacturing costs, capacity and the customer demand of retailers.
Disruptions are handled in real time, resulting in a better suited contingency plan, which enables cost reductions
Trade-off evaluation
Compared to traditional predefined contingency plans, a real time contingency plan is set-up (also incorporating
the considerations of current supply chain status, including initial stock, utilization rate, etc.) to achieve the
expected customer service level with cost efficiency
Case study – Quality Analytics
Quality Analytics makes it possible to filter to high impact issues and understand a facility’s
past performance
Facilities overview
The client was an organization responsible for assessing the security compliance of a large number of
organizations. Disparate reporting and data collection techniques made it difficult for staff and leadership to
prioritize action and identify problem areas
Region overview
The dashboard gathered all facility information consistently, provided the ability to filter to high impact issues
and understand a facility’s past performance. The solution consolidates all the organization’s information, which
allows the user to understand the scope of their organization while also being able to drill down to a single
facility in order to make actionable decisions
The solution provides views for the three types of individuals in the organization (Representative, Field office,
and Regional Manager) as well as prioritization tools and facility details. The tool allows an individual user to
focus on high priority facilities, but with changing definitions of priority. In addition each user can see all the
information they need to understand the scope of their assignments and make decisions
FMCG Analytics Framework – Logistics
FMCG Value Chain – Logistics
In the Logistics process of the FMCG value chain, analyses are focused on optimizing
delivery, shipments and warehousing performances
Location Analytics
This type of analysis helps solve the problem of what the optimal location is for a certain facility, based on
geographical data. As an example, the fire department would want their facilities to be spread throughout a city,
so that a fire at any point in the city can be reached with an acceptable response time.
Supply Chain Diagnostics
Supply chain diagnostics aims at enabling and improving the ability to view every item (Shipment, Order, SKU,
etc.) at any point and at all times in the supply chain. Furthermore its goal is to alert on process exceptions, to
provide analytics, and to analyse detailed supply chain data to determine opportunities of cycle time reduction.
Approach – Location Analytics
Find whitespots in distribution centers locations
Current distribution center
A company in the Netherlands wanted to expand their business. They want to improve delivery times to the store
locations by creating one or more extra distribution centres in the Netherlands. The centres should be placed in
locations such that they get maximum value in lower delivery times, now and in the near future
Determine travel-time
to centers
For the approach we start from the current distribution centre locations. From these locations we can calculate
the traveling time to stores using Dijkstra’s algorithm. This gives us for each location on the map the travel time
to a distribution centre. These results can then be visualized in a heatmap to immediately locate whitespots in
the store distribution. Furthermore an optimization algorithm was run to determine the optimal distribution of
distribution centres
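The travel-time step can be sketched with a multi-source variant of Dijkstra's algorithm: every current distribution centre starts at time zero, and each location ends up with the travel time to its nearest centre. The road network, the node names, and the 60-minute whitespot threshold below are hypothetical; the heatmap and the subsequent optimization step are not shown.

```python
import heapq


def travel_times(graph, centres):
    """Multi-source Dijkstra: shortest travel time from any centre to every reachable node.

    graph maps node -> list of (neighbour, minutes); centres is an iterable of nodes.
    """
    best = {c: 0 for c in centres}
    queue = [(0, c) for c in centres]
    heapq.heapify(queue)
    while queue:
        time, node = heapq.heappop(queue)
        if time > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, minutes in graph.get(node, []):
            candidate = time + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


def whitespots(times, nodes, threshold):
    """Nodes whose travel time exceeds the threshold (or that are unreachable)."""
    return sorted(n for n in nodes if times.get(n, float("inf")) > threshold)


# Hypothetical road network; travel times in minutes, edges listed in both directions.
roads = {
    "A": [("B", 20), ("C", 50)],
    "B": [("A", 20), ("C", 25), ("D", 60)],
    "C": [("A", 50), ("B", 25), ("D", 30)],
    "D": [("B", 60), ("C", 30)],
}
times = travel_times(roads, centres=["A"])
print(times)                         # {'A': 0, 'B': 20, 'C': 45, 'D': 75}
print(whitespots(times, roads, 60))  # ['D']
```

Re-running `travel_times` with a candidate new centre added to `centres` shows how much that location would shrink the whitespots, which is one simple way to compare candidate sites.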
Visualize and
monitor results
With our results and the new locations the fire departments were able to:
• Significantly reduce the response time, saving lives and reducing costs at the same time
• Reduce the total number of fire departments, while giving better response time performance
Case study – Inventory Diagnostics
Delivering a robust and user friendly Global Transit Planning Tool
Lane detail
To empower transportation personnel to more efficiently analyse ocean and air supply chain shipment data, a
globally operating company internally designed a Global Transit Planning (GTP) tool in Tableau. However, the
tool did not achieve high user adoption, since analyses were not intuitive and extensive manual data updates
were required
The Deloitte team was asked to enhance the tool and incorporate a robust data blending process
Carrier rank
Enhancing the GTP dashboard and blending the data was achieved in four subsequent phases consisting of:
research, visioning, prototyping and iterating
In the prototyping phase, the team built and refined the dashboards and wrote a Python script which indicates
how the various data sources should feed into the unified view of data
Carrier comparison
The existing GTP tool was adjusted to provide maximum flexibility, automation and collaboration. The user flow
allows users to interact in one cohesive interface, while providing tailored information to their specific role.
The redesigned GTP tool is now well adopted within the organization and used on a monthly basis to enable more
effective inventory planning decisions, resulting in the gradual and continuous reduction of in-transit inventory
Case study – Resource & Route optimization
Maximizing profitability by optimizing resource planning and route optimization
Clear understanding of
all steps
A Dutch client that handled waste disposal for large companies struggled with its profitability. After analysis it
was confirmed that one of the key issues was the suboptimal resource planning. Resource planning of trucks and
drivers was done manually, sometimes even by the drivers themselves. The client asked Deloitte to develop a
system for finding optimum routes for their trucks
First, Deloitte created an overview of all customer locations, the number of available trucks per location,
the working hours and the pickup points. Next, we calculated the drive time matrix between the different locations.
Subsequently, Deloitte created a model that uses a customized ‘cost function’ in which weights can be
given to driving time and driving distance. Optimizing this cost function provides
the optimal routing for each truck for each day
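A weighted cost function of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model built for the client: the matrices, the weights `w_time` and `w_dist`, and the exhaustive search in `best_route` are all hypothetical, and a real planner would need a proper routing solver plus constraints such as working hours and truck capacity.

```python
from itertools import permutations

# Hypothetical symmetric matrices for a depot (index 0) and three pickup points.
# All numbers are made up for illustration.
DRIVE_TIME_MIN = [
    [0, 20, 35, 30],
    [20, 0, 15, 25],
    [35, 15, 0, 10],
    [30, 25, 10, 0],
]
DISTANCE_KM = [
    [0, 18, 30, 28],
    [18, 0, 12, 20],
    [30, 12, 0, 9],
    [28, 20, 9, 0],
]


def route_cost(route, w_time=1.0, w_dist=0.5):
    """Weighted sum of driving time and distance over consecutive stops."""
    legs = zip(route, route[1:])
    return sum(w_time * DRIVE_TIME_MIN[a][b] + w_dist * DISTANCE_KM[a][b] for a, b in legs)


def best_route(stops, depot=0, **weights):
    """Try every visiting order; only feasible for a handful of stops."""
    candidates = ([depot, *order, depot] for order in permutations(stops))
    return min(candidates, key=lambda r: route_cost(r, **weights))


route = best_route([1, 2, 3])
print(route, route_cost(route))
```

Changing the two weights shifts the trade-off between time and distance without touching the search itself, which is the point of keeping the cost function separate.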
Optimization process
The model Deloitte created was able to plan optimal routes for the different trucks much faster and more
efficiently than the client could. The new route planning model showed that it was possible to significantly reduce
resource usage: trucks could be sold without loss of client service and satisfaction
Case study – Supply Chain Diagnostics
Provide insight into key drivers of delivery in full and on time and improving
coverage throughout the supply chain
shipped to distribution centres around the globe. In order to satisfy customer demand in
time, it is necessary that the coverage is in order, i.e. the percentage of the products
that arrives at the distribution centres on time and in full. In order to improve the
coverages and meet the set targets, the company wanted insight into the drivers that
most influence the coverage and eventually also the delivery in full and on time.
Therefore they asked Deloitte to perform a detailed analysis on their data
Collected coverage patterns transformed into the identified clusters
Collected the ‘15 week coverage rate’ for a full year of orders. A clustering technique was
used to cluster 26,000 coverage rates. This technique groups the coverage patterns in
buckets of similar patterns, which then comprise a single cluster. Eight buckets of
different coverage patterns were visualized and these buckets gave insight into the drivers
of the coverage for the orders
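The case study does not name the clustering technique that was used. As a hedged illustration only, the sketch below applies plain k-means to a few made-up coverage curves; the function `kmeans`, the sample data and the choice of two clusters are assumptions for the example, not the engagement's actual method or data.

```python
import random


def kmeans(curves, k, iterations=20, seed=0):
    """Plain k-means on equal-length numeric sequences (squared Euclidean distance)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(curves, k)]
    labels = [0] * len(curves)
    for _ in range(iterations):
        # Assignment step: attach each curve to its nearest centroid.
        for i, curve in enumerate(curves):
            dists = [sum((a - b) ** 2 for a, b in zip(curve, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [curves[i] for i in range(len(curves)) if labels[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up 5-week coverage curves (fraction of the order covered per week).
early = [[0.2, 0.6, 0.9, 1.0, 1.0], [0.3, 0.7, 1.0, 1.0, 1.0]]
late = [[0.0, 0.0, 0.1, 0.4, 0.8], [0.0, 0.1, 0.2, 0.5, 0.9]]
labels, _ = kmeans(early + late, k=2)
print(labels)
```

Each resulting cluster is one "bucket" of similar coverage patterns; inspecting the orders in a bucket is what would point to drivers such as carrier or factory performance.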
Coverage patterns of 26,000 orders (horizontal axis: week 1 to 15)
• Carrier performance has the largest impact on the coverage
• Good coverage is usually caused by slack in factory performance
• Identified a significant number of orders that were only slightly (1–7 days) late and could be quick wins
• Actionable insights to improve process and areas of the order pipeline
Case study – Fulfilment Intelligence
Gaining insights into the digital order pipeline to improve order fulfilment and
speed of delivery
Clear understanding of
all steps
Over the last years, online sales channels have become more and more important for companies. With the
increase of online channels, however, customers have become more demanding in terms of delivery time and
service. Reliability is therefore extremely important, even more so than speed. Therefore a large company asked
Deloitte to create a clear picture of the Direct-to-consumers online purchase order submission process through
the different systems and increase the reliability of this process
around roughly twice the maximum amount of time, 7% within about 6 times the maximum and 1% took even
longer. The analysis was focused on the group that was completed in twice the maximum time (22%) which held
the largest opportunity to identify the delay drivers. Timestamps were created for different stages in the order
submission process combined from multiple source systems. A clustering on deviation from the reference per
time stamp was performed
Optimization opportunities
The analysis led to the identification of several steps within the process that could be improved with low effort
for a relatively high gain. In total more than 20 improvements were made based on the analysis results, leading
not only to a more reliable order submission process, but also to an average time reduction per order of 50%. As
a result, customer satisfaction and loyalty increased
Case study – Reverse Logistics
Reducing costs on reverse logistics by analysing end-to-end process
Sankey diagram of
product flow
A global technology firm struggled with high costs on their service logistics. The scope of service logistics
consisted of shipping parts to client sites and taking care of returning the defective parts to global
re-manufacturing sites. Clients were served with premium service levels (i.e. <4 hour recovery). Deloitte was asked
to make a fact-based assessment of the service logistics process and advise on how costs could be reduced
During the process, the reasons why customers contacted the service desk were analysed. It turned out that 80%
of the problems could be resolved by online support. Of the remaining 20%, 80% of the problems could be
resolved by second-line support. For the 4% that could not be resolved this way, a replacement needed to be
sent. After inspection, it turned out that half of these returned units actually did not have any malfunctions
Overview of quick wins
Results
The result of the analysis was that the main opportunity for savings was not in the cost for logistics (driven by
the stringent service levels and unpredictable failure rates), but was found in avoiding cost (i.e. reducing the
number of replaced products that turned out to be non-defective). These savings should be realized by
continuously improving online information and the customer services departments
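The percentages quoted in this case study chain together as simple arithmetic. The short Python sketch below only restates the figures from the text as shares of all service-desk contacts; the variable names are illustrative.

```python
from fractions import Fraction

contacts = Fraction(1)                       # all service-desk contacts
online = contacts * Fraction(80, 100)        # resolved by online support
remaining = contacts - online                # 20% escalated further
second_line = remaining * Fraction(80, 100)  # 16% resolved by second-line support
replacements = remaining - second_line       # 4% needing a replacement part
no_fault = replacements / 2                  # half of the returned units had no malfunction

rows = [("online", online), ("second line", second_line),
        ("replacement", replacements), ("replaced without fault", no_fault)]
for name, share in rows:
    print(f"{name}: {float(share):.0%}")
```

In other words, roughly 2% of all contacts led to a replacement shipment for a unit that was not defective, which is the avoidable cost the analysis points to.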
FMCG Analytics Framework – Business Management & Support
FMCG Value Chain – Business Management & Support
In the Support process of the FMCG value chain analyses are focused on determining
potential improvements in the organization
Finance Analytics
• Working capital, spend analytics, double payment, risk and tax analyses
• Helping clients to get control of their financial data, finance analytics
enable clients to model business processes and gain deeper insight into
cost and profitability drivers.
Case study – Workforce Analytics
Strategic Workforce Planning: planning the talent needed for sustainable growth
Clients experience a continuously changing environment in which they have to operate. Within this environment
new products and new sales channels are discovered. In order to be able to gain full advantage of these new
opportunities a variety of new skills within the workforce are needed
Using data from different sources such as People-, Customers-, Work- and Finance data, insights can be
derived in:
• Identifying critical workforce segments. Mapping segments/skills that drive a disproportionate amount of value
creation in comparison to their peers
• Identifying current demand drivers and defining a demand model
• Defining and executing a workforce planning to analyze gaps in the current supply and demand for critical
Workforce planning tools
Clients get a view of how they should move from the current workforce to the workforce needed in 5 years
from now
The approach used makes sure that clients can use evidence based decision making supported by a variety of
fact based workforce planning tools
Case study – Sustainability Analytics
Sustainability analytics reinforces a company’s sustainability-related initiatives
Prioritization of
Product Categories
Sustainability analytics can help companies reduce key resource use and at the same time make them less
vulnerable to price and supply volatility. Future risks and opportunities can be identified in areas such as
environmental and health impacts – both within the organization and across the extended supply chain. The
challenge lies in generating the most influential insights from relevant data. These insights are necessary to
develop sustainability related strategies and to improve overall (resource use) efficiency
The approach is divided into three actions:
• Develop a normalized and comprehensive view of resource use to understand (and prioritize) the hot spots
• Conduct a comprehensive analysis of products/services lifecycles to quantify the risks/opportunities
• Align/develop a sustainability strategy using the results of the executed analyses
Supplier Ranking
• Prioritization of product categories: an identification of the top product categories and a prioritization of
categories with most improvement potential
• Reduction product analysis: Development of an implementation strategy and value propositions for the
opportunities of the highest prioritized product groups (how to reduce costs, increase customer preference and
reduce risk)
• Supplier ranking: Ranking of suppliers based on sustainability performance to create individualized
“sustainability report cards” which can be integrated in category buying decision making
Case study – Working capital
“The Dash for Cash”: Using the Deloitte WCR Cashboard to drive sustainable
performance improvements in working capital
WCR Cashboard
As companies try to stay their course in the downturn and beyond, cash is back as king. Working capital is one
of the few remaining areas which can rapidly deliver a significant amount of cash to a business without a large
restructuring program
The client asked Deloitte to help in the challenge to free up working capital. Reducing working capital in the
short term is fairly easy; making reductions sustainable and changing the mind-set in operations to that of a
CFO is more difficult
To enable sustainable reductions, Deloitte deploys a cash-oriented, entrepreneurial approach to working capital
management that focuses on concrete actions and creating a “cash flow mind-set” to shorten the cash
conversion cycle. The Cashboard™ is a flexible & configurable dashboard that is powerful but still exceptionally
easy to use. As such, it allows frontline operations staff at companies to zoom in on the key opportunities, risks,
trade-offs and root causes
Payables – Purchase to Pay
Inventory – Forecast to Fulfill
Receivables – Order to Cash
• It enables continuous monitoring of the working capital levels throughout the entire company – including all
Case study – Spend Analytics
Deloitte Spend & Procurement Analytics provides deep insight in the composition of
the volume of spend and identifies key savings opportunities
General overview
The client was struggling with identifying improvement opportunities because of inaccessible information. As a
result, the client was unable to drill down and analyse individual orders and problem solving was limited to the
strategic level
The client asked Deloitte to help identify opportunities for continuous improvement for cost reduction and
provide additional insights into the spending trends of the organization
Our Spend & Procurement Analytics approach facilitates short time-to-deploy, delivers easy-to-use insight
and contains these key components:
• Easy upload of procurement data through standard interfaces
• Engine to create a bottom up calculation of your company’s most important Spend KPIs
• Interactive dashboard enabling context driven analysis by time, supplier, product, business line
Price analytics
Supplier view
Geographical view
Through the Spend & Procurement Analytics Dashboard efficiency and savings opportunities can be identified in
several areas:
• Improve process efficiency by identifying fragmented spend and invoicing
• Identify and expel maverick buying
• Negotiate better contracts
• Reduce costs by optimizing the purchase to pay process
Case study – Double Payments
Because paying once is enough!
cases. They usually know this but have no means for pinpointing exactly which
invoices are paid twice
Many organizations check for invoices paid twice, but rarely detect them all. This can
be caused by inaccurate master data or errors due to invoice entries. The
organization asked Deloitte to help in detecting double payments in a better way
The Deloitte Double Invoice Tracker examines all individual invoices, over multiple
periods in full detail. The Invoice Tracker detects inaccuracies in the master data by
using specially designed algorithms.
By cleverly cross-referencing inaccuracies in the master data with those in the
invoice entries, the Double Invoice Tracker can find lost cash and insights into the
master data quality
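The algorithms inside the Double Invoice Tracker are not described here. Purely as an illustration of the cross-referencing idea, the sketch below normalizes supplier names and invoice numbers before grouping, so that master-data variants and entry errors end up under the same key. The record layout and the helpers `supplier_key`, `invoice_key` and `possible_duplicates` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical invoice records; field names and values are made up.
invoices = [
    {"id": 1, "supplier": "Acme B.V.", "number": "INV-00123", "amount": 1500.00},
    {"id": 2, "supplier": "ACME BV", "number": "inv 123", "amount": 1500.00},
    {"id": 3, "supplier": "Globex", "number": "2017-77", "amount": 320.50},
]


def supplier_key(name):
    """Collapse case and punctuation so master-data variants of a supplier match."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def invoice_key(number):
    """Keep digits only and drop leading zeros, so 'INV-00123' and 'inv 123' match."""
    digits = re.sub(r"\D", "", number)
    return digits.lstrip("0") or "0"


def possible_duplicates(records):
    """Group invoices that share supplier, invoice number and amount after normalization."""
    groups = defaultdict(list)
    for rec in records:
        key = (supplier_key(rec["supplier"]), invoice_key(rec["number"]), round(rec["amount"], 2))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


print(possible_duplicates(invoices))
```

Every group returned is only a candidate; a reviewer would still need to confirm that both invoices were actually paid before starting restitution.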
The Deloitte Double Invoice Tracker saves money and helps improve master data
quality, by giving:
• An overview of all the invoices paid twice, including supplier information, so the
restitution process can be started immediately
• Insights into the master data quality
• Insights into the aggregate purchasing expenditures and how these are divided
Solution – Business Process analytics
Deloitte’s process analytics solution Process X-ray reconstructs what really happened
in the process and provides the capabilities to find the root cause
Process variation is at least 100 times greater than clients imagine. In fact, 5,000 or more variations are
common in most end-to-end processes. Such high levels of variability are a natural enemy of scalability,
efficiency, and process control
Process X-ray
Process execution is facilitated by different departments and functions, making it difficult to get an end-to-end
view of the process
Throughput times
Process analytics provides visibility of what is really happening based on the actual event data captured in
transactional systems. This is far different from the subjective recollections or assertions of people
It provides end-to-end visibility of the process, tearing down the walls between functions and departments and
providing an internal benchmark
Process analytics offers the scalability to analyze large volumes of transaction data from different systems (SAP,
Oracle, JDEdwards, SalesForce, etc.)
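How process variants emerge from event data can be shown with a small Python sketch. It is not Process X-ray itself: the event log layout (case id, timestamp, activity), the sample rows and the function `variants` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (case id, ISO timestamp, activity). Values are made up.
events = [
    ("PO-1", "2017-01-02T09:00", "Create order"),
    ("PO-1", "2017-01-03T10:00", "Approve order"),
    ("PO-1", "2017-01-05T08:30", "Pay invoice"),
    ("PO-2", "2017-01-02T11:00", "Create order"),
    ("PO-2", "2017-01-04T15:00", "Pay invoice"),
    ("PO-2", "2017-01-06T09:00", "Approve order"),
    ("PO-3", "2017-01-07T09:00", "Create order"),
    ("PO-3", "2017-01-08T09:00", "Approve order"),
    ("PO-3", "2017-01-09T09:00", "Pay invoice"),
]


def variants(log):
    """Order each case's events by time and count the distinct activity sequences."""
    traces = defaultdict(list)
    for case_id, timestamp, activity in sorted(log, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    return Counter(tuple(trace) for trace in traces.values())


for path, count in variants(events).most_common():
    print(count, " -> ".join(path))
```

Even in this tiny log, one case pays before approval, so two variants appear. With thousands of cases and dozens of activities, the number of distinct sequences grows quickly, which is the variability the text refers to.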
Solution – Program Portfolio Analytics
Deloitte’s iPL solution enables timely monitoring by disclosing project portfolio
performance anytime anywhere
Deloitte’s iPL
Portfolio management
Typical challenges that an organization faces relating to monitoring the portfolio performance:
• Getting performance reports is very time consuming and therefore the frequency of delivering these reports is
usually low
• The reports created are static and therefore provide no possibilities to analyze into a detailed level and from
different perspectives
• Decision making is mostly based on one dimension only (e.g. time spent)
Deloitte’s iPL solution is aimed at fact based prioritization and tracking of project performance and enables
financial, resourcing, risk and issue analyses
iPL combines data from multiple sources and visualizes the results in an interactive analysis environment which
can be accessed online
Resource management
Project Approach
Our Analytic Insights project approach
Our comprehensive and flexible methodology for Analytics projects ensures we can deliver
business critical insights within time and budget
A typical Analytic Insights project takes 8-12 weeks following three main phases connected to our approach
• Assess Current Situation
• Acquire & Understand Data
• Prepare & Structure Data
• Analyze & Model
• Evaluate & Interpret
• Report & Implement
Our structured approach has been built up from our experience in analytical engagements. It comprises 6 steps to maximize project oversight. Each step
allows looping back to previous steps to apply the insights gained in subsequent steps.
Why Deloitte?
Deloitte maintains a market-leading global Analytics practice with extensive
experience in FMCG
We understand what your challenges are as well as the current and future analytics
market, placing Deloitte in a unique position to assist you
Deloitte’s areas of expertise in Analytics
We have built a wide range of expertise, covering all important streams within the field of
Analytics & Information Management
Deloitte’s approach towards becoming an Insight Driven Organization (IDO)
Considering analytics with a wider lens than just technology
Deloitte Greenhouse
Deloitte Greenhouse offers different types of immersive analytics sessions
Analytics Lab
The Analytics Lab, hosted in Deloitte’s innovative Greenhouse environment, is an inspiring and energetic workshop to uncover the impact of data analytics and visualization for your
organization. Participants are provided with a unique opportunity to experience hands-on analytics in a fun and innovative setting, facilitated by Deloitte’s industry specialists and
subject matter experts.
Privacy by Design
Incorporate privacy (and security) in the design process of the data analytics application
The Privacy by Design (PbD) concept is to design privacy measures directly into IT systems, business practices
and networked infrastructure, providing a “middle way” by which organizations can balance the need to
innovate and maintain competitive advantage with the need to preserve privacy.
It is no flash-in-the-pan theory: PbD has been endorsed by many public- and private sector authorities in the
European Union, North America, and elsewhere. These include the European Commission, European
Parliament and the Article 29 Working Party, the U.S. White House, Federal Trade Commission and
Department of Homeland Security, among other public bodies around the world who have passed new privacy laws.
Additionally, international privacy and data protection authorities unanimously endorsed Privacy by Design as
an international standard for privacy.
Adopting PbD is a powerful and effective way to embed privacy into the DNA of an organization. It establishes
a solid foundation for data analytics activities that support innovation without compromising personal information.
Deloitte took the basic principles of PbD and built them out into a full method that can be used to apply
privacy to almost any design – whether it is IT-systems, applications or products, the latter specifically
significant now that the Internet-of-Things is coming upon us.
• Effective way to make sure compliance is reached already in the design phase (and maintained)
• Efficient: accommodating privacy enhancing measures is cost effective in the early stages of design
• Time available to do adjustments / look for alternatives
Contact our Analytics Experts
Patrick Schunck
Partner – Lead Consumer Products Deloitte NL
+31 882881671
+31 882884754
Frank Korf
Senior Manager Advanced Analytics
+31 882885911
Chapter VIII
A Review of Methodologies for
Analyzing Websites
Danielle Booth
Pennsylvania State University, USA
Bernard J. Jansen
Pennsylvania State University, USA
This chapter is an overview of the process of Web analytics for Websites. It outlines how basic visitor
information such as number of visitors and visit duration can be collected using log files and page tagging.
This basic information is then combined to create meaningful key performance indicators that are tailored
not only to the business goals of the company running the Website, but also to the goals and content of
the Website. Finally, this chapter presents several analytic tools and explains how to choose the right
tool for the needs of the Website. The ultimate goal of this chapter is to provide methods for increasing
revenue and customer satisfaction through careful analysis of visitor interaction with a Website.
how to utilize key performance indicators, best key practices, and choosing the right Web analytics tool.

The first section addresses metrics, information that can be collected from visitors on a Website. It covers
types of metrics based on what kind of data is collected as well as specific metrics and how they can be
utilized. The following section discusses the two main methods for gathering visitor information -- log files
and page tagging. For each method, this section covers the advantages and disadvantages, types of supported
information, and examples for data format. Following this is a section on how to choose the key performance
indicators (KPIs). This includes outlining several business strategies for integrating Web analytics with the
rest of an organization as well as identifying the type of Website and listing several specific KPIs for each
site type. The following section provides the overall process and advice for Web analytics integration, and
the final section deals with what to look for when choosing analytics tools as well as a comparison of several
specific tools. Finally, the conclusion discusses the future of Web analytics.

METRICS

In order to understand the benefits of Website analysis, one must first understand metrics – the different
kinds of available user information. Although the metrics may seem basic, once collected, they can be used to
analyze Web traffic and improve a Website to better meet its traffic. According to Panalysis, an Australian
Web analytics company, these metrics generally fall into one of four categories: site usage, referrers (or how
visitors arrived at your site), site content analysis, and quality assurance. Table 1 shows examples of types
of metrics that might be found in these categories.

Although the type and overall number of metrics vary with different analytics vendors, there is still a set of
basic metrics common to most. Table 2 outlines eight widespread types of information that measure who is
visiting a Website and what they do during their visits, relating each of these metrics to specific
categories. Each metric is discussed below.

Visitor Type

Since analyzing Website traffic first became popular in the 1990s with the Website counter, the measure of
Website traffic has been one of the most closely watched metrics. This metric, however, has evolved from
merely counting the number of hits a page receives into counting the number of individuals who visit the
Website.

There are two types of visitors: those who have been to the site before, and those who have not. This
difference is defined in terms of repeat and new visitors. In order to track visitors in such a way, a system
must be able to determine individual users who access a Website; each individual visitor
Site usage
• Numbers of visitors and sessions
• How many people repeatedly visit the site
• Geographic information
• Search Engine Activity
Referrers
• Which websites are sending visitors to your site
• The search terms people used to find your site
• How many people place bookmarks to the site
Site content analysis
• Top entry pages
• Most popular pages
• Top pages for single page view sessions
• Top exit pages
• Top paths through the site
• Effectiveness of key content
Quality assurance
• Broken pages or server errors
• Visitor response to errors
is called a unique visitor. Ideally, a unique visitor is just one visitor, but this is not always the case.
It is possible that multiple users access the site from the same computer (perhaps on a shared household
computer or a public library). In addition, most analytic software relies on cookies to track unique users.
If a user disables cookies in their browser or if they clear their cache, the visitor will be counted as new
each time he or she enters the site.

Because of this, some companies have instead begun to track unique visits, or sessions. A session begins once
a user enters the site and ends when a user exits the site or after a set amount of time of inactivity
(usually 30 minutes). The session data does not rely on cookies and can be measured easily. Since there is
less uncertainty with visits, it is considered to be a more concrete and reliable metric than unique visitors.
This approach is also more sales-oriented because it considers each visit an opportunity to convert a visitor
into a customer instead of looking at overall customer behavior (Belkin, 2006).

time a visitor spends on a site during one session. One possible area of confusion when using this metric is
handling missing data. This can be caused either by an error in data collection or by a session containing
only one page visit or interaction. Since the visit length is calculated by subtracting the time of the
visitor’s first activity on the site from the time of the visitor’s final activity, what happens to the
measurement when one of those pieces of data is missing? According to the Web Analytics Association, the visit
length in such cases is zero (Burby & Brown, 2007).

When analyzing the visit length, the measurements are often broken down into chunks of time. StatCounter, for
example, uses the following time categories:

• Less than 5 seconds
• 5 seconds to 30 seconds
• 30 seconds to 5 minutes
• 5 minutes to 20 minutes
• 20 minutes to 1 hour
• Greater than 1 hour (Jackson, 2007)
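
The session and visit length definitions above can be sketched in Python. This is a minimal illustration, not how any particular analytics product works: the function names, the sample request times, and the treatment of bucket boundaries (for example, whether exactly 5 seconds falls in the first or second category) are assumptions.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the usual inactivity limit mentioned in the text


def sessionize(hits):
    """Split one visitor's page-request times into sessions."""
    sessions = []
    for ts in sorted(hits):
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def visit_length(session):
    """Final activity minus first activity; a single-hit session yields zero."""
    return session[-1] - session[0]


def bucket(length):
    """Map a visit length to the time categories listed above."""
    limits = [(5, "Less than 5 seconds"), (30, "5 seconds to 30 seconds"),
              (300, "30 seconds to 5 minutes"), (1200, "5 minutes to 20 minutes"),
              (3600, "20 minutes to 1 hour")]
    seconds = length.total_seconds()
    for limit, label in limits:
        if seconds < limit:
            return label
    return "Greater than 1 hour"


# Made-up request times for one visitor.
hits = [datetime(2008, 1, 1, 9, 0), datetime(2008, 1, 1, 9, 4),
        datetime(2008, 1, 1, 9, 10), datetime(2008, 1, 1, 13, 0)]
for s in sessionize(hits):
    print(len(s), "hits:", bucket(visit_length(s)))
```

The single request at 13:00 forms its own session with a visit length of zero, which shows why one-page sessions pile up in the shortest category regardless of how long the visitor actually read the page.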
Referring URL and Keyword Analysis | Which sites have directed traffic to the Website and which keywords visitors are using to find the Website | Referrers
Errors | Any errors that occurred while attempting to retrieve the page | Quality Assurance
short amount of time it usually means they either arrived at the site by accident or the site did not have
relevant information. By combining this information with information from referrers and keyword analysis, one
can tell which sites are referring well-targeted traffic and which sites are referring poor quality traffic.

Demographics and System Statistics

The demographic metric refers to the physical location of the system used to make a page request. This
information can be useful for a Website that provides region-specific services. For example, if an e-commerce
site can only ship its goods to people in Spain, any traffic to the site from outside of Spain is irrelevant.
In addition, region-specific Websites also want to make sure they tailor their content to the group they are
targeting. Demographic information can also be combined with information on referrers to determine if a
referral site is directing traffic to a site from outside a company’s regions of service.

System statistics are information about the hardware and software with which visitors access a Website. This
can include information such as browser type, screen resolution, and operating system. It is important that a
Website be accessible to all of its customers, and by using this information, the Website can be tailored to
meet visitors’ technical needs.

Internal Search

If a Website includes a site-specific search utility, then it is also possible to measure internal search
information. This can include not only keywords but also information about which results pages visitors found
useful. The Patricia Seybold Group identifies the following seven uses for internal search data:

• Identify products and services for which customers are looking, but that are not yet provided by the company.
• Identify products that are offered, but which customers have a hard time finding.
• Identify customer trends.
• Improve personalized messages by using the customers' own words.
• Identify emerging customer service issues.
• Determine if customers are provided with enough information to reach their goals.
• Make personalized offers. (Aldrich, 2006)

By analyzing internal search data, one can use the information to improve and personalize the visitors’
experience.

Visitor Path

A visitor path is the route a visitor uses to navigate through a Website. Excluding visitors who leave the
site as soon as they enter, each visitor creates a path of page views and actions while perusing the site. By
studying these paths, one can identify any difficulties a user has viewing a specific area of the site or
completing a certain action (such as making a transaction or completing a form).

According to an article by the Web Analytics Association, there are two schools of thought regarding visitor
path analysis. The first is that visitor actions are goal-driven and performed in a logical, linear fashion.
For example, if a visitor wants to purchase an item, the visitor will first find the item, add it to the
cart, and proceed to the checkout to complete the process. Any break in that path (i.e. not completing the
order) signifies user confusion and is viewed as a problem.

The second school of thought is that visitor actions are random and illogical and that the only path that can
provide accurate data on a visitor’s behavior is the path from one page to the page immediately following it.
In other words, the only page that influences visitor behavior on a Website is the one they are currently
viewing. For example, visitors on a news site may merely peruse the articles with no particular goal in mind.
This method of analysis is becoming increasingly popular
because companies find it easier to examine path data in context without having to reference the entire site
in order to study the visitors’ behavior (Web Analytics Association, n. d.).

Top Pages

Panalysis mentions three types of top pages: top entry pages, top exit pages, and most popular pages. Top
entry pages are important because the first page a visitor views makes the greatest impression about a
Website. By knowing the top entry page, one can make sure that page has relevant information and provides
adequate navigation to important parts of the site. Similarly, identifying popular exit pages makes it easier
to pinpoint areas of confusion or missing content.

The most popular pages are the areas of a website that receive the most traffic. This metric gives insight
into how visitors are utilizing the Website, and which pages are providing the most useful information. This
is important because it shows whether the Website’s functionality matches up with its business goals; if most
of the Website’s traffic is being directed away from the main pages of the site, the Website cannot function
to its full potential (Jacka, n. d.).

Referrers and Keyword Analysis

A referral page is the page a user visits immediately before entering a Website, or rather, a site that has
directed traffic to the Website. A search engine result page link, a blog entry mentioning the Website, and a
personal bookmark are examples of referrers. This metric is important because it can be used to determine
advertising effectiveness and search engine popularity. As always, it is important to look at this information
in context. If a certain referrer is doing worse than expected, it could be caused by the referring link text
or placement. Conversely, an unexpected spike in referrals from a certain page could be either good or bad
depending on the content of the referring page.

In the same way, keyword analysis deals specifically with referring search engines and shows which keywords
have brought in the most traffic. By analyzing the keywords visitors use to find a page, one is able to
determine what visitors expect to gain from the Website and use that information to better tailor the Website
to their needs. It is also important to consider the quality of keywords. Keyword quality is directly
proportional to revenue and can be determined by comparing keywords with visitor path and visit length
(Marshall, n. d.). Good keywords will bring quality traffic and more income to your site.

Errors

Errors are the final metric. Tracking errors has the obvious benefit of being able to identify and fix any
errors in the Website, but it is also useful to observe how visitors react to these errors. The fewer visitors
who are confused by errors on a Website, the less likely visitors are to exit the site because of an error.

How does one gather these metrics? There are two major methods for collecting data for Web analysis: log files
and page tagging. Most current Web analytic companies use a combination of the two methods for collecting
data. Therefore, it is important to understand the strengths and weaknesses of each.

Log Files

The first method of metric gathering uses log files. Every Web server keeps a log of page requests that can
include (but is not limited to) visitor IP address, date and time of the request, request page, referrer, and
information on the visitor’s
Web browser and operating system. The same basic collected information can be displayed in a variety of ways. Although the format of the log file is ultimately the decision of the company that runs the Web server, the following four formats are a few of the most popular:

• NCSA Common Log
• NCSA Combined Log
• NCSA Separate Log
• W3C Extended Log

The NCSA Common Log format (also known as Access Log format) contains only basic information on the page request. This includes the client IP address, client identifier, visitor username, date and time, HTTP request, status code for the request, and the number of bytes transferred during the request. The Combined Log format contains the same information as the common log with the following three additional fields: the referring URL, the visitor’s Web browser and operating system information, and the cookie. The Separate Log format (or 3-Log format) contains the same information as the combined log, but it breaks it into three separate files – the access log, the referral log, and the agent log. The date and time fields in each of the three logs are the same. Table 3 shows examples of the common, combined, and separate log file formats (notice that default values are represented by a dash “-”).

Similarly, W3C provides an outline for standard formatting procedures. This format differs from the first three in that it aims to provide for better control and manipulation of data while still producing a log file readable by most Web analytics tools. The extended format contains user-defined fields and identifiers followed by the actual entries, and default values are represented by a dash “-” (Hallam-Baker & Behlendorf, 1999). Table 4 shows an example of an extended log file.

There are several benefits of using system log files to gather data for analysis. The first is that it does not require any changes to the Website or any extra software installation to create the log files. Web servers automatically create these logs and store them on a company’s own servers, giving the company freedom to change its Web analytics tools and strategies at will. This method also does not require any extra bandwidth when loading a page, and since everything is recorded server-side, it is possible to log both page request successes and failures.

Using log files also has some disadvantages. One major disadvantage is that the collected data is limited to only transactions with the Web server. This means that log files cannot record information independent from the servers, such as the physical location of the visitor. Similarly, while it is possible to log cookies, the server must be specifically configured to assign cookies to visitors in order to do so. The final disadvantage is that while it is useful to have all the information stored on a company’s own servers, the log file method is only available to those who own their Web servers.
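As a rough illustration of the W3C extended format described above, the following Python sketch reads the user-defined field names from a `#Fields` directive and maps each entry onto them. The function name `parse_extended_log` and the sample lines are assumptions made for this example only, and quoted values containing spaces are deliberately not handled.

```python
def parse_extended_log(lines):
    """Parse W3C extended log lines into a list of dicts (simplified sketch)."""
    fields = []
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line such as "#Version: 1.0" or "#Fields: time cs-uri".
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        values = line.split()
        # A dash stands for a default (missing) value; store it as None.
        entries.append(
            {field: (None if value == "-" else value)
             for field, value in zip(fields, values)}
        )
    return entries


# Hypothetical sample log, used only to exercise the function.
sample = [
    "#Version: 1.0",
    "#Date: 12-Jan-1996 00:00:00",
    "#Fields: time cs-method cs-uri",
    "00:34:23 GET /foo/bar.html",
    "12:21:16 GET -",
]
for entry in parse_extended_log(sample):
    print(entry)
```

Because the field list is read from the file itself, the same reader can follow whatever columns a given server is configured to write, which is the flexibility the extended format is meant to provide.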
Table 3. Examples of the common, combined, and separate log file formats

Format | Example entry
NCSA Common Log | - dsmith [10/Oct/1999:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043
NCSA Combined Log | - dsmith [10/Oct/1999:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043 "" "Mozilla/4.05 [en] (WinNT; I)" "USERID=CustomerA;IMPID=01234"
NCSA Separate Log, Common Log | - dsmith [10/Oct/1999:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043
NCSA Separate Log, Referral Log | [10/Oct/1999:21:15:05 +0500] ""
NCSA Separate Log, Agent Log | [10/Oct/1999:21:15:05 +0500] "Microsoft Internet Explorer - 5.0"
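To make the NCSA field layout concrete, here is a minimal Python sketch that splits a common or combined log entry into named fields with a regular expression. The pattern, the function name `parse_log_line`, the documentation-range address 192.0.2.1, and the example referrer URL are all assumptions for illustration; real server logs may differ in detail, and the month abbreviation is parsed assuming an English locale.

```python
import re
from datetime import datetime

# Common format: host ident user [time] "request" status bytes.
# Combined format adds quoted referrer, user agent, and (optionally) cookie.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"(?: "(?P<cookie>[^"]*)")?)?$'
)


def parse_log_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # A dash marks a default (missing) value; normalize it to None.
    entry = {key: (None if value == "-" else value)
             for key, value in match.groupdict().items()}
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = int(entry["size"]) if entry["size"] is not None else 0
    return entry


# Hypothetical combined-format entry modeled on the examples in Table 3.
sample = (
    '192.0.2.1 - dsmith [10/Oct/1999:21:15:05 +0500] '
    '"GET /index.html HTTP/1.0" 200 1043 '
    '"http://www.example.com/" "Mozilla/4.05 [en] (WinNT; I)" '
    '"USERID=CustomerA;IMPID=01234"'
)
print(parse_log_line(sample))
```

The optional trailing group lets one pattern accept both the common and the combined layout; fields absent from a common-format entry simply come back as None.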
Page Tagging

The second method for recording visitor activity is page tagging. Page tagging uses an invisible image to detect when a page has been successfully loaded and then uses JavaScript to send information about the page and the visitor back to a remote server. According to Web Analytics Demystified, the variables used and amount of data collected in page tagging are dependent on the Web analytics vendor. Some vendors stress short, easy-to-use page tags while others emphasize specific tags that require little post-processing. The best thing to look for with this method, however, is flexibility – being able to use all, part, or none of the tag depending on the needs of the page (Peterson, 2004).

There are several benefits to using this method of gathering visitor data. The first is speed of reporting. Unlike a log file, the data received via page tagging is parsed as it comes in. This allows for near real-time reporting. Another benefit is flexibility of data collection. More specifically, it is easier to record additional information about the visitor that does not involve a request to the Web server. Examples of such information include information about a visitor’s screen size, the price of purchased goods, and interactions within Flash animations. This is also a useful method of gathering data for companies that do not run their own Web servers or do not have access to the raw log files for their site (such as blogs).

There are also some disadvantages of page tagging, most of which are centered on the extra code that must be added to the Website. This causes it to use more bandwidth each time a page loads, and it also makes it harder to change analytics tools because the code embedded in the Website would have to be changed or deleted entirely. The final disadvantage is that page tagging is only capable of recording page loads, not page failures. If a page fails to load, it means that the tagging code also did not load, and there is therefore no way to retrieve information in that instance.

Although log files and page tagging are two distinct ways to collect information about the visitors to a Website, it is possible to use both together, and many analytics companies provide ways to use both methods to gather data. Even so, it is important to understand the strengths and weaknesses of both. Table 5 shows the advantages and disadvantages of log file analysis and page tagging.

The Problems with Data

One of the most prevalent problems in Web analytics is the difficulty of identifying unique users. In order to determine repeat visitors, most Web analytic tools employ cookies that store unique identification information on the visitor’s personal computer. Because of problems with users deleting or disabling cookies, however, some companies have moved towards using Macromedia Flash Local Shared Objects (LSOs). LSOs act like a cookie, but standard browsers lack the tools required to delete them, anti-spyware software does not delete them because it does not see them as a threat, and most users do not know how to delete them manually. Awareness is growing, however,
and Firefox and Macromedia are working against LSOs and providing users with tools to delete them (Permadi, 2005).

Sen, Dacin, and Pattichis (2006) cite various other problems with log data from Websites including large data size and messy data. Problems with large data size are caused by massive amounts of traffic to a Website and also the amount of information stored in each record. Records with missing IP addresses and changes to Website content cause messy data. Even though the data may be hard to work with at first, once it is cleaned up, it provides an excellent tool for Web analytics.

CHOOSING KEY PERFORMANCE INDICATORS

In order to get the most out of Web analytics, one must know how to choose effectively which metrics to analyze and combine them in meaningful ways. This means knowing the Website’s business goals and then determining which KPIs will provide the most insight.

Knowing Your Business Goals

Every company has specific business goals. Every part of the company works together to achieve them, and the company Website is no exception. In order for a Website to be beneficial, information gathered from its visitors must not merely show what has happened in the past, but it must also be able to improve the site for future visitors. The company must have clearly defined goals for the future and use this information to support strategies that will help it achieve those goals.

For a Website, the first step in achieving this is making sure the data collected from the site is actionable. According to the Web Analytics Association (McFadden, 2005), in order for a company to collect actionable data, it must meet these three criteria: “(1) the business goals must be clear, (2) technology, analytics, and the business must be aligned, and (3) the feedback loop must be complete” (Web Channel Performance Management section, para. 3).

There are many possible methods for meeting these criteria. One is Alignment-Centric Performance Management (Becher, 2005). This approach goes beyond merely reviewing past customer trends to carefully selecting a few key KPIs based on their future business objectives. Even though a wealth of metrics is available from a Website, this does not mean that all metrics are relevant to a company’s needs. Reporting large quantities of data is overwhelming, so it is important to look at metrics in context and use them to create KPIs that focus on outcome and not activity. For example, a customer service Website
might view the number of emails responded to on the same day they were sent as a measurement of customer satisfaction. A better way to measure customer satisfaction, however, might be to survey the customers on their experience. Although this measurement is subjective, it is a better representation of customer satisfaction because even if a customer receives a response the same day they send out an email, it does not mean that the experience was a good one (Becher, 2005).

Choosing the most beneficial KPIs using this method is achieved by following “The Four M’s of Operational Management” as outlined by Becher (2005), which facilitate effective selection of KPIs:

• Motivate: Ensure that goals are relevant to everyone involved.
• Manage: Encourage collaboration and involvement for achieving these goals.
• Monitor: Once selected, track the KPIs and quickly deal with any problems that may arise.
• Measure: Identify the root causes of problems and test any assumptions associated with the strategy.

By carefully choosing a few quality KPIs to monitor and making sure everyone is involved with the strategy, it becomes easier to align a Website’s goals with the company’s goals because the information is targeted and stakeholders are actively participating.

Another method for ensuring actionable data is Online Business Performance Management (OBPM) (Sapir, 2004). This approach integrates business tools with Web analytics to help companies make better decisions quickly in an ever-changing online environment where customer data is stored in a variety of different departments. The first step in this strategy is gathering all customer data in a central location and condensing it so that the result is all actionable data stored in the same place. Once this information is in place, the next step is choosing relevant KPIs that are aligned with the company’s business strategy and then analyzing expected versus actual results (Sapir, 2004).

In order to choose the best KPIs and measure the Website’s performance against the goals of a business, there must be effective communication between senior executives and online managers. The two groups should work together to define the relevant performance metrics, the overall goals for the Website, and the performance measurements. This method is similar to Alignment-Centric Performance Management in that it aims to aid integration of the Website with the company’s business objectives by involving major stakeholders. The ultimate goals of OBPM are increased confidence, organizational accountability, and efficiency (Sapir, 2004).

Identifying KPIs Based on Website Type

Unlike metrics, which are numerical representations of data collected from a Website, KPIs are tied to a business strategy and are usually measured by a ratio of two metrics. By choosing KPIs based on the Website type, a business can save both time and money. Although Websites can have more than one function, each site belongs to at least one of the four main categories – commerce, lead generation, content/media, and support/self service (McFadden, 2005). Table 6 shows common KPIs for each Website type:

Table 6. The four types of Websites and examples of associated KPIs (McFadden, 2005)

Website Type | KPIs
Commerce | Conversion rates; average order value; average visit value; customer loyalty; bounce rate
Lead Generation | Conversion rates; cost per lead; bounce rate; traffic concentration
Content/Media | Visit depth; returning visitor ratio; new visitor ratio; page depth
Support/Self service | Page depth; bounce rate; customer satisfaction; top internal search phrases

We discuss each Website type and related KPIs below.

Commerce

The goal of a commerce Website is to get visitors to purchase goods or services directly from the site, with success gauged by the amount of revenue the site brings in. According to Peterson, “commerce analysis tools should provide the ‘who, what, when, where, and how’ for your online purchasers” (2004, p. 92). In essence, the important information for a commerce Website is who made (or failed to make) a purchase, what was purchased, when purchases were made, where customers are coming from, and how customers are making their purchases. The most valuable KPIs used to answer these questions are conversion rates, average order value, average visit value, customer loyalty, and bounce rate (McFadden, 2005). Other metrics to consider with a commerce site are which products, categories, and brands are sold on the site and internal site product search that could signal navigation confusion or a new product niche (Peterson, 2004).

A conversion rate is the number of users who perform a specified action divided by the total of a certain type of visitor (e.g., repeat visitors or unique visitors) over a given period. Types of conversion rates will vary by the needs of the businesses using them, but two common conversion rates for commerce Websites are the order conversion rate (the percent of total visitors who place an order on a Website) and the checkout conversion rate (the percent of total visitors who begin the checkout process). There are also many methods for choosing the group of visitors on which to base your conversion rate. For example, a business may want to filter visitors by excluding visits from robots and Web crawlers (Ansari, Kohavi, Mason, & Zheng, 2001), or they may want to exclude the traffic that “bounces” from the Website or (a slightly trickier measurement) the traffic that is determined not to have intent to purchase anything from the Website (Kaushik, 2006).

It is common for commerce Websites to have conversion rates around 0.5%, but generally good conversion rates will fall in the 2% range depending on how a business structures its conversion rate (FoundPages, 2007). Again, the ultimate goal is to increase total revenue. According to eVision, for each dollar a company spends on improving this KPI, there is a $10 to $100 return (2007). The methods a business uses to improve its conversion rate (or rates), however, are different depending on which target action that business chooses to measure.

Average order value (AOV) is a ratio of total order revenue to number of orders over a given period. This number is important because it allows the analyzer to derive a cost for each transaction. There are several ways for a business
to use this KPI to its advantage. One way is to break down the AOV by advertising campaigns (e.g., email, keyword, banner ad). This way, a business can see which campaigns are bringing in the best customers and spend more effort refining their strategies in those areas (Peterson, 2005). Overall, however, if the cost of making a transaction is greater than the amount of money customers spend for each transaction, the site is not fulfilling its goal. There are two main ways to correct this. The first is to increase the number of products customers order per transaction, and the second is to increase the overall cost of purchased products. A good technique for achieving this is through product promotions (McFadden, 2005), but many factors influence how and why customers purchase what they do on a Website. These factors are diverse and can range from displaying a certain security image on the site (MarketingSherpa, 2007) to updating the site’s internal search (Young, 2007). Like many KPIs, improvement ultimately comes from ongoing research and a small amount of trial and error.

Another KPI, average visit value, is the ratio of total revenue to the total number of visits. This is a measurement of quality traffic important to businesses. It is problematic for a commerce site when, even though it may have many visitors, each visit generates only a small amount of revenue. In that case, even if the total number of visits increased, it would not have a profound impact on overall profits. This KPI is also useful for evaluating the effectiveness of promotional campaigns. If the average visit value decreases after a specific campaign, it is likely that the advertisement is not attracting quality traffic to the site. Another less common factor in this situation could be broken links or a confusing layout in a site’s “shopping cart” area. A business can improve the average visit value by using targeted advertising and employing a layout that reduces customer confusion.

Customer loyalty is the ratio of new to existing customers. Many Web analytics tools measure this using visit frequency and transactions, but there are several important factors in this measurement including the time between visits (Mason, 2007). Customer loyalty can even be measured simply with customer satisfaction surveys (SearchCRM, 2007). Loyal customers will not only increase revenue through purchases but also through referrals, potentially limiting advertising costs (QuestionPro).

Bounce rate is a measurement of how many people arrive at a homepage and leave immediately. There are two scenarios that generally qualify as a bounce. In the first scenario, a visitor views only one page on the Website. In the second scenario, a visitor navigates to a Website but only stays on the site for five seconds or less (Avinash, 2007). This could be due to several factors, but in general, visitors who bounce from a Website are not interested in the content. Like average order value, this KPI helps show how much quality traffic a Website receives. A high bounce rate may be a reflection of unintuitive site design or misdirected advertising.

Lead Generation

The goal for a lead generation Website is to obtain user contact information in order to inform them of a company’s new products and developments and to gather data for market research; these sites primarily focus on products or services that cannot be purchased directly online. Examples of lead generation include requesting more information by mail or email, applying online, signing up for a newsletter, registering to download product information, and gathering referrals for a partner site (Burby, 2004). The most important KPIs for lead generation sites are conversion rates, cost per lead, bounce rate, and traffic concentration (McFadden, 2005).

Similar to commerce Website KPIs, a conversion rate is the ratio of the number of visitors who perform a specific action to total visitors. In the case of lead generation Websites, the most common conversion rate is the ratio of leads generated to total visitors. The same visitor filtering techniques mentioned in the previous section can be applied to this measurement (i.e., filtering out robots and Web crawlers and excluding traffic that bounces from the site). This KPI is an essential tool in analyzing marketing strategies. Average lead generation sites have conversion rates of 5-6%, while exceptionally good sites reach 17-19% (Greenfield, 2006). If the conversion rate of a site increases after the implementation of a new marketing strategy, it indicates that the campaign was successful. If it decreases, it indicates that the campaign was not effective and might need to be reworked.

Cost per lead (CPL) refers to the ratio of total expenses to total number of leads, or how much it costs a company to generate a lead; a more targeted measurement of this KPI would be the ratio of total marketing expenses to total number of leads. Like the conversion rate, CPL helps a business gain insight into the effectiveness of its marketing campaigns. A good way to measure the success of this KPI is to make sure that the CPL for a specific marketing campaign is less than the overall CPL (WebSideStory, 2004). Ideally, the CPL should be low, and well-targeted advertising is usually the best way to achieve this.

Lead generation bounce rate is the same measurement as the bounce rate for commerce sites. This KPI is a measurement of visitor retention based on the ratio of total bounces to total visitors; a bounce is a visit characterized by a visitor entering the site and immediately leaving. Lead generation sites differ from commerce sites in that they may not require the same level of user interaction. For example, a lead generation site could have a single page where users enter their contact information. Even though they only view one page, the visit is still successful if the Website is able to collect the user’s information. In these situations, it is best to base the bounce rate solely on time spent on the site. As with commerce sites, the best way to decrease a site’s bounce rate is to increase advertising effectiveness and decrease visitor confusion.

The final KPI is traffic concentration, or the ratio of the number of visitors to a certain area in a Website to total visitors. This KPI shows which areas of a site have the most visitor interest. For this type of Website, it is ideal to have a high traffic concentration on the page or pages where users enter their contact information.

Content/Media

Content/media Websites focus mainly on advertising, and the main goal of these sites is to increase revenue by keeping visitors on the Website longer and also to keep visitors coming back to the site. In order for these types of sites to succeed, site content must be engaging and frequently updated. If content is only part of a company’s Website, the content used in conjunction with other types of pages can be used to draw in visitors and provide a way to immerse them in the site. The main KPIs are visit depth, returning visitors, new visitor percentage, and page depth (McFadden, 2005).

Visit depth (also referred to as depth of visit or path length) is the measurement of the ratio between page views and unique visitors, or how many pages a visitor accesses each visit. As a general rule, visitors with a higher visit depth are interacting more with the Website. If visitors are only viewing a few pages per visit, it means that they are not engaged, and the effectiveness of the site is low. A way to increase a low average visit depth is by creating more targeted content that would be more interesting to the Website’s target audience. Another strategy could be increasing the site’s interactivity to encourage the users to become more involved with the site and keep them coming back.

Unlike the metric of simply counting the number of returning visitors on a site, the returning visitor KPI is the ratio of unique visitors to total visits. A factor in customer loyalty, this KPI measures the effectiveness of a Website to
bring visitors back. A lower ratio for this KPI is best because a lower number means more repeat visitors and more visitors who are interested in and trust the content of the Website. If this KPI is too low, however, it might signal problems in other areas such as a high bounce rate or even click fraud. Click fraud occurs when a person or script is used to generate visits to a Website without having genuine interest in the site. According to a study by Blizzard Internet Marketing, the average for returning visitors to a Website is 23.7% (White, 2006). As with many of the other KPIs for content/media Websites, the best way to improve the returning visitor rate is by having quality content and encouraging interaction with the Website.

New visitor ratio is the measurement of new visitors to unique visitors and is used to determine if a site is attracting new people. When measuring this KPI, the age of the Website plays a role – newer sites will want to attract new people. Similarly, another factor to consider is if the Website is concerned more about customer retention or gaining new customers. As a rule, however, the new visitor ratio should decrease over time as the returning visitor ratio increases. New visitors can be brought to the Website in a variety of different ways, so a good way to increase this KPI is to try different marketing strategies and figure out which campaigns bring the most (and the best) traffic to the site.

The final KPI for content/media sites is page depth. This is the ratio of page views for a specific page to the number of unique visitors to that page. This KPI is similar to visit depth, but its measurements focus more on page popularity. Average page depth can be used to measure interest in specific areas of a Website over time and to make sure that the interests of the visitors match the goals of the Website. If one particular page on a Website has a high page depth, it is an indication that that page is of particular interest to visitors. An example of a page in a Website expected to have a higher page depth would be a news page. Information on a news page is constantly updated so that, while the page is still always in the same location, the content of that page is constantly changing. If a Website has high page depth in a relatively unimportant part of the site, it may signal visitor confusion with navigation in the site or an incorrectly targeted advertising campaign.

Support/Self Service

Websites offering support or self-service are interested in helping users find specialized answers for specific problems. The goals for this type of Website are increasing customer satisfaction and decreasing call center costs; it is more cost-effective for a company to have visitors find information through its Website than it is to operate a call center. The KPIs of interest are visit length, content depth, and bounce rate. In addition, other areas to examine are customer satisfaction metrics and top internal search phrases (McFadden, 2005).

Page depth for support/self service sites is the same measurement as page depth for content/media sites – the ratio of page views to unique visitors. With support/self service sites, however, high page depth is not always a good sign. For example, a visitor viewing the same page multiple times may show that the visitor is having trouble finding helpful information on the Website or even that the information the visitor is looking for does not exist on the site. The goal of these types of sites is to help customers find what they need as quickly as possible and with the least amount of navigation through the site (CCMedia, 2007). The best way to keep page depth low is to keep visitor confusion low.

As with the bounce rate of other Website types, the bounce rate for support/self service sites reflects ease of use, advertising effectiveness, and visitor interest. A low bounce rate means that quality traffic is coming to the Website and deciding that the site’s information is potentially useful.
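Most of the KPIs discussed for the different Website types are ratios of two metrics. As a minimal sketch of how several of them might be computed from per-visit records, the Python below uses a hypothetical `Visit` record; the field names, the sample data, and the choice to count bounces per visit are assumptions for illustration, not definitions taken from any particular analytics tool.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One visit (session); every field name here is a hypothetical choice."""
    visitor_id: str
    pages_viewed: int
    seconds_on_site: float
    is_new_visitor: bool
    orders: int = 0
    revenue: float = 0.0


def ratio(numerator, denominator):
    """Divide two metrics, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def is_bounce(visit, max_seconds=5.0):
    """A bounce: a single-page visit, or a visit of five seconds or less."""
    return visit.pages_viewed <= 1 or visit.seconds_on_site <= max_seconds


def compute_kpis(visits):
    """Compute a few ratio KPIs from a list of Visit records."""
    total_visits = len(visits)
    unique_visitors = {v.visitor_id for v in visits}
    new_visitors = {v.visitor_id for v in visits if v.is_new_visitor}
    buyers = {v.visitor_id for v in visits if v.orders > 0}
    page_views = sum(v.pages_viewed for v in visits)
    orders = sum(v.orders for v in visits)
    revenue = sum(v.revenue for v in visits)
    return {
        "order_conversion_rate": ratio(len(buyers), len(unique_visitors)),
        "average_order_value": ratio(revenue, orders),
        "average_visit_value": ratio(revenue, total_visits),
        "bounce_rate": ratio(sum(is_bounce(v) for v in visits), total_visits),
        "visit_depth": ratio(page_views, len(unique_visitors)),
        "returning_visitor_ratio": ratio(len(unique_visitors), total_visits),
        "new_visitor_ratio": ratio(len(new_visitors), len(unique_visitors)),
    }


# Hypothetical sample data: four visits from three visitors.
visits = [
    Visit("a", 5, 120.0, True, orders=1, revenue=40.0),
    Visit("a", 3, 60.0, False),
    Visit("b", 1, 3.0, True),
    Visit("c", 4, 90.0, False, orders=2, revenue=60.0),
]
for name, value in compute_kpis(visits).items():
    print(f"{name}: {value:.3f}")
```

Keeping each KPI as an explicit numerator and denominator makes it easy to change the visitor population (for example, excluding robots or bounced visits) before the ratio is taken.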
Poor advertisement campaigns and poor Website layout will increase a site’s bounce rate.

Customer satisfaction deals with how the users rate their experience on a site and is usually collected directly from the visitors (not from log files), either through online surveys or through satisfaction ratings. Although it is not a KPI in the traditional sense, gathering data directly from visitors to a Website is a valuable tool for figuring out exactly what visitors want. Customer satisfaction measurements can deal with customer ratings, concern reports, corrective actions, response time, and product delivery. Using these numbers, one can compare the online experience of the Website’s customers to the industry average and make improvements according to visitors’ expressed needs.

Similarly, top internal search phrases applies only to sites with internal search, but it can be used to measure what information customers are most interested in, which can lead to improvement in site navigation. This information can be used to direct support resources to the areas generating the most user interest, as well as identify which parts of the Website users may have trouble accessing. In addition, if many visitors are searching for a product not supported on the Website, it could be a sign of ineffective marketing.

Regardless of Website type, the KPIs listed above are not the only KPIs that can prove useful in analyzing a site’s traffic, but they provide a good starting point. The main thing to remember is that no matter what KPIs a company chooses, they must be aligned with its business goals, and more KPIs do not necessarily mean better analysis – quality is more important than quantity.

KEY BEST PRACTICES

In this chapter, we have addressed which metrics can be gathered from a Website, how to gather them, and how to determine which information is important. But how can this help improve a business? To answer this, the Web Analytics Association provides nine key best practices to follow when analyzing a Website (McFadden, 2005). Figure 1 outlines this process.

Identify Key Stakeholders

A stakeholder is anyone who holds an interest in a Website. This includes management, site developers, visitors, and anyone else who creates, maintains, uses, or is affected by the site. In order for the Website to be truly beneficial, it
must integrate input from all major stakeholders. Involving people from different parts of the company also makes it more likely that they will embrace the Website as a valuable tool.

Define Primary Goals for Your Website

To know the primary goals of a Website, one must first understand the primary goals of its key stakeholders. These could include such goals as increasing revenue, cutting expenses, and increasing customer loyalty (McFadden, 2005). Once those goals have been defined, discuss each goal and prioritize them in terms of how the Website can most benefit the company. As always, beware of political conflict between stakeholders and their individual goals, as well as assumptions they may have made while determining their goals that may not necessarily be true. By going through this process, a company can make sure that goals do not conflict and that stakeholders are kept happy.

Identify the Most Important Site Visitors

According to Sterne, corporate executives categorize their visitors differently in terms of importance. Most companies classify their most important visitors as ones who either visit the site regularly, stay the longest on the site, view the most pages, purchase the most goods or services, purchase goods most frequently, or spend the most money (Sterne, n. d.). There are three types of customers: (1) customers a company wants to keep, who have a high current value and high future potential; (2) customers a company wants to grow, who can have either a high current value and low future potential or a low current value and high future potential; and (3) customers a company wants to eliminate, who have a low current value and low future potential. The most important visitor to a Website, however, is the one who ultimately brings in the most revenue. Defining the different levels of customers will allow one to consider the goals of these visitors. What improvements can be made to the Website in order to improve their browsing experiences?

Determine the Key Performance Indicators

The next step is picking the metrics that will be most beneficial in improving the site and eliminating the ones that will provide little or no insight into its goals. One can then use these metrics to determine which KPIs to monitor. As mentioned in the previous section, the Website type (commerce, lead generation, media/content, or support/self service) plays a key role in which KPIs are most effective for analyzing site traffic.

Identify and Implement the Right Solution

This step deals with finding the right Web analytics technology to meet the business’s specific needs. After the KPIs have been defined, this step should be easy. The most important things to consider are the budget, software flexibility and ease of use, and how well the technology will work with the needed metrics. McFadden suggests that it is also a good idea to run a pilot test of the top two vendor choices (McFadden, 2005). We will expand on this topic further in the next section.

Use Multiple Technologies and Methods

Web analytics is not the only method available for improving a Website. To achieve a more holistic view of a site’s visitors, one can also use tools such as focus groups, online surveys, usability studies, and customer service contact analysis (McFadden, 2005).
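
The three customer types described under "Identify the Most Important Site Visitors" can be read as a small decision rule over two judgments: current value and future potential. The following is a minimal sketch of that rule, not a method given by the chapter's sources; the names `CustomerTier` and `classify_customer` and the sample visitors are hypothetical, and how a business decides that a value is "high" is left as an assumption outside the code.

```python
from enum import Enum


class CustomerTier(Enum):
    KEEP = "keep"            # high current value and high future potential
    GROW = "grow"            # exactly one of the two is high
    ELIMINATE = "eliminate"  # low current value and low future potential


def classify_customer(high_current_value: bool, high_future_potential: bool) -> CustomerTier:
    """Map the two value judgments onto the three customer types in the text."""
    if high_current_value and high_future_potential:
        return CustomerTier.KEEP
    if high_current_value or high_future_potential:
        return CustomerTier.GROW
    return CustomerTier.ELIMINATE


# Hypothetical visitors: (name, high current value?, high future potential?)
visitors = [
    ("visitor_a", True, True),
    ("visitor_b", True, False),
    ("visitor_c", False, True),
    ("visitor_d", False, False),
]

tiers = {name: classify_customer(cur, fut) for name, cur, fut in visitors}
for name, tier in tiers.items():
    print(f"{name}: {tier.value}")
```

In practice the two boolean inputs would themselves come from thresholds on metrics such as purchase frequency or money spent, which each company would need to define for its own site.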
   e. Cost of additional hardware you may need.
   f. Administration costs (which include the cost of an analyst and any additional employees you may need to hire).
5. What kind of support do you offer? Many vendors advertise free support, but it is important to be aware of any limits that could incur additional costs. It is also important to note how extensive their support is and how willing they are to help.
6. What features do you provide that will allow me to segment my data? Segmentation allows companies to manipulate their data. Look for the vendor’s ability to segment your data after it is recorded. Many vendors use JavaScript tags on each page to segment the data as it is captured, meaning that the company has to know exactly what it wants from the data before having the data itself; this approach is less flexible.
7. What options do I have to export data into our system? It is important to know who ultimately owns and stores the data and whether it is possible to obtain both raw and processed data. Most vendors will not provide companies with the data exactly as they need it, but it is a good idea to find out what kind of data is available before a final decision is made.
8. Which features do you provide for integrating data from other sources into your tool? This question relates to the previous section’s Key Best Practice #6: Use Multiple Technologies and Methods. If a company has other data it wants to bring to the tool (such as survey data or data from its ad agency), it should raise this with the potential analytics vendor and see if it is possible to integrate this information into their tool.
9. What new features are you developing that would keep you ahead of your competition? Not only will the answer to this question tell how much the vendor has thought about future functionality, it will also show how much they know about their competitors.
10. Why did you lose your last two clients? Who are they using now? The benefits of this question are obvious: by knowing how they lost prior business, the business can be confident that it has made the right choice.

Some examples of free and commercially available analytics tools are discussed below.

Free Tools

One of the most popular free analytics tools on the Web now is Google Analytics (previously Urchin). Google Analytics ( lytics/) uses page tagging to collect information from visitors to a site. In addition to expanding on the already highly regarded Urchin analytics tool, it also provides support for integrating other analytic information (for example, WordPress and AdWords). Google Analytics reports many of the KPIs discussed in the previous sections, including depth of visit, returning visitors, and page depth.

There is, however, concern about privacy issues regarding Google Analytics because Google uses its default privacy policy for its analytics tools, but the company assures its Google Analytics users that only account owners and people to whom the owners give permission will have access to the data (Dodoo, 2006). Microsoft also provides free Web analytics software called Gatineau (Thomas, 2007).

Paid Tools

InfoWorld provides an in-depth analysis comparing the top four Web analytics companies: Coremetrics, WebTrends, Omniture, and WebSideStory HBX (Heck, 2005). They created a scoring chart and measured each vendor on reporting, administration, performance, ease-of-use, support, and value. Coremetrics received a score of
8.3 with its highest ratings in administration and support. It is a hosted service that offers special configurations for financial, retail, and travel services. WebTrends also earned a score of 8.3 with its highest rating in reporting. This tool is expensive, but it offers a wide range of performance statistics and both client and server hosting. Omniture is next in line with a score of 8.4 with its highest ratings in reporting and support. It is an ASP reporting application that excels in providing relevant reports. WebSideStory had the highest score of 8.7 with its highest ratings in reporting, administration, ease-of-use, and support. This tool is easy to use and is appropriate for many different types of businesses.

The first step in analyzing your Website and Website visitors is understanding and analyzing your business goals and then using that information to carefully choose your metrics. In order to take full advantage of the information gathered from your site’s visitors, you must consider alternative methods such as focus groups and online surveys, make site improvements gradually, hire a full-time analyst, and realize that your site’s improvement is a process and not a one-time activity. Using these key best practices and choosing the right analytics vendor to fit your business will save your company money and ultimately increase revenue.

As Web analytics continues to mature, the methods vendors use to collect information are becoming more refined. One article speculates that companies will find concrete answers to the problems with cookies and unique visitors (Eisenberg, 2005). The Web analytics industry as a whole is also expanding. According to Eisenberg (2005), a recent Jupiter report predicts growth in the Web analytics industry of 20 percent annually, reaching $931 million in 2009. More and more businesses are realizing the benefits of critically analyzing their Website traffic and are taking measures to improve their profits based on these numbers. Regardless of business size and objective, an effective Web analytics strategy is becoming increasingly essential.

REFERENCES

Aldrich, S. E. (2006, May 2). The Other Search: Making the Most of Site Search to Optimize the Total Customer Experience. Patricia Seybold Group. Retrieved March 7, 2007, from WebSideStory database.

Ansari, S., Kohavi, R., Mason, L., & Zheng, Z. (2001). Integrating E-Commerce and Data Mining: Architecture and Challenges. IEEE International Conference on Data Mining.

Avinash, A. (2007, June 26). Bounce Rate: Sexiest Web Metric Ever? Retrieved December 2, 2007, from bounce_rate_sexiest_web_metric.html.

Becher, J. D. (2005, March). Why Metrics-Centric Performance Management Solutions Fall Short. DM Review Magazine. Retrieved March 7, 2007,

Belkin, M. (2006, April 8). 15 Reasons why all Unique Visitors are not created equal. Retrieved March 7, 2007, from blog/node/16.

Burby, J. (2004, July 20). Build a Solid Foundation With Key Performance Indicators, Part 1: Lead-Generation Sites. Retrieved March 7, 2007, from

Burby, J. & Brown, A. (2007). Web Analytics Definitions. Retrieved October 30, 2007, from ments/committees/5/WAA-Standards-Analytics-Definitions-Volume-I-20070816.pdf.
9040-018358f2c0c01033.mspx?mfr=true

Permadi, F. (2005, June 19). Introduction to Flash Local Shared-Object. Retrieved March 7, 2007, from SharedObject/index.html

Peterson, E. T. (2004). Web Analytics Demystified. Celilo Group Media.

Peterson, E. T. (2005, July 31). Average Order Value. Retrieved November 3, 2007, from Web Analytics Demystified Blog Website: http://blog. average-order-value.html

QuestionPro. Measuring Customer Loyalty and Customer Satisfaction. Retrieved November 21, 2007, from

Sapir, D. (2004, August). Online Analytics and Business Performance Management. BI Report. Retrieved March 7, 2007, from http://www. cfm?articleId=1008820

SearchCRM. (2007, May 9). Measuring Customer Loyalty. Retrieved November 4, 2007, 0,295582,sid11_gci1253794,00.html

Sen, A., Dacin, P. A., & Pattichis, C. (2006, November). Current trends in Web data analysis. Communications of the ACM, 49(11), 85-91.

Sterne, J. 10 Steps to Measuring Website Success. Retrieved March 7, 2007, from http://www. &source=/4/sterne13.asp

Thomas, I. (2007, January 9). The rumors are true: Microsoft ‘Gatineau’ exists. Retrieved March 7, 2007, from http://www.liesdamnedlies.

Web Analytics Association. Onsite Behavior - Path Analysis. Retrieved March 7, 2007, from http:// contentmanagers/336/1%20Path%20AnAnalys.

WebSideStory. (2004). Use of Key Performance Indicators in Web Analytics. Retrieved December 2, 2007, from

White, K. (2006, May 10). Unique vs. Returning Visitors Analyzed. Retrieved March 7, 2007, from

Young, D. (2007, August 15). Site Search: Increases Conversion Rates, Average Order Value And Loyalty. Practical Ecommerce. Retrieved November 15, 2007, from http://www.practicalecommerce.

KEY TERMS

Abandonment Rate: KPI that measures the percentage of visitors who got to that point on the site but decided not to perform the target action.

Alignment-Centric Performance Management: Method of defining a site’s business goals by choosing only a few key performance indicators.

Average Order Value: KPI that measures the ratio of total revenue to the total number of orders.

Average Time on Site (ATOS): See visit length.

Checkout Conversion Rate: KPI that measures the percent of total visitors who begin the checkout process.

Commerce Website: A type of Website where the goal is to get visitors to purchase goods or services directly from the site.
Committed Visitor Index: KPI that measures the percentage of visitors that view more than one page or spend more than 1 minute on a site (these measurements should be adjusted according to site type).

Content/Media Website: A type of Website focused on advertising.

Conversion Rate: KPI that measures the percentage of total visitors to a Website that perform a specific action.

Cost Per Lead (CPL): KPI that measures the ratio of marketing expenses to total leads and shows how much it costs a company to generate a lead.

Customer Satisfaction Metrics: KPI that measures how the users rate their experience on a site.

Customer Loyalty: KPI that measures the ratio of new to existing customers.

Demographics and System Statistics: A metric that measures the physical location and information of the system used to access the Website.

Depth of Visit: KPI that measures the ratio between page views and visitors.

Internal Search: A metric that measures information on keywords and results pages viewed using a search engine embedded in the Website.

Key Performance Indicator (KPI): A combination of metrics tied to a business strategy.

Lead Generation Website: A type of Website that is used to obtain user contact information in order to inform them of a company’s new products and developments, and to gather data for market research.

Log File: Log kept by a Web server of information about requests made to the Website including (but not limited to) visitor IP address, date and time of the request, request page, referrer, and information on the visitor’s Web browser and operating system.

Log File Analysis: Method of gathering metrics that uses information from a log file to compile Website statistics.

Metrics: Statistical data collected from a Website such as number of unique visitors, most popular pages, etc.

New Visitor: A user who is accessing a Website for the first time.

New Visitor Percentage: KPI that measures the ratio of new visitors to unique visitors.

Online Business Performance Management (OBPM): Method of defining a site’s business goals that emphasizes the integration of business tools and Web analytics to make better decisions quickly in an ever-changing online environment.

Order Conversion Rate: KPI that measures the percent of total visitors who place an order on a Website.

Page Depth: KPI that measures the ratio of page views for a specific page to the number of unique visitors to that page.

Page Tagging: Method of gathering metrics that uses an invisible image to detect when a page has been successfully loaded and then uses JavaScript to send information about the page and the visitor back to a remote server.

Prospect Rate: KPI that measures the percentage of visitors who get to the point in a site where they can perform the target action (even if they do not actually complete it).

Referrers and Keyword Analysis: A metric that measures which sites have directed traffic to the Website and which keywords visitors are using to find the Website.
Repeat Visitor: A user who has been to a Website before and is now returning.

Returning Visitor: KPI that measures the ratio of unique visitors to total visits.

Search Engine Referrals: KPI that measures the ratio of referrals to a site from specific search engines compared to the industry average.

Single Access Ratio: KPI that measures the ratio of total single access pages (or pages where the visitor enters the site and exits immediately from the same page) to total entry pages.

Stickiness: KPI that measures how many people arrive at a homepage and proceed to traverse the rest of the site.

Support/Self Service Website: A type of Website that focuses on helping users find specialized answers for their particular problems.

Top Pages: A metric that measures the pages in a Website that receive the most traffic.

Total Bounce Rate: KPI that measures the percentage of visitors who scan the site and then leave.

Traffic Concentration: KPI that measures the ratio of number of visitors to a certain area in a Website to total visitors.

Unique Visit: One visit to a Website (regardless of whether the user has previously visited the site); an alternative to unique visitors.

Unique Visitor: A specific user who accesses a Website.

Visit Length: A metric that measures total amount of time a visitor spends on the Website.

Visit Value: KPI that measures the ratio of the total number of visits to total revenue.

Visitor Path: A metric that measures the route a visitor uses to navigate through the Website.

Visitor Type: A metric that measures users who access a Website. Each user who visits the Website is a unique user. If it is a user’s first time to the Website, that visitor is a new visitor, and if it is not the user’s first time, that visitor is a repeat visitor.

Web Analytics: The measurement of visitor behavior on a Website.
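
Several of the key terms above are defined as simple ratios, so a short worked example may make them concrete. The sketch below computes five of them from one set of raw counts. The class `SiteTotals`, the helper `ratio`, and all numbers are hypothetical and chosen only for illustration; the order conversion rate is approximated as orders divided by visitors, which assumes each ordering visitor places one order.

```python
from dataclasses import dataclass


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass
class SiteTotals:
    """Hypothetical raw counts for one reporting period."""
    visitors: int
    unique_visitors: int
    new_visitors: int
    page_views: int
    orders: int
    revenue: float
    leads: int
    marketing_expenses: float


def compute_kpis(t: SiteTotals) -> dict:
    """Compute a few glossary KPIs as plain ratios of the raw counts."""
    return {
        # Order Conversion Rate: share of total visitors who place an order
        "order_conversion_rate": ratio(t.orders, t.visitors),
        # Average Order Value: total revenue to total number of orders
        "average_order_value": ratio(t.revenue, t.orders),
        # Cost Per Lead: marketing expenses to total leads
        "cost_per_lead": ratio(t.marketing_expenses, t.leads),
        # New Visitor Percentage: new visitors to unique visitors
        "new_visitor_percentage": ratio(t.new_visitors, t.unique_visitors),
        # Depth of Visit: page views to visitors
        "depth_of_visit": ratio(t.page_views, t.visitors),
    }


# Made-up figures for illustration only
totals = SiteTotals(
    visitors=2000,
    unique_visitors=1600,
    new_visitors=400,
    page_views=9000,
    orders=50,
    revenue=2500.0,
    leads=100,
    marketing_expenses=1500.0,
)

kpis = compute_kpis(totals)
for name, value in kpis.items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 for an empty denominator is a design choice for reporting convenience; a stricter implementation might raise an error or return `None` so that a period with no orders is not mistaken for a period with zero-value orders.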
Themes for the Final Projects in the course of Web Analytics (2019-20)
Dr. Preeti Khanna
The focus of the project will be to provide feedback to students on how business competitiveness increases through the use of technology and on how complex situations in industry can have multiple perspectives. Critical thinking will be an important criterion for evaluation.
1. Up to 20-25 minutes per group, followed by Q&A, starting from the 9th and 10th sessions (13th August and 20th August).
2. Submissions: soft copies of both the PPT and the report, on the day of the presentation.
3. All members of the group have to be present during the presentation.
4. Students are expected to meet the faculty member after the 4th session to discuss their progress and get interim feedback.
Some themes which you can select, with a few references for the respective projects:
1. Using Google Analytics to evaluate the usability of e-commerce sites
   To improve the team performance
   To increase the sales to events
   To optimize their Digital Marketing
   Reference: /bitstream/2134/5685/1/Using%20Google%20Analytics%20to%20Evaluate%20the%20Usability%20ofE-commerce%20Sites.pdf
2. A Study of Web Mining Application on E-Commerce using Google Analytics Tool
   Reference: 5f66e21a986b04c381120add6a091ba9538e.pdf